Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reference disks in GCP Batch [WX-1819] #7502

Merged
merged 15 commits into from
Sep 11, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ be found [here](https://cromwell.readthedocs.io/en/stable/backends/HPC/#optional
- Fixes the preemption error handling, now, the correct error message is printed, this also handles the other potential exit codes.
- Fixes error message reporting for failed jobs.
- Fixes the "retry with more memory" feature.
- Fixes the reference disk feature.
- Fixes pulling Docker image metadata from private GCR repositories.
- Fixed `google_project` and `google_compute_service_account` workflow options not taking effect when using GCP Batch backend
- Added a way to use a custom LogsPolicy for the job execution, setting `backend.providers.batch.config.batch.logs-policy` to "CLOUD_LOGGING" (default) keeps the current behavior, or, set it to "PATH" to save the logs into the the mounted disk, at the end, this log file gets copied to the google cloud storage bucket with "task.log" as the name.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
name: gcpbatch_reference_disk_test_false_option
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_test.wdl
inputs: reference_disk/reference_disk_test.inputs
options: reference_disk/reference_disk_test_false.options.json
}

metadata {
workflowName: wf_reference_disk_test
status: Succeeded
"outputs.wf_reference_disk_test.is_broad_input_file_a_valid_symlink": false
"outputs.wf_reference_disk_test.is_nirvana_input_file_a_valid_symlink": false
"outputs.wf_reference_disk_test.is_nirvana_metachar_input_file_a_valid_symlink": false
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
name: gcpbatch_reference_disk_mounted_only_if_requested_false
testFormat: workflowsuccess
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_mounted_only_if_requested.wdl
options: reference_disk/reference_disk_test_false.options.json
}

metadata {
workflowName: ReferenceDiskMountedOnlyIfRequested
status: Succeeded
"outputs.ReferenceDiskMountedOnlyIfRequested.disk_mounted": false
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
name: gcpbatch_reference_disk_mounted_only_if_requested_true
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_mounted_only_if_requested.wdl
options: reference_disk/reference_disk_test_true.options.json
}

metadata {
workflowName: ReferenceDiskMountedOnlyIfRequested
status: Succeeded
"outputs.ReferenceDiskMountedOnlyIfRequested.disk_mounted": true
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
name: gcpbatch_reference_disk_mounted_only_if_requested_unspecified
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_mounted_only_if_requested.wdl
options: reference_disk/reference_disk_test_unspecified.options.json
}

metadata {
workflowName: ReferenceDiskMountedOnlyIfRequested
status: Succeeded
"outputs.ReferenceDiskMountedOnlyIfRequested.disk_mounted": false
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
name: gcpbatch_reference_disk_test_true_option
testFormat: workflowsuccess
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_test.wdl
inputs: reference_disk/reference_disk_test.inputs
options: reference_disk/reference_disk_test_true.options.json
}

metadata {
workflowName: wf_reference_disk_test
status: Succeeded
"outputs.wf_reference_disk_test.is_broad_input_file_a_valid_symlink": true
"outputs.wf_reference_disk_test.is_nirvana_input_file_a_valid_symlink": true
"outputs.wf_reference_disk_test.is_nirvana_metachar_input_file_a_valid_symlink": true
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
name: gcpbatch_reference_disk_test_unspecified_option
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [GCPBATCH-Reference-Disk-Localization]

files {
workflow: reference_disk/gcpbatch_reference_disk_test.wdl
inputs: reference_disk/reference_disk_test.inputs
options: reference_disk/reference_disk_test_unspecified.options.json
}

metadata {
workflowName: wf_reference_disk_test
status: Succeeded
"outputs.wf_reference_disk_test.is_broad_input_file_a_valid_symlink": false
"outputs.wf_reference_disk_test.is_nirvana_input_file_a_valid_symlink": false
"outputs.wf_reference_disk_test.is_nirvana_metachar_input_file_a_valid_symlink": false
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
version 1.0


workflow ReferenceDiskMountedOnlyIfRequested {
call MentionsNirvanaReference {}
output {
Boolean disk_mounted = MentionsNirvanaReference.disk_mounted
}
}


task MentionsNirvanaReference {
input {
# Tiny 116 byte reference file, certainly not worth attaching a 55 GiB Nirvana reference disk for this.
File mention =
"gs://broad-public-datasets/gvs/vat-annotations/Nirvana/3.18.1/SupplementaryAnnotation/GRCh38/MITOMAP_20200819.nsa.idx"
}
command <<<
PS4='\D{+%F %T} \w $ '
set -o nounset -o xtrace

# Debug output
lsblk > lsblk.out

# Look for mounted disks other than the main /mnt/disks/cromwell_root volume
CANDIDATE_MOUNT_POINT=$(lsblk | grep -E -v '/mnt/disks/cromwell_root\>' | sed -E -n 's!.*(/mnt/disks/[^/]+).*!\1!p')

if [[ ! -z ${CANDIDATE_MOUNT_POINT} ]]; then
echo "Found unexpected mounted disk, investigating further."
find ${CANDIDATE_MOUNT_POINT} -print | tee find.out

if grep -i nirvana find.out; then
echo "Found what appears to be a Nirvana reference disk."
else
echo "Found unknown volume mounted, see 'find.out' for manifest."
fi
echo true > disk_mounted.out
else
echo false > disk_mounted.out
fi
>>>
runtime {
docker: "ubuntu:latest"
backend: "GCPBATCH-Reference-Disk-Localization"
}
output {
File lsblk = "lsblk.out"
Boolean disk_mounted = read_boolean("disk_mounted.out")
File? find_out = "find.out"
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
version 1.0

task check_if_localized_with_valid_symlink {
input {
File broad_reference_file_input
File nirvana_reference_file_input
File nirvana_reference_file_metachar_input
}
String broad_input_valid_symlink = "broad_input_valid_symlink.txt"
String nirvana_input_valid_symlink = "nirvana_input_valid_symlink.txt"
String nirvana_metachar_input_valid_symlink = "nirvana_metachar_input_valid_symlink.txt"
command <<<
PS4='\D{+%F %T} \w $ '
set -o nounset -o pipefail -o xtrace

# Echo true to stdout if the argument is a symlink pointing to an extant file, otherwise echo false.
check_if_valid_symlink() {
local reference_input="$1"

if [[ -h "${reference_input}" && -f $(readlink "${reference_input}") ]]; then
echo true
else
echo false
fi
}

check_if_valid_symlink "~{broad_reference_file_input}" > ~{broad_input_valid_symlink}
check_if_valid_symlink "~{nirvana_reference_file_input}" > ~{nirvana_input_valid_symlink}
check_if_valid_symlink "~{nirvana_reference_file_metachar_input}" > ~{nirvana_metachar_input_valid_symlink}

>>>
output {
Boolean is_broad_input_valid_symlink = read_boolean("~{broad_input_valid_symlink}")
Boolean is_nirvana_input_valid_symlink = read_boolean("~{nirvana_input_valid_symlink}")
Boolean is_nirvana_metachar_input_valid_symlink = read_boolean("~{nirvana_metachar_input_valid_symlink}")
}
runtime {
docker: "ubuntu:latest"
backend: "GCPBATCH-Reference-Disk-Localization"
}
}

workflow wf_reference_disk_test {
input {
File broad_reference_file_input
File nirvana_reference_file_input
File nirvana_reference_file_metachar_input
}
call check_if_localized_with_valid_symlink {
input:
broad_reference_file_input = broad_reference_file_input,
nirvana_reference_file_input = nirvana_reference_file_input,
nirvana_reference_file_metachar_input = nirvana_reference_file_metachar_input
}
output {
Boolean is_broad_input_file_a_valid_symlink = check_if_localized_with_valid_symlink.is_broad_input_valid_symlink
Boolean is_nirvana_input_file_a_valid_symlink = check_if_localized_with_valid_symlink.is_nirvana_input_valid_symlink
Boolean is_nirvana_metachar_input_file_a_valid_symlink = check_if_localized_with_valid_symlink.is_nirvana_metachar_input_valid_symlink
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -20,9 +20,9 @@ task MentionsNirvanaReference {
set -o nounset -o xtrace

# Debug output
lsblk
lsblk > lsblk.out

CANDIDATE_MOUNT_POINT=$(lsblk | sed -E -n 's!.*(/mnt/[a-f0-9]+).*!\1!p')
CANDIDATE_MOUNT_POINT=$(lsblk | sed -E -n 's!.*(/mnt/[^/]+).*!\1!p')

if [[ ! -z ${CANDIDATE_MOUNT_POINT} ]]; then
echo "Found unexpected mounted disk, investigating further."
Expand All @@ -43,6 +43,7 @@ task MentionsNirvanaReference {
backend: "Papiv2-Reference-Disk-Localization"
}
output {
File lsblk = "lsblk.out"
Boolean disk_mounted = read_boolean("disk_mounted.out")
File? find_out = "find.out"
}
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: reference_disk_test_false_option
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_FAIL]
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
workflow: reference_disk/reference_disk_test.wdl
Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
name: reference_disk_mounted_only_if_requested_false
testFormat: workflowsuccess
# failing alt PR here https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: reference_disk_mounted_only_if_requested_true
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_FAIL]
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
workflow: reference_disk/reference_disk_mounted_only_if_requested.wdl
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: reference_disk_mounted_only_if_requested_unspecified
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_NEEDS_ALT, GCPBATCH_FAIL]
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
workflow: reference_disk/reference_disk_mounted_only_if_requested.wdl
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: reference_disk_test_true_option
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_FAIL]
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
workflow: reference_disk/reference_disk_test.wdl
Expand Down
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
name: reference_disk_test_unspecified_option
testFormat: workflowsuccess
# see https://github.com/broadinstitute/cromwell/pull/7502
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_FAIL]
backends: [Papiv2-Reference-Disk-Localization, GCPBATCH_ALT]

files {
workflow: reference_disk/reference_disk_test.wdl
Expand Down
2 changes: 1 addition & 1 deletion cromwell.example.backends/PAPIv2.conf
Original file line number Diff line number Diff line change
Expand Up @@ -146,7 +146,7 @@ backend {
zones: ["us-central1-a", "us-central1-b"]
}

include "papi_v2_reference_image_manifest.conf"
include "google_reference_image_manifest.conf"
}
}
}
Expand Down
6 changes: 6 additions & 0 deletions src/ci/resources/gcp_batch_application.conf
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,12 @@ backend {
include "gcp_batch_provider_config.inc.conf"
}
}
GCPBATCH-Reference-Disk-Localization {
actor-factory = "cromwell.backend.google.batch.GcpBatchBackendLifecycleActorFactory"
config {
include "gcp_batch_provider_config.inc.conf"
}
}
GCPBATCHRequesterPays {
actor-factory = "cromwell.backend.google.batch.GcpBatchBackendLifecycleActorFactory"
config {
Expand Down
15 changes: 14 additions & 1 deletion src/ci/resources/gcp_batch_shared_application.inc.conf
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ backend {
# GCPBATCH_NEEDS_ALT: a GCPBATCH alt of a PAPI v2 test is needed but does not yet exist.
# GCPBATCH_FAIL: a test is failing on GCPBATCH, reasons may or may not be understood yet.
# GCPBATCH_SKIP: test is not going to be run on GCPBATCH for reasons explained in test comments.
enabled = ["GCPBATCH", "GCPBATCHRequesterPays"]
enabled = ["GCPBATCH", "GCPBATCHRequesterPays", "GCPBATCH-Reference-Disk-Localization"]
providers {
# Default gcp batch backend
GCPBATCH {
Expand All @@ -35,6 +35,19 @@ backend {
filesystems.http {}
}
}
GCPBATCH-Reference-Disk-Localization {
actor-factory = "REPLACEME!"
config {
# When importing: Remember to also include an appropriate provider_config.inc.conf here.

include "dockerhub_provider_config_v2.inc.conf"
batch.compute-service-account = "centaur@broad-dsde-cromwell-dev.iam.gserviceaccount.com"

filesystems.http {}

include "google_reference_image_manifest.conf"
}
}
GCPBATCHRequesterPays {
actor-factory = "REPLACEME!"
config {
Expand Down
2 changes: 1 addition & 1 deletion src/ci/resources/papi_v2_shared_application.inc.conf
Original file line number Diff line number Diff line change
Expand Up @@ -206,7 +206,7 @@ backend {
filesystems.http {}

# Cromwell 57+ reference disk manifest specification included here.
include "papi_v2_reference_image_manifest.conf"
include "google_reference_image_manifest.conf"

# Have the engine authenticate to docker.io. See BT-141 for more info.
include "dockerhub_provider_config_v1.inc.conf"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -570,7 +570,10 @@
val inputFilePaths = inputOutputParameters.jobInputParameters.map(_.cloudPath.pathAsString).toSet

val referenceDisksToMount =
batchAttributes.referenceFileToDiskImageMappingOpt.map(getReferenceDisksToMount(_, inputFilePaths))
if (useReferenceDisks)
batchAttributes.referenceFileToDiskImageMappingOpt.map(getReferenceDisksToMount(_, inputFilePaths))

Check warning on line 574 in supportedBackends/google/batch/src/main/scala/cromwell/backend/google/batch/actors/GcpBatchAsyncBackendJobExecutionActor.scala

View check run for this annotation

Codecov / codecov/patch

supportedBackends/google/batch/src/main/scala/cromwell/backend/google/batch/actors/GcpBatchAsyncBackendJobExecutionActor.scala#L573-L574

Added lines #L573 - L574 were not covered by tests
else
None

Check warning on line 576 in supportedBackends/google/batch/src/main/scala/cromwell/backend/google/batch/actors/GcpBatchAsyncBackendJobExecutionActor.scala

View check run for this annotation

Codecov / codecov/patch

supportedBackends/google/batch/src/main/scala/cromwell/backend/google/batch/actors/GcpBatchAsyncBackendJobExecutionActor.scala#L576

Added line #L576 was not covered by tests

val dockerhubCredentials: (String, String) =
new String(Base64.getDecoder.decode(batchAttributes.dockerhubToken), "UTF-8").split(":", 2) match {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ case class GcpBatchWorkingDisk(diskType: DiskType, sizeGb: Int) extends GcpBatch
}

case class GcpBatchReferenceFilesDisk(image: String, sizeGb: Int) extends GcpBatchAttachedDisk {
val mountPoint: Path = DefaultPathBuilder.get(s"/mnt/${image.md5Sum}")
val mountPoint: Path = DefaultPathBuilder.get(s"/mnt/disks/${image.md5Sum}")
val name: String = s"d-${mountPoint.pathAsString.md5Sum}"
val diskType: DiskType = DiskType.HDD
}
Original file line number Diff line number Diff line change
Expand Up @@ -48,7 +48,7 @@ trait BatchUtilityConversions {
disk match {
case _: GcpBatchReferenceFilesDisk =>
volume
.addMountOptions("async, rw")
.addMountOptions("ro")
.build
case _ =>
volume.build
Expand Down
Loading