Memory Retry not working, Cromwell 87 #7451

Open
GregoryDougherty opened this issue Jun 14, 2024 · 1 comment
@GregoryDougherty

We cannot get memory retry to work, and we have not found a complete example anywhere showing it working, including what should go in the .conf file. If such an example exists, please point us to it.

Command:
nohup java -Dconfig.file=My.conf -jar cromwell-87-5448b85-SNAP-pre-edits.jar run ~/MemoryRetryTest.wdl 2>&1 > nohup.out
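For what it is worth, the Cromwell documentation describes memory_retry_multiplier as a workflow option rather than a configuration key, so one variant we plan to try is passing an options file on the command line. A minimal sketch, where the file name memory_retry_options.json is just our own illustration:

    nohup java -Dconfig.file=My.conf -jar cromwell-87-5448b85-SNAP-pre-edits.jar \
        run ~/MemoryRetryTest.wdl --options ~/memory_retry_options.json 2>&1 > nohup.out

memory_retry_options.json:

    {
      "memory_retry_multiplier": 4.0
    }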

MemoryRetryTest.wdl:
workflow MemoryRetryTest {
  String message = "Killed"

  call TestOutOfMemoryRetry {}
  call TestBadCommandRetry {}
}

task TestOutOfMemoryRetry {
  command <<<
    free -h
    df -h
    cat /proc/cpuinfo

    echo "Killed" >&2
    tail /dev/zero
  >>>

  runtime {
    cpu: "1"
    memory: "1 GB"
    maxRetries: 4
    continueOnReturnCode: 0
  }
}

task TestBadCommandRetry {
  command <<<
    free -h
    df -h
    cat /proc/cpuinfo

    echo "Killed" >&2
    bedtools intersect nothing with nothing
  >>>

  runtime {
    cpu: "1"
    memory: "1 GB"
    maxRetries: 4
    continueOnReturnCode: 0
  }
}

My.conf:

include required(classpath("application"))

system {
  memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
}

backend {
  default = PAPIv2

  providers {
    PAPIv2 {
      actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"

      system {
        memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
      }

      config {
        project = "$my_project"
        root = "$my_bucket"
        name-for-call-caching-purposes: PAPI
        slow-job-warning-time: 24 hours
        genomics-api-queries-per-100-seconds = 1000
        maximum-polling-interval = 600

        # Setup GCP to give more memory with each retry
        system {
          memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
        }
        system.memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
        memory_retry_multiplier = 4

        # Number of workers to assign to PAPI requests
        request-workers = 3

        virtual-private-cloud {
          network-label-key = "network-key"
          network-name = "network-name"
          subnetwork-name = "subnetwork-name"
          auth = "auth"
        }

        pipeline-timeout = 7 days

        genomics {
          auth = "auth"
          compute-service-account = "$my_account"
          endpoint-url = "https://lifesciences.googleapis.com/"
          location = "us-central1"
          restrict-metadata-access = false
          localization-attempts = 3
          parallel-composite-upload-threshold = "150M"
        }

        filesystems {
          gcs {
            auth = "auth"
            project = "$my_project"
            caching {
              duplication-strategy = "copy"
            }
          }
        }

        system {
          memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
        }

        runtime {
          cpuPlatform: "Intel Cascade Lake"
        }

        default-runtime-attributes {
          cpu: 1
          failOnStderr: false
          continueOnReturnCode: 0
          memory: "2048 MB"
          bootDiskSizeGb: 10
          disks: "local-disk 375 SSD"
          noAddress: true
          preemptible: 1
          maxRetries: 3
          system.memory-retry-error-keys = ["OutOfMemory", "Killed", "Error:"]
          memory_retry_multiplier = 4
          zones: ["us-central1-a", "us-central1-b"]
        }

        include "papi_v2_reference_image_manifest.conf"
      }
    }
  }
}
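For comparison, here is a stripped-down sketch of where we believe these settings are supposed to live, based on our reading of the Cromwell "Retry with More Memory" docs: the error keys only in the top-level system block, and the multiplier supplied per workflow through the options file shown above rather than anywhere in the .conf. This is untested on our side, so treat it as an assumption:

    include required(classpath("application"))

    system {
      # Substrings matched against a task's stderr to decide whether to retry with more memory
      memory-retry-error-keys = ["OutOfMemory", "Killed"]
    }

    backend {
      default = PAPIv2
      providers {
        PAPIv2 {
          actor-factory = "cromwell.backend.google.pipelines.v2beta.PipelinesApiLifecycleActorFactory"
          config {
            project = "$my_project"
            root = "$my_bucket"
            # ... remaining PAPI settings unchanged, with no memory retry keys in here ...
          }
        }
      }
    }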

gsutil ls gs://cromwell-executions/MemoryRetryTest/d54a5a39-4d3b-4ac7-9bb1-97043d761b56/call-TestOutOfMemoryRetry
TestOutOfMemoryRetry.log
gcs_delocalization.sh
gcs_localization.sh
gcs_transfer.sh
rc
script
stderr
stdout
pipelines-logs

stderr:
Killed
/cromwell_root/script: line 32: 17 Killed tail /dev/zero

rc:
137
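For context, exit code 137 is 128 + 9, i.e. the process received SIGKILL, which matches the "Killed" line in stderr above and is exactly the out-of-memory situation we expected memory retry to pick up on. Quick shell check of that arithmetic (illustrative only):

    echo $((137 - 128))   # prints 9
    kill -l 9             # prints KILL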

@sicotteh

I would like to add my support to Greg's question.

The memory_retry_multiplier config option would be an extremely useful feature for genomic workflows with varying data sizes.

If it is working on GCP, could you please document its use better? Or let us know if it is an abandoned feature. Or, even better, send us working examples. :)

Thanks for all the work you do.
