Cannot configure optional runtime-attributes for SFS backend on SLURM #7455

Open
pettyalex opened this issue Jun 18, 2024 · 0 comments

OS: CentOS 7
Cromwell version: cromwell 86 installed from conda-forge
Backend: SFS

Hello,

I'm configuring Cromwell to run on my group's SLURM cluster, and struggling to make cpu/memory runtime attributes optional. I believe this is supported because the "Getting started on HPC Clusters" documentation shows memory_gb as an optional runtime-attribute: https://cromwell.readthedocs.io/en/develop/tutorials/HPCIntro/

backend.providers.SGE.config {
  runtime-attributes = """
  Int cpu = 1
  Float? memory_gb
  String? sge_queue
  String? sge_project
  """
}

My intent is for users to supply cpu, memory, runtime_minutes, and partition only when they want to override the SLURM cluster's defaults. I do not want Cromwell to supply defaults of its own, because when these arguments are omitted from the call to sbatch the cluster's defaults are used. My understanding is that declaring an attribute as optional, like Float? memory_mb, and then using syntax like ${"--mem " + round(memory_mb) + "m"} \ in the submit script means the argument is added only when memory is defined and omitted otherwise. I've followed the documentation as closely as I can.
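
The optional-placeholder idiom described above mirrors one from WDL itself, where a string concatenated with an undefined optional inside a placeholder is meant to collapse to an empty string. A minimal sketch of that idiom in an ordinary task, assuming a hypothetical input `threads` and flag `--threads` (neither is part of this setup):

```wdl
version 1.0

task showOptionalFlag {
    input {
        # Hypothetical optional input; left unset unless the caller provides it
        Int? threads
    }
    command <<<
        # Intended behavior: the placeholder renders as "--threads N" when
        # threads is defined, and as an empty string when it is not
        echo mytool ~{"--threads " + threads}
    >>>
    output {
        String out = read_string(stdout())
    }
}
```

The submit script in the backend configuration relies on the same behavior, only with runtime attributes in place of task inputs.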

However, when I submit a test job that sets neither cpu nor memory as a runtime attribute, workflow initialization fails with these exceptions:

cromwell.core.CromwellAggregatedException: Initialization Failure:
Runtime validation failed:
	Task myTask has an invalid runtime attribute cpu = !! NOT FOUND !!
	Task myTask has an invalid runtime attribute memory = !! NOT FOUND !!
	at cromwell.engine.workflow.WorkflowActor$$anonfun$3.applyOrElse(WorkflowActor.scala:356)
	at cromwell.engine.workflow.WorkflowActor$$anonfun$3.applyOrElse(WorkflowActor.scala:339)
	at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
	at akka.actor.FSM.processEvent(FSM.scala:707)
	at akka.actor.FSM.processEvent$(FSM.scala:704)

Here is the test WDL I'm using:

# Example workflow
# Declare WDL version 1.0 if working in Terra
version 1.0
workflow myWorkflow {
    call myTask

}

task myTask {
    command <<<
        echo "hello world"
    >>>
    output {
        String out = read_string(stdout())
    }
}

And my complete configuration for this backend:

backend {
  default = slurm

  providers {
    slurm {
      actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
      config {
        runtime-attributes = """
        Int? runtime_minutes
        Int? cpu
        Float? memory_mb
        String? docker
        String? partition
        """

        submit = """
            sbatch \
              --wait \
              -J ${job_name} \
              -D ${cwd} \
              -o ${out} \
              -e ${err} \
              ${"-t " + runtime_minutes} \
              ${"-c " + cpu} \
              ${"--mem " + round(memory_mb) + "m"} \
              ${"-p " + partition} \
              --wrap "/bin/bash ${script}"
        """

        kill = "scancel ${job_id}"
        check-alive = "squeue -j ${job_id}"
        job-id-regex = "Submitted batch job (\\d+).*"
      }
    }
  }
}
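
For contrast, a task that does override the cluster defaults would presumably set the same attribute names in its runtime block. A minimal sketch, assuming the attribute names declared in the configuration above; the task name, the values, and the partition name are hypothetical:

```wdl
version 1.0

task myTaskWithOverrides {
    command <<<
        echo "hello world"
    >>>
    runtime {
        # Each key matches a name declared in the backend's runtime-attributes block
        runtime_minutes: 30
        cpu: 2
        memory_mb: 4096.0
        # Hypothetical partition name; substitute one that exists on the cluster
        partition: "short"
    }
    output {
        String out = read_string(stdout())
    }
}
```

The intent is that only a task like this one adds -t, -c, --mem, and -p to the sbatch call, while a task with no runtime block, such as myTask, submits with none of them.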