tes backend prototype with centaur setup
added -elocaldockertest to centaur command

print TES logs after run

fixed unmarshalling bug

inputs in read-only volume; matched TES case classes to schema
adamstruck committed Feb 16, 2017
1 parent c398490 commit b9faa69
Showing 27 changed files with 1,582 additions and 3 deletions.
1 change: 1 addition & 0 deletions .travis.yml
@@ -24,6 +24,7 @@ env:
- BUILD_TYPE=checkPublish
- BUILD_TYPE=centaurJes
- BUILD_TYPE=centaurLocal
- BUILD_TYPE=centaurTes
script:
- src/bin/travis/test.sh
after_success:
68 changes: 67 additions & 1 deletion README.md
@@ -53,6 +53,11 @@ A [Workflow Management System](https://en.wikipedia.org/wiki/Workflow_management
    * [Refresh Token](#refresh-token)
    * [Docker](#docker)
    * [Monitoring](#monitoring)
  * [GA4GH TES Backend](#ga4gh-tes-backend)
    * [Configuring](#configuring)
    * [Supported File Systems](#supported-file-systems)
    * [Docker](#docker)
    * [CPU, Memory and Disk](#cpu-memory-and-disk)
* [Runtime Attributes](#runtime-attributes)
  * [Specifying Default Values](#specifying-default-values)
  * [continueOnReturnCode](#continueonreturncode)
@@ -75,7 +80,8 @@ A [Workflow Management System](https://en.wikipedia.org/wiki/Workflow_management
* [Local Filesystem Options](#local-filesystem-options)
* [Imports](#imports)
* [Sub Workflows](#sub-workflows)
* [Meta blocks](#meta-blocks)
* [Execution](#execution)
* [Metadata](#metadata)
* [REST API](#rest-api)
* [REST API Versions](#rest-api-versions)
* [POST /api/workflows/:version](#post-apiworkflowsversion)
@@ -509,6 +515,7 @@ Cromwell distribution:
* Google JES - Launch jobs on Google Compute Engine through the Job Execution Service (JES).
* HtCondor - Allows to execute jobs using HTCondor.
* Spark - Adds support for execution of spark jobs.
* TES - Launch jobs on servers that support the GA4GH Task Execution Schema (TES).

Backends are specified in the `backend` configuration block under `providers`. Each backend has a configuration that looks like:

@@ -1392,6 +1399,65 @@ In order to monitor metrics (CPU, Memory, Disk usage...) about the VM during Cal

The output of this script will be written to a `monitoring.log` file that will be available in the call gcs bucket when the call completes. This feature is meant to run a script in the background during long-running processes. If the task is very short, the log file may not flush before de-localization happens, and you will end up with a zero-byte file.

## GA4GH TES Backend
The TES backend submits jobs to a server that complies with the protocol described by the [GA4GH schema](https://github.com/ga4gh/task-execution-schemas).

This backend creates three files in the `<call_dir>`:

* `script` - A shell script of the job to be run. This contains the user's command from the `command` section of the WDL code.
* `stdout` - The standard output of the process.
* `stderr` - The standard error of the process.

The `script` file contains:

```
#!/bin/sh
cd <container_call_root>
<user_command>
echo $? > rc
```

`<container_call_root>` is the value of the `dockerWorkingDir` runtime attribute, or `/cromwell-executions/<workflow_uuid>/call-<call_name>/execution` if this attribute is not supplied.
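
As a minimal sketch of how this wrapper captures the exit status of the user's command, the snippet below builds a script from the same template and runs it in a scratch directory. The helper name `make_script` and the variable `demo_root` are hypothetical and not part of Cromwell; the backend generates the real file itself. Unlike the template, the sketch quotes the `cd` target so that paths with spaces still work.

```shell
#!/bin/sh
# Hypothetical helper (not a Cromwell API): prints a wrapper script that
# follows the template above for a given call root and user command.
make_script() {
  call_root=$1
  user_command=$2
  printf '#!/bin/sh\n'
  printf 'cd "%s"\n' "$call_root"
  printf '%s\n' "$user_command"
  # Single quotes keep $? literal so it is evaluated when the wrapper runs.
  printf 'echo $? > rc\n'
}

# Run a failing command in a scratch directory and read back its return code.
demo_root=$(mktemp -d)
make_script "$demo_root" "false" > "$demo_root/script"
sh "$demo_root/script"
cat "$demo_root/rc"   # prints 1, the exit status of `false`
```

The wrapper itself exits successfully even when the user's command fails, which is why the return code has to be read from the `rc` file rather than from the wrapper's own exit status.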

### Configuring
Configuring the TES backend is straightforward; one must only provide the TES API endpoint for the service.

```hocon
backend {
  default = "TES"
  providers {
    TES {
      actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
      config {
        endpoint = "https://<some-url>/v1/jobs"
        root = "cromwell-executions"
        dockerRoot = "/cromwell-executions"
        concurrent-job-limit = 1000
      }
    }
  }
}
```
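
One way to put this configuration to use, sketched under a few assumptions: the helper name `write_tes_conf` is hypothetical, the endpoint is the local one used by this commit's Travis setup, and the jar and workflow file names in the final comment are placeholders. Passing the file through `-Dconfig.file` relies on the standard Typesafe Config system property.

```shell
#!/bin/sh
# Hypothetical helper: writes a minimal TES backend config for the given
# endpoint URL to the given path. The keys mirror the block above.
write_tes_conf() {
  endpoint=$1
  conf_path=$2
  cat > "$conf_path" <<EOF
backend {
  default = "TES"
  providers {
    TES {
      actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
      config {
        endpoint = "$endpoint"
        root = "cromwell-executions"
        dockerRoot = "/cromwell-executions"
      }
    }
  }
}
EOF
}

# Write the config to a temporary file for a TES server on localhost.
conf_file=$(mktemp)
write_tes_conf "http://127.0.0.1:9000/v1/jobs" "$conf_file"

# Then, assuming a built Cromwell jar and a workflow file (both placeholders):
#   java -Dconfig.file="$conf_file" -jar cromwell.jar run hello.wdl
```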

### Supported File Systems
Currently this backend only works with files on a Local or Shared File System.

### Docker
This backend supports the following optional runtime attributes / workflow options for working with Docker:
* docker: Docker image to use, such as "ubuntu:latest".
* dockerWorkingDir: defines the working directory in the container.

Outputs:
The backend uses the `dockerWorkingDir` runtime attribute / workflow option to resolve the folder in which the execution results will be placed. If `dockerWorkingDir` is not defined, it uses `/cromwell-executions/<workflow_uuid>/call-<call_name>/execution`.

### CPU, Memory and Disk
This backend supports CPU, memory and disk size configuration through the use of the following runtime attributes / workflow options:
* cpu: defines the amount of CPU to use. Default value: 1. Type: Integer. Ex: 4.
* memory: defines the amount of memory to use. Default value: "512 MB". Type: String. Ex: "4 GB" or "4096 MB"
* disk: defines the amount of disk to use. Default value: "1024 MB". Type: String. Ex: "1 GB" or "1024 MB"

If they are not set, the TES backend will use the default values.

# Runtime Attributes

Runtime attributes are used to customize tasks. Within a task one can specify runtime attributes to customize the environment for the call.
8 changes: 8 additions & 0 deletions build.sbt
@@ -44,6 +44,12 @@ lazy val sfsBackend = (project in backendRoot / "sfs")
  .dependsOn(gcsFileSystem)
  .dependsOn(backend % "test->test")

lazy val tesBackend = (project in backendRoot / "tes")
  .settings(tesBackendSettings:_*)
  .withTestSettings
  .dependsOn(sfsBackend)
  .dependsOn(backend % "test->test")

lazy val htCondorBackend = (project in backendRoot / "htcondor")
  .settings(htCondorBackendSettings:_*)
  .withTestSettings
@@ -93,10 +99,12 @@ lazy val root = (project in file("."))
  .aggregate(htCondorBackend)
  .aggregate(sparkBackend)
  .aggregate(jesBackend)
  .aggregate(tesBackend)
  .aggregate(engine)
  // Next level of projects to include in the fat jar (their dependsOn will be transitively included)
  .dependsOn(engine)
  .dependsOn(jesBackend)
  .dependsOn(tesBackend)
  .dependsOn(htCondorBackend)
  .dependsOn(sparkBackend)
  // Dependencies for tests
13 changes: 11 additions & 2 deletions core/src/main/resources/reference.conf
@@ -102,7 +102,7 @@ workflow-options {
#workflow-failure-mode: "ContinueWhilePossible"
}

// Optional call-caching configuration.
# Optional call-caching configuration.
call-caching {
# Allows re-use of existing results for jobs you've already run
# (default: false)
@@ -175,7 +175,7 @@ backend {
# Root directory where Cromwell writes job results. This directory must be
# visible and writeable by the Cromwell process as well as the jobs that Cromwell
# launches.
root: "cromwell-executions"
root = "cromwell-executions"

filesystems {
local {
@@ -204,6 +204,15 @@ backend {
}
}

#TES {
#  actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
#  config {
#    root = "cromwell-executions"
#    dockerRoot = "/cromwell-executions"
#    endpoint = "http://127.0.0.1:9000/v1/jobs"
#  }
#}

#SGE {
#  actor-factory = "cromwell.backend.impl.sfs.config.ConfigBackendLifecycleActorFactory"
#  config {
4 changes: 4 additions & 0 deletions project/Dependencies.scala
@@ -132,6 +132,10 @@ object Dependencies {
    "org.mongodb" %% "casbah" % "3.0.0"
  )

  val tesBackendDependencies = List(
    "io.spray" %% "spray-client" % sprayV
  ) ++ sprayServerDependencies

  val sparkBackendDependencies = List(
    "io.spray" %% "spray-client" % sprayV
  ) ++ sprayServerDependencies
5 changes: 5 additions & 0 deletions project/Settings.scala
@@ -142,6 +142,11 @@ object Settings {
    name := "cromwell-sfs-backend"
  ) ++ commonSettings

  val tesBackendSettings = List(
    name := "cromwell-tes-backend",
    libraryDependencies ++= tesBackendDependencies
  ) ++ commonSettings

  val htCondorBackendSettings = List(
    name := "cromwell-htcondor-backend",
    libraryDependencies ++= htCondorBackendDependencies
13 changes: 13 additions & 0 deletions src/bin/travis/resources/tes.conf
@@ -0,0 +1,13 @@
HttpPort: 9000
Storage:
  - Local:
      AllowedDirs:
        - /home/
        - /cromwell-executions
        - /tmp/
DBPath: /tmp/tes_task.db
Schedulers:
  Local:
    NumWorkers: 4
Worker:
  Timeout: 1
32 changes: 32 additions & 0 deletions src/bin/travis/resources/tes_centaur.conf
@@ -0,0 +1,32 @@
akka {
  loggers = ["akka.event.slf4j.Slf4jLogger"]
  logging-filter = "akka.event.slf4j.Slf4jLoggingFilter"
}

spray.can {
  server {
    request-timeout = 40s
  }
  client {
    request-timeout = 40s
    connecting-timeout = 40s
  }
}

call-caching {
  enabled = true
}

backend {
  default = "TES"
  providers {
    TES {
      actor-factory = "cromwell.backend.impl.tes.TesBackendLifecycleActorFactory"
      config {
        root = "cromwell-executions"
        dockerRoot = "/cromwell-executions"
        endpoint = "http://127.0.0.1:9000/v1/jobs"
      }
    }
  }
}
2 changes: 2 additions & 0 deletions src/bin/travis/test.sh
@@ -7,6 +7,8 @@ SCRIPT_DIR=src/bin/travis
# BUILD_TYPE is coming in from the Travis build matrix
if [ "$BUILD_TYPE" = "centaurJes" ]; then
    "${SCRIPT_DIR}"/testCentaurJes.sh
elif [ "$BUILD_TYPE" = "centaurTes" ]; then
    "${SCRIPT_DIR}"/testCentaurTes.sh
elif [ "$BUILD_TYPE" = "centaurLocal" ]; then
    "${SCRIPT_DIR}"/testCentaurLocal.sh
elif [ "$BUILD_TYPE" = "sbt" ]; then
55 changes: 55 additions & 0 deletions src/bin/travis/testCentaurTes.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash

printTravisHeartbeat() {
    # Sleep one minute between printouts, but don't zombie for more than two hours
    for ((i=0; i < 120; i++)); do
        sleep 60
        printf ""
    done &
    TRAVIS_HEARTBEAT_PID=$!
}

killTravisHeartbeat() {
    if [ -n "${TRAVIS_HEARTBEAT_PID+set}" ]; then
        kill ${TRAVIS_HEARTBEAT_PID} || true
    fi
}

exitScript() {
    echo "TES LOG"
    cat logs/tes.log
    echo "CROMWELL LOG"
    cat logs/cromwell.log
    echo "CENTAUR LOG"
    cat logs/centaur.log
    killTravisHeartbeat
}

trap exitScript EXIT
printTravisHeartbeat

set -x
set -e

sbt assembly
CROMWELL_JAR=$(find "$(pwd)/target/scala-2.11" -name "cromwell-*.jar")
TES_CENTAUR_CONF="$(pwd)/src/bin/travis/resources/tes_centaur.conf"
git clone https://github.com/broadinstitute/centaur.git
cd centaur
git checkout ${CENTAUR_BRANCH}
cd ..

TES_CONF="$(pwd)/src/bin/travis/resources/tes.conf"
git clone https://github.com/ohsu-comp-bio/funnel.git
cd funnel
make
cd ..
mkdir logs
nohup funnel/bin/tes-server -config "${TES_CONF}" > logs/tes.log 2>&1 &


# All tests use ubuntu:latest - make sure it's there before starting the tests
# because pulling the image during some of the tests would cause them to fail
# (specifically output_redirection which expects a specific value in stderr)
docker pull ubuntu:latest
centaur/test_cromwell.sh -j"${CROMWELL_JAR}" -c"${TES_CENTAUR_CONF}" -elocaldockertest