Version and flags #182 (merged Jun 17, 2024, 21 commits)

Changes to README.md (107 additions, 81 deletions):
# Arcaflow: The Noble Workflow Engine
<img align="left" width="200px" style="padding-right: 2em; padding-bottom: 2em" alt="Arcaflow logo showing
a waterfall and a river with 3 trees symbolizing the various plugins"
src="https://github.com/arcalot/.github/raw/main/branding/arcaflow.png">

**Arcaflow** is a highly flexible and portable **workflow system** that helps
you build simple or complex pipelines of actions with data passing between the
actions. The **data is validated** against schemas along the way to make sure
there is no corrupt data, catching potential failure conditions before they occur.
Arcaflow runs on your laptop, a jump host, or in a CI system, requiring only the
Arcaflow engine binary, a workflow definition in YAML, and a compatible container
runtime.

[Complete Arcaflow Documentation](https://arcalot.io/arcaflow)

![image](arcaflow-basic-demo.gif)

# The Arcaflow Engine

The Arcaflow Engine is the core execution component for workflows. It allows you to run
workflows using container engines including Docker, Podman, and Kubernetes. The Arcaflow
SDK is available for [Python](https://arcalot.io/arcaflow/creating-plugins/python/) and
[Golang](https://github.com/arcalot/arcaflow-plugin-sdk-go) to aid with the development
of plugins, from which workflows are constructed.
[Official plugins](https://github.com/orgs/arcalot/repositories?q=arcaflow-plugin-) are
maintained within the Arcalot organization and are available as
[versioned containers from Quay.io](https://quay.io/organization/arcalot).

## Pre-built engine binaries

Our pre-built engine binaries are available in the
[releases section](https://github.com/arcalot/arcaflow-engine/releases) for multiple
platforms and architectures.

## Building from source

This system requires at least Go 1.18 to run and can be built from source:

```bash
go build -o arcaflow cmd/arcaflow/main.go
```

This self-contained engine binary can then be used to run Arcaflow workflows.

## Running a simple workflow

A set of [example workflows](https://github.com/arcalot/arcaflow-workflows) is available
to demonstrate workflow features. A basic example `workflow.yaml` may look like this:

```yaml
version: v0.2.0 # The compatible workflow schema version
input: # The input schema for the workflow
  root: RootObject
  objects:
    RootObject:
      id: RootObject
      properties:
        name:
          type:
            type_id: string
steps: # The individual steps of the workflow
  example:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-example
    input:
      name: !expr $.input.name
outputs: # The expected output schema and data for the workflow
  success:
    message: !expr $.steps.example.outputs.success.message
```

As you can see, a workflow has the root keys of `version`, `input`, `steps`, and
`outputs`. Each of these keys is required in a workflow. Output values and inputs to
steps can be specified using the Arcaflow
[expression language](https://arcalot.io/arcaflow/workflows/expressions/). Input and
output references create dependencies between the workflow steps which determine their
execution order.
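For example, a dependency is created whenever one step references another step's
output. In this hypothetical fragment (reusing the example plugin for both steps purely
for illustration), the `second` step runs only after `first` completes, because its
input references the output of `first`:

```yaml
steps:
  first:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-example
    input:
      name: !expr $.input.name
  second:
    plugin:
      deployment_type: image
      src: quay.io/arcalot/arcaflow-plugin-example
    input:
      # Referencing the first step's output makes `second` depend on `first`.
      name: !expr $.steps.first.outputs.success.message
```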

An input YAML file for this basic workflow may look like this:

```yaml
name: Arca Lot
```

The Arcaflow engine uses a configuration to define the standard behaviors for deploying
plugins within the workflow. The default configuration will use Docker as the container
runtime and will set the log outputs to the `info` level.

If you have a local Docker / Moby setup installed, you can simply run the workflow like
this:

```bash
arcaflow --input input.yaml
```

This results in the default behavior of using the built-in configuration and reading the
workflow from the `workflow.yaml` file in the current working directory.

If you don't have a local Docker setup, or if you want to use another deployer or any
custom configuration parameters, you can create a `config.yaml` with your desired
parameters. For example:

```yaml
deployers:
  image:
    deployer_name: podman
log:
  level: debug
logged_outputs:
  error:
    level: debug
```
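The same structure selects other container platforms via the `deployer_name` key. As a
hypothetical sketch, a minimal configuration targeting Kubernetes instead might look
like this (see the deployer repositories below for the full set of connection options):

```yaml
deployers:
  image:
    deployer_name: kubernetes
log:
  level: info
```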

You can load this config by passing the `--config` flag to Arcaflow.

```bash
arcaflow --input input.yaml --config config.yaml
```

The default workflow file name is `workflow.yaml`, but you can override this with the
`--workflow` input parameter.

Arcaflow also accepts a `--context` parameter that defines the base directory for all
input files. All relative file paths are from the context directory, and absolute paths
are also supported. The default context is the current working directory (`.`).

### A few command examples...

Use the built-in configuration and run the `workflow.yaml` file from the `/my-workflow`
context directory with no input:

```bash
arcaflow --context /my-workflow
```

Use a custom `my-config.yaml` configuration file and run the `my-workflow.yaml` workflow
using the `my-input.yaml` input file from the current directory:

```bash
arcaflow --config my-config.yaml --workflow my-workflow.yaml --input my-input.yaml
```

Use a custom `config.yaml` configuration file and the default `workflow.yaml` file from
the `/my-workflow` context directory, and an `input.yaml` file from the current working
directory:

```bash
arcaflow --context /my-workflow --config config.yaml --input ${PWD}/input.yaml
```

## Deployers

Image-based deployers are used to deploy plugins to container platforms. Each deployer
has configuration parameters specific to its platform. These deployers are:

- [Docker](https://github.com/arcalot/arcaflow-engine-deployer-docker)
- [Podman](https://github.com/arcalot/arcaflow-engine-deployer-podman)
- [Kubernetes](https://github.com/arcalot/arcaflow-engine-deployer-kubernetes)

There is also a
[Python deployer](https://github.com/arcalot/arcaflow-engine-deployer-python) that
runs Python plugins directly rather than in containers. *Note that not all Python
plugins work with the Python deployer, and any plugin dependencies must be present on
the target system.*
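Following the configuration structure shown earlier, a minimal sketch selecting the
Python deployer might look like this (consult the deployer repository for its full
option set):

```yaml
deployers:
  python:
    deployer_name: python
```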