
Commit

Merge remote-tracking branch 'origin/main'
mattrasmus committed Sep 3, 2024
2 parents c82dda2 + 05f68c6 commit f40bc65
Showing 3 changed files with 11 additions and 11 deletions.
8 changes: 4 additions & 4 deletions docs/source/design.md
@@ -378,7 +378,7 @@ We call File an example of an *External Value*, because Files can change in an e

redun naturally understands which Files are used for input vs output based on whether they are passed as arguments or returned as results, respectively. Note that it can lead to confusing behavior to pass a File as input to a Task, alter it, and then return it as a result. That would cause the recorded call node to be immediately out of date (its input File hash no longer matches). Users should be careful to avoid this pattern.
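
The staleness problem can be illustrated without redun itself. Below is a minimal stdlib-only sketch (the `file_hash` helper is hypothetical, standing in for redun's File fingerprinting) of why mutating an input File invalidates the recorded call node:

```python
import hashlib
import os
import tempfile

def file_hash(path):
    # Hash the file contents, as a stand-in for how redun fingerprints a File.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# Record the input File's hash at call time.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "w") as f:
    f.write("input data")
recorded_input_hash = file_hash(path)

# Anti-pattern: the task mutates its input File and returns it.
with open(path, "a") as f:
    f.write("\nmutated by task")

# The recorded call node is now immediately stale: its stored
# input hash no longer matches the File on disk.
assert file_hash(path) != recorded_input_hash
os.remove(path)
```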

-See the [Validity](values.md#Validity) section below for additional discussion on the general feature.
+See the [Validity](values.md#validity) section below for additional discussion on the general feature.

### Shell scripting

@@ -575,7 +575,7 @@ def main():

Depicted above is an example call graph and job tree (left) for an execution of a workflow (right). When each task is called, a CallNode is recorded along with all the Values used as arguments and return values. As tasks (`main`) call child tasks, children CallNodes are recorded (`task1` and `task2`). "Horizontal" dataflow is also recorded between sibling tasks, such as `task1` and `task2`. Each node in the call graph is identified by a unique hash and each Job and Execution is identified by a unique UUID. This information is stored by default in the redun database `.redun/redun.db`.
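
The bookkeeping described above can be sketched in plain Python. This is a toy model, not redun's implementation: `CallNode` is modeled after the docs, while `value_hash` and the field names are hypothetical:

```python
import hashlib
import uuid
from dataclasses import dataclass, field

def value_hash(value):
    # Hash a value's repr as a stand-in for redun's Value hashing.
    return hashlib.sha256(repr(value).encode()).hexdigest()

@dataclass
class CallNode:
    task_name: str
    arg_hashes: list
    result_hash: str
    children: list = field(default_factory=list)

    @property
    def hash(self):
        # Each node in the call graph is identified by a unique hash.
        body = self.task_name + "".join(self.arg_hashes) + self.result_hash
        return hashlib.sha256(body.encode()).hexdigest()

# Each Job and Execution is identified by a UUID, not a hash.
execution_id = str(uuid.uuid4())

# main() calls task1 and task2; children CallNodes are recorded under main's.
task1_node = CallNode("task1", [value_hash(10)], value_hash(11))
task2_node = CallNode("task2", [value_hash(11)], value_hash(22))
main_node = CallNode("main", [value_hash(10)], value_hash(22),
                     children=[task1_node, task2_node])

# "Horizontal" dataflow between siblings: task2's argument hash
# matches task1's result hash.
assert task2_node.arg_hashes[0] == task1_node.result_hash
```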

-The redun backend database provides a durable record of these call graphs for every execution redun performs. This not only provides the backend storage for caching, it also is queryable by users to explore the call graph, using the `redun log`, `redun console`, and `redun repl` commands. For example, if we know that a file `/tmp/data` was produced by redun, we can find out exactly which execution did so, and hence can retrieve information about the code and inputs used to do so. See [querying call graphs](db.md#Querying-call-graphs) for more.
+The redun backend database provides a durable record of these call graphs for every execution redun performs. This not only provides the backend storage for caching, it also is queryable by users to explore the call graph, using the `redun log`, `redun console`, and `redun repl` commands. For example, if we know that a file `/tmp/data` was produced by redun, we can find out exactly which execution did so, and hence can retrieve information about the code and inputs used to do so. See [querying call graphs](db.md#querying-call-graphs) for more.

## Advanced topics

@@ -588,7 +588,7 @@ It's common to use workflow engines to implement Extract Transform Load (ETL) pi
- With files, we were able to double check if their current state was consistent with our cache by hashing them. With a database or API, it's typically not feasible to hash a whole database. Is there something else we could do?
- The redun cache contains cached results from all previous runs. Conveniently, that allows for fast reverting to old results if code or input data is changed back to the old state. However, for a stateful system like a database, we likely can't just re-execute arbitrary tasks in any order. Similar to database migration frameworks (South, Alembic, etc), we may need to roll back past tasks before applying new ones.
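
The ordering constraint resembles how a migration framework replays changes. A toy stdlib sketch of that idea (all names here are hypothetical, not redun or Alembic API):

```python
applied = []   # migrations currently applied, in order
state = set()  # the stateful "database"

# Each migration has an up (apply) and down (rollback) step.
migrations = {
    "add_users": (lambda: state.add("users"), lambda: state.discard("users")),
    "add_orders": (lambda: state.add("orders"), lambda: state.discard("orders")),
}

def apply(name):
    up, _ = migrations[name]
    up()
    applied.append(name)

def rollback_to(target):
    # Unlike pure cached results, stateful changes must be unwound
    # in reverse order before new tasks can be applied.
    while applied and applied[-1] != target:
        _, down = migrations[applied.pop()]
        down()

apply("add_users")
apply("add_orders")
rollback_to("add_users")
assert state == {"users"}
```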

-redun provides solutions to several of these challenges using a concept called (Handles)[values.md#Handles-for-ephemeral-and-stateful-values].
+redun provides solutions to several of these challenges using a concept called [Handles](values.md#handles-for-ephemeral-and-stateful-values).

### Running without a scheduler

@@ -618,7 +618,7 @@ task is far more independent of the parent scheduler, able to interact with the
resolve complex expressions or recursive tasks.

Third, a federated task submitted to a REST proxy is fire-and-forget; see
-[Federated task](tasks.md#Federated-task). It will trigger a
+[Federated task](tasks.md#federated-task). It will trigger a
completely separate redun execution to occur, but it only provides the execution id back to the
caller. It doesn't make sense for the REST proxy to be a full executor, since it's not
capable enough to handle arbitrary tasks; by design, it only handles federated tasks. Plus,
6 changes: 3 additions & 3 deletions docs/source/scheduler.md
@@ -178,7 +178,7 @@ with CSE, that the cached value is appropriate to use.
Task caching operates at the granularity of a single
call to a `Task` with concrete arguments. Recall that the result of a `Task` might be a value,
or another expression that needs further evaluation. In its normal mode, caching uses single
-reductions, stepping through the evaluation. See the [Results caching](design.md#Result-caching)
+reductions, stepping through the evaluation. See the [Results caching](design.md#result-caching)
section, for more information on how this recursive checking works.
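
The single-reduction idea can be sketched as a loop: reduce one step at a time, consulting the cache before each step, until a plain value remains. A stdlib-only sketch under that assumption, not redun's actual scheduler code:

```python
cache = {}

class Expr:
    # A deferred call: evaluating it may yield a value or another Expr.
    def __init__(self, fn, *args):
        self.fn, self.args = fn, args

    def key(self):
        return (self.fn.__name__, self.args)

def evaluate(node):
    # Single reductions: step through the evaluation, checking the
    # cache before each reduction, until a concrete value remains.
    while isinstance(node, Expr):
        key = node.key()
        if key in cache:
            node = cache[key]
        else:
            node = cache[key] = node.fn(*node.args)
    return node

def add(a, b):
    return a + b

def double(x):
    # Returns another expression that needs further evaluation,
    # not a final value.
    return Expr(add, x, x)

assert evaluate(Expr(double, 3)) == 6
```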

Consider the following example:
@@ -206,9 +206,9 @@ To evaluate `out`, the following three task executions might be considered for c

For CSE, we can simply assume that the code is identical for a task, but for caching,
we need to actually check that the code is identical, as defined by the
-[hash of the Task](tasks.md#Task-hashing). Since `Value` objects can represent state in addition
+[hash of the Task](tasks.md#task-hashing). Since `Value` objects can represent state in addition
to their natural values, we need to check that the output is actually valid before using a cache
-result; see [Validity](values.md#Validity).
+result; see [Validity](values.md#validity).

The normal caching mode (so-called "full") is fully recursive (i.e., uses single reductions),
hence the scheduler must visit every node in the entire call graph produced by an expression,
8 changes: 4 additions & 4 deletions docs/source/tasks.md
@@ -240,12 +240,12 @@ Lastly, several task options, such as [`image`](config.md) or [`memory`](config.
Generally not a user-facing option, this is an `Optional[Set[CacheResult]]` specifying an upper bound on which kinds of cache results may be used (default: `None`, indicating that any are allowed).

### `cache`
-A bool (default: `true`) that defines whether the backend cache can be used to fast-forward through the task's execution. See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+A bool (default: `true`) that defines whether the backend cache can be used to fast-forward through the task's execution. See [Scheduler](scheduler.md#configuration-options) for more explanation.
A value of `true` is implemented by setting `cache_scope=CacheScope.BACKEND` and `false` by setting `cache_scope=CacheScope.CSE`.
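
That mapping can be written down directly. A sketch assuming a locally defined `CacheScope` enum (redun's real enum lives in the redun package):

```python
from enum import Enum

class CacheScope(Enum):
    NONE = "NONE"
    CSE = "CSE"
    BACKEND = "BACKEND"

def coerce_cache_option(cache: bool) -> CacheScope:
    # cache=True allows backend cache hits across executions;
    # cache=False still permits common-subexpression elimination
    # within the current execution.
    return CacheScope.BACKEND if cache else CacheScope.CSE

assert coerce_cache_option(True) is CacheScope.BACKEND
assert coerce_cache_option(False) is CacheScope.CSE
```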

### `cache_scope`

-A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upper bound on what scope a cache result may come from. See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upper bound on what scope a cache result may come from. See [Scheduler](scheduler.md#configuration-options) for more explanation.

* `NONE`: Disable both CSE and cache hits
* `CSE`: Only reuse computations from within this execution
@@ -254,7 +254,7 @@ A `CacheScope` enum value (default: `CacheScope.BACKEND`) that indicates the upp
### `check_valid`
An enum value `CacheCheckValid` (or a string that can be coerced, default: `"full"`) that defines whether the entire subtree
of results is checked for validity (`"full"`) or whether just this task's ultimate results need to be valid (`"shallow"`). This can be used to dramatically speed up resuming large workflows.
-See [Scheduler](scheduler.md#Configuration-options) for more explanation.
+See [Scheduler](scheduler.md#configuration-options) for more explanation.
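
The difference between `"full"` and `"shallow"` can be sketched over a toy result tree; the helpers below are hypothetical stand-ins for redun's validity machinery:

```python
def is_valid(node):
    # Stand-in for redun's Value validity check (e.g. a File whose
    # recorded hash still matches the file on disk).
    return node["valid"]

def check_valid(node, mode="full"):
    if not is_valid(node):
        return False
    if mode == "shallow":
        # Only this task's ultimate result must be valid.
        return True
    # "full": every result in the entire subtree must still be valid.
    return all(check_valid(child, mode) for child in node.get("children", []))

result = {
    "valid": True,  # the final output still exists and hashes correctly
    "children": [
        {"valid": False},  # an intermediate result was deleted
    ],
}

# Shallow checking accepts the cached result; full checking rejects it
# because an intermediate value in the subtree is no longer valid.
assert check_valid(result, mode="shallow") is True
assert check_valid(result, mode="full") is False
```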

### `config_args`

@@ -716,7 +716,7 @@ which are additional config files that are allowed to specify additional `federa
In addition to primary federated tasks, we provide tools to support a REST-based proxy.
See `redun.federated_tasks.rest_federated_task` and `redun.federated_tasks.launch_federated_task`.
The proxy has two main features. First, it is designed to help facilitate a fire-and-forget approach
-to launching jobs (see [Running without a scheduler](design.md#Running-without-a-scheduler)),
+to launching jobs (see [Running without a scheduler](design.md#running-without-a-scheduler)),
which is useful in implementing a UI. Second, it can help arrange for permissions, such as
facilitating AWS role switches.

