Allow a step to run only when ALL upstream/ancestor steps have run

joe · December 9, 2024, 12:33pm

Consider a workflow in which steps A and B both lead into step C.

When does step C run?

Well, today it’ll run twice: once after step A finishes, and once after step B finishes.

Sometimes this is exactly what we want. Hurrah!

But sometimes we want C to only run when everything else has run - this is actually quite common at the end of a workflow to tidy up, or to upload a bunch of data that we’ve created throughout the workflow.

You can think of this as a bit like a logical AND/OR relationship. Does a step run once, when ALL of its parent steps have executed (AND), or every time EACH of its parent steps executes (sort of an OR)?

So we need some kind of option to control when the step executes.

We would have to represent this visually in the workflow diagram. We also have to work out where the setting lives: is it one flag that applies workflow-wide? Is it a setting on the step (execute: all/each), or even on the link (in which case setting it on one link would affect all sibling links)?
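To make the step-level option concrete, here is a rough sketch of how an `execute: all/each` setting could decide whether a step runs. Everything here (`Step`, `ExecuteMode`, `shouldRun`) is a hypothetical name for illustration, not an existing API.

```typescript
// Hypothetical shapes for illustration only; none of these names exist today.
type ExecuteMode = "all" | "each";

interface Step {
  id: string;
  parents: string[];
  execute: ExecuteMode;
}

// Decide whether `step` should run now that the parent `justFinished` has completed.
// `finished` holds the ids of every parent step that has completed so far.
function shouldRun(step: Step, finished: Set<string>, justFinished: string): boolean {
  if (!step.parents.includes(justFinished)) return false;
  // "each": run once per completed parent (the current behaviour).
  if (step.execute === "each") return true;
  // "all": run only once every parent has completed.
  return step.parents.every((parent) => finished.has(parent));
}

const c: Step = { id: "C", parents: ["A", "B"], execute: "all" };
const finished = new Set<string>();

finished.add("A");
const afterA = shouldRun(c, finished, "A"); // false: B has not finished yet

finished.add("B");
const afterB = shouldRun(c, finished, "B"); // true: both parents have finished
```

With `execute: "each"` the same function returns true after either parent, which matches what happens today.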

A complicated aspect of this is state. When a workflow branches, the state ALSO branches. So steps A and B each receive a different state object. That means today, when step C runs twice, each run receives a different state object as input. In the run triggered by B, step C doesn’t see the changes made in step A.
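A toy illustration of that branching, assuming each branch simply gets its own copy of the state. The step functions and key names (`fromA`, `fromB`) are made up for the example.

```typescript
type State = Record<string, unknown>;

const initial: State = { data: {} };

// Each branch works on its own copy of the state.
const stepA = (state: State): State => ({ ...state, fromA: true });
const stepB = (state: State): State => ({ ...state, fromB: true });

// Step C just reports which keys it can see.
const stepC = (state: State): string[] => Object.keys(state).sort();

// C runs once per branch, each time with that branch's state only.
const runs = [stepA, stepB].map((branch) => stepC(branch({ ...initial })));
// runs[0] is ["data", "fromA"]; runs[1] is ["data", "fromB"]
```

Neither run of C sees both `fromA` and `fromB`, which is the gap a merge step would have to close.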

We would need some logic to resolve the state objects from each parent branch.

This could be a simple earliest-to-latest squash, where keys from later objects replace the same keys from earlier ones. This is likely to be the default behaviour. And this is fine so long as each branch only mutates a unique slice of state (changes to a common state property, like counter, would be overwritten by the latest value, not incremented).
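A minimal sketch of that squash, assuming a shallow merge of top-level keys and that the states arrive ordered earliest first. The `squash` name and the example keys are illustrative only.

```typescript
type State = Record<string, unknown>;

// Squash states in the order given (earliest first).
// Later objects overwrite same-named top-level keys from earlier ones.
function squash(states: State[]): State {
  return states.reduce<State>((merged, next) => ({ ...merged, ...next }), {});
}

// Each branch wrote to its own slice, but both also touched `counter`.
const fromA: State = { patients: ["p1"], counter: 1 };
const fromB: State = { visits: ["v1"], counter: 1 };

const merged = squash([fromA, fromB]);
// merged keeps both `patients` and `visits`,
// but merged.counter is 1 (B's value replaced A's), not 2.
```

The unique slices survive intact, while the shared `counter` shows exactly the overwrite problem described above.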

An alternative would be to provide a custom resolution function, which is passed the state object from each upstream step and must return a single state object. This allows users to do what they want - increment counters, merge arrays.
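Something like the following is what a custom resolver might look like. The `Resolver` signature is an assumption based on the description above, and the example assumes each branch's `counter` only counts that branch's own work, so summing is meaningful.

```typescript
interface State {
  counter: number;
  items: string[];
}

// Hypothetical signature: one state per upstream step in, a single state out.
type Resolver = (upstream: State[]) => State;

const resolve: Resolver = (upstream) => ({
  // Add the counters together instead of letting the last one win.
  counter: upstream.reduce((sum, state) => sum + state.counter, 0),
  // Concatenate the arrays from every branch, in upstream order.
  items: upstream.reduce<string[]>((all, state) => [...all, ...state.items], []),
});

const resolved = resolve([
  { counter: 1, items: ["a"] },
  { counter: 1, items: ["b"] },
]);
// resolved is { counter: 2, items: ["a", "b"] }
```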