Using data from a previous workflow run during its second iteration

What is the most effective way to implement this workflow, where:

  • Job 1 makes an API call to retrieve data on the first iteration.
  • Job 2 pushes the data to another application via API.
  • On subsequent iterations, Job 1 pulls new data via the API and compares it with the previously retrieved data to ensure only updated information is pushed by Job 2?

Initially I designed the workflow to pull the data from the application that the data is being pushed to, and use that to compare against the newly pulled data. I am just wondering if there is a more effective way to do this, as I mentioned above.

Hmm… well, is this a cron-triggered workflow? If yes, then you can pass state between successful runs (see docs), so in theory you could write the retrieved data to the final state of the job, then access it and compare new data with it in the subsequent run. However, you’ll need to be careful about managing the state over time.
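As a rough sketch of that approach: the comparison step could be a small pure function that diffs the freshly fetched records against whatever was saved on state by the previous run. This is illustrative plain JavaScript, not OpenFn adaptor code, and the `id` and `updatedAt` field names are assumptions about your data shape:

```javascript
// Sketch: compare newly fetched records against those saved on state
// from the previous run. Field names `id` and `updatedAt` are assumed.
function diffRecords(previousRecords, currentRecords) {
  // Index the previous run's records by id for O(1) lookup.
  const previousById = new Map(previousRecords.map(r => [r.id, r]));

  // Keep records that are new, or whose updatedAt has changed.
  return currentRecords.filter(record => {
    const prev = previousById.get(record.id);
    return !prev || prev.updatedAt !== record.updatedAt;
  });
}

// Inside an OpenFn job this might be wired up roughly like so
// (pseudocode, assuming you persist `dataRetrieved` on final state):
//
//   fn(state => {
//     const changed = diffRecords(state.dataRetrieved ?? [], state.data);
//     return { ...state, dataRetrieved: state.data, changed };
//   });
```

Job 2 would then only push `state.changed` downstream, rather than the whole payload.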

hey @jclark @taylordowns2000 any recommendations or feedback on this approach? And maybe could this be a use case for collections?
(Kiefer, collections is a new feature just released that provides a temporary data store in OpenFn. @ayodele should be sharing more about this new feature in the community next week!)


Using collections sounds perfect for this use case.

Thank you.

Yeah collections sounds perfect for this! You can “cache” a list of ids of the data you’ve already processed, and compare and update that on every run to ensure you avoid duplicates.

We’ve got a few outstanding issues on collections which I’ll be fixing next week - but it should work well for you already. I’m happy to help here if you have any questions or difficulties!
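To illustrate the "cache a list of ids" idea, here is the core dedupe logic in plain JavaScript. This is not the actual collections adaptor API (check the collections docs for the real `get`/`set` calls); it just shows the compare-and-update step you'd run between reading the cached ids and writing them back:

```javascript
// Sketch of the dedupe step: given the ids already processed (as read
// from a collection) and freshly fetched records, keep only unseen
// records and return the updated id cache to write back afterwards.
// The `id` field name is an assumption about your data shape.
function dedupe(seenIds, records) {
  const seen = new Set(seenIds);

  // Records whose id we have not processed on any previous run.
  const fresh = records.filter(r => !seen.has(r.id));

  // Add the new ids so the next run skips them too.
  fresh.forEach(r => seen.add(r.id));

  return { fresh, seenIds: [...seen] };
}
```

Each run would read `seenIds` from the collection, push only `fresh` to the downstream application, and write the returned `seenIds` back to the collection.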


Are there any issues currently preventing collections from functioning as expected when running Lightning locally via Docker?

I am currently attempting to test this.

Hello Kiefer,

Collections was released officially in v2.10.0. Please can you confirm you’re running v2.10.* upwards?

If you’re not running a recent version, you should consider upgrading your local version to latest (v2.10.4) to use collections.

Thanks

Hey @Kiefer - I’m not aware of any problems you’ll have with a local Docker build. It should all just work as usual. There’s an endpoint at /collections that needs to be exposed but I don’t think you’d need any special handling for that.

But of course do let us know if you encounter any difficulties!

@Kiefer you also need to be a “super admin” user in order to access and create new collections - again, see docs. If you’re running into issues, let us know what you are/aren’t seeing and what version of Lightning you’re running locally.

I am running v2.10.4, and I was able to create the collection successfully.

I attempted to add some test data into the collection. Please find attached a screenshot of the job.

The credential was created and configured according to the documentation.

Curious! Everything looks alright at first glance. I’ve just run a couple of local tests and can’t see anything obviously wrong :thinking:

Can you share the contents of your run log with us? You can either copy & paste the text or drop a screenshot.

Oh hang on - I just re-read your post and my eye snagged on “The credentials was created and configured according to the documentation.”

You don’t need to create any credential for collections - Lightning takes care of that for you. So if you’ve created a special credential, the next step would be to remove it and run again.

I removed the credential I had created and still get the same issue.
Attached is a screenshot showing the run log.