Dakar Ruby Conference 2022

Happy to see such a strong turnout for the OpenFn presentation at Dakar Ruby Conf. You can find all the materials for the talk in this GitHub repo. I’ll reply on this thread with the official video link when it’s published.

In the meantime, some fantastic questions :star_struck: came up during the event, and I’ve got a list here… thought it would be useful to post this publicly!

What is an adaptor, and how is it different from core?

An adaptor is a wrapper for some target application’s API: it includes functions like “createPatient” which serve some special purpose in the “OpenMRS” application, for example. Core is the program used to execute jobs on OpenFn: given a job expression (some instructions), an adaptor, and an initial “state” (some data/configuration), core executes the operations defined in that expression.
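As a rough sketch (the function names follow the examples above and aren’t necessarily the exact adaptor API), a job expression is just a series of operations built from the adaptor’s functions, and core runs them in order against that initial state:

// Both functions are provided by the adaptor and core at runtime; core
// executes each operation in order, threading "state" through them.
createPatient({
  name: dataValue('form.patient_name'),
  gender: dataValue('form.gender'),
});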

How do we handle runtime errors (infinite loops or very complex code)?

Assuming that this question is about the hosted platform, we control the execution of OpenFn jobs by managing a special isolated Node virtual machine from Elixir. Our Elixir application starts up this VM, runs a client’s jobs, and then decides how and when to kill that VM if it’s taking too much time, using too many resources, etc. If you’re running your own stuff locally, you might consider changing the resource limits for your own NodeVM. See Limits | OpenFn/docs
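If you’re reproducing that pattern yourself, here’s a minimal sketch (plain Node, not the actual OpenFn internals) of a supervising process that caps memory and wall-clock time for a job and kills it when a limit is hit:

const { spawn } = require('child_process');

// Run the job in its own Node process with a memory cap.
// 'run-job.js' is a placeholder for whatever executes your expression.
const child = spawn('node', ['--max-old-space-size=512', 'run-job.js'], {
  stdio: 'inherit',
});

// Enforce a wall-clock limit from outside the job process.
const timer = setTimeout(() => {
  console.error('Job exceeded its time limit; killing it.');
  child.kill('SIGKILL');
}, 60000);

child.on('exit', (code) => {
  clearTimeout(timer);
  console.log(`Job exited with code ${code}`);
});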

How do we make sure people don’t execute malicious code on the platform?

See above! In a funny sense, OpenFn offers “code injection attacks as a service!” :joy:

We’re specifically telling customers that we’re happy to execute their code on our systems. With that in mind, most of the interesting design behind the OpenFn hosted solution is related to how we create completely isolated “sandboxed runtimes” to execute a customer’s code with only the data/access required for that specific customer’s job. In fact, we decide which standard NodeJS objects are available in that sandbox and specifically pass them in.
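As a minimal sketch of that idea using Node’s built-in vm module (the real sandbox is considerably more involved, and vm alone isn’t a complete security boundary):

const vm = require('vm');

// Only what we explicitly pass in exists inside the sandbox:
// no require, no process, no file system access.
const sandbox = vm.createContext({
  console,                    // allow logging
  state: { data: { id: 1 } }, // the job's initial state
  // ...plus whichever adaptor functions this job is allowed to use
});

vm.runInContext("console.log('running with', state.data)", sandbox, {
  timeout: 1000, // cap synchronous execution time (ms)
});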

How do we write a job that integrates two different systems?

We covered this later in the talk, but on the OpenFn iPaaS you can chain together as many jobs as you’d like in a flow. See how state is passed between jobs on the platform here: Initial and final state for runs | OpenFn/docs
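As a simplified illustration (the adaptor functions below are examples, not a prescribed pattern), the final state returned by one job becomes the initial state of the next job in the flow:

// Job 1, on a database adaptor: fetch rows onto state.data
sql('SELECT id, name FROM patients WHERE synced = false');

// Job 2, on a CRM adaptor: runs next in the flow and receives
// Job 1's final state as its initial state
create('Patient__c', state => ({
  Name: state.data[0].name,
}));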

Why don’t we allow people to import adaptor helper functions directly in the job expression code instead of passing the adaptor to core at execution time?

It’s a great question, and we’re considering the implications of allowing imports in the upcoming version of our DSL! Stuart Corbishley, care to chime in on the history and roadmap here?!


Thanks Elias, this is a great outline of some of the questions that curious developers have about how OpenFn works!

Regarding importing modules, or allowing a user to declare their own imports: this comes down partly to historical reasons, but most importantly to user experience.

The technical vision for OpenFn came at a time when people wanted to move data between two known systems. ETL in this space was mostly custom-written for each and every implementation, and our goal was (and still is) to make this as simple as possible.

By enforcing a single adaptor, and therefore a single import, we can bring in functions without namespaces.
So in the context of Salesforce, every function is at the root level of the script context: to create an object it’s create, to map over an array it’s map.

If we allowed multiple imports, jobs would need to reference which module each function belongs to. In the case of having both Postgres and Salesforce in the same job, every function call would need to be prefixed with the import name: Salesforce.create, or Postgres.select.
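A rough before-and-after of the two styles (hypothetical job code, purely to show the difference):

// Today: one adaptor, so its functions sit at the root of the script
create('Patient__c', { Name: 'Aminata' });

// With multiple adaptors, every call would carry its namespace
Salesforce.create('Patient__c', { Name: 'Aminata' });
Postgres.select('SELECT * FROM patients');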

We felt that being able to think only about singular verbs made writing a job feel really clean.

All that said, allowing imports to be controlled is something we’ve spoken and thought about a lot over the last few years, and we are aiming to introduce custom imports in the future. We still want to maintain the original style of root-context functions for users who don’t want to think about namespacing, but also offer more advanced users the ability to change those imports and have more than one.

The hard part with inlining adaptor imports (while maintaining root-level functions) is that our adaptors can have a lot of functions.
All adaptors export all of language-common’s functions, bringing in more than 20 functions!

import { each, map, dataValue, create, update, … } from '@openfn/language-salesforce';

With code formatting, the import block for a single adaptor would occupy more than 20 lines.

While this offers complete control, we want to make sure users can still focus on writing jobs and not worry about dependencies. It’s not an easy compromise to make.

Last but not least: since we automatically configure adaptors at runtime, this was built with the assumption that there would be only one. Our runtime would need to be smart enough to configure every adaptor that was imported. That’s not a deal breaker by any means, but in order to keep things as simple as possible for users who don’t want more than one adaptor, we want the experience to stay as seamless as possible when we introduce custom imports.

I am envisioning that the specified adaptor becomes part of the job’s code, instead of a setting for the job itself.

Selecting an adaptor would inject an import at the top of the code, and by leveraging Monaco (the editor that VS Code uses under the hood) we could perhaps hide those imports for users who don’t want to see all of that all the time.
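A job might then end up looking something like this (purely illustrative, not a finalized syntax), with the first line injected when the adaptor is selected and optionally folded away by the editor:

import { create, dataValue } from '@openfn/language-salesforce'; // injected by the adaptor picker

create('Patient__c', {
  Name: dataValue('form.patient_name'),
});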

We’re making a lot of large under-the-hood changes to how we introspect job code, all of which will be used to make this kind of power-user functionality a thing.

If anyone in the community has any suggestions or ideas (or experience with the TypeScript compiler internals), we’d love to hear from you!
