[DMP 2024] Automatically generate job expressions from prompts

ayodele · June 13, 2024, 11:23am

We are excited to participate in the 2024 Direct Mentoring Program by supporting two software engineers interested in contributing to Open Source Gov4Tech programs. This public channel is dedicated to discussing our sponsored feature: ** Automatically generate job expressions from prompts**.

Mentors, contributors, and community members are welcome to ask and answer questions related to this topic here.

Feel free to engage and share your insights!

github.com/OpenFn/kit

[DMP 2024] Generate Job expressions

opened 12:31AM - 05 Mar 24 UTC

christad92

DMP 2024

## How to apply Do not ask process related questions about how to apply and w…ho to contact in the above ticket. The only questions allowed are about technical aspects of the project itself. If you want help with the process, you can refer instructions listed on [Unstop](https://unstop.com/competitions/dedicated-mentoring-program-dmp-2024-code-for-govtech-932803?lb=7gWXVfKv) and any further queries can be taken up on our [Discord channel](https://discord.com/invite/VPrXf7Jxpr) titled DMP queries. Here's a [Video Tutorial](https://youtu.be/nMW-nD8WzHY) on how to submit a proposal for a project. --- name: Generate a job expression.js file from 3 sample inputs and a desired output about: OpenFn's submission for the Code for GovTech program title: Generate a job expression.js file from 3 sample inputs and a desired output labels: `OpenFn`, `CLI`, `History`, `AI` assignees: '' --- OpenFn is an open source platform for data integration and workflow automation accessible to users thorough a [CLI](https://github.com/OpenFn/kit/tree/main/packages/cli) or a [web UI](http://github.com/openfn/lightning). To use OpenFn, users build `workflows` which are made up of one or more steps—at the time of writing these are all JavaScript-based "jobs" (the JS code itself is called a "job expression"). These `jobs` make use of adaptors to perform their tasks, e.g. Make a request to an API endpoint, update a record in a database, aggregate data, send data to an external platform. Here is an example of a job that uses the [common adaptor](https://docs.openfn.org/adaptors/packages/common-docs) to transform an input data (in state) into a new object transformedPatient ```js fn(state => { const transformedPatient = { ...state.data.patient, status: "enrolled" } return { ...state, transformedPatient }; }) ``` And here is another job expression that uses the [dhis2 adaptor](https://docs.openfn.org/adaptors/packages/dhis2-docs) to create a new patient record referred to as trackedEntityInstance. ```js create('trackedEntityInstances', { orgUnit: "dWOAzMcK2Wt" /*Alkalia CHP*/, trackedEntityType: 'nEenWmSyUEp' /*Person*/, attributes: [ { attribute: 'w75KJ2mc4zz', value: state.person.age.first_name }, { attribute: 'zDhUuAYrxNC', value: state.person.age.last_name }, { attribute: 'cejWyOfXge6', value: state.person.age.gender }, ], }); ``` Learn more about [adaptors here](https://docs.openfn.org/adaptors) and how they are used in OpenFn workflows. When building workflows, users spend most of their time writing simple to advanced jobs on OpenFn. We'd like to harness AI to write job expressions based on english text requirements (maybe as code comments, like GitHub Co-Pilot?) This feature is able to generate a job expression given some sample input data, the adaptor specification (name and version), and a text description of the desired output/instruction. It is expected that the generated job expression can be executed in the CLI. **Inputs for the AI engine:** 1. One or more sample inputs (valid JSON) which can serve as the initial `state` for the job ```json { "data": { "name": "bukayo saka", "gender": "male" } } ``` 2. The adaptor specification will be in the form of “@openfn/language-dhis2@1.2.3” or “@openfn/language-common@1.2.3” ``` @openfn/language-dhis2@4.0.3 ``` 3. The text instructions will be in the form of: “Create a new object based on patient object, and set it’s status attribute “enrolled” or “Create a trackedEntityInstance record in DHIS2 using the data from state.person” ``` Create a new trackedEntityInstance "person" in dhis2 for the "dWOAzMcK2Wt" orgUnit. ``` **Sample Output** Given the inputs above, we'd expect the output code to be: ```js create('trackedEntityInstances', { orgUnit: "dWOAzMcK2Wt", trackedEntityType: 'nEenWmSyUEp', attributes: [ { attribute: 'w75KJ2mc4zz', value: state.data.name.split(' ')[0] }, { attribute: 'zDhUuAYrxNC', value: state.data.name.split(' ')[1] }, { attribute: 'cejWyOfXge6', value: state.data.gender }, ], }); ``` Acceptance Criteria: - The model should be take in three parameters as defined above. - The model should be executed in the CLI - The model should generate a job expression - The generated job must follow the convention defined in the adaptor documentation. Documentation: - [How to write jobs](https://docs.openfn.org/documentation/jobs/job-writing-guide) - [Workflows](https://docs.openfn.org/documentation/build/workflows) - [Adaptors](https://docs.openfn.org/adaptors) ### Product Name Product Name: OpenFn ### Project Name Project Name: Generate Job expressions (expression.js) from 3 sample inputs ### Organization Name: Open Function Group ### Domain [Others] ### Tech Skills Needed: Javascript, AI, python ### Mentor(s) ### Complexity [High] ### Category [Feature], [PoC] ### Sub Category Pick one or more of [Artificial Intelligence], [Backend], [Artificial Intelligence].

satyammattoo · June 16, 2024, 11:23am

Thank you for giving this opportunity. I am pleased to inform you that I have been selected for this project.

For this feature, I have planned the following approach:

We can leverage the existing Apollo (formerly Gen) repository. I noticed that you have set up the initial framework for making calls to Apollo services through the CLI.

To move forward, I propose creating a service for job generation using the following inputs. Users would provide these inputs via a .json file:

{
  "api_key": "apiKey",
  "adaptor": "@openfn/language-dhis2@4.0.3",
  "data": {
    "name": "bukayo saka",
    "gender": "male"
  },
  "signature": "Create a new trackedEntityInstance 'person' in dhis2 for the 'dWOAzMcK2Wt' orgUnit."
}

The CLI command openfn apollo job_expression_generator tmp/data.json -o tmp/output.json would then be used to call the job generation service on the Apollo server and return the desired result.

For job generation on the server, we can create a job_expression_generator service. This service would parse inputs from the .json file and generate the required output. Below is a sample implementation:

from util import DictObj, createLogger

from .utils import (
    generate_job_prompt,
)

from inference import inference


logger = createLogger("job_expression_generator")


class Payload(DictObj):
    api_key: str
    adaptor: str
    signature: str
    data: dict


# Generate job expression based on the input data, adaptor specification, and instructions
def main(dataDict) -> str:
    data = Payload(dataDict)
    logger.info("Running job expression generator with adaptor {}".format(data.adaptor))
    result = generate(data.adaptor_spec, data.instructions, data.sample_input, data.get("api_key"))
    logger.success("Job expression generation complete!")
    return result


def generate(adaptor_spec, instructions, sample_input, key) -> str:
    prompt = generate_job_prompt(adaptor_spec, instructions, sample_input)

    result = inference.generate("gpt3_turbo", prompt, {"key": key})

    return result

The prompt for this might look like:

prompts = {
    "job_expression": (
        "You are a helpful Javascript code assistant.",
        "Below is a description of a task along with the adaptor specification and sample input data. "
        "Generate a JavaScript job expression that performs the task described. Ensure the job expression "
        "follows the conventions defined in the adaptor documentation.\n\n"
        "Adaptor: {adaptor}\n"
        "Instructions: {signature}\n"
        "Sample Input: {sample_input}\n"
        "====",
    ),
}

For testing, we can run this with sample inputs from the CLI, write tests in the Apollo repo itself, or both.

I believe this approach aligns with what you’re looking for. Could you please provide feedback on whether I am on the right track or suggest any improvements? Your guidance would be greatly appreciated.