Can't run simple workflow. Help me

Hello,

I installed only the Lightning source on a local server (GitHub - OpenFn/lightning). But I can't run simple (or empty) workflows. I attached screenshots. Do I need to install other sources?

version: v2.14.13-pre1

Status: starting → (enqueued →) lost.

System logs:

lightning_worker_1 | [SRV] ❯ Connected to worker queue socket
lightning_worker_1 | [SRV] ✔ Connected to Lightning at ws://web:4000/worker
lightning_worker_1 | [SRV] ℹ Starting workloop
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 16ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 17ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 13ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 14ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 14ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 19ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 10ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 5ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 7ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 5ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 8ms (-)
lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)
lightning_worker_1 | [SRV] ❯ claimed 0 runs in 7ms (-)

Hi @qwerty, thanks for reaching out!

I can see your logs but not your screenshots.

You don’t need to install anything else, your logs look absolutely fine. Can you search or grep for “claimed 1 runs” anywhere in the logs?
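If grepping by hand is awkward, the same check can be sketched in a few lines of JavaScript. This is only an illustration: `countClaimedRuns` is a hypothetical helper name, and it assumes claim lines keep the `claimed N runs in ...` wording shown in the logs above.

```javascript
// Sum the runs a worker reports claiming, based on log lines such as
// "[SRV] ❯ claimed 0 runs in 16ms (-)".
// Assumption: a successful claim is logged with the same wording and N >= 1.
function countClaimedRuns(logText) {
  const pattern = /claimed (\d+) runs? in/g;
  let claimLines = 0; // how many claim responses were logged
  let total = 0; // how many runs were claimed across all responses
  for (const match of logText.matchAll(pattern)) {
    claimLines += 1;
    total += Number(match[1]);
  }
  return { claimLines, total };
}

const sample = [
  "lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)",
  "lightning_worker_1 | [SRV] ❯ claimed 0 runs in 16ms (-)",
  "lightning_worker_1 | [SRV] ❯ requesting run (capacity 0/5)",
  "lightning_worker_1 | [SRV] ❯ claimed 0 runs in 17ms (-)",
].join("\n");

console.log(countClaimedRuns(sample)); // { claimLines: 2, total: 0 }
```

A `total` of 0 over a period in which a run was triggered means the worker never picked anything up.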

Can you tell me more about your local setup? How are you starting Lightning? How are you triggering runs?

Hi, friend. I just attached pics. Please check them.

OS: Ubuntu Server 24 LTS.

The worker can't claim any runs. All attempts got LOST.

docker-compose.yml:

x-lightning: &default-app
  build:
    dockerfile: Dockerfile
    context: '.'
    args:
      - 'MIX_ENV=prod'
      - 'NODE_ENV=production'
  depends_on:
    - 'postgres'
  restart: 'unless-stopped'
  stop_grace_period: '3s'

services:
  postgres:
    image: 'postgres:15.12-alpine'
    restart: 'unless-stopped'
    deploy:
      resources:
        limits:
          cpus: '${DOCKER_POSTGRES_CPUS:-0}'
          memory: '${DOCKER_POSTGRES_MEMORY:-0}'
    environment:
      POSTGRES_USER: ${POSTGRES_USER}
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    stop_grace_period: '3s'
    volumes:
      - 'postgres:/var/lib/postgresql/data'

  web:
    <<: *default-app
    deploy:
      resources:
        limits:
          cpus: '${DOCKER_WEB_CPUS:-0}'
          memory: '${DOCKER_WEB_MEMORY:-0}'
    dns: #
      - 8.8.8.8
      - 8.8.4.4
    environment:
      DATABASE_URL: ${DATABASE_URL}
      WORKER_SECRET: ${WORKER_SECRET}
      SECRET_KEY_BASE: ${SECRET_KEY_BASE}
      WORKER_RUNS_PRIVATE_KEY: ${WORKER_RUNS_PRIVATE_KEY}
      PRIMARY_ENCRYPTION_KEY: ${PRIMARY_ENCRYPTION_KEY}
      PHX_HOST: ${PHX_HOST}
      PHX_PROTO: ${PHX_PROTO}
      SERVER_URL: ${SERVER_URL}
      URL_SCHEME: ${URL_SCHEME}
      PORT: ${PORT}
      HOST: ${HOST}
      MIX_ENV: prod
      NODE_ENV: production
      ERLANG_NODE_DISCOVERY_VIA_POSTGRES_ENABLED: ${ERLANG_NODE_DISCOVERY_VIA_POSTGRES_ENABLED}
    depends_on:
      - postgres
    healthcheck:
      test: '${DOCKER_WEB_HEALTHCHECK_TEST:-curl localhost:4000/health_check}'
      interval: '10s'
      timeout: '3s'
      start_period: '5s'
      retries: 3
    ports:
      - "0.0.0.0:4000:4000"
    volumes:
      - ./repo:/tmp/openfn/worker/repo

  worker:
    image: 'openfn/ws-worker:latest'
    user: "0:0"
    restart: always
    #deploy:
    #  resources:
    #    limits:
    #      cpus: '${DOCKER_WORKER_CPUS:-0}'
    #      memory: '${DOCKER_WEB_MEMORY:-0}'
    depends_on:
      - web
    dns:
      - 8.8.8.8
      - 8.8.4.4
    environment:
      DATABASE_URL: ${DATABASE_URL}
      WORKER_SECRET: ${WORKER_SECRET}
      PRIMARY_ENCRYPTION_KEY: ${PRIMARY_ENCRYPTION_KEY}
      MIX_ENV: prod
      NODE_ENV: production
      NODE_OPTIONS: "--dns-result-order=ipv4first"
    command: ['pnpm', 'start:prod', '-l', 'ws://web:${PORT@openfntartopenfntart/worker']
    stop_grace_period: '3s'
    expose:
      - '2222'

volumes:
  postgres: {}

Are you able to isolate and attach a complete log for just the worker’s output? Or a complete log of everything would still help.

It takes an hour for a run to be marked as Lost (we do that when the worker times out, basically), but I don’t need an hour of logs. Just a minute or so around the triggering of a run.

The job is simple:

name: workflow1
jobs:
  Transform-data:
    name: Transform data
    adaptor: "@openfn/language-common@latest"
    body: |-
      // Check out the Job Writing Guide for help getting started:
      // Job Writing Guide | OpenFn/docs
      fn(state => {
        console.log("✅ Test job running..123!");
        return { success: true, timestamp: new Date().toISOString() };
      });
triggers:
  webhook:
    type: webhook
    enabled: true
edges:
  webhook->Transform-data:
    condition_type: always
    enabled: true
    target_job: Transform-data
    source_trigger: webhook

After several minutes it got LOST. It never started.

Here is the worker log:

~/lightning# docker compose logs -f worker
worker-1 |
worker-1 | > @openfn/ws-worker@1.18.0 start:prod /app/packages/ws-worker
worker-1 | > node dist/start.js -l ws://web:4000/worker
worker-1 |
worker-1 | [SRV] ℹ Starting worker server…
worker-1 | [SRV] ❯ Creating runtime engine…
worker-1 | [SRV] ❯ Engine options: {
worker-1 |   "memoryLimitMb": 500,
worker-1 |   "maxWorkers": 5,
worker-1 |   "statePropsToRemove": [
worker-1 |     "configuration",
worker-1 |     "response"
worker-1 |   ],
worker-1 |   "runTimeoutMs": 300000,
worker-1 |   "workerValidationTimeout": 5000,
worker-1 |   "workerValidationRetries": 3
worker-1 | }
worker-1 | [RTE] ⚠ Using default repo directory: /tmp/openfn/worker/repo
worker-1 | [RTE] ℹ repoDir set to /tmp/openfn/worker/repo
worker-1 | [RTE] ℹ memory limit set to 500mb
worker-1 | [RTE] ℹ statePropsToRemove set to: [
worker-1 |   "configuration",
worker-1 |   "response"
worker-1 | ]
worker-1 | [RTE] ❯ Loading workers from /app/packages/engine-multi/dist/worker/thread/run.js
worker-1 | [RTE] ❯ pool: Creating new child process pool | capacity: 5
worker-1 | [RTE] ❯ pool: Created new child process 25
worker-1 | [RTE] ❯ pool: finished task in worker 25
worker-1 | [RTE] ❯ Engine worker validated in 1508ms
worker-1 | [SRV] ❯ Engine created!
worker-1 | [SRV] ❯ Creating worker instance
worker-1 | [SRV] ⚠ WARNING: deprecated socketTimeoutSeconds value passed.
worker-1 |
worker-1 | This will be respected as the default socket timeout value, but will be removed from future versions of the worker.
worker-1 | [SRV] ❯ Worker options: {
worker-1 |   "port": 2222,
worker-1 |   "lightning": "ws://web:4000/worker",
worker-1 |   "sentryEnv": "dev",
worker-1 |   "noLoop": false,
worker-1 |   "backoff": {
worker-1 |     "min": 1000,
worker-1 |     "max": 10000
worker-1 |   },
worker-1 |   "maxWorkflows": 5,
worker-1 |   "payloadLimitMb": 10,
worker-1 |   "messageTimeoutSeconds": 30,
worker-1 |   "claimTimeoutSeconds": 3600
worker-1 | }
worker-1 | [SRV] ✔ Worker cute-toes-fall listening on 2222
worker-1 | [SRV] ⚠ WARNING: no collections URL provided. Collections service will not be enabled.
worker-1 | [SRV] ⚠ Pass --collections-url or set WORKER_COLLECTIONS_URL to set the url
worker-1 | [SRV] ✔ Worker started OK
worker-1 | [SRV] ❯ Connecting to Lightning at ws://web:4000/worker
worker-1 | [SRV] ❯ Reporting connection error to sentry
worker-1 | [SRV] ✘ CRITICAL ERROR: could not connect to lightning at ws://web:4000/worker
worker-1 | [SRV] ❯ connect ECONNREFUSED 172.21.0.3:4000
worker-1 | [SRV] ❯ queue socket closed
worker-1 | [SRV] ℹ Connection to lightning lost
worker-1 | [SRV] ℹ Worker will automatically reconnect when lightning is back online
worker-1 | [SRV] ❯ Connected to worker queue socket
worker-1 | [SRV] ✔ Connected to Lightning at ws://web:4000/worker
worker-1 | [SRV] ℹ Starting workloop
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 57ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 21ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 6ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 22ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 22ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 19ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 23ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 21ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 10ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 6ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 7ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 6ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ Connected to worker queue socket
worker-1 | [SRV] ✔ Connected to Lightning at ws://web:4000/worker
worker-1 | [SRV] ❯ claimed 0 runs in 828ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 8ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 14ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 7ms (-)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ claimed 0 runs in 8ms (-)

I’ve never run the app out of docker myself, but this looks weird to me:

worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ requesting run (capacity 0/5)
worker-1 | [SRV] ❯ Connected to worker queue socket
worker-1 | [SRV] ✔ Connected to Lightning at ws://web:4000/worker
worker-1 | [SRV] ❯ claimed 0 runs in 828ms (-)

Every requesting run log should be followed by a claimed N runs log. But here, it’s like one of the requests failed - presumably while downloading a run - and the connection to Lightning dropped.
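To spot this pattern in a longer log, something like the following JavaScript sketch could help. `findUnansweredRequests` is a hypothetical helper, not part of the worker, and it assumes the `requesting run` and `claimed N runs` wording stays as it appears in the excerpt.

```javascript
// Return the 1-based line numbers of "requesting run" entries that are
// not answered by a "claimed N runs" entry before the next request.
function findUnansweredRequests(logText) {
  const unanswered = [];
  let pending = null; // line number of the request still awaiting a claim
  logText.split("\n").forEach((line, index) => {
    if (line.includes("requesting run")) {
      if (pending !== null) unanswered.push(pending);
      pending = index + 1;
    } else if (/claimed \d+ runs?/.test(line)) {
      pending = null;
    }
  });
  // A request on the last lines with no reply is also reported; it may
  // simply mean the log was cut off before the claim came back.
  if (pending !== null) unanswered.push(pending);
  return unanswered;
}

const excerpt = [
  "worker-1 | [SRV] ❯ requesting run (capacity 0/5)",
  "worker-1 | [SRV] ❯ requesting run (capacity 0/5)",
  "worker-1 | [SRV] ❯ Connected to worker queue socket",
  "worker-1 | [SRV] ❯ claimed 0 runs in 828ms (-)",
].join("\n");

console.log(findUnansweredRequests(excerpt)); // [ 1 ]
```

Each reported line number is a good place to look for the matching timestamp in the Lightning logs.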

Can’t think why that would happen - it’s not something I’ve seen before

Are you able to isolate the corresponding Lightning logs for me? We should see the same request/claim cycle, and I wonder if there’s any error or warning from the Lightning side

Hello, I attached some Lightning logs from after a failed run, plus the .env file. Please check them.

I edited the ./config/runtime.exs file before building Lightning:

~/lightning/config# cat runtime.exs

Lightning.Config.Bootstrap.source_envs()
Lightning.Config.Bootstrap.configure()

import Config

env = System.get_env("MIX_ENV") || "prod"

if env == "prod" do
  host = System.get_env("HOST") || "66.42.57.64"
  port = String.to_integer(System.get_env("PORT") || "4000")

  config :lightning, LightningWeb.Endpoint,
    http: [ip: {0, 0, 0, 0}, port: port],
    url: [host: host, port: port, scheme: "http"],
    server: true,
    check_origin: ["http://#{host}:#{port}"]

  # Worker socket
  #config :lightning, Lightning.Runtime.RuntimeManager,
  #start: true

  # Worker secret
  config :lightning,
    worker_secret: System.get_env("WORKER_SECRET")
end

I think this may be because WORKER_RUNS_PRIVATE_KEY is unset (or set to the wrong value)

In your runtime.exs can you ensure that the private_key is set? Take a look at config/dev.exs for an example
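As a quick first check, a small JavaScript sketch like this can confirm whether the variable reaches the process environment at all. The list of names is copied from the web service in the compose file above; treating all of them as required is an assumption, and `missingEnv` is a hypothetical helper. It only tells you a value is present, not that runtime.exs reads it or that the value is valid.

```javascript
// Names copied from the web service's `environment` block in the thread.
// Assumption: each of these must be non-empty for runs to be claimed.
const REQUIRED = [
  "DATABASE_URL",
  "WORKER_SECRET",
  "SECRET_KEY_BASE",
  "WORKER_RUNS_PRIVATE_KEY",
  "PRIMARY_ENCRYPTION_KEY",
];

// Return the names that are absent or blank in the given environment object.
function missingEnv(env, names = REQUIRED) {
  return names.filter((name) => {
    const value = env[name];
    return typeof value !== "string" || value.trim() === "";
  });
}

// Example with a partial environment; pass process.env to check a real one.
console.log(missingEnv({ DATABASE_URL: "postgres://example", WORKER_SECRET: "s" }));
```

Running it inside the web container (for example via `node` with `process.env`) would show whether the compose file is actually passing the key through.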

The keys are stored in the .env file. I attached the env_file file.

Here:

WORKER_RUNS_PRIVATE_KEY=937ac25f760759bf51576ee889f8f16b371bdea04495d58fa521935a32456bce

I just set private_key in runtime.exs. It's working. Thank you, friend.

That’s great news @qwerty !

We’re investigating the issue and trying to work out why there’s no good clean error in this case. It shouldn’t have been so hard to diagnose!