Hi team,
We’re currently experiencing frequent lost runs on our OpenFn v2.10.7 instance deployed in OpenShift.
From our logs, here are the consistent patterns:
- DBConnection.ConnectionError due to connection refused and queue timeout errors.
- Oban job queue workers (e.g., history_exports, workflow_failures, scheduler, background) repeatedly fail to start with:
** (stop) exited in: :gen_statem.call ... (EXIT) time out
- Connection pool exhaustion:
connection not available and request was dropped from queue after 1999ms...
- System memory high watermark alarms are being raised and cleared repeatedly:
:alarm_handler: {:set, {:system_memory_high_watermark, []}}
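In case it helps with triage, here is a minimal sketch of how these patterns could be tallied from an exported log file. It assumes the pod logs have been saved to a plain-text file; the category names and the `tally` function are our own hypothetical labels, and the match strings are simply taken from the excerpts above, so they may need adjusting to the exact log format.

```python
import re
import sys
from collections import Counter

# Match strings copied from the log excerpts above.
# The category names on the left are hypothetical labels, not OpenFn terms.
PATTERNS = {
    "db_connection_error": re.compile(r"DBConnection\.ConnectionError"),
    "connection_refused": re.compile(r"connection refused"),
    "pool_queue_timeout": re.compile(r"dropped from queue after \d+ms"),
    "gen_statem_timeout": re.compile(r"exited in: :gen_statem\.call"),
    "memory_alarm_set": re.compile(r"\{:set, \{:system_memory_high_watermark"),
}


def tally(lines):
    """Count how many lines match each pattern (a line may match several)."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


# Only runs when a log file path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        for name, n in tally(fh).most_common():
            print(f"{name}: {n}")
```

Running it against a saved log file (for example one produced with `oc logs`) prints one count per category, which should show whether the pool timeouts and the memory alarms rise together.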
Could you kindly advise on how to resolve this?
Thanks in advance for your support!
Welcome to the community, @Kenyuri! I’ve just sent you an email. We’ll need some more info to diagnose and resolve the issues you’re having, but I’ve already alerted our support team and I’m sure we can sort you out quickly.
For the record here: If this is a production deployment or something you plan to take to scale, we’d recommend having a support contract in place with the OpenFn.org core team for two reasons:
- There’s nobody better positioned to provide support across multiple deployment architectures. If you’re planning on taking this to scale for an NGO or governments, they’ll likely want to de-risk the project by ensuring you’re following best practices for security, stability, and scale and that you have T3/T4 support in place for escalating issues you can’t resolve yourself.
- By contracting the core team for support, you’re also de-risking the country’s investment in this particular digital public good. A support contract with OFG makes sure that the core DPG you’re relying on continues to be invested in—that new versions, bug fixes, and security patches are rolled out in a timely manner.
Let’s be in touch and make sure to update this thread when we have a resolution so others can benefit from the fix, if applicable.