Problem

Oban jobs get stuck in the "available" state on Fly.io: Oban.check_queue() reports them as running, but their database state is "available". The queue appears blocked even though no jobs are actually executing. This happens after Fly.io machine suspend/wake cycles.

Shared by Tom
Solution

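The root cause is auto_stop_machines = 'suspend' in fly.toml. When Fly suspends a machine, the BEAM is frozen mid-execution and its in-memory state goes stale. After the machine wakes, Oban's producer still holds the old jobs in its in-memory running list, but the Lifeline plugin has already rescued those jobs back to the "available" state in the database. Because the stale entries still count against the queue's concurrency limit, the producer fetches no new work and the queue stalls.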
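One way to confirm you are hitting this mismatch, rather than a genuinely busy queue, is to compare the job ids the producer reports as running with their rows in the oban_jobs table. The sketch below keeps the comparison in a pure function (QueueDiagnostics.stale_running/2 is a hypothetical helper, not part of Oban) and shows the live lookup in comments; it assumes a repo named MyApp.Repo (hypothetical) and the default queue.

```elixir
defmodule QueueDiagnostics do
  @moduledoc """
  Sketch (hypothetical helper): find job ids that Oban's producer still
  tracks as running but whose database row is not in the "executing" state.
  """

  @doc """
  `running_ids` is the `:running` list from `Oban.check_queue/1`.
  `db_rows` is a list of `{id, state}` tuples read from `oban_jobs`.
  Ids with no matching row are also reported as stale.
  """
  def stale_running(running_ids, db_rows) do
    states = Map.new(db_rows)
    # Keep every id whose DB state is anything other than "executing".
    Enum.reject(running_ids, fn id -> Map.get(states, id) == "executing" end)
  end
end

# Example IEx usage against a live app (MyApp.Repo is a hypothetical name):
#
#   import Ecto.Query
#   %{running: ids} = Oban.check_queue(queue: :default)
#
#   rows =
#     MyApp.Repo.all(
#       from j in Oban.Job, where: j.id in ^ids, select: {j.id, j.state}
#     )
#
#   QueueDiagnostics.stale_running(ids, rows)
```

A non-empty result whose rows are "available" matches the stuck state described in the problem.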

Fix: in the [http_service] (or [[services]]) section of fly.toml, change:

auto_stop_machines = 'suspend'

to:

auto_stop_machines = 'stop'

With 'stop', the machine receives a shutdown signal, so Oban can shut down cleanly before the machine stops, and the next boot starts a fresh BEAM with no stale in-memory state. With 'suspend', the process is frozen mid-execution, which causes the state mismatch.
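Oban's side of a clean stop can be tuned in its configuration. The sketch below is one possible config/runtime.exs entry, assuming an OTP app named :my_app, a repo named MyApp.Repo and a single default queue (all hypothetical); the values are illustrative, not taken from the original post. It also assumes the machine is stopped with a signal the BEAM treats as an orderly shutdown, such as SIGTERM.

```elixir
# config/runtime.exs (sketch; :my_app, MyApp.Repo and the queue are hypothetical)
import Config

config :my_app, Oban,
  repo: MyApp.Repo,
  queues: [default: 10],
  # How long Oban waits for executing jobs to finish once shutdown begins.
  # Assumption: keep kill_timeout in fly.toml longer than this, otherwise
  # Fly may force-kill the VM before the jobs drain.
  shutdown_grace_period: :timer.seconds(15),
  plugins: [
    # Lifeline moves jobs left in "executing" longer than rescue_after back
    # to "available"; this rescue is what makes the database diverge from a
    # suspended producer's in-memory running list.
    {Oban.Plugins.Lifeline, rescue_after: :timer.minutes(30)}
  ]
```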

Immediate recovery (if already stuck): Restart the app with fly apps restart <app-name> to clear stale in-memory state.

Note: Oban.pause_queue, Oban.resume_queue and Oban.cancel_job do NOT clear the producer's in-memory running list; only a full restart works.

Tags
domain: background-jobs, deployment
framework: oban, phoenix
language: elixir
platform: fly.io
Created February 02, 2026