I spent hours debugging a workflow that worked perfectly in production but exploded in the test environment. In production, the workflow ran fine. The activities completed. The signals were correct. But every single test failed with a TimeoutError: the workflow would just… die, before my test code even had a chance to interact with it.
The culprit was a clock I didn’t know existed.
This post is about the mental model that finally made Temporal’s time-skipping test server make sense to me. If you’ve ever been confused by WorkflowEnvironment.start_time_skipping(), or had tests fail mysteriously with timeouts that don’t happen in production, this is for you.
What is Temporal? (The 30-Second Version)
Temporal is a workflow orchestration engine. You define your business logic as a “workflow” — a sequence of steps — and Temporal takes care of running it reliably. If a step fails, Temporal retries it. If your server crashes, Temporal picks up where it left off. It handles all the ugly stuff: retries, timeouts, state persistence, distributed coordination.
The important thing for this post: Temporal runs your workflow inside its own runtime environment. Your workflow code doesn’t just execute like a normal Python function. It runs inside Temporal, and Temporal manages when things happen.
That distinction — “runs inside Temporal” — is the root of everything that follows.
Temporal is a Timekeeper
Here’s the insight that changed everything for me: Temporal doesn’t skip your code. It only controls its own clock.
I had been thinking about Temporal as some kind of execution engine that speeds up my code. It’s not. Temporal is a timekeeper. Think of it like a kernel — a central dispatcher that keeps a timetable of scheduled events.
When your workflow calls workflow.sleep(3600) (sleep for one hour), Temporal doesn’t somehow make your Python code run faster. What it does is add an entry to its internal timetable:
"Wake up workflow ABC at current_time + 3600 seconds"
When your workflow starts an activity with a 30-second timeout, Temporal adds:
"If activity XYZ hasn't returned by current_time + 30 seconds, fail it"
When your workflow has an execution timeout of 10 minutes:
"If workflow ABC isn't done by current_time + 600 seconds, kill it"
Sleeps, timeouts, activity deadlines, heartbeat intervals — from Temporal’s perspective, they’re all the same thing. They’re entries in a timetable. Each one says: “fire event X at time T.” The only difference is what event gets fired — resume the workflow, fail an activity, kill the whole thing. But the mechanism is identical: a timestamp and an action.
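To make the "timestamp and an action" idea concrete, here is a minimal sketch of such a timetable in plain Python. It is a toy model of my mental picture, not Temporal's actual implementation; the Timetable and Entry names are made up for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Entry:
    fire_at: float                      # timestamp on the server's clock
    action: str = field(compare=False)  # what to do when that time arrives


class Timetable:
    """Toy model: every sleep, timeout, and deadline is (timestamp, action)."""

    def __init__(self) -> None:
        self.now = 0.0
        self._entries = []

    def schedule(self, delay: float, action: str) -> None:
        heapq.heappush(self._entries, Entry(self.now + delay, action))

    def fire_next(self) -> Entry:
        # Pop the earliest entry and move the clock to its timestamp.
        entry = heapq.heappop(self._entries)
        self.now = entry.fire_at
        return entry


timetable = Timetable()
timetable.schedule(3600, "resume workflow ABC")  # a one-hour sleep
timetable.schedule(30, "fail activity XYZ")      # an activity timeout
timetable.schedule(600, "kill workflow ABC")     # an execution timeout

fired = [timetable.fire_next() for _ in range(3)]
for entry in fired:
    print(f"t={entry.fire_at:>6.0f}s  {entry.action}")
```

Notice that the three entries are the same kind of object. Only the action string differs, which is exactly why a sleep and a timeout look identical to the clock.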
This is the foundation for understanding time-skipping.
Time-Skipping: Fast-Forwarding Idle Time
Temporal’s Python SDK gives you two test environments:
- WorkflowEnvironment.start_time_skipping(): downloads a lightweight test server binary and runs it as a separate local process (no Docker needed). This server has a virtual clock that can jump forward.
- WorkflowEnvironment.start_local(): runs a full local Temporal dev server with a real clock, just like production.
The time-skipping server does one clever thing: when nothing is happening, it fast-forwards its clock to the next scheduled event.
Imagine your workflow does this:
await workflow.sleep(3600) # sleep 1 hour
await do_some_activity()
await workflow.sleep(7200) # sleep 2 hours
In production, this takes 3 hours of wall-clock time (plus however long the activity takes). With time-skipping, here’s what happens:
- Workflow calls sleep(3600). Temporal adds to its timetable: “resume at now + 3600s.”
- Nothing else is pending. The server jumps its clock forward 3600 seconds instantly.
- Timer fires. Workflow resumes. Activity starts.
- Activity is running. The server does not skip, because it’s waiting for a real result.
- Activity completes. Workflow calls sleep(7200). Timetable entry added.
- Nothing pending. Server jumps forward 7200 seconds.
- Timer fires. Workflow finishes.
Total real time: however long the activity took (maybe milliseconds if it’s a mock). The 3 hours of sleeping? Gone. Skipped.
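The walkthrough above can be sketched as a tiny simulation. This is a toy model under my own assumptions (a single workflow, with steps represented as hypothetical (kind, payload) pairs), not how the test server is written: sleeps move only the virtual clock, while the activity costs real time.

```python
import time


def workflow_steps():
    """The three steps from the example, as (kind, payload) pairs."""
    yield ("sleep", 3600)
    yield ("activity", lambda: time.sleep(0.01))  # stand-in for real work
    yield ("sleep", 7200)


def run_with_time_skipping(steps):
    virtual_clock = 0.0
    real_start = time.monotonic()
    for kind, payload in steps:
        if kind == "sleep":
            # Nothing else is pending, so jump straight to the timer.
            virtual_clock += payload
        elif kind == "activity":
            # Real work: the clock advances only as fast as the activity runs.
            started = time.monotonic()
            payload()
            virtual_clock += time.monotonic() - started
    return virtual_clock, time.monotonic() - real_start


virtual_elapsed, real_elapsed = run_with_time_skipping(workflow_steps())
print(f"server clock advanced {virtual_elapsed:.2f}s in {real_elapsed:.2f}s of real time")
```

The server clock ends up a little over three hours ahead, while the real elapsed time is roughly the duration of the fake activity.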
This leads to a really important corollary that solidified my understanding:
If your code never sleeps, never waits, and never sets timeouts, time-skipping gives you zero speedup.
Think about it. Time-skipping only fast-forwards idle time on Temporal’s clock. If there’s no idle time — no timers, no sleeps, no timeouts — there’s nothing to skip. Your workflow would run at the exact same speed with time-skipping as without it.
Of course, that’s a theoretical extreme. Real workflows almost always have timeouts, retry intervals, and sleeps. But the principle is clarifying: time-skipping is not about making your code faster. It’s about eliminating wait time between events on Temporal’s timetable.
The Handle: Your Remote Control
When you start a workflow, Temporal gives you back a handle. Think of it like a restaurant ticket — you placed your order (started the workflow), and now you have a ticket to interact with it while it’s being prepared.
handle = await client.start_workflow(
    MyWorkflow.run,
    inputs,
    id="order-123",
    task_queue="kitchen",
)
The workflow is now running independently inside Temporal’s runtime. The handle is your remote control. It has a few buttons:
- handle.result(): “Call me when my food is ready.” You sit on the phone, waiting, until the workflow finishes and gives you the result.
- handle.query(): “Hey, how’s my order coming?” A quick question. You get an answer immediately and hang up.
- handle.signal(): “Actually, add extra cheese.” You send a message to the running workflow. It gets delivered immediately.
- handle.cancel(): “Cancel my order.”
The handle is just temporalio.client.WorkflowHandle — nothing magical. But understanding what each button does to the time-skipping server is where it gets interesting.
result() vs query() — The Trap
This is where my intuition was completely backwards, and getting it right is what finally unblocked me.
handle.result() is a phone call where you say: “Don’t hang up until my food is ready.” You’re blocking. You’re waiting. And critically, you’re telling the time-skipping server: “I’m waiting for this workflow to finish.” The server hears that and thinks: “They want the result. Let me help by fast-forwarding to when it’s done.”
handle.query() is a phone call where you say: “Is my food ready? No? Okay, bye.” One shot. You get the current state, and then you’re done. The server has no reason to fast-forward anything — you didn’t ask it to wait for anything.
Here’s the part that tripped me up: result() is event-based (wait for the “done” event), and query() is polling (check the current state and return immediately). You’d think the event-based approach is the “better” one — and in normal programming, it usually is. But in time-skipping mode, the polling approach is safer, because it doesn’t give Temporal permission to mess with its clock.
| Operation | What it says to Temporal | Triggers time-skipping? |
|---|---|---|
| handle.result() | “Wake me up when it’s done” | Yes: the server tries to make it be done |
| handle.query() | “What’s the status right now?” | No: the server just answers |
| handle.signal() | “Deliver this message” | No: immediate delivery |

That last column is the whole story. result() gives the server license to fast-forward. query() doesn’t.
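Here is the rule from the table as a toy model, using a workflow that does nothing but sleep for an hour. ToySkippingServer is a hypothetical class I made up for this sketch; it only mimics which calls are allowed to move the clock.

```python
class ToySkippingServer:
    """Toy model of the table above; not Temporal's real implementation."""

    def __init__(self, sleep_seconds: float) -> None:
        self.clock = 0.0
        self.wake_at = sleep_seconds  # the workflow's single pending timer
        self.status = "SLEEPING"
        self.inbox = []

    def query(self) -> str:
        # Passive read: report the current state, leave the clock alone.
        return self.status

    def signal(self, message: str) -> None:
        # Delivered immediately; also leaves the clock alone.
        self.inbox.append(message)

    def result(self) -> str:
        # The caller is blocked on completion, so skipping is allowed:
        # jump to the pending timer, fire it, and finish the workflow.
        self.clock = self.wake_at
        self.status = "COMPLETED"
        return "done"


server = ToySkippingServer(sleep_seconds=3600)

status_before = server.query()
server.signal("extra cheese")
clock_after_query_and_signal = server.clock

value = server.result()
clock_after_result = server.clock
print(status_before, clock_after_query_and_signal, value, clock_after_result)
```

In this benign case the jump is exactly what you want: the one-hour sleep disappears. The trouble starts when the next thing on the timetable is a timeout rather than a sleep.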
The Race Condition: Death by Time Travel
Now you have all the pieces. Let me show you how they combine to create a very confusing bug.
The Setup
Our workflow has steps that need user feedback. After a step runs (say, analyzing a document), the workflow pauses and waits for the user to review the output and click “Continue.” In production, this might take minutes or hours — the user is reading, thinking, editing.
In the code, this waiting looks like:
# backend/src/genai/temporal/workflow_executor.py, line 149
await workflow.wait_condition(lambda: waiter.signal_received)
This tells Temporal: “Pause here until signal_received becomes True.” There’s no timeout — it waits as long as it takes. In production, that’s fine. The user eventually clicks Continue, a signal is sent, signal_received flips to True, and the workflow resumes.
The Exam Analogy
Imagine you’re a teacher proctoring an exam with a 2-hour time limit. You have a clock on the wall.
Production (real clock): A student raises their hand. “I need my calculator from my locker.” You wait. Someone brings it. The student finishes the exam. The clock says 45 minutes passed. No problem.
Time-skipping (magic clock): A student raises their hand. “I need my calculator from my locker.” You look around the room. Nobody is actively writing. Nothing is happening. So you spin the magic clock forward — 30 minutes, 1 hour, 1.5 hours, 2 hours. “Time’s up! Exam over!” The student fails.
The person bringing the calculator walks in 0.1 real seconds later. But the clock already says 2 hours. Too late.
What Actually Happened
Here’s the exact sequence in our tests:
Real time 0.00s — Test starts the workflow. Mock activities return instantly (they’re fakes — no real I/O).
Real time ~0.02s — All activities are done. Workflow enters wait_condition() — waiting for a feedback signal. At this moment, Temporal’s timetable has nothing pending: no activities running, no timers set. Just a workflow sitting at a wait_condition.
Real time ~0.02s — Test calls handle.result(): “Tell me when the workflow finishes.”
The time-skipping server hears this and thinks: “The client wants the result. Let me check what’s pending… No activities. No timers. Nothing to wait for except a wait_condition I can’t satisfy. But there IS a workflow execution timeout at now + 600 seconds. Let me jump there.”
Server clock jumps: 0s → 600s.
Workflow execution timeout fires. Workflow dies with TimeoutError.
Real time ~0.03s — handle.result() returns with an error.
Meanwhile, the test had a polling loop that was supposed to query for pending feedback steps and send signals. That loop was scheduled to run after asyncio.sleep(0.1) — at real time 0.1 seconds. But the workflow is already dead. The signals arrive at a corpse.
The whole thing happened in ~30 milliseconds of real time. The server just… jumped to the end.
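The sequence above is easier to see in a few lines of code. This is a deliberately simplified sketch: ToyRun is a made-up class, the 600-second timeout comes from the walkthrough, and a real server reports a late signal as an error rather than returning False.

```python
EXECUTION_TIMEOUT = 600.0  # seconds on the server clock, as in the walkthrough


class ToyRun:
    """Toy model of one workflow run that is parked waiting for a signal."""

    def __init__(self) -> None:
        self.clock = 0.0
        self.status = "WAITING_FOR_SIGNAL"
        self.log = []

    def result(self) -> str:
        if self.status == "WAITING_FOR_SIGNAL":
            # Nothing can satisfy the wait, so the only entry left on the
            # timetable is the execution timeout. Jump straight to it.
            self.log.append(f"clock jumps {self.clock:.0f}s -> {EXECUTION_TIMEOUT:.0f}s")
            self.clock = EXECUTION_TIMEOUT
            self.status = "TIMED_OUT"
        return self.status

    def signal(self) -> bool:
        # A signal only helps while the workflow is still alive.
        if self.status != "WAITING_FOR_SIGNAL":
            self.log.append("signal arrived after the workflow closed")
            return False
        self.status = "COMPLETED"
        return True


run = ToyRun()
outcome = run.result()    # the test awaits the result first...
delivered = run.signal()  # ...and its feedback signal shows up too late
print(outcome, delivered, run.log)
```

Swap the last two calls and the same run completes normally, which is the whole bug in one reordering.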
The Fix
The fix is almost anticlimactic once you understand the problem. Don’t call handle.result() while the workflow is waiting for signals. Instead, use handle.query() to poll, send signals when needed, and only call result() after confirming the workflow is already done:
# backend/tests/integration/genai/test_temporal_workflow.py, lines 49-77
async def _run_workflow_with_feedback(handle):
    while True:
        await asyncio.sleep(0.1)  # real-time sleep in the test process
        # query() doesn't trigger time-skipping, so the clock stays frozen
        progress = await handle.query(JurorAnalysisWorkflow.get_progress)
        # Send signals for any steps waiting for feedback
        for step_id in progress.pending_feedback_steps:
            await handle.signal(
                JurorAnalysisWorkflow.submit_step_feedback,
                StepFeedbackSignal(release_step_id=step_id),
            )
        # Only call result() AFTER the workflow reports it's done
        if progress.status == WorkflowStatus.COMPLETED:
            return await handle.result()  # returns instantly, nothing to skip
Why this works:
- query() doesn’t touch the clock. The workflow stays frozen at its wait_condition. No fast-forwarding.
- Our polling loop runs in real time. asyncio.sleep(0.1) is a real sleep in the test process, and Temporal doesn’t control it.
- Signals are delivered immediately regardless of what Temporal’s clock says.
- By the time we call result(), the workflow is already complete. There’s nothing to fast-forward to. It returns instantly.
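If you want to convince yourself of the pattern without a Temporal server, here is a self-contained sketch using only asyncio. FakeHandle is a hypothetical test double I wrote for this post, not part of temporalio; it punishes an early result() call the same way the time-skipping server did in my tests.

```python
import asyncio


class FakeHandle:
    """Stand-in for a workflow handle; a hypothetical double, not temporalio."""

    def __init__(self, steps_needing_feedback):
        self.pending = list(steps_needing_feedback)
        self.clock = 0.0         # the fake server's virtual clock
        self.timeout_at = 600.0  # execution timeout on that clock

    @property
    def done(self):
        return not self.pending

    async def query(self):
        # Passive read: never moves the clock.
        return {"pending": list(self.pending), "done": self.done}

    async def signal(self, step_id):
        # Delivered immediately, whatever the clock says.
        self.pending.remove(step_id)

    async def result(self):
        if not self.done:
            # Awaiting an unfinished workflow lets the clock jump to the timeout.
            self.clock = self.timeout_at
            raise TimeoutError("workflow execution timed out")
        return "analysis complete"


async def run_with_feedback(handle):
    while True:
        await asyncio.sleep(0.01)  # real sleep in the test process
        progress = await handle.query()
        for step_id in progress["pending"]:
            await handle.signal(step_id)
        if progress["done"]:
            # Safe now: the workflow already finished, nothing to skip.
            return await handle.result()


handle = FakeHandle(["step-1", "step-2"])
outcome = asyncio.run(run_with_feedback(handle))
print(outcome, handle.clock)
```

The loop needs two passes here: the first query sees the pending steps and signals them, and the second query sees the finished workflow. The fake clock never leaves zero because result() is only awaited once there is nothing left to wait for.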
The workflow’s clock never jumps because we never gave the server permission to jump it.
Takeaways
A few rules of thumb I’m keeping in my back pocket:
The mental model:
- Temporal is a timekeeper, not an execution engine. It manages a timetable of timers, timeouts, and activity deadlines.
- Time-skipping fast-forwards idle time on Temporal’s clock. It doesn’t speed up your code.
- If there’s nothing on the timetable to skip to, there’s nothing to skip.
The practical rules:
- Never await handle.result() on a workflow that’s waiting for external signals. It gives Temporal permission to fast-forward, and the workflow will die before your signals arrive.
- Use handle.query() to poll workflow state. Queries are passive reads, so they don’t affect the clock.
- Signals work immediately in any time mode. They don’t care what the server clock says.
- Only call handle.result() after you’ve confirmed the workflow is done via query().
When to use which test environment:
- start_time_skipping(): great for workflows that are self-contained (just timers and activities, no external interaction). Also works for signal-based workflows if you use the query-based polling pattern.
- start_local(): safer for workflows that require signals/queries during execution, since it runs in real time. But slower, because timers actually wait.
We kept start_time_skipping() because the query-based pattern works correctly, and we get free speedup on any timer-based operations (like activity retry intervals and jitter sleeps).
Disclaimer: Written by Human, improved using AI where applicable.
