How do I test an automation before trusting it in production?

Run it in shadow mode against real data without taking action, compare its output to the manual result, and only then let it act.

What is shadow mode for automation?

The workflow runs on live inputs and records what it would have done, but takes no real action, so you can check its accuracy without affecting customers or data.

What failure modes should I test for?

Bad or missing inputs, duplicate triggers, downstream outages, and edge cases the happy path ignores, since those are where automations break.

How long should I test before going live?

Long enough to see the workflow handle the real range of cases, including the messy ones, not just a few clean test runs.

How to Test an Automation Before You Trust It

Most automations fail the second week, not the first. The first run works because you watched it happen with clean test data. Then something shifts upstream, an edge case shows up, or the API returns a shape you didn't expect, and now you have eleven duplicate invoices and a customer asking why they got the same email three times.

The gap between "it ran once" and "I trust it to run unattended" is real, and you bridge it with deliberate testing — not hope. Here's the structure we use on every build.

Phase 1: Dry Run (No Side Effects)

A dry run means the automation executes its full logic but writes nothing, sends nothing, and charges nothing. Every external action — emails, API writes, database updates, Slack messages — is either stubbed out or routed to a sandbox.
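If any part of the workflow is your own code, one way to get this behavior is to send every side effect through a single function that checks a dry-run flag and records what would have happened. A minimal Python sketch; `SideEffects`, `send_email`, `run_workflow`, and the paid-plan rule are hypothetical names for illustration, not any platform's API:

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SideEffects:
    """Routes every external action through one place so a dry run can intercept it."""
    dry_run: bool = True
    recorded: list[dict[str, Any]] = field(default_factory=list)

    def perform(self, action: str, payload: dict[str, Any],
                real_call: Callable[[dict[str, Any]], Any]) -> Any:
        # Record every intended action, in dry runs and live runs alike.
        self.recorded.append({"action": action, "payload": payload})
        if self.dry_run:
            return None  # stubbed: nothing is sent, written, or charged
        return real_call(payload)


def send_email(payload: dict[str, Any]) -> None:
    # Hypothetical real integration; a dry run never reaches this function.
    raise RuntimeError("this would send a real email")


def run_workflow(record: dict[str, Any], effects: SideEffects) -> None:
    # Hypothetical logic: customers on a paid plan hit the "skip" branch.
    if record.get("plan") == "paid":
        return
    effects.perform("send_email",
                    {"to": record["email"], "template": "trial_reminder"},
                    send_email)


effects = SideEffects(dry_run=True)
run_workflow({"email": "a@example.com", "plan": "trial"}, effects)
run_workflow({"email": "b@example.com", "plan": "paid"}, effects)
print(effects.recorded)
# [{'action': 'send_email', 'payload': {'to': 'a@example.com', 'template': 'trial_reminder'}}]
```

Because the same `perform` call is used in live runs, flipping `dry_run` to `False` changes only whether the real call happens, not the logic you just verified.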

The goal is to verify the logic, not the integration. You want to see:

Does the workflow trigger when it should?
Does it pull the right data?
Does it route to the right branches based on conditions?
Does it produce the output payload you expect?

Most automation platforms (n8n, Make, Zapier, Pipedream) let you inspect the data at each step, so you can check the payload before the final action fires. Check how your platform's test runs treat that final action, because some test modes perform it for real. Run the workflow against ten or fifteen real records pulled from production. Not made-up test data: actual records, copied into a sandbox. Synthetic data hides the messiness that breaks things in week two.

Write down what you expect each record to do before you run it. If a record should hit the "skip" branch because the customer is on a paid plan, predict that, then verify. If your predictions match reality across the sample, you've earned the right to move on. If they don't, find out why before you touch anything live.
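Writing the predictions down first turns the dry run into a check you can rerun after every change. A sketch, assuming the workflow's branching can be mirrored as a plain `route()` function; the branch names, record fields, and sample records are hypothetical:

```python
from typing import Any


def route(record: dict[str, Any]) -> str:
    # Hypothetical branching logic mirroring the workflow's conditions.
    if record.get("plan") == "paid":
        return "skip"
    if not record.get("email"):
        return "error"
    return "send_reminder"


def check_predictions(records: list[dict[str, Any]],
                      predicted: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (record_id, predicted, actual) for every record that surprised you."""
    mismatches = []
    for record in records:
        actual = route(record)
        expected = predicted[record["id"]]
        if actual != expected:
            mismatches.append((record["id"], expected, actual))
    return mismatches


sample = [
    {"id": "r1", "plan": "paid", "email": "a@example.com"},
    {"id": "r2", "plan": "trial", "email": "b@example.com"},
    {"id": "r3", "plan": "trial", "email": ""},
]
# Written down before running anything.
predictions = {"r1": "skip", "r2": "send_reminder", "r3": "send_reminder"}

for rid, expected, actual in check_predictions(sample, predictions):
    print(f"{rid}: expected {expected}, got {actual}")
# r3: expected send_reminder, got error
```

An empty mismatch list is the signal to move on; anything else is a question to answer before going live.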

Phase 2: Shadow Mode (Runs Alongside the Human)

Shadow mode is where most teams skip a step and pay for it later. The automation runs in production, on real data, with real side effects turned off — but the human is still doing the work manually in parallel. You compare outputs.

For a week or two, the automation generates what it would have sent, written, or decided, and logs it somewhere you can review. Maybe that's a Google Sheet, a Notion database, or a Slack channel that gets a message every time the workflow fires. The human keeps doing their job. At the end of each day, you spot-check: did the automation reach the same decision the human did? When it didn't, who was right?
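The daily spot-check is faster if each log entry holds both decisions side by side. A minimal sketch, assuming the shadow log is exported as CSV with hypothetical `event_id`, `automation`, and `human` columns (a Google Sheet export would have the same shape):

```python
import csv
import io

# Hypothetical shadow log: one row per time the workflow fired.
SHADOW_LOG = """event_id,automation,human
e1,approve,approve
e2,approve,approve
e3,reject,approve
e4,escalate,escalate
"""


def compare_shadow_log(text: str) -> tuple[float, list[dict[str, str]]]:
    """Return the agreement rate and the rows where the two decisions differ."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if not rows:
        return 0.0, []
    disagreements = [r for r in rows if r["automation"] != r["human"]]
    return 1 - len(disagreements) / len(rows), disagreements


rate, diffs = compare_shadow_log(SHADOW_LOG)
print(f"agreement: {rate:.0%}")  # agreement: 75%
for row in diffs:
    print(f"review {row['event_id']}: automation={row['automation']}, human={row['human']}")
```

The agreement rate tells you the trend; the disagreement rows are what you actually read each evening to decide who was right.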

This phase catches three things dry runs miss:

Volume issues. Maybe the workflow handles one record fine but chokes on 400 in an hour.
Real-world data variance. Customers do strange things. Forms get submitted with emoji in the name field. Phone numbers come in seven different formats.
Timing and ordering. When two events fire within a second of each other, what happens?
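For the timing case, a common defense is an idempotency key: derive a key from the event and refuse to act twice on the same key. A sketch, assuming events carry hypothetical `type` and `invoice_id` fields. The in-memory set is only for illustration; in production the seen-key store would need to be durable and shared, such as a database unique constraint:

```python
import threading
from typing import Any


class IdempotentHandler:
    """Runs an action at most once per idempotency key, even under concurrent triggers."""

    def __init__(self) -> None:
        self._seen: set[str] = set()
        self._lock = threading.Lock()
        self.actions_taken: list[str] = []

    @staticmethod
    def key_for(event: dict[str, Any]) -> str:
        # Hypothetical key: same event type + same invoice = same piece of work.
        return f"{event['type']}:{event['invoice_id']}"

    def handle(self, event: dict[str, Any]) -> bool:
        key = self.key_for(event)
        with self._lock:  # check-and-mark must happen atomically
            if key in self._seen:
                return False  # duplicate trigger: do nothing
            self._seen.add(key)
        self.actions_taken.append(key)  # stand-in for creating the invoice
        return True


handler = IdempotentHandler()
event = {"type": "invoice.created", "invoice_id": "INV-1"}
print(handler.handle(event))  # True
print(handler.handle(event))  # False: the second trigger is ignored
```

Shadow mode is a good place to find out whether your real events contain a stable field to build the key from.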

Shadow mode is also where you build trust with the person whose work is being automated. If they can see the automation matching their judgment for ten straight days, the handoff is a conversation, not a fight.

Phase 3: Supervised Live (Real Actions, Human Reviewing)

Now the automation writes, sends, and charges for real, but a human still checks its work: either approving each output before it goes out, or reviewing the day's completed actions in a batch. This is the last checkpoint before unattended operation.

The practical setup looks like one of these:

Approval queues. The workflow generates the draft email, but it sits in a queue until someone clicks approve. After two weeks of approving 95% without changes, you remove the gate.
Daily review. The workflow runs autonomously, but every morning someone scans yesterday's actions and flags anything weird.
Sampling. For high-volume workflows, review a random 10% of outputs each day. If the error rate stays under your threshold for a defined period, you reduce sampling.
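The sampling option is simple to script. A sketch, assuming each output has an ID and reviewers mark each sampled output correct or not; the 10% rate comes from the text above, while the function names and threshold values are hypothetical:

```python
import random
from typing import Optional


def sample_for_review(output_ids: list[str], rate: float = 0.10,
                      seed: Optional[int] = None) -> list[str]:
    """Pick a random subset of the day's outputs (at least one) for human review."""
    if not output_ids:
        return []
    rng = random.Random(seed)  # a fixed seed lets you reproduce a day's sample later
    k = max(1, round(len(output_ids) * rate))
    return rng.sample(output_ids, k)


def within_threshold(review_results: dict[str, bool], max_error_rate: float) -> bool:
    """review_results maps output ID -> True if the reviewer found that output correct."""
    if not review_results:
        return False  # nothing reviewed is not the same as nothing wrong
    errors = sum(1 for ok in review_results.values() if not ok)
    return errors / len(review_results) <= max_error_rate


todays_outputs = [f"out-{i}" for i in range(400)]
to_review = sample_for_review(todays_outputs, seed=20240601)
print(len(to_review))  # 40
```

Treating an empty review as a failure keeps a skipped review day from quietly counting toward the "under threshold" period.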

Decide upfront what "good enough to stop supervising" looks like. A specific number. "Two consecutive weeks with zero corrections required" is a real bar. "It feels fine now" is not.
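Writing the bar as code makes it harder to reinterpret later. A sketch of the "two consecutive weeks with zero corrections required" rule, assuming you keep a count of corrections per reviewed day; the data shape and function name are hypothetical:

```python
from datetime import date, timedelta


def ready_to_unsupervise(corrections_by_day: dict[date, int],
                         last_reviewed_day: date,
                         required_days: int = 14) -> bool:
    """True only if each of the last `required_days` days had zero corrections."""
    for offset in range(required_days):
        day = last_reviewed_day - timedelta(days=offset)
        # A missing day means nobody reviewed it, and that does not count as clean.
        if corrections_by_day.get(day) != 0:
            return False
    return True
```

This version counts calendar days; a workflow that only runs on weekdays would need to count business days instead.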

Probe the Failure Modes Deliberately

During shadow and supervised phases, don't just wait for problems to find you. Provoke them. Three categories of failure you should actively test:

Malformed input. Submit a form with an empty required field. Paste an email address with a typo. Upload a CSV with a missing column. Send a record where the date is in the wrong format. The question isn't whether your automation handles perfect data — it's what it does when it gets garbage. Does it fail loudly? Does it silently skip? Does it crash and leave the workflow in a half-done state? You want loud, logged failures with a clear error you can read on Monday morning.
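A loud, logged failure usually means validating input at the front of the workflow and raising an error that names the record and the problem. Because validation runs before any side effect, it also avoids the half-done state. A sketch using only the standard library; the field names, date format, and logger name are hypothetical:

```python
import logging
from datetime import datetime
from typing import Any

logger = logging.getLogger("invoice_workflow")  # hypothetical workflow name

REQUIRED_FIELDS = ("id", "email", "due_date")  # hypothetical schema


class InvalidRecord(ValueError):
    """Raised so a bad record stops visibly instead of being skipped or half-processed."""


def validate(record: dict[str, Any]) -> dict[str, Any]:
    problems = [f"missing {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    email = str(record.get("email") or "")
    if email and "@" not in email:
        problems.append(f"malformed email {email!r}")
    due = record.get("due_date")
    if due:
        try:
            datetime.strptime(str(due), "%Y-%m-%d")
        except ValueError:
            problems.append(f"due_date {due!r} is not YYYY-MM-DD")
    if problems:
        message = f"record {record.get('id') or '<no id>'}: " + "; ".join(problems)
        logger.error(message)  # route this logger to whatever your alerting watches
        raise InvalidRecord(message)
    return record
```

The error message collects every problem with the record at once, so the Monday-morning reader fixes it in one pass instead of rerunning to find the next issue.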

Upstream change. Have someone rename a field in your CRM. Change a tag. Add a new option to a dropdown. Real businesses change things constantly, and a workflow that breaks the moment someone renames "Status" to "Lead Status" is not production-ready. The fix is either defensive design (reference stable field IDs or API names rather than display labels wherever the platform allows it) or alerting (you find out within an hour, not when a customer complains).
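Both fixes can work together: read fields by a stable ID, and check on every run that those IDs still exist, so a removed field raises an error instead of producing empty values. A sketch; the field IDs are hypothetical, and the set of available IDs is assumed to come from your CRM's field listing (many CRMs expose an internal name or ID separate from the display label, but check yours):

```python
from typing import Any

# Hypothetical mapping from what the workflow needs to the CRM's stable field IDs.
# A display label like "Status" can be renamed to "Lead Status"; the ID stays put.
FIELD_IDS = {"status": "fld_8f2a", "owner": "fld_19c0"}


class SchemaDrift(RuntimeError):
    """Raised when a field the workflow depends on no longer exists upstream."""


def check_schema(available_field_ids: set[str]) -> None:
    missing = {name: fid for name, fid in FIELD_IDS.items()
               if fid not in available_field_ids}
    if missing:
        raise SchemaDrift(f"CRM fields missing: {missing}")


def read_fields(crm_record: dict[str, Any],
                available_field_ids: set[str]) -> dict[str, Any]:
    """Check the schema first, then read values by ID rather than by label."""
    check_schema(available_field_ids)
    return {name: crm_record.get(fid) for name, fid in FIELD_IDS.items()}
```

Checking against the schema listing rather than the record itself matters because some APIs omit empty fields from a record, which would look like a missing field.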

API down. Block the workflow's network access to one of its integrations, or watch closely the next time a real outage happens. Does the workflow retry? Does it queue the work and resume? Does it drop the event silently? Does it spam your error channel with 4,000 identical messages? Each of these tells you something different about what you need to fix.
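The behavior you usually want is to retry a few times with growing delays, then park the event where it can be replayed, and send one alert per outage rather than one per failure. A sketch under those assumptions; `TransientError`, the dead-letter list, and `alert_once` are hypothetical stand-ins, and the delays are shortened so the example runs quickly:

```python
import time
from typing import Any, Callable


class TransientError(Exception):
    """Stand-in for a timeout or 5xx response from an integration."""


dead_letter: list[dict[str, Any]] = []  # in production: a durable queue or table
alerts_sent: set[str] = set()


def alert_once(integration: str, message: str) -> None:
    # Deduplicate so an outage produces one alert, not thousands.
    if integration not in alerts_sent:
        alerts_sent.add(integration)
        print(f"ALERT [{integration}]: {message}")


def call_with_retry(integration: str, event: dict[str, Any],
                    call: Callable[[dict[str, Any]], Any],
                    attempts: int = 4, base_delay: float = 0.01) -> Any:
    for attempt in range(attempts):
        try:
            return call(event)
        except TransientError:
            if attempt < attempts - 1:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff: 1x, 2x, 4x
    # Out of retries: park the event for replay instead of dropping it.
    dead_letter.append({"integration": integration, "event": event})
    alert_once(integration, "event parked after retries; check the dead-letter queue")
    return None
```

In a real setup you would clear the alert flag once the integration recovers, and many teams add random jitter to the delays so retries from many events don't arrive in lockstep.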

Write these tests down. Run them on day one and run them again every quarter — because the answers change as you add new steps.

What "Production Ready" Actually Means

A workflow is production ready when four things are true:

It's run on real data, at real volume, with outputs validated against a human baseline.
You know how it fails — not theoretically, because you've made it fail and watched what happens.
There's logging or alerting that tells you within a defined window when something's wrong.
There's a documented rollback: how to turn it off, how to clean up partial work, who owns the decision.

If you can't check all four, you don't have an automation. You have a script that worked once.
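The alerting and rollback items are the easiest to leave vague, so here is one way to make them concrete. A sketch, assuming the workflow records the time of its last successful run and reads a kill-switch flag at startup; the storage, flag naming, and two-hour window are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ALERT_WINDOW = timedelta(hours=2)  # hypothetical "defined window" for this workflow


def is_stale(last_success: Optional[datetime], now: datetime,
             window: timedelta = ALERT_WINDOW) -> bool:
    """True if the workflow has not succeeded within the window, or never has."""
    return last_success is None or now - last_success > window


def should_run(flags: dict[str, bool], workflow: str) -> bool:
    """Kill switch: turn the workflow off by setting a flag, without redeploying."""
    return not flags.get(f"{workflow}.disabled", False)


now = datetime.now(timezone.utc)
print(is_stale(now - timedelta(minutes=30), now))  # False
print(should_run({"invoices.disabled": True}, "invoices"))  # False
```

The staleness check belongs in a separate scheduled job rather than inside the workflow, because a workflow that has stopped entirely cannot report its own silence.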

The whole sequence — dry run, shadow, supervised — usually takes two to four weeks for a meaningful workflow. That sounds slow until you compare it to the time you'll spend cleaning up a workflow that went live on vibes and broke quietly for a month.

If you're standing up automations and want a second set of eyes on the test plan before you go live, here's how we work.
