How to Test an Automation Before You Trust It
Most automations fail the second week, not the first. The first run works because you watched it happen with clean test data. Then something shifts upstream, an edge case shows up, or the API returns a shape you didn't expect, and now you have eleven duplicate invoices and a customer asking why they got the same email three times.
The gap between "it ran once" and "I trust it to run unattended" is real, and you bridge it with deliberate testing — not hope. Here's the structure we use on every build.
Phase 1: Dry Run (No Side Effects)
A dry run means the automation executes its full logic but writes nothing, sends nothing, and charges nothing. Every external action — emails, API writes, database updates, Slack messages — is either stubbed out or routed to a sandbox.
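One way to get this behavior is to send every side effect through a single switch. The sketch below is a minimal illustration, not a prescribed design: `ActionRunner`, `perform`, and `send_email` are hypothetical names, and the real email call is replaced with a placeholder that raises if it is ever reached.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ActionRunner:
    """Runs side-effecting actions, or only records them when dry_run is True."""

    dry_run: bool = True
    recorded: list = field(default_factory=list)

    def perform(self, name: str, payload: dict, action: Callable[[dict], None]) -> None:
        if self.dry_run:
            # Record what would have happened instead of doing it.
            self.recorded.append((name, payload))
            return
        action(payload)


def send_email(payload: dict) -> None:
    # Hypothetical stand-in for a real email call; a dry run must never reach it.
    raise RuntimeError("real email send attempted")


runner = ActionRunner(dry_run=True)
runner.perform(
    "send_email",
    {"to": "customer@example.com", "template": "welcome"},
    send_email,
)
```

After the run, `runner.recorded` holds the payloads the workflow would have sent, which is what you review against the checklist that follows.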
The goal is to verify the logic, not the integration. You want to see:
- Does the workflow trigger when it should?
- Does it pull the right data?
- Does it route to the right branches based on conditions?
- Does it produce the output payload you expect?
Write down what you expect each record to do before you run it. If a record should hit the "skip" branch because the customer is on a paid plan, predict that, then verify. If your predictions match reality across the sample, you've earned the right to move on. If they don't, find out why before you touch anything live.
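The predict-then-verify habit can be made mechanical. The following is a rough sketch under assumed names: `route` stands in for whatever branching logic your workflow has, and the sample records and branch labels are invented for illustration.

```python
def route(record: dict) -> str:
    """Hypothetical routing rule: paid customers are skipped, everyone else gets an email."""
    return "skip" if record.get("plan") == "paid" else "send_upgrade_email"


def check_predictions(records, predictions):
    """Return (record_id, predicted, actual) for every record that surprised us."""
    mismatches = []
    for record in records:
        actual = route(record)
        predicted = predictions[record["id"]]
        if actual != predicted:
            mismatches.append((record["id"], predicted, actual))
    return mismatches


# Illustrative sample; predictions are written down BEFORE the dry run.
sample = [
    {"id": "c1", "plan": "paid"},
    {"id": "c2", "plan": "free"},
    {"id": "c3", "plan": None},  # edge case: plan missing
]
predictions = {"c1": "skip", "c2": "send_upgrade_email", "c3": "skip"}

mismatches = check_predictions(sample, predictions)
```

Here the record with a missing plan is the one that disagrees with the prediction. That is the kind of surprise worth resolving before anything goes live: either the prediction was wrong or the rule is.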
Phase 2: Shadow Mode (Runs Alongside the Human)
Shadow mode is the step most teams skip, and they pay for it later. The automation runs in production, on real data, with real side effects turned off, while the human keeps doing the work manually in parallel. You compare the two sets of outputs.
For a week or two, the automation generates what it would have sent, written, or decided, and logs it somewhere you can review. Maybe that's a Google Sheet, a Notion database, or a Slack channel that gets a message every time the workflow fires. The human keeps doing their job. At the end of each day, you spot-check: did the automation reach the same decision the human did? When it didn't, who was right?
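The daily spot-check can be partly scripted if both sides leave a record. This sketch assumes each log is a mapping from record ID to the decision made; the function name, log shape, and decision labels are illustrative, not taken from any particular tool.

```python
def compare_decisions(automation_log: dict, human_log: dict):
    """Compare decisions on records that appear in both logs.

    Returns (agreement_rate, disagreements). The rate is None when the
    logs share no records, so an empty day is not mistaken for a perfect one.
    """
    shared = sorted(set(automation_log) & set(human_log))
    disagreements = [
        (record_id, automation_log[record_id], human_log[record_id])
        for record_id in shared
        if automation_log[record_id] != human_log[record_id]
    ]
    agreement = 1 - len(disagreements) / len(shared) if shared else None
    return agreement, disagreements


# Illustrative data for one day of shadow mode.
automation = {"t1": "refund", "t2": "escalate", "t3": "close", "t4": "close"}
human = {"t1": "refund", "t2": "close", "t3": "close", "t4": "close"}

agreement, disagreements = compare_decisions(automation, human)
```

The disagreement list is the useful part. Each entry is a case to look at and decide who was right, which the script cannot do for you.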
This phase catches three things dry runs miss:
Shadow mode is also where you build trust with the person whose work is being automated. If they can see the automation matching their judgment for ten straight days, the handoff is a conversation, not a fight.
Phase 3: Supervised Live (Real Actions, Human Reviewing)
Now the automation writes, sends, and charges for real — but a human reviews every output before it goes out, or reviews a batch at the end of each day. This is the last checkpoint before unattended operation.
The practical setup looks like one of these:
- Approval queues. The workflow generates the draft email, but it sits in a queue until someone clicks approve. After two weeks of approving 95% without changes, you remove the gate.
- Daily review. The workflow runs autonomously, but every morning someone scans yesterday's actions and flags anything weird.
- Sampling. For high-volume workflows, review a random 10% of outputs each day. If the error rate stays under your threshold for a defined period, you reduce sampling.
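The sampling option is the easiest of the three to sketch in code. Everything below is an assumption for illustration: the function names, the 2% threshold, and the three-day window in the usage lines are placeholders for whatever numbers you settle on.

```python
import random


def pick_sample(output_ids, rate=0.10, seed=None):
    """Pick a random subset of outputs for human review (at least one if any exist)."""
    ids = list(output_ids)
    if not ids:
        return []
    count = max(1, round(len(ids) * rate))
    return random.Random(seed).sample(ids, count)


def can_reduce_sampling(daily_error_rates, threshold, required_days):
    """True only if each of the last `required_days` days stayed under the threshold.

    Assumes required_days is at least 1.
    """
    recent = daily_error_rates[-required_days:]
    return len(recent) == required_days and all(rate < threshold for rate in recent)


to_review = pick_sample(range(200), rate=0.10, seed=1)

# Hypothetical error rates found in review over the last four days.
history = [0.03, 0.01, 0.0, 0.01]
ready = can_reduce_sampling(history, threshold=0.02, required_days=3)
```

Requiring a full window of clean days, rather than a good average, means one bad day resets the clock. That is a deliberate and conservative choice in this sketch.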
Probe the Failure Modes Deliberately
During shadow and supervised phases, don't just wait for problems to find you. Provoke them. Three categories of failure you should actively test:
Malformed input. Submit a form with an empty required field. Paste an email address with a typo. Upload a CSV with a missing column. Send a record where the date is in the wrong format. The question isn't whether your automation handles perfect data — it's what it does when it gets garbage. Does it fail loudly? Does it silently skip? Does it crash and leave the workflow in a half-done state? You want loud, logged failures with a clear error you can read on Monday morning.
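A loud, logged failure might look something like the sketch below. The field names (`email`, `signup_date`), the `InvalidRecord` exception, and the deliberately crude email check are all assumptions for illustration. A real workflow would validate whatever its own inputs require.

```python
import logging
from datetime import datetime

logger = logging.getLogger("automation.validation")

REQUIRED_FIELDS = ("email", "signup_date")  # hypothetical field names


class InvalidRecord(ValueError):
    """Raised before any side effect happens, so nothing is left half-done."""


def validate(record: dict) -> dict:
    problems = []
    for name in REQUIRED_FIELDS:
        if not str(record.get(name) or "").strip():
            problems.append(f"missing required field: {name}")

    email = str(record.get("email") or "")
    # Crude shape check only: needs an "@" and a dot in the domain part.
    if email and ("@" not in email or "." not in email.rsplit("@", 1)[-1]):
        problems.append(f"email looks malformed: {email!r}")

    raw_date = str(record.get("signup_date") or "")
    if raw_date:
        try:
            datetime.strptime(raw_date, "%Y-%m-%d")
        except ValueError:
            problems.append(f"signup_date does not parse as YYYY-MM-DD: {raw_date!r}")

    if problems:
        message = "; ".join(problems)
        logger.error("rejected record %r: %s", record.get("id"), message)
        raise InvalidRecord(message)
    return record
```

Collecting every problem before raising means the Monday-morning log line shows everything wrong with the record at once, not just the first failure.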
Upstream change. Have someone rename a field in your CRM. Change a tag. Add a new option to a dropdown. Real businesses change things constantly, and a workflow that breaks the moment someone renames "Status" to "Lead Status" is not production-ready. The fix is either defensive code (don't hard-reference field names where you can avoid it) or alerting (you find out within an hour, not when a customer complains).
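One possible shape for the defensive option is to look fields up through a small alias table instead of scattering literal names through the workflow, and to raise a distinct error when nothing matches so your alerting can pick it up. `FIELD_ALIASES`, `SchemaDrift`, and `read_field` are hypothetical names for this sketch, and the alias list is illustrative.

```python
class SchemaDrift(Exception):
    """Raised when an expected field is missing under every known name."""


# Hypothetical mapping from the workflow's own name to known upstream names.
FIELD_ALIASES = {
    "status": ("Status", "Lead Status"),
}


def read_field(record: dict, logical_name: str):
    """Return the value for a logical field, trying each known upstream name in order."""
    for candidate in FIELD_ALIASES[logical_name]:
        if candidate in record:
            return record[candidate]
    # Fail with enough detail to see what changed upstream.
    raise SchemaDrift(
        f"no field for {logical_name!r}; tried {FIELD_ALIASES[logical_name]}, "
        f"record has {sorted(record)}"
    )


before_rename = read_field({"Status": "Open"}, "status")
after_rename = read_field({"Lead Status": "Open"}, "status")
```

An alias table only covers renames you already know about. An unknown rename still raises `SchemaDrift`, which is the point where the alerting half of the fix takes over.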
API down. Block the workflow's network access to one of its integrations, or watch what the workflow does during a real outage when one occurs. Does the workflow retry? Does it queue the work and resume? Does it drop the event silently? Does it spam your error channel with 4,000 identical messages? Each of these tells you something different about what you need to fix.
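Two of those questions, retrying and alert spam, can be illustrated with a short sketch. It assumes the integration call raises `ConnectionError` when the API is unreachable. `call_with_retry` and `AlertThrottle` are hypothetical helpers, and the delays and window length are placeholder values.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call operation(), retrying on ConnectionError with exponential backoff."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    # Out of attempts: surface the failure instead of dropping the event.
    raise RuntimeError(f"gave up after {attempts} attempts") from last_error


class AlertThrottle:
    """Lets one alert per key through per time window and counts the rest."""

    def __init__(self, window_seconds=3600, clock=time.monotonic):
        self.window = window_seconds
        self.clock = clock
        self.last_sent = {}
        self.suppressed = {}

    def should_send(self, key: str) -> bool:
        now = self.clock()
        last = self.last_sent.get(key)
        if last is not None and now - last < self.window:
            self.suppressed[key] = self.suppressed.get(key, 0) + 1
            return False
        self.last_sent[key] = now
        return True
```

The `sleep` and `clock` parameters exist so the behavior can be tested without waiting. Queuing failed work for later is not covered here and would need its own design.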
Write these tests down. Run them on day one and run them again every quarter — because the answers change as you add new steps.
What "Production Ready" Actually Means
A workflow is production ready when four things are true:
If you can't check all four, you don't have an automation. You have a script that worked once.
The whole sequence — dry run, shadow, supervised — usually takes two to four weeks for a meaningful workflow. That sounds slow until you compare it to the time you'll spend cleaning up a workflow that went live on vibes and broke quietly for a month.
If you're standing up automations and want a second set of eyes on the test plan before you go live, here's how we work.