By Vicente Arteaga Gomez
MisLinux · Last updated: May 5, 2026
This is part of my Kubernetes-on-Hetzner-and-operations series on MisLinux. It is based on real operational browser work, both web AI automation and related browser-driven investigations, not on demo-only automation.
When a browser automation task gets blocked by login friction, bot protection, or a challenge page, the obvious cost is time.
The less obvious cost is decision quality.
Once the clean automated path is blocked, people often start improvising:
- one manual browser for the challenge
- another browser for the actual task
- partial screenshots
- copied HTML from the wrong frame
- assumptions about what would have happened after the challenge
That is how proof quality degrades.
Why this matters
During an investigation, a browser path is often being used to answer one of these questions:
- what does the real UI show right now?
- which exact button or dialog state matters?
- is the saved HTML the same as the rendered state?
- which route fragment is still valid?
If the automation path is blocked, the investigation has to account for that blocker explicitly instead of pretending the missing evidence is equivalent to a negative result.
The three states I distinguish now
| State | What it means |
|---|---|
| Automated success | The browser reached the intended DOM/runtime path and the evidence is complete |
| Manual bridge needed | A human has to clear a gate, then automation may resume |
| Hard blocker | The session cannot proceed safely or repeatably right now |
That middle state matters a lot. Many investigations are not "blocked forever." They are "blocked until a human clears exactly one step."
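One way to keep the three states explicit in tooling is a small enum plus a classifier. This is a minimal sketch, not the code behind this workflow: the names `BrowserState`, `classify`, and `may_resume_after_handoff`, and the two boolean inputs, are assumptions made for illustration.

```python
from enum import Enum


class BrowserState(Enum):
    """The three states from the table above (names are illustrative)."""
    AUTOMATED_SUCCESS = "automated_success"
    MANUAL_BRIDGE_NEEDED = "manual_bridge_needed"
    HARD_BLOCKER = "hard_blocker"


def classify(reached_target_dom: bool, human_can_clear_gate: bool) -> BrowserState:
    """Map two observations onto one of the three states.

    Both inputs are hypothetical signals; how they are detected is
    outside the scope of this sketch.
    """
    if reached_target_dom:
        return BrowserState.AUTOMATED_SUCCESS
    if human_can_clear_gate:
        return BrowserState.MANUAL_BRIDGE_NEEDED
    return BrowserState.HARD_BLOCKER


def may_resume_after_handoff(state: BrowserState) -> bool:
    """Only the middle state is 'blocked until a human clears one step'."""
    return state is BrowserState.MANUAL_BRIDGE_NEEDED
```

The point of the separate `may_resume_after_handoff` check is that a hard blocker and a manual bridge look the same to naive automation (both are "not success") but call for different next steps.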
What I do instead of brute forcing it
My preferred sequence is:
- preserve the blocker evidence
- launch a separate, clearly scoped headed session if needed
- let the human clear the gate
- resume automation from the now-authenticated session
- save artifacts from the real post-gate state
This keeps the proof chain cleaner than mixing one-off manual browsing with later guesswork.
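The ordering in that sequence can be enforced in code rather than by habit. The sketch below assumes three caller-supplied callables (`capture_blocker`, `wait_for_human`, `collect_proof`); these are hypothetical placeholders, not real scripts or library functions. It only shows the control flow: blocker evidence is saved first, and post-gate collection never runs unless the gate was actually cleared.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class HandoffLog:
    """Ordered record of what happened during one handoff."""
    steps: List[str] = field(default_factory=list)


def run_handoff(
    capture_blocker: Callable[[], str],
    wait_for_human: Callable[[], bool],
    collect_proof: Callable[[], str],
) -> HandoffLog:
    """Run the handoff in a fixed order and log each step.

    capture_blocker: saves blocker evidence, returns its path
    wait_for_human:  returns True once a human has cleared the gate
    collect_proof:   saves post-gate artifacts, returns their path
    """
    log = HandoffLog()

    # Preserve the blocker evidence before anything changes the page.
    log.steps.append(f"blocker_saved:{capture_blocker()}")

    # If the gate is not cleared, stop here; do not guess at post-gate state.
    if not wait_for_human():
        log.steps.append("gate_not_cleared")
        return log
    log.steps.append("gate_cleared")

    # Only now collect artifacts from the real post-gate state.
    log.steps.append(f"proof_saved:{collect_proof()}")
    return log
```

A run where the human never clears the gate ends with `gate_not_cleared` and no proof step, which is itself a recordable outcome.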
A command trail that reflects that model
# 1. Launch a separate headed session for the blocked surface
python3 ensure-browser-session.py --port 9230 --headed --target-url https://example.invalid
# 2. Record the blocker artifact first
python3 capture-blocked-state.py --port 9230 --output history/blocked-proof/
# 3. After the manual gate is cleared, resume the structured automation
python3 collect-real-ui-proof.py --port 9230 --output history/post-gate-proof/
The important part is not the exact script names. The important part is the explicit handoff.
In plain language, those example commands do three different jobs:
- ensure-browser-session.py starts a separate browser profile on its own debugging port so the blocked flow does not contaminate the main session
- capture-blocked-state.py saves the proof that the current stop point is the challenge page rather than the real application DOM
- collect-real-ui-proof.py resumes structured inspection after the human clears the blocker, so the later screenshots/HTML/runtime notes all come from the real target state
The filenames here are illustrative. The reusable rule is what matters: separate the blocked session, preserve evidence of the blocker, and only then resume the automated collection path.
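To make "preserve evidence of the blocker" concrete, here is a rough sketch of what a blocker artifact could contain. It is not the content of capture-blocked-state.py: the function name `save_blocker_artifact`, the file names `blocked.html` and `manifest.json`, and the manifest fields are all assumptions. It takes the page HTML as a string and deliberately leaves out how that HTML is fetched from the browser.

```python
import hashlib
import json
import time
from pathlib import Path


def save_blocker_artifact(output_dir: str, port: int, url: str, html: str) -> Path:
    """Write the blocker page and a small manifest describing it.

    Returns the path to the manifest. All field names are illustrative.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Keep the raw page so the blocker can be re-examined later.
    (out / "blocked.html").write_text(html, encoding="utf-8")

    record = {
        "kind": "blocker",  # marks this as blocker evidence, not target DOM
        "port": port,       # debugging port of the separate session
        "url": url,
        "captured_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        # Hash ties the manifest to the exact HTML that was saved.
        "html_sha256": hashlib.sha256(html.encode("utf-8")).hexdigest(),
    }
    manifest = out / "manifest.json"
    manifest.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return manifest
```

Recording the port and a content hash next to the HTML is what lets a later reader confirm which session and which page state the artifact belongs to.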
Failure case: what goes wrong if I skip the handoff model
If I skip that structure, I get mixed evidence:
- screenshots from one session
- HTML from another
- cookies in a third
- conclusions that assume all of them refer to the same state
That is especially dangerous when the system under investigation has expiring auth or anti-bot checks.
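A cheap guard against mixed evidence is to tag every artifact with the session and phase it came from, then refuse to draw conclusions unless they all agree. The sketch below is one possible shape for that check; the `Artifact` fields and the phase labels "blocked" and "post_gate" are assumptions, not an existing format.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Artifact:
    kind: str        # e.g. "screenshot", "html", "cookies"
    session_id: str  # hypothetical label, e.g. "port-9230"
    phase: str       # "blocked" or "post_gate"


def evidence_sources(artifacts: Iterable[Artifact]) -> Set[Tuple[str, str]]:
    """Return the distinct (session, phase) pairs the artifacts came from."""
    return {(a.session_id, a.phase) for a in artifacts}


def is_single_state(artifacts: Iterable[Artifact]) -> bool:
    """True only when every artifact shares one session and one phase.

    An empty collection returns False: no evidence is not coherent evidence.
    """
    return len(evidence_sources(artifacts)) == 1
```

For example, a screenshot from "port-9230" and HTML from "port-9222" would yield two sources, so `is_single_state` returns False and the mismatch surfaces before anyone reasons from it.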
Why this matters for AI-assisted work
AI agents are very good at continuing confidently after a blocker page, as if whatever they received were the real application.
That is exactly why I want the blocker to be first-class in the workflow.
The right conclusion is often:
> "I do not have the actual runtime DOM yet. I only have the blocker state."
That is a useful result, but it is not the same result as a successful DOM inspection.
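For an agent pipeline, that distinction can be enforced at the reporting step. The following is a minimal sketch under assumed names (`report`, `BLOCKER_ONLY_MESSAGE`, and the phase labels "blocked" and "post_gate" are all hypothetical): it returns the blocker-only conclusion verbatim and rejects any attempt to attach DOM findings to blocker-only evidence.

```python
from typing import List

BLOCKER_ONLY_MESSAGE = (
    "I do not have the actual runtime DOM yet. I only have the blocker state."
)


def report(phase: str, dom_findings: List[str]) -> str:
    """Produce a conclusion that matches the evidence phase.

    phase is "blocked" (only the challenge page was seen) or
    "post_gate" (the real application DOM was inspected).
    """
    if phase == "blocked":
        # Findings about the target DOM cannot come from a challenge page.
        if dom_findings:
            raise ValueError(
                "DOM findings cannot be derived from blocker-only evidence"
            )
        return BLOCKER_ONLY_MESSAGE
    if phase == "post_gate":
        return "Runtime DOM inspected: " + "; ".join(dom_findings)
    raise ValueError(f"unknown phase: {phase!r}")
```

Raising instead of silently dropping the findings is a deliberate choice in this sketch: it makes a confident-but-wrong continuation fail loudly rather than leak into the final report.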
What I'd do differently now
I used to treat browser blockers as an inconvenience outside the "real" work. What I'd do differently now is design the browser workflow around them from the start: separate session, explicit handoff, and artifact-backed resumption. It turns a messy interruption into a real operator state.