WaybillAgent

Walk the warehouse, Claude does the audit.

WaybillAgent turns warehouse auditing from a multi-week manual process into an AI-assisted guided walk using phone capture and agentic reconciliation. It is the flagship build for the Built with Opus 4.7 hackathon hosted by Cerebral Valley and Anthropic (selected among the top ~500 of 13,000+ applicants).

Stack

Claude Opus 4.7

Claude Managed Agents

TypeScript

Next.js

Supabase

Vercel

Computer Vision/OCR

Proof metrics

Hackathon Cohort: ~500 / 13,000+ (top ~3.8%)

Audit Cycle: 2 weeks -> ~40 minutes

Device Shift: $2,000 scanner -> phone/glasses workflow

Evaluation scorecard

Production eval dimensions — how this system is judged before and after changes ship.

Label OCR accuracy: eval harness in place. Test set of damaged, faded, and angled warehouse labels from real captures.

Variance detection: eval harness in place. Structured reconciliation cases against ERP master data.

Session resume success: eval harness in place. Long-horizon walk sessions with intentional interruption and retry.

Cost per audit walk: tracked in production. Selective high-effort reasoning only on variance paths; routine OCR stays cost-efficient.
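One plausible way to score the label OCR dimension is exact-match accuracy over the labelled test set. The sketch below is illustrative only: `LabelCase`, `normalizeBinCode`, and the normalisation rule (ignore case, spaces, and hyphens) are assumptions, not the project's actual harness.

```typescript
// Hypothetical eval case: one captured label with its hand-verified bin code.
interface LabelCase {
  captureId: string;
  expected: string; // ground-truth bin code
  predicted: string; // what the vision step extracted
}

// Assumed normalisation: case, whitespace, and hyphens are treated as cosmetic.
// A real harness may need to keep separators if they disambiguate codes.
function normalizeBinCode(code: string): string {
  return code.toUpperCase().replace(/[\s-]/g, "");
}

// Fraction of cases whose normalised prediction equals the normalised ground truth.
function exactMatchAccuracy(cases: LabelCase[]): number {
  if (cases.length === 0) return 0;
  const hits = cases.filter(
    (c) => normalizeBinCode(c.predicted) === normalizeBinCode(c.expected)
  ).length;
  return hits / cases.length;
}
```

Tracking this number per label condition (torn, faded, angled) rather than as one aggregate would show where a model or prompt change actually regresses.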

Problem

Warehouse audits often run for weeks with multiple field staff and heavy manual reconciliation in spreadsheets.

Damaged, faded, or angled labels fail frequently on traditional handheld scanners, creating repeated variance loops.

Audit workflows require sustained context across long sessions, not isolated one-shot API calls.

Solution

Built a stateful agent workflow for end-to-end warehouse walk sessions with resumable progress.

Used high-fidelity model vision to extract bin and label data from low-quality real-world images.

Applied selective high-effort reasoning only for variance classification while keeping routine OCR paths cost-efficient.

Added self-verification before report output to improve confidence for enterprise audit handoff.

Technical deep-dive

Vision resolution is the unlock

Opus 4.7 shipped with roughly 3x higher vision resolution, and that single capability is what makes this product possible rather than merely interesting. It reads bin labels that are torn, faded, glare-affected, or shot at awkward angles: precisely the labels industrial handhelds reject.

In testing it extracted bin codes from labels I could barely parse by eye. That collapses a $2,000 specialised scanner into a phone already in the operator's pocket, which is the difference between a capital purchase and a download.

This matters more in a working godown than a spec sheet suggests. Most labels in a real warehouse are damaged, and every rejected scan is a manual re-key that becomes a variance loop next quarter.

Long-horizon execution with Claude Managed Agents

A warehouse walk is not one API call. It is a 30-to-60 minute stateful session spanning dozens of tool calls and continuous reconciliation against ERP master data.

Claude Managed Agents runs the entire walk as one continuous session that survives network drops and resumes cleanly — a hard requirement in a warehouse, where connectivity is unreliable by default. Each workflow state (capture, extract, lookup, reconcile, tag variance, report) has explicit entry and exit criteria plus persistence, so an operator can stop mid-aisle and continue without losing context.
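The six workflow states and the stop-and-resume behaviour can be pictured as a small persisted state machine. This is a minimal sketch under stated assumptions, not the Managed Agents API: `WalkSession`, `SessionStore`, `InMemoryStore`, `advance`, and `resume` are hypothetical names, and the entry/exit criteria are left out for brevity.

```typescript
// The six workflow states named in the text.
type WalkState =
  | "capture" | "extract" | "lookup" | "reconcile" | "tag_variance" | "report";

// Explicit transition table; "report" is treated as terminal in this sketch.
const NEXT: Record<WalkState, WalkState | null> = {
  capture: "extract",
  extract: "lookup",
  lookup: "reconcile",
  reconcile: "tag_variance",
  tag_variance: "report",
  report: null,
};

// Hypothetical session snapshot.
interface WalkSession {
  sessionId: string;
  state: WalkState;
}

// Hypothetical persistence interface; a real build might back this with a database.
interface SessionStore {
  save(session: WalkSession): void;
  load(sessionId: string): WalkSession | undefined;
}

class InMemoryStore implements SessionStore {
  private rows = new Map<string, string>();
  save(session: WalkSession): void {
    this.rows.set(session.sessionId, JSON.stringify(session));
  }
  load(sessionId: string): WalkSession | undefined {
    const raw = this.rows.get(sessionId);
    return raw === undefined ? undefined : (JSON.parse(raw) as WalkSession);
  }
}

// Advance one step and persist before returning, so an interruption
// loses at most the step that was in flight.
function advance(session: WalkSession, store: SessionStore): WalkSession {
  const nextState = NEXT[session.state];
  const next: WalkSession =
    nextState === null ? session : { ...session, state: nextState };
  store.save(next);
  return next;
}

// Resume from the last persisted snapshot, or start a fresh walk at "capture".
function resume(sessionId: string, store: SessionStore): WalkSession {
  return store.load(sessionId) ?? { sessionId, state: "capture" };
}
```

Persisting on every transition, rather than at the end of an aisle, is what lets an operator stop mid-aisle and pick up at the same step.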

Effort tiers and predictable unit cost

Variance classification is the cognitively hard part: deciding whether a discrepancy is a miscount, a wrong item, or a label error genuinely benefits from extended reasoning. Those calls run on the xhigh effort tier.

Routine OCR runs on default effort. Combined with task budgets, that split keeps each aisle audit predictably priced instead of letting a long session drift into open-ended spend — the difference between a demo and something with a defensible cost per audit.
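The split between effort tiers and a per-aisle budget can be sketched as a routing function plus a simple spend guard. The tier names `default` and `xhigh` come from the text above. Everything else is an assumption for illustration: `TaskKind`, `effortFor`, `AisleBudget`, and the abstract cost units are hypothetical and do not reflect a real pricing model or API parameter.

```typescript
type TaskKind = "routine_ocr" | "variance_classification";
type Effort = "default" | "xhigh"; // tier names taken from the text

// Route by task kind: only variance classification gets extended reasoning.
function effortFor(task: TaskKind): Effort {
  return task === "variance_classification" ? "xhigh" : "default";
}

// Hypothetical per-aisle budget in arbitrary cost units.
class AisleBudget {
  private spent = 0;
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Record a call's cost. Returns false and records nothing if the call
  // would push the aisle past its limit.
  tryCharge(cost: number): boolean {
    if (this.spent + cost > this.limit) return false;
    this.spent += cost;
    return true;
  }

  get remaining(): number {
    return this.limit - this.spent;
  }
}
```

When `tryCharge` refuses a call, one reasonable policy is to queue that variance for human review instead of retrying at a lower tier, which keeps the cost ceiling and the audit quality bar independent of each other.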

Self-verification with /ultrareview

Before a variance report leaves the agent, it audits itself: re-reads its own output, checks the citation chain back to source captures, and flags low-confidence claims for human review.

This is the governance layer that separates a hackathon demo from something an enterprise auditor would actually sign. Audit handoff cannot tolerate silent failures, so verification is a workflow stage rather than an afterthought.
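The deterministic part of that verification stage, checking citations and confidence, might look like the sketch below. It does not show how `/ultrareview` works internally. `VarianceClaim`, `ReviewFlag`, `verifyReport`, and the 0.8 threshold are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical shape of one claim in a variance report.
interface VarianceClaim {
  binCode: string;
  statement: string;
  captureIds: string[]; // source images this claim cites
  confidence: number; // 0..1, as reported by the classification step
}

interface ReviewFlag {
  binCode: string;
  reason: "missing_citation" | "unknown_capture" | "low_confidence";
}

// Check each claim's citation chain and confidence.
// Anything flagged is routed to a human rather than silently shipped.
function verifyReport(
  claims: VarianceClaim[],
  knownCaptureIds: Set<string>,
  minConfidence = 0.8 // illustrative threshold, not from the source
): ReviewFlag[] {
  const flags: ReviewFlag[] = [];
  for (const claim of claims) {
    if (claim.captureIds.length === 0) {
      flags.push({ binCode: claim.binCode, reason: "missing_citation" });
    } else if (claim.captureIds.some((id) => !knownCaptureIds.has(id))) {
      flags.push({ binCode: claim.binCode, reason: "unknown_capture" });
    }
    if (claim.confidence < minConfidence) {
      flags.push({ binCode: claim.binCode, reason: "low_confidence" });
    }
  }
  return flags;
}
```

An empty result means every claim cites at least one capture from the session and clears the confidence floor. It says nothing about whether the claim is correct, which is why the model's own re-read still runs alongside it.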

Messy real-world captures are the product

The defining constraint is not prompt cleverness. It is torn labels, glare, partial occlusion, and operators who will not reshoot five times. WaybillAgent is built around that distribution of inputs.

That is why the writing on warehouse copilots and AIDC tooling points back here: telemetry and vision systems only matter when they survive the godown, not the demo aisle.

Architecture

Capture layer: phone or Meta glasses image capture during the aisle walkthrough.

Interpretation layer: Claude Opus vision + extraction pipelines for labels and bin codes.

Agent layer: managed multi-step session coordinating scan, lookup, reconciliation, and variance tagging.

Data layer: ERP/master-data reconciliation plus structured variance report output.
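To make the data layer concrete, the sketch below compares field observations against master data and emits structured variances. It covers detection only: deciding whether a mismatch is a miscount, a wrong item, or a label error is left to the reasoning step. `MasterRecord`, `Observation`, `Variance`, and `reconcile` are hypothetical names, and the one-SKU-per-bin model is an assumption.

```typescript
// Hypothetical ERP master-data row (assumes one SKU per bin).
interface MasterRecord {
  binCode: string;
  sku: string;
  expectedQty: number;
}

// Hypothetical field observation extracted from one capture.
interface Observation {
  binCode: string;
  sku: string;
  countedQty: number;
  captureId: string;
}

// Discriminated union, so each variance carries only the fields it needs.
type Variance =
  | { kind: "unknown_bin"; binCode: string; captureId: string }
  | { kind: "sku_mismatch"; binCode: string; expectedSku: string; observedSku: string; captureId: string }
  | { kind: "qty_mismatch"; binCode: string; sku: string; expectedQty: number; countedQty: number; captureId: string };

function reconcile(master: MasterRecord[], observations: Observation[]): Variance[] {
  const byBin = new Map<string, MasterRecord>();
  for (const m of master) byBin.set(m.binCode, m);

  const variances: Variance[] = [];
  for (const o of observations) {
    const m = byBin.get(o.binCode);
    if (m === undefined) {
      variances.push({ kind: "unknown_bin", binCode: o.binCode, captureId: o.captureId });
    } else if (m.sku !== o.sku) {
      variances.push({
        kind: "sku_mismatch", binCode: o.binCode,
        expectedSku: m.sku, observedSku: o.sku, captureId: o.captureId,
      });
    } else if (m.expectedQty !== o.countedQty) {
      variances.push({
        kind: "qty_mismatch", binCode: o.binCode, sku: o.sku,
        expectedQty: m.expectedQty, countedQty: o.countedQty, captureId: o.captureId,
      });
    }
  }
  return variances;
}
```

Keeping `captureId` on every variance means each line of the report can be traced back to the image that produced it.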

Outcomes

Proved a practical AI-first audit workflow that can run in real warehouse conditions in Nairobi.

Demonstrated operational viability for long-horizon agent sessions and resume/retry behavior.

Established a flagship product proof for forward-deployed AI engineering in East African enterprise environments.

Links & artifacts

Live Demo | GitHub | LinkedIn Profile

Related work

Soko ERP

Soko is a production multi-tenant ERP for East African SMBs — now live with 500+ businesses, 2.1M+ transactions, and 99.9% uptime. One platform for POS, multi-location inventory, HR/payroll, accounts, and reporting, with M-Pesa, offline-first sync, and multi-currency support (KES/UGX/TZS).


AIDC Barcode Toolkit

Open-source toolkit that packages real-world AIDC workflows so Claude Code can generate, validate, and reason about barcode and labeling tasks with domain-correct defaults.


Discuss this work

Hiring or building something similar? Reach out with context and constraints.

Email Joseph
