AI architecture audit · Blitz · Apr 2026 · 4 wks
Blitz's AI voice agent — a Gemini caller handling thousands of customer calls a day, rescheduling deliveries and logging failed-delivery reasons for a logistics-tech company — was drifting off-script. Over four weeks I built the observability and LLM-as-judge evals it was missing, isolated where it failed, and added a pre-call intent gate plus an output validator that lifted correct-action rate from 95% to 98% in eval.
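A minimal sketch of the kind of check an output validator can run before an agent's proposed action is executed. This is not the code used on the engagement: the action names, argument names, and the `Verdict` type are all hypothetical, chosen only to mirror the two tasks described above.

```python
from dataclasses import dataclass

# Hypothetical schema: action name -> argument names that must be present.
ALLOWED_ACTIONS = {
    "reschedule_delivery": {"order_id", "new_date"},
    "log_failed_delivery": {"order_id", "reason"},
}


@dataclass
class Verdict:
    ok: bool
    reason: str


def validate_action(proposed: dict) -> Verdict:
    """Reject any action that is off-script or missing required arguments."""
    name = proposed.get("action")
    if name not in ALLOWED_ACTIONS:
        return Verdict(False, f"unknown action: {name!r}")
    args = proposed.get("args", {})
    missing = ALLOWED_ACTIONS[name] - set(args)
    if missing:
        return Verdict(False, f"missing args: {sorted(missing)}")
    return Verdict(True, "ok")
```

A rejected verdict would typically route the call to a fallback (a clarifying question or a human handoff) instead of executing the action.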
Lead discovery & validation · Leadzo · 2026
2,143 usable leads at ~2.5¢ each. An async backend that discovers service-provider leads from search, crawls and validates each with an LLM, and deduplicates them into the marketplace's database. Crawling fanned out one URL per Lambda invocation to keep per-lead cost flat. Delivered as a clean API for the client's own team to build against.
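One way the dedup step can work, sketched under the assumption that leads are keyed by their website's normalized hostname. The real dedup key is not stated here, so `lead_key`, `dedup`, and the `url` field are illustrative only.

```python
from urllib.parse import urlsplit


def lead_key(url: str) -> str:
    """Normalize a lead URL to a bare hostname (assumed dedup key)."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def dedup(leads: list[dict]) -> list[dict]:
    """Keep the first lead seen for each hostname, preserving input order.

    Leads whose URL has no parseable hostname (e.g. missing scheme) are dropped.
    """
    seen: set[str] = set()
    unique = []
    for lead in leads:
        key = lead_key(lead["url"])
        if key and key not in seen:
            seen.add(key)
            unique.append(lead)
    return unique
```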
03 · demo soon
HR AI · agent security testbed · 2026
An AI recruitment agent built as a security-research testbed: it screens candidates end to end across 15 tools — resume, LinkedIn, and web presence gathered concurrently, scored and ranked via a dedicated ATS sub-agent — then deliberately surfaces the prompt-injection, data-poisoning, and cross-tenant-leakage failure modes production LLM agents hit, and how to harden them. 145 tests.
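The concurrent gathering step can be sketched with `asyncio.gather`. The three fetchers below are hypothetical stubs standing in for the real tools; only the fan-out pattern is the point.

```python
import asyncio


# Hypothetical stand-ins for the real resume, LinkedIn, and web-presence tools.
async def fetch_resume(candidate_id: str) -> dict:
    await asyncio.sleep(0)  # placeholder for real I/O
    return {"source": "resume", "candidate": candidate_id}


async def fetch_linkedin(candidate_id: str) -> dict:
    await asyncio.sleep(0)
    return {"source": "linkedin", "candidate": candidate_id}


async def fetch_web_presence(candidate_id: str) -> dict:
    await asyncio.sleep(0)
    return {"source": "web", "candidate": candidate_id}


async def gather_profile(candidate_id: str) -> dict:
    """Run all three lookups concurrently and collect results by source."""
    resume, linkedin, web = await asyncio.gather(
        fetch_resume(candidate_id),
        fetch_linkedin(candidate_id),
        fetch_web_presence(candidate_id),
    )
    return {"resume": resume, "linkedin": linkedin, "web": web}
```

Calling `asyncio.run(gather_profile("c-1"))` returns one dictionary holding all three results, which a scoring step could then consume.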
04 · demo soon
jakk · black-box MCP security scanner · 2026
A security scanner that probes AI agents (MCP servers) for whether they can be tricked into leaking data or running unintended commands. It enumerates the server's tools, fires curated probes, and classifies each response deterministically — no LLM in the loop, so every “vulnerable” verdict is real, not noise.
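A sketch of what deterministic classification can look like: fixed patterns matched against each response, with no model call. The canary value, rule set, and labels are assumptions for illustration and are not jakk's actual rules.

```python
import re

# Hypothetical canary planted in data the server should never reveal.
CANARY = "JAKK-CANARY-7f3a"

# Ordered rules: the first matching pattern decides the verdict.
RULES = [
    ("data_leak", re.compile(re.escape(CANARY))),
    # Shape of `id` command output, e.g. "uid=0(root)".
    ("command_exec", re.compile(r"uid=\d+\(\w+\)")),
]


def classify(response: str) -> str:
    """Return the label of the first rule that matches, else 'not_vulnerable'."""
    for label, pattern in RULES:
        if pattern.search(response):
            return label
    return "not_vulnerable"
```

Because the verdict depends only on the response text and the rule list, the same response always gets the same label, which makes findings reproducible.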
ReelLab · video DNA engine · 2026
An AI pipeline that pulls the psychological hooks, visual mechanics, and clean transcriptions out of Instagram Reels, so creators can read what's actually working on a profile.