Walkthrough — AI Scraper Defense Portfolio

Problem Framing

Understanding the Threat

The narrative problem statement establishes the business context using a Prohibition-era bootlegger analogy. It answers three questions: what is happening, why does it matter commercially, and who is being hurt? It uses anonymised Australian industry case studies to make the threat tangible.

🥃

Narrative

The Data Bootleggers

→

Solution Strategy

The Defense Playbook

Maps the FBI "Untouchables" doctrine to a 5-layer enterprise defense stack. Presents the market landscape, identifies three investable whitespace opportunities, and outlines a 4-phase 12-month enterprise roadmap. This is the strategy brief for decision-makers.

⚔️

Narrative

Operation Untouchable

→

Current State · Architecture

The Undefended Platform

An SVG architecture diagram of a typical enterprise eCommerce platform with animated red attack-flow lines showing exactly how scrapers traverse each layer — CDN → Load Balancer → Web App → API → Database. Six attack vector panels detail each vulnerability with severity ratings.

🏚️

Architecture

Undefended Enterprise Arch

→

Target State · Architecture

The Defended Platform

The same architecture transformed with 5 color-coded defense layers. Green flows show legitimate user traffic passing normally; orange paths show sophisticated bots being rerouted to the Data Poison Engine; teal lines show the IP watermarking applied to all content. Includes a Bot Fate Decision Matrix showing what percentage of bots each layer stops.

Current State · Detailed Design

The Attacker's Playbook

A forensic 6-phase attack pipeline showing exactly how an AI scraper operates: Target Discovery → Browser Impersonation → JS Execution → LLM Semantic Extraction → Batch Crawl → Monetisation. Each phase includes real API code samples (Firecrawl, Playwright) and a business impact panel. It ends with a swimlane actor interaction map across all six phases and a key insight: the attacker's total cost to scrape your full 40,000-SKU catalog is approximately $8.

🔬

Detailed Design

Attack Vector Blueprint

→
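To put the attacker's roughly $8 total in per-item terms, the short sketch below divides the quoted cost by the quoted catalog size. Only the two input figures come from the walkthrough. The function name and the assumption that cost scales evenly per SKU are illustrative.

```python
def cost_per_sku(total_cost_usd: float, sku_count: int) -> float:
    """Average scraping cost per SKU, assuming cost is spread evenly across the catalog."""
    if sku_count <= 0:
        raise ValueError("sku_count must be positive")
    return total_cost_usd / sku_count


TOTAL_COST_USD = 8.0  # approximate total quoted in the walkthrough
SKU_COUNT = 40_000    # catalog size quoted in the walkthrough

per_sku = cost_per_sku(TOTAL_COST_USD, SKU_COUNT)
print(f"~${per_sku:.4f} per SKU, about {per_sku * 100:.2f} cents")
```

Under that even-spread assumption, each SKU costs the attacker about two hundredths of a cent.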

Target State · Detailed Design

The Defense Stack Blueprint

The complete technical specification: a bot decision tree flowchart showing every possible traffic path; four component specification cards with real Terraform, Next.js, and Express.js code; a swimlane showing how each of the four actor types (real user, basic scraper, headless Chrome, sophisticated AI bot) experiences the defended platform; and an L5 IP watermarking spec covering provenance for both text (NLP signatures) and images (pixel-level perturbations).

⚙️

Detailed Design

Defense Stack Blueprint

→
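As a rough illustration of the decision tree in the Defense Stack Blueprint, the sketch below maps each of the four actor types to the outcome this walkthrough describes for it. It is a simplified sketch, not the blueprint's actual logic. It assumes traffic has already been classified into an actor type, and the `Actor`, `Outcome`, `ROUTES`, and `route` names are hypothetical. The app obfuscation and honeypot layer is left out because the walkthrough does not say which actors it stops.

```python
from enum import Enum


class Actor(Enum):
    REAL_USER = "real user"
    BASIC_SCRAPER = "basic scraper"
    HEADLESS_CHROME = "headless Chrome"
    SOPHISTICATED_AI_BOT = "sophisticated AI bot"


class Outcome(Enum):
    BLOCKED_AT_L1 = "blocked by TLS fingerprinting"
    BLOCKED_AT_L2 = "blocked by behavioral AI"
    SERVED_POISONED = "rerouted to the Data Poison Engine"
    SERVED_WATERMARKED = "served real content carrying the IP watermark"


# Hypothetical routing table: one outcome per actor type, following the
# layer descriptions in this walkthrough. Classification is assumed done.
ROUTES = {
    Actor.REAL_USER: Outcome.SERVED_WATERMARKED,
    Actor.BASIC_SCRAPER: Outcome.BLOCKED_AT_L1,
    Actor.HEADLESS_CHROME: Outcome.BLOCKED_AT_L2,
    Actor.SOPHISTICATED_AI_BOT: Outcome.SERVED_POISONED,
}


def route(actor: Actor) -> Outcome:
    """Return the outcome an already-classified actor receives."""
    return ROUTES[actor]


for actor in Actor:
    print(f"{actor.value:>22} -> {route(actor).value}")
```

A production decision tree would branch on observed signals rather than on a ready-made label. The lookup table only makes the four swimlane outcomes explicit.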

📄

6

HTML Documents

Problem narrative, solution strategy, 2 architecture diagrams, 2 detailed designs — all interlinked via shared navigation.

🛡️

5

Defense Layers

TLS fingerprinting, behavioral AI, app obfuscation + honeypots, data poisoning engine, IP watermarking.

🎯

95%

Bot Block Rate

L1 blocks ~60% (basic scrapers). L2 blocks a further ~30% (headless Chrome bots). Of the remaining 10%, 9% receive poisoned data and the 1% residual carries a legal watermark trail.
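The per-layer figures can be tallied with a few lines of arithmetic. The sketch below assumes all four percentages are shares of total bot traffic, which is a reading suggested by the fact that they sum to 100. The dictionary keys and the `summarize` helper are illustrative names. It does not attempt to derive the 95% headline figure, whose basis the walkthrough does not spell out.

```python
# Shares of bot traffic per outcome, in percent, as quoted in the walkthrough.
# Assumption: every figure is a share of total bot traffic.
SHARES = {
    "blocked_l1": 60,            # basic scrapers stopped at L1
    "blocked_l2": 30,            # headless Chrome bots stopped at L2
    "poisoned": 9,               # bots served poisoned data
    "residual_watermarked": 1,   # bots that get through, with a watermark trail
}


def summarize(shares: dict[str, int]) -> dict[str, int]:
    """Add up the outcome shares into cumulative totals."""
    blocked = shares["blocked_l1"] + shares["blocked_l2"]
    return {
        "blocked_outright": blocked,
        "blocked_or_poisoned": blocked + shares["poisoned"],
        "total": sum(shares.values()),
    }


summary = summarize(SHARES)
print(summary)
```

On this reading, 90% of bot traffic is blocked outright, 99% is either blocked or fed poisoned data, and the four shares account for all bot traffic.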

🏦

3

Market Whitespaces

Text IP Watermarking SaaS, Enterprise RAG Poisoning, Mid-Market Bot Shield: all identified as unoccupied product opportunities.

📅

12mo

Implementation Roadmap

4-phase deployment plan from edge filtering through full IP legal provenance capability, suitable for enterprise adoption.

⚖️

∞

Legal Recourse

L5 watermarking provides court-grade cryptographic proof for any content scraped and used in LLM training or competitor products.


From Problem
to Architecture