
feat: Mixpeek contextual enrichment for IAB taxonomy classification #78

Open
esteininger wants to merge 2 commits into IABTechLab:main from esteininger:feat/mixpeek-contextual-enrichment


@esteininger esteininger commented Mar 30, 2026

Summary

Adds Mixpeek contextual enrichment capabilities to the buyer agent, enabling:

  • IAB v3.0 content classification — classify ad page text into standardized IAB taxonomy categories (Sports > American Football, Automotive > Luxury Cars, etc.) with confidence scores using semantic search against an IAB category reference corpus
  • Brand-safety scoring — flag content matching sensitive IAB categories (gambling, adult, etc.) with risk levels (low/medium/high) and flagged category details
  • Contextual inventory search — search indexed ad inventory via Mixpeek retriever pipelines combining multimodal search, taxonomy enrichment, and reranking
  • CrewAI crew integration — tools auto-activate in the research crew when MIXPEEK_API_KEY is configured

What's included

| Layer | Files | Description |
| --- | --- | --- |
| Client | clients/mixpeek_client.py | Async HTTP client: classify_content, check_brand_safety, search_content, list_retrievers, list_taxonomies |
| CrewAI Tools | tools/research/contextual_enrichment.py | ClassifyContentTool, BrandSafetyTool, ContextualSearchTool with auto-discovery of IAB retrievers |
| MCP Tools | interfaces/mcp_server.py | classify_content, check_brand_safety, contextual_search via @mcp.tool() |
| Crew Wiring | crews/channel_crews.py | Tools added to _create_research_tools() when MIXPEEK_API_KEY is set |
| Config | config/settings.py, .env.example | MIXPEEK_API_KEY, MIXPEEK_BASE_URL, MIXPEEK_NAMESPACE |
| Unit Tests | tests/unit/test_mixpeek_client.py, test_contextual_enrichment.py | 25 tests covering client, brand safety, and tool behavior |
| E2E Tests | tests/e2e/test_mixpeek_production.py | 16 tests hitting production Mixpeek API with real IAB data |
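
As a rough illustration of the "Crew Wiring" row, the conditional activation might look like the sketch below. The tool classes here are empty stand-ins and `create_research_tools` is a hypothetical simplification of `_create_research_tools()`; the actual code in crews/channel_crews.py may differ.

```python
import os


# Empty stand-ins for the CrewAI tool classes named in the table (assumption:
# the real classes live in tools/research/contextual_enrichment.py).
class ClassifyContentTool: ...
class BrandSafetyTool: ...
class ContextualSearchTool: ...


def create_research_tools(base_tools, env=None):
    """Return the base tools, plus the Mixpeek tools when an API key is set."""
    env = os.environ if env is None else env
    tools = list(base_tools)
    # An unset or empty MIXPEEK_API_KEY leaves the crew's tool list unchanged.
    if env.get("MIXPEEK_API_KEY"):
        tools += [ClassifyContentTool(), BrandSafetyTool(), ContextualSearchTool()]
    return tools
```

Passing the environment as a parameter keeps the sketch testable without mutating `os.environ`.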

Production E2E verification

All e2e tests run against the production Mixpeek API (golden_adtech_iab namespace with 700+ IAB category documents across 4 tiers). Verified:

  • NFL content classifies as Sports > American Football (score 0.87)
  • Luxury car content classifies as Automotive hierarchy (score 0.84)
  • Cooking content classifies as Food & Drink > Cooking (score 0.84)
  • AI content classifies as Technology & Computing > Artificial Intelligence (score 0.90)
  • Gambling content flagged as brand-unsafe (Poker and Professional Gambling, score 0.88)
  • Safe sports content correctly returns safe=true, risk_level=low
  • Brand-safety threshold filtering works (higher threshold = fewer matches)
  • Invalid retriever IDs and API keys return proper errors
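
The threshold behaviour verified above can be pictured with a small sketch. The `filter_by_threshold` helper, the `category`/`score` field names, and the sample values are illustrative assumptions, not real API output.

```python
def filter_by_threshold(matches, threshold):
    """Keep only matches whose confidence score meets the threshold."""
    return [m for m in matches if m["score"] >= threshold]


# Illustrative matches only; field names and scores are assumed.
sample_matches = [
    {"category": "Category A", "score": 0.91},
    {"category": "Category B", "score": 0.72},
    {"category": "Category C", "score": 0.55},
]

loose = filter_by_threshold(sample_matches, 0.5)
strict = filter_by_threshold(sample_matches, 0.8)

# A higher threshold can only shrink (or preserve) the set of matches.
assert len(strict) <= len(loose)
```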

How it works

  1. Classification: Uses a Mixpeek retriever pipeline that performs semantic search against an IAB v3.0 category reference corpus. Each result is an IAB category with hierarchical path and confidence score.
  2. Brand safety: Classifies content then checks matches against a curated list of sensitive IAB categories (gambling, adult, terrorism, etc.). Returns safe/unsafe verdict with risk level.
  3. Auto-discovery: When no retriever_id is specified, tools automatically discover IAB retrievers in the configured namespace.
  4. Graceful degradation: All tools return error JSON (not exceptions) when Mixpeek is unconfigured. Crew tools only activate when MIXPEEK_API_KEY is set.
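
The brand-safety step (item 2) could be sketched roughly as follows. The keyword list, the score cut-offs for risk levels, and the field names (`path`, `score`, `flagged_categories`) are assumptions made for illustration; the PR does not spell out the actual curated list or thresholds.

```python
# Assumed keywords; the PR's curated list of sensitive categories is not shown.
SENSITIVE_KEYWORDS = ("gambling", "adult", "terrorism")


def check_brand_safety(matches, threshold=0.5, high_risk_score=0.8):
    """Flag classification matches that fall under sensitive IAB categories."""
    flagged = [
        m for m in matches
        if m["score"] >= threshold
        and any(k in m["path"].lower() for k in SENSITIVE_KEYWORDS)
    ]
    if not flagged:
        return {"safe": True, "risk_level": "low", "flagged_categories": []}
    # Assumed rule: the strongest flagged match decides the risk level.
    top_score = max(m["score"] for m in flagged)
    risk_level = "high" if top_score >= high_risk_score else "medium"
    return {"safe": False, "risk_level": risk_level, "flagged_categories": flagged}
```

Because the verdict is derived from the classification matches, this design needs no second model call: brand safety reuses the classifier's output.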

Configuration

All optional. Set in .env:

MIXPEEK_API_KEY=        # Mixpeek API key
MIXPEEK_BASE_URL=https://api.mixpeek.com
MIXPEEK_NAMESPACE=      # Namespace with IAB data
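
A minimal sketch of how these variables might be loaded, and how a tool could return error JSON instead of raising when the key is missing. `MixpeekSettings`, `load_settings`, and `classify_or_error` are hypothetical names; treating the URL above as a default is also an assumption, and the real config/settings.py may be structured differently.

```python
import json
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MixpeekSettings:
    api_key: str
    base_url: str
    namespace: str


def load_settings(env=None):
    """Read the three optional variables, with assumed defaults."""
    env = os.environ if env is None else env
    return MixpeekSettings(
        api_key=env.get("MIXPEEK_API_KEY", ""),
        base_url=env.get("MIXPEEK_BASE_URL", "https://api.mixpeek.com"),
        namespace=env.get("MIXPEEK_NAMESPACE", ""),
    )


def classify_or_error(text, settings):
    """Return error JSON rather than raising when Mixpeek is unconfigured."""
    if not settings.api_key:
        return json.dumps({"error": "Mixpeek is not configured: MIXPEEK_API_KEY is unset"})
    # Placeholder only: a real tool would call the Mixpeek client here.
    return json.dumps({"status": "configured", "base_url": settings.base_url})
```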

Test plan

  • 25 unit tests pass (mocked)
  • 16 e2e tests pass against production Mixpeek API (real IAB classifications)
  • Brand-safety correctly flags gambling content, passes safe content
  • Auto-discovery finds IAB retrievers without explicit retriever_id
  • Error handling for invalid API keys and retriever IDs
  • Smoke test with full buyer server running (pytest tests/smoke/test_mcp_e2e.py)
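
For the auto-discovery item, one plausible selection rule is sketched below. The matching rule (the substring "iab" in the retriever name) and the `retriever_id`/`retriever_name` field names are assumptions; the PR does not describe how retrievers are actually matched.

```python
def discover_iab_retriever(retrievers, retriever_id=None):
    """Pick a retriever ID: an explicit ID wins, else the first IAB-looking one."""
    if retriever_id:
        return retriever_id
    for retriever in retrievers:
        # Assumed heuristic: an IAB retriever mentions "iab" in its name.
        if "iab" in retriever.get("retriever_name", "").lower():
            return retriever["retriever_id"]
    return None  # Caller is expected to surface an error when nothing matches.
```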

Add MixpeekClient and contextual enrichment tools that enable buyer
agents to classify content into IAB v3.0 taxonomy categories and
search indexed inventory via Mixpeek retriever pipelines.

New files:
- MixpeekClient: async HTTP client for Mixpeek content-intelligence API
- ClassifyContentTool: CrewAI tool for IAB taxonomy classification
- ContextualSearchTool: CrewAI tool for multimodal inventory search
- MCP tools: classify_content and contextual_search via @mcp.tool()
- Unit tests: 18 tests covering client and tool behavior

Configuration: MIXPEEK_API_KEY, MIXPEEK_BASE_URL, MIXPEEK_NAMESPACE
env vars (all optional, tools gracefully degrade when unconfigured).

Major improvements to the Mixpeek contextual enrichment integration:

- Rearchitect classify_content to use retriever-based IAB classification
  (semantic search against the IAB category corpus) instead of the batch
  taxonomy execute endpoint, which returns empty results for real-time queries
- Add check_brand_safety method and BrandSafetyTool that flags sensitive
  IAB categories (gambling, adult, etc.) with risk levels
- Add auto-discovery of IAB retrievers when no retriever_id is specified
- Wire ClassifyContentTool, BrandSafetyTool, ContextualSearchTool into
  the research crew (channel_crews.py) — tools activate when
  MIXPEEK_API_KEY is configured
- Register check_brand_safety as MCP tool alongside classify_content
  and contextual_search
- Add 16 e2e tests hitting production Mixpeek API (golden_adtech_iab
  namespace with 700+ IAB category documents) verifying:
  - Sports content → "American Football" (score > 0.80)
  - Automotive content → "Automotive" hierarchy (score > 0.80)
  - Food content → "Food & Drink" / "Cooking"
  - Tech content → "Artificial Intelligence" (score > 0.85)
  - Gambling content flagged as brand-unsafe (high risk)
  - Safe content not flagged
  - Threshold filtering works correctly
  - Error handling for invalid keys and retriever IDs

All 41 tests pass (25 unit + 16 e2e against production).