feat: Mixpeek contextual enrichment for IAB taxonomy classification by esteininger · Pull Request #78 · IABTechLab/buyer-agent

esteininger · 2026-03-30T19:12:15Z

Summary

Adds Mixpeek contextual enrichment capabilities to the buyer agent, enabling:

IAB v3.0 content classification — classify ad page text into standardized IAB taxonomy categories (Sports > American Football, Automotive > Luxury Cars, etc.) with confidence scores using semantic search against an IAB category reference corpus
Brand-safety scoring — flag content matching sensitive IAB categories (gambling, adult, etc.) with risk levels (low/medium/high) and flagged category details
Contextual inventory search — search indexed ad inventory via Mixpeek retriever pipelines combining multimodal search, taxonomy enrichment, and reranking
CrewAI crew integration — tools auto-activate in the research crew when MIXPEEK_API_KEY is configured

What's included

Layer	Files	Description
Client	clients/mixpeek_client.py	Async HTTP client: classify_content, check_brand_safety, search_content, list_retrievers, list_taxonomies
CrewAI Tools	tools/research/contextual_enrichment.py	ClassifyContentTool, BrandSafetyTool, ContextualSearchTool with auto-discovery of IAB retrievers
MCP Tools	interfaces/mcp_server.py	classify_content, check_brand_safety, contextual_search via @mcp.tool()
Crew Wiring	crews/channel_crews.py	Tools added to _create_research_tools() when MIXPEEK_API_KEY is set
Config	config/settings.py, .env.example	MIXPEEK_API_KEY, MIXPEEK_BASE_URL, MIXPEEK_NAMESPACE
Unit Tests	tests/unit/test_mixpeek_client.py, test_contextual_enrichment.py	25 tests covering client, brand safety, and tool behavior
E2E Tests	tests/e2e/test_mixpeek_production.py	16 tests hitting production Mixpeek API with real IAB data

Production E2E verification

All e2e tests run against the production Mixpeek API (golden_adtech_iab namespace with 700+ IAB category documents across 4 tiers). Verified:

NFL content classifies as Sports > American Football (score 0.87)
Luxury car content classifies as Automotive hierarchy (score 0.84)
Cooking content classifies as Food & Drink > Cooking (score 0.84)
AI content classifies as Technology & Computing > Artificial Intelligence (score 0.90)
Gambling content flagged as brand-unsafe (Poker and Professional Gambling, score 0.88)
Safe sports content correctly returns safe=true, risk_level=low
Brand-safety threshold filtering works (higher threshold = fewer matches)
Invalid retriever IDs and API keys return proper errors
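
The threshold behaviour verified above (a higher threshold yields fewer matches) can be illustrated with a minimal sketch. The match shape (`category`, `score`), the function name `filter_by_threshold`, and every entry other than the 0.88 gambling match are assumptions made for illustration, not the PR's actual code.

```python
from typing import TypedDict


class Match(TypedDict):
    category: str  # IAB category name (assumed field name)
    score: float   # similarity score in [0, 1] (assumed field name)


def filter_by_threshold(matches: list[Match], threshold: float) -> list[Match]:
    """Keep only matches whose score meets or exceeds the threshold."""
    return [m for m in matches if m["score"] >= threshold]


# Illustrative data: only the 0.88 gambling score comes from the PR description.
matches: list[Match] = [
    {"category": "Poker and Professional Gambling", "score": 0.88},
    {"category": "Casinos", "score": 0.71},     # hypothetical entry
    {"category": "Card Games", "score": 0.55},  # hypothetical entry
]

loose = filter_by_threshold(matches, 0.5)   # permissive threshold
strict = filter_by_threshold(matches, 0.8)  # stricter threshold
print(len(loose), len(strict))
```

Because the filter is a simple `>=` comparison, raising the threshold can only shrink the result set, which is the property the e2e test checks.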

How it works

Classification: Uses a Mixpeek retriever pipeline that performs semantic search against an IAB v3.0 category reference corpus. Each result is an IAB category with hierarchical path and confidence score.
Brand safety: Classifies content then checks matches against a curated list of sensitive IAB categories (gambling, adult, terrorism, etc.). Returns safe/unsafe verdict with risk level.
Auto-discovery: When no retriever_id is specified, tools automatically discover IAB retrievers in the configured namespace.
Graceful degradation: All tools return error JSON (not exceptions) when Mixpeek is unconfigured. Crew tools only activate when MIXPEEK_API_KEY is set.
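
As a rough sketch of the brand-safety step described above, the verdict can be derived from classification matches as follows. The keyword list, the default threshold, the 0.85 cutoff between medium and high risk, and the result field names are all assumptions; the PR description does not show the curated list or the exact risk mapping.

```python
# Hypothetical keyword list; the PR's curated sensitive-category list is not shown.
SENSITIVE_KEYWORDS = ("gambling", "adult", "terrorism")


def brand_safety_verdict(matches, threshold=0.75):
    """Return a safe/unsafe verdict from classification matches.

    `matches` is a list of {"category": str, "score": float} dicts (assumed shape).
    """
    # Flag matches that clear the threshold and name a sensitive category.
    flagged = [
        m for m in matches
        if m["score"] >= threshold
        and any(k in m["category"].lower() for k in SENSITIVE_KEYWORDS)
    ]
    if not flagged:
        risk = "low"
    elif max(m["score"] for m in flagged) >= 0.85:  # assumed cutoff for "high"
        risk = "high"
    else:
        risk = "medium"
    return {"safe": not flagged, "risk_level": risk, "flagged_categories": flagged}


gambling = [{"category": "Poker and Professional Gambling", "score": 0.88}]
sports = [{"category": "American Football", "score": 0.87}]
print(brand_safety_verdict(gambling))
print(brand_safety_verdict(sports))
```

Substring matching is used here only to keep the sketch short; matching on stable IAB category IDs would avoid false positives such as an education category whose name happens to contain "adult".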

Configuration

All optional. Set in .env:

MIXPEEK_API_KEY=        # Mixpeek API key
MIXPEEK_BASE_URL=https://api.mixpeek.com
MIXPEEK_NAMESPACE=      # Namespace with IAB data
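
One way the optional configuration and the graceful-degradation behaviour could fit together is sketched below. The helper names `load_mixpeek_config` and `classify_or_error`, the `error` key, and the message text are hypothetical; only the three environment variable names and the default base URL come from this PR.

```python
import json
import os
from typing import Mapping, Optional


def load_mixpeek_config(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return Mixpeek settings, or None when no API key is set."""
    api_key = env.get("MIXPEEK_API_KEY", "").strip()
    if not api_key:
        return None
    return {
        "api_key": api_key,
        "base_url": env.get("MIXPEEK_BASE_URL") or "https://api.mixpeek.com",
        "namespace": env.get("MIXPEEK_NAMESPACE") or None,
    }


def classify_or_error(text: str, env: Mapping[str, str] = os.environ) -> str:
    """Return error JSON instead of raising when Mixpeek is unconfigured."""
    config = load_mixpeek_config(env)
    if config is None:
        return json.dumps({"error": "Mixpeek is not configured: set MIXPEEK_API_KEY"})
    # A real tool would call the Mixpeek client here; omitted in this sketch.
    return json.dumps({"status": "configured", "namespace": config["namespace"]})


print(classify_or_error("NFL playoff preview", env={}))
```

Passing the environment as a mapping keeps the sketch easy to test without mutating `os.environ`.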

Test plan

25 unit tests pass (mocked)
16 e2e tests pass against production Mixpeek API (real IAB classifications)
Brand-safety correctly flags gambling content, passes safe content
Auto-discovery finds IAB retrievers without explicit retriever_id
Error handling for invalid API keys and retriever IDs
Smoke test with full buyer server running (pytest tests/smoke/test_mcp_e2e.py)
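
The auto-discovery item in the test plan can be pictured with the following sketch, which picks the first retriever whose name mentions IAB. The response shape (`name`, `retriever_id`), the matching rule, and the sample IDs are assumptions for illustration; the PR description does not state how the tools select a retriever.

```python
from typing import Optional


def discover_iab_retriever(retrievers: list[dict]) -> Optional[str]:
    """Return the ID of the first retriever whose name contains 'iab', if any."""
    for retriever in retrievers:
        name = retriever.get("name", "").lower()
        if "iab" in name:
            return retriever.get("retriever_id")
    return None


# Hypothetical listing, standing in for whatever list_retrievers returns.
available = [
    {"name": "product-search", "retriever_id": "ret_001"},
    {"name": "IAB Text Classifier", "retriever_id": "ret_002"},
]
print(discover_iab_retriever(available))
```

Returning `None` rather than raising lets a caller fall back to error JSON when no IAB retriever exists in the namespace.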

Add MixpeekClient and contextual enrichment tools that enable buyer agents to classify content into IAB v3.0 taxonomy categories and search indexed inventory via Mixpeek retriever pipelines.

New files:
- MixpeekClient: async HTTP client for Mixpeek content-intelligence API
- ClassifyContentTool: CrewAI tool for IAB taxonomy classification
- ContextualSearchTool: CrewAI tool for multimodal inventory search
- MCP tools: classify_content and contextual_search via @mcp.tool()
- Unit tests: 18 tests covering client and tool behavior

Configuration: MIXPEEK_API_KEY, MIXPEEK_BASE_URL, MIXPEEK_NAMESPACE env vars (all optional, tools gracefully degrade when unconfigured).

Major improvements to the Mixpeek contextual enrichment integration:
- Rearchitect classify_content to use retriever-based IAB classification (semantic search against IAB category corpus) instead of batch taxonomy execute endpoint which returns empty for real-time queries
- Add check_brand_safety method and BrandSafetyTool that flags sensitive IAB categories (gambling, adult, etc.) with risk levels
- Add auto-discovery of IAB retrievers when no retriever_id is specified
- Wire ClassifyContentTool, BrandSafetyTool, ContextualSearchTool into the research crew (channel_crews.py); tools activate when MIXPEEK_API_KEY is configured
- Register check_brand_safety as MCP tool alongside classify_content and contextual_search
- Add 16 e2e tests hitting production Mixpeek API (golden_adtech_iab namespace with 700+ IAB category documents) verifying:
  - Sports content → "American Football" (score > 0.80)
  - Automotive content → "Automotive" hierarchy (score > 0.80)
  - Food content → "Food & Drink" / "Cooking"
  - Tech content → "Artificial Intelligence" (score > 0.85)
  - Gambling content flagged as brand-unsafe (high risk)
  - Safe content not flagged
  - Threshold filtering works correctly
  - Error handling for invalid keys and retriever IDs

All 41 tests pass (25 unit + 16 e2e against production).

esteininger added 2 commits March 30, 2026 15:11

esteininger wants to merge 2 commits into IABTechLab:main from esteininger:feat/mixpeek-contextual-enrichment
