feat: Mixpeek contextual enrichment for IAB taxonomy classification #78
Open
esteininger wants to merge 2 commits into IABTechLab:main from
Add MixpeekClient and contextual enrichment tools that enable buyer agents to classify content into IAB v3.0 taxonomy categories and search indexed inventory via Mixpeek retriever pipelines.

New files:
- MixpeekClient: async HTTP client for the Mixpeek content-intelligence API
- ClassifyContentTool: CrewAI tool for IAB taxonomy classification
- ContextualSearchTool: CrewAI tool for multimodal inventory search
- MCP tools: classify_content and contextual_search via @mcp.tool()
- Unit tests: 18 tests covering client and tool behavior

Configuration: MIXPEEK_API_KEY, MIXPEEK_BASE_URL, MIXPEEK_NAMESPACE env vars (all optional; tools gracefully degrade when unconfigured).
Major improvements to the Mixpeek contextual enrichment integration:
- Rearchitect classify_content to use retriever-based IAB classification (semantic search against an IAB category corpus) instead of the batch taxonomy execute endpoint, which returns empty results for real-time queries
- Add check_brand_safety method and BrandSafetyTool, which flag sensitive IAB categories (gambling, adult, etc.) with risk levels
- Add auto-discovery of IAB retrievers when no retriever_id is specified
- Wire ClassifyContentTool, BrandSafetyTool, and ContextualSearchTool into the research crew (channel_crews.py); the tools activate when MIXPEEK_API_KEY is configured
- Register check_brand_safety as an MCP tool alongside classify_content and contextual_search
- Add 16 e2e tests hitting the production Mixpeek API (golden_adtech_iab namespace with 700+ IAB category documents) verifying:
  - Sports content → "American Football" (score > 0.80)
  - Automotive content → "Automotive" hierarchy (score > 0.80)
  - Food content → "Food & Drink" / "Cooking"
  - Tech content → "Artificial Intelligence" (score > 0.85)
  - Gambling content flagged as brand-unsafe (high risk)
  - Safe content not flagged
  - Threshold filtering works correctly
  - Error handling for invalid keys and retriever IDs

All 41 tests pass (25 unit + 16 e2e against production).
Summary
Adds Mixpeek contextual enrichment capabilities to the buyer agent, enabling:
What's included
Production E2E verification
All e2e tests run against the production Mixpeek API (golden_adtech_iab namespace with 700+ IAB category documents across 4 tiers). Verified:
How it works
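As a rough illustration of the post-processing the commit message describes (threshold filtering of retriever matches, then flagging sensitive categories with a risk level), here is a minimal sketch. It is not the PR's code: `Match`, `filter_by_threshold`, `flag_sensitive`, and the `SENSITIVE_TERMS` table are hypothetical names, the substring matching rule is an assumption, and only the "gambling is high risk" entry is grounded in the PR text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Match:
    """One retriever hit: an IAB category name and its similarity score (hypothetical shape)."""
    category: str
    score: float


# Assumed mapping of sensitive terms to risk levels. The PR states that gambling
# content is flagged as high risk; the "adult" level is an assumption.
SENSITIVE_TERMS: Dict[str, str] = {"gambling": "high", "adult": "high"}


def filter_by_threshold(matches: Iterable[Match], threshold: float) -> List[Match]:
    """Keep matches scoring at or above the threshold, best first."""
    kept = [m for m in matches if m.score >= threshold]
    return sorted(kept, key=lambda m: m.score, reverse=True)


def flag_sensitive(matches: Iterable[Match]) -> List[Tuple[str, str]]:
    """Return (category, risk_level) for each match whose name contains a sensitive term."""
    flags = []
    for m in matches:
        name = m.category.lower()
        for term, risk in SENSITIVE_TERMS.items():
            if term in name:
                flags.append((m.category, risk))
                break  # one flag per category is enough
    return flags


if __name__ == "__main__":
    hits = [Match("Casinos & Gambling", 0.90), Match("Cooking", 0.85), Match("Poker", 0.40)]
    confident = filter_by_threshold(hits, threshold=0.80)
    print([m.category for m in confident])
    print(flag_sensitive(confident))
```

Filtering before flagging means a low-confidence match cannot mark content as unsafe on its own; whether the PR applies the two steps in that order is not stated.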
Configuration
All optional. Set in .env:
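The commit message names three variables: MIXPEEK_API_KEY, MIXPEEK_BASE_URL, and MIXPEEK_NAMESPACE. A minimal sketch of how they might be read, with the tools treated as disabled when no API key is present, could look like the following. `MixpeekSettings` and `load_mixpeek_settings` are hypothetical names, not identifiers from the PR, and no default base URL is assumed.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MixpeekSettings:
    """Hypothetical container for the three optional env vars."""
    api_key: Optional[str]
    base_url: Optional[str]
    namespace: Optional[str]

    @property
    def enabled(self) -> bool:
        # Per the PR, tools activate only when MIXPEEK_API_KEY is configured.
        return self.api_key is not None


def load_mixpeek_settings(env: Mapping[str, str] = os.environ) -> MixpeekSettings:
    """Read the settings from a mapping, treating blank values as unset."""
    def get(name: str) -> Optional[str]:
        value = env.get(name, "").strip()
        return value or None

    return MixpeekSettings(
        api_key=get("MIXPEEK_API_KEY"),
        base_url=get("MIXPEEK_BASE_URL"),
        namespace=get("MIXPEEK_NAMESPACE"),
    )


if __name__ == "__main__":
    settings = load_mixpeek_settings()
    if not settings.enabled:
        print("Mixpeek tools disabled: MIXPEEK_API_KEY is not set")
```

Passing the environment as a mapping keeps the loader easy to unit test without touching the real process environment.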
Test plan