Added Conversation Compression System #1343
Conversation
tim-inkeep commented on Dec 19, 2025
- New compression services: Add ConversationCompressor and update conversation history handling to use getConversationHistoryWithCompression() (see the sketch after this list)
- Code organization: Extract generation logic into helper methods (setupGenerationContext, loadToolsAndPrompts, buildConversationHistory, etc.) to reduce complexity in main generateResponse method
- JSON post-processing: Add withJsonPostProcessing utility for better response handling
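As a rough illustration of the first bullet, here is a hypothetical call site for the new compression-aware history helper. Its real signature is not shown in this description, so the options object below is an assumption modeled on the parameters passed to compressConversationIfNeeded later in this PR.

```typescript
// Hypothetical call site; the option names mirror parameters used elsewhere
// in this PR and are not a confirmed signature.
const history = await getConversationHistoryWithCompression({
  conversationId,
  tenantId,
  projectId,
  summarizerModel,
});
```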
🦋 Changeset detected. Latest commit: 9e5ecb5. The changes in this PR will be included in the next version bump. This PR includes changesets to release 9 packages.
Claude finished @tim-inkeep's task

Code Review: Conversation Compression System

Having conducted a comprehensive analysis of this conversation compression system PR, I'll provide feedback focusing on the most critical architectural and engineering considerations.
Architecture & System Design Review

✅ Strong Architectural Decisions

1. Inheritance-Based Compression Hierarchy (sketched below)
2. Clean Integration with Existing Message Flow
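To make item 1 concrete, here is a minimal sketch of what the hierarchy might look like, reconstructed only from the identifiers visible in this review (BaseCompressor, ConversationCompressor, processedToolCalls, cumulativeSummary, estimateTokens). The placeholder types and the method body are assumptions, not the PR's actual code.

```typescript
// Sketch only: placeholder types stand in for the PR's real interfaces.
type ConversationSummary = { text: string };
type CompressionResult = { messages: unknown[]; summary: ConversationSummary | null };

abstract class BaseCompressor {
  protected processedToolCalls = new Set<string>();
  protected cumulativeSummary: ConversationSummary | null = null;

  // Rough character-based token estimate, as in the snippet reviewed below.
  protected estimateTokens(content: unknown): number {
    const text = typeof content === 'string' ? content : JSON.stringify(content);
    return Math.ceil(text.length / 4);
  }

  abstract compress(messages: unknown[]): Promise<CompressionResult>;
}

class ConversationCompressor extends BaseCompressor {
  async compress(messages: unknown[]): Promise<CompressionResult> {
    // A real implementation would summarize older turns and persist tool
    // results as artifacts; this sketch just passes the history through.
    return { messages, summary: this.cumulativeSummary };
  }
}
```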
🚨 Critical Areas for Improvement

1. Memory Management & Potential Memory Leaks

```typescript
// BaseCompressor.ts:36-37
protected processedToolCalls = new Set<string>();
protected cumulativeSummary: ConversationSummary | null = null;
```

Issue: These class-level sets and objects will grow indefinitely during long-running conversations. There's no cleanup mechanism.

Recommendation: Implement periodic cleanup based on message age or a TTL mechanism:
```typescript
// Add to BaseCompressor
private cleanup() {
  // Cap how many processed tool-call IDs are retained; a simple stand-in for a
  // TTL so the set cannot grow without bound during long conversations.
  this.processedToolCalls = new Set(Array.from(this.processedToolCalls).slice(-100));
}
```

2. Race Conditions in Compression Logic

```typescript
// conversations.ts:500-506
const newMessagesCompressed = await compressConversationIfNeeded(messagesAfterCompression, {
  conversationId, tenantId, projectId, summarizerModel, streamRequestId,
});
```

Issue: Multiple concurrent requests could trigger compression simultaneously for the same conversation, potentially creating duplicate compression summaries.

Recommendation: Implement conversation-level locking or atomic compression checks using database constraints.
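One possible shape for that guard, within a single process, is a per-conversation mutex. This is only a sketch with a hypothetical helper name; it does not protect across multiple server instances, where a database-level constraint or advisory lock would still be needed.

```typescript
// Hypothetical helper: serialize compression runs per conversation in-process.
const compressionLocks = new Map<string, Promise<unknown>>();

async function withConversationLock<T>(
  conversationId: string,
  task: () => Promise<T>
): Promise<T> {
  // Chain onto any compression already in flight for this conversation.
  const previous = compressionLocks.get(conversationId) ?? Promise.resolve();
  const run = previous.catch(() => undefined).then(task);
  compressionLocks.set(conversationId, run);
  try {
    return await run;
  } finally {
    // Only clear the entry if no newer run has replaced it.
    if (compressionLocks.get(conversationId) === run) {
      compressionLocks.delete(conversationId);
    }
  }
}
```

The call shown above could then be wrapped as `withConversationLock(conversationId, () => compressConversationIfNeeded(messagesAfterCompression, { ... }))`.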
3. Token Estimation Accuracy

```typescript
// BaseCompressor.ts:58-62
protected estimateTokens(content: any): number {
  const text = typeof content === 'string' ? content : JSON.stringify(content);
  return Math.ceil(text.length / 4);
}
```

Issue: The 4:1 character-to-token ratio is a rough approximation that can be significantly inaccurate, especially for code, JSON, or multilingual content. This could lead to under/over-compression.

Recommendation: Use the actual model's tokenizer when available, or implement model-specific estimation:

```typescript
// Consider using tiktoken for OpenAI models or similar libraries for other providers
protected estimateTokens(content: any, model?: string): number {
  const text = typeof content === 'string' ? content : JSON.stringify(content);
  // Use the actual tokenizer when available
  if (this.baseModel?.provider === 'openai' && tiktoken) {
    return tiktoken.encode(text).length;
  }
  // Fall back to model-specific ratios
  return this.getModelSpecificEstimate(text, model);
}
```

4. Error Boundary & Fallback Handling

Recommendation: Add fallback mechanisms:

```typescript
async compress(messages: any[]): Promise<CompressionResult> {
  try {
    return await this.performCompression(messages);
  } catch (error) {
    logger.error({ error }, 'Compression failed, falling back to simple truncation');
    return this.fallbackToSimpleCompression(messages);
  }
}
```
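The fallbackToSimpleCompression referenced above is not part of this PR. One minimal sketch: drop the oldest messages until the remainder fits a rough token budget. The maxContextTokens field and the exact CompressionResult shape are assumptions here.

```typescript
// Sketch: keep the newest messages that fit under a rough token budget.
private fallbackToSimpleCompression(messages: any[]): CompressionResult {
  const budget = this.maxContextTokens ?? 8000; // assumed field and default
  const kept: any[] = [];
  let used = 0;
  for (const message of [...messages].reverse()) {
    const cost = this.estimateTokens(message);
    if (used + cost > budget) break;
    kept.unshift(message);
    used += cost;
  }
  return { messages: kept, summary: this.cumulativeSummary } as CompressionResult;
}
```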
🏗️ Code Organization & Maintainability

1. Agent.ts Refactoring Quality

Recommendation: Consider further extraction:

```typescript
// Extract phase execution into separate methods
private async executePhase1(config: Phase1Config): Promise<GenerationResult>
private async executePhase2(config: Phase2Config): Promise<StructuredResult>
```

2. Tool Result Artifact Creation Logic (BaseCompressor.ts:112-325, saveToolResultsAsArtifacts)

This method is doing too many things: format conversion, filtering, deduplication, and artifact creation. It's 200+ lines and violates the single responsibility principle.

Recommendation: Break into smaller, focused methods:

```typescript
private async processMessage(message: any): Promise<ToolCallArtifact[]>
private async createArtifactForToolCall(toolCall: ToolCall): Promise<string>
private shouldSkipToolCall(toolName: string): boolean
```
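To show how those pieces could fit back together, here is a rough orchestration of the existing method in terms of the suggested helpers; the return type is an assumption and the helper bodies are elided.

```typescript
// Sketch: the top-level method becomes a thin loop over per-message processing.
private async saveToolResultsAsArtifacts(messages: any[]): Promise<ToolCallArtifact[]> {
  const artifacts: ToolCallArtifact[] = [];
  for (const message of messages) {
    // processMessage would handle format conversion, filtering (via
    // shouldSkipToolCall), deduplication, and artifact creation
    // (via createArtifactForToolCall) for a single message.
    artifacts.push(...(await this.processMessage(message)));
  }
  return artifacts;
}
```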
🔒 Security & Data Handling

1. Tool Result Content Sanitization (BaseCompressor.ts:436-456, removeStructureHints)

While structure hints are removed, there's no validation for potentially sensitive content in tool results before compression.

Recommendation: Add content sanitization before compression:

```typescript
private sanitizeToolResult(result: any): any {
  // Remove potential secrets, API keys, etc.
  return this.recursivelyCleanSensitiveData(result);
}
```
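recursivelyCleanSensitiveData is not defined in this PR; a minimal sketch of what such a pass could look like, shown as a standalone function for brevity. The key patterns are illustrative only, not an exhaustive definition of "sensitive".

```typescript
// Hypothetical helper: recursively replace values whose keys look sensitive.
const SENSITIVE_KEY = /(api[-_]?key|secret|token|password|authorization)/i;

function recursivelyCleanSensitiveData(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(recursivelyCleanSensitiveData);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value as Record<string, unknown>).map(([key, child]) =>
        SENSITIVE_KEY.test(key) ? [key, '[REDACTED]'] : [key, recursivelyCleanSensitiveData(child)]
      )
    );
  }
  return value;
}
```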
2. Artifact Metadata Consistency

⚡ Performance Considerations

1. N+1 Query Pattern

```typescript
// BaseCompressor.ts:195-199
const existingArtifacts = await getLedgerArtifacts(dbClient)({
  scopes: { tenantId: this.tenantId, projectId: this.projectId },
  toolCallId: block.toolCallId,
});
```

This creates individual database queries for each tool call, leading to N+1 performance issues.

Recommendation: Batch the artifact lookups:

```typescript
// Collect all toolCallIds first, then query in batch
const toolCallIds = extractToolCallIds(messages);
const existingArtifacts = await getLedgerArtifactsBatch(dbClient)({
  scopes: { tenantId: this.tenantId, projectId: this.projectId },
  toolCallIds,
});
```

2. Context Size Calculation Efficiency

📋 Testing & Documentation Gaps

1. Missing Edge Case Tests
2. Documentation for Model-Specific Behavior

🔄 API Contracts & Backwards Compatibility

1. Message Format Evolution

Recommendation: Add database migration for existing conversations:

```sql
-- Add migration to handle existing conversations without compression metadata
ALTER TABLE conversations ADD COLUMN compression_version INTEGER DEFAULT 1;
```

🎯 Specific Recommendations for Implementation
📊 Overall Assessment

This is a well-architected feature that addresses a real need for conversation context management. The inheritance hierarchy and integration approach are sound. However, the implementation has several production-readiness concerns around memory management, race conditions, and error handling that should be addressed before merge. The refactoring of Agent.ts improves maintainability, and the compression system provides a solid foundation for scaling to longer conversations.

Severity Level: Medium-High. The core functionality is solid, but the production reliability concerns (memory leaks, race conditions, error handling) need to be addressed to ensure system stability under load.