Summary
Evaluate implementing Just-in-Time (JIT) context loading to fetch relevant context only when immediately needed, rather than pre-loading based on tier classification.
Background: State of the Art
Philipp Schmid's context engineering framework emphasizes "just-in-time" loading as a key optimization. From his practical tips:
"Instead of pre-loading all data (traditional RAG), use just-in-time strategies."
The core insight: pre-loading context based on predicted relevance introduces two problems:
- Latency penalty - loading context that may never be used
- Relevance decay - context loaded early may be stale by the time it's needed
JIT loading means the agent requests specific context at the moment a tool call or decision requires it, ensuring maximum relevance and minimum waste.
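As a minimal sketch of this idea (all names here are illustrative, not CodeFRAME's actual API): nothing is loaded up front; a context item is fetched from cold storage only at the moment the agent asks for it, and a per-turn cache avoids repeated fetches.

```python
from dataclasses import dataclass, field


@dataclass
class JITContextStore:
    """Illustrative JIT loader: context is fetched only on explicit request."""
    cold: dict                                   # key -> context payload (stand-in for COLD storage)
    loaded: dict = field(default_factory=dict)   # items fetched so far this turn

    def request(self, key: str) -> str:
        # Fetch at the moment of need; nothing was pre-loaded.
        if key not in self.loaded:
            self.loaded[key] = self.cold[key]
        return self.loaded[key]
```

The point of the sketch is the invariant: before the first `request`, `loaded` is empty, so every token in the window was explicitly asked for.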
Current State in CodeFRAME
The tiered HOT/WARM/COLD memory system assigns importance scores and manages context retention. However, it's unclear whether:
- Context is loaded proactively based on tier (pre-loading)
- Context is fetched on-demand when the LLM signals need (JIT)
- There's a hybrid approach
The flash_save mechanism handles persistence, but the loading strategy needs examination.
Investigation Tasks
- Audit current loading behavior
- Trace when context moves from COLD → WARM → HOT
- Identify if loading is triggered by tier promotion rules or by explicit LLM requests
- Measure how often loaded context is actually used in subsequent calls
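One way to run this audit (a sketch under assumed interfaces, not existing CodeFRAME instrumentation): log each tier promotion with its trigger, mark items that are later referenced, and compute what fraction of promoted context was ever used.

```python
import time


class TierAuditLog:
    """Illustrative audit log for COLD -> WARM -> HOT promotions."""

    def __init__(self):
        self.events = []

    def record_promotion(self, key, src, dst, trigger):
        # trigger: "tier_rule" (pre-loading) vs. "llm_request" (JIT) -- assumed labels
        self.events.append({"key": key, "from": src, "to": dst,
                            "trigger": trigger, "used": False, "t": time.time()})

    def mark_used(self, key):
        # Called when a subsequent LLM call actually references this context.
        for event in self.events:
            if event["key"] == key:
                event["used"] = True

    def utilization(self):
        # Fraction of promoted context that was ever referenced.
        if not self.events:
            return 0.0
        return sum(event["used"] for event in self.events) / len(self.events)
```

A low utilization number for `trigger == "tier_rule"` events would be direct evidence that tier-driven pre-loading is wasting tokens.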
- Benchmark current approach
- Track "context utilization rate" (loaded tokens vs. tokens referenced in responses)
- Measure time-to-first-token with current loading strategy
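The utilization metric above could be computed roughly as follows (a toy definition; real token accounting would use the model's tokenizer rather than substring matching):

```python
def context_utilization(loaded_tokens: list[str], response_text: str) -> float:
    """Fraction of loaded tokens that appear in the response (toy metric)."""
    if not loaded_tokens:
        return 0.0
    referenced = sum(1 for tok in loaded_tokens if tok in response_text)
    return referenced / len(loaded_tokens)
```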
- Design JIT loading mechanism
- Define tool/function for agents to request specific context categories
- Implement lazy loading from COLD storage
- Consider prefetch hints based on task type (e.g., "test task" prefetches test history)
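The three design pieces above might fit together like this (everything here is an assumed shape: the tool name, category names, and hint table are placeholders, not an existing CodeFRAME schema):

```python
# Hypothetical function-calling schema the agent could use to request
# a context category on demand instead of receiving it pre-loaded.
REQUEST_CONTEXT_TOOL = {
    "name": "request_context",
    "description": "Fetch a specific context category from COLD storage.",
    "parameters": {
        "type": "object",
        "properties": {
            "category": {
                "type": "string",
                "enum": ["test_history", "file_summaries", "task_log"],
            },
        },
        "required": ["category"],
    },
}

# Prefetch hints: task type -> categories likely needed (assumed mapping).
PREFETCH_HINTS = {
    "test": ["test_history"],
    "refactor": ["file_summaries"],
}


def prefetch_for(task_type: str) -> list[str]:
    """Return categories worth warming before the task starts; empty if unknown."""
    return PREFETCH_HINTS.get(task_type, [])
```

The hint table keeps the latency benefit of pre-loading for predictable cases while leaving everything else to explicit `request_context` calls.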
- Prototype and compare
- A/B test pre-load vs. JIT for representative tasks
- Measure token efficiency, latency, and task success rate
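A comparison harness for the A/B test could summarize per-strategy runs like this (the result-record shape is an assumption):

```python
def compare_strategies(results: dict[str, list[dict]]) -> dict[str, dict]:
    """Summarize A/B runs; each run dict has 'tokens', 'latency_s', 'success'."""
    summary = {}
    for strategy, runs in results.items():
        n = len(runs)
        summary[strategy] = {
            "avg_tokens": sum(r["tokens"] for r in runs) / n,
            "avg_latency_s": sum(r["latency_s"] for r in runs) / n,
            "success_rate": sum(r["success"] for r in runs) / n,
        }
    return summary
```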
Success Criteria
- Documented understanding of current loading behavior
- Metrics showing context utilization rate
- If JIT shows benefit: implementation with measurable improvement in token efficiency
- If pre-loading is optimal: documented rationale for current approach
References
- Context Engineering - Philipp Schmid
- 12-Factor Agents - Own Your Context Window
- Manus data point: ~50 tool calls per task, context management critical