
[Future] Evaluate Just-in-Time Context Loading Strategy #62

@frankbria

Description

Summary

Evaluate implementing Just-in-Time (JIT) context loading to fetch relevant context only when immediately needed, rather than pre-loading based on tier classification.

Background: State of the Art

Philipp Schmid's context engineering framework emphasizes "just-in-time" loading as a key optimization. From his practical tips:

"Instead of pre-loading all data (traditional RAG), use just-in-time strategies."

The core insight: pre-loading context based on predicted relevance introduces two problems:

  1. Latency penalty - time and tokens spent loading context that may never be used
  2. Relevance decay - context loaded early may be stale by the time it is actually needed

JIT loading means the agent requests specific context at the moment a tool call or decision requires it, ensuring maximum relevance and minimum waste.
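The idea can be sketched in a few lines: expose a context-request tool to the agent and defer every real read until the tool is invoked. This is a minimal illustration, not CodeFRAME code — `ContextStore`, `request_context`, and the loader names are all hypothetical.

```python
# Minimal sketch of just-in-time context loading: nothing is read from
# storage until the agent explicitly requests a category. All names here
# (ContextStore, request_context) are hypothetical, not CodeFRAME APIs.
from typing import Callable

class ContextStore:
    """Lazily materializes context categories on first request."""

    def __init__(self, loaders: dict[str, Callable[[], str]]):
        self._loaders = loaders      # category -> expensive fetch function
        self._cache: dict[str, str] = {}
        self.fetch_count = 0         # how many real loads actually happened

    def request_context(self, category: str) -> str:
        """Tool the agent calls at the moment context is needed (JIT)."""
        if category not in self._cache:
            self.fetch_count += 1
            self._cache[category] = self._loaders[category]()
        return self._cache[category]

# Simulated "cold" storage: each lambda stands in for a slow read.
store = ContextStore({
    "test_history": lambda: "last run: 42 passed, 1 failed",
    "api_docs": lambda: "POST /v1/items creates an item",
})

# The agent touches only one category, so only one load is paid for.
print(store.request_context("test_history"))
print(store.fetch_count)  # 1 load, not 2
```

The cache also illustrates the trade-off to evaluate: repeated requests are cheap, but the first request for each category pays its latency inside the agent loop rather than up front.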

Current State in CodeFRAME

The tiered HOT/WARM/COLD memory system assigns importance scores and manages context retention. However, it's unclear whether:

  • Context is loaded proactively based on tier (pre-loading)
  • Context is fetched on-demand when the LLM signals need (JIT)
  • There's a hybrid approach

The flash_save mechanism handles persistence, but the loading strategy needs examination.
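The three possibilities above can be made concrete with a toy sketch. Tier names mirror the issue's HOT/WARM/COLD system; the data and functions are invented for illustration and do not reflect CodeFRAME internals.

```python
# Hedged sketch of the three loading strategies the audit should
# distinguish. Tier names mirror the issue; everything else is invented.

MEMORY = {
    "HOT": ["current task spec"],
    "WARM": ["recent diffs"],
    "COLD": ["old test history", "archived decisions"],
}

def preload(tiers=("HOT", "WARM")):
    """Pre-loading: everything in the chosen tiers enters the prompt up front."""
    return [item for t in tiers for item in MEMORY[t]]

def jit(requested):
    """JIT: only what the model explicitly asked for is fetched, any tier."""
    return [item for t in MEMORY for item in MEMORY[t] if item in requested]

def hybrid(requested):
    """Hybrid: HOT is always present; WARM/COLD arrive on demand."""
    return MEMORY["HOT"] + jit(requested)

print(preload())                      # 2 items regardless of need
print(jit({"old test history"}))      # exactly what was asked for
print(hybrid({"recent diffs"}))       # HOT baseline + one on-demand item
```

The audit's first question is essentially: which of these three shapes does the current loading path actually follow?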

Investigation Tasks

  1. Audit current loading behavior

    • Trace when context moves from COLD → WARM → HOT
    • Identify if loading is triggered by tier promotion rules or by explicit LLM requests
    • Measure how often loaded context is actually used in subsequent calls
  2. Benchmark current approach

    • Track "context utilization rate" (loaded tokens vs. tokens referenced in responses)
    • Measure time-to-first-token with current loading strategy
  3. Design JIT loading mechanism

    • Define tool/function for agents to request specific context categories
    • Implement lazy loading from COLD storage
    • Consider prefetch hints based on task type (e.g., "test task" prefetches test history)
  4. Prototype and compare

    • A/B test pre-load vs. JIT for representative tasks
    • Measure token efficiency, latency, and task success rate
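The "context utilization rate" from task 2 could be approximated as referenced tokens over loaded tokens. A crude sketch follows; whitespace tokenization and exact-substring reference detection are simplifying assumptions standing in for a real tokenizer and attribution method.

```python
# Crude sketch of the "context utilization rate" metric from task 2:
# fraction of loaded context tokens that the response actually referenced.
# Whitespace tokenization and exact-match reference detection are
# simplifying assumptions, not how a production tokenizer would work.

def utilization_rate(loaded_chunks: list[str], response: str) -> float:
    loaded_tokens = sum(len(c.split()) for c in loaded_chunks)
    if loaded_tokens == 0:
        return 0.0
    referenced = sum(
        len(c.split()) for c in loaded_chunks if c in response
    )
    return referenced / loaded_tokens

chunks = ["def flash_save(state)", "legacy migration notes from 2021"]
response = "I updated def flash_save(state) to persist the new tier field."
print(round(utilization_rate(chunks, response), 2))  # 0.29: 2 of 7 tokens used
```

A low rate across representative tasks would be the strongest evidence that pre-loading wastes tokens and that JIT is worth prototyping.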

Success Criteria

  • Documented understanding of current loading behavior
  • Metrics showing context utilization rate
  • If JIT shows benefit: implementation with measurable improvement in token efficiency
  • If pre-loading is optimal: documented rationale for current approach

Metadata


Labels

  • Future (Deferred - beyond v1/v2 scope, consider for future versions)
  • architecture (System architecture and design patterns)
  • context-engineering (Context window management and optimization)
  • enhancement (New feature or request)
  • priority:medium
