Summary
Audit and optimize context assembly order to maximize LLM provider cache hits, potentially achieving up to 4x cost reduction through prompt caching.
Background: State of the Art
From Philipp Schmid's 5 Practical Tips for Context Engineering:
"Context Ordering Matters: Try to use 'append-only' context, adding new information to the end. This maximizes cache hits reducing cost (4x) and latency."
LLM providers (Anthropic, OpenAI) implement prompt caching where repeated prefixes are cached. If your context window looks like:
[System Prompt] + [Project Context] + [Task History] + [Current Request]
And only [Current Request] changes between calls, the prefix can be cached. But if you reorder or modify earlier sections, the cache is invalidated.
Key principle: Static content first, dynamic content last.
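The layout above can be sketched as a request builder. This is a minimal sketch, assuming Anthropic's Messages API shape with a `cache_control` breakpoint on the last static block; the prompt text and segment contents are illustrative placeholders, not CodeFRAME's actual prompts.

```python
def build_request(task_history: list[dict], current_request: str) -> dict:
    """Assemble a prompt with a stable, cacheable prefix and a dynamic tail."""
    static_system = [
        {
            "type": "text",
            "text": "You are CodeFRAME's coding agent.",  # stable system prompt
        },
        {
            "type": "text",
            "text": "Project context: ...",  # stable project context
            # Cache breakpoint: everything up to and including this block
            # can be served from the provider's prompt cache on later calls.
            "cache_control": {"type": "ephemeral"},
        },
    ]
    # Dynamic content is only ever appended after the cached prefix,
    # never inserted before it, so the prefix stays byte-identical.
    messages = task_history + [{"role": "user", "content": current_request}]
    return {"system": static_system, "messages": messages}

req_a = build_request([], "Fix the failing test")
req_b = build_request([], "Refactor the parser")
# The static prefix is identical across calls; only the tail differs.
assert req_a["system"] == req_b["system"]
```

Because the cached prefix must match exactly, even a one-character change to the system prompt or project context invalidates it.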
Current State in CodeFRAME
The tiered memory system assembles context, but the ordering of that assembly is unclear:
- Is system prompt consistently first?
- Does tier promotion/demotion cause reordering?
- Are tool definitions stable in position?
- Is task-specific context appended at the end?
With Claude API's prompt caching (available since late 2024), improper ordering directly impacts costs.
Investigation Tasks
- Map current context assembly order
  - Document the exact sequence: system prompt → X → Y → Z → user message
  - Identify which components are static vs. dynamic per call
  - Check whether tier changes cause mid-context insertions
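One way to map the assembly order is to fingerprint each named context segment per call, as a minimal sketch; the segment names here are illustrative and would map to CodeFRAME's real tiers.

```python
import hashlib

def fingerprint_segments(segments: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Hash each named context segment so consecutive calls can be compared.

    `segments` is an ordered list of (name, text) pairs, e.g.
    [("system", ...), ("project_context", ...), ("user", ...)].
    Identical hashes across calls mark the stable, cacheable region.
    """
    return [(name, hashlib.sha256(text.encode()).hexdigest()[:12])
            for name, text in segments]

call_1 = fingerprint_segments([("system", "prompt v1"), ("user", "task A")])
call_2 = fingerprint_segments([("system", "prompt v1"), ("user", "task B")])
# Stable segments hash identically; only the trailing segment differs.
assert call_1[0] == call_2[0] and call_1[1] != call_2[1]
```

Logging these fingerprints over a session makes mid-context insertions visible as a hash change anywhere before the tail.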
- Identify cache-breaking patterns
  - Log context assembly across multiple agent calls in a session
  - Diff consecutive prompts to find what changes, and where
  - Quantify how much of the prefix is stable vs. changing
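Quantifying the stable prefix can be as simple as measuring the longest common prefix of consecutive serialized prompts; a sketch (prompt strings here are dummies):

```python
def stable_prefix_ratio(prev: str, curr: str) -> float:
    """Fraction of the current prompt that matches the previous prompt's prefix."""
    n = 0
    for a, b in zip(prev, curr):
        if a != b:
            break
        n += 1
    return n / len(curr) if curr else 1.0

prev_prompt = "SYSTEM...CONTEXT...task A"
curr_prompt = "SYSTEM...CONTEXT...task B"
# Most of the prompt is a stable, cacheable prefix; only the task differs.
assert stable_prefix_ratio(prev_prompt, curr_prompt) > 0.9
```

A ratio far below 1.0 on calls that share the same system prompt is a strong signal of a cache-breaking reorder or mid-context edit.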
- Implement append-only assembly
  - Restructure the context builder to enforce: static_prefix + append_only_dynamic
  - Move all changing content to the end of the context
  - Ensure tool definitions don't shift position
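The enforcement above could look like the following sketch: a builder that freezes the static prefix and rejects mutations to it. Class and method names are hypothetical, not CodeFRAME's actual API.

```python
class AppendOnlyContextBuilder:
    """Context builder that enforces static-prefix-then-append-only."""

    def __init__(self, static_prefix: list[str]):
        # System prompt, tool definitions, project context, etc.
        self._segments = list(static_prefix)
        self._frozen_len = len(self._segments)

    def append(self, segment: str) -> None:
        """Dynamic content may only be added at the end."""
        self._segments.append(segment)

    def replace(self, index: int, segment: str) -> None:
        """Editing the frozen prefix would invalidate the provider cache."""
        if index < self._frozen_len:
            raise ValueError("cannot modify the frozen static prefix")
        self._segments[index] = segment

    def render(self) -> str:
        return "\n".join(self._segments)

builder = AppendOnlyContextBuilder(["SYSTEM PROMPT", "TOOL DEFS"])
builder.append("task: fix bug")
try:
    builder.replace(0, "edited system prompt")
except ValueError:
    pass  # mutation of the cached prefix is rejected
```

Raising on prefix mutation turns a silent cost regression into an immediate test failure.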
- Measure cache hit rates
  - Enable cache metrics from the Anthropic API (if using Claude)
  - Compare before/after cost and latency
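For measurement, Anthropic's Messages API usage object reports `cache_read_input_tokens` (prefix served from cache) and `cache_creation_input_tokens` (prefix written to cache) alongside plain `input_tokens`. A sketch of deriving a per-call hit rate from that payload (the example values are illustrative):

```python
def cache_hit_rate(usage: dict) -> float:
    """Share of input tokens served from cache for one API call."""
    read = usage.get("cache_read_input_tokens", 0)
    total = (usage["input_tokens"]
             + read
             + usage.get("cache_creation_input_tokens", 0))
    return read / total if total else 0.0

# Payload shaped like the API's usage object (values illustrative):
usage = {"input_tokens": 500,
         "cache_read_input_tokens": 9500,
         "cache_creation_input_tokens": 0}
assert cache_hit_rate(usage) == 0.95
```

Aggregating this per session, before and after the refactor, gives the improvement number the success criteria ask for.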
Success Criteria
- Documented context assembly sequence
- Identified cache-breaking patterns in current implementation
- Refactored to append-only pattern (if beneficial)
- Measured improvement in cache hit rate and cost reduction
Cost Impact Estimate
If CodeFRAME averages 50 LLM calls per task with 10K tokens of static context:
- Without caching: 50 × 10K = 500K input tokens billed at the base rate
- With caching: the 10K-token prefix is written once, then the remaining 49 calls read it at the discounted cache-read rate — potentially a 75%+ reduction in input-token cost
- At Claude Sonnet rates, that compounds into significant savings at scale
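The estimate above can be made concrete with a back-of-envelope model. The rates below are assumptions based on Anthropic's published Claude 3.5 Sonnet pricing (base input $3/MTok, cache write $3.75/MTok, cache read $0.30/MTok); verify against current pricing before relying on the numbers.

```python
# Cost model for the 50-call / 10K-static-token scenario.
CALLS, STATIC_TOKENS = 50, 10_000
BASE, WRITE, READ = 3.00, 3.75, 0.30  # $ per million input tokens (assumed)

without_cache = CALLS * STATIC_TOKENS * BASE / 1e6
with_cache = (STATIC_TOKENS * WRITE              # first call writes the cache
              + (CALLS - 1) * STATIC_TOKENS * READ) / 1e6
savings = 1 - with_cache / without_cache
print(f"${without_cache:.2f} -> ${with_cache:.2f} ({savings:.0%} saved)")
```

Under these assumed rates the static-context portion drops from $1.50 to roughly $0.18 per task, consistent with (and somewhat better than) the 75% figure above; real savings depend on cache TTL and how often the prefix actually stays stable.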
References
- Anthropic Prompt Caching Documentation
- Context Engineering Tips - Philipp Schmid
- CodeFRAME cost tracking already exists - leverage for measurement