perf: Parallelize LLM calls in extract_keywords_and_rewrite to reduce latency #78

@QuantumByte-01

Problem

extract_keywords_and_rewrite makes 4 sequential Gemini calls per request:

  1. detect_intents (raw query)
  2. rewrite_with_history
  3. call_gemini_for_keywords
  4. detect_intents (rewritten query)

Calls 1 and 2 are independent: each needs only the raw query and the conversation history, and neither uses the other's result. Running them concurrently with asyncio.gather saves ~1-2s per request.

Fix

intents0, effective = await asyncio.gather(
    call_gemini_detect_intents(state["query"], history),
    call_gemini_rewrite_with_history(state["query"], history),
)

Then run call_gemini_for_keywords and the second detect_intents call (on the rewritten query) concurrently in a second gather.
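
For reference, the two-stage plan could be sketched roughly as follows. This is only an illustration under stated assumptions: the three `call_gemini_*` helpers are replaced by stubs that sleep instead of calling Gemini, the `(query, history)` signature of `call_gemini_for_keywords` is assumed, and the `(state, history)` parameters and returned dict keys are hypothetical rather than taken from the actual code.

```python
import asyncio
import time

# Stubs standing in for the real Gemini helpers. Bodies, return shapes, and the
# (query, history) signature of call_gemini_for_keywords are assumptions.
async def call_gemini_detect_intents(query, history):
    await asyncio.sleep(0.1)  # simulated network latency
    return ["intent:" + query]

async def call_gemini_rewrite_with_history(query, history):
    await asyncio.sleep(0.1)
    return query + " (rewritten)"

async def call_gemini_for_keywords(query, history):
    await asyncio.sleep(0.1)
    return query.split()

async def extract_keywords_and_rewrite(state, history):
    # Stage 1: both calls depend only on the raw query and history.
    intents0, effective = await asyncio.gather(
        call_gemini_detect_intents(state["query"], history),
        call_gemini_rewrite_with_history(state["query"], history),
    )
    # Stage 2: both calls depend on the rewritten query from stage 1.
    keywords, intents1 = await asyncio.gather(
        call_gemini_for_keywords(effective, history),
        call_gemini_detect_intents(effective, history),
    )
    # Hypothetical result shape, for illustration only.
    return {
        "intents_raw": intents0,
        "effective_query": effective,
        "keywords": keywords,
        "intents_rewritten": intents1,
    }

def run_demo():
    # Time one request; with the stubs, two stages of ~0.1s each are expected.
    start = time.perf_counter()
    result = asyncio.run(extract_keywords_and_rewrite({"query": "reset password"}, []))
    return result, time.perf_counter() - start
```

asyncio.gather returns results in the order the awaitables were passed, so the tuple unpacking above is safe. By default the first exception raised by either call propagates to the caller, so any existing per-call error handling would need to be kept inside the helpers or wrapped around each gather.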
