
Optimizing Chat Memory with ConversationSummaryBufferMemory for Reduced Latency and Context Size Using GroqLLM #89

Open
wants to merge 4 commits into base: agents

Conversation

jhaayush2004


Description

Use of "ChatMessageHistory" was storing full role tagged messages and it was thus, using a high context length which can apparently lead to increased latency, violating token limits as well as hallucination risk due to large context length. So, I have used "ConversationSummaryBufferMemory" in place of "ChatMessageHistory". This will summarize the older chats retaining all important parts of conversation and keep in buffer, the recent messages with roles tagged with them while strictly maintaining the provided max limit of token length which will surely reduce latency and hallucination risk due to short, exact and crisp summary of chat history instead of using full chat history in vanilla form. I have used Groq LLM Backend for supporting "ConversationSummaryBufferMemory" instance bcoz of its faster response to calls which will further reduce latency and help maintaining user-engagement.

session_history.py

import os

from dotenv import load_dotenv
from langchain.memory import ConversationSummaryBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_groq import ChatGroq

# Load environment variables from the .env file
load_dotenv()

# GROQ API key 
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    raise ValueError("GROQ_API_KEY not found in environment variables.")

# Initialize the Groq LLM; the model can be changed as needed.
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

# Dictionary to hold memory per session
store = {}


def get_session_memory(session_id: str) -> ConversationSummaryBufferMemory:
    if session_id not in store:
        # Creating a chat history instance
        chat_history = ChatMessageHistory()

        # Create summary buffer memory using Groq LLM
        memory = ConversationSummaryBufferMemory(
            llm=llm,
            max_token_limit=50,  # Adjust based on desired summary size
            memory_key="chat_history",
            return_messages=True,
            chat_memory=chat_history,
            verbose=True,
        )

        store[session_id] = memory

    return store[session_id]
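
For illustration, here is a minimal sketch of how the session memory could be wired into a chain elsewhere in the pipeline. The file name, prompt text, and session id are assumptions for demonstration (not part of this PR), and the legacy LLMChain interface is used for brevity; the key point is that the MessagesPlaceholder name must match the memory_key="chat_history" configured above.

example_usage.py (illustrative sketch, not included in this PR)

from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from session_history import get_session_memory, llm

# Prompt with a placeholder that matches memory_key="chat_history"
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

memory = get_session_memory("demo-session-001")
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

# Each call loads the summarized history into the prompt and saves the new turn back to memory
response = chain.invoke({"input": "What have we discussed so far?"})
print(response["text"])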

test_session_memory.py

# test_session_memory.py
from session_history import get_session_memory

def test_summary_buffer():
    session_id = "demo-session-001"
    
    # Get memory for this session
    memory = get_session_memory(session_id)

    # Simulate a conversation using save_context to enable summarization
    memory.save_context(
        {"input": "Hello, who are you?"},
        {"output": "I'm an AI developed to help you."}
    )

    memory.save_context(
        {"input": "Tell me the entire process of how LangChain handles a memory class including message passing, token estimation, and persistence."},
        {"output": "LangChain is a framework for building LLM-powered apps."}
    )

    # Trigger summarization and retrieve output
    result = memory.load_memory_variables({})

    print("\n--- 🔁 Raw Messages in Buffer ---")
    for msg in memory.chat_memory.messages:
        print(f"{msg.type.capitalize()}: {msg.content}")

    print("\n--- 🧠 Moving Summary ---")
    print(memory.moving_summary_buffer)

    print("\n--- 📦 Final memory.load_memory_variables() Output ---")
    print(result["chat_history"])  # Used in the prompt if you’re chaining

if __name__ == "__main__":
    test_summary_buffer()
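
For reference, with max_token_limit=50 the earlier exchange is expected to be pruned from the raw buffer and folded into the moving summary, so the loaded chat_history should look roughly like the structure below (the exact summary wording and how many recent turns remain in the buffer depend on token counts; the SystemMessage wrapper is the memory class's default summary message type):

[SystemMessage(content='<condensed summary of the older turns>'),
 HumanMessage(content='<most recent user message>'),
 AIMessage(content='<most recent assistant message>')]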

Output

App Screenshot

Test Conclusion

The test script runs end to end without errors.

Use of "ChatMessageHistory" was storing full role tagged messages and  it was thus, using a high context length which can apparently lead to increased latency as well as hallucination risk due to large context length. So, I have used "ConversationSummaryBufferMemory" in place of "ChatMessageHistory". This will summarize the older chats retaining all important parts of conversation and keep the recent messages with roles tagged  with them while strictly maintaining the provided max limit of token length which will surely reduce latency and hallucination risk due to short, exact and crisp summary of chat history instead of using full chat history in vanilla form. Groq has been used bcoz of its fast response to calls which will further reduce latency and help maintaining user-engagement.

Signed-off-by: Ayush Shaurya Jha <[email protected]>
Added langchain-core, langchain-groq and python-dotenv with respect to the changes made to session_history.py (an illustrative requirements sketch is shown after the commit list below).

Signed-off-by: Ayush Shaurya Jha <[email protected]>
Signed-off-by: Ayush Shaurya Jha <[email protected]>
Signed-off-by: Ayush Shaurya Jha <[email protected]>
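
For reference, the dependency additions mentioned in the commit above would correspond roughly to the following requirements entries (package names only; version pins and the project's existing requirements are not shown in this PR and are left as assumptions):

# additions to requirements.txt (illustrative; pins left to the project)
langchain-core
langchain-groq
python-dotenv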

jhaayush2004 commented May 11, 2025

I have been continuously working to make the multi-agent RAG pipeline more efficient and engaging, and I would like to contribute to it through the LFX Mentorship 2025. Kindly review the above PR; I look forward to your suggestions!

jhaayush2004 (Author)

@gcapuzzi Kindly review this and let me know if any changes are needed.
