
Optimizing Chat Memory with ConversationSummaryBufferMemory for Reduced Latency and Context Size Using GroqLLM #89

Open
wants to merge 4 commits into base: agents

Conversation

jhaayush2004


Description

Use of "ChatMessageHistory" was storing full role tagged messages and it was thus, using a high context length which can apparently lead to increased latency, violating token limits as well as hallucination risk due to large context length. So, I have used "ConversationSummaryBufferMemory" in place of "ChatMessageHistory". This will summarize the older chats retaining all important parts of conversation and keep in buffer, the recent messages with roles tagged with them while strictly maintaining the provided max limit of token length which will surely reduce latency and hallucination risk due to short, exact and crisp summary of chat history instead of using full chat history in vanilla form. I have used Groq LLM Backend for supporting "ConversationSummaryBufferMemory" instance bcoz of its faster response to calls which will further reduce latency and help maintaining user-engagement.

session_history.py

import os

from dotenv import load_dotenv
from langchain.memory import ConversationSummaryBufferMemory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_groq import ChatGroq

# Load environment variables from the .env file
load_dotenv()

# GROQ API key 
groq_api_key = os.getenv("GROQ_API_KEY")
if not groq_api_key:
    raise ValueError("GROQ_API_KEY not found in environment variables.")

# Initialize the Groq LLM; the model can be changed as needed.
llm = ChatGroq(temperature=0, model_name="llama3-70b-8192")

# Dictionary to hold memory per session
store = {}


def get_session_memory(session_id: str) -> ConversationSummaryBufferMemory:
    if session_id not in store:
        # Creating a chat history instance
        chat_history = ChatMessageHistory()

        # Create summary buffer memory using Groq LLM
        memory = ConversationSummaryBufferMemory(
            llm=llm,
            max_token_limit=50,  # Adjust based on desired summary size
            memory_key="chat_history",
            return_messages=True,
            chat_memory=chat_history,
            verbose=True,
        )

        store[session_id] = memory

    return store[session_id]
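
For illustration, here is a minimal sketch of how the session memory could be wired into a chain elsewhere in the pipeline. The file name, prompt text, and session id are assumptions for demonstration (not part of this PR), and the legacy LLMChain interface is used for brevity; the key point is that the MessagesPlaceholder name must match the memory_key="chat_history" configured above.

example_usage.py (illustrative sketch, not included in this PR)

from langchain.chains import LLMChain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

from session_history import get_session_memory, llm

# Prompt with a placeholder that matches memory_key="chat_history"
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}"),
])

memory = get_session_memory("demo-session-001")
chain = LLMChain(llm=llm, prompt=prompt, memory=memory)

# Each call loads the summarized history into the prompt and saves the new turn back to memory
response = chain.invoke({"input": "What have we discussed so far?"})
print(response["text"])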

test_session_memory.py

# test_session_memory.py
from session_history import get_session_memory

def test_summary_buffer():
    session_id = "demo-session-001"
    
    # Get memory for this session
    memory = get_session_memory(session_id)

    # Simulate a conversation using save_context to enable summarization
    memory.save_context(
        {"input": "Hello, who are you?"},
        {"output": "I'm an AI developed to help you."}
    )

    memory.save_context(
        {"input": "Tell me the entire process of how LangChain handles a memory class including message passing, token estimation, and persistence."},
        {"output": "LangChain is a framework for building LLM-powered apps."}
    )

    # Trigger summarization and retrieve output
    result = memory.load_memory_variables({})

    print("\n--- 🔁 Raw Messages in Buffer ---")
    for msg in memory.chat_memory.messages:
        print(f"{msg.type.capitalize()}: {msg.content}")

    print("\n--- 🧠 Moving Summary ---")
    print(memory.moving_summary_buffer)

    print("\n--- 📦 Final memory.load_memory_variables() Output ---")
    print(result["chat_history"])  # Used in the prompt if you’re chaining

if __name__ == "__main__":
    test_summary_buffer()
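
For reference, with max_token_limit=50 the earlier exchange is expected to be pruned from the raw buffer and folded into the moving summary, so the loaded chat_history should look roughly like the structure below (the exact summary wording and how many recent turns remain in the buffer depend on token counts; the SystemMessage wrapper is the memory class's default summary message type):

[SystemMessage(content='<condensed summary of the older turns>'),
 HumanMessage(content='<most recent user message>'),
 AIMessage(content='<most recent assistant message>')]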

Output

App Screenshot

Test Conclusion

The test script runs end to end without errors.

Use of "ChatMessageHistory" was storing full role tagged messages and  it was thus, using a high context length which can apparently lead to increased latency as well as hallucination risk due to large context length. So, I have used "ConversationSummaryBufferMemory" in place of "ChatMessageHistory". This will summarize the older chats retaining all important parts of conversation and keep the recent messages with roles tagged  with them while strictly maintaining the provided max limit of token length which will surely reduce latency and hallucination risk due to short, exact and crisp summary of chat history instead of using full chat history in vanilla form. Groq has been used bcoz of its fast response to calls which will further reduce latency and help maintaining user-engagement.

Signed-off-by: Ayush Shaurya Jha <[email protected]>
Added langchain-core, langchain-groq and python-dotenv with respect to the changes made to session_history.py (an illustrative requirements sketch is shown after the commit list below).

Signed-off-by: Ayush Shaurya Jha <[email protected]>
Signed-off-by: Ayush Shaurya Jha <[email protected]>
Signed-off-by: Ayush Shaurya Jha <[email protected]>
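
For reference, the dependency additions mentioned in the commit above would correspond roughly to the following requirements entries (package names only; version pins and the project's existing requirements are not shown in this PR and are left as assumptions):

# additions to requirements.txt (illustrative; pins left to the project)
langchain-core
langchain-groq
python-dotenv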

jhaayush2004 commented May 11, 2025

I have been continuously working to make the multi-agent RAG pipeline more efficient and engaging, and I would like to contribute to it through the LFX Mentorship 2025. Kindly review the above PR; I look forward to your suggestions!

jhaayush2004 (Author)

@gcapuzzi Kindly review this and let me know if any changes are needed.
