A websocket client that processes messages from a socket, tries to disentangle them into conversations, and categorizes those conversations as possible calendar events.

The async flow uses the qwq:32b LLM running with Ollama to disentangle the incoming stream of messages before classification.
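As a rough illustration of that flow, here is a minimal sketch of an async ingest loop, assuming the `websockets` and `ollama` Python packages; the package choices and the prompt are assumptions, and the actual `client.py` may be structured differently.

```python
# A minimal sketch of the ingest loop, assuming the `websockets` and `ollama`
# packages; the real client.py may differ.
import asyncio
import json
import os

import websockets
from ollama import AsyncClient


async def ingest() -> None:
    ws_url = os.environ["WS_SOCK"]   # websocket to ingest, taken from .env
    llm = AsyncClient()              # talks to a local Ollama server

    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            message = json.loads(raw)
            # Ask the LLM which ongoing conversation this message belongs to.
            response = await llm.chat(
                model="qwq:32b",
                messages=[{
                    "role": "user",
                    "content": f"Which conversation does this message belong to?\n{message}",
                }],
            )
            print(response["message"]["content"])


if __name__ == "__main__":
    asyncio.run(ingest())
```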
There are two problem statements for which models are provided.
The first is call-intent classification: a BERT model trained over synthetic data classifies whether a group of messages shows an intent to get on a call or not.
The synthetic data generation and the model creation code can be found under the `notebooks` folder.
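As a rough sketch of how that classifier might be used at inference time, assuming the checkpoint under `model/bert_classifier_v1` (mentioned in the setup below) is a standard Hugging Face sequence-classification model; the label names shown are assumptions.

```python
# A minimal sketch, assuming model/bert_classifier_v1 is a standard Hugging
# Face sequence-classification checkpoint; label names are assumed.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="model/bert_classifier_v1",
    tokenizer="model/bert_classifier_v1",
)

# A disentangled group of messages is joined into a single input string.
conversation = " ".join([
    "are you free tomorrow afternoon?",
    "sure, want to hop on a quick call?",
    "yes, I'll send an invite for 3pm",
])

print(classifier(conversation))  # e.g. [{'label': 'LABEL_1', 'score': 0.97}]
```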
The second is conversation disentanglement: the process of converting a stream of messages into separate conversations.
There are two disentanglement models:
- Rule-based model: assigns a message to a conversation by looking at embedding similarity, user mentions, and timestamp differences (see the sketch after this list)
- Last-six-messages model: based on the assumption that the problem can be solved by looking only at the last six messages of the stream and classifying each new incoming message as part of one of their conversations or the start of a new one
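For illustration, here is a minimal sketch of the kind of scoring the rule-based model describes; the embedding model, weights, and time decay are assumptions, not the repository's actual values.

```python
# A minimal sketch of rule-based disentanglement scoring; the weights,
# threshold, and embedding model are assumptions for illustration only.
from dataclasses import dataclass, field

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model


@dataclass
class Message:
    user: str
    text: str
    timestamp: float                      # unix seconds
    mentions: set[str] = field(default_factory=set)


def affinity(new: Message, conversation: list[Message]) -> float:
    """Score how likely `new` belongs to `conversation` (higher = more likely)."""
    last = conversation[-1]

    # 1. Embedding similarity between the new message and the last message.
    sim = util.cos_sim(
        embedder.encode(new.text), embedder.encode(last.text)
    ).item()

    # 2. User mentions: does the new message mention a participant?
    participants = {m.user for m in conversation}
    mention_bonus = 1.0 if (new.mentions & participants) else 0.0

    # 3. Timestamp difference: recent conversations are more likely candidates.
    gap_minutes = (new.timestamp - last.timestamp) / 60
    recency = max(0.0, 1.0 - gap_minutes / 30)   # decays to 0 after 30 minutes

    return 0.5 * sim + 0.3 * mention_bonus + 0.2 * recency
```

A new message would then be attached to the highest-scoring candidate conversation, or start a new conversation if every score falls below some chosen threshold.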
- setup the `.env` file with `cp .env.example .env` and set the `WS_SOCK` variable; this is the websocket that gets ingested (a sketch of how it might be read follows this section)
- setup docker if you don't have it yet
- run `sh build_and_run.sh`
- what it does
  - creates the `model` folder (this is where the model gets stored) and the `results` folder (the output will be generated here)
  - checks if you have the model files and downloads them if not
  - runs `docker build`
  - runs `docker run` with the required attached volumes
  - executes `client.py` in docker, which starts processing the stream
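For reference, a minimal sketch of how the `WS_SOCK` setting might be read at runtime, assuming the `python-dotenv` package; `client.py` may load its configuration differently.

```python
# A minimal sketch of reading WS_SOCK from .env, assuming python-dotenv;
# client.py may load configuration differently.
import os

from dotenv import load_dotenv

load_dotenv()                        # reads key=value pairs from .env
WS_SOCK = os.environ["WS_SOCK"]      # e.g. ws://host:port/stream

if not WS_SOCK.startswith(("ws://", "wss://")):
    raise ValueError(f"WS_SOCK does not look like a websocket URL: {WS_SOCK}")
```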
- to do the same steps manually (without docker)
  - setup the `.env` file with the `WS_SOCK` variable
  - setup uv: https://docs.astral.sh/uv/getting-started/installation/
  - create the `model/bert_classifier_v1` and `results` folders
  - download the model files from
  - run `uv run ingest`
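For illustration, the folder-creation step above could be done with a short Python snippet like the following; the empty-folder check is illustrative and not part of the repository.

```python
# A minimal sketch of the manual folder setup; paths mirror the list above,
# and the empty-folder check is illustrative only.
from pathlib import Path

model_dir = Path("model/bert_classifier_v1")
results_dir = Path("results")

model_dir.mkdir(parents=True, exist_ok=True)
results_dir.mkdir(exist_ok=True)

if not any(model_dir.iterdir()):
    print("model folder is empty; download the model files into", model_dir)
```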
- after setting up uv, you can run `uv run pytest`