A generalised, extendable and modular solution for LLM-automated / LLM-assisted classification.
The main features offered by the package are:
- Vectorising - Creating text embeddings (vectors) from text using a variety of embedding models
- Indexing - The process of creating VectorStores from (large) text files
- Serving - Making a VectorStore searchable through a REST API
To install from the repository:

```sh
pip install git+https://github.com/datasciencecampus/classifAI
```

Or, to install the built wheel:

```sh
pip install https://github.com/datasciencecampus/classifAI/releases/download/v0.1.0/classifai-0.1.0-py3-none-any.whl
```
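After installing, a quick way to confirm the package is available is simply to import it. This is a minimal sketch and checks nothing package-specific:

```python
# Quick post-install check: the import should succeed without errors
import classifai

print("classifai imported from:", classifai.__file__)
```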
First, create a vectoriser model, which allows users to pass text to its .transform() method to convert the text to a vector.

```python
# Create a vectoriser model
from classifai.vectorisers import HuggingFaceVectoriser

your_vectoriser = HuggingFaceVectoriser(model_name="sentence-transformers/all-MiniLM-L6-v2")

vector = your_vectoriser.transform("ClassifAI_package is the best classification tool ever!")
print(vector.shape, type(vector))
```
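Because .transform() returns one vector per piece of text, you can compare texts directly. The sketch below assumes .transform() returns a 1-D numpy array, as the shape/type print above suggests:

```python
import numpy as np

# Hedged sketch: assumes .transform() returns a 1-D numpy array per input text
a = your_vectoriser.transform("How do I classify survey responses?")
b = your_vectoriser.transform("Categorising free-text survey answers")

# Cosine similarity: values closer to 1.0 mean the two texts are more similar
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cosine:.3f}")
```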
Then pass the vectoriser and a CSV file to a VectorStore constructor to build a vector database that you can interact with through the class.

```python
from classifai.indexers import VectorStore

your_vector_store = VectorStore(
    file_name="<PATH_TO_YOUR_CSV_FILE>.csv",
    data_type="csv",
    embedder=your_vectoriser,
    batch_size=8,
    meta_data={'extra_column_1': int, 'extra_column_2': str},
    output_dir="my_vector_store",
)
```
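The constructor above expects the CSV to contain text to embed plus the columns named in meta_data. As a hedged illustration only (the column names below, in particular the text column, are assumptions rather than the package's required schema), you could prepare a compatible file like this:

```python
import pandas as pd

# Hypothetical CSV layout: a free-text column to embed plus the two metadata
# columns declared in meta_data above. Column names are illustrative only.
df = pd.DataFrame(
    {
        "text": ["First record to embed", "Second record to embed"],
        "extra_column_1": [1, 2],
        "extra_column_2": ["label_a", "label_b"],
    }
)
df.to_csv("my_data.csv", index=False)
```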
You can 'search' the VectorStore on your local system.

```python
your_vector_store.search("your query about your data goes here", n_results=5)

# Other statistics about the vector store are available
your_vector_store.num_vectors
your_vector_store.vector_shape
```
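For example, you can issue several queries in a row and inspect whatever .search() returns; this sketch simply prints the raw results, since their exact structure isn't shown here:

```python
# Hedged sketch: run a few queries and print whatever .search() returns;
# the structure of the result object depends on the package itself.
queries = [
    "customer complaints about delivery",
    "positive feedback on pricing",
]
for query in queries:
    results = your_vector_store.search(query, n_results=3)
    print(query, "->", results)
```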
The vectors and metadata will be stored in the my_vector_store/ folder, to be quickly reloaded later.

```python
reloaded_vector_store = VectorStore.from_filespace('my_vector_store', your_vectoriser)
reloaded_vector_store.search("your query about your data goes here")
```

When you're happy with your VectorStore model, you can start a REST API service:
```python
from classifai.servers import start_api

start_api(vector_stores=[your_vector_store], endpoint_names=["your_data"], port=8000)
```

This will run a FastAPI-based REST API service on your machine, and you can find its docs at:
- http://localhost:8000
- http://127.0.0.1:8000
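Once the service is up, you can query it over HTTP. The sketch below is a hedged illustration: the route (/your_data) and the query parameters are assumptions based on the endpoint_names argument above, so check the generated docs page for the actual routes and parameters.

```python
import requests

# Hedged sketch: the route and parameter names are assumptions based on
# endpoint_names=["your_data"]; confirm them against the service's docs page.
response = requests.get(
    "http://localhost:8000/your_data",
    params={"query": "your query about your data goes here", "n_results": 5},
)
response.raise_for_status()
print(response.json())
```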
- Clone the repo:

```sh
git clone git@github.com:datasciencecampus/classifAI.git
cd classifAI
```

- Set up pre-commit hooks:
```sh
make setup-git-hooks
```

(or, if you don't have Docker available)

```sh
make setup-git-hooks-no-docker
```

- Create / activate the virtual environment:
```sh
uv lock
uv sync
```

And you're good to go!
During development, you might want to run linters / code vulnerability scans; you can do so at any point via:

```sh
make check-python
```