
Commit 8afe37f
update readme for GPU
RJKeevil authored and riccardopinosio committed Apr 25, 2024
1 parent 06e8d2c commit 8afe37f
Showing 1 changed file with 13 additions and 0 deletions: README.md
@@ -12,6 +12,8 @@ The goal of this library is to provide an easy, scalable, and hassle-free way to
2. Hassle-free and performant production use: we exclusively support ONNX exports of huggingface models. PyTorch transformer models that don't have an ONNX version can be easily exported to ONNX via [huggingface optimum](https://huggingface.co/docs/optimum/index) and then used with the library
3. Run on your hardware: this library is for those who want to run transformer models tightly coupled with their Go applications, without the performance drawbacks of hitting a REST API or the hassle of setting up and maintaining, for example, a Python RPC service that talks to Go.
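As a concrete example of the optimum export mentioned above, a hub model can be converted to ONNX from the command line (the model name below is purely illustrative; any PyTorch transformer on the hub works the same way):

```shell
# Install the optimum exporters and convert a hub model to ONNX.
# The model name and output directory are illustrative.
pip install "optimum[exporters]"
optimum-cli export onnx --model distilbert-base-uncased-finetuned-sst-2-english ./onnx-model/
```

The resulting `./onnx-model/` directory contains the `.onnx` file plus the tokenizer configuration, which is what the library consumes.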

We support all GPU/accelerator backends supported by ONNXRuntime.

## Why

Developing and fine-tuning transformer models with the huggingface Python library is a great experience, but if your production stack is golang-based, reliably deploying and scaling the resulting PyTorch models can be challenging and requires quite some setup. This library aims to let you lift and shift your Python model and use the same huggingface pipelines you use for development for inference in a Go application.
@@ -32,6 +34,15 @@ Implementations for additional pipelines will follow. We also very gladly accept

Hugot can be used both as a library and as a command-line application. See below for usage instructions.

Hugot now also supports the following accelerator backends:
- CUDA (tested)
- TensorRT (untested)
- DirectML (untested)
- CoreML (untested)
- OpenVINO (untested)

Please help us out by testing the untested options above and providing feedback, good or bad!
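Selecting one of these backends at session creation might look like the sketch below. This is illustrative pseudocode only: the option name `WithCuda` and its parameters are assumptions, so check the library's options documentation for the real API.

```go
// Hypothetical sketch: create a session on the CUDA backend.
// WithCuda and its parameter map are illustrative, not the verified API.
session, err := hugot.NewSession(
    hugot.WithCuda(map[string]string{
        "device_id": "0",
    }),
)
if err != nil {
    panic(err)
}
defer session.Destroy()
```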

## Limitations

Apart from the fact that only the aforementioned pipelines are currently implemented, the current limitations are:
@@ -196,6 +207,8 @@ session, err := hugot.NewSession(

Setting InterOpNumThreads and IntraOpNumThreads to 1 restricts each goroutine's call to a single core, greatly reducing locking and cache penalties. Disabling CpuMemArena and MemPattern skips pre-allocation of some memory structures, which increases latency but improves throughput efficiency.
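With those settings, inference can be fanned out across goroutines, one call per core. A minimal self-contained sketch of that pattern (here `infer` is just a stand-in for a real pipeline call, not the library's API):

```go
package main

import (
	"fmt"
	"sync"
)

// infer is a stand-in for a real pipeline call; with InterOpNumThreads
// and IntraOpNumThreads set to 1, each such call stays on a single core,
// so fanning calls out across goroutines scales across cores.
func infer(input string) string {
	return "label:" + input
}

// runParallel runs infer on each input in its own goroutine,
// preserving input order in the results slice.
func runParallel(inputs []string) []string {
	results := make([]string, len(inputs))
	var wg sync.WaitGroup
	for i, in := range inputs {
		wg.Add(1)
		go func(i int, in string) {
			defer wg.Done()
			results[i] = infer(in)
		}(i, in)
	}
	wg.Wait()
	return results
}

func main() {
	fmt.Println(runParallel([]string{"a", "b", "c"}))
}
```

Each goroutine writes to its own index of the results slice, so no mutex is needed around the writes.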

For GPU, the config above also applies. We are still determining the optimal GPU configuration: whether it is better to run in parallel or with a single thread, and what input batch size is fastest.

## Contributing

### Development environment
