There are several projects aiming to make inference on CPU efficient.
The first part is research:
- Which project works better,
- And is compatible with the Refact license,
- And doesn't bloat the docker image too much,
- And allows using scratchpads similar to how `inference_hf.py` does it (it needs a callback that streams output and allows stopping; see the sketch below),
- Does it include Mac M1/M2 support, or does it make sense to address Mac separately?
Please finish the first part and get a "go-ahead" before starting the second part.
The second part is implementation:
- A script similar to `inference_hf.py` (see the sketch after this list),
- Little code,
- Not many dependencies,
- Demonstrate that it works with the Refact-1.6b model, as well as StarCoder (at least the smaller sizes),
- Integration with the UI and watchdog is a plus, but efficient inference is obviously the priority.
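
As an illustration of "little code, not many dependencies", here is a minimal sketch of how one candidate backend (llama-cpp-python, assuming a GGML/GGUF conversion of Refact-1.6b or StarCoder is available) could be wired to a streaming callback. This is not a decision in favor of that project, only an example of the shape such a script could take; the model path and parameters are placeholders.

```python
from llama_cpp import Llama  # one possible CPU backend; an assumption, not a decision


def stream_completion(model_path: str, prompt: str, on_token, max_tokens: int = 256) -> str:
    # n_ctx and n_threads are illustrative defaults, not tuned values
    llm = Llama(model_path=model_path, n_ctx=2048, n_threads=8)
    produced = []
    for chunk in llm(prompt, max_tokens=max_tokens, stream=True):
        piece = chunk["choices"][0]["text"]
        produced.append(piece)
        if not on_token(piece):   # the scratchpad-style callback can abort generation
            break                 # leaving the generator stops the backend early
    return "".join(produced)


if __name__ == "__main__":
    # placeholder path; a GGML/GGUF export of Refact-1.6b or StarCoder would go here
    text = stream_completion("model.gguf", "def fibonacci(n):", on_token=lambda t: True)
    print(text)
```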