
Commit 010fb71

tjtanaa and yhcheong authored
[Win] [Pyinstaller] [OpenVINO] Add OpenVINO Support (#21)
## ADDED

Engine

- Added OpenVINO support. #19

### CHANGES / FIXES

Ipex-LLM Engine

- Model generation does not adhere to the max_tokens params. #20

---------

Co-authored-by: tjtanaa <[email protected]>
Co-authored-by: yhcheong <[email protected]>
1 parent dc3805e commit 010fb71

10 files changed: +383 -190 lines changed

CHANGELOG.md

Lines changed: 23 additions & 2 deletions
@@ -1,17 +1,38 @@
 # CHANGELOG
 
-## [Unrelease]
+## [Unreleased]
+
+## ADDED
+
+Engine
+
+- Added OpenVINO support. #19
+
+### CHANGES / FIXES
+
+Ipex-LLM Engine
+
+- Model generation does not adhere to the max_tokens params. #20
+
+## [v0.2.0a]
 
 ### ADDED
 
 DOC
+
 - Update `README.md` to include usage of precompiled engine executable.
 
 ### CHANGES / FIXES
 
+Installation
+
+- Fixed the `ipex-llm` pypi library version.
+
 Engine
+
 - Re-structure the configuration to specify which backend and device to launch the `ipex-llm` model.
 - Fixed Non-Streaming Mode of ONNX is returning the Prompt in the Response #12
 
 PyInstaller Executable
-- Update the `ellm_api_server.spec` to support compilation of `ipex-llm` into executable. #14
+
+- Update the `ellm_api_server.spec` to support compilation of `ipex-llm` into executable. #14
README.md

Lines changed: 25 additions & 1 deletion
@@ -69,11 +69,13 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
 - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
 - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
+- **OpenVINO:** `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino]`
 - **With Web UI**:
 - **DirectML:** `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
 - **CPU:** `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
 - **CUDA:** `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
 - **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`
+- **OpenVINO:** `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino,webui]`
 
 - **Linux**
 
@@ -88,11 +90,13 @@ Run local LLMs on iGPU, APU and CPU (AMD , Intel, and Qualcomm (Coming Soon)). E
 - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
 - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
 - **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
+- **OpenVINO:** `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino]`
 - **With Web UI**:
 - **DirectML:** `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
 - **CPU:** `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
 - **CUDA:** `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
-- **IPEX:** `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`
+- **IPEX:** `ELLM_TARGET_DEVICE='ipex' python setup.py develop; pip install -r requirements-webui.txt`
+- **OpenVINO:** `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino,webui]`
 
 ### Launch OpenAI API Compatible Server
 
@@ -131,12 +135,29 @@ It is an interface that allows you to download and deploy OpenAI API compatible
 
 ## Compile OpenAI-API Compatible Server into Windows Executable
 
+**NOTE:** OpenVINO packaging currently uses `torch==2.4.0`. The executable will not run out of the box because the `libomp` dependency is missing. Make sure to install `libomp` and add the `libomp-xxxxxxx.dll` to `C:\\Windows\\System32`.
+
 1. Install `embeddedllm`.
 2. Install PyInstaller: `pip install pyinstaller==6.9.0`.
 3. Compile Windows Executable: `pyinstaller .\ellm_api_server.spec`.
 4. You can find the executable in the `dist\ellm_api_server`.
 5. Use it like `ellm_server`. `.\ellm_api_server.exe --model_path <path/to/model/weight>`.
 
+_Powershell/Terminal Usage_:
+
+```powershell
+ellm_server --model_path <path/to/model/weight>
+
+# DirectML
+ellm_server --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555
+
+# IPEX-LLM
+ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
+
+# OpenVINO
+ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
+```
 
 ## Prebuilt OpenAI API Compatible Windows Executable (Alpha)
 
 You can find the prebuilt OpenAI API Compatible Windows Executable in the Release page.
@@ -151,6 +172,9 @@ _Powershell/Terminal Usage (Use it like `ellm_server`)_:
 
 # IPEX-LLM
 .\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
+
+# OpenVINO
+.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
 ```
 
 ## Acknowledgements
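Since the server exposes an OpenAI-compatible API, any OpenAI-style client can talk to it once launched. A minimal sketch, assuming the standard `/v1/chat/completions` route and the port and model name from the examples above; adjust both to match your launch flags:

```python
# Minimal client sketch for the OpenAI-compatible server launched above.
# The route is the standard OpenAI chat-completions endpoint; the port
# and model name mirror the README examples and are illustrative.
import requests

resp = requests.post(
    "http://localhost:5555/v1/chat/completions",
    json={
        "model": "meta-llama_Meta/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```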

ellm_api_server.spec

Lines changed: 22 additions & 7 deletions
@@ -20,7 +20,7 @@ def get_embeddedllm_backend():
     version = importlib.metadata.version("embeddedllm")
 
     # Use regex to extract the backend
-    match = re.search(r"\+(directml|cpu|cuda|ipex)$", version)
+    match = re.search(r"\+(directml|cpu|cuda|ipex|openvino)$", version)
 
     if match:
         backend = match.group(1)
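`setup.py` (further down in this commit) appends the backend as a local version suffix such as `+openvino`, and this regex recovers it at build time. A quick illustration with made-up version strings:

```python
import re

# The suffix after "+" in the installed package version encodes the
# backend, e.g. "0.2.0+openvino" -> "openvino". Version strings below
# are made-up examples for illustration.
for version in ("0.2.0+openvino", "0.2.0+ipex", "0.2.0+cuda"):
    match = re.search(r"\+(directml|cpu|cuda|ipex|openvino)$", version)
    print(version, "->", match.group(1) if match else None)
```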
@@ -36,18 +36,17 @@ backend = get_embeddedllm_backend()
 
 binaries_list = []
 
+binaries_list.extend([
+    (Path('C:\\Windows\\System32\\libomp140.x86_64.dll').as_posix(), '.'),
+    (Path('C:\\Windows\\System32\\libomp140d.x86_64.dll').as_posix(), '.'),
+])
+
 datas_list = [
     (Path("src/embeddedllm/entrypoints/api_server.py").resolve().as_posix(), 'embeddedllm/entrypoints'),
 ]
 datas_list.extend(collect_data_files('torch', include_py_files=True))
 
 hiddenimports_list = ['multipart']
-# Add missing hidden imports
-#hiddenimports_list.extend([
-#    'torch', 'torchvision', 'intel_extension_for_pytorch',
-#    'intel_extension_for_pytorch.xpu', 'intel_extension_for_pytorch.xpu.fp8',
-#    'intel_extension_for_pytorch.nn.utils'
-#])
 
 pathex = []
 
@@ -60,6 +59,7 @@ def add_package(package_name):
 if backend in ('directml', 'cpu', 'cuda'):
     add_package('onnxruntime')
     add_package('onnxruntime_genai')
+
 elif backend == 'ipex':
     print(f"Backend IPEX")
     add_package('ipex_llm')
@@ -71,6 +71,21 @@ elif backend == 'ipex':
     add_package('numpy')
     binaries_list.append((f'{CONDA_PATH.parent}/Library/bin/*', '.'))
 
+elif backend == 'openvino':
+    print(f"Backend OpenVino")
+    add_package('onnx')
+    add_package('torch')
+    add_package('torchvision')
+    add_package('optimum')
+    add_package('optimum.intel')
+    add_package('embeddedllm')
+    add_package('numpy')
+    add_package('openvino')
+    add_package('openvino-genai')
+    add_package('openvino-telemetry')
+    add_package('openvino-tokenizers')
+    binaries_list.append((f'{CONDA_PATH.parent}/Library/bin/*', '.'))
+
 print(binaries_list)
 
 with open("binary.txt", 'w') as f:
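The body of the `add_package` helper is outside this diff. A plausible sketch of what it likely does, using PyInstaller's `collect_all` hook utility; this is an assumption, not the project's confirmed code:

```python
# Assumed sketch of the spec file's add_package helper (its body is not
# shown in this diff). PyInstaller's collect_all gathers the data files,
# binaries, and hidden imports of one distribution at build time.
from PyInstaller.utils.hooks import collect_all

def add_package(package_name):
    datas, binaries, hiddenimports = collect_all(package_name)
    datas_list.extend(datas)            # bundled data files
    binaries_list.extend(binaries)      # native .dll/.so dependencies
    hiddenimports_list.extend(hiddenimports)  # imports PyInstaller can't trace
```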

requirements-openvino.txt

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+optimum-intel[openvino,nncf]@git+https://github.com/huggingface/optimum-intel.git
+torch>=2.4
+onnx<=1.16.1
+transformers>=4.42
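These pins pull in `optimum-intel`, the layer the OpenVINO backend drives. For orientation, a minimal sketch of that API; the model id and prompt are illustrative, and the server's `--device 'gpu'` flag corresponds to OpenVINO's GPU plugin:

```python
# Minimal sketch of the optimum-intel API the OpenVINO backend builds on.
# Model id and prompt are illustrative; export=True converts the
# Hugging Face checkpoint to OpenVINO IR on load.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"  # illustrative example
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.to("gpu")  # target OpenVINO's GPU plugin, like --device 'gpu' above

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```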

scripts/benchmark/benchmark_api_server.py

Whitespace-only changes.

setup.py

Lines changed: 12 additions & 1 deletion
@@ -50,6 +50,10 @@ def _is_ipex() -> bool:
     return ELLM_TARGET_DEVICE == "ipex"
 
 
+def _is_openvino() -> bool:
+    return ELLM_TARGET_DEVICE == "openvino"
+
+
 class ELLMInstallCommand(install):
     def run(self):
         install.run(self)
@@ -157,7 +161,9 @@ def find_version(filepath: str) -> str:
 
 def _read_requirements(filename: str) -> List[str]:
     with open(get_path(filename)) as f:
-        requirements = f.read().strip().split("\n")
+        # requirements = f.read().strip().split("\n")
+        requirements = f.readlines()
+
     resolved_requirements = []
     for line in requirements:
         if line.startswith("-r "):
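The loop body past the `-r` check falls outside this hunk. A self-contained sketch of the same pattern with the recursion spelled out; it illustrates the technique, not the project's exact code:

```python
# Sketch of nested requirements resolution, assuming the conventional
# pip behavior: a "-r other.txt" line pulls in another requirements
# file recursively. Names and details are illustrative.
from typing import List

def read_requirements(filename: str) -> List[str]:
    with open(filename) as f:
        requirements = f.readlines()
    resolved: List[str] = []
    for line in requirements:
        line = line.strip()
        if line.startswith("-r "):
            # Recurse into the referenced requirements file.
            resolved.extend(read_requirements(line.split()[1]))
        elif line and not line.startswith("#"):
            resolved.append(line)
    return resolved
```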
@@ -178,6 +184,8 @@ def get_requirements() -> List[str]:
         requirements = _read_requirements("requirements-cpu.txt")
     elif _is_ipex():
         requirements = _read_requirements("requirements-ipex.txt")
+    elif _is_openvino():
+        requirements = _read_requirements("requirements-openvino.txt")
     else:
         raise ValueError("Unsupported platform, please use CUDA, ROCm, Neuron, or CPU.")
     return requirements
@@ -194,6 +202,8 @@ def get_ellm_version() -> str:
         version += "+cpu"
     elif _is_ipex():
         version += "+ipex"
+    elif _is_openvino():
+        version += "+openvino"
     else:
         raise RuntimeError("Unknown runtime environment")
 

@@ -245,6 +255,7 @@ def get_ellm_version() -> str:
         "webui": _read_requirements("requirements-webui.txt"),
         "cuda": ["onnxruntime-genai-cuda==0.3.0rc2"],
         "ipex": [],
+        "openvino": [],
     },
     dependency_links=dependency_links,
     entry_points={
