Update Inductor windows tutorial with xpu support #3309
Merged
27 commits
1d49a44  update link to a intel compiler guide with performance data (ZhaoqiongZ)
d3f2ca0  update (ZhaoqiongZ)
7335f5b  add title on the link (ZhaoqiongZ)
a773e9b  update (ZhaoqiongZ)
53f3886  rephrase the sentence (ZhaoqiongZ)
ca95172  Merge branch 'main' into main (ZhaoqiongZ)
372f0c9  Merge branch 'main' into main (svekars)
7aae4e6  Merge branch 'pytorch:main' into main (ZhaoqiongZ)
3f09390  update inductor windows with xpu support (ZhaoqiongZ)
ab6da28  Update conclusion inductor_windows.rst (ZhaoqiongZ)
615f97b  Update inductor_windows.rst (ZhaoqiongZ)
1ce21b9  Merge branch 'main' into main (ZhaoqiongZ)
8cac362  add inductor_windows_cpu and redirect to inductor_windows (ZhaoqiongZ)
e40b266  update inductor_windows (ZhaoqiongZ)
9bc6895  Merge branch 'main' into main (ZhaoqiongZ)
8fff638  Merge branch 'main' into main (svekars)
a81c441  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
836b2b9  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
5af0cd5  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
c8f5f6c  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
62358be  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
fcbd371  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
a90c38a  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
8202be7  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
b41b86c  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
2ce29a0  Update prototype_source/inductor_windows.rst (ZhaoqiongZ)
eec7ebf  Merge branch 'main' into main (svekars)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
How to use ``torch.compile`` on Windows CPU/XPU | ||
=============================================== | ||
|
||
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_ | ||
|
||
|
||
Introduction | ||
------------ | ||
|
||
TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels. | ||
|
||
This tutorial introduces the steps for using TorchInductor via ``torch.compile`` on Windows CPU/XPU. | ||
|
||
|
||
Software Installation | ||
--------------------- | ||
|
||
The following steps walk you through using ``torch.compile`` on Windows CPU/XPU. | ||
|
||
Install a Compiler | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
A C++ compiler is required for TorchInductor optimization. This tutorial uses Microsoft Visual C++ (MSVC) as an example. | ||
|
||
1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_. | ||
|
||
1. During installation, select **Workloads** and then **Desktop & Mobile**. | ||
2. Check **Desktop Development with C++** and install. | ||
|
||
.. image:: ../_static/img/install_msvc.png | ||
|
||
|
||
.. note:: | ||
|
||
Inductor on Windows CPU also supports the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ as C++ compilers for better performance. | ||
See `Alternative Compiler for better performance on CPU <#alternative-compiler-for-better-performance-on-cpu>`_. | ||
|
||
Set Up Environment | ||
^^^^^^^^^^^^^^^^^^ | ||
Next, let's configure our environment. | ||
|
||
#. Open a command line environment via cmd.exe. | ||
#. Activate ``MSVC`` with the following command:: | ||
|
||
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat" | ||
#. Create and activate a virtual environment. | ||
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU usage. For XPU usage, install PyTorch 2.7 or later by following `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_. | ||
#. Here is an example of how to use TorchInductor on Windows: | ||
.. code-block:: python | ||
|
||
import torch | ||
device = "cpu"  # or "xpu" for XPU | ||
def foo(x, y): | ||
    # y is accepted but unused; the result depends only on x | ||
    a = torch.sin(x) | ||
    b = torch.cos(x) | ||
    return a + b | ||
opt_foo1 = torch.compile(foo) | ||
print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device))) | ||
|
||
#. The example prints output similar to the following (the inputs are random, so the values differ between runs):: | ||
|
||
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01, | ||
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00], | ||
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01, | ||
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01], | ||
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01, | ||
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00], | ||
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00, | ||
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01], | ||
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01, | ||
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01], | ||
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00, | ||
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00], | ||
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01, | ||
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00], | ||
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00, | ||
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00], | ||
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01, | ||
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00], | ||
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01, | ||
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]]) | ||
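The ``vcvars64.bat`` path in the activation step assumes a default Visual Studio 2022 Community install. Other editions use a different folder name, so the script may live elsewhere. The following is a minimal sketch, not part of the tutorial, that looks for the script under a given Visual Studio root. The helper name ``find_vcvars64`` and the edition list are assumptions made for illustration.

```python
from pathlib import Path

# Edition folder names to try (an assumed list; extend it for your install).
EDITIONS = ("Community", "Professional", "Enterprise")


def find_vcvars64(vs_root):
    """Return the first existing vcvars64.bat under vs_root, or None.

    vs_root is the folder that holds the per-edition directories, for
    example "C:/Program Files/Microsoft Visual Studio/2022".
    """
    for edition in EDITIONS:
        candidate = (
            Path(vs_root) / edition / "VC" / "Auxiliary" / "Build" / "vcvars64.bat"
        )
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_vcvars64("C:/Program Files/Microsoft Visual Studio/2022")
    print(found if found else "vcvars64.bat not found; check the install path")
```

If the helper returns a path, run that script in ``cmd.exe`` in place of the Community path shown in the steps.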
|
||
Alternative Compiler for better performance on CPU | ||
-------------------------------------------------- | ||
|
||
To enhance performance for inductor on Windows CPU, you can use the Intel Compiler or LLVM Compiler. However, they rely on the runtime libraries from Microsoft Visual C++ (MSVC). Therefore, your first step should be to install MSVC. | ||
|
||
Intel Compiler | ||
^^^^^^^^^^^^^^ | ||
|
||
#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_. | ||
#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=icx-cl``. | ||
|
||
LLVM Compiler | ||
^^^^^^^^^^^^^ | ||
|
||
#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_. | ||
#. Select it as the Inductor compiler on Windows through the ``CXX`` environment variable: ``set CXX=clang-cl``. | ||
|
||
Conclusion | ||
---------- | ||
|
||
This tutorial showed how to use Inductor on Windows CPU with PyTorch 2.5 or later, and on Windows XPU with PyTorch 2.7 or later. You can also use the Intel Compiler or LLVM Compiler to get better performance on CPU. |
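Because the compiler is selected through ``CXX``, a typo in the variable or a shell that has not been activated only shows up when ``torch.compile`` first tries to build a kernel. The sketch below checks up front that the chosen compiler is on ``PATH``. The helper name ``resolve_cxx`` is hypothetical, and the fallback to ``cl`` (MSVC) when ``CXX`` is unset is an assumption of this sketch rather than documented Inductor behavior.

```python
import os
import shutil


def resolve_cxx(env=None, which=shutil.which):
    """Return (name, path) for the compiler selected through CXX.

    name falls back to "cl" (MSVC) when CXX is unset or empty; that
    fallback is an assumption of this sketch. path is None when the
    executable cannot be found on PATH.
    """
    env = os.environ if env is None else env
    name = env.get("CXX") or "cl"
    return name, which(name)


if __name__ == "__main__":
    name, path = resolve_cxx()
    if path is None:
        print(f"{name} is not on PATH; activate MSVC or install the compiler first")
    else:
        print(f"Found {name} at {path}")
```

Running it in the same ``cmd.exe`` session where ``vcvars64.bat`` and ``set CXX=...`` were executed shows whether that session can see the compiler.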
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,130 +1,7 @@ | ||
How to use TorchInductor on Windows CPU | ||
======================================= | ||
This tutorial has been moved to https://pytorch.org/tutorials/prototype/inductor_windows.html. | ||
|
||
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_ | ||
Redirecting in 3 seconds... | ||
|
||
.. raw:: html | ||
|
||
|
||
TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. | ||
This tutorial will guide you through the process of using TorchInductor on a Windows CPU. | ||
|
||
.. grid:: 2 | ||
|
||
.. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn | ||
:class-card: card-prerequisites | ||
|
||
* How to compile and execute a Python function with PyTorch, optimized for Windows CPU | ||
* Basics of TorchInductor's optimization using C++/Triton kernels. | ||
|
||
.. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites | ||
:class-card: card-prerequisites | ||
|
||
* PyTorch v2.5 or later | ||
* Microsoft Visual C++ (MSVC) | ||
* Miniforge for Windows | ||
|
||
Install the Required Software | ||
----------------------------- | ||
|
||
First, let's install the required software. C++ compiler is required for TorchInductor optimization. | ||
We will use Microsoft Visual C++ (MSVC) for this example. | ||
|
||
1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_. | ||
|
||
2. During the installation, choose **Desktop Development with C++** in the **Desktop & Mobile** section in **Workloads** table. Then install the software | ||
|
||
.. note:: | ||
|
||
We recommend C++ compiler `Clang <https://github.com/llvm/llvm-project/releases>`_ and `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_. | ||
Please check `Alternative Compiler for better performance <#alternative-compiler-for-better-performance>`_. | ||
|
||
3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__. | ||
|
||
Set Up the Environment | ||
---------------------- | ||
|
||
#. Open the command line environment via ``cmd.exe``. | ||
#. Activate ``MSVC`` with the following command: | ||
|
||
.. code-block:: sh | ||
|
||
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat" | ||
#. Activate ``conda`` with the following command: | ||
|
||
.. code-block:: sh | ||
|
||
"C:/ProgramData/miniforge3/Scripts/activate.bat" | ||
#. Create and activate a custom conda environment: | ||
|
||
.. code-block:: sh | ||
|
||
conda create -n inductor_cpu_windows python=3.10 -y | ||
conda activate inductor_cpu_windows | ||
|
||
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later. | ||
|
||
Using TorchInductor on Windows CPU | ||
---------------------------------- | ||
|
||
Here’s a simple example to demonstrate how to use TorchInductor: | ||
|
||
.. code-block:: python | ||
|
||
|
||
import torch | ||
def foo(x, y): | ||
a = torch.sin(x) | ||
b = torch.cos(y) | ||
return a + b | ||
opt_foo1 = torch.compile(foo) | ||
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10))) | ||
|
||
Here is the sample output that this code might return: | ||
|
||
.. code-block:: sh | ||
|
||
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01, | ||
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00], | ||
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01, | ||
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01], | ||
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01, | ||
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00], | ||
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00, | ||
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01], | ||
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01, | ||
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01], | ||
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00, | ||
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00], | ||
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01, | ||
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00], | ||
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00, | ||
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00], | ||
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01, | ||
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00], | ||
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01, | ||
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]]) | ||
|
||
Using an Alternative Compiler for Better Performance | ||
------------------------------------------- | ||
|
||
To enhance performance on Windows inductor, you can use the Intel Compiler or LLVM Compiler. However, they rely on the runtime libraries from Microsoft Visual C++ (MSVC). Therefore, your first step should be to install MSVC. | ||
|
||
Intel Compiler | ||
^^^^^^^^^^^^^^ | ||
|
||
#. Download and install `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ with Windows version. | ||
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=icx-cl``. | ||
|
||
Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check `Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices <https://www.intel.com/content/www/us/en/developer/articles/technical/boost-pytorch-inductor-performance-on-windows.html>`_. | ||
|
||
LLVM Compiler | ||
^^^^^^^^^^^^^ | ||
|
||
#. Download and install `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and choose win64 version. | ||
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=clang-cl``. | ||
|
||
Conclusion | ||
---------- | ||
|
||
In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch. In addition, we discussed | ||
further performance improvements with Intel Compiler and LLVM Compiler. | ||
<meta http-equiv="Refresh" content="3; url='https://pytorch.org/tutorials/prototype/inductor_windows.html'" /> |