Update Inductor windows tutorial with xpu support #3309
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
How to use ``torch.compile`` on Windows CPU/XPU | ||
=============================================== | ||
|
||
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_ | ||
|
||
|
||
Introduction | ||
------------ | ||
|
||
TorchInductor is the new compiler backend that compiles the FX Graphs generated by TorchDynamo into optimized C++/Triton kernels. | ||
|
||
This tutorial introduces the steps for utilizing TorchInductor via ``torch.compile`` on Windows CPU/XPU. | ||
|
||
|
||
Software Installation | ||
--------------------- | ||
|
||
This section walks through installing the required software and running ``torch.compile`` on Windows CPU/XPU. | ||
|
||
Install a Compiler | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
A C++ compiler is required for TorchInductor optimization. This tutorial uses Microsoft Visual C++ (MSVC) as an example. | ||
Review comment: Keep formatting consistent, i.e., TorchInductor.
Follow-up comment: Keep the same formatting for every occurrence of TorchInductor. |
||
|
||
Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_. | ||
ZhaoqiongZ marked this conversation as resolved.
|
||
|
||
During installation, open the ``Workloads`` tab, find the ``Desktop & Mobile`` section, check ``Desktop Development with C++``, and then install. | ||
Review comment: Might be nice to have screenshots here.
Follow-up comment: I've added a screenshot here.
ZhaoqiongZ marked this conversation as resolved.
|
||
|
||
.. note:: | ||
|
||
On Windows CPU, TorchInductor also supports the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ as C++ compilers, which can give better performance. | ||
See `Alternative Compiler for better performance on CPU <#alternative-compiler-for-better-performance-on-cpu>`_. | ||
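Once a compiler is installed and its environment is active in the current shell, its executable should be discoverable on ``PATH``. The following is a minimal sketch, not part of the tutorial itself, that reports which of the compiler executables mentioned here can be found. The helper name ``find_compilers`` is hypothetical, and ``cl`` is assumed to be the MSVC compiler executable.

```python
import shutil

# Compiler executables referred to in this tutorial:
# cl (MSVC), clang-cl (LLVM), icx-cl (Intel).
COMPILERS = ("cl", "clang-cl", "icx-cl")


def find_compilers(names=COMPILERS, which=shutil.which):
    """Map each compiler name to its resolved path, or None if not on PATH."""
    return {name: which(name) for name in names}


# Report what the current shell environment can see.
for name, path in find_compilers().items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If ``cl`` is reported as not found, the MSVC environment has most likely not been activated in this shell yet.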
|
||
Conda Installation | ||
Review comment: Per pytorch/pytorch#149551, Conda is no longer being used.
Follow-up comment: Remove the conda installation and let users create and activate a virtual environment on their own. |
||
^^^^^^^^^^^^^^^^^^ | ||
|
||
Prepare a conda environment with Miniforge or Anaconda. | ||
For example, download and install `Miniforge <https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Windows-x86_64.exe>`_. | ||
|
||
Set Up Environment | ||
^^^^^^^^^^^^^^^^^^ | ||
|
||
ZhaoqiongZ marked this conversation as resolved.
|
||
#. Open a command-line environment via ``cmd.exe``. | ||
#. Activate ``MSVC`` with the following command:: | ||
|
||
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat" | ||
#. Activate ``conda`` with the following command:: | ||
|
||
"C:/ProgramData/miniforge3/Scripts/activate.bat" | ||
#. Create a custom conda environment:: | ||
|
||
conda create -n inductor_windows python=3.10 -y | ||
#. Activate the custom conda environment:: | ||
|
||
conda activate inductor_windows | ||
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later for CPU usage. For XPU usage, install PyTorch 2.7 or later by following `Getting Started on Intel GPU <https://pytorch.org/docs/main/notes/get_start_xpu.html>`_. | ||
#. Use TorchInductor on Windows:: | ||
|
||
import torch | ||
device = "cpu"  # or "xpu" for XPU | ||
def foo(x, y): | ||
    a = torch.sin(x) | ||
    b = torch.cos(y) | ||
    return a + b | ||
opt_foo1 = torch.compile(foo) | ||
print(opt_foo1(torch.randn(10, 10).to(device), torch.randn(10, 10).to(device))) | ||
|
||
#. Sample output of the above example (exact values vary because the inputs are random):: | ||
ZhaoqiongZ marked this conversation as resolved.
|
||
|
||
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01, | ||
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00], | ||
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01, | ||
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01], | ||
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01, | ||
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00], | ||
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00, | ||
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01], | ||
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01, | ||
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01], | ||
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00, | ||
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00], | ||
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01, | ||
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00], | ||
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00, | ||
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00], | ||
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01, | ||
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00], | ||
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01, | ||
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]]) | ||
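Because the inputs are random, the printed tensor cannot be compared against a fixed reference. One way to confirm that the setup works is to compare the compiled function against eager execution on the same inputs. The sketch below is an illustration under assumptions: the helper names ``pick_device`` and ``compiled_matches_eager`` are hypothetical, and the default tolerances of ``torch.testing.assert_close`` are assumed to be adequate for this function.

```python
def pick_device(xpu_available: bool) -> str:
    """Prefer "xpu" when an Intel GPU is usable, otherwise fall back to "cpu"."""
    return "xpu" if xpu_available else "cpu"


def compiled_matches_eager(device: str = "cpu") -> bool:
    """Compile foo with torch.compile and compare it against eager execution."""
    import torch  # imported here so this module loads even without PyTorch

    def foo(x, y):
        return torch.sin(x) + torch.cos(y)

    x = torch.randn(10, 10, device=device)
    y = torch.randn(10, 10, device=device)
    expected = foo(x, y)               # eager reference result
    actual = torch.compile(foo)(x, y)  # the first call triggers compilation
    torch.testing.assert_close(actual, expected)  # raises on mismatch
    return True
```

With PyTorch and a C++ compiler set up as described above, calling ``compiled_matches_eager("cpu")`` should return ``True``. On an XPU build, the device can be chosen with ``pick_device(torch.xpu.is_available())``.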
|
||
Alternative Compiler for better performance on CPU | ||
-------------------------------------------------- | ||
|
||
To improve TorchInductor performance on Windows CPU, you can use the Intel Compiler or LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first. | ||
|
||
Intel Compiler | ||
^^^^^^^^^^^^^^ | ||
|
||
#. Download and install the Windows version of the `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_. | ||
#. Select it as the Windows Inductor compiler by setting the ``CXX`` environment variable: ``set CXX=icx-cl``. | ||
ZhaoqiongZ marked this conversation as resolved.
|
||
|
||
LLVM Compiler | ||
^^^^^^^^^^^^^ | ||
|
||
#. Download and install the win64 version of the `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_. | ||
#. Select it as the Windows Inductor compiler by setting the ``CXX`` environment variable: ``set CXX=clang-cl``. | ||
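The ``set CXX=...`` commands above apply to the current ``cmd.exe`` session. As a rough sketch, the same selection can be made from Python by writing to the process environment. This assumes that the variable is set before PyTorch is imported and that Inductor reads ``CXX`` from the process environment. The helper name ``select_cxx`` and the preference order are hypothetical choices for illustration.

```python
import os
import shutil


def select_cxx(preferred=("icx-cl", "clang-cl", "cl"),
               which=shutil.which, env=os.environ):
    """Set CXX to the first preferred compiler found on PATH.

    Returns the chosen compiler name, or None (leaving env untouched)
    when none of the candidates is available.
    """
    for name in preferred:
        if which(name) is not None:
            env["CXX"] = name
            return name
    return None


# Example: call before importing torch so the choice is visible to it.
# chosen = select_cxx()
# print("CXX =", chosen)
```

Passing ``which`` and ``env`` as parameters keeps the helper easy to exercise without touching the real environment.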
ZhaoqiongZ marked this conversation as resolved.
|
||
|
||
Conclusion | ||
---------- | ||
|
||
This tutorial introduced how to use TorchInductor on Windows CPU with PyTorch 2.5 or later, and on Windows XPU with PyTorch 2.7 or later. You can also use the Intel Compiler or LLVM Compiler to get better performance on CPU. | ||
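The version requirements above can be checked programmatically. The following sketch compares a version string against the minimums stated in this tutorial. The helper names are hypothetical, and the version strings in the usage note are illustrative examples rather than output from a real installation.

```python
import re

# Minimum PyTorch versions stated in this tutorial, per device.
MIN_VERSION = {"cpu": (2, 5), "xpu": (2, 7)}


def meets_minimum(version: str, minimum: tuple) -> bool:
    """Return True if the major.minor part of version is >= minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


def supports_device(version: str, device: str) -> bool:
    """Check a PyTorch version string against the minimum for a device."""
    return meets_minimum(version, MIN_VERSION[device])
```

For example, ``supports_device("2.6.0", "cpu")`` is true while ``supports_device("2.6.0", "xpu")`` is false. In practice the first argument would be ``torch.__version__``.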
ZhaoqiongZ marked this conversation as resolved.
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,130 +1,7 @@ | ||
How to use TorchInductor on Windows CPU | ||
======================================= | ||
svekars marked this conversation as resolved.
|
||
This tutorial has been moved to https://pytorch.org/tutorials/prototype/inductor_windows.html. | ||
|
||
**Author**: `Zhaoqiong Zheng <https://github.com/ZhaoqiongZ>`_, `Xu, Han <https://github.com/xuhancn>`_ | ||
Redirecting in 3 seconds... | ||
|
||
.. raw:: html | ||
|
||
|
||
TorchInductor is a compiler backend that transforms FX Graphs generated by TorchDynamo into highly optimized C++/Triton kernels. | ||
This tutorial will guide you through the process of using TorchInductor on a Windows CPU. | ||
|
||
.. grid:: 2 | ||
|
||
.. grid-item-card:: :octicon:`mortar-board;1em;` What you will learn | ||
:class-card: card-prerequisites | ||
|
||
* How to compile and execute a Python function with PyTorch, optimized for Windows CPU | ||
* Basics of TorchInductor's optimization using C++/Triton kernels. | ||
|
||
.. grid-item-card:: :octicon:`list-unordered;1em;` Prerequisites | ||
:class-card: card-prerequisites | ||
|
||
* PyTorch v2.5 or later | ||
* Microsoft Visual C++ (MSVC) | ||
* Miniforge for Windows | ||
|
||
Install the Required Software | ||
----------------------------- | ||
|
||
First, let's install the required software. A C++ compiler is required for TorchInductor optimization. | ||
We will use Microsoft Visual C++ (MSVC) for this example. | ||
|
||
1. Download and install `MSVC <https://visualstudio.microsoft.com/downloads/>`_. | ||
|
||
2. During the installation, choose **Desktop Development with C++** in the **Desktop & Mobile** section of the **Workloads** tab. Then install the software. | ||
|
||
.. note:: | ||
|
||
We recommend C++ compiler `Clang <https://github.com/llvm/llvm-project/releases>`_ and `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html>`_. | ||
Please check `Alternative Compiler for better performance <#alternative-compiler-for-better-performance>`_. | ||
|
||
3. Download and install `Miniforge3-Windows-x86_64.exe <https://github.com/conda-forge/miniforge/releases/latest/>`__. | ||
|
||
Set Up the Environment | ||
---------------------- | ||
|
||
#. Open the command line environment via ``cmd.exe``. | ||
#. Activate ``MSVC`` with the following command: | ||
|
||
.. code-block:: sh | ||
|
||
"C:/Program Files/Microsoft Visual Studio/2022/Community/VC/Auxiliary/Build/vcvars64.bat" | ||
#. Activate ``conda`` with the following command: | ||
|
||
.. code-block:: sh | ||
|
||
"C:/ProgramData/miniforge3/Scripts/activate.bat" | ||
#. Create and activate a custom conda environment: | ||
|
||
.. code-block:: sh | ||
|
||
conda create -n inductor_cpu_windows python=3.10 -y | ||
conda activate inductor_cpu_windows | ||
|
||
#. Install `PyTorch 2.5 <https://pytorch.org/get-started/locally/>`_ or later. | ||
|
||
Using TorchInductor on Windows CPU | ||
---------------------------------- | ||
|
||
Here’s a simple example to demonstrate how to use TorchInductor: | ||
|
||
.. code-block:: python | ||
|
||
|
||
import torch | ||
def foo(x, y): | ||
    a = torch.sin(x) | ||
    b = torch.cos(y) | ||
    return a + b | ||
opt_foo1 = torch.compile(foo) | ||
print(opt_foo1(torch.randn(10, 10), torch.randn(10, 10))) | ||
|
||
Here is the sample output that this code might return: | ||
|
||
.. code-block:: sh | ||
|
||
tensor([[-3.9074e-02, 1.3994e+00, 1.3894e+00, 3.2630e-01, 8.3060e-01, | ||
1.1833e+00, 1.4016e+00, 7.1905e-01, 9.0637e-01, -1.3648e+00], | ||
[ 1.3728e+00, 7.2863e-01, 8.6888e-01, -6.5442e-01, 5.6790e-01, | ||
5.2025e-01, -1.2647e+00, 1.2684e+00, -1.2483e+00, -7.2845e-01], | ||
[-6.7747e-01, 1.2028e+00, 1.1431e+00, 2.7196e-02, 5.5304e-01, | ||
6.1945e-01, 4.6654e-01, -3.7376e-01, 9.3644e-01, 1.3600e+00], | ||
[-1.0157e-01, 7.7200e-02, 1.0146e+00, 8.8175e-02, -1.4057e+00, | ||
8.8119e-01, 6.2853e-01, 3.2773e-01, 8.5082e-01, 8.4615e-01], | ||
[ 1.4140e+00, 1.2130e+00, -2.0762e-01, 3.3914e-01, 4.1122e-01, | ||
8.6895e-01, 5.8852e-01, 9.3310e-01, 1.4101e+00, 9.8318e-01], | ||
[ 1.2355e+00, 7.9290e-02, 1.3707e+00, 1.3754e+00, 1.3768e+00, | ||
9.8970e-01, 1.1171e+00, -5.9944e-01, 1.2553e+00, 1.3394e+00], | ||
[-1.3428e+00, 1.8400e-01, 1.1756e+00, -3.0654e-01, 9.7973e-01, | ||
1.4019e+00, 1.1886e+00, -1.9194e-01, 1.3632e+00, 1.1811e+00], | ||
[-7.1615e-01, 4.6622e-01, 1.2089e+00, 9.2011e-01, 1.0659e+00, | ||
9.0892e-01, 1.1932e+00, 1.3888e+00, 1.3898e+00, 1.3218e+00], | ||
[ 1.4139e+00, -1.4000e-01, 9.1192e-01, 3.0175e-01, -9.6432e-01, | ||
-1.0498e+00, 1.4115e+00, -9.3212e-01, -9.0964e-01, 1.0127e+00], | ||
[ 5.7244e-04, 1.2799e+00, 1.3595e+00, 1.0907e+00, 3.7191e-01, | ||
1.4062e+00, 1.3672e+00, 6.8502e-02, 8.5216e-01, 8.6046e-01]]) | ||
|
||
Using an Alternative Compiler for Better Performance | ||
---------------------------------------------------- | ||
|
||
To improve TorchInductor performance on Windows, you can use the Intel Compiler or LLVM Compiler. Both rely on the runtime libraries from Microsoft Visual C++ (MSVC), so install MSVC first. | ||
|
||
Intel Compiler | ||
^^^^^^^^^^^^^^ | ||
|
||
#. Download and install `Intel Compiler <https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compiler-download.html>`_ with Windows version. | ||
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=icx-cl``. | ||
|
||
Intel also provides a comprehensive step-by-step guide, complete with performance data. Please check `Intel® oneAPI DPC++/C++ Compiler Boosts PyTorch* Inductor Performance on Windows* for CPU Devices <https://www.intel.com/content/www/us/en/developer/articles/technical/boost-pytorch-inductor-performance-on-windows.html>`_. | ||
|
||
LLVM Compiler | ||
^^^^^^^^^^^^^ | ||
|
||
#. Download and install `LLVM Compiler <https://github.com/llvm/llvm-project/releases>`_ and choose win64 version. | ||
#. Set Windows Inductor Compiler with the CXX environment variable ``set CXX=clang-cl``. | ||
|
||
Conclusion | ||
---------- | ||
|
||
In this tutorial, we have learned how to use Inductor on Windows CPU with PyTorch. In addition, we discussed | ||
further performance improvements with Intel Compiler and LLVM Compiler. | ||
<meta http-equiv="Refresh" content="3; url='https://pytorch.org/tutorials/prototype/inductor_windows.html'" /> |