Commit 51018ba

Docs preview for PR #2660.
1 parent 33fc8f9 commit 51018ba

6 files changed: +11 -11 lines changed

pr-2660/_sources/using/backends/sims/svsims.rst.txt

+2 -2
@@ -107,7 +107,7 @@ setting the target. It is worth drawing attention to gate fusion, a powerful too
  - Description
  * - ``CUDAQ_FUSION_MAX_QUBITS``
  - positive integer
- - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of `4`.
+ - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
  * - ``CUDAQ_FUSION_DIAGONAL_GATE_MAX_QUBITS``
  - integer greater than or equal to -1
  - The max number of qubits used for diagonal gate fusion. The default value is set to `-1` and the fusion size will be automatically adjusted for the better performance. If 0, the gate fusion for diagonal gates is disabled.
@@ -232,7 +232,7 @@ prior to setting the target.
  - The qubit count threshold where state vector distribution is activated. Below this threshold, simulation is performed as independent (non-distributed) tasks across all MPI processes for optimal performance. Default is 25.
  * - ``CUDAQ_MGPU_FUSE``
  - positive integer
- - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of `6` if there are more than one MPI processes or `4` otherwise.
+ - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
  * - ``CUDAQ_MGPU_P2P_DEVICE_BITS``
  - positive integer
  - Specify the number of GPUs that can communicate by using GPUDirect P2P. Default value is 0 (P2P communication is disabled).
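
The defaults stated in the updated `CUDAQ_FUSION_MAX_QUBITS` and `CUDAQ_MGPU_FUSE` descriptions can be read as a small lookup table keyed by compute capability and precision. The following is a minimal sketch of that table, not CUDA-Q code: the helper name `default_fusion_qubits` is hypothetical, and it assumes a literal reading of the text in which only CC 8.0, 9.0, and 10.0 have dedicated defaults and every other CC falls back to `4`.

```python
# Hypothetical helper mirroring the defaults stated in the updated docs text.
# This is NOT a CUDA-Q API; it only restates the documented table.

# Keys are (major, minor) compute capability; values map precision -> default.
_DOCUMENTED_DEFAULTS = {
    (8, 0): {"fp32": 4, "fp64": 5},
    (9, 0): {"fp32": 5, "fp64": 6},
    (10, 0): {"fp32": 5, "fp64": 4},
}

# Documented default for all other compute capabilities, both precisions.
_FALLBACK_DEFAULT = 4


def default_fusion_qubits(compute_capability, precision):
    """Return the documented default fusion size.

    compute_capability: (major, minor) tuple, e.g. (9, 0).
    precision: "fp32" or "fp64" (case-insensitive).
    Assumption: only exact matches on 8.0, 9.0, and 10.0 get a dedicated value.
    """
    precision = precision.lower()
    if precision not in ("fp32", "fp64"):
        raise ValueError(f"unknown precision: {precision!r}")
    per_precision = _DOCUMENTED_DEFAULTS.get(tuple(compute_capability))
    if per_precision is None:
        return _FALLBACK_DEFAULT
    return per_precision[precision]
```

Whether a minor revision such as CC 8.6 shares the CC 8.0 default is not stated in the text, so the sketch treats it as "other".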

pr-2660/applications/python/deutschs_algorithm.html

+2 -2
@@ -816,7 +816,7 @@ <h2>XOR <span class="math notranslate nohighlight">\(\oplus\)</span><a class="he
  </section>
  <section id="Quantum-oracles">
  <h2>Quantum oracles<a class="headerlink" href="#Quantum-oracles" title="Permalink to this heading"></a></h2>
- <p><img alt="c269f65917f043258df3cd656b7c3ef6" class="no-scaled-link" src="../../_images/oracle.png" style="width: 300px; height: 150px;" /></p>
+ <p><img alt="cd459d1f95934fdea70a8623142b1b46" class="no-scaled-link" src="../../_images/oracle.png" style="width: 300px; height: 150px;" /></p>
  <p>Suppose we have <span class="math notranslate nohighlight">\(f(x): \{0,1\} \longrightarrow \{0,1\}\)</span>. We can compute this function on a quantum computer using oracles which we treat as black box functions that yield the output with an appropriate sequence of logical gates.</p>
  <p>Above you see an oracle represented as <span class="math notranslate nohighlight">\(U_f\)</span> which allows us to transform the state <span class="math notranslate nohighlight">\(\ket{x}\ket{y}\)</span> into:</p>
  <div class="math notranslate nohighlight">
@@ -864,7 +864,7 @@ <h2>Quantum parallelism<a class="headerlink" href="#Quantum-parallelism" title="
  <h2>Deutsch’s Algorithm:<a class="headerlink" href="#Deutsch's-Algorithm:" title="Permalink to this heading"></a></h2>
  <p>Our aim is to find out if <span class="math notranslate nohighlight">\(f: \{0,1\} \longrightarrow \{0,1\}\)</span> is a constant or a balanced function? If constant, <span class="math notranslate nohighlight">\(f(0) = f(1)\)</span>, and if balanced, <span class="math notranslate nohighlight">\(f(0) \neq f(1)\)</span>.</p>
  <p>We step through the circuit diagram below and follow the math after the application of each gate.</p>
- <p><img alt="706e0cff78994efa904661fd25febe5c" class="no-scaled-link" src="../../_images/deutsch.png" style="width: 500px; height: 210px;" /></p>
+ <p><img alt="1399566aaf23404e84b6657cedc13fa3" class="no-scaled-link" src="../../_images/deutsch.png" style="width: 500px; height: 210px;" /></p>
  <div class="math notranslate nohighlight">
  \[\ket{\psi_0} = \ket{01}
  \tag{1}\]</div>

pr-2660/examples/python/performance_optimizations.html

+2 -2
@@ -744,9 +744,9 @@ <h1>Optimizing Performance<a class="headerlink" href="#Optimizing-Performance" t
  <section id="Gate-Fusion">
  <h2>Gate Fusion<a class="headerlink" href="#Gate-Fusion" title="Permalink to this heading"></a></h2>
  <p>Gate fusion is an optimization technique where consecutive gates are combined into a single gate operation to improve the efficiency of the simulation (See figure below). By targeting the <code class="docutils literal notranslate"><span class="pre">nvidia-mgpu</span></code> backend and setting the <code class="docutils literal notranslate"><span class="pre">CUDAQ_MGPU_FUSE</span></code> environment variable, you can select the degree of fusion that takes place. A full command line example would look like <code class="docutils literal notranslate"><span class="pre">CUDAQ_MGPU_FUSE=4</span> <span class="pre">python</span> <span class="pre">c2h2VQE.py</span> <span class="pre">--target</span> <span class="pre">nvidia</span> <span class="pre">--target-option</span> <span class="pre">fp64,mgpu</span></code></p>
- <p><img alt="6ee9c13954c146d2985e006149a91e47" src="../../_images/gate-fuse.png" /></p>
+ <p><img alt="e1669ee09c184782b012d8cb72e7782d" src="../../_images/gate-fuse.png" /></p>
  <p>The importance of gate fusion is system dependent, but can have a large influence on the performance of the simulation. See the example below for a 24 qubit VQE experiment where changing the fusion level resulted in significant performance boosts.</p>
- <p><img alt="ce7ed8128dd141d0a0b1ad7118abeb1e" src="../../_images/gatefusion.png" /></p>
+ <p><img alt="780fa1179ff7454abb9771bd0da1d7e2" src="../../_images/gatefusion.png" /></p>
  </section>
  </section>
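
The command line quoted in the Gate Fusion paragraph sets `CUDAQ_MGPU_FUSE` in the environment of the launched process. As a rough sketch of doing the same from a Python launcher, the helper below builds a child-process environment with the variable set. The name `fusion_env` is hypothetical and not part of CUDA-Q; the script name and flags are copied from the quoted command, and the script is not actually launched here.

```python
import os


def fusion_env(level, variable="CUDAQ_MGPU_FUSE", base_env=None):
    """Return a copy of the environment with the fusion variable set.

    Hypothetical helper, not a CUDA-Q API. The docs describe the value as a
    positive integer, so anything else is rejected before launching a job.
    """
    if isinstance(level, bool) or not isinstance(level, int) or level < 1:
        raise ValueError("fusion level must be a positive integer")
    # Copy so the caller's (or the current process's) environment is untouched.
    env = dict(os.environ if base_env is None else base_env)
    env[variable] = str(level)
    return env


# Command copied from the docs paragraph; c2h2VQE.py is the docs' example script.
command = ["python", "c2h2VQE.py", "--target", "nvidia",
           "--target-option", "fp64,mgpu"]
env = fusion_env(4)
# To launch: subprocess.run(command, env=env, check=True)
```

Passing the variable through the child environment keeps it set before the target is selected inside the script, which matches the "prior to setting the target" wording used for these variables.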

pr-2660/searchindex.js

+1 -1
Some generated files are not rendered by default.

pr-2660/sphinx/using/backends/sims/svsims.rst

+2 -2
@@ -107,7 +107,7 @@ setting the target. It is worth drawing attention to gate fusion, a powerful too
  - Description
  * - ``CUDAQ_FUSION_MAX_QUBITS``
  - positive integer
- - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of `4`.
+ - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
  * - ``CUDAQ_FUSION_DIAGONAL_GATE_MAX_QUBITS``
  - integer greater than or equal to -1
  - The max number of qubits used for diagonal gate fusion. The default value is set to `-1` and the fusion size will be automatically adjusted for the better performance. If 0, the gate fusion for diagonal gates is disabled.
@@ -232,7 +232,7 @@ prior to setting the target.
  - The qubit count threshold where state vector distribution is activated. Below this threshold, simulation is performed as independent (non-distributed) tasks across all MPI processes for optimal performance. Default is 25.
  * - ``CUDAQ_MGPU_FUSE``
  - positive integer
- - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of `6` if there are more than one MPI processes or `4` otherwise.
+ - The max number of qubits used for gate fusion. The default value depends on `GPU Compute Capability <https://developer.nvidia.com/cuda-gpus>`__ (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are `4`, `5`, and `5` for `FP32`. For `FP64` the corresponding defaults are `5`, `6`, and `4`. For all other CC, the default is `4` for both precision modes.
  * - ``CUDAQ_MGPU_P2P_DEVICE_BITS``
  - positive integer
  - Specify the number of GPUs that can communicate by using GPUDirect P2P. Default value is 0 (P2P communication is disabled).

pr-2660/using/backends/sims/svsims.html

+2 -2
@@ -820,7 +820,7 @@ <h2>Single-GPU<a class="headerlink" href="#single-gpu" title="Permalink to this
  </tr>
  <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">CUDAQ_FUSION_MAX_QUBITS</span></code></p></td>
  <td><p>positive integer</p></td>
- <td><p>The max number of qubits used for gate fusion. The default value depends on <a class="reference external" href="https://developer.nvidia.com/cuda-gpus">GPU Compute Capability</a> (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of <code class="code docutils literal notranslate"><span class="pre">4</span></code>.</p></td>
+ <td><p>The max number of qubits used for gate fusion. The default value depends on <a class="reference external" href="https://developer.nvidia.com/cuda-gpus">GPU Compute Capability</a> (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are <code class="code docutils literal notranslate"><span class="pre">4</span></code>, <code class="code docutils literal notranslate"><span class="pre">5</span></code>, and <code class="code docutils literal notranslate"><span class="pre">5</span></code> for <code class="code docutils literal notranslate"><span class="pre">FP32</span></code>. For <code class="code docutils literal notranslate"><span class="pre">FP64</span></code> the corresponding defaults are <code class="code docutils literal notranslate"><span class="pre">5</span></code>, <code class="code docutils literal notranslate"><span class="pre">6</span></code>, and <code class="code docutils literal notranslate"><span class="pre">4</span></code>. For all other CC, the default is <code class="code docutils literal notranslate"><span class="pre">4</span></code> for both precision modes.</p></td>
  </tr>
  <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">CUDAQ_FUSION_DIAGONAL_GATE_MAX_QUBITS</span></code></p></td>
  <td><p>integer greater than or equal to -1</p></td>
@@ -938,7 +938,7 @@ <h2>Multi-node multi-GPU<a class="headerlink" href="#multi-node-multi-gpu" title
  </tr>
  <tr class="row-odd"><td><p><code class="docutils literal notranslate"><span class="pre">CUDAQ_MGPU_FUSE</span></code></p></td>
  <td><p>positive integer</p></td>
- <td><p>The max number of qubits used for gate fusion. The default value depends on <a class="reference external" href="https://developer.nvidia.com/cuda-gpus">GPU Compute Capability</a> (CC). Specifically, the default is 5 for CC 8, 6 for CC 9, and 4 for CC 10. All others will use a default value of <code class="code docutils literal notranslate"><span class="pre">6</span></code> if there are more than one MPI processes or <code class="code docutils literal notranslate"><span class="pre">4</span></code> otherwise.</p></td>
+ <td><p>The max number of qubits used for gate fusion. The default value depends on <a class="reference external" href="https://developer.nvidia.com/cuda-gpus">GPU Compute Capability</a> (CC) and the floating point precision selected for the simulator. Specifically, for CC 8.0, 9.0, and 10.0 the defaults are <code class="code docutils literal notranslate"><span class="pre">4</span></code>, <code class="code docutils literal notranslate"><span class="pre">5</span></code>, and <code class="code docutils literal notranslate"><span class="pre">5</span></code> for <code class="code docutils literal notranslate"><span class="pre">FP32</span></code>. For <code class="code docutils literal notranslate"><span class="pre">FP64</span></code> the corresponding defaults are <code class="code docutils literal notranslate"><span class="pre">5</span></code>, <code class="code docutils literal notranslate"><span class="pre">6</span></code>, and <code class="code docutils literal notranslate"><span class="pre">4</span></code>. For all other CC, the default is <code class="code docutils literal notranslate"><span class="pre">4</span></code> for both precision modes.</p></td>
  </tr>
  <tr class="row-even"><td><p><code class="docutils literal notranslate"><span class="pre">CUDAQ_MGPU_P2P_DEVICE_BITS</span></code></p></td>
  <td><p>positive integer</p></td>
