NVIDIA
diff --git a/pr-2632/_images/qpus.png
95.4 KB b/pr-2632/_images/qpus.png
diff --git a/pr-2632/_sources/api/languages/python_api.rst.txt
+1 b/pr-2632/_sources/api/languages/python_api.rst.txt
diff --git a/pr-2632/_sources/specification/cudaq/algorithmic_primitives.rst.txt
+60-24 b/pr-2632/_sources/specification/cudaq/algorithmic_primitives.rst.txt
diff --git a/pr-2632/_sources/using/backends/cloud/braket.rst.txt
+2-2 b/pr-2632/_sources/using/backends/cloud/braket.rst.txt
diff --git a/pr-2632/_sources/using/backends/hardware/neutralatom.rst.txt
+6-4 b/pr-2632/_sources/using/backends/hardware/neutralatom.rst.txt
diff --git a/pr-2632/_sources/using/backends/sims/noisy.rst.txt
+103-3 b/pr-2632/_sources/using/backends/sims/noisy.rst.txt
diff --git a/pr-2632/_sources/using/backends/simulators.rst.txt
+3-3 b/pr-2632/_sources/using/backends/simulators.rst.txt
diff --git a/pr-2632/api/api.html
+1 b/pr-2632/api/api.html
diff --git a/pr-2632/api/default_ops.html
+1 b/pr-2632/api/default_ops.html
@@ -218,6 +218,7 @@ Noisy Simulation
 
 .. autoclass:: cudaq::NoiseModel
     :members:
+    :exclude-members: register_channel
     :special-members: __init__
 
 .. autoclass:: cudaq::BitFlipChannel
 
@@ -130,12 +130,16 @@ extract the result information in the following manner:
       };
     }
 
-**[7]** The :code:`sample_result` type enables one to encode measurement results from a 
-quantum circuit sampling task. It keeps track of a list of sample results, each 
-one corresponding to a measurement action during the sampling process and represented 
-by a unique register name. It also tracks a unique global register, the implicit sampling 
-of the state at the end of circuit execution. The API gives fine-grain access 
-to the measurement results for each register. To illustrate this, observe 
+**[7]** The :code:`sample_result` type encodes measurement results from a
+quantum circuit sampling task. It keeps track of a
+list of sample results, each one corresponding to a measurement action during
+the sampling process and represented by a unique register name. It also tracks
+a unique global register, which, by default, contains the implicit sampling of
+the state at the end of circuit execution. If the :code:`explicit_measurements`
+sample option is enabled, the global register instead contains all measurements
+concatenated together in the order the measurements occurred in the kernel.
+The API gives fine-grained access to the measurement results for each register.
+To illustrate this, observe
 
 .. tab:: C++ 
 
@@ -148,8 +152,14 @@ to the measurement results for each register. To illustrate this, observe
       reset (q);
       x(q);
     };
+    
+    printf("Default - no explicit measurements\n");
     cudaq::sample(kernel).dump();
 
+    cudaq::sample_options options{.explicit_measurements = true};
+    printf("Setting `explicit_measurements` option\n");
+    cudaq::sample(options, kernel).dump();
+
 .. tab:: Python 
 
   .. code-block:: python 
@@ -162,30 +172,41 @@ to the measurement results for each register. To illustrate this, observe
        reset(q)
        x(q)
     
+    print("Default - no explicit measurements")
     cudaq.sample(kernel).dump()
 
+    print("Setting `explicit_measurements` option")
+    cudaq.sample(kernel, explicit_measurements=True).dump() 
+
 should produce 
 
 .. code-block:: bash 
 
+    Default - no explicit measurements
     { 
       __global__ : { 1:1000 }
-      reg1 : { 0:501 1:499 }
+      reg1 : { 0:506 1:494 }
     }
 
+    Setting `explicit_measurements` option
+    { 0:479 1:521 }
+
 Here we see that we have measured a qubit in a uniform superposition to a 
 register named :code:`reg1`, and followed it with a reset and the application 
-of an NOT operation. The :code:`sample_result` returned for this sampling 
-tasks contains the default :code:`__global__` register as well as the user 
-specified :code:`reg1` register. 
+of a NOT operation. By default the :code:`sample_result` returned for this
+sampling task contains the :code:`__global__` register as well as the
+user-specified :code:`reg1` register.
 
 The contents of the :code:`__global__` register will depend on how your kernel
 is written:
 
-1. If no measurements appear in the kernel, then the :code:`__global__`
-   register is formed with implicit measurements being added for *all* the
-   qubits defined in the kernel, and the measurements all occur at the end of
-   the kernel. The order of the bits in the bitstring corresponds to the qubit
+1. If no measurements appear in the kernel, then the :code:`__global__` 
+   register is formed with implicit measurements being added for *all* the 
+   qubits defined in the kernel, and the measurements all occur at the end of 
+   the kernel. Implicit measurement is not supported when sampling with the
+   :code:`explicit_measurements` option; kernels executed in
+   :code:`explicit_measurements` mode must contain measurements.
+   The order of the bits in the bitstring corresponds to the qubit
    allocation order specified in the kernel.  That is - the :code:`[0]` element
    in the :code:`__global__` bitstring corresponds with the first declared qubit
    in the kernel. For example,
@@ -222,12 +243,15 @@ should produce
 2. Conversely, if any measurements appear in the kernel, then only the measured
    qubits will appear in the :code:`__global__` register. Similar to #1, the 
    bitstring corresponds to the qubit allocation order specified in the kernel.
-   Also (again, similar to #1), the values of the sampled qubits always
-   correspond to the values *at the end of the kernel execution*. That is - if a
-   qubit is measured in the middle of a kernel and subsequent operations change
-   the state of the qubit, the qubit will be implicitly re-measured at the end
-   of the kernel, and that re-measured value is the value that will appear in
-   the :code:`__global__` register. For example,
+   Also (again, similar to #1), the values of the sampled qubits always 
+   correspond to the values *at the end of the kernel execution*, unless the 
+   :code:`explicit_measurements` option is enabled. That is - if a qubit is 
+   measured in the middle of a kernel and subsequent operations change the state
+   of the qubit, the qubit will be implicitly re-measured at the end of the 
+   kernel, and that re-measured value is the value that will appear in the 
+   :code:`__global__` register. If the sampling option :code:`explicit_measurements` 
+   is enabled, then no re-measurements occur, and the global register contains 
+   the concatenated measurements in the order they were executed in the kernel.
 
 .. tab:: C++ 
 
@@ -239,8 +263,14 @@ should produce
          mz(b);
          mz(a);
        };
+       
+       printf("Default - no explicit measurements\n");
        cudaq::sample(kernel).dump();
 
+       cudaq::sample_options options{.explicit_measurements = true};
+       printf("Setting `explicit_measurements` option\n");
+       cudaq::sample(options, kernel).dump();
+
 .. tab:: Python 
 
   .. code-block:: python 
@@ -252,15 +282,21 @@ should produce
         mz(b)
         mz(a)
 
-    cudaq.sample(kernel).dump() 
+    print("Default - no explicit measurements")
+    cudaq.sample(kernel).dump()
+
+    print("Setting `explicit_measurements` option")
+    cudaq.sample(kernel, explicit_measurements=True).dump()
   
 should produce 
 
    .. code-block:: bash 
 
-       { 
-         __global__ : { 10:1000 }
-       }
+       Default - no explicit measurements
+       { 10:1000 }
+
+       Setting `explicit_measurements` option
+       { 01:1000 }
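
The difference between the two modes can be sketched with a small classical model. This is only an illustration of the register rules described above, not CUDA-Q code: the function ``global_register`` and its arguments are hypothetical, qubit values are treated as plain bits, and the second kernel is assumed to leave ``a`` in state 1 and ``b`` in state 0, as the printed output suggests.

```python
def global_register(final_values, measurement_log, explicit_measurements=False):
    """Toy model of how the global register bitstring is formed.

    final_values:    dict mapping qubit name -> bit at the end of the kernel,
                     inserted in qubit allocation order (hypothetical input).
    measurement_log: list of (qubit name, observed bit) in execution order.
    """
    if explicit_measurements:
        # Explicit mode: no re-measurement; concatenate results in the
        # order the measurements were executed.
        if not measurement_log:
            raise ValueError("explicit_measurements requires measurements in the kernel")
        return "".join(str(bit) for _, bit in measurement_log)

    measured = {name for name, _ in measurement_log}
    # Default mode: report end-of-kernel values in allocation order.
    # With no measurements at all, every qubit is implicitly measured.
    names = [n for n in final_values if not measured or n in measured]
    return "".join(str(final_values[n]) for n in names)


# Second kernel above: b is measured first, then a (assumed a = 1, b = 0).
final = {"a": 1, "b": 0}
log = [("b", 0), ("a", 1)]
print(global_register(final, log))                              # 10
print(global_register(final, log, explicit_measurements=True))  # 01

# First kernel above, for a shot where the mid-circuit measurement gave 0:
# the later reset and x leave the qubit in 1 at the end of the kernel.
print(global_register({"q": 1}, [("q", 0)]))                              # 1
print(global_register({"q": 1}, [("q", 0)], explicit_measurements=True))  # 0
```

In the model, default mode always reads the end-of-kernel values, which is why the first kernel reports ``1`` for every shot, while explicit mode reports the mid-circuit outcome itself.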
 
 .. note::
 
 
@@ -9,11 +9,11 @@ circuit simulators, and secure, on-demand access to various quantum computers.
 To get started users must enable Amazon Braket in their AWS account by following 
 `these instructions <https://docs.aws.amazon.com/braket/latest/developerguide/braket-enable-overview.html>`__.
 To learn more about Amazon Braket, you can view the `Amazon Braket Documentation <https://docs.aws.amazon.com/braket/>`__ 
-and `Amazon Braket Examples <https://github.com/amazon-braket/amazon-braket-examples>`__.
+and `Amazon Braket Examples <https://github.com/amazon-braket/amazon-braket-examples/tree/main/examples/nvidia_cuda_q>`__.
 A list of available devices and regions can be found `here <https://docs.aws.amazon.com/braket/latest/developerguide/braket-devices.html>`__. 
 
 Users can run CUDA-Q programs on Amazon Braket with `Hybrid Job <https://docs.aws.amazon.com/braket/latest/developerguide/braket-what-is-hybrid-job.html>`__.
-See `this guide <https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-first.html>`__ to get started.
+See `this guide <https://docs.aws.amazon.com/braket/latest/developerguide/braket-jobs-first.html>`__ to get started with Hybrid Jobs and `this guide <https://docs.aws.amazon.com/braket/latest/developerguide/braket-using-cuda-q.html>`__ on how to use CUDA-Q with Amazon Braket.
 
 Setting Credentials
 ```````````````````
 
@@ -133,12 +133,14 @@ Submitting
 Pasqal
 ++++++++++++++++
 
-Pasqal is a quantum computing hardware company that builds quantum processors from ordered neutral atoms in 2D and 3D arrays to bring a practical quantum advantage to its customers and address real-world problems.
-The currently available Pasqal QPUs are analog quantum computers.
+Pasqal is a quantum computing hardware company that builds quantum processors from ordered neutral atoms in 2D and 3D
+arrays to bring a practical quantum advantage to its customers and address real-world problems.
+The currently available Pasqal QPUs are analog quantum computers, and one, named Fresnel, is available through Pasqal's cloud
+portal.
 
 In order to access Pasqal's devices you need an account for `Pasqal's cloud platform <https://portal.pasqal.cloud>`__
-and an active project. Although a different interface `Pasqal's Pulser library <https://pulser.readthedocs.io/en/latest/>`__ is a good
-way of getting started with analog neutral atom quantum computing. For support you can also use `Pasqal Community <https://community.pasqal.com/>`__.
+and an active project. Although a different interface, `Pasqal's Pulser library <https://pulser.readthedocs.io/en/latest/>`__, is a good
+resource for getting started with analog neutral atom quantum computing. For support you can also use `Pasqal Community <https://community.pasqal.com/>`__.
 
 
 .. _pasqal-backend:
 
@@ -1,7 +1,106 @@
-
-Density Matrix Simulators
+Noisy Simulators
 ==================================
 
+Trajectory Noisy Simulation
+++++++++++++++++++++++++++++++++++
+
+The :code:`nvidia` target supports noisy quantum circuit simulations using
+the quantum trajectory method across all configurations: single GPU, multi-node
+multi-GPU, and with host memory. When simulating many trajectories with small 
+state vectors, the simulation is batched for optimal performance.
+
+When a :code:`noise_model` is provided to CUDA-Q, the :code:`nvidia` target 
+will incorporate quantum noise into the quantum circuit simulation according 
+to the noise model specified.
+
+
+.. tab:: Python
+
+    .. literalinclude:: ../../../snippets/python/using/backends/trajectory.py
+        :language: python
+        :start-after: [Begin Docs]
+
+    .. code:: bash 
+        
+        python3 program.py
+        { 00:15 01:92 10:81 11:812 }
+
+.. tab:: C++
+
+    .. literalinclude:: ../../../snippets/cpp/using/backends/trajectory.cpp
+        :language: cpp
+        :start-after: [Begin Documentation]
+
+    .. code:: bash 
+
+        nvq++ --target nvidia program.cpp [...] -o program.x
+        ./program.x
+        { 00:15 01:92 10:81 11:812 }
+
+
+In the case of bit-string measurement sampling as in the above example, each measurement 'shot' is executed as a trajectory, whereby Kraus operators specified in the noise model are sampled.
+
+For observable expectation value estimation, the statistical error scales asymptotically as :math:`1/\sqrt{N_{trajectories}}`, where :math:`N_{trajectories}` is the number of trajectories.
+Hence, depending on the required level of accuracy, the number of trajectories can be specified accordingly.
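
The scaling can be checked with a standard-library sketch. It is a simplified stand-in for a trajectory simulation, not CUDA-Q code: each trajectory is modeled as a single ±1 outcome of a Z measurement, and the probability ``p_one = 0.9`` (so that the exact value is -0.8) is an assumed toy value, not taken from any noise model.

```python
import math
import random


def estimate_z(p_one, num_trajectories, rng):
    """Average of +/-1 outcomes; -1 is drawn with probability p_one."""
    total = 0
    for _ in range(num_trajectories):
        total += -1 if rng.random() < p_one else 1
    return total / num_trajectories


def standard_error(p_one, num_trajectories):
    """Standard error of the mean of +/-1 outcomes: sqrt((1 - <Z>^2) / N)."""
    exact = 1.0 - 2.0 * p_one
    return math.sqrt((1.0 - exact**2) / num_trajectories)


rng = random.Random(0)  # fixed seed so repeated runs give the same estimates
for n in (1024, 8192):
    estimate = estimate_z(0.9, n, rng)
    print(f"N = {n}: estimate = {estimate:+.4f}, "
          f"standard error = {standard_error(0.9, n):.4f}")

# Going from 1024 to 8192 trajectories shrinks the error by sqrt(8).
print(standard_error(0.9, 1024) / standard_error(0.9, 8192))
```

Under this model, halving the statistical error requires four times as many trajectories, which is the trade-off to weigh when choosing the trajectory count.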
+
+.. tab:: Python
+
+    .. literalinclude:: ../../../snippets/python/using/backends/trajectory_observe.py
+        :language: python
+        :start-after: [Begin Docs]
+
+    .. code:: bash 
+        
+        python3 program.py
+        Noisy <Z> with 1024 trajectories = -0.810546875
+        Noisy <Z> with 8192 trajectories = -0.800048828125
+
+.. tab:: C++
+
+    .. literalinclude:: ../../../snippets/cpp/using/backends/trajectory_observe.cpp
+        :language: cpp
+        :start-after: [Begin Documentation]
+
+    .. code:: bash 
+
+        nvq++ --target nvidia program.cpp [...] -o program.x
+        ./program.x
+        Noisy <Z> with 1024 trajectories = -0.810547
+        Noisy <Z> with 8192 trajectories = -0.800049
+
+
+The following environment variable options are applicable to the :code:`nvidia` target for trajectory noisy simulation. Any environment variables must be set
+prior to setting the target.
+
+.. list-table:: **Additional environment variable options for trajectory simulation**
+  :widths: 20 30 50
+
+  * - Option
+    - Value
+    - Description
+  * - ``CUDAQ_OBSERVE_NUM_TRAJECTORIES``
+    - positive integer
+    - The default number of trajectories for observe simulation if none was provided in the `observe` call. The default value is 1000.
+  * - ``CUDAQ_BATCH_SIZE``
+    - positive integer or `NONE`
+    - The number of state vectors in the batched mode. If `NONE`, the batch size will be calculated based on the available device memory. Default is `NONE`.
+  * - ``CUDAQ_BATCHED_SIM_MAX_BRANCHES``
+    - positive integer
+    - The number of trajectory branches to be tracked simultaneously during gate fusion. Default is 16.
+  * - ``CUDAQ_BATCHED_SIM_MAX_QUBITS``
+    - positive integer
+    - The maximum number of qubits for batching. If the qubit count in the circuit is more than this value, batched trajectory simulation will be disabled. The default value is 20.
+  * - ``CUDAQ_BATCHED_SIM_MIN_BATCH_SIZE``
+    - positive integer
+    - The minimum number of trajectories for batching. If the number of trajectories is less than this value, batched trajectory simulation will be disabled. Default value is 4.
+
+.. note::
+    
+    Batched trajectory simulation is only available in the single-GPU execution mode of the :code:`nvidia` target.
+    
+    If batched trajectory simulation is not activated, e.g., due to problem size, number of trajectories, or the nature of the circuit (dynamic circuits with mid-circuit measurements and conditional branching), the required number of trajectories will be executed sequentially.  
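
The two thresholds in the table can be sketched as a simple check. This is an illustration of the documented rules only, not CUDA-Q's implementation: the helper ``batching_expected`` is hypothetical, and it ignores the other conditions named in the note (single-GPU mode, dynamic circuits, available device memory).

```python
import os


def batching_expected(num_qubits, num_trajectories, env=None):
    """Apply the documented qubit and trajectory thresholds (hypothetical helper)."""
    env = os.environ if env is None else env
    # Defaults taken from the table above.
    max_qubits = int(env.get("CUDAQ_BATCHED_SIM_MAX_QUBITS", 20))
    min_batch = int(env.get("CUDAQ_BATCHED_SIM_MIN_BATCH_SIZE", 4))
    # Batching is disabled above the qubit limit or below the trajectory minimum.
    return num_qubits <= max_qubits and num_trajectories >= min_batch


# Environment variables must be set before the target is selected,
# e.g. before calling cudaq.set_target("nvidia") in the same script.
os.environ["CUDAQ_OBSERVE_NUM_TRAJECTORIES"] = "4096"
os.environ["CUDAQ_BATCHED_SIM_MAX_QUBITS"] = "24"

print(batching_expected(22, 1000))  # True: 22 <= 24 and 1000 >= 4
print(batching_expected(26, 1000))  # False: more qubits than the limit
print(batching_expected(10, 2))     # False: fewer trajectories than the minimum
```

Passing an explicit ``env`` dictionary keeps the check independent of the process environment, which is convenient when comparing settings.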
+
+
 
 Density Matrix 
 ++++++++++++++++
@@ -70,7 +169,8 @@ To execute a program on the :code:`stim` target, use the following commands:
         ./program.x
 
 .. note::
-    CUDA-Q currently executes kernels using a "shot-by-shot" execution approach.
+    By default CUDA-Q executes kernels using a "shot-by-shot" execution approach.
     This allows for conditional gate execution (i.e. full control flow), but it
     can be slower than executing Stim a single time and generating all the shots
     from that single execution.
+    Set the `explicit_measurements` flag with the `sample` API for more efficient execution.
@@ -24,11 +24,11 @@ technical details and code examples for using each circuit simulator.
      - State Vector
      - Testing and small applications
      - CPU
-     - single
+     - double
      - < 28
    * - `nvidia`
      - State Vector
-     - General purpose (default)
+     - General purpose (default); Trajectory simulation for noisy circuits
      - Single GPU
      - single / double
      - < 33 / 32 (64 GB)
@@ -72,7 +72,7 @@ technical details and code examples for using each circuit simulator.
      - Density Matrix
      - Noisy simulations
      - CPU
-     - single
+     - double
      - < 14
    * - `stim`
      - Stabilizer 
 
@@ -365,6 +365,7 @@
 </ul>
 </li>
 <li class="toctree-l3"><a class="reference internal" href="../using/backends/sims/noisy.html">     Noisy Simulators</a><ul>
+<li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#trajectory-noisy-simulation">Trajectory Noisy Simulation</a></li>
 <li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#density-matrix">Density Matrix</a></li>
 <li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#stim">Stim</a></li>
 </ul>
 
@@ -367,6 +367,7 @@
 </ul>
 </li>
 <li class="toctree-l3"><a class="reference internal" href="../using/backends/sims/noisy.html">     Noisy Simulators</a><ul>
+<li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#trajectory-noisy-simulation">Trajectory Noisy Simulation</a></li>
 <li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#density-matrix">Density Matrix</a></li>
 <li class="toctree-l4"><a class="reference internal" href="../using/backends/sims/noisy.html#stim">Stim</a></li>
 </ul>