Presentations/quantum_inspired_sampling.tex (+80, -32)
@@ -101,6 +101,17 @@
\section{Machine Learning}
\begin{frame}
\frametitle{Today's talk}
\begin{itemize}
\item In general, quantum machine learning algorithms convert quantum input states to the desired quantum output states.
\item In practice, data is initially stored classically and the algorithm's output must be accessed classically as well.
\item Today's focus: a practical way to compare classical and quantum algorithms is to analyze classical algorithms under $\ell^2$ sampling conditions.
\item Tang: linear algebra problems in low-dimensional spaces (say, constant or polylogarithmic dimension) can likely be solved ``efficiently'' under these conditions.
\item Many of the initial practical applications of quantum machine learning were to problems of this type (e.g.\ Quantum Recommendation Systems -- Kerenidis, Prakash, 2016).
%\item HHL algorithm: application of phase estimation and Hamiltonian simulation to solve linear systems.
\item We can compute $A^+ \ket{b} = \ket{x_{LS}}$ (the least-squares solution) in $\tilde{O}(\log(N)\, s^3\kappa^6 / \epsilon)$ time (query complexity).
\item Uses a quantum algorithm based on phase estimation and Hamiltonian simulation.
\item Note that $\ket{x_{LS}}$ is a quantum state. Hence, we may efficiently measure an expectation value $x^T M x$, where $M$ is some p.s.d.\ operator.
\item Assumption: $A$ is sparse with low condition number $\kappa$. Hamiltonian ($\hat{H}$) simulation is efficient when $\hat{H}$ is sparse. No low-rank assumptions are necessary.
\item ``Key'' assumption: the quantum state $\ket{b}$ can be prepared efficiently.
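As a point of reference (an illustration of my own, not part of the slides), the classical object that HHL prepares as a quantum state is simply the least-squares solution $x_{LS} = A^+ b$. A minimal Python/NumPy sketch of that classical counterpart, including the kind of scalar $x^T M x$ one would read out by measurement:

import numpy as np

# Classical analogue of what HHL outputs as a quantum state:
# the least-squares solution x_LS = A^+ b (pseudoinverse applied to b).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))   # toy overdetermined, well-conditioned system
b = rng.standard_normal(8)

x_ls = np.linalg.pinv(A) @ b      # x_LS = A^+ b
assert np.allclose(x_ls, np.linalg.lstsq(A, b, rcond=None)[0])

# With |x_LS> only available as a quantum state, one typically reads out
# scalar quantities, e.g. an expectation value x^T M x for some p.s.d. M.
M = np.eye(3)
print(x_ls @ M @ x_ls)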
\item How can we compare the speed of quantum algorithms with quantum input and quantum output to classical algorithms with classical input and classical output?
\item Quantum machine learning algorithms can be exponentially faster than the best standard classical algorithms for similar tasks, but quantum algorithms get help through input state preparation.
\item We want a practical classical model whose algorithms offer guarantees similar to those of the quantum algorithms, while still ensuring that they can be run in nearly all circumstances one would run the quantum algorithm.
\pause
\item Solution (Tang): compare quantum algorithms with quantum state preparation to classical algorithms with sample and query access to input.
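To make ``sample and query access'' concrete, here is a minimal Python sketch of SQ access to a vector (the class name and interface are my own illustrative choices, not notation from the talk): one can query entries, query the norm, and sample an index $i$ with probability $|x_i|^2/\|x\|^2$ --- exactly the distribution obtained by measuring $\ket{x}$ in the computational basis.

import numpy as np

class SQVector:
    """Illustrative sample-and-query (SQ) access to a vector x.

    Supports: query(i) -> x_i, norm() -> ||x||, and sample() -> index i drawn
    with probability |x_i|^2 / ||x||^2 (the l^2 / "measurement" distribution).
    """

    def __init__(self, x):
        self.x = np.asarray(x, dtype=float)
        self._sq_norm = float(self.x @ self.x)
        # Precomputing the full distribution is for simplicity; a real SQ data
        # structure (e.g. a binary tree over squared entries) avoids this cost.
        self._probs = self.x ** 2 / self._sq_norm

    def query(self, i):
        return self.x[i]

    def norm(self):
        return np.sqrt(self._sq_norm)

    def sample(self):
        return np.random.choice(len(self.x), p=self._probs)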
Then $\vert Y - \mu\vert\leq\epsilon\sigma$ with probability $\geq1-\delta$, using only $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ samples.
\end{fact}
\begin{itemize}
\item In words: we may create a mean estimator from $O(1/\epsilon^2)$ samples of $X$; we then compute the median of $O(\log 1/\delta)$ such estimators.
\pause
\item Catoni (2012) shows that Chebyshev's inequality is the best guarantee one can provide when considering pure empirical mean estimators for an unknown distribution (with finite $\mu, \sigma$).
\item ``Median of means'' provides an exponential improvement in the success probability ($1 - \delta$) guarantee.
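A short Python sketch of the median-of-means estimator from the Fact (the constants 6 and 6 below are a common textbook choice I am assuming; the slides state only the $O(\cdot)$ scaling):

import numpy as np

def median_of_means(sample, eps=0.1, delta=0.01):
    """Estimate E[X] to within eps * std(X) with probability >= 1 - delta.

    `sample(size)` returns `size` i.i.d. draws of X.
    """
    group_size = int(np.ceil(6 / eps**2))            # mean of ~1/eps^2 samples per group
    n_groups = int(np.ceil(6 * np.log(1 / delta)))   # median over ~log(1/delta) group means
    means = [np.mean(sample(group_size)) for _ in range(n_groups)]
    return float(np.median(means))

rng = np.random.default_rng(1)
# E[lognormal(0,1)] = exp(0.5) ~ 1.649; a single small-sample empirical mean is
# noisy, while the median of means is far more reliable at the same sample count.
print(median_of_means(lambda n: rng.lognormal(size=n)))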
\begin{corollary} For $x,y \in\CC^n$, given $x \in\mathcal{SQ}$ and $y \in\mathcal{Q}$, we can estimate $\langle x,y\rangle$ to $\epsilon\|x\|\|y\|$ error with probability $\geq1-\delta$ with query complexity $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$
\end{corollary}
\pause
\begin{proof}Sample an \textbf{index} $s$ from $x$. Then, define $Z := x_s y_s\frac{\|x\|^2}{|x_s|^2}$. Apply the Fact with $X_{i,j}$ being independent samples of $Z$.
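A Python sketch of this inner-product estimator (my own illustration, restricted to real vectors): draw indices from $x$'s $\ell^2$ distribution, form the single-sample estimates $Z = x_s y_s \|x\|^2/|x_s|^2$, and boost the success probability with median of means.

import numpy as np

def estimate_inner_product(x, y, eps=0.05, delta=0.01, rng=None):
    """Estimate <x, y> to within eps * ||x|| * ||y||, w.p. >= 1 - delta.

    Uses SQ access to x (sample indices ~ x_i^2 / ||x||^2) and query access to y.
    Single-sample estimator Z = x_s * y_s * ||x||^2 / x_s^2 has mean <x, y> and
    variance at most ||x||^2 ||y||^2 (real vectors assumed).
    """
    rng = rng or np.random.default_rng()
    x = np.asarray(x, float); y = np.asarray(y, float)
    sq_norm_x = x @ x
    probs = x**2 / sq_norm_x
    group_size = int(np.ceil(6 / eps**2))
    n_groups = int(np.ceil(6 * np.log(1 / delta)))
    means = []
    for _ in range(n_groups):
        s = rng.choice(len(x), size=group_size, p=probs)   # SQ samples from x
        z = x[s] * y[s] * sq_norm_x / x[s]**2               # = y_s * ||x||^2 / x_s
        means.append(z.mean())
    return float(np.median(means))

rng = np.random.default_rng(2)
x, y = rng.standard_normal(1000), rng.standard_normal(1000)
print(estimate_inner_product(x, y, rng=rng), x @ y)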
For $V \in\RR^{n\times k}$ and $w \in\RR^k$, given $V^\dag\in\mathcal{SQ}$ and $w \in\mathcal{Q}$, we can simulate $Vw \in\mathcal{SQ}$ with expected query complexity $\tilde{O}(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$.
We can compute entries $(Vw)_i$ with $O(k)$ queries.
We can sample using rejection sampling:
\begin{itemize}
\item $P$ is the proposal distribution: choose a column $j$ with probability proportional to $\|V_{(\cdot, j)}w_j\|^2$, then sample from $V_{(\cdot, j)}$.
\item $Q$ is the target distribution, given by $Vw$.
\item Hence, for a proposed index $s$, compute the acceptance probability $r_s$ as a constant multiple of $Q(s) / P(s)$.
\item Notice that we can compute these $r_i$'s (even though we cannot compute probabilities under the target distribution), and that the rejection sampling requirement $r_i \leq 1$ is satisfied (via Cauchy--Schwarz).
\item Since the probability of success is $\|Vw\|^2 / \|w\|^2$, it suffices to estimate the success probability of this rejection sampling process in order to estimate this norm.
\item Through a Chernoff bound, we see that the average of $O(\|w\|^2\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ ``coin flips'' is in $[(1-\epsilon)\|Vw\|,(1+\epsilon)\|Vw\|]$ with probability $\geq 1-\delta$.
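A Python sketch of this rejection-sampling step (illustrative only: it assumes direct access to the columns of $V$, uses the column-weighted proposal described above, and keeps the Cauchy--Schwarz factor $k$ explicit in the acceptance probability):

import numpy as np

def sample_from_Vw(V, w, rng=None):
    """Rejection-sample an index i with probability (Vw)_i^2 / ||Vw||^2.

    Proposal P: pick column j w.p. ||w_j V[:, j]||^2 / sum_l ||w_l V[:, l]||^2,
    then sample i from V[:, j]'s l^2 distribution.  Accept with probability
    r_i = (Vw)_i^2 / (k * sum_j (w_j V[i, j])^2), which is <= 1 by Cauchy-Schwarz.
    The acceptance rate is proportional to ||Vw||^2, so averaging many
    accept/reject "coin flips" estimates the norm of Vw.
    """
    rng = rng or np.random.default_rng()
    n, k = V.shape
    col_weights = (w**2) * (V**2).sum(axis=0)          # ||w_j V[:, j]||^2
    j = rng.choice(k, p=col_weights / col_weights.sum())
    col_probs = V[:, j]**2 / (V[:, j] @ V[:, j])       # l^2 distribution of column j
    i = rng.choice(n, p=col_probs)
    accept_prob = (V[i] @ w)**2 / (k * ((w * V[i])**2).sum())
    return i, rng.random() < accept_prob

rng = np.random.default_rng(3)
V = rng.standard_normal((200, 4)); w = rng.standard_normal(4)
draws = [sample_from_Vw(V, w, rng) for _ in range(20000)]
accepted = [i for i, ok in draws if ok]
print("acceptance rate:", len(accepted) / len(draws))  # = ||Vw||^2 / (k * sum_j ||w_j V_j||^2)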
\item For $A \in\CC^{m\times n}$, given $A \in\mathcal{SQ}$ and some threshold $k$, we can output a description of a low-rank approximation of $A$ with $\text{poly}(k)$ queries.
\item Specifically, we output two matrices $S,\hat{U}\in\mathcal{SQ}$ where $S \in\CC^{\ell\times n}$, $\hat{U} \in\CC^{\ell\times k}$ ($\ell = \text{poly}(k,\frac{1}{\epsilon})$), and this implicitly describes the low-rank approximation to $A$, $D := A(S^\dagger\hat{U})(S^\dagger\hat{U})^\dag$ ($\implies$ rank $D \leq k$).
\item This matrix satisfies the following low-rank guarantee with probability $\geq1-\delta$: for $\sigma := \sqrt{2/k}\|A\|_F$, and $A_{\sigma} := \sum_{\sigma_i \geq\sigma} \sigma_iu_iv_i^\dag$ (using SVD),
\begin{problem} For a low-rank matrix $A \in\RR^{m\times n}$
and a vector $b\in\RR^n$, given $b, A \in\mathcal{SQ}$, (approximately) simulate $A^+b\in\mathcal{SQ}$.
\end{problem}
\pause
\begin{algorithm}
\begin{itemize}
\item Low-rank approximation (3) gives us $S,\hat{U} \in\mathcal{SQ}$.
\item Applying thin-matrix vector (2), we get $\hat{V} \in\mathcal{SQ}$, where $\hat{V} := S^T\hat{U}$; we can show that the columns of $\hat{V}$ behave like the right singular vectors of $A$.
\item Let $\hat{U}$ have columns $\{\hat{u}_i\}$. Hence, $\hat{V}$ has columns $\{ S^T \hat{u}_i \}$. Write its $i$th column as $\hat{v}_i := S^T\hat{u}_i$.
\item Low-rank approximation (3) also outputs the corresponding approximate singular values $\hat{\sigma}_i$.
\item We approximate $\hat{v}_i^TA^Tb$ to additive error for all $i$ by noticing that $\hat{v}_i^TA^Tb = \tr(A^Tb\hat{v}_i^T)$ is an inner product of $A^T$ and $b\hat{v}_i^T$, each viewed as a long vector.
\item Thus, we can apply (1), since being given $A \in\mathcal{SQ}$ implies $A^T \in\mathcal{SQ}$ for $A^T$ viewed as a long vector.
\item Define the approximation of $\hat{v}_i^TA^Tb$ to be $\hat{\lambda}_i$. At this point we have (recalling that $\hat{v}_i := S^T\hat{u}_i$)
$$A^+b \approx\sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{\lambda}_i = S^T \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{u}_i\hat{\lambda}_i$$
\item Finally, using (2) to sample from this linear combination of the $S^T\hat{u}_i$'s, we are done! Total complexity: $\tilde{O}(\kappa^{16}k^6\|A\|^6_F / \epsilon^6)$ (see the sketch following this frame).
\end{itemize}
\end{frame}
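For completeness, a dense NumPy sketch of the whole pipeline under simplifying assumptions (my own illustration: length-squared row sampling plays the role of step (3), and the inner products $\hat{\lambda}_i$ are computed exactly rather than estimated via step (1)); it shows how the combination $\sum_i \hat{v}_i\hat{\lambda}_i/\hat{\sigma}_i^2$ approximates $A^+b$.

import numpy as np

def quantum_inspired_pinv_solve(A, b, k, ell, rng=None):
    """Dense sketch of the quantum-inspired pseudoinverse pipeline.

    Illustration only: S is formed by length-squared row sampling of A (the role
    of step (3)), and lambda_i = v_i^T A^T b is computed exactly instead of
    being estimated via step (1).
    """
    rng = rng or np.random.default_rng()
    m, n = A.shape
    row_p = (A**2).sum(axis=1) / (A**2).sum()            # length-squared row distribution
    rows = rng.choice(m, size=ell, p=row_p)
    S = A[rows] / np.sqrt(ell * row_p[rows, None])       # rescaled sampled rows, ell x n

    # SVD of the small matrix S: its top right singular vectors approximate those
    # of A, and its singular values approximate the sigma_i of A.
    _, sig, Vt = np.linalg.svd(S, full_matrices=False)
    sig_hat, V_hat = sig[:k], Vt[:k].T                   # V_hat columns ~ right singular vectors of A

    lam = V_hat.T @ (A.T @ b)                            # lambda_i = v_i^T A^T b
    return V_hat @ (lam / sig_hat**2)                    # ~ A^+ b = sum_i v_i lambda_i / sigma_i^2

rng = np.random.default_rng(4)
U0 = np.linalg.qr(rng.standard_normal((500, 3)))[0]
V0 = np.linalg.qr(rng.standard_normal((400, 3)))[0]
A = U0 @ np.diag([5.0, 4.0, 3.0]) @ V0.T                 # exactly rank-3 matrix
b = rng.standard_normal(500)
x_hat = quantum_inspired_pinv_solve(A, b, k=3, ell=250, rng=rng)
print(np.linalg.norm(x_hat - np.linalg.pinv(A) @ b) / np.linalg.norm(np.linalg.pinv(A) @ b))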
@@ -418,7 +468,7 @@ \section{Remarks}
\frametitle{Thoughts}
\begin{itemize}
\item Claim (Tang): For machine learning problems, $\mathcal{SQ}$ assumptions are more reasonable than state preparation assumptions.
\item We discussed the pseudoinverse, which inverts singular values, but in principle we could have applied any function to the singular values.
\item Gily\'en et al.\ (2018) show that many quantum machine learning algorithms indeed apply polynomial functions to singular values.
\item Our discussion suggests that exponential quantum speedups are tightly related to problems where high-rank matrices play a crucial role (e.g. Hamiltonian simulation or QFT)
@@ -440,8 +490,6 @@ \section{Remarks}
\begin{frame}
\frametitle{Read the Fine Print}
\begin{itemize}
\item This poses two problems if we seek to use these algorithms: the ``state preparation'' and ``readout'' problems.
\item Even if we ignore the readout problem, can we at least find a state preparation routine that maintains a speedup for the discussed quantum algorithms? Open question!
\item See ``Quantum Machine Learning Algorithms: Read the Fine Print'' by Aaronson.