
Commit 4938fa3

Commit message: cleanup
1 parent 6906b7a commit 4938fa3

File tree: 3 files changed, +80 −32 lines
226 KB
Binary file not shown.

Diff for: Presentations/quantum_inspired_sampling.pdf

-2.62 KB
Binary file not shown.

Diff for: Presentations/quantum_inspired_sampling.tex

+80-32
@@ -101,6 +101,17 @@
 
 \section{Machine Learning}
 
+\begin{frame}
+\frametitle{Today's talk}
+\begin{itemize}
+\item In general, quantum machine learning algorithms convert quantum input states to the desired quantum output states.
+\item In practice, data is initially stored classically, and the algorithm's output must be accessed classically as well.
+\item Today's focus: a practical way to compare classical and quantum algorithms is to analyze classical algorithms under $\ell^2$ sampling conditions.
+\item Tang: linear algebra problems in low-dimensional spaces (say, constant or polylogarithmic dimension) can likely be solved "efficiently" under these conditions.
+\item Many of the initial practical applications of quantum machine learning were to problems of this type (e.g. Quantum Recommendation Systems, Kerenidis and Prakash, 2016).
+\end{itemize}
+\end{frame}
+
 \begin{frame}
 \frametitle{Machine Learning}
 \framesubtitle{Introduction}
@@ -152,17 +163,15 @@ \section{Machine Learning}
 \section{Quantum Machine Learning}
 
 \begin{frame}
-\frametitle{Moore-Penrose Pseudoinverse}
+\frametitle{Moore-Penrose Pseudoinverse (Quantum)}
 \framesubtitle{Harrow, Hassidim, Lloyd (orig.); Wiebe, Braun}
 \begin{itemize}
-\item HHL algorithm: application of phase estimation and Hamiltonian simulation to solve a linear system.
-\item We can use HHL as a subroutine to compute $A^+ \ket{b} = \ket{x}$ ($\ket{x}$ is the least-squares solution).
-\item Note that $\ket{x}$ is a quantum state. Hence, we may efficiently measure an expectation value $x^T M x$ where $M$ is some p.s.d. operator.
-\item Runtime bound: $\tilde{O}(\log(N) s^3\kappa^6 / \epsilon)$ time (query complexity)
+%\item HHL algorithm: application of phase estimation and Hamiltonian simulation to solve a linear system.
+\item We can compute $A^+ \ket{b} = \ket{x_{LS}}$ in $\tilde{O}(\log(N) s^3\kappa^6 / \epsilon)$ time (query complexity)
+\item Uses a quantum algorithm based on phase estimation and Hamiltonian simulation
 \item Assumption: $A$ is sparse with low condition number $\kappa$. Hamiltonian ($\hat{H}$) simulation is efficient when $\hat{H}$ is sparse. No low-rank assumptions are necessary.
 \item "Key" assumption: the quantum state $\ket{b}$ can be prepared efficiently.
-
-
+\item What happens if we assume low rank?
 \end{itemize}
 \end{frame}
 
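For reference, the state $\ket{x_{LS}}$ above encodes the ordinary least-squares solution $A^+ b$; a minimal numpy check of that classical identity (an illustrative sketch, unrelated to the quantum routine itself):

import numpy as np

# Sketch: A^+ b coincides with the least-squares solution x_LS.
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

x_pinv = np.linalg.pinv(A) @ b                   # Moore-Penrose pseudoinverse applied to b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)     # classical least-squares solver
assert np.allclose(x_pinv, x_ls)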
@@ -173,8 +182,8 @@ \section{Classical $\ell^2$ sampling}
 
 \begin{itemize}
 \item How can we compare the speed of quantum algorithms with quantum input and quantum output to classical algorithms with classical input and classical output?
-\item Quantum machine learning algorithms can be exponentially faster than the best standard classical algorithms for similar tasks, but this comparison is unfair because the quantum algorithms get outside help through input state preparation.
-\item We want a classical model that helps its algorithms stand a chance against quantum algorithms, while still ensuring that they can be run in nearly all circumstances one would run the quantum algorithm.
+\item Quantum machine learning algorithms can be exponentially faster than the best standard classical algorithms for similar tasks, but quantum algorithms get help through input state preparation.
+\item We want a practical classical model whose algorithms offer guarantees comparable to the quantum algorithms', while still ensuring that they can be run in nearly all circumstances one would run the quantum algorithm.
 \pause
 \item Solution (Tang): compare quantum algorithms with quantum state preparation to classical algorithms with sample and query access to input.
 \end{itemize}
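To make "sample and query access" concrete, here is a small illustrative sketch of what $\mathcal{SQ}$ access to a vector provides (the class name and interface are ours, chosen only so the later sketches can reuse it): query an entry, read the norm, and sample an index $i$ with probability $|x_i|^2/\|x\|^2$.

import numpy as np

class SQVector:
    """Illustrative sample-and-query (SQ) access to a real vector x."""
    def __init__(self, x):
        self._x = np.asarray(x, dtype=float)
        self._norm2 = float(np.sum(self._x ** 2))
    def query(self, i):
        # Query access: return the entry x_i.
        return self._x[i]
    def norm(self):
        # Norm access: return ||x||.
        return self._norm2 ** 0.5
    def sample(self, rng):
        # Sample access: return index i with probability x_i^2 / ||x||^2.
        return int(rng.choice(len(self._x), p=self._x ** 2 / self._norm2))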
@@ -262,13 +271,13 @@ \section{Classical $\ell^2$ sampling}
 \framesubtitle{Method 1: Inner product estimation (Tang, 2018)}
 \begin{fact} For $\{X_{i,j}\}$ i.i.d. random variables with mean $\mu$ and variance $\sigma^2$, let
 
-$$Y := \underset{j \in [6\log 1/\delta]}{\operatorname{median}}\;\underset{i \in [6/\epsilon^2]}{\operatorname{mean}}\;X_{i,j}$$
+$$Y := \underset{j \in [\log 1/\delta]}{\operatorname{median}}\;\underset{i \in [1/\epsilon^2]}{\operatorname{mean}}\;X_{i,j}$$
 
 Then $\vert Y - \mu\vert \leq \epsilon\sigma$ with probability $\geq 1-\delta$, using only $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ samples.
 \end{fact}
 
 \begin{itemize}
-\item In words: We may create a mean estimator from $6/\epsilon^2$ samples of $X$. Here we compute the median of $6\log 1/\delta$ such estimators.
+\item In words: we may form a mean estimator from $1/\epsilon^2$ samples of $X$, and then take the median of $\log 1/\delta$ such estimators.
 \pause
 \item Catoni (2012) shows that Chebyshev's inequality is the best guarantee one can provide when considering pure empirical mean estimators for an unknown distribution (and finite $\mu, \sigma$).
 \item "Median of means" provides an exponential improvement in the probability-of-success ($1 - \delta$) guarantee.
@@ -287,7 +296,7 @@ \section{Classical $\ell^2$ sampling}
 \begin{corollary} For $x,y \in\CC^n$, given $x \in \mathcal{SQ}$ and $y \in \mathcal{Q}$, we can estimate $\langle x,y\rangle$ to $\epsilon\|x\|\|y\|$ error with probability $\geq 1-\delta$ with query complexity $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$.
 \end{corollary}
 \pause
-\begin{proof}Sample an \textbf{index} $s$ from $x$. Then, define $Z := x_s v_s\frac{\|v\|^2}{|v_s|^2}$. Apply the Fact with $X_{i,j}$ being independent samples $Z$.
+\begin{proof}Sample an \textbf{index} $s$ from $x$. Then, define $Z := x_s y_s\frac{\|x\|^2}{|x_s|^2}$. Apply the Fact with $X_{i,j}$ being independent samples of $Z$.
 \end{proof}
 \end{frame}
 
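The proof of the corollary translates directly into code; this sketch reuses the SQVector and median_of_means sketches above and assumes real vectors:

def inner_product_estimate(x_sq, y, eps, delta, rng):
    """Estimate <x, y> to additive error eps*||x||*||y|| (sketch, real case).
    x_sq: SQVector giving SQ access to x; y: plain array (query access only)."""
    def draw(r):
        s = x_sq.sample(r)                               # index s with probability x_s^2 / ||x||^2
        return y[s] * x_sq.norm() ** 2 / x_sq.query(s)   # Z = x_s y_s ||x||^2 / x_s^2, so E[Z] = <x, y>
    return median_of_means(draw, eps, delta, rng)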
@@ -335,21 +344,20 @@ \section{Classical $\ell^2$ sampling}
 \frametitle{Dequantization Toolbox}
 \framesubtitle{Method 2: Thin Matrix-Vector (Tang, 2018)}
 \begin{proposition}
-For $V \in \RR^{n\times k}$ and $w \in \RR^k$, given $V^\dag \in \mathcal{SQ}$ and $w \in \mathcal{Q}$, we can simulate $Vw \in \mathcal{SQ}$ with expected query complexity $O(k^2C(V,w))$, where
-
-$$C(V,w) := \frac{\sum_{i=1}^k\|V_{(\cdot, i)}w_i\|^2}{\|Vw\|^2}$$
+For $V \in \RR^{n\times k}$ and $w \in \RR^k$, given $V^\dag \in \mathcal{SQ}$ and $w \in \mathcal{Q}$, we can simulate $Vw \in \mathcal{SQ}$ with expected query complexity $\tilde{O}(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$.
 
 We can compute entries $(Vw)_i$ with $O(k)$ queries.
 
 We can sample using rejection sampling:
 
 \begin{itemize}
-\item $P$ is the distribution formed by sampling from $V_{(\cdot, j)}$ with probability proportional to $\|V_{(\cdot, j)}w_j\|^2$.
+\item $P$ is the proposal distribution formed by sampling from $V_{(\cdot, j)}$.
 
 \item $Q$ is the target $Vw$.
+\item Hence, take the acceptance ratio $r_i$ to be $Q/P$ up to a constant factor:
 \end{itemize}
 
-$$r_i = \frac{(Vw)_i^2}{k \sum_{j=1}^k (V_{ij}w_j)^2} = \frac{Q(i)}{kC(V,w)P(i)}$$
+$$r_i = \frac{\|w^T V_{\cdot, i}\|^2}{\|w\|^2\|V_{\cdot, i}\|^2}$$
 \end{proposition}
 \end{frame}
 
@@ -359,9 +367,9 @@ \section{Classical $\ell^2$ sampling}
 \begin{itemize}
 \item Notice that we can compute these $r_i$'s (even though we cannot compute probabilities under the target distribution), and that the rejection sampling guarantee $r_i \leq 1$ is satisfied (via Cauchy-Schwarz).
 
-\item The probability of success is $\frac{\|Vw\|^2}{k\sum_{i=1}^k\|w_iV^{(i)}\|^2}$. Thus, to estimate the norm of $Vw$, it suffices to estimate the probability of success of this rejection sampling process.
+\item Since the probability of success is $\|Vw\|^2/\|w\|^2$, it suffices to estimate the probability of success of this rejection sampling process to estimate this norm.
 
-\item Through a Chernoff bound, we see that the average of $O(kC(V,w)(\frac{1}{\epsilon^2}\log\frac{1}{\delta}))$ "coin flips" is in $[(1-\epsilon)\|Vw\|,(1+\epsilon)\|Vw\|]$ with probability $\geq 1-\delta$, where each coin flip costs $k$ queries and samples.
+\item Through a Chernoff bound, we see that the average of $O(\|w\|^2(\frac{1}{\epsilon^2}\log\frac{1}{\delta}))$ "coin flips" is in $[(1-\epsilon)\|Vw\|,(1+\epsilon)\|Vw\|]$ with probability $\geq 1-\delta$.
 \end{itemize}
 \end{frame}
 
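Continuing the sketch above, the acceptance rate of the same loop yields the norm estimate: the proposal's normalization constant is known, so $\|Vw\|$ can be read off from the empirical acceptance probability (illustrative only, same caveats as before):

import numpy as np

def estimate_norm_Vw(V, w, n_trials, rng):
    """Estimate ||Vw|| from the acceptance rate of the rejection sampler above (sketch)."""
    n, k = V.shape
    col_weights = (np.abs(w) * np.linalg.norm(V, axis=0)) ** 2
    col_probs = col_weights / col_weights.sum()
    accepted = 0
    for _ in range(n_trials):
        j = rng.choice(k, p=col_probs)
        col = V[:, j] ** 2
        i = rng.choice(n, p=col / col.sum())
        r_i = (V[i] @ w) ** 2 / (k * np.sum((V[i] * w) ** 2))
        accepted += rng.random() < r_i
    # acceptance probability = ||Vw||^2 / (k * sum_j ||w_j V_{(.,j)}||^2)
    return float(np.sqrt((accepted / n_trials) * k * col_weights.sum()))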
@@ -371,44 +379,86 @@ \section{Classical $\ell^2$ sampling}
 \framesubtitle{Method 3: Low-Rank Approximation (Frieze, Kannan, Vempala, 1998)}
 \begin{itemize}
 \item For $A \in \CC^{m\times n}$, given $A \in \mathcal{SQ}$ and some threshold $k$, we can output a description of a low-rank approximation of $A$ with $\text{poly}(k)$ queries.
-\item Specifically, we output two matrices $S,\hat{U}\in \mathcal{SQ}$ where $S \in \CC^{\ell \times n}$, $\hat{U} \in \CC^{\ell \times k}$ ($\ell = \text{poly}(k,\frac{1}{\epsilon}$), and this implicitly describes the low-rank approximation to $A$, $D := A(S^\dagger\hat{U})(S^\dagger\hat{U})^\dag$ ($\implies$ rank $D \leq k$).
+\item Specifically, we output two matrices $S,\hat{U}\in \mathcal{SQ}$ where $S \in \CC^{\ell \times n}$, $\hat{U} \in \CC^{\ell \times k}$ ($\ell = \text{poly}(k,\frac{1}{\epsilon})$), and this implicitly describes the low-rank approximation to $A$, $D := A(S^\dagger\hat{U})(S^\dagger\hat{U})^\dag$ ($\implies$ rank $D \leq k$).
 
 \item This matrix satisfies the following low-rank guarantee with probability $\geq 1-\delta$: for $\sigma := \sqrt{2/k}\|A\|_F$ and $A_{\sigma} := \sum_{\sigma_i \geq \sigma} \sigma_iu_iv_i^\dag$ (using the SVD),
 $$\|A - D\|_F^2 \leq \|A - A_\sigma\|_F^2 + \epsilon^2\|A\|_F^2$$
-\item Pay special attention to the $\|A - A_\sigma\|_F^2$ term. This says that our guarantee is weak if $A$ has no large singular values.
+\item Note the $\|A - A_\sigma\|_F^2$ term. This says that our guarantee is weak if $A$ has no large singular values.
 \item Quantum analog: phase estimation
 \end{itemize}
 \end{frame}
 
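A rough numpy sketch of the FKV-style construction (heavily simplified: sample rows of $A$ proportionally to squared row norms, rescale them into $S$, and take the top-$k$ left singular vectors of $S$ as $\hat{U}$; the normalization by $\hat{\sigma}_i$ is made explicit here even though the slide's notation glosses over it):

import numpy as np

def fkv_low_rank(A, k, ell, rng):
    """Simplified Frieze-Kannan-Vempala sketch (assumes ell >= k): returns S, U_hat,
    sigma_hat and the implicit low-rank approximation D = A V_hat V_hat^T (illustrative)."""
    m, n = A.shape
    row_norms2 = np.sum(A ** 2, axis=1)
    probs = row_norms2 / row_norms2.sum()
    rows = rng.choice(m, size=ell, p=probs)                # length-squared row sampling
    S = A[rows] / np.sqrt(ell * probs[rows])[:, None]      # rescaled sample, S in R^{ell x n}
    U_hat, sigma_hat, _ = np.linalg.svd(S, full_matrices=False)
    U_hat, sigma_hat = U_hat[:, :k], sigma_hat[:k]
    V_hat = S.T @ (U_hat / sigma_hat)                      # approximate top right singular vectors of A
    D = A @ V_hat @ V_hat.T                                # rank <= k approximation of A
    return S, U_hat, sigma_hat, D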
+\begin{frame}
+\frametitle{Dequantization Toolbox}
+
+$$
+\begin{bmatrix}
+\\
+\cdots A \cdots
+\\
+\\
+\end{bmatrix}
+\begin{bmatrix}
+\\
+S^\dag
+\\
+\\
+\end{bmatrix}
+\begin{bmatrix}
+\hat{U}
+\end{bmatrix}
+\begin{bmatrix}
+\hat{U}^\dag
+\end{bmatrix}
+\begin{bmatrix}
+\cdots S \cdots
+\end{bmatrix}
+$$
+
+\end{frame}
+
+
 \begin{frame}
 \frametitle{Moore-Penrose Pseudoinverse (low-rank)}
 \framesubtitle{Application (Lloyd, Tang, 2018)}
 
 \begin{problem} For a low-rank matrix $A \in \RR^{m\times n}$
-and a vector $x \in \RR^n$, given $x, A \in \mathcal{SQ}$, (approximately) simulate $A^+x \in \mathcal{SQ}$.
+and a vector $b \in \RR^m$, given $b, A \in \mathcal{SQ}$, (approximately) simulate $A^+b \in \mathcal{SQ}$.
 \end{problem}
 \pause
 \begin{algorithm}
 \begin{itemize}
 \item Low-rank approximation (3) gives us $S,\hat{U} \in \mathcal{SQ}$.
 
-\item Applying thin matrix-vector (2), we get $\hat{V} \in \mathcal{SQ}$, where $\hat{V} := S^T\hat{U}$; we can show that the columns of $\hat{V}$ behave like the right singular vectors of $A$.
+\item Applying thin matrix-vector (2), we get $\hat{V} \in \mathcal{SQ}$, where $\hat{V} := S^T\hat{U}$; we can show that the columns of $\hat{V}$ behave like the right singular vectors of $A$.
+\item Let $\hat{U}$ have columns $\{ \hat{u}_i\}$, so $\hat{V}$ has columns $\{ S^T \hat{u}_i \}$; write its $i$th column as $\hat{v}_i := S^T\hat{u}_i$.
 
-\item Low-rank approximation (3) also outputs their approximate singular values $\hat{\sigma}_i$
-
-\item Hence, we can approximate the vector we wish to sample:
-$$A^+x = (A^TA)^+A^Tx \approx \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{v}_i^T A^Tx$$
+\item Low-rank approximation (3) also outputs the approximate singular values $\hat{\sigma}_i$ of $A$.
 \end{itemize}
 \end{algorithm}
 \end{frame}
 
+\begin{frame}
+\frametitle{Moore-Penrose Pseudoinverse (low-rank) cont.}
+\framesubtitle{Application (Lloyd, Tang, 2018)}
+
+Now, we can write the approximate vector we wish to sample in terms of these approximations:
+
+$$A^+b = (A^TA)^+A^Tb \approx \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{v}_i^T A^Tb$$
+\end{frame}
+
+
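The identity on this slide can be checked numerically with the exact SVD (a sanity check of the formula only, not the SQ algorithm; the rank and sizes below are arbitrary):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 20))   # rank-3 matrix, 50 x 20
b = rng.standard_normal(50)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
r = 3                                                             # known rank of A
V, s = Vt[:r].T, s[:r]
# sum_i (1/sigma_i^2) v_i v_i^T A^T b reproduces A^+ b on the top singular space
approx = sum((1.0 / s[i] ** 2) * V[:, i] * (V[:, i] @ (A.T @ b)) for i in range(r))
assert np.allclose(approx, np.linalg.pinv(A, rcond=1e-10) @ b)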
 \begin{frame}
 \frametitle{Moore-Penrose Pseudoinverse (low-rank) cont.}
 \framesubtitle{Application (Lloyd, Tang, 2018)}
 \begin{itemize}
-\item We approximate $\hat{v}_i^TA^Tx$ to additive error for all by noticing that $\hat{v}_i^TA^Tx = \tr(A^Tx\hat{v}_i^T)$ is an inner product of the order two tensors $A^T$ and $x\hat{v}_i^T$.
+\item We approximate $\hat{v}_i^TA^Tb$ to additive error for all $i$ by noticing that $\hat{v}_i^TA^Tb = \tr(A^Tb\hat{v}_i^T)$ is an inner product of $A^T$ and $b\hat{v}_i^T$.
 \item Thus, we can apply (1), since being given $A \in \mathcal{SQ}$ implies $A^T \in \mathcal{SQ}$ for $A^T$ viewed as a long vector.
-\item Finally, using (2), sample from the linear combination using these estimates and $\hat{\sigma}_i$.
+\item Define the approximation of $\hat{v}_i^TA^Tb$ to be $\hat{\lambda}_i$. At this point we have (recalling that $\hat{v}_i := S^T\hat{u}_i$)
+
+$$A^+b \approx \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{\lambda}_i = S^T \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{u}_i\hat{\lambda}_i$$
+
+\item Finally, using (2) to provide sample access to each $S^T \hat{u}_i$, we are done! $\tilde{O}(\kappa^{16}k^6 \|A\|^6_F / \epsilon^6)$ complexity.
 \end{itemize}
 \end{frame}
 
@@ -418,7 +468,7 @@ \section{Remarks}
 \frametitle{Thoughts}
 
 \begin{itemize}
-\item Claim (Tang): For machine learning problems, SQ assumptions are more reasonable than state preparation assumptions.
+\item Claim (Tang): For machine learning problems, $\mathcal{SQ}$ assumptions are more reasonable than state preparation assumptions.
 \item We discussed the pseudoinverse, which inverts singular values, but in principle we could have applied any function to the singular values.
 \item Gilyen et al. (2018) show that many quantum machine learning algorithms indeed apply polynomial functions to singular values.
 \item Our discussion suggests that exponential quantum speedups are tightly related to problems where high-rank matrices play a crucial role (e.g. Hamiltonian simulation or QFT).
@@ -440,8 +490,6 @@ \section{Remarks}
 \begin{frame}
 \frametitle{Read the Fine Print}
 \begin{itemize}
-\item In general, QML algorithms convert quantum input states to the desired quantum output state.
-\item In practice, data is initially stored classically and the algorithm's output must be accessed classically as well.
 \item This poses two problems if we seek to use these algorithms: the "state preparation" and "readout" problems.
 \item Even if we ignore the readout problem, can we at least find a state preparation routine that maintains a speedup for the discussed quantum algorithms? Open question!
 \item See "Quantum Machine Learning Algorithms: Read the Fine Print" by Aaronson.

0 commit comments