
Commit c35db20

Commit message: tang
1 parent 4938fa3, commit c35db20

8 files changed: +247 -3 lines changed

Notes/images/swap_test.png (40 KB)

Notes/main.pdf (55.4 KB)
Binary file not shown.

Notes/main.tex (+14)
@@ -2,6 +2,17 @@
 \usepackage{subfiles}
 \usepackage[toc,page]{appendix}

+\usepackage{tikz}
+\usetikzlibrary{trees}
+\tikzset{
+  invisible/.style={opacity=0},
+  visible on/.style={alt={#1{}{invisible}}},
+  alt/.code args={<#1>#2#3}{%
+    \alt<#1>{\pgfkeysalso{#2}}{\pgfkeysalso{#3}} % \pgfkeysalso doesn't change the path
+  },
+  properties/.style={green, ultra thick},
+}
 \oddsidemargin=17pt \evensidemargin=17pt
 \headheight=9pt \topmargin=26pt
 \textheight=564pt \textwidth=433.8pt
@@ -72,6 +83,9 @@
 \newtheorem{proposition}[theorem]{Proposition}
 \newtheorem{exercise}[theorem]{Exercise}
 \newtheorem{definition}[theorem]{Definition}
+\newtheorem{fact}[theorem]{Fact}
+\newtheorem{algorithm}[theorem]{Algorithm}
+\newtheorem{example}[theorem]{Example}

 \title{Quantum Algorithms and Learning Theory\\\textit{Notes and Exercises}}
 \author{Faris Sbahi}

Notes/quantum_learning_notes.tex (+233 -3)
@@ -105,20 +105,240 @@ \section{Singular Value Transformation using Quantum-Inspired Length-Square Sampling}

 As we've seen, most well-known QML algorithms convert input quantum states to a desired output state or value; they do not themselves provide a routine for obtaining the necessary copies of these input states (a state preparation routine) or a strategy for extracting information from an output state. Both are essential to making the algorithm useful.

-\subsection{Sampling Model}
+\begin{itemize}
+%\item HHL algorithm: application of phase estimation and Hamiltonian simulation to solve linear system.
+\item We can compute $A^+ \ket{b} = \ket{x_{LS}}$ in $\tilde{O}(\log(N)\,s^3\kappa^6/\epsilon)$ time (query complexity).
+\item Uses a quantum algorithm based on phase estimation and Hamiltonian simulation.
+\item Assumption: $A$ is sparse with low condition number $\kappa$. Hamiltonian ($\hat{H}$) simulation is efficient when $\hat{H}$ is sparse. No low-rank assumptions are necessary.
+\item ``Key'' assumption: the quantum state $\ket{b}$ can be prepared efficiently.
+\item What happens if we assume low rank?
+\end{itemize}
+
+\begin{itemize}
+\item In general, quantum machine learning algorithms convert quantum input states to desired quantum output states.
+\item In practice, data is initially stored classically, and the algorithm's output must be accessed classically as well.
+\item Today's focus: a practical way to compare classical and quantum algorithms is to analyze classical algorithms under $\ell^2$ sampling conditions.
+\item Tang: linear algebra problems in low-dimensional spaces (say, constant or polylogarithmic dimension) can likely be solved ``efficiently'' under these conditions.
+\item Many of the initial practical applications of quantum machine learning were to problems of this type (e.g. Quantum Recommendation Systems, Kerenidis and Prakash, 2016).
+\end{itemize}
+
+\begin{itemize}
+\item How can we compare the speed of quantum algorithms with quantum input and quantum output to classical algorithms with classical input and classical output?
+\item Quantum machine learning algorithms can be exponentially faster than the best standard classical algorithms for similar tasks, but quantum algorithms get help through input state preparation.
+\item We want a practical classical model whose algorithms offer guarantees similar to those of quantum algorithms, while still being runnable in nearly all circumstances in which one would run the quantum algorithm.
+\item Solution (Tang): compare quantum algorithms with quantum state preparation to classical algorithms with sample and query access to the input.
+\end{itemize}

\subsubsection{Definitions}

+\begin{definition}
+We have ``query access'' to $x \in \CC^n$ if, given $i \in [n]$, we can efficiently compute $x_i$. We say that $x \in \mathcal{Q}$.
+\end{definition}
+\begin{definition} We have sample \textbf{and} query access to $x \in \CC^n$ if
+\begin{enumerate}
+\item we have query access to $x$, i.e. $x \in \mathcal{Q}$ (hence $\mathcal{SQ} \subset \mathcal{Q}$), and
+\item we can produce independent random samples $i \in [n]$, where index $i$ is drawn with probability $|x_i|^2/\|x\|^2$, and we can query for $\|x\|$.
+\end{enumerate}
+We say that $x \in \mathcal{SQ}$.
+\end{definition}
+\begin{definition} For $A \in \CC^{m\times n}$, $A \in \mathcal{SQ}$ (abusing notation) if
+\begin{enumerate}
+\item $A_i \in \mathcal{SQ}$, where $A_i$ is the $i$th row of $A$, and
+\item $\tilde{A} \in \mathcal{SQ}$, for $\tilde{A}$ the vector of row norms (so $\tilde{A}_i = \|A_i\|$).
+\end{enumerate}
+\end{definition}
+
+\begin{example}
+Say we have the vector $\vec{x} = (2, 0, 1, 3)$ with $\vec{x} \in \mathcal{SQ}$. Consider the following binary tree data structure (a runnable sketch of this structure follows the example).
+
+\begin{tikzpicture}[level distance=1.5cm,
+  level 1/.style={sibling distance=5.5cm},
+  level 2/.style={sibling distance=3cm},
+  level 3/.style={sibling distance=3cm}]
+\node (1){$\| x \|^2 = 14$}
+  child {node {$x_1^2 + x_2^2 = 4$}
+    child {node {$x_1^2 = 4$}
+      child {node {$\text{sgn}(x_1) = +1$}}
+      edge from parent node [left] {\tiny $1$}
+    }
+    child {node {$x_2^2 = 0$}
+      child {node {$\text{sgn}(x_2) = +1$}}
+      edge from parent node [right] {\tiny $0$}
+    }
+    edge from parent node [left] {\tiny $4/14$}
+  }
+  child {node(2) {$x_3^2 + x_4^2 = 10$}
+    child {node {$x_3^2 = 1$}
+      child {node {$\text{sgn}(x_3) = +1$}}
+      edge from parent node [left] {\tiny $1/10$}
+    }
+    child {node(3) {$x_4^2 = 9$}
+      child {node {$\text{sgn}(x_4) = +1$}}
+      edge from parent node [right] {\tiny $9/10$}
+    }
+    edge from parent node [right] {\tiny $10/14$}
+  };
+\end{tikzpicture}
+\end{example}

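To make the data structure in the example concrete, here is a minimal Python sketch (an illustration for these notes, not part of the committed LaTeX) of a binary tree giving query, norm, and length-square sampling access to a fixed vector. The class name SQVector and its methods are invented for the example; the sign information kept at the leaves of the tree above is subsumed here by storing the entries themselves.

    import numpy as np

    class SQVector:
        """Binary-tree structure giving query, norm, and length-square
        sampling access to a fixed vector x (the SQ model sketched above)."""

        def __init__(self, x):
            self.x = np.asarray(x, dtype=float)
            n = len(self.x)
            size = 1
            while size < n:          # pad to a power of two so the tree is complete
                size *= 2
            self.size = size
            padded = np.zeros(size)
            padded[:n] = self.x
            # tree[1] is the root; tree[size + i] holds x_i^2
            self.tree = np.zeros(2 * size)
            self.tree[size:2 * size] = padded ** 2
            for node in range(size - 1, 0, -1):
                self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

        def query(self, i):
            return self.x[i]                  # entry access (Q)

        def norm(self):
            return np.sqrt(self.tree[1])      # ||x||, read off the root

        def sample(self, rng=None):
            """Draw index i with probability x_i^2 / ||x||^2 by walking the tree."""
            rng = rng or np.random.default_rng()
            node = 1
            while node < self.size:
                left = 2 * node
                p_left = self.tree[left] / self.tree[node]
                node = left if rng.random() < p_left else left + 1
            return node - self.size

    v = SQVector([2, 0, 1, 3])
    print(v.norm() ** 2)                      # 14, as in the example
    print(v.sample())                         # returns index 3 with probability 9/14

Entry and norm queries are constant time here, and each sample walks one root-to-leaf path, i.e. O(log n) per draw.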
\subsubsection{Low-Rank Estimation}

+\begin{itemize}
+\item For $A \in \CC^{m\times n}$, given $A \in \mathcal{SQ}$ and some threshold $k$, we can output a description of a low-rank approximation of $A$ with $\text{poly}(k)$ queries (see the sketch below).
+\item Specifically, we output two matrices $S, \hat{U} \in \mathcal{SQ}$, where $S \in \CC^{\ell \times n}$ and $\hat{U} \in \CC^{\ell \times k}$ (with $\ell = \text{poly}(k, \frac{1}{\epsilon})$); these implicitly describe the low-rank approximation to $A$, $D := A(S^\dagger\hat{U})(S^\dagger\hat{U})^\dag$ (so $\rank(D) \leq k$).
+\item This matrix satisfies the following low-rank guarantee with probability $\geq 1-\delta$: for $\sigma := \sqrt{2/k}\,\|A\|_F$ and $A_{\sigma} := \sum_{\sigma_i \geq \sigma} \sigma_i u_i v_i^\dag$ (using the SVD of $A$),
+$$\|A - D\|_F^2 \leq \|A - A_\sigma\|_F^2 + \epsilon^2\|A\|_F^2$$
+\item Note the $\|A - A_\sigma\|_F^2$ term. This says that our guarantee is weak if $A$ has no large singular values.
+\item Quantum analog: phase estimation.
+\end{itemize}
+
+$$
+\begin{bmatrix}
+ \\
+\cdots A \cdots
+ \\
+ \\
+\end{bmatrix}
+\begin{bmatrix}
+ \\
+S^\dag
+ \\
+ \\
+\end{bmatrix}
+\begin{bmatrix}
+\hat{U}
+\end{bmatrix}
+\begin{bmatrix}
+\hat{U}^\dag
+\end{bmatrix}
+\begin{bmatrix}
+\cdots S \cdots
+\end{bmatrix}
+$$

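The following rough numpy sketch (illustrative, not part of the commit) shows the row-then-column length-square sampling and rescaling, in the spirit of Frieze, Kannan, and Vempala, that underlies this primitive. It assumes direct access to a dense array A so the sampling can be done explicitly; the name fkv_sketch and the choice of constants are assumptions of the example, not the tuned parameters behind the guarantee quoted above.

    import numpy as np

    def fkv_sketch(A, ell, rng=None):
        """Rough FKV-style sketch: sample ell rows of A, then ell columns of the
        row sketch, each with probability proportional to squared norm, and
        rescale so the sub-sampled matrices approximate A in expectation."""
        rng = rng or np.random.default_rng(0)
        m, n = A.shape
        fro2 = np.sum(A ** 2)

        # length-square sample ell rows of A, rescaled
        p_rows = np.sum(A ** 2, axis=1) / fro2
        rows = rng.choice(m, size=ell, p=p_rows)
        S = A[rows] / np.sqrt(ell * p_rows[rows, None])        # ell x n

        # length-square sample ell columns of S, rescaled
        q_cols = np.sum(S ** 2, axis=0) / np.sum(S ** 2)
        cols = rng.choice(n, size=ell, p=q_cols)
        W = S[:, cols] / np.sqrt(ell * q_cols[None, cols])     # ell x ell

        # left singular vectors / values of the small matrix W stand in for
        # the U_hat and sigma_hat used in the implicit description above
        U_hat, sigma_hat, _ = np.linalg.svd(W)
        return S, U_hat, sigma_hat

In the sublinear setting the same row and column draws come from the SQ data structures rather than from the dense array, and S is never materialized; only the sampled rows and the small matrix W are stored.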
\subsubsection{Trace Inner Product Estimation}

+\begin{itemize}
+\item For $x, y \in \CC^n$, if we are given $x \in \mathcal{SQ}$ and $y \in \mathcal{Q}$, then we can estimate $\langle x, y\rangle$ to error $\epsilon \|x\|\|y\|$ with probability $\geq 1 - \delta$ (see the sketches below).
+\item Quantum analog: SWAP test.
+\end{itemize}
+\begin{figure}
+\centering
+\includegraphics[width=0.5\linewidth]{images/swap_test.png}
+\caption{The SWAP test.}
+\end{figure}
+
+\begin{fact} For $\{X_{i,j}\}$ i.i.d.\ random variables with mean $\mu$ and variance $\sigma^2$, let
+$$Y := \underset{j \in [\log 1/\delta]}{\operatorname{median}}\;\underset{i \in [1/\epsilon^2]}{\operatorname{mean}}\;X_{i,j}$$
+Then $\vert Y - \mu\vert \leq \epsilon\sigma$ with probability $\geq 1-\delta$, using only $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ samples.
+\end{fact}
+
+\begin{itemize}
+\item In words: we may create a mean estimator from $1/\epsilon^2$ samples of $X$, and then compute the median of $\log 1/\delta$ such estimators, as sketched below.
+\item Catoni (2012) shows that Chebyshev's inequality is the best guarantee one can provide when considering pure empirical mean estimators for an unknown distribution (with finite $\mu, \sigma$).
+\item ``Median of means'' provides an exponential improvement in the success probability ($1 - \delta$) guarantee.
+\end{itemize}

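A minimal Python sketch (illustrative, not part of the commit) of the median-of-means estimator from the Fact; the function name median_of_means is invented, and the group counts omit the constant factors a formal proof of the Fact would carry.

    import numpy as np

    def median_of_means(draw, eps, delta, rng=None):
        """Estimate the mean of a distribution to within eps * (std dev),
        with failure probability at most delta. `draw(rng)` returns one
        sample; O(1/eps^2 * log(1/delta)) samples are used in total."""
        rng = rng or np.random.default_rng()
        n_means = int(np.ceil(np.log(1.0 / delta)))   # number of groups
        n_per = int(np.ceil(1.0 / eps ** 2))          # samples per group
        group_means = [
            np.mean([draw(rng) for _ in range(n_per)])
            for _ in range(n_means)
        ]
        return np.median(group_means)

    # e.g. estimating the mean of an exponential distribution with mean 2.0
    est = median_of_means(lambda rng: rng.exponential(2.0), eps=0.1, delta=0.01)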
+\begin{corollary} For $x, y \in \CC^n$, given $x \in \mathcal{SQ}$ and $y \in \mathcal{Q}$, we can estimate $\langle x, y\rangle$ to $\epsilon\|x\|\|y\|$ error with probability $\geq 1-\delta$ with query complexity $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$.
+\end{corollary}
+\begin{proof}Sample an \textbf{index} $s$ from the length-square distribution of $x$. Then define $Z := \overline{x_s}\, y_s\, \frac{\|x\|^2}{|x_s|^2}$, so that $\mathbb{E}[Z] = \langle x, y\rangle$ and $\mathbb{E}[|Z|^2] = \|x\|^2\|y\|^2$. Apply the Fact with the $X_{i,j}$ being independent samples of $Z$.
+\end{proof}

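Putting the Fact together with the proof of the corollary, the following Python sketch (illustrative, not part of the commit) estimates the inner product of two real vectors using only length-square samples and entry queries of x plus entry queries of y; dense arrays stand in for the SQ and Q oracles, and all names are invented for the example.

    import numpy as np

    def inner_product_estimate(x, y, eps, delta, rng=None):
        """Estimate <x, y> to additive error eps * ||x|| * ||y|| with
        probability >= 1 - delta, via median-of-means over the estimator Z."""
        rng = rng or np.random.default_rng()
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        norm2 = np.dot(x, x)
        probs = x ** 2 / norm2                    # length-square distribution of x

        def draw(r):
            s = r.choice(len(x), p=probs)         # sample an index s from x
            return y[s] * norm2 / x[s]            # Z = x_s y_s ||x||^2 / x_s^2

        n_means = int(np.ceil(np.log(1.0 / delta)))
        n_per = int(np.ceil(1.0 / eps ** 2))
        groups = [np.mean([draw(rng) for _ in range(n_per)]) for _ in range(n_means)]
        return np.median(groups)                  # median of means

    x = np.array([2.0, 0.0, 1.0, 3.0])
    y = np.array([1.0, -1.0, 0.5, 2.0])
    print(inner_product_estimate(x, y, eps=0.05, delta=0.01), np.dot(x, y))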
\subsubsection{Least-Square Sample Generation}

-Rejection Sampling
+\begin{itemize}
+\item For $V \in \CC^{n\times k}$ and $w \in \CC^k$, given $V^\dagger \in \mathcal{SQ}$ (\textit{column}-wise sampling of $V$) and $w \in \mathcal{Q}$, we can simulate $Vw \in \mathcal{SQ}$ with $\text{poly}(k)$ queries.
+\item In words: if we can length-square sample the columns of the matrix $V$ and query the entries of the vector $w$, then
+\begin{enumerate}
+\item we can query entries of their product $Vw$, and
+\item we can length-square sample from a distribution that emulates their product.
+\end{enumerate}
+\item Hence, as long as $k \ll n$, we can perform both using a number of steps polynomial in the number of columns of $V$.
+\end{itemize}
+
+\begin{definition}
+Rejection sampling
+\end{definition}
+\begin{algorithm}
+Input: samples from distribution $P$.
+
+Output: samples from distribution $Q$.
+\begin{itemize}
+\item Sample $s$ from $P$.
+\item Compute $r_s = \frac{1}{N}\frac{Q(s)}{P(s)}$, for a fixed constant $N$.
+\item Output $s$ with probability $r_s$; otherwise, restart.
+\end{itemize}
+\end{algorithm}
+
+\begin{fact}
+If $r_i \leq 1$ for all $i$, then the above procedure is well defined and outputs a sample from $Q$ after $N$ iterations in expectation.
+\end{fact}

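A generic Python sketch of this rejection sampling loop (illustrative, not part of the commit); sample_p, p, and q stand for the sampler and probability functions of P and Q, and the Fact above corresponds to the requirement q(s) <= N * p(s) for all s.

    import numpy as np

    def rejection_sample(sample_p, p, q, N, rng=None):
        """Draw s ~ P and accept with probability q(s) / (N * p(s)).
        Requires q(s) <= N * p(s) for all s; the expected number of
        iterations until acceptance is N."""
        rng = rng or np.random.default_rng()
        while True:
            s = sample_p(rng)
            r = q(s) / (N * p(s))
            if rng.random() < r:
                return s

    # e.g. turning a fair 4-sided die into the distribution (0.1, 0.2, 0.3, 0.4):
    q_vals = [0.1, 0.2, 0.3, 0.4]
    draw = rejection_sample(lambda rng: rng.integers(4),
                            p=lambda s: 0.25,
                            q=lambda s: q_vals[s],
                            N=1.6)   # N = max q/p = 0.4 / 0.25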
+\begin{proposition}
+For $V \in \RR^{n\times k}$ and $w \in \RR^k$, given $V^\dag \in \mathcal{SQ}$ and $w \in \mathcal{Q}$, we can simulate $Vw \in \mathcal{SQ}$ with expected query complexity $\tilde{O}(\frac{1}{\epsilon^2}\log\frac{1}{\delta})$.
+
+We can compute entries $(Vw)_i$ with $O(k)$ queries.
+
+We can sample using rejection sampling (see the sketch below):
+\begin{itemize}
+\item $P$ is the proposal distribution formed by sampling from the columns $V_{(\cdot, j)}$.
+\item $Q$ is the target distribution of $Vw$.
+\item Hence, compute $r_s$ to be a constant factor of $Q / P$:
+\end{itemize}
+$$r_i = \frac{|w^T V_{(i,\cdot)}|^2}{\|w\|^2\,\|V_{(i,\cdot)}\|^2},$$
+where $V_{(i,\cdot)}$ denotes the $i$th row of $V$, so that $w^T V_{(i,\cdot)} = (Vw)_i$.
+\end{proposition}
+
+\begin{itemize}
+\item Notice that we can compute these $r_i$'s (even though we cannot compute probabilities under the target distribution itself), and that the rejection sampling requirement $r_i \leq 1$ is satisfied (via Cauchy--Schwarz).
+\item Since the probability of success is $\|Vw\|^2/\|w\|^2$, it suffices to estimate the success probability of this rejection sampling process to estimate this norm.
+\item Through a Chernoff bound, we see that the (appropriately rescaled) average of $O(\|w\|^2\frac{1}{\epsilon^2}\log\frac{1}{\delta})$ ``coin flips'' lies in $[(1-\epsilon)\|Vw\|,(1+\epsilon)\|Vw\|]$ with probability $\geq 1-\delta$.
+\end{itemize}

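A Python sketch (illustrative, not part of the commit) of this rejection sampling procedure for Vw, under the interpretation that the proposal picks a column with probability proportional to its squared norm and then an index within that column, so that the proposal probability of index i is proportional to the squared norm of the i-th row of V. Dense arrays stand in for the SQ oracles, and sample_from_Vw is an invented name.

    import numpy as np

    def sample_from_Vw(V, w, rng=None):
        """Length-square sample an index i with probability (Vw)_i^2 / ||Vw||^2,
        using only column norms of V, length-square samples within a column,
        and single-entry queries of V and w."""
        rng = rng or np.random.default_rng()
        n, k = V.shape
        col_norm2 = np.sum(V ** 2, axis=0)            # available since V^T is in SQ

        while True:
            # proposal P: column j ~ ||V_(.,j)||^2, then row i ~ V_ij^2 / ||V_(.,j)||^2,
            # so P(i) is proportional to ||V_(i,.)||^2
            j = rng.choice(k, p=col_norm2 / col_norm2.sum())
            i = rng.choice(n, p=V[:, j] ** 2 / col_norm2[j])

            # acceptance ratio r_i = (V_(i,.) . w)^2 / (||w||^2 ||V_(i,.)||^2) <= 1
            row = V[i, :]                              # k entry queries
            r = np.dot(row, w) ** 2 / (np.dot(w, w) * np.dot(row, row))
            if rng.random() < r:
                return i

    V = np.linalg.qr(np.random.default_rng(0).normal(size=(50, 3)))[0]  # orthonormal columns
    w = np.array([1.0, -2.0, 0.5])
    print(sample_from_Vw(V, w))

With this proposal, the accepted index is distributed proportionally to P(i) * r_i, which is proportional to (Vw)_i^2, matching the target.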
\subsubsection{Application: Stochastic Regression}

+For a low-rank matrix $A \in \RR^{m\times n}$ and a vector $b \in \RR^m$, given $b, A \in \mathcal{SQ}$, (approximately) simulate $A^+b \in \mathcal{SQ}$.
+
+\begin{algorithm}
+\begin{itemize}
+\item Low-rank approximation (3) gives us $S, \hat{U} \in \mathcal{SQ}$.
+\item Applying the thin matrix-vector product (2), we get $\hat{V} \in \mathcal{SQ}$, where $\hat{V} := S^T\hat{U}$; we can show that the columns of $\hat{V}$ behave like the right singular vectors of $A$.
+\item Let $\hat{U}$ have columns $\{\hat{u}_i\}$. Hence, $\hat{V}$ has columns $\{S^T\hat{u}_i\}$. Write its $i$th column as $\hat{v}_i := S^T\hat{u}_i$.
+\item Low-rank approximation (3) also outputs the approximate singular values $\hat{\sigma}_i$ of $A$.
+\end{itemize}
+\end{algorithm}
+
+Now, we can write the approximate vector we wish to sample in terms of these approximations:
+
+$$A^+b = (A^TA)^+A^Tb \approx \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{v}_i^T A^Tb$$
+
+\begin{itemize}
+\item We approximate $\hat{v}_i^TA^Tb$ to additive error, for all $i$, by noticing that $\hat{v}_i^TA^Tb = \tr(A^Tb\hat{v}_i^T)$ is an inner product of $A^T$ and $b\hat{v}_i^T$.
+\item Thus, we can apply (1), since being given $A \in \mathcal{SQ}$ implies $A^T \in \mathcal{SQ}$ for $A^T$ viewed as a long vector.
+\item Define the approximation of $\hat{v}_i^TA^Tb$ to be $\hat{\lambda}_i$. At this point we have (recalling that $\hat{v}_i := S^T\hat{u}_i$)
+$$A^+b \approx \sum_{i=1}^k \frac{1}{\hat{\sigma}_i^2}\hat{v}_i\hat{\lambda}_i = S^T \sum_{i=1}^k \frac{\hat{\lambda}_i}{\hat{\sigma}_i^2}\hat{u}_i$$
+\item Finally, using (2) to provide sample access to $S^T \sum_{i=1}^k \frac{\hat{\lambda}_i}{\hat{\sigma}_i^2}\hat{u}_i$, we are done! The total complexity is $\tilde{O}(\kappa^{16}k^6 \|A\|^6_F / \epsilon^6)$ (see the dense-algebra sketch below).
+\end{itemize}

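As a sanity check on the algebra above, here is a dense numpy sketch (illustrative, not part of the commit) of the same pipeline with exact SVD quantities in place of the sampled estimates; approx_pseudoinverse_apply is an invented name, and in the actual sublinear algorithm the singular vectors, singular values, and the lambda_i are only ever accessed through the primitives (1)-(3).

    import numpy as np

    def approx_pseudoinverse_apply(A, b, k):
        """Dense version of the pipeline: keep the top-k singular directions of A
        and apply sum_i (1/sigma_i^2) v_i v_i^T A^T b, which equals A^+ b when A
        has rank k."""
        U, sigma, Vt = np.linalg.svd(A, full_matrices=False)
        V = Vt.T[:, :k]                     # right singular vectors (the v_i)
        lam = V.T @ (A.T @ b)               # the lambda_i = v_i^T A^T b
        return V @ (lam / sigma[:k] ** 2)   # sum_i (1/sigma_i^2) lambda_i v_i

    rng = np.random.default_rng(1)
    A = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 40))   # rank-5 matrix
    b = rng.normal(size=100)
    x1 = approx_pseudoinverse_apply(A, b, k=5)
    x2 = np.linalg.pinv(A) @ b
    print(np.allclose(x1, x2))              # True, up to numerical error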
\subsubsection{Definitions and Assumptions}

Let $b \in \CC^m$ and $A \in \CC^{m \times n}$ s.t. $\Vert A \Vert \leq 1$ where $\Vert \cdot \Vert$ signifies the operator norm (or spectral norm). Furthermore, require that $\rank(A) = k$ and $\Vert A^+ \Vert \leq \kappa$ where $A^+$ is the pseudoinverse of $A$. Hence, observe that $\Vert A \Vert \leq 1$ is equivalent to $A$ having maximum singular value $1$\footnote{To see this, simply consider Spectral Theorem applied to Hermitian matrix $A^\dag A$}. Similarly, $A^+$ has inverted singular values from $A$ and so $\Vert A^+ \Vert$ is equal to the reciprocal of the minimum nonzero singular value. Therefore, the condition number of $A$ is given by $\Vert A \Vert \Vert A^+ \Vert \leq \kappa$.
@@ -392,6 +612,16 @@ \subsubsection{Computing Approximate Singular Vectors}
 \end{proof}
 \end{theorem}

+
+\subsection{Conclusions}
+
+\begin{itemize}
+\item Claim (Tang): for machine learning problems, $\mathcal{SQ}$ assumptions are more reasonable than state preparation assumptions.
+\item We discussed the pseudoinverse, which inverts the singular values, but in principle we could have applied any function to the singular values.
+\item Gily\'en et al.\ (2018) show that many quantum machine learning algorithms indeed apply polynomial functions to singular values.
+\item Our discussion suggests that exponential quantum speedups are tightly related to problems where high-rank matrices play a crucial role (e.g. Hamiltonian simulation or the QFT).
+\end{itemize}
+
 \section{Optimal Quantum Sample Complexity}

This paper\cite{arunachalam2016optimal} provides an instructive example of how one can use quantum information theory to discuss the ability of a quantum learning algorithm to learn from a distribution of quantum states. Perhaps this is unsurprising, given the surface-level connection with quantum state discrimination, which has been closely studied in quantum information theory.
@@ -400,7 +630,7 @@ \subsection{Definitions}

 \subsubsection{Quantum Learning Models: PAC Setting}

-a quantum example oracle $QPEX(c,D)$ acts on $\ket{0}^{\otimes n}\ket{0}$ and produces a quantum example $\sum_{x\in\{0,1\}^n} D(x)\ket{x,c(x)}$.
+A quantum example oracle $QPEX(c,D)$ acts on $\ket{0}^{\otimes n}\ket{0}$ and produces a quantum example $\sum_{x\in\{0,1\}^n} \sqrt{D(x)}\ket{x,c(x)}$.

 A quantum learner is given access to some copies of the state generated by $QPEX(c,D)$ and performs a POVM where each outcome is associated with a hypothesis.

