Merge pull request #9 from n1o/duplex
Release duplex
n1o authored Feb 10, 2025
2 parents c68371e + 891455e commit b6b480d
Showing 8 changed files with 125 additions and 1 deletion.
4 changes: 3 additions & 1 deletion content/posts/rl-bite-exploration-vs-exploitation.md
Original file line number Diff line number Diff line change
@@ -30,7 +30,9 @@ The obvious downside is that we do not do any exploration. To fix this there are

The greedy policies above have the downside that they explore actions with equal probability. Boltzmann exploration tries to fix this by exploring more promising actions with higher probability.

$$ \pi_{\tau}(a|s) = \frac{\exp(\hat{R}_t(s_t, a)/\tau)}{\sum_{a'}\exp(\hat{R}_t(s_t, a')/\tau)}$$

- $\tau > 0$ is a temperature parameter, as it gets closer to 0 we get a greedy distribution and with higher temperatures we get a more uniform distribution.
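
To make the effect of the temperature concrete, here is a minimal NumPy sketch of Boltzmann exploration. The function name `boltzmann_policy` and the toy reward estimates are made up for illustration; the only thing taken from the formula is the softmax over $\hat{R}_t(s_t, a)/\tau$.

```python
import numpy as np

def boltzmann_policy(reward_estimates, tau):
    """Softmax over estimated rewards with temperature tau > 0."""
    if tau <= 0:
        raise ValueError("tau must be positive")
    logits = np.asarray(reward_estimates, dtype=float) / tau
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

# Hypothetical reward estimates for three actions in the current state.
r_hat = [1.0, 2.0, 3.0]
rng = np.random.default_rng(0)

cold = boltzmann_policy(r_hat, tau=0.01)   # close to greedy
hot = boltzmann_policy(r_hat, tau=100.0)   # close to uniform
action = rng.choice(len(r_hat), p=boltzmann_policy(r_hat, tau=1.0))
```

With a small $\tau$ almost all probability mass sits on the best action, while a large $\tau$ flattens the distribution, which matches the description above.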

122 changes: 122 additions & 0 deletions content/posts/tldr-duplex.md
@@ -0,0 +1,122 @@
+++
draft = false
date = 2025-02-10T09:22:15+01:00
title = "TLDR; Duplex: Dual GAT for Complex Embeddings of Directed Graphs"
description = ""
slug = ""
authors = []
tags = ["TLDR", "GNN", "graph-attention", "directed-graphs"]
categories = ["TLDR", "GNN", "graph-attention", "directed-graphs"]
externalLink = ""
series = ["TLDR", "GNN", "graph-attention", "directed-graphs"]
+++

# Source
- Paper link: https://arxiv.org/abs/2406.05391
- GitHub implementation: https://github.com/alipay/DUPLEX

# Abstract
I am a huge fan of Graph Machine Learning. It has a lot of cool applications, and I am particularly interested in Source Code understanding and Vulnerability Detection, where Graph Neural Networks (GNNs) are ubiquitous. One obvious downside of general GNNs is that they mostly focus on undirected graphs, which limits them on Digraphs (fancy name for directed graphs). This TLDR; is about DUPLEX (I already wrote about its application in [GaLLa](https://codebreakers.re/articles/llm-and-security/galla-graph-aligned-llm)), a cool technique for learning node representations in a self-supervised way (it can be extended with arbitrary objectives).

# TLDR; DUPLEX
It is a technique for Digraphs where we learn low-dimensional Node representations that can be used for downstream tasks. To fully capture the directionality and the information flow, we make the learned Node representations complex-valued and learn them with a dual Graph Attention Network (GAT) encoder. We then reconstruct the graph from the learned complex node embeddings using two parameter-free decoders.

![Overview](/images/duplex_overview.png)

# Hermitian Adjacency Matrix (HAM)

Normally we use the Adjacency matrix of a Graph to describe the connections (edges) between nodes, but in its symmetric (undirected) form it tells us nothing about the direction of these edges. To capture the direction we are going to use the Hermitian Adjacency Matrix (HAM), whose entry for a pair of nodes $u,v$ is $H_{u,v} \in \{ i, -i, 1, 0\}$, representing a forward edge, a reverse edge, a bidirectional edge, and no edge between these nodes, respectively.

$$ H = A_s \odot \exp(i \frac{\pi}{2} \Theta)$$

- $i$ is the imaginary unit
- $\pi$ is just 3.14...... pi :)
- $\odot$ is the Hadamard (element wise) product
- $A_s$ is the undirected symmetric Adjacency matrix


$$ \Theta(u, v) = \begin{cases} 1, & \text{if } (u, v) \in \mathcal{E}, \\\ -1, & \text{if } (v, u) \in \mathcal{E}, \\\ 0, & \text{otherwise} \end{cases}$$
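
As a small sanity check of the definition, the HAM can be built from an edge list in a few lines of NumPy. This is only a sketch: the helper name `hermitian_adjacency` is made up, and it assumes that a bidirectional edge gets $\Theta = 0$ (so its entry is 1), which is what the value set $\{ i, -i, 1, 0\}$ implies.

```python
import numpy as np

def hermitian_adjacency(num_nodes, edges):
    """Build H = A_s * exp(i * pi/2 * Theta) from directed edges (u, v)."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = 1.0
    A_s = ((A + A.T) > 0).astype(float)  # undirected symmetric adjacency
    theta = A - A.T  # 1: forward only, -1: reverse only, 0: both or none
    H = A_s * np.exp(1j * np.pi / 2 * theta)
    return np.round(H, 10)  # clear floating point dust such as 6e-17

# Toy digraph: 0 -> 1 is one-way, 1 <-> 2 is bidirectional.
H = hermitian_adjacency(3, [(0, 1), (1, 2), (2, 1)])
```

Here `H[0, 1]` is $i$, `H[1, 0]` is $-i$, `H[1, 2]` is 1, and `H[0, 2]` is 0, and the matrix equals its own conjugate transpose, which is where the name comes from.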


So the HAM can represent all the possible link directions, but it can also be decomposed as:

$$H = X^T \tilde{X}$$


$$x_{u} = a_{u} \odot \exp (i \frac{\pi}{2} \theta_{u})$$
$$\tilde{x}_u = a_{u} \odot \exp (-i \frac{\pi}{2} \theta_{u}) $$

- $a_u$ is the amplitude and $\theta_u$ is the phase of $x_u$
- $x_u \in \mathbb{C}^{d \times 1}$ is the complex embedding of node $u$ and $\tilde{x}_u$ is its complex conjugate
- $X$ and $\tilde{X}$ stack $x_u$ and $\tilde{x}_u$ column by column

Now what we want is to learn $a_u$ and $\theta_u$ for each node, from which we can then construct the complex embedding!
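
To see how amplitude and phase turn into a complex embedding, and how the product $X^T \tilde{X}$ recovers direction, here is a tiny NumPy sketch. The function names and the hand-picked one-dimensional embeddings are hypothetical; nothing is learned here.

```python
import numpy as np

def complex_embeddings(amplitude, phase):
    """amplitude, phase: real arrays of shape (d, n), one column per node."""
    X = amplitude * np.exp(1j * np.pi / 2 * phase)
    return X, np.conj(X)  # conj(X) equals a * exp(-i * pi/2 * theta)

def reconstruct_ham(amplitude, phase):
    """Entry (u, v) is the sum over dimensions of x_u * conj(x_v)."""
    X, X_conj = complex_embeddings(amplitude, phase)
    return X.T @ X_conj

# Two nodes, d = 1: equal amplitudes, phases differ by exactly 1.
amplitude = np.array([[1.0, 1.0]])
phase = np.array([[1.0, 0.0]])
H_hat = reconstruct_ham(amplitude, phase)
```

With a phase difference of 1 between node 0 and node 1, the entry `H_hat[0, 1]` comes out as $i$ (forward edge) and `H_hat[1, 0]` as $-i$ (reverse edge), so the phase carries the direction while the amplitude controls the magnitude.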

# Dual GAT Encoder
From above it should be obvious we need:

1. **Amplitude Encoder**
2. **Phase Encoder**

Both of them will use GAT under the hood for message passing and we need an extra **Fusion Layer** to share information between them.

## Amplitude Encoder
Here we learn an embedding $a_u$ for node $u$, which captures only the connection information (we do not care about the direction)

$$ a_{u}^{\prime} = \phi (\sum_{v \in \mathcal{N}(u)} f_a(a_{u}, a_{v}) \psi_a(a_{v} ) )$$

- $\mathcal{N}(u)$ is the neighborhood of $u$
- $\phi$ is the activation function; here we use ReLU
- $f_a, \psi_a$ are the learnable parts of the attention mechanism
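
The update rule above can be sketched as a single attention layer in NumPy. This is an illustration under assumptions, not the paper's implementation: I assume $\psi_a$ is a shared linear map `W` and $f_a$ is standard GAT-style attention (LeakyReLU score, softmax over the neighborhood). The names `amplitude_layer`, `W`, and `att` are made up.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def amplitude_layer(a, neighbors, W, att):
    """One attention update of the amplitude embeddings.

    a: (n, d_in) amplitudes; neighbors: dict node -> list of undirected
    neighbors; W: (d_in, d_out) plays the role of psi_a (assumed linear);
    att: (2 * d_out,) attention vector used inside f_a (assumed GAT-style).
    """
    h = a @ W  # psi_a applied to every node
    out = np.zeros_like(h)
    for u, nbrs in neighbors.items():
        if not nbrs:
            continue  # isolated node: nothing to aggregate
        scores = np.array(
            [leaky_relu(att @ np.concatenate([h[u], h[v]])) for v in nbrs]
        )
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()  # f_a: softmax over the neighborhood
        out[u] = weights @ h[nbrs]  # weighted sum of neighbor messages
    return np.maximum(out, 0.0)  # phi = ReLU

rng = np.random.default_rng(0)
a = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 2))
att = rng.normal(size=4)
a_next = amplitude_layer(a, {0: [1, 2], 1: [0], 2: [0]}, W, att)
```

Note that the neighbor lists ignore edge direction, which is exactly what makes this the amplitude (connection only) side of the encoder.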

## Phase Encoder
We learn an embedding $\theta_u$. The approach is similar to the amplitude encoder, with the difference that here we care about the direction information.

![Phase Encoder](/images/duplex_phase_encoder.png)

- the important difference is that there is a subtraction between the in-neighborhood information and out-neighborhood information, this is due to their asymmetry

## Fusion Layer
We combine the information from the amplitude and phase embedding to update the **amplitude embedding** (only)

![Fusion Layer](/images/duplex_fusion_layer.png)

- it is just a sum of two attention layers passed through a non-linearity: the first GAT derives the key from the amplitude embedding and the second uses the phase embedding as key. In both cases we take the whole node neighborhood into account, discarding the direction information

This is an example of "mid-fusion", where we integrate embeddings at the network's intermediate layers. We do this instead of early fusion because, if there are no node attributes, early fusion would introduce only random noise, while late fusion (at the terminal layer) can dilute the unique attributes of the amplitude and phase embeddings.

### Notes
We can technically replace GAT with any other Spatial GNN, or even Mamba!

# Two Parameter-free Decoders
We learned an amplitude embedding $a_u$ and a phase embedding $\theta_u$, which we can use to construct the complex embedding $x_u$. From these 3 embeddings (well, actually we only use $x_u$ and $a_u$) we are going to build two decoders:

1. **Direction aware Decoder**
2. **Connection aware Decoder**

Each has its own (self-)supervised loss function, with the total loss of the model defined as a weighted sum of the individual losses $\mathcal{L} = \mathcal{L}_d + \lambda \mathcal{L}_c$, where $\lambda$ weights the connection aware term.

### Direction aware decoder
This decoder focuses on reconstructing the complex-valued HAM of the Digraph:

![Direction Aware Decoder](/images/duplex_direction_aware_decoder.png)

- this calculates the probability that the node pair $(u,v)$ has edge type $r$ (forward, reverse, bidirectional, or no edge)

The loss is defined as:

![Direction Aware Decoder Loss](/images/duplex_direction_aware_decoder_loss.png)


- here $x_u, \bar{x}_u$ are node embeddings in polar form
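
The exact formulas for this decoder live in the figures above, so the following is only a rough sketch under assumptions: the reconstructed entry is the inner product of $x_u$ with the conjugate of $x_v$, and the probability of each edge type $r$ is a softmax over the negative distance between that entry and $r$. The names `EDGE_TYPES`, `edge_type_probs`, and `direction_loss` are hypothetical.

```python
import numpy as np

# Assumed order: forward, reverse, bidirectional, no edge.
EDGE_TYPES = np.array([1j, -1j, 1.0, 0.0])

def edge_type_probs(x_u, x_v):
    """Probability of each edge type for the pair (u, v); no parameters."""
    h_uv = np.sum(x_u * np.conj(x_v))  # reconstructed HAM entry
    logits = -np.abs(h_uv - EDGE_TYPES)  # closer to r means more likely
    p = np.exp(logits - logits.max())
    return p / p.sum()

def direction_loss(X, pairs, labels):
    """Summed negative log-likelihood.

    X: (n, d) complex embeddings, one row per node;
    pairs: list of (u, v); labels: index into EDGE_TYPES per pair.
    """
    return -sum(
        np.log(edge_type_probs(X[u], X[v])[r])
        for (u, v), r in zip(pairs, labels)
    )

# Node 0 has phase 1 and node 1 has phase 0, both with amplitude 1.
X = np.array([[1j], [1.0]])
probs = edge_type_probs(X[0], X[1])
```

For this toy pair the reconstructed entry is exactly $i$, so the forward type gets the highest probability, and swapping the pair makes the reverse type the most likely one.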

### Connection aware decoder
This decoder focuses only on the existence of connections. It can be viewed as an auxiliary to the Direction aware decoder:

![Connection Aware Decoder](/images/duplex_connection_aware_decoder_loss.png)

- $\sigma$ is the sigmoid function and $\hat{A}$ is the estimated undirected Adjacency matrix

- the loss is the same summed negative log-likelihood as in the direction aware decoder.
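
As a rough illustration of the connection aware side, here is a NumPy sketch that assumes the edge score is the inner product of two amplitude embeddings passed through the sigmoid. That scoring choice is my assumption (the exact formula is in the figure above), and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def connection_probs(a):
    """a: (n, d) amplitude embeddings, one row per node.

    Returns the estimated undirected adjacency matrix (assumed inner
    product scoring, no learnable parameters).
    """
    return sigmoid(a @ a.T)

def connection_loss(a, A_s, eps=1e-12):
    """Summed negative log-likelihood against the symmetric adjacency A_s."""
    p = connection_probs(a)
    log_lik = A_s * np.log(p + eps) + (1.0 - A_s) * np.log(1.0 - p + eps)
    return -log_lik.sum()

# Nodes 0 and 1 have aligned amplitudes, node 2 points the other way.
a = np.array([[2.0, 0.0], [2.0, 0.0], [-2.0, 0.0]])
A_hat = connection_probs(a)
```

Because only the amplitudes are used, the estimate is symmetric by construction and says nothing about direction, which is why this decoder only plays a supporting role.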

# Final Remarks
First of all, in terms of performance it is state of the art when it comes to Digraphs (most papers are, at least at the time of their publication!). The biggest benefit is using GAT for the encoders: since it only needs to aggregate neighborhood information, the model transfers to graphs that are similar to the ones in the training set. Second, the self-supervised method allows pretraining in the absence of any labeled data, and we can then build models on top of these representations, where we concatenate the phase and amplitude embeddings. If we do have labels, we can easily extend the learning objective to take them into account!
Binary file added static/images/duplex_direction_aware_decoder.png
Binary file added static/images/duplex_fusion_layer.png
Binary file added static/images/duplex_overview.png
Binary file added static/images/duplex_phase_encoder.png