Provexa: Enabling Efficient Attack Investigation via Human-in-the-Loop Security Analysis

Overview

Provexa supports attack investigation over collected system audit logs. It builds system provenance data from the logs, stores it in a database, and provides a Domain-Specific Language (DSL) called ProvQL for querying that data during an investigation.

Our research paper has been accepted for publication at the 51st International Conference on Very Large Data Bases (VLDB 2025).

Demo Video

Watch the demo video to learn how to investigate with Provexa through the WebUI and ProvQL queries.

Environment Setup

1. Prerequisites

Install Java 11. This repo is tested with openjdk 11.0.27.

2. Configuration Setup

Copy the example config to set up the Postgres and Neo4j configurations:

cp cfg/db.properties.example cfg/db.properties

3. Database Installation

Choose and install your preferred database:

A. PostgreSQL Setup

Install PostgreSQL from the Apt Repository. Once installed, configure the following settings inside cfg/db.properties in the Postgres configs section:

url=jdbc:postgresql://127.0.0.1:5432/
dbName=test
username=postgres
password=postgres

Configuration Notes:

If you used the default installation, leave the url as is.
The dbName is the name of the Postgres database that your parsed logs are saved to.
username and password are the credentials you set up when installing Postgres.
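To confirm which values will be picked up before you parse any logs, you can read them back from the file. The following is a minimal sketch, not part of Provexa: `get_prop` is a hypothetical helper name, and it assumes plain `key=value` lines like the ones shown above (no escapes or line continuations). If a key appears in more than one section of the file, only the first match is printed.

```shell
# Hypothetical helper (not part of Provexa): print the value of a key
# from a Java-style .properties file with simple "key=value" lines.
# $1 = path to the properties file, $2 = key to look up.
get_prop() {
  # Keep only lines that start with "key=", drop the "key=" prefix,
  # strip trailing whitespace, and print the first match.
  sed -n "s/^$2=//p" "$1" | sed 's/[[:space:]]*$//' | head -n 1
}

# Example usage (commented out so the sketch has no side effects):
# get_prop cfg/db.properties dbName
# get_prop cfg/db.properties url
```

Stripping trailing whitespace in the helper is a deliberate choice: a stray space after a value is easy to miss when editing the file by hand.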

B. Neo4j Setup

A detailed guide on how to install and configure Neo4j database is provided here.

4. Log Generation

This repo supports audit logs collected from Sysdig. Generate logs in a custom format aligned with our parsing module:

sudo sysdig -p "%evt.num %evt.rawtime.s.%evt.rawtime.ns %evt.cpu %proc.name (%proc.pid) %evt.dir %evt.type cwd=%proc.cwd %evt.args latency=%evt.latency exepath=%proc.exepath cmd= %proc.exeline"  "proc.name!=tmux and (evt.type=read or evt.type=readv or evt.type=write or evt.type=writev or evt.type=fcntl or evt.type=accept or evt.type=execve or evt.type=clone or evt.type=pipe or evt.type=rename or evt.type=sendmsg or evt.type=recvmsg) and proc.name!=sysdig" > output_file.txt
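The filter expression in the command above is long and easy to mistype. As a sketch only, the same filter string can be assembled from a list of event types. `build_evt_filter` is a hypothetical helper name, not part of this repo, and the event list is copied from the command above.

```shell
# Hypothetical helper (not part of Provexa): join event types into
# "evt.type=a or evt.type=b or ...".
build_evt_filter() {
  out=""
  for e in "$@"; do
    # Prepend " or " only when something has already been collected.
    out="${out:+$out or }evt.type=$e"
  done
  printf '%s' "$out"
}

# Event types copied from the sysdig command above.
evt_filter="$(build_evt_filter read readv write writev fcntl accept execve clone pipe rename sendmsg recvmsg)"

# Same process-name exclusions as the original command.
filter="proc.name!=tmux and ($evt_filter) and proc.name!=sysdig"
printf '%s\n' "$filter"
```

The resulting `$filter` value can then be passed as the final quoted argument to the sysdig command in place of the literal filter string.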

5. Log Parsing and Database Population

Parse the generated logs and load them into your database:

mvn -q exec:java -Dexec.mainClass=main.SysdigMain -Dexec.args="--logfile=output_file.txt"

Note: A detailed guide on how to set up Sysdig, the command-line arguments you can pass, and other details is provided here.
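If the sysdig capture was interrupted or its filter matched nothing, the log file may be missing or empty. A minimal pre-check sketch follows; `check_logfile` is a hypothetical helper name, not part of Provexa, and it only tests that the file exists and is non-empty, not that its contents are well-formed.

```shell
# Hypothetical pre-check (not part of Provexa): succeed only if the
# given log file exists and has a size greater than zero.
check_logfile() {
  if [ ! -s "$1" ]; then
    echo "log file '$1' is missing or empty" >&2
    return 1
  fi
}

# Example usage (commented out so the sketch does not start Maven):
# check_logfile output_file.txt && \
#   mvn -q exec:java -Dexec.mainClass=main.SysdigMain -Dexec.args="--logfile=output_file.txt"
```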

Start Investigation

Once the database is populated with the logs, you can use the Provexa WebUI to query the database and investigate the logs.

1. Start the WebUI

cd frontend
npm install  # For first run only
npm start

# if the above fails due to openssl compatibility, try this:
# npm run start-legacy

2. Backend Server

Start the backend server with default settings (port 8080 and Postgres database):

mvn -q exec:java -Dexec.mainClass=main.WebMain

You can pass arguments to customize the server settings. Supported databases are Postgres and Neo4j.

# Use Neo4j database with custom port
mvn -q exec:java -Dexec.mainClass=main.WebMain -Dexec.args="--port=9090 --db=neo4j"

# Use default settings but with Neo4j
mvn -q exec:java -Dexec.mainClass=main.WebMain -Dexec.args="--db=neo4j"

# Show help
mvn -q exec:java -Dexec.mainClass=main.WebMain -Dexec.args="--help"

3. Begin Investigation

Navigate to the WebUI and start writing your ProvQL queries to begin the investigation. The ProvQL queries used in the demo video are provided as an example.

Command-Line Investigation (Without WebUI)

You can execute ProvQL queries directly from the command line without using the WebUI.

1. Start the backend server

Start the backend server (see Backend Server section in step 2 above).

2. Running Queries

Execute queries using the following command. By default, it uses an example input file:

mvn -q exec:java -Dexec.mainClass=main.ExecutorMain

Use the help option to see how to pass custom input files or use the Neo4j database:

mvn -q exec:java -Dexec.mainClass=main.ExecutorMain -Dexec.args="--help"

Citation

If you use our tool, or find it helpful for your research, please cite us using:

@article{tsegai2025provexa,
    author = {Tsegai, Saimon Amanuel and Yang, Xinyu and Liu, Haoyuan and Gao, Peng},
    title = {Enabling Efficient Attack Investigation via Human-in-the-Loop Security Analysis},
    year = {2025},
    issue_date = {July 2025},
    journal = {Proceedings of the VLDB Endowment},
    volume = {18},
    number = {11},
    doi = {10.14778/3749646.3749653},
    pages = {3771--3783},
}
