First version of documentation #192

Open
wants to merge 16 commits into
base: develop
3 changes: 0 additions & 3 deletions .gitattributes

This file was deleted.

28 changes: 0 additions & 28 deletions .github/workflows/deploy-pr-preview.yml

This file was deleted.

37 changes: 37 additions & 0 deletions .github/workflows/doc_preview.yml
@@ -0,0 +1,37 @@
name: Documentation Preview

on:
  pull_request:
    types:
      - opened
      - closed
      - synchronize
      - reopened
    paths:
      - mkdocs.yml
      - docs/**
      - .github/workflows/doc_preview.yml

concurrency:
  group: gh-pages

jobs:
  deploy-preview:
    name: Preview documentation
    runs-on: ubuntu-latest
    permissions:
      contents: write
      pull-requests: write

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Create Documentation Preview
        uses: secure-software-engineering/actions/documentation/handle-pr-preview@develop
        with:
          token: ${{ secrets.GITHUB_TOKEN }}
          preview-name: pr-${{ github.event.pull_request.number }}
          preview-title: Preview for PR-${{ github.event.pull_request.number }}
1 change: 1 addition & 0 deletions .gitignore
@@ -11,3 +11,4 @@ shippable/
*.prefs
*.xml
**/target
**/site
169 changes: 169 additions & 0 deletions docs/boomerang/allocation_sites.md
@@ -0,0 +1,169 @@
# Defining Allocation Sites

Boomerang provides an interface that allows the definition of individual allocation sites. An allocation site is a value that should be considered as a points-to object.


## Allocation Site Interface

To define an individual allocation site, we have to implement the `IAllocationSite` interface and override its method `getAllocationSite(...)` that returns an optional `AllocVal`.
An `AllocVal` represents an allocation site and acts as a wrapper for the allocation site statement and value.
If the optional is present, the `AllocVal` is added to the resulting allocation sites.

When performing a backward analysis, Boomerang calls this method on each statement on each data-flow path.
It provides three parameters to the method `getAllocationSite`:

- Method: The current method
- Statement: The current statement that may contain an allocation site
- Val: The current propagated data-flow fact

These parameters necessitate two checks that should be part of each allocation site implementation:

- Check whether the statement is an assignment
- Check whether the left operand of the assignment is equal to the propagated data-flow fact

The first point is relevant because an allocation site is defined as an assignment.
The second aspect is relevant to avoid returning statements that are not relevant to the points-to analysis.
Boomerang propagates only data-flow facts that are relevant to or alias with the query variable.
Therefore, one can exclude irrelevant assignments with the second check.

To this end, a self-defined allocation site should have at least the following code:

```java
public class ExtendedAllocationSite implements IAllocationSite {

    @Override
    public Optional<AllocVal> getAllocationSite(Method method, Statement statement, Val fact) {
        // Check for assignments
        if (!statement.isAssignStmt()) {
            return Optional.empty();
        }

        Val leftOp = statement.getLeftOp();
        Val rightOp = statement.getRightOp();
        // Check for correct data-flow fact
        if (!leftOp.equals(fact)) {
            return Optional.empty();
        }

        // rightOp is a potential allocation site
        ...
    }
}
```

Last, to use our self-defined allocation site, we need to add it to the options:

```java
BoomerangOptions options =
    BoomerangOptions.builder()
        .withAllocationSite(new ExtendedAllocationSite())
        ...
        .build();
```

## Simple Allocation Site

To show what an implementation of the `IAllocationSite` interface may look like, we consider the following simple example:

Assume our program requires *constants* and *new expressions* as allocation sites.
Then, the interface implementation may look like this:

```java
public class SimpleAllocationSite implements IAllocationSite {

    @Override
    public Optional<AllocVal> getAllocationSite(Method method, Statement statement, Val fact) {
        // Check for assignments
        if (!statement.isAssignStmt()) {
            return Optional.empty();
        }

        Val leftOp = statement.getLeftOp();
        Val rightOp = statement.getRightOp();
        // Check for correct data-flow fact
        if (!leftOp.equals(fact)) {
            return Optional.empty();
        }

        // Constant allocation sites: var = <constant>
        if (rightOp.isConstant()) {
            AllocVal allocVal = new AllocVal(leftOp, statement, rightOp);
            return Optional.of(allocVal);
        }

        // New expressions: var = new java.lang.Object
        if (rightOp.isNewExpr()) {
            AllocVal allocVal = new AllocVal(leftOp, statement, rightOp);
            return Optional.of(allocVal);
        }

        return Optional.empty();
    }
}
```

Using this allocation site implementation, Boomerang returns values that are either *new expressions* (e.g. `new java.lang.Object`) or *constants* (e.g. `int` or `String` literals).

## Allocation Site with DataFlowScope

In many cases, we are interested in finding an allocation site to analyze it.
However, a common scenario where Boomerang cannot find an allocation site occurs when a data-flow path ends because we have a function call that is not part of the application.
For example, using the `SimpleAllocationSite` from the previous section, Boomerang would not find an allocation site in the following program:

```java
String s = System.getProperty("property"); // Most precise allocation site
...
queryFor(s);
```

Boomerang does not compute an allocation site because `System.getProperty("property")` is not a *constant* or a *new expression*.
Additionally, we may be interested in analyzing only our own application, that is, we do not load the JDK class `java.lang.System` and exclude it in the `DataFlowScope`.
In this case, Boomerang returns an empty result set because the data-flow path ends at the call `System.getProperty("property")`.

To cover these scenarios, we can include the `DataFlowScope` in the allocation site implementation.
For example, we can extend the [DefaultAllocationSite](https://github.com/secure-software-engineering/Boomerang/blob/develop/boomerangPDS/src/main/java/boomerang/options/DefaultAllocationSite.java) as follows:

```java
public class ExtendedDataFlowScope extends DefaultAllocationSite {

    private final DataFlowScope dataFlowScope;

    public ExtendedDataFlowScope(DataFlowScope dataFlowScope) {
        this.dataFlowScope = dataFlowScope;
    }

    @Override
    public Optional<AllocVal> getAllocationSite(Method method, Statement statement, Val fact) {
        // Check for assignments
        if (!statement.isAssignStmt()) {
            return Optional.empty();
        }

        Val leftOp = statement.getLeftOp();
        Val rightOp = statement.getRightOp();
        // Check for correct data-flow fact
        if (!leftOp.equals(fact)) {
            return Optional.empty();
        }

        // Check for function calls that would end the data-flow path
        // If the function call is not excluded, Boomerang can continue with the analysis
        if (statement.containsInvokeExpr()) {
            InvokeExpr invokeExpr = statement.getInvokeExpr();
            DeclaredMethod declaredMethod = invokeExpr.getDeclaredMethod();

            if (dataFlowScope.isExcluded(declaredMethod)) {
                // rightOp is the invoke expression
                AllocVal allocVal = new AllocVal(leftOp, statement, rightOp);
                return Optional.of(allocVal);
            }
        }

        // If the statement contains no excluded function call, we continue with the default behavior
        return super.getAllocationSite(method, statement, fact);
    }
}
```

With this implementation, we cover function calls that would end the analysis. When such a call is reported, we can conclude that the allocation site cannot be computed precisely.
For example, having `System.getProperty("property")` as an allocation site indicates that the query variable points to some object that depends on system properties at runtime.
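
The decision logic described above can be tried out without Boomerang on the classpath. The following is a minimal sketch that uses hypothetical stand-in types (`Assign`, `Kind`, and a set of excluded callee names); none of them are part of the Boomerang API, and the handling of constants and new expressions is only an assumed stand-in for the default behavior.

```java
import java.util.Collections;
import java.util.Optional;
import java.util.Set;

/** Toy model, NOT the Boomerang API: decides which assignments are allocation sites. */
class AllocationSiteSketch {

    /** Hypothetical kinds of right-hand sides in "lhs = rhs". */
    enum Kind { CONSTANT, NEW_EXPR, CALL, COPY }

    /** Hypothetical assignment; 'callee' is null unless the right-hand side is a call. */
    static final class Assign {
        final String lhs;
        final Kind kind;
        final String rhs;
        final String callee;

        Assign(String lhs, Kind kind, String rhs, String callee) {
            this.lhs = lhs;
            this.kind = kind;
            this.rhs = rhs;
            this.callee = callee;
        }
    }

    static Optional<String> allocationSite(Assign stmt, String fact, Set<String> excludedCallees) {
        // Only assignments to the tracked fact are relevant
        if (!stmt.lhs.equals(fact)) {
            return Optional.empty();
        }
        // A call into excluded code ends the data-flow path, so report it
        if (stmt.kind == Kind.CALL && excludedCallees.contains(stmt.callee)) {
            return Optional.of(stmt.rhs);
        }
        // Assumed stand-in for the default behavior: constants and new expressions
        if (stmt.kind == Kind.CONSTANT || stmt.kind == Kind.NEW_EXPR) {
            return Optional.of(stmt.rhs);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Assign call = new Assign(
            "s", Kind.CALL, "System.getProperty(\"property\")", "java.lang.System.getProperty");
        Set<String> excluded = Collections.singleton("java.lang.System.getProperty");

        // Excluded callee: the call itself is reported as the allocation site
        System.out.println(allocationSite(call, "s", excluded).isPresent());
        // Callee not excluded: nothing is reported and the analysis would continue
        System.out.println(allocationSite(call, "s", Collections.<String>emptySet()).isPresent());
    }
}
```

In this sketch, the same call statement is an allocation site only when its callee is excluded, which mirrors the role of `dataFlowScope.isExcluded(...)` in the implementation above.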
103 changes: 103 additions & 0 deletions docs/boomerang/boomerang_setup.md
@@ -0,0 +1,103 @@
# Boomerang Setup

Boomerang's purpose is the on-demand computation of points-to information for a variable.
Starting at a specific statement, it traverses the program and its data-flow paths backwards until it finds an allocation site for the desired variable.
While doing that, it computes relevant alias information.
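
To build some intuition for this backward traversal, the following is a minimal sketch on straight-line code. It is a toy model and not the Boomerang API: the `Assign` type and the `findAllocationSite` method are hypothetical, every variable is assumed to be assigned at most once, and fields, branches and method calls are ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

/** Toy model, NOT the Boomerang API: a backward search over straight-line code. */
class BackwardSearchSketch {

    /** Hypothetical statement "lhs = rhs"; isAllocation marks rhs as an allocation site. */
    static final class Assign {
        final String lhs;
        final String rhs;
        final boolean isAllocation;

        Assign(String lhs, String rhs, boolean isAllocation) {
            this.lhs = lhs;
            this.rhs = rhs;
            this.isAllocation = isAllocation;
        }
    }

    /** Variables tracked during the walk; in this toy model they alias the query variable. */
    final Set<String> aliases = new LinkedHashSet<>();

    /**
     * Walks backward from the statement at index 'start' and tracks a single fact.
     * Returns the right-hand side of the first allocation that defines the fact.
     */
    Optional<String> findAllocationSite(List<Assign> program, int start, String queryVariable) {
        String fact = queryVariable;
        aliases.add(fact);
        for (int i = start; i >= 0; i--) {
            Assign stmt = program.get(i);
            if (!stmt.lhs.equals(fact)) {
                continue; // The statement does not define the tracked fact
            }
            if (stmt.isAllocation) {
                return Optional.of(stmt.rhs);
            }
            fact = stmt.rhs; // Copy "lhs = rhs": continue the search with rhs
            aliases.add(fact);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Assign> program = Arrays.asList(
            new Assign("a", "new A()", true),
            new Assign("b", "a", false),
            new Assign("c", "b", false));

        BackwardSearchSketch sketch = new BackwardSearchSketch();
        Optional<String> site = sketch.findAllocationSite(program, 2, "c");
        System.out.println(site.orElse("<none>") + " " + sketch.aliases);
    }
}
```

Starting at the last statement with the query variable `c`, the sketch follows the copies `c = b` and `b = a` and stops at `a = new A()`, collecting `c`, `b` and `a` as aliases along the way.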

In the following sections, we give an overview of relevant constructs and API calls.
We highly recommend taking a look at the [Examples](./../boomerang/examples.md) to see the best way to combine these constructs.

## Backward Queries

Boomerang uses *backward queries* to compute relevant points-to information.
A **BackwardQuery** consists of a statement `s` and a variable `v`. `s` is the statement where the backward analysis starts, and `v` is the data-flow fact to solve for.

Backward queries can be easily constructed.
However, due to Boomerang's scope implementation, we need to specify the corresponding control-flow graph edge with the starting statement `s` as target (see the [Boomerang Scopes](./../general/boomerang_scope.md)).
With that, we can construct a backward query as follows:

```java
public void createBackwardQuery(ControlFlowGraph.Edge edge, Val fact) {
    BackwardQuery query = BackwardQuery.make(edge, fact);
}
```

## Running Boomerang

Boomerang requires a [FrameworkScope](./../general/framework_scopes.md) and a set of [Options](./../boomerang/options.md). With that, we can solve a backward query as follows:

```java
public void solveQuery(
        BackwardQuery query,
        FrameworkScope scope,
        BoomerangOptions options) {
    Boomerang solver = new Boomerang(scope, options);
    BackwardBoomerangResults<NoWeight> results = solver.solve(query);
}
```

The call to `solve` solves the query and returns a wrapper for the results.

!!! important
    By default, a `Boomerang` instance can be used to solve exactly one query.
    If you want to solve multiple queries with the same instance, you have to set [allowMultipleQueries]() in the options to `true` and call `unregisterAllListeners()` after each call to `solve`.
    This may look like this:

```java
public void solveQueries(
        Collection<BackwardQuery> queries,
        FrameworkScope scope,
        BoomerangOptions options) {
    Boomerang solver = new Boomerang(scope, options);

    for (BackwardQuery query : queries) {
        BackwardBoomerangResults<NoWeight> results = solver.solve(query);
        // <Process or store the results>
        solver.unregisterAllListeners();
    }
}
```

## Extracting Allocation Sites

After running Boomerang, we can use the results to compute the allocation sites, i.e. the objects the query variable points to. Each allocation site (`AllocVal`) is wrapped in a `ForwardQuery` object. Note that the computed allocation sites depend heavily on the [AllocationSite](./../boomerang/allocation_sites.md) definition in use. We can extract the corresponding `AllocVal` objects as follows:

```java
public void extractAllocationSites(BackwardBoomerangResults<NoWeight> results) {
    // Compute the allocation sites
    Collection<ForwardQuery> allocationSites = results.getAllocationSites().keySet();

    for (ForwardQuery query : allocationSites) {
        // This is a single allocation site
        AllocVal allocVal = query.getAllocVal();
        System.out.println(
            "Query variable points to "
                + allocVal.getAllocVal()
                + " @ statement "
                + allocVal.getAllocStatement()
                + " @ line "
                + allocVal.getAllocStatement().getLineNumber()
                + " in method "
                + allocVal.getAllocStatement().getMethod());
    }
}
```

## Extracting Aliases

Besides the allocation sites, we can use the results to compute the aliases for the query variable. An alias is represented by an `AccessPath` that holds the base variable and the field chain. For example, an alias `x.f.g` is represented by an `AccessPath` with the base `x` and the field chain `[f, g]`. We can compute the access paths as follows:

```java
public void extractAliases(BackwardBoomerangResults<NoWeight> results) {
    Collection<AccessPath> aliases = results.getAllAliases();

    System.out.println("Found the following aliases:");
    for (AccessPath alias : aliases) {
        // 'toCompactString()' transforms the access path into a basic String, e.g. x.f.g
        System.out.println(alias.toCompactString());
    }
}
```
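
To make the access path representation more concrete, here is a minimal stand-in that models a base variable plus a field chain. The class `AccessPathSketch` is hypothetical and is not Boomerang's `AccessPath`; its `toCompactString()` only mimics the format described above.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model, NOT Boomerang's AccessPath: a base variable plus a chain of fields. */
class AccessPathSketch {

    final String base;
    final List<String> fields;

    AccessPathSketch(String base, String... fields) {
        this.base = base;
        this.fields = Arrays.asList(fields);
    }

    /** Joins the base and the field chain with dots, e.g. base x and [f, g] become x.f.g. */
    String toCompactString() {
        StringBuilder builder = new StringBuilder(base);
        for (String field : fields) {
            builder.append('.').append(field);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        AccessPathSketch alias = new AccessPathSketch("x", "f", "g");
        System.out.println(alias.toCompactString()); // prints "x.f.g"
    }
}
```

An access path with an empty field chain, such as `new AccessPathSketch("x")`, corresponds to a plain local variable alias.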

// TODO Aliases at specific statement
19 changes: 19 additions & 0 deletions docs/boomerang/examples.md
@@ -0,0 +1,19 @@
# Examples

## Taint Analysis
**Taint analysis** is a common use case for Boomerang. Our goal is to decide whether a variable points to a specific object (*source*), e.g. a password, that is unintentionally used as an argument in a method call (*sink*), e.g. a print statement.

Assume we have the following program:

```java
A a1 = new A(); // Object o
A a2 = a1; // Create an alias, i.e. a1 and a2 point to o

Object s = source(); // Read some tainted value
a1.f = s; // Store tainted value in field of o

Object z = a2.f; // Read the field from o
sink(z); // Is the tainted value used in the sink?
```

In this program, the variable `s` points to some tainted value that should not be used in a sink. The value is stored in the field `f` through `a1` but read through `a2`. Because `a1` and `a2` alias, both accesses refer to the same field of the same object, so the tainted value still reaches the sink.
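
The aliasing effect can be observed by running the program. The following sketch wraps it in a self-contained class. The class `A`, the methods `source()` and `sink(...)`, and the field `lastSinkArgument` are hypothetical helpers added only to make the snippet executable, and the check is a runtime observation, not a Boomerang query.

```java
/** Runnable version of the example program with hypothetical helper definitions. */
class TaintAliasSketch {

    /** Hypothetical class with a single field, as used in the example. */
    static final class A {
        Object f;
    }

    /** Records the last value passed to the sink so that we can inspect it. */
    static Object lastSinkArgument;

    /** Hypothetical source: returns a fresh object that stands for a tainted value. */
    static Object source() {
        return new Object();
    }

    /** Hypothetical sink: only remembers its argument. */
    static void sink(Object value) {
        lastSinkArgument = value;
    }

    /** Runs the example program and returns the tainted value. */
    static Object run() {
        A a1 = new A();      // Object o
        A a2 = a1;           // Create an alias, i.e. a1 and a2 point to o

        Object s = source(); // Read some tainted value
        a1.f = s;            // Store tainted value in field of o

        Object z = a2.f;     // Read the field from o
        sink(z);             // The tainted value reaches the sink
        return s;
    }

    public static void main(String[] args) {
        Object tainted = run();
        System.out.println(lastSinkArgument == tainted); // prints "true"
    }
}
```

A static analysis has to derive the same conclusion without executing the program, which is why the alias information for `a1` and `a2` is needed.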
1 change: 1 addition & 0 deletions docs/boomerang/options.md
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
# Boomerang Options
1 change: 0 additions & 1 deletion docs/examples.md

This file was deleted.
