Add readme about open-source datasets (y-scope#91)
kirkrodrigues authored Dec 8, 2022
1 parent ac98b4f commit 1b55438
Showing 2 changed files with 19 additions and 0 deletions.
3 changes: 3 additions & 0 deletions README.md
@@ -9,6 +9,9 @@ the compressed logs without decompression. To learn more about it, you can read
You can download a release from the [releases](https://github.com/y-scope/clp/releases) page or you can build the latest by using the
[packager](tools/packager/README.md).

For some logs you can use to test CLP, check out our open-source
[datasets](docs/Datasets.md).

## Project Structure

CLP is currently split across a few different components in the [components](components)
16 changes: 16 additions & 0 deletions docs/Datasets.md
@@ -0,0 +1,16 @@
# Datasets

This page lists a few large log datasets you can use to try out CLP and evaluate
its compression ratio against other tools. Each dataset is gzipped for more
efficient downloads. We will be uploading more datasets over time.

For evaluation results comparing CLP and other tools, see our
[paper](https://www.usenix.org/system/files/osdi21-rodrigues.pdf).

| Dataset | Uncompressed size | Download size |
|---------------------------------------------------------------------------------|-------------------|---------------|
| [hadoop-14TB-part1<sup>†</sup>](https://zenodo.org/record/7114847#.Y5JbHn3MKHs) | 428.94 GB | 20.33 GB |
| [openstack-24hr](https://zenodo.org/record/7094972#.Y5JbH33MKHs) | 33.00 GB | 2.06 GB |
| [hive-24hr](https://zenodo.org/record/7094921#.Y5JbH33MKHs) | 2.07 GB | 122.54 MB |

*<sup>†</sup> We will upload the other parts soon.*
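The ratio between the two size columns gives a rough baseline: gzip's compression ratio on each dataset. This holds only if the download size is simply the gzip-compressed size. The minimal sketch below computes that ratio. It also assumes decimal units (1 GB = 1000 MB) when converting the hive-24hr download size. These are gzip figures only. CLP's own ratio has to be measured by compressing the uncompressed logs with CLP.

```python
# Sizes copied from the table above, as (uncompressed GB, download GB).
# Assumption: the download size is the gzip-compressed size, and
# units are decimal (1 GB = 1000 MB).
DATASETS = {
    "hadoop-14TB-part1": (428.94, 20.33),
    "openstack-24hr": (33.00, 2.06),
    "hive-24hr": (2.07, 122.54 / 1000),  # 122.54 MB converted to GB
}


def compression_ratio(uncompressed: float, compressed: float) -> float:
    """Return uncompressed size divided by compressed size (higher is better)."""
    if compressed <= 0:
        raise ValueError("compressed size must be positive")
    return uncompressed / compressed


if __name__ == "__main__":
    for name, (raw_gb, gz_gb) in DATASETS.items():
        print(f"{name}: gzip ratio ~{compression_ratio(raw_gb, gz_gb):.1f}x")
```

Under these assumptions the script prints ratios of roughly 21.1x, 16.0x, and 16.9x. Those are the numbers a CLP-compressed archive would be compared against.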