Activity
feat(CCIndexWarcExport): increase number of retries fetching a WARC r…
build: update dependency versions
Force push
Add unit tests for EOT CDX-to-Parquet converter
build: update unit test JVM args for Java 17 and 21
Force push
fix(javadoc): add missing param and return documentation
build: disable Java 21 in GitHub workflow because not supported by Sp…
build: add GitHub workflow to verify proper compilation and build
filter out null lines/entries
Roll back to commons-cli 1.2 to be compatible with Hadoop 3.3.4
Force push
Add CDX-to-Parquet converter prototype for the end-of-term archive
Modified convert_url_index.sh to prefer user classes over those inclu…
2023 Sept/Oct crawl, 700k homepages
random sample to create an extracted warc
Roll back to commons-cli 1.2 to be compatible with Hadoop 3.3.4
Force push
Fix ordering of function parameters in example SQL query
Bump guava from 31.1-jre to 32.0.0-jre
on Jun 14, 2023
Bump spark-core_2.12 from 3.3.2 to 3.4.0
on Apr 21, 2023
Add link to Spark documentation to the README, add Trino to Presto
Roll back to commons-cli 1.2 to be compatible with Hadoop 3.3.4
Use crawler-commons development version
Force push
IndexTable: replace deprecated APIs (gson, commons-cli)
IndexTable: replace deprecated APIs (gson, commons-cli)
Use crawler-commons development version
Force push