What's new in URLFrontier 1.1

@jnioche released this 18 Feb 13:42

This is the initial work towards URL Frontier 2, which is being funded through the NGI0 Discovery Fund.

Please note that the service implementation is now available from Maven, which makes it easier to write standalone service implementations that extend it.

Logging configuration

Logging is done with Logback. The default configuration writes logs to the console at INFO level and above. It can be overridden by specifying a configuration file when launching a frontier service, e.g.

java -Dlogback.configurationFile=log-conf.xml ...
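As an illustration, a minimal log-conf.xml could look like the sketch below. It is not the default configuration shipped with the service: the appender name and the output pattern are assumptions chosen for the example, while the logger name is the service package mentioned in these notes.

```xml
<configuration>
  <!-- Console appender; the name STDOUT and the pattern are illustrative choices -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
    </encoder>
  </appender>

  <!-- More verbose logs for the frontier service package only -->
  <logger name="crawlercommons.urlfrontier.service" level="DEBUG"/>

  <!-- Everything else stays at INFO and above -->
  <root level="INFO">
    <appender-ref ref="STDOUT"/>
  </root>
</configuration>
```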

The API also has a new endpoint SetLogLevel, which allows changing the level of the logs generated by a running frontier service dynamically. The changes are not persisted between runs of the service.

This is typically done using the CLI:

Usage: Client SetLogLevel [-l=STRING] -p=STRING
Change the log level of a package in the Frontier service
  -l, --level=STRING     Log level [TRACE, DEBUG, INFO, WARN, ERROR]
  -p, --package=STRING   package name

for instance

java -jar ~/urlfrontier-client-*.jar SetLogLevel -p crawlercommons.urlfrontier.service -l DEBUG

will ask the Frontier to generate logs at level DEBUG for any class within the crawlercommons.urlfrontier.service package.

Multi-tenancy with crawlIDs

URLFrontier now supports multi-tenancy through the concept of a crawlID: a single Frontier instance can handle logical crawls separately, e.g. a generic crawl alongside more specific ones. This affects nearly every endpoint in the API as well as the service implementation.

Please note that these changes are not backward compatible. As a result, an existing frontier generated with a version < 1.1 cannot be loaded with URLFrontier 1.1 and above.

Two new endpoints have been added to the API in order to deal with crawls as a whole:

  1. ListCrawls
  2. DeleteCrawl
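
To make the idea of crawlID scoping more concrete, here is a small in-memory sketch in Java. It is not the URLFrontier service code or its API: the class CrawlScopedQueues and its methods are hypothetical, and returning a count from deleteCrawl is a choice made for this example. It only shows how keying queues by crawlID keeps logical crawls apart, and what listing and deleting whole crawls amounts to.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical in-memory model of crawlID-scoped queues; not URLFrontier's actual code. */
public class CrawlScopedQueues {

    // crawlID -> (queue key -> pending URLs)
    private final Map<String, Map<String, Queue<String>>> crawls = new LinkedHashMap<>();

    /** Adds a URL to the queue identified by (crawlID, queueKey), creating both on demand. */
    public void put(String crawlID, String queueKey, String url) {
        crawls.computeIfAbsent(crawlID, id -> new LinkedHashMap<>())
              .computeIfAbsent(queueKey, key -> new ArrayDeque<>())
              .add(url);
    }

    /** Rough analogue of ListCrawls: the crawlIDs currently known, in insertion order. */
    public Set<String> listCrawls() {
        return new LinkedHashSet<>(crawls.keySet());
    }

    /** Rough analogue of DeleteCrawl: drops every queue of one crawl and counts the URLs removed. */
    public long deleteCrawl(String crawlID) {
        Map<String, Queue<String>> removed = crawls.remove(crawlID);
        if (removed == null) {
            return 0;
        }
        long count = 0;
        for (Queue<String> queue : removed.values()) {
            count += queue.size();
        }
        return count;
    }

    public static void main(String[] args) {
        CrawlScopedQueues frontier = new CrawlScopedQueues();
        // The same queue key can exist in two crawls without clashing
        frontier.put("generic", "example.com", "https://example.com/a");
        frontier.put("generic", "example.com", "https://example.com/b");
        frontier.put("news", "example.com", "https://example.com/a");

        System.out.println(frontier.listCrawls());           // [generic, news]
        System.out.println(frontier.deleteCrawl("generic")); // 2
        System.out.println(frontier.listCrawls());           // [news]
    }
}
```

Deleting the "generic" crawl leaves the "news" crawl untouched, even though both hold a queue with the same key.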