Home

Chalab Wiki

Chalab is a tool that helps you design data science or machine learning challenges. A step-by-step Wizard guides you through the process. When you are done, you can compile your challenge into a self-contained zip file (a competition bundle) and upload it to a challenge platform. Currently, only Codalab accepts competition bundles created by Chalab. You can view a sample competition in the style you can create with Chalab.

Although Codalab allows you to design very elaborate challenges (with many datasets and phases, and elaborate means of scoring results), this version of Chalab follows a rather rigid pattern to generate "classic" data science challenges. These challenges nonetheless accept both result and CODE submissions. Code submission makes it possible to compare the solutions proposed by participants fairly, using the same computational resources. Instructors using challenges in their classes can easily evaluate and check the submitted solutions.

Codalab allows you to submit any kind of Linux executable. You can run your code in a Docker container of your choice. For simplicity, however, all the examples we provide are in Python. We use a Jupyter notebook for the starting kit and the [Scikit-learn](http://scikit-learn.org/stable/) machine learning library (which includes an excellent machine learning tutorial).

Mini challenge tutorial

Your point of entry into Chalab is the wizard home page, which allows you to select a challenge to edit or create a new challenge. You are then led to the Wizard page, which lets you design a challenge one step at a time. Conveniently, you may use challenges previously created by others as templates (i.e., much of the information is already filled in, which provides you with further guidance). To understand how to select or create a template, see the Profile and the Group pages.

The Chalab challenge design includes 6 steps:

Data:

Data science challenges designed with Chalab propose supervised learning tasks of CLASSIFICATION or REGRESSION. You must prepare your dataset in the AutoML challenge format, which supports data represented as feature vectors. Full and sparse ([LIBSVM-style](http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f307)) formats are supported. See the data page for details. We supply several example datasets, which you can choose from in a menu, if you are not ready yet to upload your own data.
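To illustrate the difference between the full and sparse representations mentioned above, here is a minimal sketch that writes the same feature vectors both ways. It assumes one sample per line with space-separated values, and 1-based `index:value` pairs for the sparse form, as in LIBSVM; the helper names `to_dense_line` and `to_sparse_line` are hypothetical and not part of Chalab. The data page remains the authority on the exact file layout.

```python
def to_dense_line(row):
    """Format one sample as space-separated feature values (full format)."""
    return " ".join(format(value, "g") for value in row)


def to_sparse_line(row):
    """Format one sample as LIBSVM-style 'index:value' pairs, skipping zeros.

    Assumption: feature indices start at 1, as in LIBSVM files.
    """
    return " ".join(
        f"{index}:{value:g}"
        for index, value in enumerate(row, start=1)
        if value != 0
    )


# Two toy samples with four features each (made-up values).
X = [
    [0.0, 1.5, 0.0, 2.0],
    [3.0, 0.0, 0.0, 0.0],
]

dense_lines = [to_dense_line(row) for row in X]
sparse_lines = [to_sparse_line(row) for row in X]

for dense, sparse in zip(dense_lines, sparse_lines):
    print(f"full: {dense!r:15} sparse: {sparse!r}")
```

The sparse form only pays off when most feature values are zero; for small dense datasets the full format is simpler to inspect by hand.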

Split:

Chalab splits your data three ways, into a:

training set (with labels supplied to the participants to train their learning machine)
validation set (with labels concealed from the participants, who must predict them)
(final) test set (also with labels concealed from the participants)
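The three-way split listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `three_way_split`, the 60/20/20 proportions, and the fixed random seed are all hypothetical choices, and Chalab's own splitting logic may differ.

```python
import random


def three_way_split(n_samples, valid_fraction=0.2, test_fraction=0.2, seed=0):
    """Return disjoint index lists for the training, validation and test sets.

    The fractions and the seed are illustrative defaults, not Chalab settings.
    """
    indices = list(range(n_samples))
    # Shuffle with a dedicated generator so the split is reproducible.
    random.Random(seed).shuffle(indices)

    n_valid = int(n_samples * valid_fraction)
    n_test = int(n_samples * test_fraction)
    n_train = n_samples - n_valid - n_test

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]
    return train, valid, test


train_idx, valid_idx, test_idx = three_way_split(100)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Only the labels of the training indices would be released to participants; the labels of the other two index sets stay with the organizer for scoring.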

The last two sets are both test sets. We need two test sets because we let participants practice solving the problem by making many submissions during a first "development phase", which may last several weeks. They can make up to 5 submissions per day, so in the end they can easily "overfit" the validation set (basically learn it by heart). We use the (final) test set during the final phase to perform blind testing: only one try!

### Metric:

### Protocol:

### Baseline:

### Documentation:
