Home
Chalab is a tool that helps you design data science or machine learning challenges. A step-by-step Wizard guides you through the process. When you are done, you can compile your challenge into a self-contained zip file (a competition bundle) and upload it to a challenge platform. Currently, Codalab is the only platform that accepts competition bundles created by Chalab. You can view a sample competition of the style you can create with Chalab.
Although Codalab allows you to design very elaborate challenges (with many datasets and phases, and elaborate means of scoring results), this version of Chalab follows a rather rigid pattern to generate "classic" data science challenges. These challenges nevertheless accept both result and CODE submissions. Code submission makes it possible to compare the solutions proposed by participants fairly, using the same computational resources. Instructors using challenges in their classes can easily evaluate and check the submitted solutions.
Codalab allows you to submit any kind of Linux executable. You can run your code in a Docker image of your choice. For simplicity, however, all the examples we provide are in Python. We use a Jupyter notebook for the starting kit and the [Scikit-learn](http://scikit-learn.org/stable/) machine learning library (which includes an excellent machine learning tutorial).
Your point of entry into Chalab is the wizard home page, which allows you to select a challenge to edit or create a new challenge. You are then led to the Wizard page, which lets you design a challenge one step at a time. Conveniently, you may use challenges previously created by others as templates (i.e., much of the information is already filled in, which gives you further guidance). To understand how to select or create a template, see the Profile and the Group pages.
The Chalab challenge design includes 6 steps:
Data science challenges designed with Chalab propose supervised learning tasks of CLASSIFICATION or REGRESSION. You must prepare your dataset in the AutoML challenge format, which supports data represented as feature vectors. Both full and sparse ([LIBSVM-style](http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f307)) formats are supported. See the data page for details. If you are not yet ready to upload your own data, you can choose one of several example datasets that we supply in a menu.
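To illustrate the difference between the full and sparse representations, here is a minimal sketch of how one line of each kind could be read into a feature vector. It assumes whitespace-separated values for the full format and `index:value` pairs with 1-based indices for the sparse format, as in LIBSVM. The function names `parse_dense_line` and `parse_sparse_line` are hypothetical and are not part of Chalab; refer to the data page for the exact file layout.

```python
def parse_dense_line(line):
    """Read one full-format line: one value per feature, whitespace-separated."""
    return [float(token) for token in line.split()]


def parse_sparse_line(line, num_features):
    """Read one sparse line made of 'index:value' pairs.

    Assumption: indices are 1-based, as in LIBSVM. Features that are not
    listed on the line are taken to be zero.
    """
    row = [0.0] * num_features
    for token in line.split():
        index, value = token.split(":")
        row[int(index) - 1] = float(value)
    return row


# The same three-feature sample written in both representations.
dense_row = parse_dense_line("1 0 2.5")
sparse_row = parse_sparse_line("1:1 3:2.5", num_features=3)
```

Both calls yield the same feature vector; the sparse form simply omits the zero entries, which saves space when most features are zero.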
Chalab splits your data three ways into:
- a training set (with labels supplied to the participants to train their learning machine)
- a validation set (with labels concealed from the participants, who must predict them)
- a (final) test set (also with labels concealed from the participants)
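The three-way split described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Chalab's actual implementation: the function name `three_way_split`, the 20% validation and test fractions, and the fixed random seed are all assumptions made for the example.

```python
import random


def three_way_split(samples, labels, valid_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the data and split it into training, validation and test sets.

    The fractions and the seed are illustrative defaults (assumptions),
    not values prescribed by Chalab.
    """
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")

    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_test = int(len(indices) * test_fraction)
    n_valid = int(len(indices) * valid_fraction)
    test_idx = indices[:n_test]
    valid_idx = indices[n_test:n_test + n_valid]
    train_idx = indices[n_test + n_valid:]

    def take(idx):
        return [samples[i] for i in idx], [labels[i] for i in idx]

    return {"train": take(train_idx), "valid": take(valid_idx), "test": take(test_idx)}


# Toy dataset: 10 one-feature samples with binary labels.
X = [[float(i)] for i in range(10)]
y = [i % 2 for i in range(10)]
splits = three_way_split(X, y)

# Only the training labels would be handed to participants; the validation
# and test labels stay with the organizers for scoring.
X_train, y_train = splits["train"]
```

Every sample ends up in exactly one of the three sets, so the validation and test labels can be withheld without leaking into the training data.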