Create openml_hard_id_list.txt to include 36 hardest datasets in Table 4 #104

JerryLife · 2025-03-13T05:00:02Z

To address issue #103, the full list of the hardest dataset IDs is included in a text file, openml_hard_id_list.txt.
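A minimal sketch of how the list might be consumed. It assumes the file is plain UTF-8 text with one dataset identifier per line; that layout is not confirmed here. The helper names `parse_hard_ids` and `load_hard_ids` are hypothetical.

```python
from pathlib import Path


def parse_hard_ids(text: str) -> list[str]:
    """Return one dataset identifier per non-empty, non-comment line.

    Assumed layout (hypothetical): one identifier per line; blank lines
    and lines starting with '#' are ignored.
    """
    ids = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.append(line)
    return ids


def load_hard_ids(path: str = "openml_hard_id_list.txt") -> list[str]:
    """Read the list file from disk and parse it with parse_hard_ids."""
    return parse_hard_ids(Path(path).read_text(encoding="utf-8"))
```

If the file does follow that layout, `len(load_hard_ids())` would be expected to be 36.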

The list was built by fuzzy matching the dataset names in Table 4. The datasets that could not be matched exactly are:

For colic, there are two datasets (openml__colic__25 and openml__colic__27). According to their metadata, openml__colic__25 has 26 features, while openml__colic__27 has only 22. Table 4 lists 27 features, which is closer to openml__colic__25 (the extra feature may be the label column), so openml__colic__25 is kept in the list.
For GesturePhase, the closest match is openml__GesturePhaseSegmentationProcessed__14969.
For 100-plants-texture, the closest match is openml__one-hundred-plants-texture__9956.
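The PR does not say which fuzzy-matching method was used, so the following is only a sketch of the idea using Python's standard `difflib` as a stand-in. It assumes file names follow the `openml__<name>__<id>` pattern seen above. It groups candidates that share a normalized name (like the two colic entries) and breaks such ties by feature count, allowing Table 4's count to include the label column. The helper names and the 0.5 cutoff are hypothetical choices; the feature counts in the usage check come from the colic discussion above.

```python
import difflib
import re

# Assumed file-name pattern: openml__<dataset name>__<numeric id>.
NAME_RE = re.compile(r"^openml__(?P<name>.+)__(?P<id>\d+)$")


def normalize(name: str) -> str:
    """Lower-case and drop non-alphanumerics: '100-plants-texture' -> '100plantstexture'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def fuzzy_match(table_name: str, candidates: list[str], cutoff: float = 0.5) -> list[str]:
    """Return all candidates whose normalized dataset name is closest to table_name.

    Candidates sharing one normalized name (e.g. two 'colic' entries) are
    returned together so that a later step can break the tie.
    """
    groups: dict[str, list[str]] = {}
    for cand in candidates:
        m = NAME_RE.match(cand)
        if m:
            groups.setdefault(normalize(m.group("name")), []).append(cand)
    best = difflib.get_close_matches(normalize(table_name), list(groups), n=1, cutoff=cutoff)
    return groups[best[0]] if best else []


def pick_by_feature_count(matches: list[str], n_features: dict[str, int], table_count: int) -> str:
    """Keep the match whose feature count is closest to the table's count.

    The table count may or may not include the label column, so both
    n and n + 1 are compared against it.
    """
    def gap(cand: str) -> int:
        n = n_features[cand]
        return min(abs(n - table_count), abs(n + 1 - table_count))

    return min(matches, key=gap)


candidates = [
    "openml__colic__25",
    "openml__colic__27",
    "openml__GesturePhaseSegmentationProcessed__14969",
    "openml__one-hundred-plants-texture__9956",
]
colic = fuzzy_match("colic", candidates)
print(pick_by_feature_count(colic, {"openml__colic__25": 26, "openml__colic__27": 22}, 27))
# prints "openml__colic__25"
```

Normalizing before comparison keeps punctuation differences from lowering the similarity score. Note that a plain string matcher still cannot map "100" to "one-hundred", which is one reason such cases need a manual check like the ones listed above.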


Create openml_hard_id_list.txt, including the 36 hardest datasets in Table 4

322b66f
