Create openml_hard_id_list.txt to include 36 hardest datasets in Table 4 #104
To address issue #103, the full list of hardest dataset IDs is included in a text file.
The IDs were obtained by fuzzy matching against the dataset names in Table 4. The datasets that did not match exactly are:

- For colic, there are two datasets (`openml__colic__25` and `openml__colic__27`). After checking the metadata, `openml__colic__25` has 26 features, while `openml__colic__27` has only 22 features. The number of features in Table 4 is 27, which aligns more closely with `openml__colic__25` (possibly because Table 4 counts the label column), so `openml__colic__25` is kept in the list.
- For GesturePhase, the closest match is `openml__GesturePhaseSegmentationProcessed__14969`.
- For 100-plants-texture, the closest match is `openml__one-hundred-plants-texture__9956`.
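
For reference, the kind of fuzzy matching described above could be sketched roughly as follows. This is only an illustration, not necessarily the procedure used for this PR: it assumes directory names follow the `openml__<name>__<id>` pattern seen in the examples, uses Python's standard `difflib` for similarity, and the helper names (`dataset_name`, `closest_matches`) and the `cutoff` value are hypothetical.

```python
import difflib


def dataset_name(dir_name):
    # Assumes the pattern "openml__<name>__<id>", e.g. "openml__colic__25" -> "colic".
    return dir_name.split("__")[1]


def closest_matches(table_name, dir_names, n=3, cutoff=0.4):
    """Return directory names whose dataset name is similar to table_name.

    Hypothetical helper: the cutoff of 0.4 is an arbitrary choice for this sketch.
    """
    # Group directories by lowercased dataset name, since one name
    # (such as "colic") can map to several directories.
    by_name = {}
    for d in dir_names:
        by_name.setdefault(dataset_name(d).lower(), []).append(d)

    hits = difflib.get_close_matches(
        table_name.lower(), list(by_name), n=n, cutoff=cutoff
    )
    return [d for h in hits for d in by_name[h]]


# Directory names taken from the cases discussed above.
dirs = [
    "openml__colic__25",
    "openml__colic__27",
    "openml__GesturePhaseSegmentationProcessed__14969",
    "openml__one-hundred-plants-texture__9956",
]

for name in ["colic", "GesturePhase", "100-plants-texture"]:
    print(name, "->", closest_matches(name, dirs))
```

Ambiguous results, such as the two colic candidates, still need a manual check against the metadata (for example, the feature count), as was done above.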