Hi! IMO it makes the most sense to have one row per subject and, therefore, define the features as follows:

```python
from datasets import Array2D, Array3D, ClassLabel, Features, Sequence, Value

features = Features(
    {
        "subject_name": Value("string"),
        "rest": Array2D(...),
        # 3D to represent a list of 2D arrays; Sequence(Array2D(...)) is not supported at the moment
        "motor_imagery_arrays": Array3D(shape=(None, ...), ...),
        "motor_imagery_labels": Sequence(ClassLabel(...)),
    }
)
```

@lhoestq This is one instance where support for |
-
Hello everybody,
I'm trying to create a hierarchical dataset and I'd like to hear some opinions on how to do it best. Here's a tree structure diagram that shows what it looks like:
The dataset comprises a collection of EEG recordings from over fifty different people (subjects). Each subject in the dataset has two types of recordings: resting state recordings and motor imagery recordings. The resting state recordings are just a large 2D array, whereas the motor imagery recordings comprise many labeled EEG segments.
What I've considered
At first, I thought about segmenting the dataset using dataset configurations. However, I'd need to create a configuration for each `(subject, recording type)` pair to do that. This would result in more than a hundred different configurations, each of which would have to be encoded by some name (string), which seems even more inappropriate. I also considered creating a nested `DatasetDict` somehow, but haven't managed to find a reasonable way to do that.
Question
How should I go about creating a dataset like this with Hugging Face `datasets`?
[...] `Dataset`?
Should I use `rest` and `motor-imagery` as dataset configs, flatten the `subject` dimension, and add `subject_name` as a feature of each recording?
I'd like to emphasize that, during the training loop, I'll need to fetch the corresponding resting state recordings for any given motor imagery recording I access. In other words, whenever I fetch some motor imagery recording from `subject_03`, I'll need to get `subject_03`'s resting state recordings.
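The access pattern described above can be sketched as a flat index over segments that always returns the owning subject's rest recording alongside each segment. This is only an illustration: `SegmentIndex` is a hypothetical helper, not part of `datasets`, and it assumes a one-row-per-subject table with the column names `subject_name`, `rest`, `motor_imagery_arrays`, and `motor_imagery_labels`. A plain list of dicts stands in for the dataset so the sketch runs without extra dependencies.

```python
from bisect import bisect_right
from itertools import accumulate


class SegmentIndex:
    """Flat view over all motor imagery segments of a one-row-per-subject table."""

    def __init__(self, rows):
        self.rows = rows
        # Cumulative segment counts, e.g. [2, 5] for subjects with 2 and 3 segments.
        counts = [len(row["motor_imagery_labels"]) for row in rows]
        self.ends = list(accumulate(counts))

    def __len__(self):
        return self.ends[-1] if self.ends else 0

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Find the subject whose segment range contains flat index i.
        subject_idx = bisect_right(self.ends, i)
        start = self.ends[subject_idx - 1] if subject_idx > 0 else 0
        row = self.rows[subject_idx]
        j = i - start
        return {
            "subject_name": row["subject_name"],
            "segment": row["motor_imagery_arrays"][j],
            "label": row["motor_imagery_labels"][j],
            # The rest recording always comes from the same subject as the segment.
            "rest": row["rest"],
        }


# Toy rows with tiny nested lists in place of real EEG arrays.
rows = [
    {
        "subject_name": "subject_01",
        "rest": [[0.0, 0.1]],
        "motor_imagery_arrays": [[[1.0]], [[2.0]]],
        "motor_imagery_labels": [0, 1],
    },
    {
        "subject_name": "subject_02",
        "rest": [[0.5, 0.6]],
        "motor_imagery_arrays": [[[3.0]], [[4.0]], [[5.0]]],
        "motor_imagery_labels": [1, 1, 0],
    },
]

index = SegmentIndex(rows)
sample = index[3]  # second segment of subject_02, paired with subject_02's rest data
```

The same idea should carry over to any indexable collection of row dicts. With a large on-disk dataset, it would probably be cheaper to compute the segment counts from the labels column alone rather than loading every full row.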