Currently, when creating an `OBNBDataset` object, the `splitter` lets the user specify how genes are held out, using different partitions to create training/validation/testing splits. We would also like to be able to hold out all labels associated with a term, in order to evaluate zero-shot learning.
For now, we are envisioning the two splitting strategies below:
The first strategy splits on terms only: terms are used to create the training, validation, and testing splits. The first feature we would like is the ability to specify a list of term IDs for each split.
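A rough sketch of what a user-specified term split could look like, assuming labels are stored as columns indexed by term ID. The helper name `split_by_term_ids` is hypothetical and not part of `obnb`; the term IDs below are placeholders.

```python
from typing import Dict, List, Sequence


def split_by_term_ids(
    all_term_ids: Sequence[str],
    split_term_ids: Dict[str, Sequence[str]],
) -> Dict[str, List[int]]:
    """Map each split name to the label-column indices of its user-specified terms.

    Hypothetical helper: raises if a term ID is unknown or is assigned to
    more than one split, so that no term leaks between splits.
    """
    index = {tid: i for i, tid in enumerate(all_term_ids)}
    seen: Dict[str, str] = {}  # term ID -> split it was first assigned to
    result: Dict[str, List[int]] = {}
    for split, term_ids in split_term_ids.items():
        cols = []
        for tid in term_ids:
            if tid not in index:
                raise KeyError(f"Unknown term ID: {tid}")
            if tid in seen:
                raise ValueError(f"Term {tid} assigned to both {seen[tid]!r} and {split!r}")
            seen[tid] = split
            cols.append(index[tid])
        result[split] = cols
    return result


terms = ["T1", "T2", "T3", "T4"]
print(split_by_term_ids(terms, {"train": ["T1", "T3"], "val": ["T2"], "test": ["T4"]}))
# {'train': [0, 2], 'val': [1], 'test': [3]}
```

Rejecting overlapping assignments up front seems safer than silently picking one split, since any overlap would defeat the zero-shot evaluation.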
The second strategy combines terms and genes: first specify the lists of term IDs for the training and testing splits, then use the existing gene partitioning methods (`RandomRatioHoldout` or `RatioHoldout`) to determine which genes are used for training and validation.
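A minimal sketch of the second strategy, assuming test terms are held out entirely and (as an assumption not stated above) may be scored on all genes. The function `term_then_gene_split` is hypothetical; the random gene split only loosely mirrors what `RandomRatioHoldout` does and is not a call into `obnb`.

```python
import random
from typing import Dict, List, Sequence, Tuple


def term_then_gene_split(
    genes: Sequence[str],
    train_terms: Sequence[str],
    test_terms: Sequence[str],
    val_ratio: float = 0.2,
    seed: int = 0,
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Return {split: (term_ids, gene_ids)} for a term-then-gene split.

    - train/val: both use the training terms; genes are randomly partitioned
      between them according to val_ratio (stand-in for a gene splitter).
    - test: uses only the held-out test terms (zero-shot), over all genes.
    """
    if set(train_terms) & set(test_terms):
        raise ValueError("train_terms and test_terms must be disjoint")
    shuffled = list(genes)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_val = round(len(shuffled) * val_ratio)
    return {
        "train": (list(train_terms), sorted(shuffled[n_val:])),
        "val": (list(train_terms), sorted(shuffled[:n_val])),
        "test": (list(test_terms), sorted(genes)),
    }


genes = [f"g{i}" for i in range(10)]
splits = term_then_gene_split(genes, ["T1", "T2"], ["T3"], val_ratio=0.2)
print({k: (terms, len(gs)) for k, (terms, gs) in splits.items()})
```

In a real implementation the random partition would presumably be replaced by whichever existing gene splitter the user configures.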
Eventually, we would also like something similar to `RandomRatioHoldout` for partitioning terms, and ultimately something like `RatioHoldout` that partitions terms based on different term properties (similar to how `PubMedCount` is used to partition genes according to study bias).
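A sketch of the two term-level holdouts described above, assuming ratios are given as (train, val, test) fractions. Both function names are hypothetical stand-ins; the property-based version uses gene set size purely as an example of a term property.

```python
import random
from typing import Dict, List, Mapping, Sequence

SPLITS = ("train", "val", "test")


def _cut(ordered: List[str], ratios: Sequence[float]) -> List[List[str]]:
    """Cut an ordered list into consecutive chunks proportional to ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    chunks, start = [], 0
    for i, r in enumerate(ratios):
        # The last chunk takes whatever remains, so no term is lost to rounding.
        end = len(ordered) if i == len(ratios) - 1 else start + round(len(ordered) * r)
        chunks.append(ordered[start:end])
        start = end
    return chunks


def random_ratio_term_holdout(
    term_ids: Sequence[str], ratios: Sequence[float] = (0.6, 0.2, 0.2), seed: int = 0
) -> Dict[str, List[str]]:
    """Randomly shuffle terms (seeded) and cut them into train/val/test."""
    ordered = list(term_ids)
    random.Random(seed).shuffle(ordered)
    return dict(zip(SPLITS, _cut(ordered, ratios)))


def ratio_term_holdout(
    term_property: Mapping[str, float],
    ratios: Sequence[float] = (0.6, 0.2, 0.2),
    ascending: bool = False,
) -> Dict[str, List[str]]:
    """Sort terms by a property and cut into train/val/test.

    With ascending=False, terms with the largest values go to training and the
    smallest to testing, analogous to holding out less-studied genes.
    """
    ordered = sorted(term_property, key=term_property.__getitem__, reverse=not ascending)
    return dict(zip(SPLITS, _cut(ordered, ratios)))


sizes = {f"T{i}": i for i in range(10)}  # e.g. gene set size per term
print(ratio_term_holdout(sizes))
```

Sharing one `_cut` helper keeps the random and property-based variants consistent; they differ only in how the terms are ordered before cutting.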
My priority would be to get both strategies working with a user-specified list of term IDs for each split before moving on to a `RandomRatioHoldout` for terms.
I forgot to include this in the original post; if you feel it is different enough to warrant a separate issue, please let me know and I'll create one. It is related to evaluating on unseen terms.
It would be good to be able to add terms to the test set that are not part of the training set. For example, the term filtering process first finds a set of non-redundant terms, then removes terms whose gene set size is above or below a threshold. It would be nice to be able to add the small gene sets that fall below the threshold, and therefore do not have enough examples to train on, back into the test set so we can still evaluate on them.
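A rough sketch of keeping small terms as test-only instead of discarding them, assuming the input is already the non-redundant term set and each term maps to its gene set. The function `filter_terms_keep_small_for_test` is hypothetical; terms above the maximum size are still dropped, as in the current filtering.

```python
from typing import Dict, List, Mapping, Set


def filter_terms_keep_small_for_test(
    term_genes: Mapping[str, Set[str]],
    min_size: int,
    max_size: int,
) -> Dict[str, List[str]]:
    """Partition terms by gene set size.

    - "kept": min_size <= size <= max_size, eligible for any split
    - "test_only": size < min_size, too small to train on but still evaluated
    - "dropped": size > max_size, removed as before
    """
    out: Dict[str, List[str]] = {"kept": [], "test_only": [], "dropped": []}
    for tid, genes in term_genes.items():
        n = len(genes)
        if n < min_size:
            out["test_only"].append(tid)
        elif n > max_size:
            out["dropped"].append(tid)
        else:
            out["kept"].append(tid)
    return out


term_genes = {"A": {"g1"}, "B": {"g1", "g2", "g3"}, "C": {f"g{i}" for i in range(10)}}
pool = filter_terms_keep_small_for_test(term_genes, min_size=2, max_size=5)
# Only "kept" terms would be split further; "test_only" terms are appended to
# the test split afterwards so they never appear in training.
test_terms = ["B"] + pool["test_only"]  # assuming "B" landed in the test split
print(pool, test_terms)
```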