| 1 | +---------------------------------------------- |
| 2 | +-- WORDNET TENSOR DATA -- A. Bordes -- 2013 -- |
| 3 | +---------------------------------------------- |
| 4 | + |
| 5 | +------------------ |
| 6 | +OUTLINE: |
| 7 | +1. Introduction |
| 8 | +2. Content |
| 9 | +3. Data Format |
| 10 | +4. Data Statistics |
| 11 | +5. How to Cite |
| 12 | +6. License |
| 13 | +7. Contact |
| 14 | +------------------- |
| 15 | + |
| 16 | + |
| 17 | +1. INTRODUCTION: |
| 18 | + |
| 19 | +This WORDNET TENSOR DATA consists of a collection of triplets (synset, relation_type,
| 20 | +synset) extracted from WordNet 3.0 (http://wordnet.princeton.edu). This data set can
| 21 | +be seen as a 3-mode tensor depicting ternary relationships between synsets. |
| 22 | + |
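To make the tensor view concrete, here is a minimal Python sketch, not part of the original data release. It maps string triplets to integer coordinates (i, k, j) of the non-zero entries of a sparse 3-mode tensor. The function name `build_sparse_tensor` and the toy ids in the usage note are hypothetical, and a sparse coordinate list is an assumed representation chosen because a dense tensor over all synsets would be very large.

```python
from typing import Dict, Iterable, List, Tuple


def build_sparse_tensor(
    triplets: Iterable[Tuple[str, str, str]],
) -> Tuple[List[Tuple[int, int, int]], Dict[str, int], Dict[str, int]]:
    """Turn (lhs synset, relation, rhs synset) triplets into tensor coordinates.

    Returns the list of (i, k, j) coordinates whose tensor entry is 1,
    plus the synset -> index and relation -> index mappings.
    """
    synset_index: Dict[str, int] = {}
    relation_index: Dict[str, int] = {}
    coords: List[Tuple[int, int, int]] = []
    for lhs, rel, rhs in triplets:
        # setdefault assigns the next free index the first time a key is seen.
        i = synset_index.setdefault(lhs, len(synset_index))
        k = relation_index.setdefault(rel, len(relation_index))
        j = synset_index.setdefault(rhs, len(synset_index))
        coords.append((i, k, j))
    return coords, synset_index, relation_index
```

For example, calling `build_sparse_tensor([("00000001", "_rel_a", "00000002")])` on a made-up triplet yields one coordinate and two index dictionaries; every entry of the tensor that is not listed is implicitly zero.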
| 23 | + |
| 24 | +2. CONTENT: |
| 25 | + |
| 26 | +The data archive contains 6 files: |
| 27 | + - README 3K |
| 28 | + - wordnet-mlj12-definitions.txt 4.2M
| 29 | + - wordnet-mlj12-train.txt 4.5M
| 30 | + - wordnet-mlj12-valid.txt 165K |
| 31 | + - wordnet-mlj12-test.txt 165K |
| 32 | + |
| 33 | +The 3 files wordnet-mlj12-*.txt contain the triplets (training, validation |
| 34 | +and test sets), while the file wordnet-mlj12-definitions.txt lists the WordNet
| 35 | +synset definitions.
| 36 | + |
| 37 | + |
| 38 | +3. DATA FORMAT |
| 39 | + |
| 40 | +The definitions file (wordnet-mlj12-definitions.txt) contains one synset
| 41 | +per line with the following format: synset_id (an 8-digit unique identifier),
| 42 | +intelligible name (word+POS_tag+sense_index), definition. These 3 pieces
| 43 | +of information are separated by tabs ('\t').
| 44 | + |
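The definitions format described above could be read with a short Python sketch like the following. It is an illustration only: the names `Synset` and `parse_definitions` are hypothetical, and it assumes the definition text itself contains no tab (the split is limited to the first two tabs as a precaution). Ids are kept as strings so that leading zeros in the 8-digit identifier are preserved.

```python
from typing import Dict, Iterable, NamedTuple


class Synset(NamedTuple):
    name: str        # intelligible name: word+POS_tag+sense_index
    definition: str  # free-text WordNet definition


def parse_definitions(lines: Iterable[str]) -> Dict[str, Synset]:
    """Map each synset_id (kept as a string) to its name and definition."""
    synsets: Dict[str, Synset] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        # Split on the first two tabs only, so the definition stays intact.
        synset_id, name, definition = line.split("\t", 2)
        synsets[synset_id] = Synset(name, definition)
    return synsets
```

Since the function accepts any iterable of lines, it can be given an open file object for wordnet-mlj12-definitions.txt directly; the file encoding is not stated in this README and would have to be checked.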
| 45 | +The 3 triplet files (wordnet-mlj12-train.txt, -valid.txt and -test.txt) contain
| 46 | +one triplet per line, as 3 tab-separated fields: 2 synset_ids and a relation type
| 47 | +name. The first field is the synset_id of the left-hand side of the triplet, the
| 48 | +second is the name of the relation type that links the two synsets, and the
| 49 | +third is the synset_id of the right-hand side.
| 50 | + |
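As a rough illustration of the triplet format, the Python sketch below parses tab-separated lines into (left-hand side, relation, right-hand side) records. The names `Triplet` and `parse_triplets` are hypothetical, and raising an error on malformed lines is a design choice of this sketch rather than something the data release prescribes.

```python
from typing import Iterable, List, NamedTuple


class Triplet(NamedTuple):
    lhs: str       # synset_id of the left-hand side
    relation: str  # name of the relation type
    rhs: str       # synset_id of the right-hand side


def parse_triplets(lines: Iterable[str]) -> List[Triplet]:
    """Parse lines of the form 'lhs<TAB>relation<TAB>rhs'."""
    triplets: List[Triplet] = []
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 tab-separated fields, got {len(fields)}"
            )
        triplets.append(Triplet(*fields))
    return triplets
```

The same function can be applied to each of the training, validation and test files by passing the open file object as `lines`.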
| 51 | + |
| 52 | +4. DATA STATISTICS |
| 53 | + |
| 54 | +There are 40,943 synsets and 18 relation types among them. The training set contains |
| 55 | +141,442 triplets, the validation set 5,000 and the test set 5,000. |
| 56 | + |
| 57 | +All triplets are unique, and every synset appearing in the validation or
| 58 | +test sets also occurs in the training set.
| 59 | + |
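The properties stated above can be checked after loading the data. The following Python sketch is one possible way to do it, assuming the three splits are already available as lists of (lhs, relation, rhs) string tuples; the function names `synsets_of` and `check_splits` are hypothetical. It also assumes the synset count refers to synsets that appear in the triplet files.

```python
from typing import Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (lhs synset_id, relation name, rhs synset_id)


def synsets_of(triplets: List[Triplet]) -> Set[str]:
    """Collect every synset_id used on either side of a triplet."""
    return {s for lhs, _, rhs in triplets for s in (lhs, rhs)}


def check_splits(
    train: List[Triplet], valid: List[Triplet], test: List[Triplet]
) -> Dict[str, object]:
    """Compute the counts and consistency properties of the three splits."""
    all_triplets = train + valid + test
    return {
        "n_synsets": len(synsets_of(all_triplets)),
        "n_relations": len({rel for _, rel, _ in all_triplets}),
        "sizes": (len(train), len(valid), len(test)),
        # True when no triplet is repeated within or across the splits.
        "all_unique": len(set(all_triplets)) == len(all_triplets),
        # Synsets used in validation/test but never seen in training.
        "unseen_synsets": (synsets_of(valid) | synsets_of(test)) - synsets_of(train),
    }
```

On the released files, one would expect `all_unique` to be true and `unseen_synsets` to be empty if the statements above hold, with the counts matching the figures given in this section.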
| 60 | +5. HOW TO CITE |
| 61 | + |
| 62 | +When using this data, one should cite the original paper: |
| 63 | + @article{bordes-mlj13, |
| 64 | + title = {A Semantic Matching Energy Function for Learning with Multi-relational Data}, |
| 65 | + author = {Antoine Bordes and Xavier Glorot and Jason Weston and Yoshua Bengio}, |
| 66 | + journal={Machine Learning}, |
| 67 | + publisher={Springer}, |
| 68 | + year={2013}, |
| 69 | + note={to appear} |
| 70 | + } |
| 71 | + |
| 72 | +One should also point at the project page with either the long URL: |
| 73 | +https://www.hds.utc.fr/everest/doku.php?id=en:smemlj12 , or the short |
| 74 | +one: http://goo.gl/bHWsK . |
| 75 | + |
| 76 | +6. LICENSE: |
| 77 | + |
| 78 | +WordNet data follows the attached license agreement.
| 79 | + |
| 80 | +7. CONTACT |
| 81 | + |
| 82 | +For all remarks or questions please contact Antoine Bordes: antoine |
| 83 | +(dot) bordes (at) utc (dot) fr . |
| 84 | + |
| 85 | + |
| 86 | + |