diff --git a/README.md b/README.md
index 3de46f1..2c3cdfd 100644
--- a/README.md
+++ b/README.md
@@ -1,2 +1,13 @@
-Persian news corpus contains more than 120 million sentences from tnews.
-you can download corpus from [here](https://sbuacir-my.sharepoint.com/personal/se_mahmoudi_sbu_ac_ir/Documents/Forms/All.aspx?slrid=5cbcb09e%2D9091%2D7000%2Db143%2D92a4031b9417&RootFolder=%2Fpersonal%2Fse%5Fmahmoudi%5Fsbu%5Fac%5Fir%2FDocuments%2Fsbunlp&FolderCTID=0x01200065B78F960C7F3B4E9E0BBD567D049028)
\ No newline at end of file
+# embedding-benchmark
+Word embedding benchmark project by the Shahid Beheshti University NLP Lab
+
+Please read [Our Wiki Page](https://github.com/sehsanm/embedding-benchmark/wiki) for more information
+
+Folder structure:
+* data/corpus This must be empty, as the code will download the corpus here from an external repository.
+* data/analogy Contains the analogy dataset(s)
+* data/wordsim Contains the word similarity dataset(s)
+* data/categories Contains the categories dataset(s)
+* code This folder contains the code used to run all evaluation-related tasks and the utilities that download the corpus files
+* scripts This folder contains cleansing/crawling scripts and any other one-off activity that needs to be done.
+
diff --git a/data/corpus/README.md b/data/corpus/README.md
index 9f5efd1..5411660 100644
--- a/data/corpus/README.md
+++ b/data/corpus/README.md
@@ -16,3 +16,8 @@ You can download the corpus using this [LINK](https://sbuacir-my.sharepoint.com/
 irBlogs is a standard Persian weblogs collection that is suitable for studying Persian social networks and evaluation of graph mining and blog retrieval algorithms.
 
 You can find the collection [here](http://dbrg.ut.ac.ir/irblogs/)
+
+## Persian News Corpus
+Persian News Corpus contains more than 120 million sentences from tnews.
+
+You can download corpus from [here](https://sbuacir-my.sharepoint.com/personal/se_mahmoudi_sbu_ac_ir/Documents/Forms/All.aspx?slrid=5cbcb09e%2D9091%2D7000%2Db143%2D92a4031b9417&RootFolder=%2Fpersonal%2Fse%5Fmahmoudi%5Fsbu%5Fac%5Fir%2FDocuments%2Fsbunlp&FolderCTID=0x01200065B78F960C7F3B4E9E0BBD567D049028)