Commit 0097b63

add instructions to readme
1 parent 45511b3 commit 0097b63

1 file changed: +45 -7 lines changed
README.md

@@ -1,18 +1,56 @@
11
# Knowledge Graph Analysis
22
Code accompanying our paper "One Knowledge Graph to Rule them All? Analyzing the Differences between DBpedia, YAGO, Wikidata & co."
33

4-
Quantitative analysis of the following Knowledge Graphs (KG):
5-
* DBpedia
6-
* YAGO
7-
* Wikidata
8-
* NELL
9-
* OpenCyc
4+
Quantitative analysis of the following Knowledge Graphs (KGs):
5+
* DBpedia (D)
6+
* YAGO (Y)
7+
* Wikidata (W)
8+
* NELL (N)
9+
* OpenCyc (O)
1010

11-
Approach:
11+
## Approach:
1212
* Get top 10 classes for each KG
1313
* Calculate the indegree and outdegree of each class
1414
* Get all instances for each class
1515
* Calculate the minimum, average, median, and maximum indegree and outdegree for the instances of each class
1616
* Create a combined list of all top 10 classes and their equivalent classes in the other KGs (e.g. classes linked via owl:sameAs properties)
1717
* Calculate all degree values for the new classes as well
1818
* Calculate the instance overlap of the classes using different string similarity measures
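The degree statistics in the steps above can be illustrated with a minimal Python sketch. It is not the code from the paper: the function name `degree_stats` and the input format (an iterable of `(subject, predicate, object)` tuples) are assumptions made for illustration. It also assumes that every triple counts towards the outdegree of its subject and the indegree of its object, which may differ from the paper's exact definition (e.g. regarding literals).

```python
from collections import defaultdict
from statistics import mean, median


def degree_stats(triples, instances):
    """Summarize indegree and outdegree over a non-empty list of instances.

    `triples` is an iterable of (subject, predicate, object) tuples
    (hypothetical input format, not the repository's file format).
    """
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    for subj, _pred, obj in triples:
        outdegree[subj] += 1  # one outgoing edge for the subject
        indegree[obj] += 1    # one incoming edge for the object

    def summarize(counts):
        # Instances that never occur in a triple have degree 0.
        values = [counts.get(instance, 0) for instance in instances]
        return {
            "min": min(values),
            "avg": mean(values),
            "median": median(values),
            "max": max(values),
        }

    return {"indegree": summarize(indegree), "outdegree": summarize(outdegree)}
```

Calling `degree_stats(triples, instances_of_class)` once per class would yield the four values per direction that the approach lists.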
19+
20+
## Instructions:
21+
1. The scripts in **/LinkedInstances/*.py** create files with all linked instances between two KGs.
22+
* Input:
23+
* KG files containing instances and/or links to other instances.
24+
* Output:
25+
* Files containing the combined links between two KGs (e.g. *DO_sameAs_union.nt* for the links between DBpedia and OpenCyc) that are denoted as **#o1**.
26+
* Move those **#o1** files to the */InstanceOverlap/owlSameAs/* folder.
27+
2. **/GetInstances/src/GetInstances.java** creates files that contain all instances of a class including all English labels.
28+
* Input:
29+
* Array with class names for each KG.
30+
* Full KG or just the files containing the instances and labels.
31+
* Output:
32+
* Text files containing all instances with all English labels for each class in each KG.
33+
* Saved as *<k_className>InstancesWithLabels.txt* where *k* stands for the abbreviation of the KG (e.g. *d_ActorInstancesWithLabels.txt* for the actor instances in DBpedia). All those files are denoted as **#o2**.
34+
* Move these **#o2** files to the */InstanceOverlap/InstanceLabels/* folder.
35+
3. **/InstanceOverlap/src/InstanceOverlapMain.java** executes the following three steps for each class in the className array to calculate the estimated overlap:
36+
1. **CountSameAs.java** creates files with the linked instances of two classes, e.g. by using the *owl:sameAs* property.
37+
* Input:
38+
* Class name.
39+
* **#o1** files with the linked instances in the */InstanceOverlap/owlSameAs/* folder.
40+
* **#o2** files with all English instance labels for the respective class and for each KG in the */InstanceOverlap/InstanceLabels/* folder.
41+
* Output:
42+
* Links between instances for each class1-class2 combination, which are used as the gold standard (there might be multiple classes that describe the same concept in a single KG, e.g. wordnet_actor_109765278 and wordnet_actor_109767197 in the YAGO KG). These files are saved as *<className1_className2>.tsv* in the */InstanceOverlap/owlSameAs/x2y/* folder (e.g. *Actor_wordnet_actor_109765278.tsv* in the *d2y* folder). These files are denoted as **#o3**.
43+
2. **CountStringSimilarity.java** creates files that contain all found links between two classes using the different string similarity measures (e.g. Jaro, Levenshtein) and different thresholds.
44+
* Input:
45+
* Class name.
46+
* **#o2** files.
47+
* Output:
48+
* Links between the instances of two classes that are found using a specific similarity measure and threshold. The results are saved as *<fromK_2_toK_fromClass_toClass_simMeasure_threshold>.tsv* in the */InstanceOverlap/simMeasureResults/* folder (e.g. *d2y_Actor_wordnet_actor_109765278_jaro_1.0.tsv*). These files are denoted as **#o4**.
49+
3. **EstimatedInstanceOverlap.java**
50+
* Input:
51+
* Class name.
52+
* **#o3** files containing the linked instances that are used as the gold standard.
53+
* **#o4** files containing the instances that should be linked according to the respective similarity measure and threshold.
54+
* Output:
55+
* *estimatedOverlap_<className_parameter_timestamp>.csv* files in the */InstanceOverlap/estimatedOverlap/* folder containing instance counts, precision, recall, f-measure, estimatedOverlap, number of links, count of matching alignment, count of partial matching alignment, and true positives for every class1-class2 combination of each class and each KG combination (e.g. *estimatedInstanceOverlap_Actor_wBlockingMax1000000_tokenBk4_2017_02_17_13_35_52.csv*).
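The string-similarity step (**CountStringSimilarity.java**) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the repository's Java implementation: the function names, the in-memory input format (a dict mapping each instance to its English labels), and the normalization of the Levenshtein distance by the longer string's length are all hypothetical choices. The sketch compares all label pairs and does not apply any blocking.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def find_links(labels_a, labels_b, threshold):
    """Link two instances if any pair of their labels reaches the threshold.

    `labels_a` and `labels_b` map instance identifiers to lists of labels
    (an assumed stand-in for the contents of the #o2 files).
    """
    links = set()
    for instance_a, names_a in labels_a.items():
        for instance_b, names_b in labels_b.items():
            if any(similarity(x, y) >= threshold
                   for x in names_a for y in names_b):
                links.add((instance_a, instance_b))
    return links
```

A threshold of 1.0 only links instances that share an identical label, while lower thresholds admit near matches and typically trade precision for recall.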
56+
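The comparison of the links found by a similarity measure (**#o4**) against the gold standard (**#o3**) can be sketched as below. The sketch uses the standard definitions of precision, recall, and F-measure. The function name `evaluate_links` and the representation of links as sets of `(instance1, instance2)` pairs are assumptions for illustration. The formula behind the estimatedOverlap column is not given in this README, so it is deliberately left out.

```python
def evaluate_links(found, gold):
    """Compare found links with gold-standard links.

    Both arguments are iterables of (instance1, instance2) pairs
    (hypothetical in-memory form of the #o4 and #o3 files).
    """
    found, gold = set(found), set(gold)
    true_positives = len(found & gold)
    # Guard against division by zero when either set is empty.
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall > 0:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "true_positives": true_positives,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }
```

Running this once per similarity measure and threshold would show which combination reproduces the gold-standard links most closely.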
