Commit 5e8b9b8

Merge branch 'master' of github.com:lispc/EditDistanceClusterer
2 parents 70f7aac + d5050f3 commit 5e8b9b8

1 file changed: 3 additions, 5 deletions

README.md

@@ -1,11 +1,9 @@
-#EditDistanceJoiner
+# EditDistanceJoiner
 
 EditDistanceJoiner is a Java library developed by the [database group](http://dbgroup.cs.tsinghua.edu.cn/) of Tsinghua University. It helps you (a) select similar string pairs and (b) build clusters of similar strings from a large collection of strings, with similarity measured by edit distance, very efficiently.
 
-###How do it works ?
+### How does it work?
 This library is based on a method called [PassJoin](http://dbgroup.cs.tsinghua.edu.cn/dd/projects/passjoin/index.html), proposed at VLDB 2012, which has been shown to be orders of magnitude faster than previous methods. The library can process in 2 minutes a dataset that takes 70 minutes with the naive brute-force implementation used in [simile-vicino](https://code.google.com/p/simile-vicino/). Moreover, unlike simile-vicino, which uses blocking methods to speed up clustering at the cost of accuracy, this library generates accurate results.
 
-###Usage
+### Usage
 This library uses an interface similar to that of simile-vicino. For samples of joining and clustering, see [EditDistanceClustererTest](sample/edu/tsinghua/dbgroup/sample/EditDistanceClustererTest.java) and [EditDistanceJoinerTest](sample/edu/tsinghua/dbgroup/sample/EditDistanceJoinerTest.java).
-
-
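
The "How does it work?" paragraph in the README above contrasts PassJoin with a naive brute-force join. As a rough illustration of that baseline only, here is a minimal sketch that compares every pair of strings with a dynamic-programming edit distance. The class and method names (`BruteForceJoinSketch`, `editDistance`, `join`) are hypothetical and are not taken from EditDistanceJoiner or simile-vicino; this is not the PassJoin algorithm and not the library's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a brute-force edit-distance join.
// Not part of EditDistanceJoiner; names are assumptions for this sketch.
public class BruteForceJoinSketch {

    /** Levenshtein distance via dynamic programming, keeping only two rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns index pairs (i, j) with i < j whose edit distance is at most threshold. */
    static List<int[]> join(List<String> strings, int threshold) {
        List<int[]> result = new ArrayList<>();
        for (int i = 0; i < strings.size(); i++) {
            for (int j = i + 1; j < strings.size(); j++) {
                String a = strings.get(i);
                String b = strings.get(j);
                // Length filter: the distance is at least the length difference.
                if (Math.abs(a.length() - b.length()) > threshold) {
                    continue;
                }
                if (editDistance(a, b) <= threshold) {
                    result.add(new int[] {i, j});
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("kitten", "sitten", "sitting", "mitten");
        for (int[] pair : join(data, 1)) {
            System.out.println(data.get(pair[0]) + " ~ " + data.get(pair[1]));
        }
    }
}
```

The nested loop examines a number of pairs that grows quadratically with the input size, which is the kind of cost the README says PassJoin avoids while still returning exact results.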
