Developed a program that groups text files by the similarity of their content, as part of the coursework for a Pattern Recognition and Data Mining course.
Sample execution:
//input
f1.txt f2.txt f3.txt
f4.txt f5.txt f6.txt
f7.txt f8.txt f9.txt
f10.txt f11 f12.dat
t20.dat t21.dat t29.dat
//output
Group 1: f1.txt, f10.txt, f11
Group 2: f2.txt, f12.dat
Group 3: f3.txt
Group 4: f4.txt
Group 5: f5.txt
Group 6: f6.txt
Group 7: f7.txt
Group 8: f8.txt
Group 9: f9.txt
Group 10: t20.dat
Group 11: t21.dat
Group 12: t29.dat
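
The description above does not say which similarity measure or grouping algorithm the program uses, so the following is only a minimal sketch of one possible approach, not the original implementation. It assumes Jaccard similarity over word sets and a greedy single-pass grouping. The function names (tokens, jaccard, group_by_similarity, group_files) and the 0.5 threshold are hypothetical choices made for illustration.

```python
import re
from pathlib import Path


def tokens(text):
    """Return the set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_by_similarity(docs, threshold=0.5):
    """Greedy single-pass grouping (an assumed strategy, not the original one).

    docs maps a name to its text content. Each document joins the first
    group whose first member is at least `threshold` similar to it;
    otherwise it starts a new group.
    """
    groups = []  # list of (token set of first member, [member names])
    for name, text in docs.items():
        toks = tokens(text)
        for representative, members in groups:
            if jaccard(representative, toks) >= threshold:
                members.append(name)
                break
        else:
            groups.append((toks, [name]))
    return [members for _, members in groups]


def group_files(paths, threshold=0.5):
    """Read each file as text and group the files by content similarity."""
    docs = {p: Path(p).read_text(errors="ignore") for p in paths}
    return group_by_similarity(docs, threshold)
```

Calling group_files with a list of file names returns a list of groups, which can then be printed in the "Group N: ..." form shown in the sample output. The result of this sketch depends on the order of the input files and on the threshold, so it will not necessarily reproduce the grouping shown above.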