
Commit 564ad2a

Merge pull request #164 from dakshayahuja/master
added a duplicate finder script
2 parents ea6f5af + 8dccdaa commit 564ad2a

3 files changed: +106 additions, -1 deletion

Duplicate Finder/Readme.md

Lines changed: 33 additions & 0 deletions
@@ -0,0 +1,33 @@
# Duplicate Finder Script

This script scans a directory for duplicate files by comparing their MD5 hashes. It can then delete the duplicates or move them to another directory.

## Features

- Scan a directory recursively for duplicate files.
- Filter files by minimum size.
- Display a list of duplicate files.
- Delete the duplicates or move them to another directory.

## Usage

1. Run the script (`python duplicate-finder.py`).
2. Enter the directory you want to scan for duplicates.
3. Enter the minimum file size to consider, in bytes. Press Enter to accept the default of 0, which includes every file.
4. The script lists any duplicate files it finds. If there are none, it prints "No duplicates found." and exits.
5. Choose an action:
   - `(D)elete`: deletes all but one file from each set of duplicates.
   - `(M)ove`: moves all but one file from each set of duplicates to another directory.
   - `(N)o action`: exits without making any changes.
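The keep-one rule works like this: each duplicate set is an ordered list of paths. The first entry is kept, and the chosen action applies to the rest. The minimal sketch below illustrates the rule. It uses in-memory `(name, content)` pairs instead of real files, and `group_by_hash` is a hypothetical helper, not the script's own function.

```python
import hashlib


def group_by_hash(files):
    """Group (name, content) pairs by MD5 digest, keeping encounter order."""
    groups = {}
    for name, content in files:
        digest = hashlib.md5(content).hexdigest()
        groups.setdefault(digest, []).append(name)
    # Only digests shared by two or more files form a duplicate set
    return {digest: names for digest, names in groups.items() if len(names) > 1}


files = [("a.txt", b"hello"), ("b.txt", b"world"),
         ("c.txt", b"hello"), ("d.txt", b"hello")]

plan = []
for names in group_by_hash(files).values():
    keep, extras = names[0], names[1:]  # first file kept, the rest acted on
    plan.append((keep, extras))

print(plan)  # [('a.txt', ['c.txt', 'd.txt'])]
```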

## Notes

- When you choose the delete option, the script keeps the first file it encountered in each set and deletes the rest.
- When you choose the move option, the script keeps the first file it encountered in each set and moves the rest to the specified directory. It creates that directory if it doesn't exist.
- The script identifies duplicates by MD5 hash. MD5 is fast but no longer cryptographically secure. Two different files are extremely unlikely to share a hash by accident, but files with matching hashes can be crafted deliberately. A hash match is therefore strong, but not absolute, evidence that two files are identical.
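A hash match is not proof that two files are identical, so you may want to check a group byte for byte before deleting or moving anything. The minimal sketch below does this with the standard-library `filecmp` module. `confirm_duplicates` is a hypothetical helper and is not part of this script. The demo writes throwaway files to a temporary directory.

```python
import filecmp
import os
import tempfile


def confirm_duplicates(paths):
    """Split a hash-matched group into files byte-identical to paths[0] and the rest.

    filecmp.cmp(..., shallow=False) compares file contents, not just os.stat() data.
    """
    keeper = paths[0]
    identical, mismatched = [], []
    for path in paths[1:]:
        if filecmp.cmp(keeper, path, shallow=False):
            identical.append(path)
        else:
            mismatched.append(path)
    return keeper, identical, mismatched


with tempfile.TemporaryDirectory() as root:
    paths = []
    for name, data in [("a.bin", b"xyz"), ("b.bin", b"xyz"), ("c.bin", b"xyq")]:
        path = os.path.join(root, name)
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    keeper, identical, mismatched = confirm_duplicates(paths)
    result = (os.path.basename(keeper),
              [os.path.basename(p) for p in identical],
              [os.path.basename(p) for p in mismatched])

print(result)  # ('a.bin', ['b.bin'], ['c.bin'])
```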

## Disclaimer

Always back up your data before using scripts that modify files. The author is not responsible for any data loss.

Duplicate Finder/duplicate-finder.py

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
import os
import hashlib
import shutil


def get_file_hash(filepath, chunk_size=65536):
    """Return the MD5 hash of a file, reading it in chunks to limit memory use."""
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(directory, min_size=0):
    """Find duplicate files in a directory.

    Returns a dict mapping each MD5 hash to the list of paths sharing it,
    in the order the files were encountered (the first file is at index 0).
    """
    hashes = {}
    duplicates = {}

    for dirpath, dirnames, filenames in os.walk(directory):
        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            try:
                if os.path.getsize(filepath) < min_size:
                    continue
                file_hash = get_file_hash(filepath)
            except OSError:
                # Skip unreadable files, broken symlinks, etc.
                continue
            if file_hash in hashes:
                # Seed the list with the original file so it stays at index 0
                duplicates.setdefault(file_hash, [hashes[file_hash]]).append(filepath)
            else:
                hashes[file_hash] = filepath

    return duplicates


def unique_target_path(target_dir, filename):
    """Return a path in target_dir that does not overwrite an existing file."""
    base, ext = os.path.splitext(filename)
    candidate = os.path.join(target_dir, filename)
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(target_dir, f"{base}_{counter}{ext}")
        counter += 1
    return candidate


def main():
    directory = input("Enter the directory to scan for duplicates: ")
    if not os.path.isdir(directory):
        print(f"{directory} is not a directory.")
        return
    min_size = int(input("Enter the minimum file size to consider (in bytes, default is 0): ") or "0")

    duplicates = find_duplicates(directory, min_size)

    if not duplicates:
        print("No duplicates found.")
        return

    print("\nDuplicates found:")
    for paths in duplicates.values():
        for path in paths:
            print(path)
        print("------")

    action = input("\nChoose an action: (D)elete, (M)ove, (N)o action: ").strip().lower()

    if action == "d":
        for paths in duplicates.values():
            for path in paths[1:]:  # Keep the first file, delete the rest
                os.remove(path)
                print(f"Deleted {path}")

    elif action == "m":
        target_dir = input("Enter the directory to move duplicates to: ")
        os.makedirs(target_dir, exist_ok=True)

        for paths in duplicates.values():
            for path in paths[1:]:  # Keep the first file, move the rest
                # Duplicates from different folders may share a name; don't overwrite
                target_path = unique_target_path(target_dir, os.path.basename(path))
                # shutil.move also works across filesystems, unlike os.rename
                shutil.move(path, target_path)
                print(f"Moved {path} to {target_path}")

    else:
        print("No action taken.")


if __name__ == "__main__":
    main()
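The script hashes every file that passes the `min_size` filter. A common refinement is to group files by size first. Files of different sizes cannot be identical, so only files that share a size need to be hashed. The sketch below shows this two-pass approach. It is an assumption, not the author's code. `find_duplicates_size_first` and `md5_of` are hypothetical names, and the demo writes throwaway files to a temporary directory.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def md5_of(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in fixed-size chunks."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates_size_first(directory, min_size=0):
    """Group files by size first, then hash only files whose size is shared."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable entry or broken symlink
            if size >= min_size:
                by_size[size].append(path)

    by_hash = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            try:
                by_hash[md5_of(path)].append(path)
            except OSError:
                continue
    return {digest: paths for digest, paths in by_hash.items() if len(paths) > 1}


# Demo: a.txt and b.txt match; c.txt has the same size but different bytes;
# d.txt has a unique size, so it is never hashed.
with tempfile.TemporaryDirectory() as root:
    for name, data in [("a.txt", b"same"), ("b.txt", b"same"),
                       ("c.txt", b"diff"), ("d.txt", b"longer")]:
        with open(os.path.join(root, name), "wb") as f:
            f.write(data)
    groups = find_duplicates_size_first(root)
    demo = [sorted(os.path.basename(p) for p in paths) for paths in groups.values()]

print(demo)  # [['a.txt', 'b.txt']]
```

Hashing dominates the running time on large trees, so skipping files with a unique size avoids reading most of them.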

README.md

Lines changed: 2 additions & 1 deletion
@@ -42,7 +42,8 @@ More information on contributing and the general code of conduct for discussion
 | Crop Images | [Crop Images](https://github.com/DhanushNehru/Python-Scripts/tree/master/Crop_Images) | A Python script to crop a given image. |
 | CSV to Excel | [CSV to Excel](https://github.com/DhanushNehru/Python-Scripts/tree/master/CSVToExcel) | A Python script to convert a CSV to an Excel file. |
 | Currency Script | [Currency Script](https://github.com/DhanushNehru/Python-Scripts/tree/master/currency_script) | A Python script to convert the currency of one country to that of another. |
-| Digital Clock | [Digital Clock](https://github.com/DhanushNehru/Python-Scripts/tree/master/Digital%20Clock) | A Python script to preview a digital clock in the terminal. |
+| Digital Clock | [Digital Clock](https://github.com/DhanushNehru/Python-Scripts/tree/master/Digital%20Clock) | A Python script to preview a digital clock in the terminal.
+| Duplicate Finder | [Duplicate Finder](https://github.com/DhanushNehru/Python-Scripts/tree/master/Duplicate%20Finder) | A Python script to find duplicate files by MD5 hash and delete or move them. |
 | Display Popup Window | [Display Popup Window](https://github.com/DhanushNehru/Python-Scripts/tree/master/Display%20Popup%20Window) | A Python script to preview a GUI interface to user. |
 | Face Reaction | [Face Reaction](https://github.com/DhanushNehru/Python-Scripts/tree/master/Face%20Reaction) | A script which attempts to detect facial expressions. |
 | Fake Profiles | [Fake Profiles](https://github.com/DhanushNehru/Python-Scripts/tree/master/Fake%20Profile) | Create fake profiles. |
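The table's links percent-encode spaces in folder names as `%20` (for example `Digital%20Clock`). If you generate links rather than type them, `urllib.parse.quote` produces that encoding. In this sketch, `folder_url` and `BASE` are illustrative names, and the base URL is copied from the existing rows.

```python
from urllib.parse import quote

# Base URL taken from the existing table rows
BASE = "https://github.com/DhanushNehru/Python-Scripts/tree/master/"


def folder_url(folder_name):
    """Build a tree URL for a repository folder, percent-encoding spaces as %20."""
    return BASE + quote(folder_name)


print(folder_url("Duplicate Finder"))
# https://github.com/DhanushNehru/Python-Scripts/tree/master/Duplicate%20Finder
```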
