Deduplidog overview
Launch deduplidog and change these parameters from the GUI/TUI, or set them via the CLI.
Find the duplicates.
Normally, files must have the same size, date and name to be considered duplicates. (The names may be merely similar if parameters like strip_end_counter are set.)
If media_magic=True, media files follow different rules: neither the size nor the date is compared. See its help.
work_dir: Path = Path.cwd()
Folder of the files suspected to be duplicates.
original_dir: Path | None = None
Folder of the original files. Normally, these files will not be affected.
(However, they might be affected by treat_bigger_as_original or set_both_to_older_date.)
Action
What is to be done with the duplicates.
execute: bool = False
If False, nothing is changed; only a safe dry run is performed.
inspect: bool = False
Print bash commands that correspond to the actions that would have been executed if execute were True. You can check and run them yourself.
rename: bool = False
If execute=True, prepend ✓ to the duplicated work file name (or possibly to the original file name if treat_bigger_as_original).
Mutually exclusive with the other execute actions.
delete: bool = False
If execute=True, delete the duplicated work file (or possibly the original file if treat_bigger_as_original).
Mutually exclusive with the other execute actions.
replace_with_original: bool = False
If execute=True, replace the duplicated work file with the original (or possibly vice versa if treat_bigger_as_original).
Mutually exclusive with the other execute actions.
replace_with_symlink: bool = False
If execute=True, replace the duplicated work file with a relative symlink to the original (or possibly vice versa if treat_bigger_as_original). Its modification time is kept.
Mutually exclusive with the other execute actions.
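The replace_with_symlink action can be pictured with the following minimal sketch. It is not deduplidog's own code: the function name is hypothetical, and it assumes that "its modification time is kept" means the symlink itself receives the modification time the work file had. Setting times on a symlink is not supported on every platform, so the sketch checks for that first.

```python
import os
from pathlib import Path


def replace_with_relative_symlink(work_file: Path, original_file: Path) -> None:
    """Replace work_file with a relative symlink pointing to original_file.

    Hypothetical helper for illustration only.
    """
    stat = work_file.stat()
    # Express the link target relative to the folder that holds the link.
    target = os.path.relpath(original_file, start=work_file.parent)
    work_file.unlink()
    work_file.symlink_to(target)
    # Assumption: the link inherits the times of the file it replaced.
    # follow_symlinks=False changes the link itself, not the original file.
    if os.utime in os.supports_follow_symlinks:
        os.utime(work_file, ns=(stat.st_atime_ns, stat.st_mtime_ns),
                 follow_symlinks=False)
```

A relative target keeps the link valid when the work and original folders are moved together, for example on a remounted disk.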
Execution
Parameters affecting the way the execution runs.
set_both_to_older_date: bool = False
If execute=True and either media_magic=True or (media_magic=False and ignore_date=True), both files are set to the older date. Ex: the work file gets the original file's date or vice versa.
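As a rough illustration of setting both files to the older date, the sketch below takes the smaller of the two modification times and writes it to both files. The function is hypothetical and simply reuses the parameter name; it assumes that "date" means the modification time (st_mtime) and that access times should be left alone.

```python
import os
from pathlib import Path


def set_both_to_older_date(work_file: Path, original_file: Path) -> float:
    """Give both files the older of their two modification times.

    Hypothetical helper for illustration only.
    """
    older = min(work_file.stat().st_mtime, original_file.stat().st_mtime)
    for path in (work_file, original_file):
        # Keep each file's access time; change only the modification time.
        os.utime(path, (path.stat().st_atime, older))
    return older
```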
treat_bigger_as_original: bool = False
If execute=True, rename=True and media_magic=True, the original file might be affected (by renaming) if it is smaller than the work file.
skip_bigger: bool = False
If media_magic=True, all writing actions, such as rename, replace_with_original, set_both_to_older_date and treat_bigger_as_original, are executed only if the affectable file is smaller than (or the same size as) the other.
skip_empty: bool = False
Skip files with zero size.
neglect_warning: bool = False
By default, when a file with a bigger size or an older date would be affected, only a warning is generated. Turn this on to suppress it.
confirm_one_by_one: bool = True
Instead of executing all changes at once, confirm them one by one, so that you can judge whether the media similarity detection works. If a warning occurs, the default answer to performing the action is 'no'.
Match
The way the files are compared.
casefold: bool = False
Case-insensitive file name comparison.
checksum: bool = False
If media_magic=False and ignore_size=False, files will be compared by CRC32 checksum.
(This mode is considerably slower.)
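To show why checksum mode is slower (every candidate file has to be read in full), here is a minimal sketch of a CRC32 computation using the standard library's zlib.crc32. The function name and chunk size are illustrative assumptions, not deduplidog's implementation.

```python
import zlib
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 1 << 20) -> int:
    """Return the CRC32 checksum of a file, read in chunks to bound memory use."""
    crc = 0
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            # zlib.crc32 accepts the running checksum as its second argument.
            crc = zlib.crc32(chunk, crc)
    return crc
```

Two files of equal size would then be treated as identical when their checksums are equal.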
tolerate_hour: int | tuple[int, int] | bool = False
When comparing files in work_dir and media_magic=False, tolerate an hour difference.
Sometimes when dealing with FS changes, files might get shifted by a few hours.
* bool → -1 .. +1
* int → -int .. +int
* tuple → int1 .. int2
Ex: tolerate_hour=2 → the range work_file.st_mtime - 7200 ... work_file.st_mtime + 7200 is compared to original_file.st_mtime
ignore_name: bool = False
Files will be compared by neither stem nor suffix.
ignore_date: bool = False
If media_magic=False, files will not be compared by date.
ignore_size: bool = False
If media_magic=False, files will not be compared by size.
space2char: bool = False
When comparing files in work_dir, consider space as another char. Ex: "file 012.jpg" is compared as "file_012.jpg"
strip_end_counter: bool = False
When comparing files in work_dir, strip the counter. Ex: "00034(3).MTS" is compared as "00034.MTS"
strip_suffix: str = ''
When comparing files in work_dir, strip the file name end matched by a regular expression. Ex: "001-edited.jpg" is compared as "001.jpg"
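The name-related options (casefold, space2char, strip_end_counter and strip_suffix) can be thought of as a normalization applied to the work file name before it is compared. The sketch below reproduces the examples given for them. It is an illustration under assumptions, not the tool's code: the function name is hypothetical, a space is assumed to become an underscore, the counter is assumed to look like "(3)" at the end of the stem, and strip_suffix is assumed to apply to the stem rather than the extension.

```python
import re
from pathlib import Path

# Assumed counter shape: digits in parentheses at the very end of the stem.
_COUNTER = re.compile(r"\(\d+\)$")


def comparable_name(name: str, *, casefold: bool = False, space2char: bool = False,
                    strip_end_counter: bool = False, strip_suffix: str = "") -> str:
    """Return the form of a work file name that would be used for comparison."""
    path = Path(name)
    stem, suffix = path.stem, path.suffix
    if space2char:
        stem = stem.replace(" ", "_")  # assumption: space maps to "_"
    if strip_end_counter:
        stem = _COUNTER.sub("", stem)
    if strip_suffix:
        # strip_suffix is a regular expression anchored at the end of the stem.
        stem = re.sub(f"(?:{strip_suffix})$", "", stem)
    result = stem + suffix
    return result.casefold() if casefold else result
```

With these assumptions, comparable_name("00034(3).MTS", strip_end_counter=True) gives "00034.MTS" and comparable_name("001-edited.jpg", strip_suffix="-edited") gives "001.jpg".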
work_file_stem_shortened: int | None = None
Photos downloaded from Google have their stems shortened to 47 chars. For comparison purposes, treat the file names in the original folder as shortened.
invert_selection: bool = False
Match only those files from work_dir that do not match the criteria.
Media
Media files similarity detection.
media_magic: bool = False
Media files similarity detection. Neither the size nor the date is compared for files with media suffixes. A video is considered a duplicate if it has the same name and a similar number of frames, even if it has a different extension. An image is considered a duplicate if it has the same name and a similar image hash, even if the files are of different sizes. (This mode is considerably slower.)
accepted_frame_delta: int = 1
Maximum difference in the number of frames for which two videos are still considered equal.
accepted_img_hash_diff: int = 1
Maximum hash difference between two images for which they are still considered equal, see https://github.com/JohannesBuchner/imagehash
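In the imagehash library, subtracting two hashes yields the number of bits in which they differ (the Hamming distance). The dependency-free sketch below shows the same idea on plain integers. The function names are hypothetical, and treating the threshold as inclusive is an assumption.

```python
def hash_difference(hash_a: int, hash_b: int) -> int:
    """Number of differing bits (Hamming distance) between two integer hashes."""
    return bin(hash_a ^ hash_b).count("1")


def images_similar(hash_a: int, hash_b: int, accepted_img_hash_diff: int = 1) -> bool:
    """Treat two images as equal when their hashes differ in few enough bits."""
    # Assumption: the threshold is inclusive.
    return hash_difference(hash_a, hash_b) <= accepted_img_hash_diff
```

A small threshold tolerates minor differences such as recompression, while a larger one risks matching images that are different.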
img_compare_date: bool = False
If True and media_magic=True, the work file date or the work file EXIF date must match the original file date (within no more than an hour).
img_max_size: int = 0
Used only when media_magic is True. At the beginning, the image hashes of all images in the original folder are preloaded. With this option, the preload skips any file bigger than this many bytes. If you are searching for duplicates of relatively small images, you can speed up the original image hash caching by skipping the large ones.