Deduplidog overview

Launch deduplidog and change these parameters from the GUI/TUI, or set them via the CLI.

Find the duplicates.

Normally, two files must have the same size, date and name to match. (The name may be merely similar if parameters like strip_end_counter are set.)

If media_magic=True, media files receive different rules: neither the size nor the date is compared. See its help.
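The default rule can be pictured as a three-way equality check. The following is only a minimal sketch, not Deduplidog's own code: `FileInfo` and `is_default_duplicate` are hypothetical names introduced for illustration, and the date is assumed to be the modification time in whole seconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical record of the attributes used for matching."""
    name: str
    size: int
    mtime: int  # modification time in whole seconds (assumption)


def is_default_duplicate(work: FileInfo, original: FileInfo) -> bool:
    # With no ignore_* or media_magic options set, all three must agree.
    return (
        work.name == original.name
        and work.size == original.size
        and work.mtime == original.mtime
    )
```

Options such as ignore_date or ignore_size would simply drop the corresponding term from this check.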

work_dir: Path = Path.cwd()

Folder of the files suspected to be duplicates.

original_dir: Path | None = None

Folder of the original files. Normally, these files will not be affected. (However, they might be affected by treat_bigger_as_original or set_both_to_older_date.)

Action

What is to be done with the duplicates.

execute: bool = False

If False, nothing is changed; only a safe dry run is performed.

inspect: bool = False

Print bash commands that correspond to the actions that would have been executed if execute were True. You can check and run them yourself.
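To give a feel for what such output could look like, here is a hedged sketch that builds shell commands for the delete and rename actions. The exact commands Deduplidog prints are not specified here, so the `rm`/`mv` format and the helper names `delete_command` and `rename_command` are assumptions made for illustration.

```python
import shlex
from pathlib import PurePosixPath


def delete_command(work_file: PurePosixPath) -> str:
    # shlex.quote protects spaces and other shell-sensitive characters.
    return f"rm {shlex.quote(str(work_file))}"


def rename_command(work_file: PurePosixPath) -> str:
    # The rename action prepends a check mark to the file name.
    target = work_file.with_name("✓" + work_file.name)
    return f"mv {shlex.quote(str(work_file))} {shlex.quote(str(target))}"


print(delete_command(PurePosixPath("/tmp/my file.jpg")))
print(rename_command(PurePosixPath("/tmp/a.jpg")))
```

Quoting every path is the important design choice: file names with spaces or quotes would otherwise break a command that is pasted into a shell.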

rename: bool = False

If execute=True, prepend ✓ to the duplicated work file name (or possibly to the original file name if treat_bigger_as_original is set). Mutually exclusive with the other execute actions.

delete: bool = False

If execute=True, delete the duplicated work file (or possibly the original file if treat_bigger_as_original is set). Mutually exclusive with the other execute actions.

replace_with_original: bool = False

If execute=True, replace the duplicated work file with the original (or possibly vice versa if treat_bigger_as_original is set). Mutually exclusive with the other execute actions.

If execute=True, replace the duplicated work file with a relative symlink to the original (or possibly vice versa if treat_bigger_as_original is set). Its modification time is kept. Mutually exclusive with the other execute actions.

Execution

Parameters affecting the way the execution runs.

set_both_to_older_date: bool = False

If execute=True, media_magic=True or (media_magic=False and ignore_date=True), both files are set to the older date. Ex: the work file gets the original file's date or vice versa.
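Setting both files to the older date can be sketched with `os.utime`. This is an illustration under assumptions, not the tool's implementation: the helper name `apply_older_date` is hypothetical, "date" is taken to mean the modification time, and the access time is left unchanged.

```python
import os
from pathlib import Path


def apply_older_date(work_file: Path, original_file: Path) -> float:
    """Set the modification time of both files to the older of the two."""
    older = min(work_file.stat().st_mtime, original_file.stat().st_mtime)
    for path in (work_file, original_file):
        # Keep each file's access time, change only its modification time.
        os.utime(path, (path.stat().st_atime, older))
    return older
```

Only the file with the newer timestamp actually changes; the other one is rewritten with the value it already had.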

treat_bigger_as_original: bool = False

If execute=True and rename=True and media_magic=True, the original file might be affected (by renaming) if smaller than the work file.

skip_bigger: bool = False

If media_magic=True, all writing actions, such as rename, replace_with_original, set_both_to_older_date and treat_bigger_as_original, are executed only if the file to be affected is smaller than (or the same size as) the other.

skip_empty: bool = False

Skip files with zero size.

neglect_warning: bool = False

By default, when a file with a bigger size or an older date would be affected, only a warning is generated. Turn this on to suppress it.

confirm_one_by_one: bool = True

Instead of executing changes all at once, confirm them one by one, so that you can judge whether the media similarity detection works. If a warning occurs, the default answer to performing the action is 'no'.

Match

The way the files are compared.

casefold: bool = False

Case-insensitive file name comparison.

checksum: bool = False

If media_magic=False and ignore_size=False, files will be compared by CRC32 checksum. (This mode is considerably slower.)
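A CRC32 checksum of a file can be computed incrementally with the standard `zlib` module. The sketch below shows the general technique only; the function name `crc32_of_file` and the chunk size are assumptions, and Deduplidog's own implementation may differ.

```python
import zlib
from pathlib import Path


def crc32_of_file(path: Path, chunk_size: int = 1 << 20) -> int:
    """Compute the CRC32 of a file without loading it all into memory."""
    checksum = 0
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            # Passing the running value continues the checksum across chunks.
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```

Reading every byte of both files is what makes this mode slower than comparing size and date alone.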

tolerate_hour: int | tuple[int, int] | bool = False

When comparing files in work_dir and media_magic=False, tolerate an hour difference. Sometimes, after file system changes, file times get shifted by a few hours.

* bool → -1 .. +1
* int → -int .. +int
* tuple → int1 .. int2

Ex: tolerate_hour=2 → the range work_file.st_mtime - 7200 .. work_file.st_mtime + 7200 is compared to original_file.st_mtime
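The three accepted value types can be normalized into one hour window, as in the following sketch. The helper names `hour_window` and `mtime_matches` are hypothetical, and treating False as "exact match only" with inclusive bounds is an assumption.

```python
from __future__ import annotations


def hour_window(tolerate_hour: int | tuple[int, int] | bool) -> tuple[int, int]:
    """Translate tolerate_hour into a (low, high) window in hours."""
    # bool is checked first because bool is a subclass of int in Python.
    if tolerate_hour is False:
        return (0, 0)  # assumption: no tolerance, exact match only
    if tolerate_hour is True:
        return (-1, 1)
    if isinstance(tolerate_hour, tuple):
        return tolerate_hour
    return (-tolerate_hour, tolerate_hour)


def mtime_matches(work_mtime: float, original_mtime: float,
                  tolerate_hour: int | tuple[int, int] | bool) -> bool:
    low, high = hour_window(tolerate_hour)
    # The window is applied around the work file's modification time.
    return work_mtime + low * 3600 <= original_mtime <= work_mtime + high * 3600
```

With tolerate_hour=2, an original file whose time lies up to 7200 seconds before or after the work file's time is accepted.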

ignore_name: bool = False

Files will not be compared by stem or suffix.

ignore_date: bool = False

If media_magic=False, files will not be compared by date.

ignore_size: bool = False

If media_magic=False, files will not be compared by size.

space2char: bool = False

When comparing files in work_dir, consider space as another char. Ex: "file 012.jpg" is compared as "file_012.jpg"

strip_end_counter: bool = False

When comparing files in work_dir, strip the counter. Ex: "00034(3).MTS" is compared as "00034.MTS"

strip_suffix: str = ''

When comparing files in work_dir, strip the end of the file name matched by a regular expression. Ex: "001-edited.jpg" is compared as "001.jpg"
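The three name options above (space2char, strip_end_counter, strip_suffix) can be sketched as one normalization step that reproduces the documented examples. This is not Deduplidog's code: `comparable_name` is a hypothetical helper, the replacement character `_` and the counter format `(N)` are inferred from the examples, and applying the regular expression to the stem only is an assumption.

```python
import re
from pathlib import PurePath

# Assumed counter format: digits in parentheses at the end of the stem.
COUNTER = re.compile(r"\(\d+\)$")


def comparable_name(file_name: str, *, space2char: bool = False,
                    strip_end_counter: bool = False,
                    strip_suffix: str = "") -> str:
    """Return the name under which a work file would be compared."""
    path = PurePath(file_name)
    stem, suffix = path.stem, path.suffix
    if space2char:
        stem = stem.replace(" ", "_")
    if strip_end_counter:
        stem = COUNTER.sub("", stem)
    if strip_suffix:
        # strip_suffix is a regular expression anchored at the stem's end.
        stem = re.sub(f"(?:{strip_suffix})$", "", stem)
    return stem + suffix


print(comparable_name("file 012.jpg", space2char=True))
print(comparable_name("00034(3).MTS", strip_end_counter=True))
print(comparable_name("001-edited.jpg", strip_suffix="-edited"))
```

The file on disk is never renamed by these options; only the name used for comparison changes.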

work_file_stem_shortened: int | None = None

Photos downloaded from Google have their stems shortened to 47 chars. For comparison purposes, treat the file names in the original folder as shortened.
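Shortening a stem for comparison amounts to truncating it while keeping the extension. A minimal sketch, assuming the limit applies to the stem only; `shortened_name` is a hypothetical helper name.

```python
from __future__ import annotations

from pathlib import PurePath


def shortened_name(original_name: str,
                   work_file_stem_shortened: int | None) -> str:
    """Truncate the stem of an original file name for comparison."""
    if work_file_stem_shortened is None:
        return original_name
    path = PurePath(original_name)
    return path.stem[:work_file_stem_shortened] + path.suffix
```

For example, with a limit of 47, a 60-character stem in the original folder would be compared using only its first 47 characters plus the extension.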

invert_selection: bool = False

Match only those files from work_dir that do not match the criteria.

Media

Media files similarity detection.

media_magic: bool = False

Media files similarity detection. Neither the size nor the date is compared for files with media suffixes. A video is considered a duplicate if it has the same name and a similar number of frames, even if it has a different extension. An image is considered a duplicate if it has the same name and a similar image hash, even if the files are of different sizes. (This mode is considerably slower.)

accepted_frame_delta: int = 1

Maximum difference in the number of frames at which two videos are still considered equal.

accepted_img_hash_diff: int = 1

Maximum hash difference at which two images are still considered equal, see https://github.com/JohannesBuchner/imagehash
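In the imagehash library, the difference between two hashes is the number of differing bits (the Hamming distance). The sketch below shows that idea on plain integers so it runs without any image library; the function names are hypothetical, and treating the threshold as inclusive is an assumption.

```python
def hash_difference(hash_a: int, hash_b: int) -> int:
    """Count the bits in which two hashes differ (Hamming distance)."""
    return bin(hash_a ^ hash_b).count("1")


def images_similar(hash_a: int, hash_b: int,
                   accepted_img_hash_diff: int = 1) -> bool:
    # Assumption: a difference equal to the threshold still counts as similar.
    return hash_difference(hash_a, hash_b) <= accepted_img_hash_diff
```

A larger accepted_img_hash_diff tolerates more visual change, such as recompression or resizing, at the cost of more false matches.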

img_compare_date: bool = False

If True and media_magic=True, the work file date or the work file EXIF date must match the original file date (differing by no more than an hour).

img_max_size: int = 0

Used only when media_magic is True. At the beginning, the image hashes of all images in the original folder are preloaded. This setting makes the preload skip any file bigger than this many bytes. If you are searching for duplicates of relatively small images, you can speed up the original image hash caching by skipping the large ones.

Helper

Helper settings.

log_level: int = logging.WARNING

10 debug .. 50 critical
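These numbers are the standard levels of Python's `logging` module, which the following snippet lists. The `LEVELS` dictionary is only an illustrative name.

```python
import logging

# Map the standard level names to their numeric values.
LEVELS = {
    name: getattr(logging, name)
    for name in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL")
}

for name, value in LEVELS.items():
    print(value, name.lower())
```

The default, logging.WARNING, corresponds to 30.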

output: bool = False

Stores the output log to a file in the current working directory. (Never overwrites an older file.)