Deduplidog overview

Launch deduplidog and change these parameters in the GUI/TUI, or set them via the CLI.

Find the duplicates.

Normally, a duplicate must have the same size, date, and name as the original. (The name may only need to be similar if parameters such as strip_end_counter are set.)
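The default rule can be pictured as a comparison of name, size, and modification time. This is a minimal sketch, not deduplidog's code: the helper `looks_like_duplicate` is hypothetical, dates are compared at whole-second resolution as an assumption, and options such as tolerate_hour or ignore_size are left out.

```python
from pathlib import Path


def looks_like_duplicate(work_file: Path, original_file: Path) -> bool:
    """Hypothetical default check: same name, same size, same modification time."""
    if work_file.name != original_file.name:
        return False
    work_stat = work_file.stat()
    original_stat = original_file.stat()
    # Sizes are compared in bytes; dates via st_mtime, truncated to whole seconds (assumption).
    return (work_stat.st_size == original_stat.st_size
            and int(work_stat.st_mtime) == int(original_stat.st_mtime))
```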

If media_magic=True, media files follow different rules: neither the size nor the date is compared. See the media_magic description.

`work_dir: Path = Path.cwd()`

Folder of the files suspected to be duplicates.

`original_dir: Path | None = None`

Folder of the original files. Normally, these files will not be affected. (However, they might be affected by treat_bigger_as_original or set_both_to_older_date.)

Action

What is to be done with the duplicates.

`execute: bool = False`

If False, nothing is changed; only a safe dry run is performed.

`inspect: bool = False`

Print bash commands that correspond to the actions that would have been executed if execute were True. You can check and run them yourself.
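For illustration, such commands could be built with `shlex.quote`, so that file names containing spaces or non-ASCII characters stay safe to paste into a shell. The helper `bash_command` and the exact commands (`mv -n`, `rm`, `cp -p`) are assumptions for this sketch; deduplidog's own inspect output may look different.

```python
import shlex
from pathlib import Path


def bash_command(action: str, work_file: Path, original_file: Path) -> str:
    """Return a shell command equivalent to the action (hypothetical helper)."""
    work = shlex.quote(str(work_file))
    if action == "rename":
        # Prepend the check mark to the file name, staying in the same folder.
        target = shlex.quote(str(work_file.with_name("✓" + work_file.name)))
        return f"mv -n {work} {target}"
    if action == "delete":
        return f"rm {work}"
    if action == "replace_with_original":
        # cp -p keeps the original's timestamps and mode on the copy.
        return f"cp -p {shlex.quote(str(original_file))} {work}"
    raise ValueError(f"unsupported action: {action}")
```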

`rename: bool = False`

If execute=True, prepend ✓ to the duplicated work file name (or possibly to the original file name if treat_bigger_as_original). Mutually exclusive with the other execute actions.
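A minimal sketch of the rename action, assuming the check mark is simply prepended to the file name inside the same folder. The helper `mark_as_duplicate` is hypothetical, and refusing to overwrite an existing target is a design choice of this sketch.

```python
from pathlib import Path


def mark_as_duplicate(path: Path) -> Path:
    """Prepend ✓ to the file name (hypothetical helper) and return the new path."""
    target = path.with_name("✓" + path.name)
    # Refuse to clobber an existing file; Path.rename would silently overwrite on POSIX.
    if target.exists():
        raise FileExistsError(target)
    return path.rename(target)
```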

`delete: bool = False`

If execute=True, delete the duplicated work file (or possibly the original file if treat_bigger_as_original). Mutually exclusive with the other execute actions.

`replace_with_original: bool = False`

If execute=True, replace the duplicated work file with the original (or possibly vice versa if treat_bigger_as_original). Mutually exclusive with the other execute actions.

`replace_with_symlink: bool = False`

If execute=True, replace the duplicated work file with a relative symlink to the original (or possibly vice versa if treat_bigger_as_original). Its modification time is kept. Mutually exclusive with the other execute actions.
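The symlink replacement can be sketched with the standard library: `os.path.relpath` computes the link target relative to the work file's folder, and `os.utime(..., follow_symlinks=False)` restores the modification time on the link itself where the platform supports it. The helper name `replace_with_relative_symlink` is hypothetical; this is not deduplidog's implementation.

```python
import os
from pathlib import Path


def replace_with_relative_symlink(work_file: Path, original_file: Path) -> None:
    """Replace work_file with a relative symlink to original_file (sketch)."""
    stat = work_file.stat()
    # The link target is expressed relative to the folder that will contain the link.
    relative_target = os.path.relpath(original_file, start=work_file.parent)
    work_file.unlink()
    os.symlink(relative_target, work_file)
    # Restore the previous modification time on the link itself, where supported.
    if os.utime in os.supports_follow_symlinks:
        os.utime(work_file, ns=(stat.st_atime_ns, stat.st_mtime_ns), follow_symlinks=False)
```

A relative target keeps the link valid if the common parent folder is moved as a whole.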

Execution

Parameters affecting the way the execution runs.

`set_both_to_older_date: bool = False`

If execute=True and either media_magic=True or (media_magic=False and ignore_date=True), both files are set to the older of their two dates. Ex: the work file gets the original file's date, or vice versa.
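Setting both files to the older date amounts to taking the minimum of the two modification times and writing it back with `os.utime`. This is a sketch under the assumption that only the modification time counts as "the date"; the helper name is hypothetical.

```python
import os
from pathlib import Path


def set_both_to_older_date(a: Path, b: Path) -> float:
    """Give both files the older of their two modification times (sketch)."""
    older = min(a.stat().st_mtime, b.stat().st_mtime)
    for path in (a, b):
        # Keep each file's access time; only the modification time changes.
        os.utime(path, (path.stat().st_atime, older))
    return older
```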

`treat_bigger_as_original: bool = False`

If execute=True and rename=True and media_magic=True, the original file might be affected (by renaming) if smaller than the work file.

`skip_bigger: bool = False`

If media_magic=True, all writing actions, such as rename, replace_with_original, set_both_to_older_date and treat_bigger_as_original, are executed only if the file to be affected is smaller than (or the same size as) the other.
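The size guard reduces to a single comparison before any write. The helper `may_affect` is hypothetical and ignores the media_magic condition for brevity.

```python
from pathlib import Path


def may_affect(affected: Path, other: Path, skip_bigger: bool) -> bool:
    """Return True if a writing action may touch `affected` (hypothetical helper)."""
    if not skip_bigger:
        return True
    # Only proceed when the file to be changed is not bigger than its counterpart.
    return affected.stat().st_size <= other.stat().st_size
```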

`skip_empty: bool = False`

Skip files with zero size.

`neglect_warning: bool = False`

By default, when a file with a bigger size or an older date would be affected, just a warning is generated. Turn this on to suppress it.

`confirm_one_by_one: bool = True`

Instead of executing all changes at once, confirm them one by one, so that you can check whether the media similarity detection works. If a warning occurs, the default answer is 'no', so the action is skipped unless you confirm it.

Match

The way the files are compared.

`casefold: bool = False`

Case-insensitive file name comparison.

`checksum: bool = False`

If media_magic=False and ignore_size=False, files will be compared by CRC32 checksum. (This mode is considerably slower.)
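A CRC32 checksum can be computed in chunks with `zlib.crc32`, feeding the running value back in so that large files are never read into memory at once. Whether deduplidog uses `zlib` for this is an assumption; the helper `crc32_of` is hypothetical.

```python
import zlib
from pathlib import Path


def crc32_of(path: Path, chunk_size: int = 1 << 20) -> int:
    """Compute the CRC32 checksum of a file, reading it in chunks."""
    checksum = 0
    with path.open("rb") as handle:
        # Each chunk updates the running checksum started from the previous value.
        while chunk := handle.read(chunk_size):
            checksum = zlib.crc32(chunk, checksum)
    return checksum
```

Reading every byte of both files is what makes this mode slower than comparing size and date alone.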

`tolerate_hour: int | tuple[int, int] | bool = False`

When comparing files in work_dir and media_magic=False, tolerate a difference of a few hours. Sometimes, when dealing with file system changes, files might get shifted by a few hours.

* bool → -1 .. +1
* int → -int .. +int
* tuple → int1 .. int2

Ex: tolerate_hour=2 → work_file.st_mtime -7200 ... +7200 is compared to the original_file.st_mtime.
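The three accepted forms can be normalized to a window of seconds, following the rules listed above. This is a sketch: the helper names are hypothetical, and treating False as an exact match is an assumption.

```python
from typing import Tuple, Union


def tolerance_window(tolerate_hour: Union[int, Tuple[int, int], bool]) -> Tuple[int, int]:
    """Translate tolerate_hour into a (low, high) offset range in seconds."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(tolerate_hour, bool):
        hours = (-1, 1) if tolerate_hour else (0, 0)
    elif isinstance(tolerate_hour, int):
        hours = (-tolerate_hour, tolerate_hour)
    else:
        hours = tolerate_hour
    return hours[0] * 3600, hours[1] * 3600


def dates_match(work_mtime: float, original_mtime: float,
                tolerate_hour: Union[int, Tuple[int, int], bool] = False) -> bool:
    low, high = tolerance_window(tolerate_hour)
    # The original's mtime must fall inside the window around the work file's mtime.
    return work_mtime + low <= original_mtime <= work_mtime + high
```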

`ignore_name: bool = False`

Files will not be compared by stem or suffix.

`ignore_date: bool = False`

If media_magic=False, files will not be compared by date.

`ignore_size: bool = False`

If media_magic=False, files will not be compared by size.

`space2char: bool = False`

When comparing files in work_dir, treat a space as another character. Ex: "file 012.jpg" is compared as "file_012.jpg"

`strip_end_counter: bool = False`

When comparing files in work_dir, strip the counter. Ex: "00034(3).MTS" is compared as "00034.MTS"

`strip_suffix: str = ''`

When comparing files in work_dir, strip the end of the file name matched by a regular expression. Ex: "001-edited.jpg" is compared as "001.jpg"
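The name options casefold, space2char, strip_end_counter, and strip_suffix can be pictured as one normalization step applied before names are compared. This sketch reproduces the examples given above, but several details are assumptions: the counter pattern `(N)`, the underscore as replacement character, anchoring strip_suffix at the end of the stem, and the order of the steps. The helper `comparison_name` is hypothetical.

```python
import re
from pathlib import PurePath

# Assumption: a trailing counter looks like "(3)", optionally preceded by spaces.
END_COUNTER = re.compile(r"\s*\(\d+\)$")


def comparison_name(name: str, *, casefold: bool = False, space2char: bool = False,
                    strip_end_counter: bool = False, strip_suffix: str = "") -> str:
    """Return the form of a work_dir file name used for comparison (sketch)."""
    path = PurePath(name)
    stem, suffix = path.stem, path.suffix
    if strip_end_counter:
        stem = END_COUNTER.sub("", stem)
    if strip_suffix:
        # The user-supplied regular expression is anchored to the end of the stem.
        stem = re.sub(f"(?:{strip_suffix})$", "", stem)
    result = stem + suffix
    if space2char:
        result = result.replace(" ", "_")  # assumption: "_" is the replacement
    if casefold:
        result = result.casefold()
    return result
```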

`work_file_stem_shortened: int | None = None`

Photos downloaded from Google have their stem shortened to 47 characters. For comparison purposes, treat the original folder's file names as shortened to this many characters.
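Shortening an original name for comparison amounts to slicing its stem. Keeping the suffix intact is an assumption of this sketch, and the helper `shortened_name` is hypothetical.

```python
from pathlib import PurePath
from typing import Optional


def shortened_name(name: str, max_stem: Optional[int] = None) -> str:
    """Cut the stem of an original file name to max_stem characters (sketch)."""
    if max_stem is None:
        return name
    path = PurePath(name)
    # Only the stem is shortened; the suffix (e.g. ".jpg") is kept.
    return path.stem[:max_stem] + path.suffix
```

For example, with work_file_stem_shortened=47 an original named with a 50-character stem would be compared using only its first 47 characters plus ".jpg".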

`invert_selection: bool = False`

Match only those files from work_dir that do not match the criteria.

Media

Media files similarity detection.

`media_magic: bool = False`

Media files similarity detection. Neither the size, date nor suffix is compared for files with media suffixes. A video is considered a duplicate if it has the same name and a similar number of frames, even if it has a different extension. An image is considered a duplicate if it has the same name and a similar image hash, even if the files are of different sizes. (This mode is considerably slower.)

`accepted_frame_delta: int = 1`

Maximum difference in the number of frames for which two videos are still considered equal.

`accepted_img_hash_diff: int = 1`

Maximum hash difference between two images for them to be considered equal; see https://github.com/JohannesBuchner/imagehash
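Both media thresholds boil down to "difference at most N". For images, the difference between two perceptual hashes is the number of differing bits (the Hamming distance); the imagehash library returns it when one `ImageHash` is subtracted from another. Here plain integers stand in for the hashes so the sketch runs without extra packages; the helper names and the inclusive `<=` comparison are assumptions.

```python
def frames_similar(frames_a: int, frames_b: int, accepted_frame_delta: int = 1) -> bool:
    # Assumption: a difference equal to the delta still counts as equal.
    return abs(frames_a - frames_b) <= accepted_frame_delta


def hash_difference(hash_a: int, hash_b: int) -> int:
    """Hamming distance: the number of bits in which two perceptual hashes differ."""
    return bin(hash_a ^ hash_b).count("1")


def images_similar(hash_a: int, hash_b: int, accepted_img_hash_diff: int = 1) -> bool:
    return hash_difference(hash_a, hash_b) <= accepted_img_hash_diff
```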

`img_compare_date: bool = False`

If True and media_magic=True, the work file date or the work file EXIF date must match the original file date (within one hour).
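The date condition can be sketched as "either candidate date lies within one hour of the original". Reading the EXIF date is out of scope here, so it is passed in as an already parsed `datetime`; the helper `img_date_matches` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


def img_date_matches(original_date: datetime, work_date: datetime,
                     work_exif_date: Optional[datetime] = None) -> bool:
    """True if the work file date or its EXIF date is within one hour of the original."""
    window = timedelta(hours=1)
    candidates = [work_date] if work_exif_date is None else [work_date, work_exif_date]
    return any(abs(candidate - original_date) <= window for candidate in candidates)
```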

`img_max_size: int = 0`

Used only when media_magic is True. At the beginning, the image hashes of all images in the original folder are preloaded. This option skips preloading the hash of any file bigger than this many bytes. If you are searching for duplicates of relatively small images, you can speed up the caching of original image hashes by skipping the large ones.
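Selecting which originals get their hashes preloaded is a size filter over the original folder. In this sketch the suffix set is an assumption, as is treating the default 0 as "no limit"; the helper `images_to_preload` is hypothetical.

```python
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic", ".webp"}  # assumption


def images_to_preload(original_dir: Path, img_max_size: int = 0) -> List[Path]:
    """List original images whose hashes would be cached up front (sketch)."""
    selected = []
    for path in sorted(original_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        # Assumption: 0 means "no limit"; otherwise skip files over img_max_size bytes.
        if img_max_size and path.stat().st_size > img_max_size:
            continue
        selected.append(path)
    return selected
```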

Helper

Helper settings.

`log_level: int = logging.WARNING`

10 debug .. 50 critical
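The numbers are the standard levels of Python's `logging` module (the default `logging.WARNING` is 30), which `logging.getLevelName` translates into names:

```python
import logging

# log_level takes the standard numeric levels of Python's logging module.
LEVELS = {level: logging.getLevelName(level) for level in (10, 20, 30, 40, 50)}
print(LEVELS)  # {10: 'DEBUG', 20: 'INFO', 30: 'WARNING', 40: 'ERROR', 50: 'CRITICAL'}
```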

`output: bool = False`

Stores the output log to a file in the current working directory. (Never overwrites an older file.)
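"Never overwrites an older file" can be achieved by probing for the first free name. The naming scheme `output.log`, `output-1.log`, … is hypothetical, not deduplidog's actual file name; a production version could also open the result with mode `"x"` to guard against a race with another process.

```python
from pathlib import Path


def first_free_path(folder: Path, stem: str = "output", suffix: str = ".log") -> Path:
    """Return output.log, or output-1.log, output-2.log, ... whichever does not exist yet."""
    candidate = folder / f"{stem}{suffix}"
    counter = 0
    # Keep counting until a name is found that no existing file uses.
    while candidate.exists():
        counter += 1
        candidate = folder / f"{stem}-{counter}{suffix}"
    return candidate
```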