Managing large amounts of data has become a common problem in today's digital age. One of the most common issues is the presence of duplicate files, which can consume valuable storage space and make it challenging to find the correct files. This article will explore how to find and remove duplicate files in Linux, including the commands and tools to help you accomplish this task quickly and efficiently.
How To Find and Remove Duplicate Files in Linux
When you work with large amounts of media and documents, it's normal to end up with several copies of the same file on your computer. Over time, these duplicates pile up and eat into your storage until you are forced to go through your system and clean them out.
Checking for duplicates by hand is impractical once you have more than a handful of files, but various programs can find and remove them for you. This article will teach you how to handle your duplicate files with two methods: fdupes and dupeGuru.
Before you begin, make sure you have:
- A Linux-based system
- Terminal access
- A user account with sudo privileges
Method 1: Find and Remove Duplicate Files in Linux Using Fdupes
One of the most straightforward tools for locating and removing duplicate files in folders is fdupes. It is a free, open-source duplicate file finder published on GitHub under the MIT License.
To identify duplicate files in a directory, fdupes first compares file sizes, then MD5 signatures, and finally verifies each match with a byte-by-byte comparison. You can also search recursively, exclude specific files from the search, and display a list of the duplicate files found.
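To make the staged comparison concrete, here is a minimal shell sketch of the same idea: a cheap size check, then an MD5 signature, then a byte-by-byte confirmation. It is an illustration only and not fdupes's own code; the function name `is_duplicate` and the demo files are made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative sketch of a staged duplicate check (not fdupes's actual code).
set -eu

# is_duplicate FILE_A FILE_B: returns 0 when both files have identical content.
is_duplicate() {
    local a=$1 b=$2
    # Stage 1: files of different sizes cannot be identical.
    [ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] || return 1
    # Stage 2: compare MD5 signatures of the contents.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1
    # Stage 3: confirm byte by byte, which rules out hash collisions.
    cmp -s "$a" "$b"
}

# Hypothetical demo files in a throwaway directory.
demo=$(mktemp -d)
printf 'same content\n'  > "$demo/a.txt"
printf 'same content\n'  > "$demo/b.txt"
printf 'other content\n' > "$demo/c.txt"

if is_duplicate "$demo/a.txt" "$demo/b.txt"; then
    echo "a.txt and b.txt are duplicates"
fi
if ! is_duplicate "$demo/a.txt" "$demo/c.txt"; then
    echo "a.txt and c.txt differ"
fi
```

Ordering the checks from cheapest to most expensive means most non-matching files are ruled out before any file content has to be read in full.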
With fdupes, you may either remove the duplicate files or replace them with links to the actual files after you have found them in a directory.
Install Fdupes for Linux
- On Debian-based distros
sudo apt install fdupes
- On RHEL-based distros (use yum on older releases and dnf on newer ones)
sudo yum install fdupes
sudo dnf install fdupes
- On Arch Linux and Manjaro
sudo pacman -S fdupes
Find Duplicate Files in Linux With Fdupes
After the installation, you can use fdupes to find duplicate files.
Run the following command with your directory path to find duplicate files. By default, fdupes only looks for duplicates directly inside the folder you specify. It does not search through subfolders.
fdupes <directory path>
Run the fdupes command with the -r option to search the folder and all of its subfolders. Because this search is recursive, it usually reports more duplicates than a search without "-r".
fdupes -r <directory path>
You can also tell fdupes to ignore empty files. Empty files are all identical to each other, so excluding them keeps the results focused on duplicates that actually take up space. Use the following command to enable this option.
fdupes -n <directory path>
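Before pointing fdupes at real data, it can help to try the options on a throwaway folder. The sketch below builds a small sandbox (the file names are invented for illustration) containing a duplicate in the top folder, a duplicate in a subfolder, and two empty files, then runs fdupes with and without -r and -n if it is installed.

```shell
#!/usr/bin/env bash
# Build a disposable sandbox for experimenting with fdupes options.
set -eu

sandbox=$(mktemp -d)
mkdir -p "$sandbox/sub"
printf 'quarterly report\n' > "$sandbox/report.txt"
cp "$sandbox/report.txt" "$sandbox/report-copy.txt"      # duplicate in the top folder
cp "$sandbox/report.txt" "$sandbox/sub/report-old.txt"   # duplicate in a subfolder
: > "$sandbox/empty1"                                    # two empty files, which
: > "$sandbox/empty2"                                    # count as duplicates of each other

if command -v fdupes >/dev/null 2>&1; then
    echo "== top folder only =="
    fdupes "$sandbox"
    echo "== recursive (-r) =="
    fdupes -r "$sandbox"
    echo "== recursive, ignoring empty files (-r -n) =="
    fdupes -r -n "$sandbox"
else
    echo "fdupes is not installed; sandbox left at $sandbox"
fi
```

Comparing the three listings shows which files each option adds to or removes from the results.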
To get a summary of the duplicate files instead of the full list, use the fdupes command with the -m option.
fdupes -m <directory path>
You can also run fdupes with the -S option to show the size of the duplicate files.
fdupes -S <directory path>
To save the output of the fdupes command to a file, redirect it as follows.
fdupes <directory path> > output.txt
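A saved listing can be processed with standard tools. In its default output, fdupes prints each set of duplicates as a group of paths, with a blank line between sets, so awk's paragraph mode can count them. The sketch below uses a hand-written sample file in place of a real output.txt; the paths and the helper names `count_sets` and `count_extra` are invented for illustration.

```shell
#!/usr/bin/env bash
# Summarize a saved fdupes listing (sets are separated by blank lines).
set -eu

# count_sets FILE: number of duplicate sets in the listing.
count_sets() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# count_extra FILE: number of files beyond the first one in each set.
count_extra() {
    awk 'BEGIN { RS = ""; FS = "\n" } { extra += NF - 1 } END { print extra + 0 }' "$1"
}

# Hypothetical sample standing in for a real output.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
/data/photos/a.jpg
/data/photos/a-copy.jpg

/data/docs/notes.txt
/data/docs/old/notes.txt
/data/backup/notes.txt

EOF

echo "sets: $(count_sets "$sample")"                 # prints "sets: 2"
echo "redundant copies: $(count_extra "$sample")"    # prints "redundant copies: 3"
```

Setting FS to a newline keeps paths that contain spaces intact, since each line is treated as one field.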
Delete Duplicate Files in Linux With fdupes
Once you have narrowed down the duplicates in the directory, use the fdupes command with the -d option to remove them.
fdupes -d <directory path>
For each set of duplicates, you will be prompted to choose which files to keep. Enter the number of the file you want to preserve, and fdupes deletes the others in that set.
Note: You can also combine several fdupes options in a single command.
The following command finds duplicate non-empty files in the given folder and all of its subfolders.
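If you would rather review what a cleanup would remove before deleting anything, one cautious approach is to preview it from a saved listing. The sketch below assumes you want to keep the first file of each set (which is also what fdupes does when -d is combined with its -N/--noprompt option) and prints every other path. The helper name `list_removable` and the sample paths are hypothetical.

```shell
#!/usr/bin/env bash
# Preview which files would go if the first file in each set were kept.
set -eu

# list_removable FILE: print every path except the first one in each set.
list_removable() {
    awk 'BEGIN { RS = ""; FS = "\n" } { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Hypothetical listing in fdupes's blank-line-separated format.
listing=$(mktemp)
printf '%s\n' \
    '/data/a.jpg' '/data/a (1).jpg' '' \
    '/data/notes.txt' '/data/old/notes.txt' '/data/bak/notes.txt' '' > "$listing"

list_removable "$listing"
```

Nothing is deleted here; the output is only a list to inspect before you decide how to proceed.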
fdupes -n -r <directory path>
Or get a summary of the duplicate files in the folder and its subfolders by entering the following command.
fdupes -m -r <directory path>
Method 2: Find and Remove Duplicate Files in Linux Using dupeGuru
dupeGuru is a cross-platform program for locating and removing duplicate files from your computer. One of its most important features is a matching engine that you can tailor to your preferences, which increases your chances of finding the right kind of duplicate files in a directory. Unlike fdupes, it also includes a graphical user interface to make operations easier.
In terms of functionality, dupeGuru can scan either filenames or file contents, and it uses a fuzzy matching algorithm so that it can find files that are similar as well as files that are identical. It's also adept at dealing with music and image-specific data, giving it an advantage over other Linux duplicate file finders.
You can also delete duplicate files with dupeGuru. Its reference directory system helps prevent you from accidentally deleting the wrong files. Aside from deleting duplicates, you can also move or copy them elsewhere.
Install DupeGuru for Linux
- On Debian-based Distros
sudo add-apt-repository ppa:dupeguru/ppa
sudo apt-get update
sudo apt-get install dupeguru
- On Arch Linux
sudo pacman -S dupeguru
Find and Remove Duplicate Files in Linux With dupeGuru
dupeGuru is a quick and safe Linux duplicate finder, so there is little chance of it deleting things you didn't intend to delete. However, since we are still discussing file deletion, it is always a good idea to be extra cautious and make a backup first.
After you've taken your precautions, launch dupeGuru and follow these steps.
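One simple way to take that backup is to archive the folder you are about to clean up with tar. This is a minimal sketch, not a full backup strategy; the function name `backup_dir` and the demo folders are placeholders for your own paths.

```shell
#!/usr/bin/env bash
# Archive a folder before deduplicating it.
set -eu

# backup_dir SRC DEST_DIR: write a timestamped .tar.gz of SRC into DEST_DIR
# and print the path of the archive.
backup_dir() {
    local src=$1 dest=$2
    local archive
    archive="$dest/$(basename "$src")-$(date +%Y%m%d-%H%M%S).tar.gz"
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
    echo "$archive"
}

# Hypothetical demo folders; replace them with your real paths.
work=$(mktemp -d)
mkdir -p "$work/photos" "$work/backups"
printf 'image data\n' > "$work/photos/a.jpg"

archive=$(backup_dir "$work/photos" "$work/backups")
tar -tzf "$archive"    # list the archive contents to verify the backup
```

Storing the archive on a different disk than the one being cleaned gives you a copy that survives mistakes on the original drive.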
Step 1. The folder selection screen should appear, where you can add folders to scan for deduplication.
Step 2. dupeGuru will display its results by grouping duplicate files together in a list after you have selected your directories and launched the scan.
Step 3. By default, dupeGuru matches files based on their content rather than their name. The match column shows how closely each file matches the others in its group, which helps ensure you do not accidentally delete anything important. Select the duplicate files you want to act on and click the Actions button to see the available actions.
Step 4. There are various actions available. You can delete, move, ignore, open, or rename duplicates, or even run a custom command on them. If you decide to delete a duplicate, choose from the deletion options available.
You can send duplicate files to the trash or delete them permanently, and you can also leave a link to the original file in place of each duplicate (either a hard link or a symlink). The duplicate is deleted and replaced with a link to the original, which saves disk space while keeping the old path valid. This is useful if you have imported those files into a workspace or something else depends on them.
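To show what replacing a duplicate with a link means at the file system level, here is a minimal shell sketch done by hand with `ln`. It is not how dupeGuru implements the feature internally; the file names are invented for the demonstration.

```shell
#!/usr/bin/env bash
# Replace a duplicate with a hard link to the original file.
set -eu

work=$(mktemp -d)
printf 'payload\n' > "$work/original.dat"
cp "$work/original.dat" "$work/duplicate.dat"    # an independent copy with its own data

# Only replace the duplicate when the contents are byte-for-byte identical.
if cmp -s "$work/original.dat" "$work/duplicate.dat"; then
    # -f removes the existing duplicate and creates a hard link in its place,
    # so both names now refer to the same data on disk.
    ln -f "$work/original.dat" "$work/duplicate.dat"
fi

# For a symlink instead, the equivalent would be:
#   ln -sf "$work/original.dat" "$work/duplicate.dat"

ls -li "$work"    # both entries now show the same inode number
```

Hard links only work within a single file system, while symlinks can cross file systems but break if the original is moved or deleted.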
Another option is to export the results as an HTML or CSV file. It could be useful if you prefer to keep track of duplicates rather than use any of dupeGuru's actions on them.
Step 5. Finally, the preferences menu lets you adjust how dupeGuru scans for duplicates.
You can choose whether to scan by content or by name and set a match threshold to limit the number of results. You can also define a custom command that can be selected from the actions menu. Among the numerous other options, it is worth noting that dupeGuru ignores files smaller than 10KB by default.
How To Recover Deleted Files in Linux
Accidentally deleting files or directories in Linux is easy to do, especially if you are not very tech-savvy. But if it does happen to you, don't panic! You can often still recover them. Numerous procedures and software options are available to restore a deleted file in Linux, so it is worth choosing a reliable one.
For this reason, Wondershare Recoverit Linux File Recovery is one of the best recovery tools we recommend to all users. Linux users can recover deleted files, folders, and partitions with Wondershare Recoverit.
Here are some of the features of Wondershare Recoverit Linux Recovery:
- Recoverit can help you recover from over 500 data loss scenarios, including accidental deletion, disk formatting, operating system crashes, power outages, virus attacks, lost partitions, and many others.
- It works with all major Linux distributions like Ubuntu, Linux Mint, Debian, Fedora, Solus, Opensuse, Manjaro, etc.
- Recoverit can effectively, safely, and completely recover various files, such as documents, photos, videos, music, emails, and more than 1000 other file types.
- With its simple and intuitive interface, you can quickly recover data from Linux hard drives in just a few clicks.
Just install Recoverit on your PC and follow the video tutorial to learn how to recover mistakenly removed files from Linux.
Apart from Wondershare Recoverit, there are seven more solutions to recover deleted files in Linux.
Finding and removing duplicate files that might be eating up space on your Linux computer is easy with fdupes and dupeGuru. Cleaning them up can help you save disk space, avoid confusion, and streamline your workflow. To avoid accidentally deleting essential data, proceed with extreme caution. If you do delete files by mistake and want to retrieve them, you can try to recover them using Wondershare Recoverit Linux Recovery.