
Does Anyone Make a Photo De-Duplicator For Linux? Something That Reads EXIF?

Posted by timothy
from the which-ones-are-not-like-the-others? dept.
postbigbang writes "Imagine having thousands of images on disparate machines. Many are dupes, even among the disparate machines. It's impossible to delete all the dupes manually and create a singular, accurate photo image base. Is there an app out there that can scan a file system, perhaps a target sub-folder system, and suck in the images-- WITHOUT creating duplicates? Perhaps by reading EXIF info or hashes? I have eleven file systems saved, and the task of eliminating dupes seems impossible."

  • by retchdog (1319261) on Thursday January 23, 2014 @06:35PM (#46051315) Journal

    Exactly what you mean by deduplication is kind of vague, but whatever you decide on, it could probably be done in a hundred lines of Perl (using CPAN libraries, of course).

  • by Cummy (2900029) on Thursday January 23, 2014 @07:15PM (#46051765)

    Why do people on this site believe that everyone who is interested in tech is a programmer? This "just write it" is foolishness of the highest order. For many of us non-programmers, "just write it" is like telling someone living in Florida to "just build a plane and fly to that concert in Vienna after work tomorrow". If that seems like a ridiculous ask, then so is asking a person without the skill to write a script for that. So it can be done in 20 minutes? Use that 20 minutes to help someone by writing the program and uploading it to a repo. All the 20-second tutorials in the world will not get someone to write a program if they just don't have the skill set.
    This is part of the reason Windows is successful: think of a problem, and there is likely a program out there that solves it already, and if there isn't one, someone will soon write one (Apple users just go and buy one). Linux will not get out of single-digit adoption until people with the skills write and edit programs for non-programmers like myself, because when stuff needs to get done fast, Windows will have the program (and yes, it is easier to clean out the malware and fight the popups than it is to write the program).

  • I wrote one myself (Score:4, Insightful)

    by tepples (727027) <tepples AT gmail DOT com> on Thursday January 23, 2014 @10:14PM (#46053045) Homepage Journal
    What I did in my deduplicator written in Python was group the files by their size and reject any file with a unique size. Then I'd hash the first few kilobytes of each file with MD5 (it's just a spot check, so speed is more valuable than security against intentional collisions) and reject any file with a unique first few kilobytes. Finally I'd hash the whole file with a more secure hash.
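
    For anyone who wants to see roughly what that looks like, here is a minimal sketch of the three-pass idea described above. It is not tepples's actual code. The 4096-byte spot check, the choice of SHA-256 as the "more secure hash", and the function names (find_duplicates and the helpers) are all assumptions made for illustration.

```python
import hashlib
import os
from collections import defaultdict

# Assumption: "first few kilobytes" is taken to mean 4 KiB here.
SPOT_CHECK_BYTES = 4096


def _group_by(paths, key):
    """Bucket paths by key(path); keep only buckets with two or more files."""
    buckets = defaultdict(list)
    for path in paths:
        try:
            buckets[key(path)].append(path)
        except OSError:
            continue  # unreadable file: leave it out of every group
    return [group for group in buckets.values() if len(group) > 1]


def _head_md5(path):
    """Cheap spot check: MD5 of the first SPOT_CHECK_BYTES bytes only."""
    with open(path, "rb") as f:
        return hashlib.md5(f.read(SPOT_CHECK_BYTES)).hexdigest()


def _full_sha256(path):
    """Hash of the whole file (SHA-256 is an assumed choice), read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return lists of paths under root whose contents hash identically."""
    paths = [
        os.path.join(dirpath, name)
        for dirpath, _dirs, names in os.walk(root)
        for name in names
    ]
    # Only regular files; symlinks are skipped so a file is not paired with itself.
    paths = [p for p in paths if os.path.isfile(p) and not os.path.islink(p)]

    # Pass 1: a file with a unique size cannot have a duplicate.
    groups = _group_by(paths, os.path.getsize)
    # Pass 2 (head MD5) and pass 3 (full hash) each split the surviving groups.
    for key in (_head_md5, _full_sha256):
        groups = [sub for group in groups for sub in _group_by(group, key)]
    return groups
```

    Calling find_duplicates("/path/to/photos") returns a list of groups, each group being the paths that share a full-file hash. The sketch only reports duplicates and deletes nothing, and it compares raw bytes, so the same photo saved twice with different EXIF data or recompressed would not be matched.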

