Does Anyone Make a Photo De-Duplicator For Linux? Something That Reads EXIF? 243
postbigbang writes "Imagine having thousands of images on disparate machines. Many are dupes, even across the disparate machines. It's impossible to delete all the dupes manually and create a single, accurate photo image base. Is there an app out there that can scan a file system, perhaps a target sub-folder system, and suck in the images -- WITHOUT creating duplicates -- perhaps by reading EXIF info or hashes? I have eleven file systems saved, and the task of eliminating dupes seems impossible."
findimagedupes in Debian (Score:5, Interesting)
whatever you decide on, it could probably be done in a hundred lines of perl
Funny you mention perl.
There's a tool written in perl called "findimagedupes" in Debian [debian.org]. Pretty awesome tool for large image collections, because it can identify duplicates even if they have been resized or messed with a little (e.g. by adding logos). Point it at a directory, and it'll find all the dupes for you.
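A minimal sketch of how you'd invoke it (the --recurse flag is from memory of the manpage, so verify locally; the directory is just an example):

```shell
# Guarded invocation: scan a tree for visually similar images if the
# tool is available, otherwise print the Debian install hint.
if command -v findimagedupes >/dev/null 2>&1; then
    # Prints groups of similar images, one group per line.
    findimagedupes --recurse "${1:-$HOME/Pictures}"
    status=ok
else
    status=missing
    echo "findimagedupes not installed; try: apt-get install findimagedupes"
fi
```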
Re:Seriously? (Score:4, Interesting)
Yeah. Thanks. It's a simple question. So far, I've seen scripting suggestions, which might be useful. I'm a nerd, but not wanting to do much code because I'm really rusty at it. Instead, I'm amazed that no one runs into this problem and has built an app that does this. That's all I'm looking for: consolidation.
Quick shell script using exiftool (Score:5, Interesting)
This will help find exact matches by exif data. It will not find near-matches unless they have the same exif data. If you want that, good luck. Geeqie [sourceforge.net] has a find-similar command, but it's only so good (image search is hard!). Apparently there's also a findimagedupes tool available, see comments above (I wrote this before seeing that and had assumed apt-cache search had already been exhausted).
I would write a script that runs exiftool on each file you want to test, removes the items that refer to timestamp, file name, path, etc., and then makes an md5 of what's left.
Something like this exif_hash.sh (sorry, slashdot eats whitespace so this is not indented):
#!/bin/sh
for image in "$@"; do
echo "$(exiftool "$image" | grep -ve '20..:..:' -e '19..:..:' -e File -e Directory | md5sum) $image"
done
And then run:
find [list of paths] -type f -print0 |xargs -0 ./exif_hash.sh |sort > output
If you have a really large list of images, do not run this through sort. Just pipe it into your output file and sort it later. It's possible that the sort utility can't deal with the size of the list (you can work around this by using grep '^[0-8]' output |sort >output-1 and grep -v '^[0-8]' output |sort >output-2, then cat output-1 output-2 > output.sorted or thereabouts; you may need more than two passes).
There are other things you can do to display these, e.g. awk '{print $1}' output |uniq -c |sort -n to rank them by hash.
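Once the output is sorted, identical hashes sit on adjacent lines, so you can pull out just the duplicated groups with a little awk. A minimal sketch (the hashes and paths here are invented sample data):

```shell
# Fake "hash filename" output standing in for the sorted exif_hash.sh results.
cat > /tmp/hashes.sorted <<'EOF'
aaa /photos/a.jpg
aaa /photos/copy-of-a.jpg
bbb /photos/b.jpg
EOF

# Collect filenames per hash; print only hashes seen more than once.
awk '{files[$1] = files[$1] " " $2; count[$1]++}
     END {for (h in count) if (count[h] > 1) print h ":" files[h]}' /tmp/hashes.sorted
```

That gives you one line per duplicate group, which is a lot easier to act on than raw sorted output.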
On Debian, exiftool is part of the libimage-exiftool-perl package. If you know perl, you can write this with far more precision (I figured this would be an easier explanation for non-coders).
Re:findimagedupes in Debian (Score:4, Interesting)
Why do I have this sneaking suspicion it runs in exponential time, varying as the size of the data set...
From what this user is talking about (multiple drives full of images), they may well have reached the point where it is impossible to sort out the dupes without one hell of a heavy hitting cluster to do the comparisons and sorting.
Re:write it yourself (Score:4, Interesting)
ExifTool is probably your best start:
http://www.sno.phy.queensu.ca/~phil/exiftool/
find . -type f -print0 | xargs -0 md5sum | sort | uniq -w32 -D
uniq's -w32 compares only the 32-character md5 at the start of each line, and -D (GNU --all-repeated) prints every line of each group of identical md5sums, so you see the duplicate files as a group.
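A self-contained sketch of that pipeline, with a throwaway directory and invented file names so you can see the grouping:

```shell
# Build a tiny tree containing one exact duplicate.
demo=$(mktemp -d)
printf 'same bytes' > "$demo/a.jpg"
printf 'same bytes' > "$demo/a-copy.jpg"
printf 'different'  > "$demo/b.jpg"

# md5sum prints 32 hex chars, two spaces, then the path, so uniq -w32
# compares hashes only; -D prints every member of each duplicated group.
find "$demo" -type f -print0 | xargs -0 md5sum | sort | uniq -w32 -D
```

Only a.jpg and a-copy.jpg show up; b.jpg has a unique hash and is filtered out.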
For multiple machines, copy each machine's hash list over to the next machine and concatenate it with the local list before sorting....
Yes, EXIF helps, but some editors carry over EXIF data from the original...
The serious will cmp files byte-for-byte as well before deleting.
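That byte-for-byte check is easy to script. A cautious sketch (file names invented for the demo) that removes a candidate only when cmp confirms it really is identical to the keeper:

```shell
# Throwaway directory with one true dupe and one false positive.
dir=$(mktemp -d)
printf 'pixels' > "$dir/keep.jpg"
printf 'pixels' > "$dir/dupe.jpg"
printf 'other'  > "$dir/unique.jpg"

# delete_if_identical KEEP CANDIDATE: remove CANDIDATE only when it is
# byte-for-byte identical to KEEP (cmp -s is silent, exits 0 on a match).
delete_if_identical() {
    if cmp -s "$1" "$2"; then
        rm -- "$2"
    fi
}

delete_if_identical "$dir/keep.jpg" "$dir/dupe.jpg"     # removed
delete_if_identical "$dir/keep.jpg" "$dir/unique.jpg"   # kept
ls "$dir"
```

The cmp guard means a hash collision (or a sloppy hash) can never cost you a file that only looked like a dupe.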