alias checkDuplicateSizes="java -cp ~/scripts DuplicateSizeChecker"
The path names of the possibly duplicated files are surrounded by single quotes (') and separated by a space, which makes them ideal for use with cmp. You can either copy and paste file pairs of interest by hand, or set up a script to cmp every pair that DuplicateSizeChecker finds. The other output lines are only there to make the output more readable for humans; in a script you can easily grep them out using something like the following:
checkDuplicateSizes /tmp/a /tmp/b | grep "'/"
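If you would rather consume that output from Java than from the shell, the pair lines can be split back into paths with a small regular expression. This is only a sketch: the class name PairLineParser is made up for illustration, and it assumes that no path contains a single quote.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PairLineParser {
    // Matches one single-quoted path; assumes paths contain no single quote.
    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    /** Returns the quoted paths found on one output line, in order. */
    public static List<String> parse(String line) {
        List<String> paths = new ArrayList<>();
        Matcher m = QUOTED.matcher(line);
        while (m.find()) {
            paths.add(m.group(1));
        }
        return paths;
    }

    public static void main(String[] args) {
        String line = "'/tmp/one/one empty file' '/tmp/two/twoemptyfile'";
        // Prints each path on its own line, without the quotes.
        for (String path : parse(line)) {
            System.out.println(path);
        }
    }
}
```

The human-readable "same length" lines contain no quotes, so parse returns an empty list for them and they can be skipped.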
Finding identical files is then pretty easy with a little script (for which I also have an alias called checkDuplicates):
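#!/bin/bash
# Compare each same-size pair byte for byte and report the identical ones.
java -cp ~/scripts DuplicateSizeChecker "$1" "$2" | grep "'/" |
while IFS= read -r filePair; do
    # eval lets the shell honour the single quotes around each path
    # (this breaks if a path itself contains a single quote)
    if eval cmp -s "$filePair"; then
        echo
        echo "The following files are identical"
        # put each path on its own line (\n in the replacement needs GNU sed)
        echo "$filePair" | sed "s/' '/'\n'/g"
    fi
done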
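The same check that cmp -s performs can also be done in Java, which avoids the eval and quoting issues altogether. The following is a minimal sketch rather than part of the original tool; the class name ContentComparer and the method sameContent are names I made up for illustration.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ContentComparer {
    /** Returns true if both files hold exactly the same bytes. */
    public static boolean sameContent(Path a, Path b) throws IOException {
        // Different sizes can never be identical, so skip reading them.
        if (Files.size(a) != Files.size(b)) {
            return false;
        }
        try (InputStream inA = new BufferedInputStream(Files.newInputStream(a));
             InputStream inB = new BufferedInputStream(Files.newInputStream(b))) {
            int byteA;
            while ((byteA = inA.read()) != -1) {
                if (byteA != inB.read()) {
                    return false;
                }
            }
            // Both streams must be exhausted at the same point.
            return inB.read() == -1;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("Please specify two files");
            return;
        }
        System.out.println(sameContent(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

Two empty files compare as identical here, which matches what cmp reports for the empty files in the sample output.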
That's it. This has helped me solve my problems (for now) and I hope it helps you too. I haven't included any options like suppressing certain output, more verbose output, different formatting, or recursing through subdirectories, because I wanted to get this done quickly and because I am trying to be a little more YAGNI. Feel free to take, use, adapt or do whatever you like with this code (be respectful and reasonable, and leave a comment if it helped you out somehow, especially if you modify the code to do something smarter).
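If you do end up needing recursion, one possible starting point is to collect every regular file below a directory with Files.walk and feed that list into the size comparison instead of listFiles(). This is a sketch only; the class name RecursiveLister is hypothetical and nothing like it exists in the code in this post.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class RecursiveLister {
    /** Returns all regular files at any depth below root, in sorted order. */
    public static List<Path> regularFiles(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> walk = Files.walk(root)) {
            return walk.filter(Files::isRegularFile)
                       .sorted()
                       .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Please specify a directory");
            return;
        }
        for (Path p : regularFiles(Paths.get(args[0]))) {
            System.out.println("'" + p.toAbsolutePath() + "'");
        }
    }
}
```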
Here's some sample output:
$ checkDuplicateSizes /tmp/one /tmp/two
The following files have the same length (0 B)
'/tmp/one/one empty file' '/tmp/two/twoemptyfile'
The following files have the same length (22 B)
'/tmp/one/onesame' '/tmp/two/twosame'
The following files have the same length (15 B)
'/tmp/one/onesamesize' '/tmp/two/twosamesize'
$ checkDuplicates /tmp/one /tmp/two
The following files are identical
'/tmp/one/one empty file'
'/tmp/two/twoemptyfile'
The following files are identical
'/tmp/one/onesame'
'/tmp/two/twosame'
Here's the meat:
import java.io.File;

public class DuplicateSizeChecker {
    public static void main(String[] args){
        if(args.length < 2){
            System.err.println("Please specify two different directories as the first two arguments");
            return;
        }
        File folder1 = new File(args[0]);
        File folder2 = new File(args[1]);
        if(!folder1.isDirectory() || !folder2.isDirectory() || folder1.equals(folder2)){
            System.err.println("Please specify two different directories as the first two arguments");
        }
        else{
            if(args.length > 2){
                System.out.println("More than two arguments supplied; only the first two are necessary; subsequent ones will be ignored");
            }
            int size1 = folder1.list().length;
            int size2 = folder2.list().length;
            if(size1 > size2) {
                for(File f1 : folder1.listFiles()){
                    for(File f2 : folder2.listFiles()){
                        if(f1.isFile() && f2.isFile() && f1.length() == f2.length()){
                            printFileInfo(f1, f2);
                        }
                    }
                }
            }
            else{
                for(File f2 : folder2.listFiles()){
                    for(File f1 : folder1.listFiles()){
                        if(f1.isFile() && f2.isFile() && f1.length() == f2.length()){
                            printFileInfo(f1, f2);
                        }
                    }
                }
            }
        }
    }
    private static void printFileInfo(File f1, File f2){
        System.out.println();
        System.out.println("The following files have the same length (" + f1.length() + " B)");
        System.out.println("'" + f1.getAbsolutePath() + "' '" + f2.getAbsolutePath() + "'");
        //System.out.println("\"" + f1.getAbsolutePath() + "\" \"" + f2.getAbsolutePath() + "\"");
    }
}
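The nested loops above compare every file in one folder with every file in the other, which is fine for small directories. For larger ones, a possible variation is to index one folder by file length first and then look each file of the other folder up in that index. The sketch below is an alternative I am suggesting, not the code used in this post; the class name SizeIndex and its methods are made up for illustration.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SizeIndex {
    /** Groups the regular files directly inside folder by their length in bytes. */
    public static Map<Long, List<File>> bySize(File folder) {
        Map<Long, List<File>> index = new HashMap<>();
        File[] entries = folder.listFiles();
        if (entries == null) {
            return index; // not a directory, or an I/O error occurred
        }
        for (File f : entries) {
            if (f.isFile()) {
                index.computeIfAbsent(f.length(), k -> new ArrayList<>()).add(f);
            }
        }
        return index;
    }

    /** Returns every (file in folder1, file in folder2) pair with equal length. */
    public static List<File[]> sameSizePairs(File folder1, File folder2) {
        Map<Long, List<File>> index = bySize(folder2);
        List<File[]> pairs = new ArrayList<>();
        File[] entries = folder1.listFiles();
        if (entries == null) {
            return pairs;
        }
        for (File f1 : entries) {
            if (!f1.isFile()) {
                continue;
            }
            for (File f2 : index.getOrDefault(f1.length(), Collections.emptyList())) {
                pairs.add(new File[] {f1, f2});
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        if (args.length < 2) {
            System.err.println("Please specify two different directories as the first two arguments");
            return;
        }
        // Same single-quoted pair format as DuplicateSizeChecker prints.
        for (File[] pair : sameSizePairs(new File(args[0]), new File(args[1]))) {
            System.out.println("'" + pair[0].getAbsolutePath() + "' '" + pair[1].getAbsolutePath() + "'");
        }
    }
}
```

It finds the same pairs as the nested loops, though not necessarily in the same order, and it also returns an empty result instead of throwing if listFiles() comes back null.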