the f*ck rants about stuff

Automating the extraction of zip files with duplicated file names

It's not that well known that a zip file does not store a directory tree inside. It stores a sequence of file entries, each carrying its own name, and nothing prevents the same file name from appearing more than once in a single archive.
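To see this for yourself, here is a minimal sketch that builds a tiny archive in memory with the same name stored twice and then lists what is inside. The name report.txt and its contents are made up for the demo. Python's zipfile warns about the duplicate when writing, but it still stores both entries.

```python
import io
import warnings
import zipfile

# Build a small archive in memory with the same name stored twice.
buf = io.BytesIO()
with warnings.catch_warnings():
    # zipfile emits a UserWarning for the duplicate name but still writes it.
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("report.txt", "first version")
        z.writestr("report.txt", "second version")

# Reading it back: infolist() returns one ZipInfo per stored entry,
# and passing the ZipInfo (not the name) to read() picks that exact entry.
with zipfile.ZipFile(buf) as z:
    names = [info.filename for info in z.infolist()]
    contents = [z.read(info).decode() for info in z.infolist()]

print(names)
print(contents)
```

Both entries are listed, and each one still has its own data as long as you address it through its ZipInfo object rather than by name.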

All the tools I've checked out either silently overwrite the duplicates or make you rename them by hand, which gets very tedious as soon as you have to do it more than a few times on archives with lots of duplicated files.
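Before extracting anything, it can help to check whether an archive has the problem at all. The following is a rough sketch, not a finished tool: duplicated_names is a helper name I made up, and the demo archive is built in memory just to have something to run it on.

```python
import io
import warnings
import zipfile
from collections import Counter


def duplicated_names(zip_source):
    """Return {name: count} for every entry name stored more than once."""
    with zipfile.ZipFile(zip_source) as z:
        counts = Counter(info.filename for info in z.infolist())
    return {name: n for name, n in counts.items() if n > 1}


# Demo archive (hypothetical contents): a.txt twice, b.txt once.
buf = io.BytesIO()
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    with zipfile.ZipFile(buf, "w") as z:
        z.writestr("a.txt", "one")
        z.writestr("a.txt", "two")
        z.writestr("b.txt", "three")

found = duplicated_names(buf)
print(found)
```

An empty result means a regular unzip tool will not lose anything.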

I had to cook up my own solution in Python. If you know of a tool that already does this, please let me know. I love deprecating my own solutions :)

unzip_rename_dups.py

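#!/usr/bin/env python3
"""Extract a zip file, numbering every file so duplicated names all survive."""
import sys
import zipfile
from os import rename
from os.path import abspath, dirname, splitext

if len(sys.argv) != 2:
    sys.exit("usage: unzip_rename_dups.py ARCHIVE.zip")

ZIP = sys.argv[1]
DIR = dirname(abspath(ZIP))  # extract next to the archive

filenames = {}  # entry name -> how many times it has been seen so far
extracted = 0
dups = 0

with zipfile.ZipFile(ZIP) as z:

    for info in z.infolist():
        # Skip directory entries: they hold no data, and extract()
        # creates parent directories for files on its own
        if info.is_dir():
            continue

        # extract() returns the path it actually wrote
        orig_path = z.extract(info, DIR)
        extracted += 1

        fn = info.filename

        if fn not in filenames:
            filenames[fn] = 1
        else:
            filenames[fn] += 1
            dups += 1

        # Rename right away to name<N>.ext so the next entry with the
        # same name cannot overwrite this one
        preext, postext = splitext(orig_path)
        final_path = preext + str(filenames[fn]) + postext

        rename(orig_path, final_path)

print("{} files extracted successfully. {} duplicated files saved!".format(extracted, dups))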
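A possible variation, sketched here under my own assumptions rather than as a replacement for the script: instead of extracting and then renaming, each entry can be streamed straight to its numbered name with ZipFile.open. The function name extract_numbered is made up, and since this path skips the name sanitising that ZipFile.extract does, the sketch simply refuses suspicious entry names.

```python
import os
import shutil
import tempfile
import warnings
import zipfile


def extract_numbered(zip_path, dest_dir):
    """Write every file entry to dest_dir as name<N>.ext, N counting repeats."""
    seen = {}
    written = []
    with zipfile.ZipFile(zip_path) as z:
        for info in z.infolist():
            if info.is_dir():
                continue
            name = info.filename
            # No sanitising happens here, so reject names that could
            # escape dest_dir (absolute paths or ".." components).
            if os.path.isabs(name) or ".." in name.split("/"):
                raise ValueError("unsafe entry name: {!r}".format(name))
            seen[name] = seen.get(name, 0) + 1
            stem, ext = os.path.splitext(name)
            target = os.path.join(dest_dir, stem + str(seen[name]) + ext)
            os.makedirs(os.path.dirname(target) or ".", exist_ok=True)
            # Open by ZipInfo so each duplicate yields its own data.
            with z.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written


# Demo with a throwaway archive (hypothetical names and contents).
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        with zipfile.ZipFile(archive, "w") as z:
            z.writestr("a.txt", "one")
            z.writestr("a.txt", "two")
    paths = extract_numbered(archive, os.path.join(tmp, "out"))
    result_names = [os.path.basename(p) for p in paths]
    with open(paths[1]) as f:
        second_content = f.read()

print(result_names, second_content)
```

The upside is that there is never a moment where two entries compete for the same path on disk. The downside is that you take over the path handling that extract would otherwise do for you.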