Also, I create a folder accessible to my web server called 'hashes' in that folder I create symbolic links to the hash folders that were used in the index setup.
The first script:
#!/usr/bin/python import pickle
index = pickle.load(open("/var/media/index.pickle"))
print "Content-Type: text/html\n" index = list(set([(a.split('/')[-1], b, c) for a,b,c in index])) index = sorted(index, key=lambda x: x.lower()) for name, size, check in index: print '<a href="fetch.py?%s%s">%s (%s bytes)</a><br/>' % (check, name, name, size)
This script is creates the links to the second script. Open the index file and un-pickle the content. Print a header for the browser, (the trailing new line is important for HTTP.)
Next we do some transforms on the index. First we discard the path portion of the filename. Then we do a case insensitive sort by filename.
The for loop then iterates the index and prints each tuple as an anchor, referring to fetch.py, our second script.
The second script:
#!/usr/bin/python import sys import os import urllib query = os.environ['QUERY_STRING'] fileid = '' filename = 'noname' try: fileid = query[:40] # Get the checksum fileid # Check the checksum length int(fileid, 16) # Check the checksum could be hexadecimal except: print "Content-Type: text/plain\n\nBad ID" exit(0) try: filename = query[40:] filename = urllib.unquote(filename) # Replace double quotes with singles for the filename="" below filename.replace('"', "'") except: pass print "Content-Type: application/octet-stream" print 'Content-disposition: attachment; filename="%s"\n' % filename inobject = open("/var/www/hashes/%s/%s" % (fileid, fileid)) filebuffer = inobject.read(4096) while len(filebuffer) != 0: sys.stdout.write(filebuffer) filebuffer = inobject.read(4096)
Again nothing too exciting here, the main reason for this script is to provide a filename to the user agent and also to force a download instead of displaying in the browser.
As I invoke this script by CGI I grab the query string from the environment. Now I do some checking on the checksum provided. Since it comes from the user agent we want to make sure that it isn't too exciting, (like having a '/' or '..' in the filename.)
Our hashes are 40 hexadecimal characters and so we chop the string at 40 characters, check there is a character in the 40th position and check if it could be a hexadecimal integer. If not we show a dull error page.
Next try clock is getting a filename from the user agent to give to the user agent. We grab any characters after the checksum, unquote the characters from HTTP and then we replace any double quotes with singles. The quote replacement helps stop the filename argument to the content-disposition header from exploding.
We then open the file and pipe it to the user agent 4k at a time.
The End :D