Thursday, December 15, 2011

Index webpage

I like to be able to point my browser at my media collection, so I expose the content using a pair of scripts. The first opens the index and lists the contents as a set of anchors. The second provides for the download of the file and setting the name.

Also, I create a folder accessible to my web server called 'hashes' in that folder I create symbolic links to the hash folders that were used in the index setup.

The first script:

#!/usr/bin/python
import pickle 
 
index = pickle.load(open("/var/media/index.pickle")) 
 
print "Content-Type: text/html\n"

index = list(set([(a.split('/')[-1], b, c) for a,b,c in index]))
index = sorted(index, key=lambda x: x[0].lower())

for name, size, check in index:
    print '<a href="fetch.py?%s%s">%s (%s bytes)</a><br/>' % (check, name, name, size)

This script is creates the links to the second script. Open the index file and un-pickle the content. Print a header for the browser, (the trailing new line is important for HTTP.)

Next we do some transforms on the index. First we discard the path portion of the filename. Then we do a case insensitive sort by filename.

The for loop then iterates the index and prints each tuple as an anchor, referring to fetch.py, our second script.

The second script:

#!/usr/bin/python
import sys
import os
import urllib

query = os.environ['QUERY_STRING']
fileid = ''
filename = 'noname'

try:
    fileid = query[:40] # Get the checksum
    fileid[39] # Check the checksum length
    int(fileid, 16) # Check the checksum could be hexadecimal
except:
    print "Content-Type: text/plain\n\nBad ID"
    exit(0)

try:
    filename = query[40:]
    filename = urllib.unquote(filename)
    # Replace double quotes with singles for the filename="" below
    filename.replace('"', "'")
except:
    pass

print "Content-Type: application/octet-stream"
print 'Content-disposition: attachment; filename="%s"\n' % filename

inobject = open("/var/www/hashes/%s/%s" % (fileid[0], fileid))

filebuffer = inobject.read(4096)

while len(filebuffer) != 0:
    sys.stdout.write(filebuffer)
    filebuffer = inobject.read(4096)

Again nothing too exciting here, the main reason for this script is to provide a filename to the user agent and also to force a download instead of displaying in the browser.

As I invoke this script by CGI I grab the query string from the environment. Now I do some checking on the checksum provided. Since it comes from the user agent we want to make sure that it isn't too exciting, (like having a '/' or '..' in the filename.)

Our hashes are 40 hexadecimal characters and so we chop the string at 40 characters, check there is a character in the 40th position and check if it could be a hexadecimal integer. If not we show a dull error page.

Next try clock is getting a filename from the user agent to give to the user agent. We grab any characters after the checksum, unquote the characters from HTTP and then we replace any double quotes with singles. The quote replacement helps stop the filename argument to the content-disposition header from exploding.

We then open the file and pipe it to the user agent 4k at a time.

The End :D