adventures with parsing Django uploaded csv files in python3

Let’s say you’re having problems parsing a csv file, represented as an InMemoryUploadedFile, that you’ve just uploaded through a Django form. There are a bunch of answers on stackoverflow! They all totally work with Python 2! …and lead to hours of frustration if, say, hypothetically, like me, you’re using Python 3.

If you are getting errors like _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) — and then getting different errors about DictReader not getting an expected iterator after you use .decode('utf-8') to coerce your file to str — this is the post for you.

It turns out all you need to do (e.g. in your form_valid) is:


csv_file.seek(0)
csv.DictReader(io.StringIO(csv_file.read().decode('utf-8')))

What’s going on here?

The seek statement ensures the pointer is at the beginning of the file. This may or may not be required in your case. In my case, I’d already read the file in my forms.py in order to validate it, so my file pointer was at the end. You’ll be able to tell that you need to seek() if your csv.DictReader() doesn’t throw any errors, but when you try to loop over the lines of the file you don’t even enter the for loop (e.g. print() statements you put in it never print) — there’s nothing left to loop over if you’re at the end of the file.

read() gives you the file contents as a bytes object, on which you can call decode().

decode('utf-8') turns your bytes into a string, with known encoding. (Make sure that you know how your CSV is encoded to start with, though! That’s why I was doing validation on it myself. Unicode, Dammit is going to be my friend here. Even if I didn’t want an excuse to use it because of its title alone. Which I do.)

io.StringIO() gives you the iterator that DictReader needs, while ensuring that your content remains stringy.

tl;dr I wrote two lines of code (but eight lines of comments) for a problem that took me hours to solve. Hopefully now you can copy these lines, and spend only a few minutes solving this problem!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s