adventures with parsing Django uploaded csv files in python3

Let’s say you’re having problems parsing a csv file, represented as an InMemoryUploadedFile, that you’ve just uploaded through a Django form. There are a bunch of answers on Stack Overflow! They all totally work with Python 2! …and lead to hours of frustration if, say, hypothetically, like me, you’re using Python 3.

If you are getting errors like _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?) — and then getting different errors about DictReader not getting an expected iterator after you use .decode('utf-8') to coerce your file to str — this is the post for you.
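If you want to see both failure modes without spinning up Django, here is a small sketch. It assumes io.BytesIO is a fair stand-in for the uploaded file (both hand back bytes), and the CSV contents are made up. The second half shows one way a bare decoded string can go wrong; the exact error you hit may differ depending on what you passed in.

```python
import csv
import io

# Hypothetical upload contents; io.BytesIO stands in for the uploaded file.
raw = b"name,email\nada,ada@example.com\n"

# 1. Handing csv a binary stream fails as soon as the first row is requested.
try:
    next(csv.DictReader(io.BytesIO(raw)))
    bytes_error = None
except csv.Error as exc:
    bytes_error = str(exc)

# 2. Handing csv a plain str does not raise here, but iterating a str yields
#    single characters, so each character is parsed as if it were a line.
char_reader = csv.DictReader(raw.decode("utf-8"))
char_fieldnames = char_reader.fieldnames  # header "row" is just the first character

print(bytes_error)
print(char_fieldnames)
```

So decoding alone isn't enough: DictReader wants something that yields whole lines of text, not bytes and not individual characters.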

It turns out all you need to do (e.g. in your form_valid) is:


csv_file.seek(0)
csv.DictReader(io.StringIO(csv_file.read().decode('utf-8')))

What’s going on here?

The seek(0) call ensures the file pointer is at the beginning of the file. This may or may not be required in your case. In my case, I’d already read the file in my forms.py in order to validate it, so my file pointer was at the end. You’ll be able to tell that you need to seek() if your csv.DictReader() doesn’t throw any errors, but when you try to loop over the lines of the file you never enter the for loop (e.g. print() statements you put in it never print): there’s nothing left to loop over if you’re at the end of the file.
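Here is a minimal sketch of that silent "no rows" symptom. Again io.BytesIO is my stand-in for the uploaded file and the data is invented; the first read() plays the part of the validation in forms.py.

```python
import csv
import io

# Stand-in for the uploaded file, with made-up contents.
upload = io.BytesIO(b"name,email\nada,ada@example.com\n")

# Something earlier (e.g. form validation) reads the file to the end.
upload.read()

# Without seek(0): read() returns b"", so DictReader has nothing to yield.
rows_without_seek = list(
    csv.DictReader(io.StringIO(upload.read().decode("utf-8")))
)

# With seek(0): the pointer is back at the start and the rows reappear.
upload.seek(0)
rows_with_seek = list(
    csv.DictReader(io.StringIO(upload.read().decode("utf-8")))
)

print(rows_without_seek)
print(rows_with_seek)
```

No exception in either case, which is exactly why this one is easy to miss.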

read() gives you the file contents as a bytes object, on which you can call decode().

decode('utf-8') turns your bytes into a string, with known encoding. (Make sure that you know how your CSV is encoded to start with, though! That’s why I was doing validation on it myself. Unicode, Dammit is going to be my friend here. Even if I didn’t want an excuse to use it because of its title alone. Which I do.)
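If you'd rather fail loudly than guess, something like the sketch below might do. decode_upload is a hypothetical helper name of mine, not anything from Django, and it assumes you only want to accept UTF-8. It uses the standard "utf-8-sig" codec, which reads plain UTF-8 too and drops a leading byte order mark if one is there.

```python
def decode_upload(data: bytes) -> str:
    """Decode uploaded bytes as UTF-8, raising ValueError for anything else.

    Hypothetical helper: the name and the ValueError choice are assumptions.
    """
    try:
        # "utf-8-sig" accepts ordinary UTF-8 and strips a leading BOM if present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        raise ValueError(f"upload is not valid UTF-8: {exc}") from exc


print(decode_upload("café,1\n".encode("utf-8")))
```

In a form's clean method you could catch that ValueError and turn it into a validation error for the user, instead of letting a UnicodeDecodeError escape from the view.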

io.StringIO() wraps the decoded string in a file-like object that yields one line of text at a time. That is the kind of iterator DictReader needs, and your content remains stringy throughout.
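Putting the pieces together, a sketch of a reusable helper could look like this. read_csv_rows is a name I made up, and the only thing it assumes about csv_file is that it supports seek() and read() returning bytes, which is why io.BytesIO works as a test double below.

```python
import csv
import io


def read_csv_rows(csv_file, encoding="utf-8"):
    """Return the rows of an uploaded CSV as a list of dicts.

    Hypothetical helper: csv_file is any binary file-like object with
    seek() and read(), such as an uploaded file from a Django form.
    """
    csv_file.seek(0)                         # rewind in case it was already read
    text = csv_file.read().decode(encoding)  # bytes -> str
    return list(csv.DictReader(io.StringIO(text)))


# Usage with a stand-in for the upload; in a view this would be something
# like form.cleaned_data["csv_file"] (the field name is an assumption).
fake_upload = io.BytesIO(b"name,email\nada,ada@example.com\n")
fake_upload.read()  # simulate earlier validation leaving the pointer at the end
rows = read_csv_rows(fake_upload)
print(rows)
```

One thing to be aware of: read() pulls the whole upload into memory as bytes, and decode() then makes a second, string copy. That's fine for modest files, but worth knowing if your users upload big ones.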

tl;dr I wrote two lines of code (but eight lines of comments) for a problem that took me hours to solve. Hopefully now you can copy these lines, and spend only a few minutes solving this problem!

17 thoughts on “adventures with parsing Django uploaded csv files in python3”

  1. Thank you; this works with FileFields too!
    Just a question, though: are we loading the whole file into memory before starting to iterate over data rows?

  2. Thank you. This works with FileFields too!
    Just a question, though: does this mean we’re loading the whole file into memory before we start iterating over data rows?

  3. I’d just like the future to know that I dropped the beginning of this into https://talktotransformer.com/, to see how GPT2 would complete my blog post, and the result started with “Let’s say you’re having problems parsing a csv file, represented as ಠ_ಠ.”

  4. Thanks for this – I had the same problem and the presence of mind to determine the type of the uploaded file object. Your article is the first article when googling “csv.dictreader InMemoryUploadedFile” :).
