Let’s say you’re having problems parsing a CSV file, represented as an InMemoryUploadedFile, that you’ve just uploaded through a Django form. There are a bunch of answers on Stack Overflow! They all totally work with Python 2! …and lead to hours of frustration if, say, hypothetically, like me, you’re using Python 3.
If you are getting errors like `_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)`, and then getting different errors about DictReader not getting an expected iterator after you use `.decode('utf-8')` to coerce your file to `str`, this is the post for you.
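It turns out all you need to do is rewind the file, read and decode its contents, and wrap the result in `io.StringIO()` before passing it to `csv.DictReader()`.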
What’s going on here?
The `seek` statement ensures the pointer is at the beginning of the file. This may or may not be required in your case. In my case, I’d already read the file in my `forms.py` in order to validate it, so my file pointer was at the end. You’ll be able to tell that you need to `seek()` if your `csv.DictReader()` doesn’t throw any errors, but when you try to loop over the lines of the file you never enter the for loop (e.g. `print()` statements you put in it never print): there’s nothing left to loop over if you’re at the end of the file.
`read()` gives you the file contents as a bytes object, on which you can call `decode()`.
`decode('utf-8')` turns your bytes into a string, with known encoding. (Make sure that you know how your CSV is encoded to start with, though! That’s why I was doing validation on it myself. Unicode, Dammit is going to be my friend here. Even if I didn’t want an excuse to use it because of its title alone. Which I do.)
`io.StringIO()` gives you the iterator that DictReader needs, while ensuring that your content remains stringy.
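Put together, the steps above can be sketched roughly like this. It's a minimal sketch, not necessarily the exact code from my project: the function name `read_uploaded_csv` and its `encoding` parameter are made up for illustration, and an `io.BytesIO` stands in for the InMemoryUploadedFile so the example runs without Django.

```python
import csv
import io


def read_uploaded_csv(uploaded_file, encoding="utf-8"):
    """Return the rows of an uploaded CSV as a list of dicts.

    `uploaded_file` is assumed to be a binary file-like object with
    seek() and read(), such as an InMemoryUploadedFile from a form.
    """
    # Rewind, in case earlier validation already read to the end.
    uploaded_file.seek(0)
    # read() returns bytes; decode() turns them into a str.
    text = uploaded_file.read().decode(encoding)
    # StringIO wraps the str in a file-like object that yields lines
    # of text, which is what DictReader expects.
    reader = csv.DictReader(io.StringIO(text))
    return list(reader)


# Stand-in for an upload whose pointer was left at the end of the file.
fake_upload = io.BytesIO(b"name,age\nAda,36\nGrace,45\n")
fake_upload.read()

rows = read_uploaded_csv(fake_upload)
print(rows[0]["name"])
```

Note that this reads the whole file into memory before iterating, which is fine for small uploads but worth keeping in mind for big ones.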
tl;dr I wrote two lines of code (but eight lines of comments) for a problem that took me hours to solve. Hopefully now you can copy these lines, and spend only a few minutes solving this problem!
17 thoughts on “adventures with parsing Django uploaded csv files in python3”
Thank you so much for the write up, this was my issue exactly!
After almost half a day of searching, I found it. Thanks a lot, man.
This is the perfect solution. Thank you so much!
Thanks, this is perfect for me too. I’m using chardet to do the encoding detection: (https://chardet.readthedocs.io/en/latest/usage.html#basic-usage)
Thanks a lot
Thanks so much for this! I’m working on the Py3 conversion of 80K lines of Python 2 code, and this was one of my last stumpers — 18 unit tests were throwing that `_csv.Error`.
80K lines! That is a massive undertaking — congrats on being almost done!
Just following up to say thanks again; our conversion was a great success, I ended up giving a talk at the Django Boston meetup to share how we did it and what we learned – https://www.meetup.com/djangoboston/events/262740418/
Thank you so much for the solution
Thank you; this works with FileFields too!
Just a question, though: are we loading the whole file into memory before starting to iterate over the data rows?
I’d just like the future to know that I dropped the beginning of this into https://talktotransformer.com/, to see how GPT2 would complete my blog post, and the result started with “Let’s say you’re having problems parsing a csv file, represented as ಠ_ಠ.”
It’s really useful. Thanks a lot.
Thanks for this – I had the same problem and the presence of mind to determine the type of the uploaded file object. Your article is the first article when googling “csv.dictreader InMemoryUploadedFile” :).
It’s great that people are still getting mileage out of this! Glad it solved your problem.