Let’s say you’re having problems parsing a CSV file, represented as an InMemoryUploadedFile, that you’ve just uploaded through a Django form. There are a bunch of answers on Stack Overflow! They all totally work with Python 2! …and lead to hours of frustration if, say, hypothetically, like me, you’re using Python 3.
If you are getting errors like _csv.Error: iterator should return strings, not bytes (did you open the file in text mode?), and then getting different errors about DictReader not getting an expected iterator after you use .decode('utf-8') to coerce your file to str, this is the post for you.
It turns out all you need to do (e.g. in your form_valid) is:
csv_file.seek(0)
csv.DictReader(io.StringIO(csv_file.read().decode('utf-8')))
What’s going on here?
The seek(0) call ensures the pointer is at the beginning of the file. This may or may not be required in your case. In my case, I’d already read the file in my forms.py in order to validate it, so my file pointer was at the end. You’ll be able to tell that you need to seek() if your csv.DictReader() doesn’t throw any errors, but when you try to loop over the lines of the file you don’t even enter the for loop (e.g. print() statements you put in it never print). There’s nothing left to loop over if you’re at the end of the file.
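Here is a minimal sketch of that symptom in isolation. It uses io.BytesIO as a stand-in for the uploaded file, which is an assumption so that the example runs without Django; the sample data and variable names are made up for illustration.

```python
import csv
import io

# Stand-in for the uploaded file (assumption: a plain bytes buffer
# behaves like the upload for seek() and read()).
csv_file = io.BytesIO(b"name,age\nAda,36\nGrace,45\n")

# Simulate earlier validation code that already read the whole file.
csv_file.read()

# Without seek(0): read() returns b"" because the pointer is at the end,
# so DictReader raises no error but yields no rows.
rows_without_seek = list(
    csv.DictReader(io.StringIO(csv_file.read().decode("utf-8")))
)

# Rewind to the start, then parse again.
csv_file.seek(0)
rows_with_seek = list(
    csv.DictReader(io.StringIO(csv_file.read().decode("utf-8")))
)

print(len(rows_without_seek))
print(len(rows_with_seek))
```

The first parse is the "silent" failure described above: no exception, just an empty loop.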
read() gives you the file contents as a bytes object, on which you can call decode().
decode('utf-8') turns your bytes into a str by interpreting them as UTF-8. (Make sure that you know how your CSV is encoded to start with, though! That’s why I was doing validation on it myself. Unicode, Dammit is going to be my friend here. Even if I didn’t want an excuse to use it because of its title alone. Which I do.)
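If you don’t control where the CSV comes from, one rough approach is to try a short list of candidate encodings in order. To be clear, this is not what Unicode, Dammit does, and it is not real detection: decode_csv_bytes and its default candidate list are hypothetical, chosen only for illustration, and bytes that happen to decode under the wrong candidate will come out as wrong characters without any error.

```python
def decode_csv_bytes(raw, encodings=("utf-8-sig", "cp1252")):
    """Return raw decoded with the first candidate encoding that works.

    Hypothetical helper: the candidate list is an assumption, not a
    recommendation. "utf-8-sig" also strips a leading byte order mark.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the data")


utf8_text = decode_csv_bytes("café".encode("utf-8"))
cp1252_text = decode_csv_bytes("café".encode("cp1252"))
```

Stricter encodings should come first in the list, because a permissive one will accept almost any bytes and hide the mismatch.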
io.StringIO() gives you the iterator that DictReader needs, while ensuring that your content remains stringy.
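Put together, the four steps fit in a small helper. This is only a sketch: rows_from_upload is a name I made up, the encoding parameter assumes you already know how the file is encoded, and io.BytesIO again stands in for the uploaded file.

```python
import csv
import io


def rows_from_upload(csv_file, encoding="utf-8"):
    """Parse a binary file-like object into a list of row dicts.

    Hypothetical helper; csv_file only needs seek() and read().
    """
    csv_file.seek(0)                         # rewind in case it was read before
    text = csv_file.read().decode(encoding)  # bytes -> str
    return list(csv.DictReader(io.StringIO(text)))  # str -> iterable of lines


# Stand-in for the uploaded file (assumption for a Django-free example).
uploaded = io.BytesIO(b"name,age\nAda,36\n")
rows = rows_from_upload(uploaded)
print(rows[0]["name"])
```

Note that read() pulls the whole file into memory, which is fine for small uploads but worth keeping in mind for large ones.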
tl;dr I wrote two lines of code (but eight lines of comments) for a problem that took me hours to solve. Hopefully now you can copy these lines, and spend only a few minutes solving this problem!
Thank you so much for the write up, this was my issue exactly!
After almost half a day of searching, I found it. Thanks a lot, man.
This is the perfect solution. Thank you so much!
Thanks, this is perfect for me too. I’m using chardet to do the encoding detection: (https://chardet.readthedocs.io/en/latest/usage.html#basic-usage)
Thanks a lot
Thankssss!!!
Thanks so much for this! I’m working on the Py3 conversion of 80K lines of Python 2 code, and this was one of my last stumpers — 18 unit tests were throwing that `_csv.Error`.
80K lines! That is a massive undertaking — congrats on being almost done!
Just following up to say thanks again; our conversion was a great success, I ended up giving a talk at the Django Boston meetup to share how we did it and what we learned – https://www.meetup.com/djangoboston/events/262740418/
Thank you so much for the solution
Thank you; this works with FileFields too !
Just a question, though: are we loading the whole file in memory before starting to iterate over data rows ?
I’d just like the future to know that I dropped the beginning of this into https://talktotransformer.com/, to see how GPT2 would complete my blog post, and the result started with “Let’s say you’re having problems parsing a csv file, represented as ಠ_ಠ.”
It’s really useful. Thanks a lot.
Thanks for this – I had the same problem and the presence of mind to determine the type of the uploaded file object. Your article is the first article when googling “csv.dictreader InMemoryUploadedFile” :).
It’s great that people are still getting mileage out of this! Glad it solved your problem.