In 2019, Dominique Luster gave a super good Code4Lib talk about applying AI to metadata for the Charles “Teenie” Harris collection at the Carnegie Museum of Art — more than 70,000 photographs of Black life in Pittsburgh. They experimented with solutions to various metadata problems, but the one that’s stuck in my head since 2019 is the face recognition one. It sure would be cool if you could throw AI at your digitized archival photos to find all the instances of the same person, right? Or automatically label them all, given that at least some of them are already labeled correctly?
Sadly, because we cannot have nice things, the data sets used for pretrained face recognition embeddings are things like lots of modern photos of celebrities, a corpus which wildly underrepresents 1) archival photos and 2) Black people. So the results of the face recognition process are not all that great.
I have some extremely technical ideas for how to improve this (ideas which, weirdly, some computer science PhDs I’ve spoken with haven’t seen in the field), so I would like to experiment with them. But I must first ~~invent the universe~~ set up a data processing pipeline.
Three steps here:
- Fetch archival photographs;
- Do face detection (draw bounding boxes around faces and crop them out for use in the next step);
- Do face recognition.
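The three steps above chain together roughly like this. This is purely a structural sketch; every function name here is a placeholder for the real implementation discussed below:

```python
# Structural sketch of the pipeline. Each stage is a stub standing in
# for the real code described in the rest of this post.

def fetch_photos(query):
    """Step 1: pull archival photos down from an API."""
    raise NotImplementedError

def detect_faces(image):
    """Step 2: find bounding boxes and crop the face regions out."""
    raise NotImplementedError

def recognize(face_crop):
    """Step 3: compare the cropped face against known people."""
    raise NotImplementedError

def run_pipeline(query):
    # Lazily yield one recognition result per detected face.
    for image in fetch_photos(query):
        for face in detect_faces(image):
            yield recognize(face)
```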
For step 1, I’m using DPLA, which has a super straightforward and well-documented API, plus an easy-to-use Python wrapper, DPyLA (which, despite not having been updated in a while, works just fine with Python 3.6, the latest version compatible with some of my dependencies).
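For a sense of what step 1 looks like against DPLA’s REST API directly (the query term is illustrative, and you need your own API key; `requests` is imported inside the fetching function so the pure helper works without it):

```python
DPLA_ENDPOINT = "https://api.dp.la/v2/items"

def build_search_params(query, api_key, page_size=100):
    """Query parameters for DPLA's item search endpoint."""
    return {"q": query, "api_key": api_key, "page_size": page_size}

def search_items(query, api_key):
    """Fetch the matching item records (the 'docs' list in DPLA's response)."""
    import requests  # deferred so the pure helper above has no dependencies
    response = requests.get(DPLA_ENDPOINT, params=build_search_params(query, api_key))
    response.raise_for_status()
    return response.json()["docs"]
```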
For step 2, I’m using mtcnn, because I’ve been following this tutorial.
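The detection step from that tutorial boils down to: run the detector, then crop each bounding box out for use downstream. A sketch of the cropping half, with one real-world wrinkle baked in: mtcnn can report slightly negative coordinates near image edges, so clamp them first (the 224×224 size is what the recognition model in step 3 expects):

```python
import numpy as np
from PIL import Image

def crop_face(pixels, box, required_size=(224, 224)):
    """Crop one detected face out of an image array.

    `box` is mtcnn's (x, y, width, height); near image edges the
    coordinates can come back slightly negative, so clamp them to zero.
    """
    x, y, width, height = box
    x, y = max(x, 0), max(y, 0)
    face = pixels[y:y + height, x:x + width]
    # Resize to the input size the recognition model expects.
    return np.asarray(Image.fromarray(face).resize(required_size))

def detect_and_crop(detector, pixels):
    # detector is an mtcnn.MTCNN(); detect_faces returns a list of dicts
    # with 'box', 'confidence', and facial 'keypoints' for each face.
    return [crop_face(pixels, face["box"]) for face in detector.detect_faces(pixels)]
```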
For step 3, face recognition, I’m using the steps in the same tutorial, but purely for proof-of-concept — the results are garbage because archival photos from mid-century don’t actually look anything like modern-day celebrities. (Neural net: “I have 6% confidence this is Stevie Wonder!” How nice for you.) Clearly I’m going to need to build my own corpus of people, which I have a plan for (i.e. I spent some quality time thinking about numpy) but haven’t yet implemented.
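The comparison half of step 3 is simple once you have embeddings out of keras_vggface’s VGGFace model; a numpy-only sketch of the matching logic, using the tutorial’s 0.5 cosine-distance threshold (which I have not validated on archival photos):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: 0 means identical direction, 1 means orthogonal."""
    a = np.asarray(a, dtype="float64")
    b = np.asarray(b, dtype="float64")
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def is_match(known_embedding, candidate_embedding, threshold=0.5):
    """Call two faces the same person if their embeddings are close enough."""
    return cosine_distance(known_embedding, candidate_embedding) <= threshold
```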
So far the gotchas have been:
Gotcha 1: If you fetch a page from the API and assume you can treat its contents as an image, you will be sad. You have to treat them as a raw data stream and interpret that as an image, thusly:
```python
import io

import requests
from PIL import Image

# Fetch the raw bytes and hand them to PIL as a stream --
# you can't treat the response itself as an image.
response = requests.get(url, stream=True)
response.raw.decode_content = True  # transparently decompress if needed
image = Image.open(io.BytesIO(response.raw.read()))
```
This code is, of course, hilariously lacking in error handling, despite fetching content from a cesspool of untrustworthiness, aka the internet. It’s a first draft.
Gotcha 2: You see code snippets to convert images to pixel arrays (suitable for AI ingestion) that look kinda like this:
np.array(image).astype('uint8'). Except they say
astype('float32') instead of
astype('uint32'). I got a creepy photonegative effect when I used floats.
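To make the trap concrete (pure numpy, no model involved): the cast changes how downstream code interprets the numbers, not the numbers themselves, which is my best guess at where the photonegative came from.

```python
import numpy as np

# A tiny fake grayscale "image" with values on the usual 0-255 scale.
pixels = np.array([[0, 64], [128, 255]])

as_uint8 = pixels.astype("uint8")    # what image-handling code expects
as_float = pixels.astype("float32")  # same numbers, but display code no
                                     # longer reads them as 0-255 bytes
```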
Gotcha 3: Although PIL was happy to manipulate the .pngs fetched from the API, it was not happy to write them to disk; I needed to convert formats first.
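I believe the underlying issue is image mode rather than PIL being flaky. Here’s a minimal in-memory reproduction, with RGBA standing in for whatever mode the fetched .pngs arrived in:

```python
import io

from PIL import Image

image = Image.new("RGBA", (10, 10), (255, 0, 0, 255))
buffer = io.BytesIO()

# JPEG has no alpha channel, so saving RGBA directly blows up...
try:
    image.save(buffer, format="JPEG")
    saved_directly = True
except OSError:
    saved_directly = False

# ...but converting the mode first writes out fine.
image.convert("RGB").save(buffer, format="JPEG")
```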
Gotcha 4: The suggested keras_vggface library doesn’t have a Pipfile or requirements.txt, so I had to manually install keras and tensorflow. Luckily the setup.py documented the correct versions. Sadly the tensorflow version is only compatible with Python up to 3.6 (hence the comment about DPyLA compatibility above). I don’t love this, but it got me up and running, and it seems like an easy enough part of the pipeline to rip out and replace if it’s bugging me too much.
The plan from here, not entirely in order, subject to change as I don’t entirely know what I’m doing until after I’ve done it:
- Build my own corpus of identified people
- This means the numpy thoughts, above
- It also means spending more quality time with the API to see if I can automatically apply names from photo metadata rather than having to spend too much of my own time manually labeling the corpus
- Decide how much metadata I need to pull down in my data pipeline and how to store it
- Figure out some kind of benchmark and measure it
- Try out my idea for improving recognition accuracy
- Benchmark again
- Hopefully celebrate awesomeness
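On the benchmark point above: the simplest metric I can think of is verification accuracy over labeled pairs. A sketch, where everything is hypothetical scaffolding and `predict_same` stands in for whatever the recognition step ends up being:

```python
def verification_accuracy(pairs, predict_same):
    """Fraction of labeled pairs the recognizer gets right.

    `pairs` is a list of (face_a, face_b, same_person) triples;
    `predict_same` is the recognizer under test, returning True/False.
    """
    correct = sum(predict_same(a, b) == same for a, b, same in pairs)
    return correct / len(pairs)
```

Run once before the accuracy-improvement experiment and once after, on the same pairs, and the difference is the result.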