When last we met I was turning a perfectly innocent neural net into a terribly ineffective one, in an attempt to get it to be better at face recognition in archival photos. I was also (what cultural heritage technology experience would be complete without this?) being foiled by metadata.
So, uh, I stopped using metadata. 🤦‍♀️ With twinges of guilt. And full knowledge that I was tossing out a practically difficult but conceptually straightforward supervised learning problem for…what?
Well. I realized that the work that initially inspired me to try my hand at face recognition in archival photos was not, in fact, a recognition problem but a similarity problem: could you find multiple instances of the same person across the Charles Teenie Harris collection? This doesn’t require me to identify people, per se; it just requires me to know whether they are the same or different.
And you know what? I can do a pretty good job of getting different people by randomly selecting two photos from my data set — they’re not guaranteed to be different, but I’ll settle for pretty good. And I can do an actually awesome job of guaranteeing that I have two photos of the same person with the ✨magic✨ of data augmentation.
Keras (which, by the way, is about a trillionty times better than hand-coding stuff in Octave, for all I appreciate that Coursera made me understand the fundamentals by doing that) has an ImageDataGenerator class which makes it straightforward to alter images in a variety of ways: horizontal flips, rotations, brightness changes, and so on. All of these are completely plausible ways that two archival photos of the same person might differ! So I can get two photos of the same person by taking one photo and messing with it.
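For concreteness, the pair generation looks roughly like this; the augmentation ranges, image size, and file handling below are illustrative placeholders rather than settings I'm claiming are the right ones:

```python
import random
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array, load_img

# Illustrative augmentation ranges: flips, small rotations, brightness shifts.
augmenter = ImageDataGenerator(
    horizontal_flip=True,
    rotation_range=15,
    brightness_range=(0.7, 1.3),
)

def positive_pair(photo_path):
    """One real photo plus a randomly perturbed copy of it: guaranteed 'same person'."""
    img = img_to_array(load_img(photo_path, target_size=(224, 224)))
    return img, augmenter.random_transform(img)

def negative_pair(photo_paths):
    """Two distinct photos drawn at random from the corpus: 'different people', probably."""
    path_a, path_b = random.sample(photo_paths, 2)
    return (img_to_array(load_img(path_a, target_size=(224, 224))),
            img_to_array(load_img(path_b, target_size=(224, 224))))
```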
And at this point I have a Siamese network with triplet loss, another concept that Coursera set me up with (via the deeplearning.ai sequence). And now we are getting somewhere!
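The shape of the thing: three inputs feeding one shared base network, and a loss that pushes anchor-positive distances below anchor-negative distances by some margin. A minimal sketch, assuming a base network that ends in an embedding layer, with the embedding size and margin as stand-in values:

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

def build_triplet_model(base_network, embedding_size=128, margin=0.2):
    """Wrap a shared base network so one forward pass embeds anchor, positive, and negative."""
    anchor = layers.Input(shape=(224, 224, 3), name="anchor")
    positive = layers.Input(shape=(224, 224, 3), name="positive")
    negative = layers.Input(shape=(224, 224, 3), name="negative")

    # The *same* base_network (same weights) embeds all three images; it is assumed
    # to end in a Dense embedding layer of size embedding_size.
    embeddings = layers.Concatenate()([
        base_network(anchor),
        base_network(positive),
        base_network(negative),
    ])
    model = Model(inputs=[anchor, positive, negative], outputs=embeddings)

    def triplet_loss(_, y_pred):
        a = y_pred[:, :embedding_size]
        p = y_pred[:, embedding_size:2 * embedding_size]
        n = y_pred[:, 2 * embedding_size:]
        pos_dist = tf.reduce_sum(tf.square(a - p), axis=-1)
        neg_dist = tf.reduce_sum(tf.square(a - n), axis=-1)
        # Push anchor-positive distance below anchor-negative distance by at least `margin`.
        return tf.reduce_mean(tf.maximum(pos_dist - neg_dist + margin, 0.0))

    # The loss ignores y_true, so fit() can be fed dummy labels (e.g. zeros).
    model.compile(optimizer="adam", loss=triplet_loss)
    return model
```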
Well. We’re getting somewhere once you realize that, when you build a Siamese network architecture, you no longer have layers with the names from your base network; instead of all of its constituent layers, you have one GIANT layer which is just named VGGFace or whatever. So when you try to set layer.trainable = True whenever the layer name is in a list of VGGFace layer names…uh…well…it turns out you never encounter any layers by those names, so you never set any layers to be trainable, and it turns out that if you train a neural net with no trainable parameters, it doesn’t learn much. Who knew. But. Anyway. Once you get past that (after an embarrassingly long time) and set layers in the base network to be trainable before you build the Siamese network from it…
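In code, the fix is just to do the unfreezing on the base model itself, before wrapping it. The helper and layer names here are hypothetical stand-ins, since the real names depend on how you load your VGGFace backbone:

```python
# The Siamese model's layer list contains the whole base model as a single entry,
# so the unfreezing has to happen on the base model itself, before it gets wrapped.
base_network = build_vggface_embedder()  # hypothetical helper: VGGFace backbone + embedding head

# Example layer names; the real ones depend on which VGGFace backbone you load.
LAYERS_TO_FINE_TUNE = {"conv5_1", "conv5_2", "conv5_3"}

for layer in base_network.layers:
    layer.trainable = layer.name in LAYERS_TO_FINE_TUNE

# Only now build the Siamese/triplet model around it (see the sketch above).
siamese_model = build_triplet_model(base_network)
```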
This turns out to work much better! I now have a network which does, in fact, have decreased loss and increased accuracy as it trains. I’m in a space where I can actually play with hyperparameters to figure out how to do this best. Yay!
…ok, so, does it get me anywhere in practice? Well, to test that I think I’m actually going to need a corpus of labeled photos, so that I can tell whether, given, say, a photo of WEB Du Bois, the network thinks the most similar photos in the collection are also photos of WEB Du Bois, which is to say…
Alas, metadata.
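(For the record, once labels exist the check itself is simple: nearest-neighbor retrieval in embedding space, along the lines of this sketch, which assumes the collection has already been run through the trained embedder.)

```python
import numpy as np

def most_similar_labels(query_embedding, corpus_embeddings, corpus_labels, top_k=5):
    """Return the labels of the top_k photos nearest to the query in embedding space.

    If the network works, a query photo of WEB Du Bois should come back with
    mostly 'WEB Du Bois' here, and that is exactly the check that needs labeled photos.
    """
    distances = np.linalg.norm(corpus_embeddings - query_embedding, axis=1)
    nearest = np.argsort(distances)[:top_k]
    return [corpus_labels[i] for i in nearest]
```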