Adapting Coursera’s neural style transfer code to localhost

Last time, when making cats from the void, I promised that I’d discuss how I adapted the neural style transfer code from Coursera’s Convolutional Neural Networks course to run on localhost. Here you go!

Step 1: First, of course, download (as python) the script. You’ll also need the nst_utils.py file, which you can access via File > Open.

Step 2: While the Coursera file is in .py format, it’s iPython in its heart of hearts. So I opened a new file and started copying over the bits I actually needed, reading them as I went to be sure I understood how they all fit together. Along the way I also organized them into functions, to clarify where each responsibility happened and give it a name. The goal here was ultimately to get something I could run at the command line via python dpla_cats.py, so that I could find out where it blew up in step 3.
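To give a sense of the shape I was aiming for, the skeleton came out something like this (the function names here are illustrative, not necessarily what I actually called things):

# dpla_cats.py (rough shape only)

def load_images(content_path, style_path):
    ...   # read and normalize the content and style images

def build_costs(model, layer_weights):
    ...   # content cost, style cost, total cost

def train(sess, model, total_cost, iterations):
    ...   # the optimization loop, saving output images as it goes

def main():
    ...   # wire the pieces together (and, eventually, read command-line arguments; see step 6)

if __name__ == '__main__':
    main()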

Step 3: Time to install dependencies. I promptly made a pipenv and, by running the code and seeing which ImportErrors showed up, discovered what I needed to have installed: scipy, pillow, imageio, tensorflow. Whatever versions of the first three happened to be available worked fine, but for tensorflow I pinned to the version used in Coursera — 1.2.1 — because the current (2.x) versions have major breaking API changes.

This turned out to be a bummer, because tensorflow promptly threw warnings that it could be much faster on my system if I compiled it with various flags my computer supports. OK, so I looked up the docs for doing that, which said I needed bazel/bazelisk — but of course I needed a paleolithic version of that for tensorflow 1.2.1 compat, so it was irritating to install — and then running that failed because it needed a version of Java old enough that I didn’t have it, and at that point I gave up because I have better things to do than installing quasi-EOLed Java versions. Updating the code to be compatible with the latest tensorflow version and compiling an optimized version of that would clearly be the right answer, but also it would have been work and I wanted messed-up cat pictures now.

(As for the rest of my dependencies, I ended up with scipy==1.5.4, pillow==8.0.1, and imageio==2.9.0, and then whatever sub-dependencies pipenv installed. Just in case the latest versions don’t work by the time you read this. 🙂)

At this point I had achieved goal 1, aka “getting anything to run at all”.

Step 4: I realized that, honestly, almost everything in nst_utils wanted to be an ImageUtility, initialized with metadata about the content and style files (height, width, channels, paths) and carrying the globals (shudder) originally in nst_utils as class data. This meant that my new dpla_cats script only had to import ImageUtility rather than * (from X import * is, of course, deeply unnerving), and whenever I needed to interact with image-y functions (like creating a generated image or saving outputs) rather than neural-net-ish stuff, that utility already knew how to do it. Everything in nst_utils that properly belonged in an ImageUtility got moved, step by step, into that class; I think one or two functions remained, and those got moved into the main script.
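Here’s a sketch of the general shape it ended up with; the method names and defaults are approximate reconstructions of what nst_utils was doing, not verbatim code:

import imageio
import numpy as np

class ImageUtility:
    """Holds the image-side config that used to live as globals in nst_utils."""

    def __init__(self, content_path, style_path, height, width,
                 channels=3, noise_ratio=0.6):
        self.content_path = content_path
        self.style_path = style_path
        self.height = height
        self.width = width
        self.channels = channels
        self.noise_ratio = noise_ratio
        # the standard VGG mean-pixel values, used to normalize inputs
        self.means = np.array([123.68, 116.779, 103.939]).reshape((1, 1, 1, 3))

    def reshape_and_normalize(self, image):
        # add a batch dimension and subtract the means
        return np.reshape(image, ((1,) + image.shape)) - self.means

    def generate_noise_image(self, content_image):
        # content image plus static, blended by the noise ratio
        noise = np.random.uniform(
            -20, 20, (1, self.height, self.width, self.channels)).astype('float32')
        return noise * self.noise_ratio + content_image * (1 - self.noise_ratio)

    def save_image(self, path, image):
        # undo the normalization, clip back to valid pixels, write to disk
        image = np.clip(image + self.means, 0, 255).astype('uint8')
        imageio.imwrite(path, image[0])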

Step 5: Ughhh, scope. The notebook plays fast and loose with scope; the raw python script is, rightly, not so forgiving. But that meant I had to think about what got defined at what level, what got passed around in an argument, what order things happened in, et cetera. I’m not happy with the result — there’s a lot of stuff that will fail with minor edits — but it works. Scope errors will announce themselves pretty loudly with exceptions; it’s just nice to know you’re going to run into them.

Step 5a: You have to initialize the Adam optimizer before you run sess.run(tf.global_variables_initializer()). (Thanks, StackOverflow!) The error message if you don’t is maddeningly unhelpful. (FailedPreconditionError, I mean, what.)
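Concretely, it’s just an ordering constraint in the TF 1.x API: create the Adam optimizer and its train step first, then run the initializer, because Adam adds internal slot variables that the initializer has to cover. Something like this, where J is the total cost tensor, learning_rate is the learning rate, and sess is the open session:

import tensorflow as tf   # the 1.x API

# build the optimizer FIRST, so that Adam's internal slot variables exist
optimizer = tf.train.AdamOptimizer(learning_rate)
train_step = optimizer.minimize(J)

# ...and only THEN initialize; in the other order you get that
# maddeningly unhelpful FailedPreconditionError
sess.run(tf.global_variables_initializer())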

Step 6: argparse! I spent some quality time reading this neural style implementation early on and thought, gosh, that’s argparse-heavy. Then I found myself wanting to kick off a whole bunch of different script runs to do their thing overnight, investigating multiple hypotheses, and discovered how very much I wanted command-line arguments, so that I could configure all the different things I wanted to try right there and then leave it alone. Aw yeah. I’ve ended up with the following:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--content', required=True)
parser.add_argument('--style', required=True)
parser.add_argument('--iterations', type=int, default=400)         # was 200
parser.add_argument('--learning_rate', type=float, default=3.0)    # was 2.0
parser.add_argument('--layer_weights', nargs=5, type=float, default=[0.2, 0.2, 0.2, 0.2, 0.2])
parser.add_argument('--run_until_steady', default=False)
parser.add_argument('--noisy_start', default=True)

args = parser.parse_args()

content is the path to the content image; style is the path to the style image; iterations and learning_rate are the usual; layer_weights is the value of STYLE_LAYERS in the original code, i.e. how much to weight each layer; run_until_steady is a bad API because it means to ignore the value of the iterations parameter and instead run until there is no longer significant change in cost; and noisy_start is whether to use the content image plus static as the first input or just the plain content image.
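Inside the script, those arguments get threaded through to the existing pieces roughly like so (util being the ImageUtility from step 4; the names here are approximations of my refactor rather than exact signatures, and the conv layer names are the ones the Coursera code uses):

# weight the style layers according to --layer_weights
STYLE_LAYERS = list(zip(
    ['conv1_1', 'conv2_1', 'conv3_1', 'conv4_1', 'conv5_1'],
    args.layer_weights))

# --noisy_start: content image plus static, versus the plain content image
if args.noisy_start:
    initial_image = util.generate_noise_image(content_image)
else:
    initial_image = content_image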

I can definitely see adding more command line flags if I were going to be spending a lot of time with this code. (For instance, a layer_names parameter that adjusted what STYLE_LAYERS considered could be fun! Or making “significant change in cost” be a user-supplied rather than hardcoded parameter!)

Step 6a: Correspondingly, I configured the output filenames to record some of the metadata used to create the image (content, style, layer_weights), to make it easier to keep track of which images came from which script runs.
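Nothing fancy; something along these lines, where the output/ directory and exact format are just whatever I found readable, so treat them as illustrative:

import os

content_name = os.path.splitext(os.path.basename(args.content))[0]
style_name = os.path.splitext(os.path.basename(args.style))[0]
weights = '-'.join(str(w) for w in args.layer_weights)
output_filename = 'output/{}_{}_{}.png'.format(content_name, style_name, weights)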

Stuff I haven’t done but it might be great:

Updating tensorflow, per above, and recompiling it. The slowness is acceptable — I can run quite a few trials on my 2015 MacBook overnight — but it would get frustrating if I were doing a lot of this.

Supporting both num_iterations and run_until_steady means my iterator inside the model_nn function is kind of a mess right now. I think they’re itching to be two very thin subclasses of a superclass that knows all the things about neural net training, with the subclass just handling the iterator, but I didn’t spend a lot of time thinking about this. (There’s a rough sketch of what I mean after this list.)

Reshaping input files. Right now it needs both input files to be the same dimensions. Maybe it would be cool if it didn’t need that.

Trying different pretrained models! It would be easy to pass a different arg to load_vgg_model. It would subsequently be annoying to make sure that STYLE_LAYERS worked — the available layer names would be different, and load_vgg_model makes a lot of assumptions about how that model is shaped.
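For what it’s worth, here’s the rough shape of that subclass idea; entirely hypothetical, since I haven’t actually built it:

class Trainer:
    """Superclass that knows all the neural-net-training things."""

    def train(self, sess, input_image):
        for i in self.steps():
            ...   # run the train step, log the cost, save intermediate images

class FixedIterationsTrainer(Trainer):
    """Thin subclass: stop after a fixed number of iterations."""

    def __init__(self, num_iterations):
        self.num_iterations = num_iterations

    def steps(self):
        return range(self.num_iterations)

class RunUntilSteadyTrainer(Trainer):
    """Thin subclass: keep going until the cost stops changing much."""

    def steps(self):
        ...   # yield until the change in cost drops below some threshold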

As your reward for reading this post, you get another cat image! A friend commented that a thing he dislikes about neural style transfer is that it’s allergic to whitespace; it wants to paint everything with a texture. This makes sense — it sees subtle variations within that whitespace and it tries to make them conform to patterns of variation it knows. This is why I ended up with the noisy_start flag; I wondered what would happen if I didn’t add the static to the initial image, so that the original negative space stayed more negative-spacey.

This, as you can probably tell, uses the Harlem Renaissance style image.

It’s still allergic to negative space — even without the generated static there are variations in pixel color in the original — but they are much subtler, so instead of saying “maybe what I see is coiled hair?” it says “big open blue patches; we like those”. But the semantics of the original image are more in place — the kittens more kitteny, the card more readable — even though the whole image has been pushed more to colorblocks and bold lines.

I find I like the results better without the static — even though the cost function is larger, and thus in a sense the algorithm is less successful. Look, one more.

Superhero!

Dear Internet, merry Christmas; my robot made you cats from the void

Recently I learned how neural style transfer works. I wanted to be able to play with it more and gain some insights, so I adapted the Coursera notebook code to something that works on localhost (more on that in a later post), found myself a nice historical cat image via DPLA, and started mashing it up with all manner of images of varying styles culled from DPLA’s list of primary source sets. (It really helped me that these display images were already curated for looking cool, and cropped to uniform size!)

These sweet babies do not know what is about to happen to them.

Let’s get started, shall we?

Style image from the Fake News in the 1890s: Yellow Journalism primary source set.

I really love how this one turned out. It’s pulled the blue and yellow colors, and the concerned face of the lower kitten was a perfect match for the expression on the right-hand muckraker. The lines of the card have taken on the precise quality of those in the cartoon — strong outlines and textured interiors. “Merry Christmas” the bird waves, like an eager newsboy.

Style image from the Food and Social Justice exhibit.

This is one of the first ones I made, and I was delighted by how it learned the square-iness of its style image. Everything is more snapped to a grid. The colors are bolder, too, cueing off of that dominant yellow. The Christmas banner remains almost readable and somehow heraldic.

Style image from the Truth, Justice, and the American Way primary source set.

How about Christmas of Steel? These kittens have broadly retained their shape (perhaps as the figures in the comic book foreground have organic detail?), but the background holly is more polygon-esque. The colors have been nudged toward primary, and the static of the background has taken on a swirl of dynamic motion lines.

Style image from the Visual Art During the Harlem Renaissance primary source set.

How about starting with something boldly colored and almost abstract? Why look: the kittens have learned a world of black and white and blue, with the background transformed into that stippled texture it picked up from the hair. The holly has gone more colorblocky and the lines bolder.

Style image from the Treaty of Versailles and the End of World War I primary source set.

This one learned its style so aptly that I couldn’t actually tell where the boundary between the second and third images was when I was placing that equals sign. The soft pencil lines, the vertical textures of shadows and jail bars, the fact that all the colors in the world are black and white and orange (the latter mostly in the middle) — these kittens are positively melting before the force of Wilsonian propaganda. Imagine them in the Hall of Mirrors, drowning in gold and reflecting back at you dozens of times, for full nightmare effect.

Style image from the Victorian Era primary source set.

Shall we step back a few decades to something slightly more calming? These kittens have learned to take on soft lines and swathes of pale pink. The holly is perfectly happy to conform itself to the texture of these New England trees. The dark space behind the kittens wonders if, perhaps, it is meant to be lapels.

I totally can’t remember how I found this cropped version of US food propaganda.

And now for kittens from the void.

Brown, it has learned. The world is brown. The space behind the kittens is brown. Those dark stripes were helpfully already brown. The eyes were brown. Perhaps they can be the same brown, a hole dropped through kitten-space.

I thought this was honestly pretty creepy, and I wondered if rerunning the process with different layer weights might help. Each layer of the neural net notices different sorts of things about its image; it starts with simpler things (colors, straight lines), moves through compositions of those (textures, basic shapes), and builds its way up to entire features (faces). The style transfer algorithm looks at each of those layers and applies some of its knowledge to the generated image. So I thought, what if I change the weights? The initial algorithm weights each of five layers equally; I reran it weighted toward the middle layers and entirely ignoring the first layer, in hopes that it would learn a little less about gaping voids of brown.
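In code terms, that just means a different set of weights on STYLE_LAYERS (or, in my localhost version, a different --layer_weights). I haven’t recorded the exact numbers here, so the second set below shows the shape of the change rather than the precise values:

# the original: all five layers weighted equally
STYLE_LAYERS = [('conv1_1', 0.2), ('conv2_1', 0.2), ('conv3_1', 0.2),
                ('conv4_1', 0.2), ('conv5_1', 0.2)]

# the rerun: ignore the first layer entirely, lean on the middle ones
STYLE_LAYERS = [('conv1_1', 0.0), ('conv2_1', 0.3), ('conv3_1', 0.3),
                ('conv4_1', 0.3), ('conv5_1', 0.1)]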

Same thing, less void.

This worked! There’s still a lot of brown, but the kitten’s eye is at least separate from its facial markings. My daughter was also delighted by how both of these images want to be letters; there are lots of letter-ish shapes strewn throughout, particularly on the horizontal line that used to be the edge of a planter, between the lower cat and the demon holly.

So there you go, internet; some Christmas cards from the nightmare realm. May 2021 bring fewer nightmares to us all.