Last time, when making cats from the void, I promised that I’d discuss how I adapted the neural style transfer code from Coursera’s Convolutional Neural Networks course to run on localhost. Here you go!
Step 1: First, of course, download (as python) the script. You’ll also need the
nst_utils.py file, which you can access via File > Open.
Step 2: While the Coursera file is in .py format, it’s IPython in its heart of hearts. So I opened a new file and started copying over the bits I actually needed, reading them as I went to be sure I understood how they all fit together. Along the way I also organized them into functions, to clarify where each responsibility happened and give it a name. The goal here was ultimately to get something I could run at the command line via
python dpla_cats.py, so that I could find out where it blew up in step 3.
Step 3: Time to install dependencies. I promptly made a pipenv and, in running the code and finding what
ImportErrors showed up, discovered what I needed to have installed:
scipy, pillow, imageio, tensorflow. Whatever versions of the first three were available worked fine, but I pinned tensorflow to the version used in Coursera — 1.2.1 — because there are major breaking API changes in the current (2.x) versions.
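The resulting Pipfile would look something like this (a sketch reconstructed from the dependency list above; only the tensorflow and imageio pins come from the text, and the python_version is my guess at what tensorflow 1.2.1 supported):

```toml
[packages]
scipy = "*"
pillow = "*"
imageio = "==2.9.0"
# pinned: 2.x has major breaking API changes
tensorflow = "==1.2.1"

[requires]
python_version = "3.6"
```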
This turned out to be a bummer, because tensorflow promptly threw warnings that it could be much faster on my system if I compiled it with various flags my computer supports. OK, so I looked up the docs for doing that, which said I needed bazel/bazelisk — but of course I needed a paleolithic version of that for tensorflow 1.2.1 compat, so it was irritating to install — and then running that failed because it needed a version of Java old enough that I didn’t have it, and at that point I gave up because I have better things to do than installing quasi-EOLed Java versions. Updating the code to be compatible with the latest tensorflow version and compiling an optimized version of that would clearly be the right answer, but also it would have been work and I wanted messed-up cat pictures now.
(As for the rest of my dependencies, I ended up with
imageio==2.9.0, and then whatever sub-dependencies pipenv installed. I note the versions just in case the latest ones don’t work by the time you read this. 🙂)
At this point I had achieved goal 1, aka “getting anything to run at all”.
Step 4: I realized that, honestly, almost everything in nst_utils wanted to be an
ImageUtility, which was initialized with metadata about the content and style files (height, width, channels, paths), and carried the globals (shudder) originally in nst_utils as class data. This meant that my new dpla_cats script only had to import ImageUtility rather than * (
from X import * is, of course, deeply unnerving), and that utility could pingpong around knowing how to do the things it knew how to do, whenever I needed to interact with image-y functions (like creating a generated image or saving outputs) rather than neural-net-ish stuff. Everything in nst_utils that properly belonged in an ImageUtility got moved, step by step, into that class; I think one or two functions remained, and they got moved into the main script.
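A minimal sketch of that shape, assuming the noise-blending behavior from the course’s nst_utils (the method names and signatures here are my guesses, not the actual class):

```python
import numpy as np

class ImageUtility:
    """Holds the image metadata that nst_utils originally kept in
    module-level globals (a sketch, not the real class)."""

    def __init__(self, height, width, channels, content_path, style_path):
        self.height = height
        self.width = width
        self.channels = channels
        self.content_path = content_path
        self.style_path = style_path

    def generate_noisy_image(self, content_image, noise_ratio=0.6):
        """Blend uniform static with the content image, the way the
        course code's generate_noise_image does."""
        noise = np.random.uniform(
            -20, 20, (1, self.height, self.width, self.channels))
        return noise * noise_ratio + content_image * (1 - noise_ratio)
```

With the metadata living on the instance, the main script can call util.generate_noisy_image(content) without caring about the old globals.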
Step 5: Ughhh, scope. The notebook plays fast and loose with scope; the raw python script is, rightly, not so forgiving. But that meant I had to think about what got defined at what level, what got passed around in an argument, what order things happened in, et cetera. I’m not happy with the result — there’s a lot of stuff that will fail with minor edits — but it works. Scope errors will announce themselves pretty loudly with exceptions; it’s just nice to know you’re going to run into them.
Step 5a: You have to initialize the Adam optimizer before you run
sess.run(tf.global_variables_initializer()). (Thanks, StackOverflow!) The error message if you don’t is maddeningly unhelpful. (
FailedPreconditionError, I mean, what.)
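In TF1 terms, the fix is to build the optimizer before running the initializer, because minimize() is what creates Adam’s internal slot variables; if they don’t exist yet when the initializer runs, they never get initialized, hence the error. A sketch against the 1.x API (assuming a cost tensor J and an open session sess):

```python
# minimize() creates Adam's slot variables (the per-variable moments),
# so it has to happen first...
optimizer = tf.train.AdamOptimizer(learning_rate)
train_step = optimizer.minimize(J)

# ...and only then does the initializer know to initialize them.
# Swapping these two steps is what triggers the FailedPreconditionError.
sess.run(tf.global_variables_initializer())
```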
Step 6: argparse! I spent some quality time reading this
neural style implementation early on and thought, gosh, that’s argparse-heavy. Then I found myself wanting to kick off a whole bunch of different script runs to do their thing overnight investigating multiple hypotheses and discovered how very much I wanted there to be command-line arguments, so I could configure all the different things I wanted to try right there and leave it alone. Aw yeah. I’ve ended up with the following:
parser.add_argument('--content', required=True)
parser.add_argument('--style', required=True)
parser.add_argument('--iterations', default=400)  # was 200
parser.add_argument('--learning_rate', default=3.0)  # was 2.0
parser.add_argument('--layer_weights', nargs=5, default=[0.2, 0.2, 0.2, 0.2, 0.2])
parser.add_argument('--run_until_steady', default=False)
parser.add_argument('--noisy_start', default=True)
content is the path to the content image;
style is the path to the style image;
iterations and learning_rate are the usual;
layer_weights is the value of
STYLE_LAYERS in the original code, i.e. how much to weight each layer;
run_until_steady is a bad API because it means to ignore the value of the
iterations parameter and instead run until there is no longer significant change in cost; and
noisy_start is whether to use the content image plus static as the first input or just the plain content image.
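That “no longer significant change” loop can be sketched like this (my reconstruction; in the real script the loop lives inside model_nn and the threshold is hardcoded):

```python
def run_until_steady(step_cost, threshold=0.01, max_steps=10000):
    """Keep taking training steps until the relative change in cost
    drops below threshold. step_cost() runs one optimizer step and
    returns the new cost. (Sketch; the names are mine.)"""
    prev = step_cost()
    for i in range(1, max_steps):
        cost = step_cost()
        # Stop once the cost is changing by less than threshold per step,
        # relative to where it was.
        if abs(prev - cost) / abs(prev) < threshold:
            return i, cost
        prev = cost
    return max_steps, cost
```

Promoting threshold to a command-line flag would cover the wish about making “significant change in cost” user-supplied rather than hardcoded.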
I can definitely see adding more command line flags if I were going to be spending a lot of time with this code. (For instance, a
layer_names parameter that adjusted what
STYLE_LAYERS considered could be fun! Or making “significant change in cost” be a user-supplied rather than hardcoded parameter!)
Step 6a: Correspondingly, I configured the output filenames to record some of the metadata used to create the image (content, style, layer_weights), to make it easier to keep track of which images came from which script runs.
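The filename scheme works out to something like this (a hypothetical sketch; the post doesn’t spell out the exact format):

```python
from pathlib import Path

def output_filename(content, style, layer_weights, outdir="output"):
    """Bake run metadata into the output filename so each image can be
    traced back to the run that produced it (hypothetical names)."""
    weights = "-".join(str(w) for w in layer_weights)
    stem = f"{Path(content).stem}_{Path(style).stem}_{weights}"
    return str(Path(outdir) / f"{stem}.png")
```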
Stuff I haven’t done but it might be great:
Updating tensorflow, per above, and recompiling it. The slowness is acceptable — I can run quite a few trials on my 2015 MacBook overnight — but it would get frustrating if I were doing a lot of this.
Supporting both num_iterations and run_until_steady means my iterator inside the model_nn function is kind of a mess right now. I think they’re itching to be two very thin subclasses of a superclass that knows all the things about neural net training, with the subclass just handling the iterator, but I didn’t spend a lot of time thinking about this.
Reshaping input files. Right now it needs both input files to be the same dimensions. Maybe it would be cool if it didn’t need that.
Trying different pretrained models! It would be easy to pass a different arg to
load_vgg_model. It would subsequently be annoying to make sure that
STYLE_LAYERS worked — the available layer names would be different, and
load_vgg_model makes a lot of assumptions about how that model is shaped.
As your reward for reading this post, you get another cat image! A friend commented that a thing he dislikes about neural style transfer is that it’s allergic to whitespace; it wants to paint everything with a texture. This makes sense — it sees subtle variations within that whitespace and it tries to make them conform to patterns of variation it knows. This is why I ended up with the
noisy_start flag; I wondered what would happen if I didn’t add the static to the initial image, so that the original negative space stayed more negative-spacey.
This, as you can probably tell, uses the Harlem Renaissance style image.
It’s still allergic to negative space — even without the generated static there are variations in pixel color in the original — but they are much subtler, so instead of saying “maybe what I see is coiled hair?” it says “big open blue patches; we like those”. But the semantics of the original image are more in place — the kittens more kitteny, the card more readable — even though the whole image has been pushed more to colorblocks and bold lines.
I find I like the results better without the static — even though the cost function is larger, and thus in a sense the algorithm is less successful. Look, one more.