Deep Photo Style Transfer | Two Minute Papers #150


Dear Fellow Scholars, this is Two Minute Papers with Károly Zsolnai-Fehér. Let’s have a look at this majestic technique for style transfer on photos. Style transfer is a magical algorithm where we have one photograph with content and one with an interesting style, and the output is a third image with these two photos fused together.

This is typically achieved by a classical machine learning technique that we call a convolutional neural network. The more layers these networks contain, the more powerful they are, and the more capable they are of building an intuitive understanding of an image. We had several earlier episodes on visualizing the inner workings of these neural networks; as always, the links are available in the video description. Don’t miss out, I am sure you’ll be as amazed by the results as I was when I first saw them.
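To make this a bit more concrete, here is a minimal sketch of the classical neural style transfer recipe in PyTorch. This is my own simplified illustration of the general idea, not the photo-specific method from this paper: the layer choices and weights are illustrative, it assumes a pretrained VGG-19 from torchvision, and preprocessing such as ImageNet normalization is omitted for brevity.

import torch
import torch.nn.functional as F
from torchvision.models import vgg19, VGG19_Weights

# A frozen, pretrained VGG-19 serves as the feature extractor.
vgg = vgg19(weights=VGG19_Weights.DEFAULT).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

CONTENT_LAYERS = {21}               # conv4_2 (a common choice)
STYLE_LAYERS = {0, 5, 10, 19, 28}   # conv1_1 .. conv5_1

def extract(x, wanted):
    # Run x through the network, keeping activations at the wanted indices.
    feats = {}
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in wanted:
            feats[i] = x
    return feats

def gram(f):
    # Gram matrix of a (1, C, H, W) feature map -- the style statistic.
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

def style_transfer(content, style, steps=300, style_weight=1e6):
    # Optimize the output image's pixels directly against two targets:
    # match the content image in deep features, and the style image
    # in Gram-matrix statistics across several layers.
    output = content.clone().requires_grad_(True)
    target_c = extract(content, CONTENT_LAYERS)
    target_s = {i: gram(f) for i, f in extract(style, STYLE_LAYERS).items()}
    opt = torch.optim.Adam([output], lr=0.02)
    for _ in range(steps):
        opt.zero_grad()
        feats = extract(output, CONTENT_LAYERS | STYLE_LAYERS)
        c_loss = sum(F.mse_loss(feats[i], target_c[i]) for i in CONTENT_LAYERS)
        s_loss = sum(F.mse_loss(gram(feats[i]), target_s[i]) for i in STYLE_LAYERS)
        (c_loss + style_weight * s_loss).backward()
        opt.step()
    return output.detach()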
These previous neural style transfer techniques work amazingly well if we’re looking for a painterly result. However, for photo style transfer, the closeups here reveal that they introduce unnecessary distortions to the image; the results won’t look realistic anymore. But not with this new one. Have a look at these results. This is absolute insanity. They are just right in some sense; there is an elusive quality to them. And this is the challenge! We not only have to put what we’re searching for into words, but we have to find a mathematical description of these words to make the computer execute it. So what would this definition be? Just think about it; this is a really challenging question.
The authors decided that the photorealism of the output image is to be maximized. Well, this sounds great, but who really knows a rigorous mathematical description of photorealism? One possible solution would be to stipulate that the changes in the output colors have to preserve the ratios and distances of the input style colors. Similar rules are used in linear algebra and computer graphics to make sure shapes don’t get distorted as we’re tormenting them with rotations, translations, and more. We like to call these operations affine transformations.
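As an illustrative example, in my own notation rather than the paper’s, an affine transformation of a pixel’s color can be written as

\[
\begin{pmatrix} R' \\ G' \\ B' \end{pmatrix}
= A \begin{pmatrix} R \\ G \\ B \end{pmatrix} + b,
\qquad A \in \mathbb{R}^{3 \times 3},\; b \in \mathbb{R}^{3}.
\]

Such a map can recolor the image, but it cannot warp or scramble it, which is exactly the kind of behavior we want from a photorealistic style transfer.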
So the fully scientific description would be that we add a regularization term that stipulates that these colors only undergo affine transformations. But we’ve used one more new word here – what does this regularization term mean? It means that there are a ton of different possible solutions for transferring the colors, and we’re trying to steer the optimizer towards solutions that adhere to some additional criterion, in our case, the affine transformations. In the mathematical description of this problem, these additional stipulations appear in the form of a regularization term.
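To show this mechanism in its simplest form, here is a toy PyTorch example, entirely my own and unrelated to the paper’s actual photorealism regularizer: many weight vectors fit the data almost equally well, and a familiar weight-decay term steers the optimizer towards the small ones. In the paper, the analogous term steers the output image towards affine color transformations instead.

import torch

# Many weight vectors explain this noisy data almost equally well.
torch.manual_seed(0)
X = torch.randn(20, 5)
y = X @ torch.tensor([1.0, 2.0, 0.0, 0.0, 0.0]) + 0.01 * torch.randn(20)

w = torch.zeros(5, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.05)
lam = 0.1  # regularization strength: how hard we steer the optimizer

for _ in range(500):
    opt.zero_grad()
    data_loss = ((X @ w - y) ** 2).mean()  # the main objective
    reg_term = lam * (w ** 2).sum()        # the regularization term
    (data_loss + reg_term).backward()      # optimize their sum
    opt.step()

print(w)  # steered towards the small-weight solutions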
I am so happy that you Fellow Scholars have been watching Two Minute Papers for so long that we can finally talk about techniques like this. It’s fantastic to have an audience with this level of understanding of these topics. Love it. Just absolutely love it. The source code of this project is also available.

Also, make sure to have a look at Distill, an absolutely amazing new science journal from the Google Brain team. But this is no ordinary journal, because what they are looking for is not necessarily novel techniques, but novel and intuitive ways of explaining already existing works. There is also an excellent write-up on research debt that can almost be understood as a manifesto for this journal. A worthy read indeed. They also created a prize for science distillation. I love this new initiative and I am sure we’ll hear about this journal a lot in the near future. Make sure to have a look; there is a link to all of these in the video description.

Thanks for watching and for your generous support, and I’ll see you next time!

22 thoughts on “Deep Photo Style Transfer | Two Minute Papers #150”

  1. Those results are astonishing. I wonder how long it will take before media starts using techniques like this over typical video filters?

  2. Hey Karoly, with all that open AI spreading fast around the world, I think you will get really busy in the next months/years 😁

  3. I have a noob question, but I think it's important. For example, say I create a recommendation system with a neural network and it works well: what do I need to save to keep this neural network running? I mean, for example, if I need to move the NN to another server, what do I do? The training takes a lot of time and I already have it trained. Or, in production, does it keep training until it gives a result?

    Sorry if my English is bad, still learning…

  4. The author's code is inefficient, lacks the current Neural-Style features, and uses Matlab. This fork of the code is much more efficient and removes the Matlab dependency: https://github.com/martinbenson/deep-photo-styletransfer. This fork also has mask image creation guides in its wiki.

    Also, do you plan to talk about the "Controlling Perceptual Factors in Neural Style Transfer" research paper? https://arxiv.org/abs/1611.07865, GitHub code here: https://github.com/leongatys/NeuralImageSynthesis. I got the "Scale Control", "Color Control", and "Scale Control For High Resolution" working in the current version of Neural-Style using this code here: https://github.com/ProGamerGov/Neural-Tools

    Neural-Style (Which Deep Photo Style Transfer and NeuralImageSynthesis use): https://github.com/jcjohnson/neural-style

  5. Simply stunning! 🙂 Does anyone have any links to applications that use a technique like this for audio morphing? Something with a user interface if possible. 🙂 Thanks. Also, a video plugin based on this that could do moving images (within the restrictions, of course!) would be amazing 😉

  6. Awesome. There are some apps out on iOS that simulate this effect, but I guess we'll see more Photoshop filters like these in the future.

  7. Does anyone think this could be done with other categories of transformations? That is, asserting that the output is conformal, smooth, continuous, isometric, etc., instead of just affine or linear?

    I think the conformal restriction would lead to really interesting results.

  8. When you show these long tables, it would be nice to have a floating bar which shows which column refers to what. In a paper you can just dart your eyes over to the bottom of the figure to check, but in a video you can't.

  9. I was just thinking yesterday that I would love to read a journal just like the one you mentioned in this episode. I'm really grateful for the mention. Keep up the good work!

  10. You know what I would love? A website where someone takes as many of these kinds of programs and neural networks as they can and converts them to a version that you can use in your browser. I would spend hours on that!

  11. Haha, I don't have any understanding of machine learning. I watch your videos just to see what cool stuff researchers have done.

  12. I just read the "Research Debt" link and I was really struck by the idea of distilling research. I love that I now have a word and a concept for what people like you, Kurzgesagt, TED-Ed, CrashCourse, etc. are doing. It reminded me of the legendary Carl Sagan and his very important work. Very interesting and important!

  13. Love the videos! Just one suggestion: When you display multiple columns of images like around 1:40, it would help to have a quick explanation of what each of the columns represent.

  14. Hi! Thank you so much for this amazing work! Could you tell me, please, what editor did you use to create this video?
