As a mathematician, I love how the word "differentiable" is changing meaning so fast for so many people!
If I had been told a few years ago that the mainstream definition of "differentiable function" would become so different during my lifetime I would not have believed it! Cannot wait to teach a course in "differentiable calculus" and introduce the new stuff.
As a non-mathematician, how did the word "differentiable" change its meaning? Is it not still that one can compute the derivative (so that Gradient Descent can be applied)? (honest question)
The really canonical non-differentiable function is the Takagi/Blancmange Function[0] if you like things deterministic, and the Wiener Process[1] if you like stochastic processes.
Both are everywhere-continuous, nowhere-differentiable.
But the definition of differentiable did not change, did it? ReLU is not differentiable (or, in other words, it is differentiable at all points except 0). Not nitpicking, just trying to improve my understanding.
> But the definition of differentiable did not change, did it?
It did! In the title of this article they are describing a blatantly non-differentiable function (the rasterizer) using the word "differentiable". This is indeed a new usage of the word, only seen since automatic differentiation frameworks became mainstream a few years ago.
To be fair, AI researchers used strictly differentiable functions (as required for back-propagation) until recently. For example, LeNet-5 uses the logistic function.
Only in 2011 did some smart-asses [1] :) experiment with rectifier units and discover that they work even better.
[1] Xavier Glorot, Antoine Bordes, and Yoshua Bengio, "Deep Sparse Rectifier Neural Networks" (2011)
They make the point in their presentation that their method replaces the non-differentiable step function with a differentiable sigmoid function, which is what enables the rasterizer to be differentiable.
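To give a flavour of that substitution (just a toy NumPy sketch of the general sigmoid-for-step idea, not the authors' actual code; the sharpness value is made up):

```python
import numpy as np

def hard_coverage(signed_dist):
    """Hard inside/outside test: a 0/1 step with zero gradient almost everywhere."""
    return (signed_dist > 0).astype(float)

def soft_coverage(signed_dist, sharpness=10.0):
    """Sigmoid relaxation: coverage varies smoothly with the signed distance."""
    return 1.0 / (1.0 + np.exp(-sharpness * signed_dist))

d = np.linspace(-0.5, 0.5, 5)   # signed distance from a pixel centre to an edge
print(hard_coverage(d))         # [0. 0. 0. 1. 1.]
print(soft_coverage(d))         # ~[0.007 0.076 0.5 0.924 0.993]
```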
Added: ah, I see what you mean. In university differential calculus we learned about one-sided derivatives (for ReLU at 0 it's 1 from above and 0 from below), so I assumed ReLU is considered "differentiable", as opposed to the Cantor function or the Dirichlet / Thomae functions...
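For what it's worth, here's what an autodiff framework actually does at the kink (a quick PyTorch check, not from the article):

```python
import torch

# ReLU has one-sided derivatives 0 (from below) and 1 (from above) at x = 0;
# autodiff just picks one value (a valid subgradient) there.
x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # tensor([0., 0., 1.])  -- PyTorch uses 0 at exactly 0
```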
> To other commenters: It’s important not to confuse floating point with mathematics. In floating point, both zeroes are considered to be exactly equal to zero, to each other. The redundant sign bit is “piggybacked” information which retains a sign across a series of multiplies or divisions, and it only affects downstream results in a few exceptional cases (eg dividing non-zero by zero, yielding inf or -inf). The behaviour has been standardized based on various pragmatic considerations, and will not always be consistent with evaluating infinitesimals in mathematics.
I can see the reasoning that log(-0.0) should give the same exceptional result as log(-1). But bear in mind it can arise from something like log( -(a+b)) where a+b evaluates to 0. Whereas ((-a)-b) would be +zero. I.e., the sign of the zero is generally not meaningful when the zero originally arises from a sum or difference. Indeed, in such cases, if you consider a and b to carry “rounding fuzz”, then the proper sign of a+b is indeterminate, and not correlated to the sign bit generated by the add. When zeroes are generated by underflow of mul or div, at least the sign is “correct”.
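To make the signed-zero behaviour concrete (a small Python/NumPy check; the values of a and b are arbitrary):

```python
import math
import numpy as np

a, b = 1.0, -1.0

x = -(a + b)      # a + b rounds to +0.0, so negating it gives -0.0
y = (-a) - b      # evaluates to +0.0

print(x == y)                   # True: both zeroes compare equal
print(math.copysign(1.0, x))    # -1.0: the sign bit of x is set
print(math.copysign(1.0, y))    # 1.0

# The sign only matters in a few exceptional downstream cases,
# e.g. dividing a non-zero value by zero:
with np.errstate(divide="ignore"):
    print(np.divide(1.0, x))    # -inf
    print(np.divide(1.0, y))    # inf
```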
Although this doesn't apply to many modern networks, where the forward pass is done in int8 (not float, thus no +0/-0) and only the backward pass is in float16/32 (common for networks deployed on mobile phones).
Coauthor of the paper here: this tangent does not apply to this post.
The key thing about the actual work linked in this post is that it is pointing out and addressing an important case where discontinuous computations (polygon coverage at edges) are in fact meaningfully differentiable.
Thanks for re-highlighting this point, and for doing it with a pun!
While not directly related to the differentiability question, the idea that you can resolve spatial movement to well below the spatial resolution of pixels in an imager is relevant.
It is possible to resolve displacements as small as e.g. 0.02 pixels [1] by relying on the "natural" anti-aliasing effect of the point-spread function [2], which arises from (1) the optics (even if lenses were free of any aberration, the aperture cannot be infinitely small nor the lens diameter infinitely large) and (2) the fact that pixels don't sample light at a single point.
This enables video motion amplification/magnification, which has been on the front page of HN before [3].
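A rough back-of-the-envelope illustration of why this works (NumPy, with a made-up Gaussian PSF width; not from the linked work):

```python
import numpy as np

pixels = np.arange(-8, 9, dtype=float)   # pixel-centre coordinates
sigma = 1.5                              # assumed Gaussian PSF width, in pixels

def image_of_point(x0):
    """Intensity sampled at each pixel for a point source at position x0."""
    return np.exp(-0.5 * ((pixels - x0) / sigma) ** 2)

shift = 0.02                             # displacement in pixels
delta = image_of_point(shift) - image_of_point(0.0)

# The change is small but far above floating-point precision, roughly the shift
# times the spatial derivative of the PSF at each pixel.
print(np.abs(delta).max())               # ~0.007, clearly nonzero
```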
There, the rasterizer is a physical process that needs to be modeled. Here, the rasterizer is constructed to have the necessary property.
Note that if you had a perfect model of the physical process, and if it had no noise (e.g. no photon noise and no Johnson noise---impossible), and if the pixel intensities were given to infinite precision (as opposed to being discretized by the imager bit depth), then you could resolve arbitrarily small displacements of point light sources. I wonder if that points more to a continuity question than a differentiability question... but I felt compelled to make the connection in any case.
Returning back to your comment:
> case where discontinuous computations (polygon coverage at edges) are in fact meaningfully differentiable
It seems like the rasterizer ends up being a continuous computation, unless we disagree on what counts as the "rasterizer" (I'm including any anti-aliasing strategy, including one that underlyingly makes multiple calls to a discontinuous rasterizer.)
If anyone else here is working on Deep Learning on/with vector graphics, feel free to ping me.
I am not affiliated with the great work linked here - but I have been working on the general topic as a side project for a few years.
I'm fairly new to the topic but I'm interested in 3D and VR and generative stuff and I've always yearned to see ways to apply crazy GAN stuff to immersive environments.
One way is volumetric rendering or distance fields. I've been keeping my eye on NeRF but it seems a long way from being useful for anything realtime.
Vectors and mesh representations seem somewhat neglected and this project has given me some insight into the technical problems.
I've finally got CUDA running on WSL so I intend to start having a play with things. I've also got beta access to GPT-3 and I've been curious if it can do anything interesting in terms of generating structured data. Can it manipulate symbolic scene descriptions or generate meaningful vector paths? A few experiments with SVG haven't been terribly promising.
I'd love to hear your thoughts on the general topic and any suggestions for things to look into.
Some letterforms are not topologically equivalent and cannot be transformed into each other. For a simple example, lower case 'g' has a form in some fonts that is a loop with a descender, and in other fonts a 'two story' form where the descender connects to a lower loop.
> Some letterforms are not topologically equivalent and cannot be transformed into each other.
But this does not seem like a fundamental limitation, just a temporary implementation detail. If you represent shapes implicitly using a condition like "u(x,y)>0", then there are no topology changes involved and you can dance between all the letters just by smoothly changing the values of u.
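Something like this (a minimal NumPy/SciPy sketch with hypothetical disc shapes, not from the article): the region {u > 0} changes topology, but the values of u change smoothly throughout.

```python
import numpy as np
from scipy.ndimage import label

xs, ys = np.meshgrid(np.linspace(-2, 2, 256), np.linspace(-2, 2, 256))

def disc(cx, cy, r):
    """Field that is positive inside a disc of radius r centred at (cx, cy)."""
    return r - np.hypot(xs - cx, ys - cy)

u_one = disc(0.0, 0.0, 1.0)                    # one blob
u_two = np.maximum(disc(-1.0, 0.0, 0.6),       # union of two blobs
                   disc(+1.0, 0.0, 0.6))

for t in np.linspace(0.0, 1.0, 6):
    u_t = (1 - t) * u_one + t * u_two          # u varies smoothly in t
    n_components = label(u_t > 0)[1]           # topology of the region {u_t > 0}
    print(f"t={t:.1f}: {n_components} connected component(s)")
```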
Wonderful example, and this is why signed-distance functions/fields (SDF) are used. The diagram in https://en.wikipedia.org/wiki/Level-set_method might help explain how such a representation can "uniformly" handle topology changes.
I'll excerpt:
> The advantage of the level-set model is that one can perform numerical computations involving curves and surfaces on a fixed Cartesian grid without having to parameterize these objects (this is called the Eulerian approach).[1] Also, the level-set method makes it very easy to follow shapes that change topology, for example, when a shape splits in two, develops holes, or the reverse of these operations. All these make the level-set method a great tool for modeling time-varying objects, like inflation of an airbag, or a drop of oil floating in water.
It is unclear to me what you mean by "implementation detail". In order for an SDF to be used in vector graphics, it must be represented symbolically (the alternative is to rasterize it, as implied by "level-set method", which means you must commit to a spatial resolution).
But I suspect you then have the same problem on your lap: how to take derivatives with respect to structural changes in your symbolic SDF?
> the alternative is to rasterize it, as implied by "level-set method", which means you must commit to a spatial resolution
Rasterizing is alright, since then there are no structural changes, just independent pixel values. Notice that rasterizing the function does not commit you to an output resolution; you can always interpolate it at an arbitrarily high resolution for the rendering step.
Not really, why? An interpolated rasterization is just a particular case of vector graphics representation.
As long as the represented objects fit well in the rasterized distance field (of a fixed resolution), you can sample the level curve to an arbitrarily high resolution, just like any other kind of vector graphics.
You can do that in the same way that you can upscale a raster image to arbitrarily high resolutions by interpolating.
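Concretely, something like this (a minimal NumPy/SciPy sketch with a hypothetical disc-shaped field; the grid sizes are made up):

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Coarse 32x32 sampling of a signed distance field for a disc of radius 0.7.
n = 32
ys, xs = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")
coarse = 0.7 - np.hypot(xs, ys)           # positive inside, negative outside

# Query the field on a 512x512 grid by mapping output coordinates into the
# coarse grid's index space and interpolating (order=1 is bilinear).
m = 512
t = np.linspace(0, n - 1, m)
qy, qx = np.meshgrid(t, t, indexing="ij")
fine = map_coordinates(coarse, [qy, qx], order=1)

mask = fine > 0                            # the shape rendered at 512x512
print(mask.shape, mask.sum())
```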
In my mind, it's not just that you can zoom into vector graphics without seeing pixels. It's also that you can make small adjustments to the input parameters. If you're talking about sampling a grid at resolutions on the order of ~floating point epsilon (seems akin to using fixed-point numbers for parameters), then I guess you have a point.
But any shape can be made (almost) visually identical to another shape without changing the topology. For example a C can become an O if the gap becomes arbitrarily small.
Does anyone remember the stuff Douglas Hofstadter wrote about Metafont, human creativity and artificial intelligence?
I was wondering recently whether they might not age terribly well. Some of the things he claimed would be very difficult for a machine now seem to be within reach.