As a mathematician, I love how the word "differentiable" is changing meaning so fast for so many people!
If I had been told a few years ago that the mainstream definition of "differentiable function" would become so different during my lifetime I would not have believed it! Cannot wait to teach a course in "differentiable calculus" and introduce the new stuff.
As a non-mathematician, how did the word "differentiable" change its meaning? Is it not still that one can compute the derivative (so that Gradient Descent can be applied)? (honest question)
The really canonical non-differentiable function is the Takagi/Blancmange Function[0] if you like things deterministic, and the Wiener Process[1] if you like stochastic processes.
Both are everywhere-continuous, nowhere-differentiable.
But the definition of differentiable did not change, did it? ReLU is not differentiable (or, in other words, it is differentiable at all points except 0). Not nitpicking, just trying to improve my understanding.
> But the definition of differentiable did not change, did it?
It did! In the title of this article they are describing a blatantly non-differentiable function (the rasterizer) using the word "differentiable". This is indeed a new usage of the word, only seen since automatic differentiation frameworks became mainstream a few years ago.
To be fair, AI researchers used strictly differentiable functions (as required for back-propagation) until recently. For example, LeNet-5 uses the logistic function.
Only in 2011 did some smart-asses [1] :) experiment with rectifier units and discover that they work even better.
[1] Xavier Glorot, Antoine Bordes, and Yoshua Bengio, "Deep Sparse Rectifier Neural Networks" (2011)
They make the point in their presentation that their method replaces the non-differentiable step function with a differentiable sigmoid function, which is what enables the rasterizer to be differentiable.
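To give a flavour of that substitution (just a toy NumPy sketch of the general sigmoid-for-step idea, not the authors' actual code; the sharpness value is made up):

```python
import numpy as np

def hard_coverage(signed_dist):
    """Hard inside/outside test: a 0/1 step with zero gradient almost everywhere."""
    return (signed_dist > 0).astype(float)

def soft_coverage(signed_dist, sharpness=10.0):
    """Sigmoid relaxation: coverage varies smoothly with the signed distance."""
    return 1.0 / (1.0 + np.exp(-sharpness * signed_dist))

d = np.linspace(-0.5, 0.5, 5)   # signed distance from a pixel centre to an edge
print(hard_coverage(d))         # [0. 0. 0. 1. 1.]
print(soft_coverage(d))         # ~[0.007 0.076 0.5 0.924 0.993]
```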
Added: ah, I see what you mean. In university differential calculus we learned about one-sided derivatives (for ReLU at 0 it's 1 from above and 0 from below), so I assumed ReLU is considered "differentiable", as opposed to the Cantor function or the Dirichlet / Thomae functions...
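For what it's worth, here's what an autodiff framework actually does at the kink (a quick PyTorch check, not from the article):

```python
import torch

# ReLU has one-sided derivatives 0 (from below) and 1 (from above) at x = 0;
# autodiff just picks one value (a valid subgradient) there.
x = torch.tensor([-1.0, 0.0, 1.0], requires_grad=True)
torch.relu(x).sum().backward()
print(x.grad)   # tensor([0., 0., 1.])  -- PyTorch uses 0 at exactly 0
```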
> To other commenters: It’s important not to confuse floating point with mathematics. In floating point, both zeroes are considered to be exactly equal to zero, to each other. The redundant sign bit is “piggybacked” information which retains a sign across a series of multiplies or divisions, and it only affects downstream results in a few exceptional cases (eg dividing non-zero by zero, yielding inf or -inf). The behaviour has been standardized based on various pragmatic considerations, and will not always be consistent with evaluating infinitesimals in mathematics.
I can see the reasoning that log(-0.0) should give the same exceptional result as log(-1). But bear in mind it can arise from something like log( -(a+b)) where a+b evaluates to 0. Whereas ((-a)-b) would be +zero. I.e., the sign of the zero is generally not meaningful when the zero originally arises from a sum or difference. Indeed, in such cases, if you consider a and b to carry “rounding fuzz”, then the proper sign of a+b is indeterminate, and not correlated to the sign bit generated by the add. When zeroes are generated by underflow of mul or div, at least the sign is “correct”.
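To make the signed-zero behaviour concrete (a small Python/NumPy check; the values of a and b are arbitrary):

```python
import math
import numpy as np

a, b = 1.0, -1.0

x = -(a + b)      # a + b rounds to +0.0, so negating it gives -0.0
y = (-a) - b      # evaluates to +0.0

print(x == y)                   # True: both zeroes compare equal
print(math.copysign(1.0, x))    # -1.0: the sign bit of x is set
print(math.copysign(1.0, y))    # 1.0

# The sign only matters in a few exceptional downstream cases,
# e.g. dividing a non-zero value by zero:
with np.errstate(divide="ignore"):
    print(np.divide(1.0, x))    # -inf
    print(np.divide(1.0, y))    # inf
```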
Although this doesn't apply to many modern networks, where the forward pass is done in int8 (not float, thus no +0/-0) and only the backward pass is in float16/32 (common for networks deployed on mobile phones).
Coauthor of the paper here: this tangent does not apply to this post.
The key thing about the actual work linked in this post is that it is pointing out and addressing an important case where discontinuous computations (polygon coverage at edges) are in fact meaningfully differentiable.
Thanks for re-highlighting this point, and for doing it with a pun!
While not directly related to the differentiability question, the idea that you can resolve spatial movement to well below the spatial resolution of pixels in an imager is relevant.
It is possible to resolve displacements as small as e.g. 0.02 pixels [1] by relying on the "natural" anti-aliasing effect of the point-spread function [2], which arises from (1) the optics (even if lenses were free of any aberration, the aperture cannot be infinitely small nor the lens diameter infinitely large) and (2) the fact that pixels don't sample light at a single point.
This enables video motion amplification/magnification, which has been on the front page of HN before [3].
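A rough back-of-the-envelope illustration of why this works (NumPy, with a made-up Gaussian PSF width; not from the linked work):

```python
import numpy as np

pixels = np.arange(-8, 9, dtype=float)   # pixel-centre coordinates
sigma = 1.5                              # assumed Gaussian PSF width, in pixels

def image_of_point(x0):
    """Intensity sampled at each pixel for a point source at position x0."""
    return np.exp(-0.5 * ((pixels - x0) / sigma) ** 2)

shift = 0.02                             # displacement in pixels
delta = image_of_point(shift) - image_of_point(0.0)

# The change is small but far above floating-point precision, roughly the shift
# times the spatial derivative of the PSF at each pixel.
print(np.abs(delta).max())               # ~0.007, clearly nonzero
```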
There, the rasterizer is a physical process that needs to be modeled. Here, the rasterizer is constructed to have the necessary property.
Note that if you had a perfect model of the physical process, and if it had no noise (e.g. no photon noise and no Johnson noise---impossible), and if the pixel intensities were given to infinite precision (as opposed to being discretized by the imager bit depth), then you could resolve arbitrarily small displacements of point light sources. I wonder if that points more to a continuity question than a differentiability question... but I felt compelled to make the connection in any case.
Returning back to your comment:
> case where discontinuous computations (polygon coverage at edges) are in fact meaningfully differentiable
It seems like the rasterizer ends up being a continuous computation, unless we disagree on what counts as the "rasterizer" (I'm including any anti-aliasing strategy, including one that underlyingly makes multiple calls to a discontinuous rasterizer.)
If anyone else here is working on Deep Learning on/with vector graphics, feel free to ping me.
I am not affiliated with the great work linked here - but I have been working on the general topic as a side project for a few years.
I'm fairly new to the topic but I'm interested in 3D and VR and generative stuff and I've always yearned to see ways to apply crazy GAN stuff to immersive environments.
One way is volumetric rendering or distance fields. I've been keeping my eye on NeRF but it seems a long way from being useful for anything realtime.
Vectors and mesh representations seem somewhat neglected and this project has given me some insight into the technical problems.
I've finally got CUDA running on WSL so I intend to start having a play with things. I've also got beta access to GPT-3 and I've been curious if it can do anything interesting in terms of generating structured data. Can it manipulate symbolic scene descriptions or generate meaningful vector paths? A few experiments with SVG haven't been terribly promising.
I'd love to hear your thoughts on the general topic and any suggestions for things to look into.
Some letterforms are not topologically equivalent and cannot be transformed into each other. For a simple example, lower case 'g' has a form in some fonts that is a loop with a descender, and in other fonts a 'two story' form where the descender connects to a lower loop.
> Some letterforms are not topologically equivalent and cannot be transformed into each other.
But this does not seem like a fundamental limitation, just a temporary implementation detail. If you represent shapes implicitly using a condition like "u(x,y)>0", then there are no topology changes involved and you can dance between all the letters just by smoothly changing the values of u.
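Something like this (a minimal NumPy/SciPy sketch with hypothetical disc shapes, not from the article): the region {u > 0} changes topology, but the values of u change smoothly throughout.

```python
import numpy as np
from scipy.ndimage import label

xs, ys = np.meshgrid(np.linspace(-2, 2, 256), np.linspace(-2, 2, 256))

def disc(cx, cy, r):
    """Field that is positive inside a disc of radius r centred at (cx, cy)."""
    return r - np.hypot(xs - cx, ys - cy)

u_one = disc(0.0, 0.0, 1.0)                    # one blob
u_two = np.maximum(disc(-1.0, 0.0, 0.6),       # union of two blobs
                   disc(+1.0, 0.0, 0.6))

for t in np.linspace(0.0, 1.0, 6):
    u_t = (1 - t) * u_one + t * u_two          # u varies smoothly in t
    n_components = label(u_t > 0)[1]           # topology of the region {u_t > 0}
    print(f"t={t:.1f}: {n_components} connected component(s)")
```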
Wonderful example, and this is why signed-distance functions/fields (SDF) are used. The diagram in https://en.wikipedia.org/wiki/Level-set_method might help explain how such a representation can "uniformly" handle topology changes.
I'll excerpt:
> The advantage of the level-set model is that one can perform numerical computations involving curves and surfaces on a fixed Cartesian grid without having to parameterize these objects (this is called the Eulerian approach).[1] Also, the level-set method makes it very easy to follow shapes that change topology, for example, when a shape splits in two, develops holes, or the reverse of these operations. All these make the level-set method a great tool for modeling time-varying objects, like inflation of an airbag, or a drop of oil floating in water.
It is unclear to me what you mean by "implementation detail". In order for an SDF to be used in vector graphics, it must be represented symbolically (the alternative is to rasterize it, as implied by "level-set method", which means you must commit to a spatial resolution).
But I suspect you then have the same problem on your lap: how to take derivatives with respect to structural changes in your symbolic SDF?
> the alternative is to rasterize it, as implied by "level-set method", which means you must commit to a spatial resolution
Rasterizing is alright, since then there are no structural changes, just independent pixel values. Notice that rasterizing the function does not commit you to an output resolution; you can always interpolate it at an arbitrarily high resolution for the rendering step.
Not really, why? An interpolated rasterization is just a particular case of vector graphics representation.
As long as the represented objects fit well in the rasterized distance field (of a fixed resolution), you can sample the level curve to an arbitrarily high resolution, just like any other kind of vector graphics.
You can do that in the same way that you can upscale a raster image to arbitrarily high resolutions by interpolating.
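Concretely, something like this (a minimal NumPy/SciPy sketch with a hypothetical disc-shaped field; the grid sizes are made up):

```python
import numpy as np
from scipy.ndimage import map_coordinates

# Coarse 32x32 sampling of a signed distance field for a disc of radius 0.7.
n = 32
ys, xs = np.meshgrid(np.linspace(-1, 1, n), np.linspace(-1, 1, n), indexing="ij")
coarse = 0.7 - np.hypot(xs, ys)           # positive inside, negative outside

# Query the field on a 512x512 grid by mapping output coordinates into the
# coarse grid's index space and interpolating (order=1 is bilinear).
m = 512
t = np.linspace(0, n - 1, m)
qy, qx = np.meshgrid(t, t, indexing="ij")
fine = map_coordinates(coarse, [qy, qx], order=1)

mask = fine > 0                            # the shape rendered at 512x512
print(mask.shape, mask.sum())
```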
In my mind, it's not just that you can zoom into vector graphics without seeing pixels. It's also that you can make small adjustments to the input parameters. If you're talking about sampling a grid at resolutions on the order of ~floating point epsilon (seems akin to using fixed-point numbers for parameters), then I guess you have a point.
But any shape can be made (almost) visually identical to another shape without changing the topology. For example a C can become an O if the gap becomes arbitrarily small.
Does anyone remember the stuff Douglas Hofstadter wrote about Metafont, human creativity and artificial intelligence?
I was wondering recently whether they might not age terribly well. Some of the things he claimed would be very difficult for a machine now seem to be within reach.