For functions that are not differentiable at some point you can use the subdifferential [0], which gives a set of valid slopes (each one is called a subgradient) rather than a single value. In your example this would be the interval [-2,1].
In practice, from what I've read, it seems to be fine to just take any value in that interval: implementations typically return one of the one-sided derivatives, or something in between such as their average. It helps that these points are rarely hit exactly.
For more detail, check out Chapter 6 [1], Section 6.3 "Hidden Units" in The Deep Learning book.
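To make that concrete, here is a minimal sketch in Python. I don't know the exact function from your example, so `f(x) = max(-2x, x)` is just a hypothetical stand-in that has the same slopes (-2 on the left, 1 on the right) with a kink at 0; the helper names are mine as well.

```python
def f(x):
    # Hypothetical convex function with a kink at x = 0:
    # slope -2 for x < 0, slope 1 for x > 0.
    return max(-2.0 * x, x)


def subdifferential(x):
    """Return the subdifferential of f at x as a closed interval (lo, hi)."""
    if x < 0:
        return (-2.0, -2.0)  # differentiable here: a single slope
    if x > 0:
        return (1.0, 1.0)    # differentiable here: a single slope
    return (-2.0, 1.0)       # at the kink: every slope in [-2, 1]


def subgradient(x, t=0.5):
    """Pick one subgradient from the interval.

    t=0 gives the left slope, t=1 the right slope, t=0.5 their average.
    """
    lo, hi = subdifferential(x)
    return lo + t * (hi - lo)


def is_subgradient(g, x0, points):
    # Subgradient inequality: f(y) >= f(x0) + g * (y - x0) for every y.
    # A small tolerance absorbs floating-point rounding.
    return all(f(y) >= f(x0) + g * (y - x0) - 1e-12 for y in points)
```

At the kink, `subdifferential(0.0)` is the whole interval and `subgradient(0.0)` picks the midpoint. Any slope inside [-2, 1] passes `is_subgradient` at 0, while a slope outside it (say 1.5) does not, which is why any choice within the interval is acceptable.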
[0] https://en.wikipedia.org/wiki/Subderivative
[1] http://www.deeplearningbook.org/contents/mlp.html