Using differences to avoid non-linearity is a common technique across EE, especially in circuit design. Transistors are very non-linear, and circuits like amplifiers have problematic squared terms. However, they're almost always built "differentially": there are both positive and negative input terminals, and positive and negative output terminals. The difference of the output voltages is an amplified version of the difference of the input voltages. Because the squared term has the same polarity at each terminal, subtracting the two outputs cancels it very well. In practice, you're limited by how well the positive and negative paths match, and any mismatch lets some second-order term leak through. Unfortunately, this does not help with the third-order terms: odd-order terms flip sign along with the input, so they survive the subtraction.
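To make the cancellation concrete, here is a minimal numerical sketch. It assumes each half of the amplifier behaves like a polynomial with made-up coefficients a1, a2, a3, and it models mismatch as a small error in a2 on the negative path; the function names and all numbers are hypothetical, chosen only for illustration.

```python
def amp(x, a1=10.0, a2=0.5, a3=0.1):
    """Hypothetical single-ended stage: linear gain plus 2nd- and 3rd-order terms."""
    return a1 * x + a2 * x**2 + a3 * x**3


def differential(vin, mismatch=0.0):
    """Drive two stages with +vin/2 and -vin/2, then subtract their outputs.

    `mismatch` is the fractional error in a2 on the negative path
    (an assumed, simplified model of imperfect matching).
    """
    pos = amp(+vin / 2)
    neg = amp(-vin / 2, a2=0.5 * (1.0 + mismatch))
    return pos - neg


vin = 0.2
ideal = 10.0 * vin                      # what a perfectly linear amplifier would give
matched = differential(vin)             # perfectly matched halves
mismatched = differential(vin, mismatch=0.05)

# With perfect matching only the third-order term is left: 2 * a3 * (vin/2)**3
print("error, matched:        ", matched - ideal)
# With 5% mismatch a bit of second-order leaks: -a2 * mismatch * (vin/2)**2
print("extra error, mismatched:", mismatched - matched)
```

The squared terms are identical on both paths, so they drop out of `pos - neg`; the cubic terms have opposite signs and add instead.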
Differential measurements cancelling out unwanted terms is a common "trick" in electronics, and explains why Ethernet cables come wired as 4 twisted pairs.
Along similar lines, the "Translinear Principle" [1] may be another fun mathematical-relationship-turned-practical-electronics-tool to explore: it takes advantage of the logarithmic relationship between voltage and current in a PN junction -- specifically a bipolar junction transistor (BJT).
Since log(X) + log(Y) = log(X*Y), you can make very simple analog circuits which compute, for example, a square [2] or a square root [3] using just 4 transistors. These can actually be quite high-performance, low-power-consumption circuits. (JavaScript simulations attached; just click the link and then "Run DC Sweep".)
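As a rough illustration of the log-sum idea (not the linked circuits themselves), the sketch below assumes four ideal, identical BJTs obeying Vbe = VT * ln(Ic / IS) and a translinear loop in which two junctions carry the input current, one carries a reference current, and one carries the output. The values of VT and IS and the function names are assumptions for the example.

```python
import math

VT = 0.02585   # thermal voltage near 300 K, in volts (assumed)
IS = 1e-14     # saturation current, assumed identical for all four devices


def vbe(ic):
    """Ideal BJT law: base-emitter voltage for collector current ic."""
    return VT * math.log(ic / IS)


def ic_from_vbe(v):
    """Inverse of vbe(): collector current for a given base-emitter voltage."""
    return IS * math.exp(v / VT)


def translinear_square(i_in, i_ref):
    """Solve the loop equation Vbe(i_in) + Vbe(i_in) = Vbe(i_out) + Vbe(i_ref)."""
    v_out = 2 * vbe(i_in) - vbe(i_ref)
    return ic_from_vbe(v_out)


i_out = translinear_square(i_in=2e-3, i_ref=1e-3)
print(i_out, (2e-3) ** 2 / 1e-3)   # both come out at about 4 mA
```

Adding the two input-side junction voltages adds logarithms, so the output current comes out as i_in^2 / i_ref. Note that VT and IS cancel entirely, which only happens if the devices are matched and at the same temperature.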
Why haven't analog circuits taken off yet in mobile computing and other spaces? They seem great for things like machine learning and possibly superior to traditional digital circuits. What are the trade-offs?
It isn't so much that analog is expensive, but digital gates are really really really cheap, robust to noise and interference (except for designed-in noise like floating point quantization error), and often field-configurable/programmable/etc. This isn't entirely fair, but:
ANALOG: a TL082 op-amp has ~30 analog transistors and costs $0.10 = $0.003 per transistor.
DIGITAL: a GeForce 2080 Ti has ~18.6 billion digital transistors and MSRP $799 = $0.00000004 per transistor.
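For anyone who wants to check the arithmetic, here is the same comparison as a tiny script. It takes the part prices and transistor counts quoted above at face value; they are ballpark figures, not verified data.

```python
# Figures as quoted in the comment above (approximate, not verified).
tl082_cost = 0.10 / 30        # dollars per analog transistor
gpu_cost = 799 / 18.6e9       # dollars per digital transistor
ratio = tl082_cost / gpu_cost

print(f"analog:  ${tl082_cost:.4f} per transistor")
print(f"digital: ${gpu_cost:.2e} per transistor")
print(f"ratio:   about {ratio:,.0f}x")
```

On these numbers the gap is between four and five orders of magnitude.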
The kind of analog computation I'm demonstrating requires that the 4 BJT transistors be well matched to each other, which can be expensive in terms of silicon die area. Digital computation doesn't require well-matched transistors. It requires that each of the individual transistors is just barely good enough relative to some absolute threshold. (And if it doesn't, we'll just clock the chip slower!)
Of course, there are tons of critical analog circuits in your smartphone. The RF interfaces for cell/wifi/BT, the capacitive touchscreen, the cameras, the battery management system, audio, display... But given the cost difference and other advantages, it makes sense to do only what must be done in the analog domain, then convert to digital and do the rest.
Propagation of noise is a big deal. "Noise immunity" is one of the defining features of digital systems. The copy is identical to the original, even over many generations of copying. While ML can tolerate some noise, the sheer amount that can pile up from one end to the other of a signal chain might still give pause.
This is why some of the pioneers of information theory and digital systems worked for the phone company. ;-)
heh ... way ... way before digital computers were invented, analog computers, both mechanical and electromechanical, were implemented. A great example is the comparison circuit that keeps your house at a nearly constant temperature. It started out as a bimetallic strip and a mercury switch to turn a furnace on or off. Electronic circuits doing similar but more complex things came later.
If you think about it, the fastest sorting algorithm is O(n) via a mechanical device. Parallel versions of these sorting mechanisms are used to bin fruits and vegetables for packing and shipping. They are all analog computers.
Power is a big deal with analog. Power wasted in heating a device is (I^2)R where I is the current through and R is the effective resistance of the device. Power wasted is obviously 0 if R=0. Because I=V/R, power wasted is also 0 if R is very high (because in that case I approaches 0).
Power wasted is highest if R is between 0 and "very high." This third case is where analog electronics like to live. The first two cases are where digital electronics lives. And that's why switching power supplies and Class D amplifiers are so efficient: Their effective R values spend most of the time near zero or "high" but not much time in between.
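A quick way to see the three cases is to put the device in series with a load across a fixed supply, which is my own simplifying assumption here (the supply voltage, load value, and function name are made up for illustration). The power burned in the device is then I^2 * R with I = V / (R_load + R).

```python
def switch_power(v_supply, r_load, r_switch):
    """Power dissipated in a device of resistance r_switch in series with r_load."""
    i = v_supply / (r_load + r_switch)
    return i**2 * r_switch


V_SUPPLY = 12.0   # volts (assumed)
R_LOAD = 8.0      # ohms (assumed)

for r_sw in (0.01, 8.0, 1e6):   # "fully on", in between, "fully off"
    p = switch_power(V_SUPPLY, R_LOAD, r_sw)
    print(f"R = {r_sw:>9} ohm -> {p:.6f} W wasted in the device")
```

In this simple model the loss peaks when the device resistance equals the load resistance, and it falls toward zero at both extremes, which is where a switching design tries to stay.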
This is also why even in digital circuits higher clock speed means more power: More transistor transitions between "fully on" and "fully off" mean more time spent in between in the high power dissipation regime.
Back in the late 90s, a mechanical engineer on the MIT solar car team explained the latest high-efficiency motor controllers for brushless DC motors to me. My degree is also in Mechanical Engineering, so this could be way off. As it was explained to me, the highest-efficiency brushless DC motor controllers of the time used a phase-locked loop to line up some of the off periods of the square wave in the power conversion stage with the switches between coils in the motor. Transistors don't switch instantly, and a good deal of your power loss occurs during the transition between fully off and fully on. By synchronizing the two parts of the motor controller, you get the coil changes to happen when that part of the controller is seeing dips in its power input, so you get less loss.
To answer the post's final question: Yes. It is quite common to design instrumentation to cancel systematic uncertainties or nonlinearities to leading order. Fancier arrangements go to higher orders.
In the particular case of the differential capacitor, that form of differential measurement is particularly common -- if one non-linear system can be paired with a symmetric partner with equal and opposite nonlinearity, the leading-order non-linearities are suppressed (to the extent that the matched pair is actually matched).
See: https://en.wikipedia.org/wiki/Differential_amplifier https://en.wikipedia.org/wiki/Third-order_intercept_point