The central limit theorem tells us that unimodal data shaped by many independent, additive sources of error tends towards a normal distribution. That makes the normal distribution a good first-pass descriptive model in a great many contexts, and standard deviation is a natural way to summarize normally distributed data.
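As a rough illustration of that claim, here is a minimal simulation sketch. It assumes each measurement is the sum of 30 independent uniform errors; the function names, the seed, and the sample sizes are hypothetical choices for this example, not anything prescribed by the theorem. If the sums behave approximately normally, about 68% of them should fall within one standard deviation of the mean and about 95% within two.

```python
import random
import statistics


def simulate_measurements(n_measurements=20000, n_error_sources=30, seed=0):
    """Each measurement is the sum of many small, independent uniform errors."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        sum(rng.uniform(-1.0, 1.0) for _ in range(n_error_sources))
        for _ in range(n_measurements)
    ]


def fraction_within(values, k):
    """Fraction of values lying within k standard deviations of the mean."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)


measurements = simulate_measurements()
within_1sd = fraction_within(measurements, 1)
within_2sd = fraction_within(measurements, 2)

# For a normal distribution these are roughly 0.68 and 0.95.
print(f"within 1 sd: {within_1sd:.3f}")
print(f"within 2 sd: {within_2sd:.3f}")
```

No individual error source here is normal (each is flat), yet the sums should land close to the normal proportions, which is why standard deviation is such a useful summary for this kind of data.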
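Squaring the error isn't just a convenient way to remove the sign; it is motivated by how closely many data sets conform to the central limit theorem, because the squared deviation is the natural measure of spread for a normal distribution.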
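One way to see the link between squared error and the normal distribution is through the likelihood. The sketch below assumes $n$ independent observations $x_1, \dots, x_n$ drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$; these symbols are introduced here for illustration only.

```latex
\begin{aligned}
p(x_1, \dots, x_n \mid \mu, \sigma)
  &= \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\!\left( -\frac{(x_i - \mu)^2}{2\sigma^2} \right) \\
\log p(x_1, \dots, x_n \mid \mu, \sigma)
  &= -\frac{n}{2} \log\!\left( 2\pi\sigma^2 \right)
     - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 \\
\hat{\sigma}^2
  &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2
\end{aligned}
```

The data enter the log-likelihood only through the sum of squared deviations, so choosing the $\mu$ that best fits the data is the same as minimizing squared error, and the best-fitting $\sigma^2$ is the mean squared deviation. Under a normal model, then, squaring falls out of the distribution itself.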