The article mentions the fairly familiar fact that "median is to L1 as mean is to L2" -- i.e., the mean is the point that minimizes the total squared error, and the median is a point that minimizes the total absolute error.
I was telling my 8-year-old daughter about that the other day, and it occurred to me that the other "measure of central tendency" children get taught about, the mode, also fits into this scheme, with a tiny bit of fudging. Define the L0 error to be the sum of the 0th powers of the errors, with the (unusual) convention that 0^0=0. In other words, the L0 error is just the number of data points you don't get exactly right. Then the mode is the value that minimizes the L0 error. So: L0 : L1 : L2 :: mode : median : mean.
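To make that concrete, here is a quick brute-force check in Python (just a sketch; the toy data and the grid of candidate centres are mine):

    import numpy as np

    data = np.array([2, 2, 3, 4, 5, 6, 7])

    def lp_error(c, p):
        # Sum of p-th powers of absolute errors, with the 0^0 = 0 convention.
        errs = np.abs(data - c)
        if p == 0:
            return np.count_nonzero(errs)  # number of points not hit exactly
        return float(np.sum(errs ** p))

    # Candidates: the data values themselves (where the L0 and L1 minimizers
    # live) plus a fine grid for the L2 case.
    candidates = np.union1d(data.astype(float), np.linspace(2, 7, 5001))
    for p in (0, 1, 2):
        best = min(candidates, key=lambda c: lp_error(c, p))
        print(f"L{p} minimizer: {best:.3f}")
    # L0 -> 2.000 (the mode), L1 -> 4.000 (the median), L2 -> 4.143 (the mean, 29/7)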
(Note 1. The obvious other Lp norm to consider is p=infinity, in which case you're minimizing the maximum error. That gives you the midrange: the average of the min and max values in your dataset. Not useful all that often.)
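The same brute-force check works for the p=infinity case (again a sketch on my toy data):

    import numpy as np

    data = np.array([2, 2, 3, 4, 5, 6, 7])
    candidates = np.linspace(2, 7, 5001)
    best = min(candidates, key=lambda c: np.max(np.abs(data - c)))
    print(f"{best:.3f}")                  # 4.500
    print((data.min() + data.max()) / 2)  # midrange: (2 + 7) / 2 = 4.5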
(Note 2. What about the rest of the "L0 column" of Ben's table? Your measure of spread is the number of values not equal to the mode. Your implied probability distribution is an improper one where there's some nonzero probability of getting the modal value, and all other values are "equally unlikely". (Again: not actually possible; there is no uniform distribution on the whole real line.) Your regression technique is one a little like RANSAC, where you try to fit as many points exactly as you can, and all nonzero errors are equally bad. I doubt there's any analogue of PCA, but I haven't thought about it. Your regularized-regression technique is "best subset selection", where you simply penalize for the number of nonzero coefficients.)
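As a toy illustration of that last entry, brute-force best subset selection might look like this (a sketch with made-up data; lam, the per-coefficient penalty, is an arbitrary choice, and trying all subsets is only feasible for a handful of features):

    import itertools
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 50, 6
    X = rng.normal(size=(n, d))
    true_coef = np.array([3.0, 0.0, 0.0, -2.0, 0.0, 0.0])  # only two real features
    y = X @ true_coef + 0.1 * rng.normal(size=n)

    lam = 1.0  # price paid per nonzero coefficient -- the L0 penalty
    best_score, best_subset = np.inf, ()
    for k in range(d + 1):
        for subset in itertools.combinations(range(d), k):
            cols = list(subset)
            if cols:
                coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
                rss = float(np.sum((y - X[:, cols] @ coef) ** 2))
            else:
                rss = float(np.sum(y ** 2))
            score = rss + lam * k
            if score < best_score:
                best_score, best_subset = score, subset

    print(best_subset)  # (0, 3): exactly the truly nonzero coefficients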
Define it as the limit as p goes to 0 from above of the sum of |actual - expected|^p, i.e. the limit of the Lp error. No need to muck about with zero to the zeroth power directly.
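That limit is easy to see numerically (a quick sketch; the error values are arbitrary): each nonzero error contributes almost exactly 1 once p is small, while exact hits contribute 0.

    import numpy as np

    errs = np.abs(np.array([0.0, 0.0, 0.5, 2.0, 37.0]))  # two exact hits, three misses
    for p in (1.0, 0.1, 0.01, 0.001):
        print(p, float(np.sum(errs ** p)))
    # The sums tend to 3, the number of nonzero errors -- i.e. the L0 error.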
The L0 and L1 norms can have surprising benefits under certain assumptions. If you know (or assume) your signal is sparse (i.e. mostly zero) in some domain, then minimizing the L0 norm means minimizing the number of nonzero terms.... But that's computationally hard, so we use the L1 norm instead, which under suitable conditions still guarantees exact recovery even when sampling below the Nyquist rate. Look up the 2004 paper by Candes and Tao for more info (titled "Exact recovery ... undersampled ...")
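For the curious, here is a minimal basis-pursuit sketch of that idea using scipy's LP solver. The problem sizes and the Gaussian measurement matrix are illustrative assumptions on my part, not anything from the paper:

    import numpy as np
    from scipy.optimize import linprog

    rng = np.random.default_rng(1)
    n, m, k = 60, 30, 4                    # signal length, measurements, sparsity
    x_true = np.zeros(n)
    x_true[rng.choice(n, size=k, replace=False)] = rng.normal(size=k)
    A = rng.normal(size=(m, n))            # random measurement matrix, m < n
    b = A @ x_true

    # Basis pursuit: minimize ||x||_1 subject to Ax = b.
    # Cast as a linear program via x = u - v with u, v >= 0.
    c = np.ones(2 * n)
    res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
    x_hat = res.x[:n] - res.x[n:]

    print(np.allclose(x_hat, x_true, atol=1e-6))  # True: exact recovery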
Wow, despite studying probability for years I never quite realised that the mean minimises the L2 error. This is despite knowing full well the essentially equivalent fact that the mean is the orthogonal projection in L2!
The median minimising L1 is also very nice, and I did not know that. It also means that the concept of median generalises easily to higher dimensions.
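In higher dimensions this becomes the geometric median, the point minimizing the sum of Euclidean distances. It has no closed form, but Weiszfeld's fixed-point iteration computes it; a minimal sketch (the toy points are mine):

    import numpy as np

    def geometric_median(points, iters=100):
        # Weiszfeld's algorithm: repeatedly take the mean weighted by 1/distance.
        y = points.mean(axis=0)             # start from the ordinary mean
        for _ in range(iters):
            d = np.linalg.norm(points - y, axis=1)
            d = np.maximum(d, 1e-12)        # avoid dividing by zero at a data point
            w = 1.0 / d
            y = (points * w[:, None]).sum(axis=0) / w.sum()
        return y

    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
    print(pts.mean(axis=0))        # [2.75 2.75] -- dragged toward the outlier
    print(geometric_median(pts))   # stays close to the cluster near the origin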