"More data means more information, perhaps, but it also means more false information.Trechos retirados de "Antifragile" de Nassim Taleb.
There is a certain property of data: in large data sets, large deviations are vastly more attributable to noise (or variance) than to information (or signal).
If I have a set of 200 random variables, completely unrelated to each other, then it would be near impossible not to find in it a high correlation of sorts, say 30 percent, but that is entirely spurious.
The fooled-by-data effect is accelerating. There is a nasty phenomenon called “Big Data” in which researchers have brought cherry-picking to an industrial level. Modernity provides too many variables (but too little data per variable), and the spurious relationships grow much, much faster than real information, as noise is convex and information is concave.
Increasingly, data can only truly deliver via negativa-style knowledge - it can be effectively used to debunk, not confirm."
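The claim about 200 unrelated variables is easy to check with a small simulation. What follows is only a rough sketch under assumptions that are not in the excerpt: each variable is 30 independent draws from a standard normal distribution, the seed is fixed for repeatability, and the helper names `pearson` and `max_spurious_correlation` are made up for illustration. It generates the variables and reports the strongest absolute Pearson correlation found among all pairs.

```python
import random
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def max_spurious_correlation(n_vars=200, n_obs=30, seed=0):
    """Largest |correlation| among independent random variables.

    n_obs=30 and the Gaussian draws are illustrative assumptions;
    the excerpt does not say how many observations each variable has.
    """
    rng = random.Random(seed)
    # Every variable is generated independently, so any correlation is noise.
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return max(abs(pearson(x, y)) for x, y in combinations(data, 2))


if __name__ == "__main__":
    print(f"strongest spurious correlation: {max_spurious_correlation():.2f}")
```

With 200 variables there are 19,900 pairs to compare, so under these assumptions some pair will look strongly related purely by chance. Raising `n_obs` shrinks the effect, which fits the remark about having too little data per variable.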
Wednesday, May 08, 2013
Curiosity of the day
Now that more and more is being written in praise of "Big Data", which is supposed to reveal everything: