Statistics have always competed with anecdotes. Now they must also compete with ‘Big Data.’

“So many people, friends of mine”

The God-King’s rationale for gutting the Dodd-Frank Wall Street Reform and Consumer Protection Act was a case study in anecdotes over statistics:

“We expect to be cutting a lot out of Dodd-Frank, because frankly I have so many people, friends of mine, that have nice businesses and they can’t borrow money,” Trump said in announcing the review of Dodd-Frank on Friday. “They just can’t get any money because the banks just won’t let them borrow because of the rules and regulations in Dodd-Frank.”

As Fortune’s Stephen Gandel notes in that article, the God-King’s claim is pure bunk:

Last year, Fortune found that Trump and his businesses have taken on more than $1 billion in debt. That amount includes a $170 million line of credit on Trump’s recently opened Washington Hotel.

In the past, Trump has said that Dodd-Frank is killing small business lending. But there isn’t much evidence of that either. Commercial and industrial loans have been one of the fastest growing segments of lending in the past few years. At the end of Sept. 2016, the last the figure is available, the volume of C&I loans outstanding from U.S. banks had risen to $1.7 trillion, up $250 billion from two years before.

The God-King may know some people who can’t get loans but, as Gandel concludes, “that may say more about the company Trump keeps, and less about Dodd-Frank.”

Or it may be merely a friend-of-a-friend urban legend, like the God-King’s tale about Bernhard Langer.

“How often do these crash?”

Of course, the God-King didn’t invent the tactic of valuing anecdotes over statistics. Indeed, that’s a common human failing, and psychologists have a term for it: the availability heuristic. That is, we tend to estimate the likelihood of an event based on how readily we can recall even a single vivid example. This can lead to wildly mistaken estimates of risk, such as the fear of commercial airline flying:

PASSENGER: Excuse me, Miss. How often do these crash?
FLIGHT ATTENDANT: Only once.

In fact, commercial airlines are, by far, the safest way to travel. From 2009 to 2013, an average of 443 people per year died in commercial airline crashes. That sounds like a lot, unless you know that airlines carry roughly 3 billion passengers per year. Put another way, ten times more people die of accidental choking, each year, in the U.S. alone, than die in commercial airline crashes worldwide.
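For a rough sense of scale, those two figures can be turned into a per-passenger rate. This is only a back-of-the-envelope sketch: it uses nothing but the numbers quoted above, the variable names are mine, and it treats every passenger boarding as one independent trip, which is a simplification.

```python
# Rough per-passenger risk, using only the two figures quoted above.
deaths_per_year = 443                  # average annual deaths, 2009 to 2013
passengers_per_year = 3_000_000_000    # "roughly 3 billion" passengers per year

# Chance that any one passenger trip ends in a fatal crash (simplified).
risk_per_passenger = deaths_per_year / passengers_per_year

# The same rate expressed as "1 in N".
one_in = passengers_per_year / deaths_per_year

print(f"Risk per passenger trip: {risk_per_passenger:.2e}")
print(f"About 1 in {one_in:,.0f}")
```

On these numbers, the odds work out to somewhere around one in 6.8 million per trip, which is why the statistics and the feeling of dread are so far apart.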

Yet however irrational, it’s entirely normal to fret about flying, especially right after a crash has been in the news. When it comes to feelings, anecdotes swallow statistics.

“An understanding of a population in its entirety”

But statistics now have another competitor in ‘Big Data.’ And no, those are not the same, as William Davies explains at the Guardian:

Statistics were designed to give an understanding of a population in its entirety, rather than simply to pinpoint strategically valuable sources of power and wealth. In the early days, this didn’t always involve producing numbers. In Germany, for example (from where we get the term Statistik) the challenge was to map disparate customs, institutions and laws across an empire of hundreds of micro-states. What characterised this knowledge as statistical was its holistic nature: it aimed to produce a picture of the nation as a whole. Statistics would do for populations what cartography did for territory.

Indeed it’s no mere coincidence that the word “statistic” includes most of the word “state”:

statistics (n.)

1770, “science dealing with data about the condition of a state or community” [Barnhart], from German Statistik, popularized and perhaps coined by German political scientist Gottfried Aschenwall (1719-1772) in his “Vorbereitung zur Staatswissenschaft” (1748), from Modern Latin statisticum (collegium) “(lecture course on) state affairs,” from Italian statista “one skilled in statecraft,” from Latin status (see state (n.2)).

We see statistics about lots of stuff nowadays, but the methods and calculations began with attempts to understand entire peoples. Davies explores the many weaknesses of that approach. For example:

The Enlightenment ideal of the nation as a single community, bound together by a common measurement framework, is harder and harder to sustain. If you live in one of the towns in the Welsh valleys that was once dependent on steel manufacturing or mining for jobs, politicians talking of how “the economy” is “doing well” are likely to breed additional resentment. From that standpoint, the term “GDP” fails to capture anything meaningful or credible.

When macroeconomics is used to make a political argument, this implies that the losses in one part of the country are offset by gains somewhere else. Headline-grabbing national indicators, such as GDP and inflation, conceal all sorts of localised gains and losses that are less commonly discussed by national politicians. Immigration may be good for the economy overall, but this does not mean that there are no local costs at all. So when politicians use national indicators to make their case, they implicitly assume some spirit of patriotic mutual sacrifice on the part of voters: you might be the loser on this occasion, but next time you might be the beneficiary. But what if the tables are never turned? What if the same city or region wins over and over again, while others always lose? On what principle of give and take is that justified?

It’s not merely geography. What if certain demographic groups (e.g., wealthy white men) repeatedly scoop up most or all of the gains, while other groups (working-class families, women, people of color) get only the leftovers? And that’s a huge problem when the media and leaders cite mean (think per-capita) rather than median data, because a mean can be pulled far upward by a few very large values. As the familiar example goes, if Bill Gates walks into a room with 19 unemployed people, the room’s per-capita income just skyrocketed … but the room’s median income didn’t change at all.
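The Bill Gates example is easy to check with a few lines of Python. This is a sketch under stated assumptions: the 19 unemployed people are given an income of $0, and the $1 billion figure for Gates is a round placeholder I picked for illustration, not a real income number.

```python
from statistics import mean, median

# 19 unemployed people, each assumed to have an income of $0.
room_before = [0] * 19

# Hypothetical placeholder income, chosen only to make the effect obvious.
gates_income = 1_000_000_000
room_after = room_before + [gates_income]

print("Before: mean =", mean(room_before), " median =", median(room_before))
print("After:  mean =", mean(room_after), " median =", median(room_after))
```

The mean jumps from zero to tens of millions of dollars per person, while the median stays at zero, because the person in the middle of the room is still unemployed.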

“Data that accumulates by default, as a consequence of sweeping digitization”

Davies also notes that statistics have historically been gathered by or for governments, usually to answer specific questions. But that is changing with the advent of ‘Big Data,’ gathered from our digital footprints:

In recent years, a new way of quantifying and visualising populations has emerged that potentially pushes statistics to the margins, ushering in a different era altogether. Statistics, collected and compiled by technical experts, are giving way to data that accumulates by default, as a consequence of sweeping digitisation. Traditionally, statisticians have known which questions they wanted to ask regarding which population, then set out to answer them. By contrast, data is automatically produced whenever we swipe a loyalty card, comment on Facebook or search for something on Google. As our cities, cars, homes and household objects become digitally connected, the amount of data we leave in our trail will grow even greater. In this new world, data is captured first and research questions come later.

Moreover, ‘Big Data’ can often measure something that evades statisticians: whether an opinion or preference motivates you to do something. In fact, that’s often how the data is ‘born,’ as we stop buying This or start buying That, abandon This online source and start using That one, delete or stop using This smartphone app and install and start using That one.

Such information can be very important to marketers … and to skilled political operatives:

Figures close to Donald Trump, such as his chief strategist Steve Bannon and the Silicon Valley billionaire Peter Thiel, are closely acquainted with cutting-edge data analytics techniques, via companies such as Cambridge Analytica, on whose board Bannon sits. During the presidential election campaign, Cambridge Analytica drew on various data sources to develop psychological profiles of millions of Americans, which it then used to help Trump target voters with tailored messaging.

This ability to develop and refine psychological insights across large populations is one of the most innovative and controversial features of the new data analysis. As techniques of “sentiment analysis”, which detect the mood of large numbers of people by tracking indicators such as word usage on social media, become incorporated into political campaigns, the emotional allure of figures such as Trump will become amenable to scientific scrutiny. In a world where the political feelings of the general public are becoming this traceable, who needs pollsters?

Put another way, if the God-King can say “I have so many people, friends of mine” and be confident that a politically sufficient number of Americans will believe that … do contrary polls and statistics matter at all?

“Those still committed to public knowledge and public argument and those who profit from the ongoing disintegration of those things”

Finally, Davies emphasizes that most ‘Big Data’ is collected by and for private actors. And unlike states, which release statistics to the public – often as required by law – those private actors can and often do keep ‘Big Data’ to themselves. That data may drive public policy, but it’s not exposed to public debate:

Statistics began life as a tool through which the state could view society, but gradually developed into something that academics, civic reformers and businesses had a stake in. But for many data analytics firms, secrecy surrounding methods and sources of data is a competitive advantage that they will not give up voluntarily.

A post-statistical society is a potentially frightening proposition, not because it would lack any forms of truth or expertise altogether, but because it would drastically privatise them. Statistics are one of many pillars of liberalism, indeed of Enlightenment. The experts who produce and use them have become painted as arrogant and oblivious to the emotional and local dimensions of politics. No doubt there are ways in which data collection could be adapted to reflect lived experiences better. But the battle that will need to be waged in the long term is not between an elite-led politics of facts versus a populist politics of feeling. It is between those still committed to public knowledge and public argument and those who profit from the ongoing disintegration of those things.

I recommend reading Davies’ article in full. It’s a long read, but very much worth your time … and it provides essential context for our ‘post-truth’ age.

+++++

Image Credit: Safari Books Online

+++++

Good day and good nuts