big data

  • Article (book review): Everybody Lies  

     The book Everybody Lies by Seth Stephens-Davidowitz looks at search data as a way to find previously invisible correlations and connections. Example: the prevalence of the term “n*gger” in a region's searches was the best variable for predicting whether voters in that region would vote for Trump in the 2016 GOP primaries. Search data is a game-changer because it gets at what people actually believe, not what they are willing to admit to a stranger with a clipboard.

    From the review: ‘Modern microeconomics, sociology, political science and quantitative psychology all depend to a large extent on surveys of at most a few thousand respondents. In contrast, he says, there are “four unique powers of Big Data”: it provides new sources of information, such as pornographic searches; it captures what people actually do or think, rather than what they choose to tell pollsters; it enables researchers to home in on and compare demographic or geographic subsets; and it allows for speedy randomized controlled trials that demonstrate not just correlation but causality. As a result, he predicts, “the days of academics devoting months to recruiting a small number of undergraduates to perform a single test will come to an end.” In their place, “the social and behavioural sciences are most definitely going to scale,” and the conclusions researchers will be able to reach are “the stuff of science, not pseudoscience”.’