Home United States USA — IT The Facebook Controversy: Privacy Is Not the Issue

The Facebook Controversy: Privacy Is Not the Issue

April 18, 2018

181

The real danger is that the information and social platforms on the internet are being corrupted in the service of con men, political demagogues and thieves
Cambridge Analytica’s wholesale scraping of Facebook user data is familiar news by now, and we are all “shocked” that personal data are being shared and traded on a massive scale. But the real issue with social media is not the harm to individual users whose information was shared, but the sophisticated and sometimes subtle mass manipulation of social and political behavior by bad actors, facilitated by deceit, fraud and the amplification of lies that spread easily through societal discourse on the internet.
Any pretense to privacy was abandoned long ago when we accepted the free service model of Google, Facebook, Twitter and others. Did the Senators who listened to Mark Zuckerberg’s mea culpa last week really think that Facebook, which charges nothing for its services to users, was simply providing a public service for free? Where did they think its $11 billion in advertising revenue came from, if not from selling ads and user data to advertisers?
Let’s identify the real issue. The tangible damage resulting from identity theft and the loss of personal financial information are growing problems, but they typically are not caused by our use of social media platforms, where we share a lot of information—but rarely our credit card or Social Security numbers.
The controversy about Cambridge Analytica that landed Zuckerberg before Congress actually began brewing over a year ago. It was a controversy not about privacy but about how Cambridge Analytica put vast amounts of personal data, mostly from Facebook, into its so-called “psychographic” engine to influence behavior at the individual level (see When the Big Lie Meets Big Data, published here in March of 2017).
Cambridge Analytica worked with researchers from Cambridge University who developed a Facebook app that provided a free personality test, then proceeded to scoop up all the users’ Facebook data plus that of all their friends (thus leveraging the actual users, who numbered less than a million, to harvest the data of more than 80 million people). Using this data, Cambridge Analytica then classified each individual’s personality according to the so-called “OCEAN” scale (Openness, Conscientiousness, Extroversion, Agreeableness and Neuroticism) and fashioned individually targeted messages to appeal to each person’s personality.
Neither subpoenas nor investigative journalists were needed to find all this out—most of it was publicly revealed by Cambridge Analytica’s chief Alexander Nix in a marketing presentation that received wide distribution on YouTube. Cambridge Analytica (owned partly by Robert Mercer, an early pioneer in artificial intelligence who has been a financial backer of Breitbart News and other right-wing causes) had already done work for the Trump Campaign, and Nix was seeking more business.
The real danger revealed by the Cambridge Analytica scandal is that the information and social platforms of the internet, on which we increasingly spend our time and through which more and more of our personal and social connections flow, are being corrupted in the service of con men, political demagogues and thieves. Russia’s troll farm, the Internet Research Agency, employs fake user accounts to post divisive messages, purchase political ads, spread fabricated images and even organize political rallies.
The danger of misinformation is not just political; it is commercial as well. Major purchasers of internet advertising know that the “pay per click” model is flawed. Competitors can set up bots (or even human campaigns) to click on their ads, driving up their costs and casting doubt on the value of an advertising campaign. The company Devumi sells Twitter followers and retweets to celebrities and businesses in order to make them appear more popular than they are. The followers are fake, cobbled together in automated fashion by scraping the social media web for names and photos.
Until Zuckerberg’s decision to testify before Congress, there was scant evidence that Twitter or Facebook were disturbed about any of this. Still, there are a number of machine learning tools that can be used to identify fake accounts or activity.
Benford’s Law
In 2015, a University of Maryland professor, Jen Golbeck, discovered an ingenious real-time method for identifying fake social media accounts. She found that the number of a user’s Twitter or Facebook friends follows a well-known statistical distribution called Benford’s Law. The law states that, in a conforming data set, the first significant digit of numbers is a “1” about 30 percent of the time—six times more often than it’s a 9. The phenomenon, which is quite widespread, is named after physicist Frank Benford, who illustrated it with the surface areas of rivers, street addresses, numbers appearing in a Reader’s Digest issue, and many more examples.
In other words, if you looked at (say) a thousand Facebook users and counted how many friends each had, roughly 300 of them would have friend counts in the teens (1x), 100–199 range (1xx) or 1,000–1,999 range (1xxx). Only 5 percent would have counts beginning with a nine: 9,90–99,900–999,9,000–9,999.
We can represent each Facebook, Twitter or other social media user as a network of linked users. A plot of an individual user’s links to other users might look like the figure below:
To assess whether a user is genuine, we can look at each of that user’s friends, and count their friends or followers. Specifically:
1. Consider a friend or follower of the account in question.
2. Count its followers/friends (“friend of friend”); record.
3. Repeat for all the remaining friends/followers of the original account.
4. Calculate the distribution of those “friend of friend” counts.
Russian Bots Were Revealed, Yet Stayed Online
Golbeck found that the overwhelming majority of Facebook, Twitter and other social media follower and friend counts followed Benford’s Law. On Twitter, however, she found a small set of 170 accounts whose follower distributions differed markedly from that law. In her 2015 paper she wrote:
“Some accounts were spam, but most were part of a network of Russian bots that posted random snippets of literary works or quotations, often pulled arbitrarily from the middle of a sentence. All the Russian accounts behaved the same way: following other accounts of their type, posting exactly one stock photo image, and using a different stock photo image as the profile picture.”
Golbeck told me that she and others posted lists of Russian bots active on Twitter three years ago, and they were still active as of January in this year.