An interesting article from MIT shows how easy it is to identify people using just a few data points. The scientists demonstrated that a handful of data points can identify you well enough to pick your purchases out of a giant file of credit card transactions.
What I think that most people are going to find scary about this is that the data points don’t even have to be related. To quote the article, “That means that someone with copies of just three of your recent receipts — or one receipt, one Instagram photo of you having coffee with friends, and one tweet about the phone you just bought — would have a 94 percent chance of extracting your credit card records from those of a million other people.”
What this means is that the ‘metadata’ that a lot of companies sell to each other is not as safe as they would have you believe. Metadata, in the sense these companies use the word, is data that has been scrubbed of identifying information. For instance, it might be credit card transactions without the credit card number or the name of the person making the purchase. But it would still show the date of the purchase, the amount spent, and some short description of what was bought.
But metadata can mean many other things. It can mean readings from your home Internet of Things devices or your fitness wearable. It can mean records of the web sites you visited through a search engine. It can be almost any big pile of raw data that is assembled and compiled from individual transactions. Big web sites tell us not to worry about privacy because they don’t sell our information. Instead, they say, they only sell big metadata files from which any personally identifying information has been removed.
What the MIT research shows is that it’s relatively easy for data scientists to find any individual’s records within a big pile of metadata. Just two data points were enough in this MIT trial to identify 40 percent of people. Five data points were enough to identify almost everybody.
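To make the idea concrete, here is a minimal sketch of how that kind of matching could work. It is not the MIT team's code or data. The tiny purchase file, the token names, and the function names are all made up for illustration, and I am assuming the simplest possible measure: take a few points from one person's records and check whether anybody else in the file also matches them.

```python
from itertools import combinations

# Hypothetical scrubbed purchase file: names and card numbers are gone,
# replaced by an opaque token. Each row is (token, day, shop).
RECORDS = [
    ("u1", "mon", "bakery"), ("u1", "tue", "cafe"), ("u1", "wed", "gym"),
    ("u2", "mon", "bakery"), ("u2", "tue", "cafe"), ("u2", "wed", "cinema"),
    ("u3", "mon", "grocer"), ("u3", "tue", "cafe"), ("u3", "thu", "gym"),
]


def build_traces(records):
    """Group the scrubbed rows into one set of (day, shop) points per token."""
    traces = {}
    for token, day, shop in records:
        traces.setdefault(token, set()).add((day, shop))
    return traces


def matching_tokens(traces, known_points):
    """Return every token whose trace contains all of the known points."""
    known = set(known_points)
    return sorted(token for token, points in traces.items() if known <= points)


def unicity(traces, k):
    """Share of k-point samples, each drawn from one person's own trace,
    that match that person and nobody else in the file."""
    unique = 0
    total = 0
    for points in traces.values():
        for sample in combinations(sorted(points), k):
            total += 1
            if len(matching_tokens(traces, sample)) == 1:
                unique += 1
    return unique / total if total else 0.0


TRACES = build_traces(RECORDS)

# An outsider who saw one receipt from the gym on Wednesday:
print(matching_tokens(TRACES, [("wed", "gym")]))

for k in (1, 2, 3):
    print(k, "known points ->", round(unicity(TRACES, k), 2))
```

In this toy file, one known point singles out a person in 4 of 9 cases, two points in 7 of 9, and three points in every case. The numbers themselves mean nothing, but the pattern is the same one the researchers describe: each extra data point rules out more of the crowd.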
Of course, in real life we leave all sorts of data points in public that identify us. We allow adware to stay on our computers that tells advertisers which sites we have visited in the past. We allow tracking cookies that might tell somebody our name and address. We frequently sign up for public websites and then give them all sorts of personal information in exchange for the right to use the site for free. And we tell sites like Pinterest and Facebook all of our likes and dislikes, from restaurants to clothes to politics.
I read somewhere last year that there are other easy ways to identify almost everybody. One thing that makes us easy to identify is the configuration file on our computer that tracks the specific versions of all of our software. It turns out that this is as accurate for identifying a given computer as fingerprints are for identifying people. Every computer contains small differences in the versions of programs installed that make that machine unique. When you consider the hundreds of programs and apps that sit on a typical PC, tablet, or smartphone, this is not surprising.
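Here is a rough sketch of why a software list works like a fingerprint. I don't know how any real tracker does this, so treat it as an assumption-laden illustration: the program names, version numbers, and the `fingerprint` function are invented, and I am simply hashing the sorted list of versions with SHA-256 from Python's standard library.

```python
import hashlib


def fingerprint(installed):
    """Hash a {program: version} mapping into a short identifier.

    Entries are sorted first so the same software list always gives
    the same result, whatever order it was collected in.
    """
    canonical = "\n".join(
        f"{name}={version}" for name, version in sorted(installed.items())
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Two hypothetical machines that differ only in one minor version.
machine_a = {"browser": "118.0.1", "pdf_reader": "23.006", "font_pack": "2.1"}
machine_b = {"browser": "118.0.1", "pdf_reader": "23.008", "font_pack": "2.1"}

print(fingerprint(machine_a))
print(fingerprint(machine_b))
print(fingerprint(machine_a) == fingerprint(machine_b))
```

A single point release is enough to give the two machines different identifiers. With hundreds of programs on a real device, the odds that two machines share every version get very small.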
I don’t think there are many people left who are naïve enough not to know that people watch what you do online. After all, if you post all of your favorite things on some public website, you have to expect that somebody is going to gather that information somewhere and associate it with you in a database.
But this MIT research shows that big data research firms can take that public data and then match it up to all sorts of things that you didn’t think were public, like scrubbed metadata records of credit card receipts or phone calls. We’ve been told that such metadata records are anonymous, but this research shows how easy it is for a big company to crack the metadata and find your specific records.
We would be naïve to think that somebody isn’t already doing that. There are now a number of firms that will sell you a file of data about anybody if you are willing to pay them for it. A company that will do that has no morals, because it could be selling your information to a jealous partner (or ex-partner), a stalker, or who knows who else. This kind of data is also available to your employer or anybody else willing to pony up a few bucks to take a closer look at you.
It ought to be illegal to sell your personal data to the public, but it’s not. And even if your personal data were not available to the general public, it is routinely passed between large corporations that use it to create a detailed profile of you.
I don’t think there is any way to stop companies from gathering data on us. Data is now generated for everything we do as a routine part of living in a computerized world. Strict privacy laws could make it harder to share such data and to monetize it. But a lot of people will have to get really angry about this before it drives political change. We don’t seem very close to that point yet.