The Basics of Big Data

I read all the time how big data is going to transform our lives. Big data is supposed to make our lives better by sorting through the data that surrounds us to help us make sense out of the chaos. This will be accomplished using the tools of a new science and technology called analytics.

The most commonly used tools for making basic sense out of big data are called descriptive analytics. This is the process of screening big data sets to produce statistics that we can understand. In the simplest sense, descriptive analytics is used to count and tally data into understandable pieces.

Descriptive analytics is used to do things like track hits on web sites, followers on social media sites, page views, or any other statistic that involves basic counting. One of the better-known uses of descriptive analytics in our industry is when the cable and cellphone companies track the amount of data that customers have used during the month to apply against data caps. If you recall, some of the big companies like Comcast had a really difficult time getting this right, and some people still say that the measurements are not accurate. This illustrates that descriptive analytics does not necessarily mean simple counting and can involve tracking more complex pieces of the larger data set.
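To make the data cap example concrete, here is a minimal sketch in Python of the kind of counting and tallying that descriptive analytics does. The customer names, the usage records and the 300 GB cap are all made up for illustration; this is not how any particular carrier measures usage.

```python
from collections import defaultdict

# Hypothetical usage records: (customer, gigabytes used in one session).
usage_records = [
    ("alice", 120.5),
    ("bob", 40.0),
    ("alice", 200.0),
    ("bob", 15.25),
    ("carol", 310.0),
]

DATA_CAP_GB = 300.0  # assumed monthly cap, for illustration only


def monthly_totals(records):
    """Tally gigabytes used per customer."""
    totals = defaultdict(float)
    for customer, gigabytes in records:
        totals[customer] += gigabytes
    return dict(totals)


def over_cap(totals, cap_gb):
    """Return the customers whose total usage exceeds the cap, sorted by name."""
    return sorted(name for name, used in totals.items() if used > cap_gb)


totals = monthly_totals(usage_records)
print(totals)
print(over_cap(totals, DATA_CAP_GB))
```

Even this toy version hints at why real metering is hard to get right: the totals are only as good as the records that feed them, and a missed or double-counted session changes who lands over the cap.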

A more complicated type of analytics is predictive analytics. This is the process of not just analyzing the data, but then trying to make predictions about what might come next. The programs used to analyze big data for this purpose use a number of statistical, modeling and data mining techniques to make some sense out of the data. These techniques do not really predict the future, but rather look at existing and probable outcomes and calculate the percentage probability of different scenarios.

For example, you read all of the time how companies like Facebook or Google can figure out all sorts of things about you, such as whether you are an alcoholic, have insomnia, or are just starting a new relationship. They do this by comparing data they have gathered on you to data from millions of other users. These companies look at your behavior, and when you start to resemble a known behavior pattern they use predictive analytics to fill in the gaps and paint a probable picture of you. For example, they will probably not know for sure that you are an alcoholic or have diabetes, but they can calculate the likelihood that you fit one of those known patterns.
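The idea of calculating the likelihood that you fit a known pattern can be sketched very simply. The Python below is a rough illustration only: the behavior names and the history are invented, and real systems use far more sophisticated statistical models than this simple frequency count.

```python
# Hypothetical history of past users: (observed behaviors, matched the pattern?).
history = [
    ({"late_night_activity", "searches_sleep_aids"}, True),
    ({"late_night_activity", "searches_sleep_aids"}, True),
    ({"late_night_activity"}, False),
    ({"late_night_activity", "searches_sleep_aids"}, False),
    ({"searches_sleep_aids"}, False),
]


def pattern_likelihood(history, observed):
    """Estimate the likelihood of a pattern as the share of past users who
    showed every behavior in `observed` and also matched the pattern.
    Returns None when no past user showed all of those behaviors."""
    similar = [matched for behaviors, matched in history if observed <= behaviors]
    if not similar:
        return None
    return sum(similar) / len(similar)


new_user = {"late_night_activity", "searches_sleep_aids"}
print(pattern_likelihood(history, new_user))
```

The result is a probability, not a verdict, which is the point made above: the company does not know for sure, it only knows how often people who look like you turned out to fit the pattern.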

This is where the use of big data starts to concern many people. As the techniques used to analyze big data about people get better, these companies might come to know more about you than you know about yourself. For instance, I’ve read that Facebook is getting fairly good at predicting when relationships between couples are coming to an end. Most couples in this situation probably know this as well, but over time Facebook will probably get good at sensing this a lot sooner than the average person can. After all, people are sometimes very unaware of their own behavior patterns, but a company like Facebook, especially when its data is combined with data gathered from other sources, can paint a detailed and accurate picture of you.

The final kind of analytics is called prescriptive analytics. This takes the trends and statistical possibilities found through predictive analytics and uses them to suggest solutions to problems. We are still a long way from trusting computers to use prescriptive analytics to solve specific problems. But already today we can uncover unsuspected trends in the analysis of big data, and the computer can then suggest several solutions to fix those problems and assign a statistical probability to the potential success of each solution. We are in the infancy of this process, but this is the hoped-for end game of analyzing big data.
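The last step described above, suggesting several solutions and attaching a probability of success to each, might look something like this minimal Python sketch. The candidate fixes and their probabilities are hypothetical; in practice those numbers would come out of the predictive models, which is the hard part.

```python
# Hypothetical candidate fixes with an estimated probability of success each.
candidates = {
    "reroute traffic": 0.62,
    "add capacity": 0.81,
    "do nothing": 0.15,
}


def rank_solutions(candidates):
    """Order the candidate solutions from most to least likely to succeed."""
    return sorted(candidates.items(), key=lambda item: item[1], reverse=True)


for solution, probability in rank_solutions(candidates):
    print(f"{solution}: {probability:.0%} estimated chance of success")
```

Note that the program only ranks the options. A person still decides which one to act on, which fits where we are today with trusting computers to solve problems on their own.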

The Internet of Things is counting on success in the techniques of prescriptive analytics. In the near future there will be many more big data sets generated about each person from a number of sources like medical monitors, home security systems, location monitors and multiple other monitors in our lives. When these data sets are combined with the things we do, such as writing emails, searching web sites and texting our friends, there will be a detailed set of data created about each one of us. For example, let’s say that we feel queasy one evening. Big data will be able to suggest that this might have been due to the fact that we walked close to a glen full of oak trees in full pollen that afternoon, or that it might have come from the shrimp we had for lunch and that a few other people who ate at that restaurant are experiencing the same feeling. Big data will be able to correlate the things that happen to us with what is happening in the wider world.

The average person is going to experience the results of big data by having something that seems like a self-aware assistant, or at least a set of programs that seem to be aware. These programs will track everything we do and will give us a whole new set of tools to understand ourselves and to control our personal world better. But these same big data sets could also be used by others to know things about us that we want to keep private. Probably the scariest thing about this kind of analytics is that everybody has secrets they would prefer not to reveal, and these analytics tools can go a long way towards uncovering the little secrets we all keep. Today we are still exploring the techniques that will help us make sense of big data, but as that starts working we are also going to have to find ways to protect our privacy.
