Big Data. Maybe you’ve heard the name? On the surface it’s a simple enough concept — every day, incredible amounts of data are being produced and, increasingly, organizations need better strategies for dealing with it.
But how much data is “big”, what should you do with it, and why should you care?
Let’s start with the why. At its heart, Big Data is the analysis of large amounts of varied, disparate data to help you better understand complex systems. The hope is that there’s “gold” hidden within the deluge of information that will help you make better decisions…if you can extract it from all the “noise.”
“Almost every aspect of government, from how we fight crime, to how we take care of our kids, to how we fix potholes will change as a result of analytics,” said Stephen Goldsmith, former mayor of Indianapolis, Indiana, in a recent interview. “We have a chance to make government more targeted, efficient and effective at solving problems before they occur.”
NOTE: Goldsmith’s point is this: if we ask the right questions of all this data, we’ll get valuable answers that positively shape the way we work.
Organizations have been analyzing digital data for decades; why do we need new strategies now? The answer comes with the three “V”s of Big Data: Volume, Velocity and Variety.
In 2010, Google CEO Eric Schmidt said, “We create as much information in two days now as we did from the dawn of man through 2003.” That works out to roughly five exabytes of data per day. And that was three years ago. For 2011, estimates put the total at 1.8 zettabytes, or 1.8 trillion gigabytes, of data.
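Those storage prefixes are easy to mix up, so here is a quick sanity check of the arithmetic behind the figures above, sketched in Python using decimal (SI) units:

```python
# Decimal (SI) storage units.
GB = 10**9   # gigabyte
EB = 10**18  # exabyte
ZB = 10**21  # zettabyte

per_day = 5 * EB      # Schmidt's figure: ~5 exabytes created per day
est_2011 = 1.8 * ZB   # the 2011 estimate: 1.8 zettabytes

print(per_day // GB)             # 5 exabytes  = 5,000,000,000 gigabytes
print(est_2011 / GB / 10**12)    # 1.8 zettabytes = 1.8 trillion gigabytes
```

In other words, a zettabyte is a trillion gigabytes, so the 2011 estimate is nearly two trillion gigabytes of data in a single year.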
Relax, you probably don’t have a zettabyte of data that slipped behind a filing cabinet. However, the average state and local agency stores approximately 499 terabytes of data. Needless to say, more data is being produced and stored than ever before, and that is just one of the challenges Big Data aims to address.
Gartner, Inc., a well-known IT research firm, uses the term Velocity to describe how fast data is being produced and how fast that data must be processed. Not only is a large amount of data being generated, it’s happening at breakneck speed. If you can’t analyze the data and detect patterns in it quickly, the situation will have changed by the time you’re ready to act on it.
Just how fast is data being produced?
- Twitter users create up to 25,000 tweets per second.
- U.S. stock exchanges average 28,000 trades per second.
- 20 hours of video are uploaded every minute.
Whether it’s log data from sensors or video from traffic cameras, it’s all valuable data that’s ripe for analysis.
Data can be found in many places. The first sources that come to mind are those held within structured relational databases. But that only accounts for about 15% of the data available to an organization. The rest is what Big Data considers unstructured — tweets, video, audio, SCADA log data, instant messages…the sources are endless.
Traditional analytics focuses on the 15% of data whose structure and relationships are well understood. Big Data suggests that, while structured data is important, the tools that let you fold the remaining 85% of unstructured data into your analysis are what truly unlock the value of your data.
It’s important to remember that the payoff from Big Data analytics depends on the quality of the data you have. That means identifying important data and collecting it as soon as possible: the more historical data you have, the clearer the emerging patterns will be.
A recent report suggests that only 2% of state and local agencies have a clear Big Data strategy, so if this topic hasn’t made it to the top of your to-do list, you aren’t alone. Even if you don’t have a well-defined Big Data strategy today, take time to think about the data you should be collecting. That way, when you dive into Big Data you’ll get the best results possible.
Editor's note: This article was originally published on 8/28/13