The world generates data at an astonishing rate: about 2.5 quintillion bytes each day. Big data is a huge and fascinating emerging field.
But what is big data? That’s a good question. There seem to be as many definitions of big data as there are businesses, nonprofit organizations, government agencies, and individuals who want to benefit from it.
One popular interpretation of big data refers to extremely large data sets. A 2011 McKinsey study defined big data as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Some have defined big data as an amount of data that exceeds a petabyte—one million gigabytes.
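To make the size-based definition concrete, here is a minimal sketch of the unit arithmetic, assuming decimal (SI) units where one gigabyte is 10^9 bytes. The function names are illustrative only. It confirms that a petabyte is one million gigabytes and expresses the daily figure of 2.5 quintillion bytes in petabytes.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary units differ slightly).
GIGABYTE = 10**9
PETABYTE = 10**15

# "2.5 quintillion bytes each day", written as an exact integer.
DAILY_BYTES = 25 * 10**17


def gigabytes_per_petabyte() -> int:
    """Return how many gigabytes fit in one petabyte."""
    return PETABYTE // GIGABYTE


def bytes_to_petabytes(n_bytes: int) -> float:
    """Convert a byte count to petabytes."""
    return n_bytes / PETABYTE


if __name__ == "__main__":
    print("Gigabytes in a petabyte:", gigabytes_per_petabyte())
    print("Petabytes generated per day:", bytes_to_petabytes(DAILY_BYTES))
```

Under these assumptions, the world's daily output is roughly 2,500 times the one-petabyte threshold mentioned above.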
Another definition treats big data as the exponential growth in the volume and availability of data in our world.
This data comes from myriad sources: smartphones and social media posts; sensors, such as traffic signals and utility meters; point-of-sale terminals; consumer wearables such as fitness trackers; electronic health records; and many more.
Buried deep within this data are immense opportunities for organizations that have the talent and technology to transform their vast stores of data into actionable insight, improved decision making, and competitive advantage.
By harnessing the power of big data, healthcare systems can identify at-risk patients and intervene sooner. Police departments can predict crime and stop it before it starts. Retailers can better forecast inventory to optimize supply-chain efficiency. The possibilities are endless.
But to fulfill this promise, organizations need qualified professionals with the skills to extract meaning from the mountains of data—and these elusive data scientists are in short supply.
The “Three Vs” of big data
In 2001, industry analyst Doug Laney defined the “Three Vs” of big data:
Volume: The unprecedented explosion of data means that 90 percent of all the world’s data has been created in the past two years. The global digital universe now exceeds 2.7 zettabytes, and volume is expected to double every two years. Today, the challenge with data volume is not so much storage as it is how to identify relevant data within gigantic data sets and make good use of it.
Velocity: Data is generated at an ever-accelerating pace. Every minute, 571 new websites are created. Email users send 204,166,667 messages. Google receives more than two million search queries. The challenge for data scientists is to find ways to collect, process, and make use of huge amounts of data as it comes in.
Variety: Data comes in different forms. Structured data is data that can be organized neatly within the columns of a database. This type of data is relatively easy to enter, store, query, and analyze. Unstructured data is more difficult to sort and extract value from. Examples of unstructured data include emails, social media posts, and word-processing documents; audio, video, and photo files; web pages; and more.
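The growth and per-minute figures quoted above are easier to grasp with a little arithmetic. The following is a rough sketch, not a forecast: it assumes the stated doubling period of two years holds steady and that the per-minute rates are constant over a day. The function names are illustrative.

```python
MINUTES_PER_DAY = 24 * 60


def projected_volume_zb(start_zb: float, years: float,
                        doubling_period_years: float = 2.0) -> float:
    """Project data volume in zettabytes, assuming it doubles every
    `doubling_period_years` (a simplifying assumption)."""
    return start_zb * 2 ** (years / doubling_period_years)


def per_day(per_minute: int) -> int:
    """Scale a per-minute count to a per-day count, assuming a constant rate."""
    return per_minute * MINUTES_PER_DAY


if __name__ == "__main__":
    # Starting from 2.7 zettabytes, ten years means five doublings.
    print("Projected volume after 10 years (ZB):", projected_volume_zb(2.7, 10))
    print("New websites per day:", per_day(571))
    print("Emails sent per day:", per_day(204_166_667))
```

Under these assumptions, five doublings multiply the starting volume by 32, and the per-minute email figure works out to roughly 294 billion messages a day.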
Beyond the big three Vs
More recently, big-data practitioners and thought leaders have proposed additional Vs:
Veracity: This refers to the quality of the collected data. If source data is not correct, analyses will be worthless. As the world moves toward automated decision-making, where computers make choices instead of humans, it becomes imperative that organizations be able to trust the quality of the data.
Variability: Data’s meaning is constantly changing. For example, language processing by computers is exceedingly difficult because words often have several meanings. Data scientists must account for this variability by creating sophisticated programs that understand context and meaning.
Visualization: Data must be understandable to nontechnical stakeholders and decision makers. Visualization is the creation of complex graphs that tell the data scientist’s story, transforming the data into information, information into insight, insight into knowledge, and knowledge into advantage.
How can organizations make use of big data to improve decision-making? A McKinsey article about the potential impact of big data on health care in the U.S. suggested that big-data initiatives “could account for $300 billion to $450 billion in reduced health-care spending, or 12 to 17 percent of the $2.6 trillion baseline in US health-care costs.” The secrets hidden within big data can be a goldmine of opportunity and savings.
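As a quick sanity check on the quoted figures, this small sketch computes the savings range as a share of the $2.6 trillion baseline. The function name is illustrative and the only inputs are the dollar amounts from the quotation.

```python
BASELINE_USD = 2.6e12  # $2.6 trillion baseline in US health-care costs


def share_of_baseline(savings_usd: float, baseline_usd: float = BASELINE_USD) -> float:
    """Return savings as a percentage of the baseline."""
    return 100 * savings_usd / baseline_usd


if __name__ == "__main__":
    low = share_of_baseline(300e9)   # $300 billion
    high = share_of_baseline(450e9)  # $450 billion
    print(f"Low estimate: {low:.1f}% of baseline")
    print(f"High estimate: {high:.1f}% of baseline")
```

Rounded to whole numbers, the two shares match the 12 to 17 percent range given in the quotation.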
Bringing it all together
No matter how many Vs you prefer in your big data, one thing is sure: Big data is here, and it’s only getting bigger. Organizations need to understand what big data means to them and what it can help them do. The possibilities really are endless.