Recently, whenever I attend meetings or read journal articles, magazines, newspapers, or blogs, I see references to big data. They have become the "flavor of the month" (or longer?). The real question in my mind is two-fold:
While we might argue about what constitutes big data (versus regular data), I would suggest the answer is different for different organizations. In my mind, they involve handling datasets that are larger than an organization's current analytical capabilities can meaningfully interpret.
The topic of big data is of particular interest right now because we are getting started with the 2015–2016 revisions to the Baldrige Criteria for Performance Excellence and need to come to grips with a way to handle the big data concept in those revisions. What are the critical aspects of big data that are applicable to all sizes and types of organizations and that reflect the Criteria's aim of assessing performance through concepts at "the leading edge of validated management practice"?
In this Insights column, I will explore the topic of big data from five perspectives: what are big data, what are their uses and promise, what are the challenges and risks, what are the opportunities, and what does this all mean for the Baldrige Criteria.
The McKinsey Global Institute defines big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." This seems to be a generally accepted definition of big data. It deliberately avoids a threshold based on a certain number of exabytes (approximately 1,000,000,000,000,000,000 bytes or one billion gigabytes) because it is assumed that as technology advances the size of big data datasets will grow. Furthermore, the size of a dataset that an organization can analyze with existing tools will vary by organization and its IT resources. Therefore, definitions of big data will likely vary by organization.
Even today, the numbers associated with big data are staggering. According to McAfee and Brynjolfsson, in a 2012 Harvard Business Review (HBR) article, 2.5 exabytes of data are generated each day. It has been estimated that Walmart alone collects more than 2.5 petabytes (2.5 quadrillion bytes or 2.5 million gigabytes) of data every hour from its customer transactions. In 2010, it cost $600 to buy a disk drive that could store all of the world's music. According to the Library of Congress, the information it stored in 2011 totaled 235 terabytes; put in other terms, one exabyte is more than 4,000 times the information that was stored in the Library of Congress.
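As a quick check on the unit comparisons above, here is a minimal arithmetic sketch in Python. It assumes decimal (SI) units, where each step up is a factor of 1,000; the sources quoted do not say whether decimal or binary units are meant, and the variable and function names are illustrative only.

```python
# Decimal (SI) byte units. This is an assumption: the sources quoted in
# the column do not state whether decimal or binary units are intended.
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18


def in_gigabytes(n_bytes):
    """Express a byte count in gigabytes."""
    return n_bytes / GIGABYTE


# Figures quoted in the text above.
walmart_per_hour = 2.5 * PETABYTE          # Walmart transaction data per hour
library_of_congress_2011 = 235 * TERABYTE  # Library of Congress holdings, 2011

exabyte_in_gb = in_gigabytes(EXABYTE)
walmart_in_gb = in_gigabytes(walmart_per_hour)
exabyte_vs_library = EXABYTE / library_of_congress_2011

print(f"1 exabyte             = {exabyte_in_gb:,.0f} GB")
print(f"2.5 petabytes         = {walmart_in_gb:,.0f} GB")
print(f"1 exabyte / 235 TB    = {exabyte_vs_library:,.0f} times")
```

Under these assumptions, one exabyte works out to one billion gigabytes, 2.5 petabytes to 2.5 million gigabytes, and one exabyte to roughly 4,255 times the 235 terabytes reported for the Library of Congress.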
Big data are generally characterized by three or four v's: volume (sheer size of datasets), variety (heterogeneity of data—text, images, videos, databases, geolocation information, etc.), velocity (rate at which data arrive and rate at which they must be analyzed), and veracity (trust and uncertainty).
But here is the real challenge. In a TDWI report on making big data accessible to executives, the author writes, "Big data are like the egg in an omelet." Big data are the bulkiest ingredient in any recipe for big data insight, but the ingredients need a lot of work before the dish can be enjoyed. Like the eggs, the data must be cracked, whipped into shape, and cooked before a meal can be consumed.
According to a 2013 Wall Street Journal report, the uses of big data to date have been fairly conventional and do not involve much blending of data from different sources. Typical uses have included analysis of customer transaction data; process monitoring and improvement using such things as machine, plant productivity, and sensor data; the mining of social media data for customer sentiment; the tracking of flu outbreaks based on search engine regional data; and targeted marketing to enhance customer experiences.
The potential for big data use is limited only by our imaginations. And that potential includes significant opportunity for abuse of freedoms and privacy. Here are some examples of potential uses gleaned from various sources:
On a big-picture scale, large macro-economic analyses on blended data sets could (according to McKinsey)
Based on what I have read and learned, I would group the challenges to the use of big data into nine major areas and the risks into four groups. The challenges will be either overcome or handled through innovative and incremental solutions. The risks will be ongoing and require constant attention. Let's start with the challenges:
And here are the four groups of risks:
Looking beyond the illustrative uses for big data that are discussed above, the opportunities for whole new fields of scientific endeavor and human benefit are not fully fathomable today but are unbelievably exciting. These potential opportunities led to a "community white paper" by leading researchers across the United States.
Think of the potential unleashed by relational datasets that cross genomics, chemistry, mathematics, and engineering to develop totally new fields of study and benefit for humankind. Think of the potential for models of global economics that prevent large-scale poverty in any country. The human genome project was undertaken at the extreme infancy of big data, and it has already created tremendous opportunities and promising scientific and health care breakthroughs.
The future for big data is exciting!
In my opinion, the role for the Baldrige Criteria, at this time, is in the enablers, opportunities, and risks associated with big data and not in requiring implementation of big data strategies. Many of these enablers, opportunities, and risks are already addressed in the Criteria because of their wider applicability than just for big data, but some additional emphasis or commentary may be appropriate. The future competitive advantage that will flow from big data is not based on the datasets collected but on the analytics performed, the conclusions drawn, and the actions—including intelligent risks—pursued.
In a recent HBR blog, Tom Davenport writes about what is needed to make big data projects succeed. Many of these needs align with large project management in general and relate to management systems considerations: formation of effective cross-functional teams, good change management practices as organizational opportunities are gleaned or deduced, clear business objectives, and good project management skills. The path from big datasets to strategic advantage is an organizational effort that involves many areas of expertise and everyone from front-line employees to senior management.
The Criteria need to address (and already do address) data confidentiality, accuracy, verification, and access. Cybersecurity is of growing importance for many reasons, including the new challenges from big data.
Big data and the analytical conclusions drawn add to the ethical considerations that organizations must already address. The results of analytics could reveal information that is very sensitive for people and companies.
While "big data" is the current buzzword, in the end it is all about the foundational category of the Baldrige framework, Measurement, Analysis, and Knowledge Management (category 4), and how knowledge is turned into strategic insights and advantage.