Recently, whenever I attend meetings or read journal articles, magazines, newspapers, or blogs, I see references to big data. They have become the "flavor of the month" (or longer?). The real question in my mind is two-fold:
- How will organizations and governments manage big data?
- How will they properly and appropriately use them?
While we might argue about what constitutes big data (versus regular data), I would suggest the answer is different for different organizations. In my mind, they involve datasets that are larger than an organization's current analytical capabilities can meaningfully interpret.
The topic of big data is of particular interest right now because we are getting started with the 2015–2016 revisions to the Baldrige Criteria for Performance Excellence and need to come to grips with a way to handle the big data concept in those revisions. What are the critical aspects of big data that are applicable to all sizes and types of organizations and that reflect the Criteria's aim of assessing performance through concepts at "the leading edge of validated management practice"?
In this Insights column, I will explore the topic of big data from five perspectives: what are big data, what are their uses and promise, what are the challenges and risks, what are the opportunities, and what does this all mean for the Baldrige Criteria.
What Are Big Data?
The McKinsey Global Institute defines big data as "datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze." This seems to be a generally accepted classification of big data. It does not use a definition based on a certain number of exabytes (approximately 1,000,000,000,000,000,000 bytes or one billion gigabytes) because it is assumed that as technology advances the size of big data datasets will grow. Furthermore, the size of a dataset that an organization can analyze with existing tools will vary by organization and its IT resources. Therefore, definitions of big data will likely vary by organization.
Even today, the numbers associated with big data are staggering. According to McAfee and Brynjolfsson, in a 2012 Harvard Business Review (HBR) article, 2.5 exabytes of data are generated each day. It has been estimated that Walmart alone collects more than 2.5 petabytes (2.5 quadrillion bytes or 2.5 million gigabytes) of data every hour from its customer transactions. In 2010, it cost $600 to buy a disk drive that could store all of the world's music. According to the Library of Congress, the information it stored in 2011 totaled 235 terabytes. Put in other terms, one exabyte is more than 4,000 times the information that was stored in the Library of Congress.
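As a rough check on the unit arithmetic in the figures above, here is a minimal Python sketch. It assumes decimal (SI) units, in which one exabyte is 10^18 bytes; the variable names are my own illustrations and do not come from any of the cited reports.

```python
# Decimal (SI) storage units, expressed in bytes (an assumption; binary
# units such as 2**60 bytes would give slightly different results).
GIGABYTE = 10**9
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

# Figures quoted in the text.
walmart_hourly_bytes = 2.5 * PETABYTE
library_of_congress_bytes = 235 * TERABYTE

# Conversions and the comparison made in the text.
exabyte_in_gigabytes = EXABYTE // GIGABYTE
walmart_hourly_gigabytes = walmart_hourly_bytes / GIGABYTE
exabyte_vs_library = EXABYTE / library_of_congress_bytes

print(f"1 exabyte = {exabyte_in_gigabytes:,} gigabytes")
print(f"Walmart per hour = {walmart_hourly_gigabytes:,.0f} gigabytes")
print(f"1 exabyte = about {exabyte_vs_library:,.0f} times the 2011 Library of Congress figure")
```

Under these assumptions, one exabyte works out to one billion gigabytes and to roughly 4,255 times the 235 terabytes reported for the Library of Congress.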
Big data are generally characterized by three or four v's: volume (sheer size of datasets), variety (heterogeneity of data—text, images, videos, databases, geolocation information, etc.), velocity (rate at which data arrive and rate at which they must be analyzed), and veracity (trust and uncertainty).
But here is the real challenge. In a TDWI report on making big data accessible to executives, the author writes, "Big data are like the egg in an omelet." Big data are the bulkiest ingredient in any recipe for big data insight, but the ingredients need a lot of work before the dish can be enjoyed. Like the eggs, the data must be cracked, whipped into shape, and cooked before a meal can be consumed.
What Are the Uses and Promise of Big Data?
According to a 2013 Wall Street Journal report, the uses of big data to date have been fairly conventional and do not involve much blending of data from different sources. Typical uses have included analysis of customer transaction data; process monitoring and improvement using such things as machine, plant productivity, and sensor data; the mining of social media data for customer sentiment; the tracking of flu outbreaks based on search engine regional data; and targeted marketing to enhance customer experiences.
The potential for big data use is limited only by our imaginations. And that potential includes significant opportunity for abuse of freedoms and privacy. Here are some examples of potential uses gleaned from various sources:
- Analytics that use blended customer data to optimize pricing across consumer life cycles with a product or to optimize marketing spending by predicting areas where product promotions will be most effective
- Development of vacation packages optimized for age group, time of day for marketing them, and media to be used (e.g., web ads, tweets, television or radio spots)
- Use of real-time data from mobile phones to track shoppers' movements through a mall, analyze movements for behavioral patterns or the prediction of intentions, and use the data in targeted marketing or (short-term) price-setting
- Tracking of personal health issues, for example, by using mobile phone data to track presence in a cancer treatment center or looking at web pages visited. Similarly, tracking or observing religious preferences or political leanings by looking at web pages visited or phone calls made
- Embedded microchips that transmit product-use information combined with social media data on product use to improve development of next-generation products and to create after-sales service offerings, possibly targeted to the individual customer
On a big-picture scale, large macroeconomic analyses of blended datasets could, according to McKinsey:
- Drive efficiency and quality in U.S. health care, with a potential value of $300 billion a year, two-thirds of which would come from reducing national health care expenditures by 8 percent
- Generate a 60 percent increase in operating margin for retailers that use the full potential of big data
- Save $149 billion in government administration in the developed economies of Europe through operational efficiency improvements
What Are the Challenges and Risks of Big Data?
Based on what I have read and learned, I would group the challenges to the use of big data into nine major areas and the risks into four groups. The challenges will be either overcome or handled through innovative and incremental solutions. The risks will be ongoing and require constant attention. Let's start with the challenges:
- Heterogeneity—Data will come in all forms and need to be analyzed to generate purposeful, actionable information. Data will be hard numbers, text, images, and video. Accommodating these data will require new technology (hardware and algorithms for analytics).
- Scale—The sheer volume of multi-modal data will generate challenges, beginning with simple decisions of keeping or discarding data, and using or not using them in a particular analysis. How will we determine "exhaust data"?
- Timeliness—How do you collect, analyze, and capitalize on data with speed? Customer-relevant uses may be for brief periods only—before the customer's attention moves on.
- Human Collaboration—Producing timely, useful information will require collaborations across an organization, possibly among groups that have never collaborated before. Depending on the purpose of the effort, producing this information may also require collaboration among organizations. Typical teams may involve computer programmers, product engineers, sales, marketing, sociologists, strategic planners, etc.
- Access—Needed or desired data may have multiple sources and multiple owners, inside and outside your organization. These data may well be structured differently and need to be assimilated into a single "dataset" for analysis. In some cases, provenance or ownership may not even be clear.
- Accuracy—The accuracy of both the input data and the analytics must be guaranteed, so that erroneous conclusions are not drawn and, worse, major expenditures are not made that yield no benefit or even have a negative impact.
- Visualization—With large amounts of data digested and analyzed, understanding and interpretation may involve new ways to visualize the data and analytical outcomes.
- Incentive—Industries that lack competitive pressures may delay implementation of data analytics, even though customers might benefit and efficiencies could be achieved. Examples of this group of industries include government, public education, and niche market businesses.
- Privacy—This last challenge straddles the domains of challenges and risks. The challenge is to develop the processes and technology to protect the privacy of privileged information of all types, whether "company proprietary," customer/supplier proprietary, or the personal information of employees or customers.
And here are the four groups of risks:
- Privacy—Let's start with the risk aspects of the last challenge. The key risk is the loss of privileged information, personal and otherwise, about employees, customers, and companies. This includes not only the information in individual datasets but also the potentially more significant conclusions drawn from data analytics. The analytical determinations could also pose ethical concerns about the release of conclusions that might be extrapolations from data.
- Security—The security risk goes beyond breaches of privacy to the protection of assets, even if external breaches do not occur. Security also deals with "need-to-know" aspects of sensitive data. It deals with datasets stored outside the organization's control—in the cloud, for example.
- Intellectual Property (IP)—This risk relates to the protection of organizational IP, as well as IP given to the organization for use by suppliers, partners, and customers.
- Liability—This risk deals with the financial and reputational aspects of big data breaches.
What Are the Opportunities for Big Data?
Looking beyond the illustrative uses for big data that are discussed above, the opportunities for whole new fields of scientific endeavor and human benefit are not fully fathomable today but are unbelievably exciting. These potential opportunities led to a "community white paper" by leading researchers across the United States.
Think of the potential unleashed by relational datasets that cross genomics, chemistry, mathematics, and engineering to develop totally new fields of study and benefit for humankind. Think of the potential for models of global economics that prevent large-scale poverty in any country. The human genome project was undertaken at the extreme infancy of big data, and it has already created tremendous opportunities and promising scientific and health care breakthroughs.
The future for big data is exciting!
What Does This Mean for Baldrige?
In my opinion, the role for the Baldrige Criteria, at this time, is in the enablers, opportunities, and risks associated with big data and not in requiring implementation of big data strategies. Many of these enablers, opportunities, and risks are already addressed in the Criteria because of their wider applicability than just for big data, but some additional emphasis or commentary may be appropriate. The future competitive advantage that will flow from big data is not based on the datasets collected but on the analytics performed, the conclusions drawn, and the actions—including intelligent risks—pursued.
In a recent HBR blog, Tom Davenport writes about what is needed to make big data projects succeed. Many of these needs align with large project management in general and relate to management systems considerations: formation of effective cross-functional teams, good change management practices as organizational opportunities are gleaned or deduced, clear business objectives, and good project management skills. The path from big datasets to strategic advantage is an organizational effort that involves many areas of expertise and everyone from front-line employees to senior management.
The Criteria need to address (and already do address) data confidentiality, accuracy, verification, and access. Cybersecurity is of growing importance for many reasons, including the new challenges from big data.
Big data and the analytical conclusions drawn add to the ethical considerations that organizations must already address. The results of analytics could reveal information that is very sensitive for people and companies.
While "big data" is the current buzzword, in the end it is all about the foundational category of the Baldrige framework, Measurement, Analysis, and Knowledge Management (category 4), and how knowledge is turned into strategic insights and advantage.