Doug Laney

A member of the Gartner Blog Network

Doug Laney
VP Research, Business Analytics and Performance Management
1 year at Gartner
25 years IT industry

Doug Laney is a research vice president for Gartner Research, where he covers business analytics solutions and projects, information asset valuation and management, "big data" strategy, and data-governance-related issues.

Who Owns (really owns) “Big Data”

by Doug Laney  |  April 3, 2013  |  3 Comments

Given all the hype over Big Data and concerns about data ownership, I thought it would be interesting to explore who actually owns Big Data. No, I mean really owns “big data.” Yes, the trademark. Next stop, the United States Patent and Trademark Office online database.

Talk about Big Data. The database contains a treasure trove of over 8 million patents and 16 million filings dating back to Samuel Hopkins’s 1790 registered process of making potash, an ingredient used in fertilizer (signed by President George Washington, no less), and the oldest active trademark, SAMSON, registered for a brand of rope in 1884, among the nearly 3 million trademarks. With almost 200,000 patent applications and 100,000 trademark applications a year, and growing, the ranks of the examiners are growing too, almost doubling since 2005.

But back to “Big Data”. The term has been in use since at least the mid-1990s, seemingly coined by Silicon Graphics chief engineer John Mashey, who gave a seminar entitled “Big Data & the Next Wave of InfraStress.” However, since he never trademarked it, who did?

Those of you pioneers in data warehousing will remember a boutique consulting firm, often joined at the hip with Teradata, based in Chicago called Knightsbridge Solutions. Knightsbridge specialized in building large databases and data warehouses before it was absorbed into HP. On January 9, 2001, a Knightsbridge attorney filed the trademark and “big data” became a US citizen or whatever. However, they must have liked the term about as much as most of the industry does today (despite its popularity), as they abandoned the trademark less than a year later.

It wasn’t until ten years passed that an enterprising man in Texas reclaimed it only to abandon it again months later. Poor Big Data! It’s been declared dead twice even before it slides into the Gartner® Hype Cycle™ Trough of Disillusionment™.  Not to worry, a fledgling VC called Big Data Boston Ventures nabbed the mark last summer. Until they launch, it seems to be the only asset in their portfolio.

Good news for those of you feeling like you missed the boat: there are plenty of variants still available. The USPTO site lists only 44 related marks, including clever ones such as “Bigdata”, “Making Big Data Small”, “Big Data for the Little Guy”, “Rocket Fuel for Big Data Apps”, “Dominating Big Data”, “Wala! Big Data Simplified”, and my personal favorite that integrates large information and lager libation: “Big Data on Tap.”

Here’s to you Big Data!  You’ve made your mark.

Follow Doug on Twitter: @Doug_Laney

New Year’s Resolution: Innovate with Information

by Doug Laney  |  December 26, 2012  |  2 Comments

2012 has seen an acknowledgement and mainstream awareness of the challenges of managing the burgeoning streams of information generated and available to organizations, particularly big data. In 2013, I expect the focus to shift to the challenges of developing and implementing enterprise strategies for making use of all this data.

Opportunities abound for deploying information in transformative ways. Gartner’s 2013 research agenda will help IT and business leaders develop and execute strategies for achieving higher returns on their information assets. This includes leveraging big data, enhancing analytic capabilities, achieving more disciplined information asset management approaches, and incorporating new and expanded information-related roles.

The volume, velocity, and variety of information sources available to organizations today are more than just an information management challenge. Rather, this phenomenon represents an incredible opportunity to improve enterprise performance significantly and even transform their businesses or industries. More than merely a basis for reporting or even basic decision-making support, information assets are an instrument for innovation. Making this strategic shift quickly enough to create competitive advantage is the real challenge for most businesses.

Key issues Gartner will be exploring throughout the coming year are also questions business and IT leaders should be asking themselves:

Business uses and sources of information

  • What is the range of internal and external sources of data available, starting with our own underutilized “dark data”?
  • How can information be used, not just for decision-making, but for greater business insights and process automation?
  • How can information be used to foster relationships and improve collaboration with our employees, partners, customers and/or suppliers?
  • How can information facilitate business transformation and innovation, beyond just incremental performance improvements?
  • How can information be monetized by packaging, sharing and/or selling it?

Information leadership

  • How can we evolve to a more information-centric culture?
  • How can our IT and business groups organize for achieving higher levels of information performance?
  • What emerging information-related skills and methods should be considered, planned for, used or acquired?

Big Data

Value and economics of information (infonomics)

  • Why should we, and how can we, inventory, measure and quantify our information assets?
  • How can information’s value be used to justify and gauge the ROI of information-related initiatives, as well as other IT and business initiatives?
  • How can we manage information as an actual corporate asset?

So if you’re looking to make a corporate New Year’s resolution to do more with data for driving corporate value, consider developing answers to each of these questions. And keep an eye on Gartner’s Information Innovation research throughout 2013.

Follow Doug on Twitter: @doug_laney

 

Gartner Shares Findings from North Pole Inc. Big Data Assessment

by Doug Laney  |  December 20, 2012  |  7 Comments

Going into the 2012 holiday season, North Pole Inc. (ticker: XMAS), the leading global distributor of presents to good girls and boys, called upon Gartner to assess and advise on its information-related needs and opportunities.

STAMFORD, Conn., December 18, 2012—

Over the past quarter, Gartner was given exclusive access to the operations and information systems of North Pole Inc. (NPI), to help it set a strategic path for improved information management and analytic capabilities. For nearly two centuries NPI has struggled to support its growing operation and respond proactively to competitive pressures through the use of emerging technologies and best practices.

“We do a jolly good job year after year,” claims NPI’s Founder and CEO, Santa Claus, “but I have really put the pressure on my IT management team to achieve better efficiencies and creatively use information to innovate.”

As a long-time Gartner client, NPI has read about how other enterprises have selectively adopted information technologies, embraced new architectures and approaches, and acquired the necessary skills. “Now it’s our turn,” exclaimed NPI’s CIO Frederick Ellefsen. “We’ve heard a lot about the term ‘big data’ and the significant opportunities indicated by the confluence of mobile, cloud, social and information—Gartner’s Nexus of Forces—so we didn’t want to be left out in the cold, so to speak.”

“This is a unique opportunity for Gartner to be exposed to the inner workings of one of the world’s most secretive yet successful enterprises,” said Peter Sondergaard, Gartner SVP Research. “We were pleased to be able to offer our services and insights to NPI.”

Gartner’s review of NPI’s systems revealed an operation not too dissimilar to other distributors and some major retailers, but on a much larger scale. However, due to NPI’s unique legal status, it has no finance department, nor does it have a sales or marketing function.

Figure 1 - North Pole Inc. Operations

Santa’s Systems Portfolio

Key systems in NPI’s portfolio manage orders, inventory, quality testing, and elfin performance and activities, and track human behavior, correspondence, wish lists, contact information and environmental impact data. To achieve NPI’s objective of managing and leveraging information as an actual enterprise asset, Gartner first completed an inventory of NPI’s extensive wealth of information assets:

  • Toy Order Management System (“Tommy”) – Toy orders and order tracking of 5.5 billion orders; supplier and 2nd level supply chain and parts level visibility of 4.6 million suppliers
  • Toy Inventory Management System (“Timmy”) – Receiving and inventory data on 6.9 billion toys
  • Toy Assurance Management System (“Tammy”) – Test results and repairs/returns data on all toys received (average of three safety and quality tests per toy) totaling 21 billion tests annually
  • Content system for Relations, Inbound Gift Request and Letters (CRINGLE) – Processing, scanning, content extraction and analysis of 6.5 million letters, emails and calls, and recording 19.5 million gifts requested
  • Naughty or Nice Information Tracking System (NITS) – Processing and tagging of 16.8 trillion person-to-person interactions throughout the year
  • Scheduling, Logistics and Expedited Distribution System (SLEDS) – Handling of 500,000 appearance requests and 280,000 actual mall and other appearances; the operation of 7700 gift express hubs and the logistics and maintenance of the half-million sleighs servicing them; and night-of-delivery (NOD) routing
  • Kontact Information & Directory System (KIDS) – Basic contact, rooftop and chimney configuration information on 2.3 billion gift recipients and their 880 million households
  • Helper Organization, Operations and Orchestration (HO-HO-HO) – Scheduling and coordination of elf workforce job responsibilities and activities; also coordinates elf housing and food service
  • Job Information, Guidance, Learning & Elf Management System (JINGLES) – General elf resource (ER) system for tracking the performance, benefits and training activities of 230 million elves, along with ongoing recruiting activities
  • Study for Negating the Outcome of Warming (SNOW) – A longitudinal study as part of NPI’s sustainability efforts. Millions of climate, atmospheric, emissions, deforestation, and animal and human population data points are collected annually to help NPI achieve its target of carbon neutrality by 2020

[See bottom of article for North Pole Inc. Core Data Requirements and Database Sizing]

Data Quality as Pure as the Driven Snow

Due to impeccable data governance and quality processes, a world-class master data management program, an impressive team of data elves, robust data quality technology, and unwavering executive-level commitment and involvement, NPI’s information assets show no signs of significant completeness, accuracy, integrity or other quality issues according to sample data profiling using Gartner’s data quality assessment toolkit.

Analytic Opportunities Beyond Just “Naughty or Nice”

From a business intelligence perspective, Gartner found that NPI is lagging others in the shipping and distribution industry. Its enterprise data warehouse, called “Chimneys”, is really a collection of stovepipe query and reporting systems, some still relying on first-generation BI tools like Red Brick. Gartner recommended evolving to a logical data warehouse architecture for most low-frequency queries to enable more insightful cross-functional, federated analytics.

Some predictive analytics is done to select appropriate toys based on NITS behavior modeling, demographics and prior-year presents. Gartner recommended that this system be enhanced to account for factors such as sibling response, damage/loss propensity, and social content analysis. NPI however is working on mobile-enabling Santa in the field during mall appearances so he can advise on toy availability and alternatives (as necessary) in real-time while a child is on his lap. This system is expected to be in place for the 2013 holiday season. Gartner analysts pointed out that this new capability would also require enhancing its “Tommy” toy order management system to capture full catalog and supply chain information from its suppliers. Today NPI only maintains this tracking data on actual orders.

Although NPI does a great job of social media participation, including a multi-channel Twitter strategy (i.e. @santa, @officialsanta, @santaclaus, @santa_claus, etc.), Gartner recommended that NPI begin tapping and analyzing social media streams. Social sentiment analysis will help NPI identify emerging “hot toys” for pre-ordering, and identify early warning signals of quality-related issues. NPI also took into consideration the idea of integrating global economic data to better focus its gift giving on those in the greatest need. However, NPI like many organizations is struggling to hire or train a team of data scientists. “Advanced analytics just isn’t a core elfin competency,” lamented Mr. Ellefsen. “We’re definitely going to have to fly up outside talent for a period of time.”

Operational Efficiency at Times Glacial

Gartner also advised NPI on how to consolidate its ordering process and information. Since the late 1970s, NPI has been consolidating inbound shipments using its gift express hubs scattered secretly in forests around the world. However, it still orders and inventories gifts from suppliers one by one. “Our ‘Tommy’ system is definitely outmoded,” admitted Mr. Ellefsen. With sophisticated demand analysis, order pattern matching and smart RFID-enabled inventory management, Gartner believes NPI could save 70-80% of its current TOM processing expense.

No More Cookie Cutter Approaches to Data Management

Regarding the human behavior tracking system (NITS), Gartner suggested that in today’s world perhaps both online interactions (text, email, social media) and human-to-animal interactions should also be captured and tagged as “naughty” or “nice”, and that a broader 5-point Likert scale or automated video/audio analysis might improve measurement precision. NPI is obviously concerned by the size and performance of this already 168 terabyte system, but will be looking into HDFS or other NoSQL alternatives to support expanded tracking ideas. “For obvious reasons, we got away from inverted tree data management structures years ago,” Mr. Ellefsen chuckled.

Gartner and NPI also discussed a long-term cloud strategy. But with over 200 terabytes of online operational data, austere personally identifiable information (PII) privacy and security requirements, and spotty connectivity at its arctic headquarters, Gartner recommended that at this time NPI only consider hosted data solutions for its 7700 gift express hubs.

A Big Sack of New Ideas for Big Data

During the “Workshop at the Workshop” session as it was called, Gartner and NPI generated many innovative ways to use information, including:

  • selecting toys that would encourage naughtier kids to be nicer
  • putting de-identified data online for suppliers to analyze
  • real-time NOD (night of delivery) routing and navigation via integrated weather, GPS and air traffic data to optimize Santa’s 10,200 takeoffs, landings and deliveries per second.

However, the entire NPI management team was quick to squash the subject of transitioning to an outsourced, mobile-enabled parental workforce. “Elves have magical capabilities beyond those of most humans,” Mr. Claus interrupted, “not to mention a tremendously strong union.”

 

Contact

Doug Laney, VP Analytics and Information Management
Gartner
Twitter: @doug_laney


 

About Gartner:
Gartner, Inc. (NYSE: IT) is the world’s leading information technology research and advisory company. Gartner delivers the technology-related insight necessary for its clients to make the right decisions, every day. From CIOs and senior IT leaders in corporations and government agencies, to business leaders in high-tech and telecom enterprises and professional services firms, to technology investors, Gartner is the valuable partner to clients in 12,000 distinct organizations. Through the resources of Gartner Research, Gartner Executive Programs, Gartner Consulting and Gartner Events, Gartner works with every client to research, analyze and interpret the business of IT within the context of their individual role. Founded in 1979, Gartner is headquartered in Stamford, Connecticut, U.S.A., and has 5,000 associates, including 1,280 research analysts and consultants, and clients in 85 countries. For more information, www.gartner.com.

______________________________________

North Pole Inc. Core Data Requirements and Database Sizing*

* For non-believers, these data sizings were derived from various sources: Population data used to determine the number of worldwide Christians (2.3B) and Christian households (884M) is from the US Census, the Catholic Education Resource Center, the Christian Post, and the Global Population Clock. The average number of presents from Santa (3, excluding stocking stuffers) is from Babycenter.com and CircleofMoms.com. The number of person-to-person interactions (20/day) for calculating the volume of “naughty/nice” data comes from the Tilted Forum Project on Humanity, Sexuality and Philosophy. The amount of correspondence Santa receives is from a Wired Magazine article (500K letters annually) and extrapolated to include emails and worldwide correspondence. The number of toy makers (1547 in US) is from toydirectory.com and is extrapolated to include worldwide toy makers, suppliers and parts. The number of shopping malls (105,000 in US) is from the International Council of Shopping Centers. And package delivery, transportation and personnel numbers are extrapolated from public FedEx data.
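For anyone who wants to check the sleigh math, here is a rough sketch of how several of the headline figures appear to follow from the inputs in the footnote. The input numbers are the ones quoted in the article; the 24-hour delivery window and the implied bytes per interaction are my assumptions for illustration, not figures from the assessment.

```python
# Back-of-the-envelope check of the North Pole Inc. sizing figures.
# Inputs come from the article and its footnote; anything marked
# "assumption" is not stated in the article.

RECIPIENTS = 2_300_000_000        # gift recipients (footnote: 2.3B)
HOUSEHOLDS = 884_000_000          # households (footnote: 884M)
PRESENTS_PER_RECIPIENT = 3        # excluding stocking stuffers
TESTS_PER_TOY = 3                 # "Tammy" safety and quality tests
INTERACTIONS_PER_DAY = 20         # person-to-person interactions
DAYS_PER_YEAR = 365
DELIVERY_WINDOW_SECONDS = 24 * 60 * 60   # assumption: a 24-hour night of delivery
NITS_BYTES = 168 * 10**12                # 168 terabytes (decimal units assumed)


def npi_sizing():
    """Return the derived sizing figures as a dictionary."""
    toys = RECIPIENTS * PRESENTS_PER_RECIPIENT
    tests = toys * TESTS_PER_TOY
    interactions = RECIPIENTS * INTERACTIONS_PER_DAY * DAYS_PER_YEAR
    return {
        "toys": toys,                                   # about 6.9 billion
        "tests": tests,                                 # about 21 billion
        "interactions": interactions,                   # about 16.8 trillion
        "stops_per_second": HOUSEHOLDS / DELIVERY_WINDOW_SECONDS,
        "bytes_per_interaction": NITS_BYTES / interactions,
    }


if __name__ == "__main__":
    for name, value in npi_sizing().items():
        print(f"{name}: {value:,.0f}")
```

Under these assumptions the household count divided by the seconds in a day lands close to the 10,200 stops per second quoted above, and the 168 terabyte NITS database works out to roughly 10 bytes per tagged interaction.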

 

Mayan Big Data and Predictive Analytics

by Doug Laney  |  December 18, 2012  |  3 Comments

To understand the significance of December 21, 2012 to the Mayans (and today’s mass media) it’s necessary to recognize and understand the Mayan numbering system, theology and astronomical prowess.

First, the Mayans had two numbering systems, which are more or less akin to our distinct decimal system for counting things and our Gregorian system for counting dates. However, their numerical system is a base-20 (vigesimal) system, not a base-10 (decimal) one. This owes to the fact that they felt perfectly comfortable using their toes for counting, and relished the ability to represent petabyte-scale numbers like faraway dates efficiently. The downside of this, and of some unfortunate anomalies they introduced, was that they were never able to master multiplication or division. Unlike the ancient Romans though, Mayan data modelers did invent a symbol for the number zero, which turns out to be an important part of the story.

However, unlike most of our cultures the Mayans also had two distinct calendar systems: the “Short Count” and “Long Count”.  The Short Count derives from a sacred count of 260 days known as the tzolkin munged with Venus’s relatively-protracted year. Although based in part upon astronomical observations, this calendar was purely for ritualistic purposes, still used by Guatemalan highlanders today, and bears no relevance to our imminent ominous occasion. The Long Count calendar is also based on astronomical observations and cycles, and multiples thereof.

The longest of the five nested Long Count cycles is the Baktun, which is 144,000 days or about 400 years – interestingly the same as our present-day quadricentennial leap year cycle. The 13-Baktun “Great Cycle” spans 5125.36 years, completing (and iterating, I hope) on December 21, 2012, or 13.0.0.0.0 in Mayan nomenclature.
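To make the arithmetic concrete, here is a minimal sketch of Long Count notation as a mixed-radix day count. The names of the four smaller units (katun, tun, uinal, kin) and their lengths are the conventional ones and are not given in this post; the function name and the use of the Gregorian average year of 365.2425 days are my own choices for illustration.

```python
# A Long Count date is a day count written in nested units. Every unit is
# 20 of the next smaller one, except the tun, which is 18 uinals (360 days).
LONG_COUNT_UNITS = [
    ("baktun", 144_000),
    ("katun", 7_200),
    ("tun", 360),
    ("uinal", 20),
    ("kin", 1),
]

GREGORIAN_YEAR_DAYS = 365.2425            # average Gregorian year length
GREAT_CYCLE_DAYS = 13 * 144_000           # 13 Baktuns = 1,872,000 days


def to_long_count(days):
    """Format a non-negative day count in baktun.katun.tun.uinal.kin notation."""
    if days < 0:
        raise ValueError("days must be non-negative")
    parts = []
    for _name, size in LONG_COUNT_UNITS:
        count, days = divmod(days, size)
        parts.append(str(count))
    return ".".join(parts)


if __name__ == "__main__":
    print(to_long_count(0))                                   # start of the cycle
    print(to_long_count(GREAT_CYCLE_DAYS))                    # end of the Great Cycle
    print(round(144_000 / GREGORIAN_YEAR_DAYS, 2))            # one Baktun in years
    print(round(GREAT_CYCLE_DAYS / GREGORIAN_YEAR_DAYS, 2))   # Great Cycle in years
```

Run as a script, this gives 0.0.0.0.0 and 13.0.0.0.0 for the two ends of the Great Cycle, about 394 years per Baktun, and 5125.36 years for the full cycle.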

But why December 21st? What happened 5125 years ago on 0.0.0.0.0? The answer that has perplexed scholars until recently is: nothing. Nothing happened on that date, which happens to predate the Mayan civilization by some 3000 years. Unlike most modern-day cultures whose ethnocentric calendars begin on an important date in their own history, the Mayans saw themselves as part of a much bigger and longer picture…one of astronomical scale. It wasn’t until scholars determined that the date 13.0.0.0.0 coincides with a confluence of Mayan theology and rare astronomical events (due to the precession caused by the slow wobbling of the Earth’s axis) that they realized the Mayan calendar is reverse-engineered.

After decades and centuries of data collection (i.e. ancient Big Data curating methods), the Mayans’ best data scientists projected that on December 21, 2012 the Sun’s ecliptic would pass through the center (“dark region” or “dark road”) of the Milky Way, not just on any old day, but on the Winter solstice. It is on this day that the Mayans depict their sun god Pacal (no relation to Blaise) traveling into the underworld to do battle with the lords of Xibalba.

So if you want to really impress someone this holiday season, wish them a Happy 14th Baktun or “May you have a renewed Great Cycle!”

Follow Doug on Twitter: @Doug_Laney

Tobin’s Q & A: Evidence of Information’s Real Market Value

by Doug Laney  |  August 15, 2012  |  4 Comments

Tobin’s q is a simple ratio first posited by Nobel-winning American economist James Tobin in the 1960s to understand the relationship between a company’s market value and the replacement value of its assets. Analysis shows that this quotient has been growing since financial statements were standardized following the Great Depression. Smoothing economic boom and bust cycles via linear regression, Tobin’s q has more than doubled from 0.4 in 1945 to a predicted 1.1 in any given year currently.

This means that in general markets now value companies more than the sum of their tangible assets. How can this be?  Non-reportable intangible assets of course.
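As a quick illustration of the ratio, here is a minimal sketch. The function names and the sample figures are hypothetical and chosen only to mirror the 1.1 value mentioned above; they are not drawn from any company's filings.

```python
def tobins_q(market_value, replacement_value):
    """Tobin's q: market value divided by the replacement value of assets."""
    if replacement_value <= 0:
        raise ValueError("replacement_value must be positive")
    return market_value / replacement_value


def unexplained_share(market_value, replacement_value):
    """Fraction of market value not accounted for by replaceable assets."""
    if market_value <= 0:
        raise ValueError("market_value must be positive")
    return (market_value - replacement_value) / market_value


if __name__ == "__main__":
    # Hypothetical company: $110M market value, $100M to replace its assets.
    print(tobins_q(110, 100))            # q above 1: the market sees extra value
    print(unexplained_share(110, 100))   # share of value the asset base does not explain
```

A q above 1 says the market prices the company above what it would cost to rebuild its recorded asset base; a q below 1 says the opposite.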

We know that due to 75-year-old accounting standards, certain intangibles cannot be valued and reported.  The unreportable intangibles most frequently cited include human capital and intellectual capital. Yet, could these alone have doubled over seven decades? Do corporations of similar revenue have twice the number of employees they once did? No, quite the opposite as we’ve become more efficient and reliant on technology. Do humans have twice the knowledge capacity we did back in the day?  Not only my teenager would fervently disagree with that.

Then what is it that companies have so much more of, that has been accumulating for over half a century, and that is hidden from balance sheets?

Information.

Ever since Arthur Andersen computerized a GE payroll plant in 1953, companies have become better and better at amassing information assets (leading up to this age of Big Data) and finding ways to leverage them. Yet the value of information isn’t quantified or reported in any way. Even today’s infocentric companies whose business models revolve around collecting, buying and selling data (e.g. Facebook, Google, Experian, Nielsen, etc.) have balance sheets devoid of their most valuable asset.

Furthermore, a study by intellectual capital research firm, Ocean Tomo, shows that the portion of corporate market value attributable to intangibles has grown from 17% in 1975 to a whopping 81% in 2010. Indeed, information accumulation has not only increased dramatically in businesses, but the importance of information itself has supplanted traditional assets in generating revenue, and therefore in contributing to market value as well.

So what are CEOs to do knowing that information comprises a majority of their corporate value?  First, forget what the accountants say, and listen to what the market is saying. Stop just talking about information as such an important asset and start valuing and managing it like one.

For further reading on the topic of infonomics:

To Facebook You’re Worth $80.95 (Wall Street Journal)
Infonomics-The Practice of Information Economics (Forbes)
Extracting Value from Information (Financial Times, free registration)
Introducing Infonomics (Gartner, client access)
Infonomics (Wikipedia)

Follow me on Twitter @doug_laney

Defining and Differentiating the Role of the Data Scientist

by Doug Laney  |  March 25, 2012  |  5 Comments

The research note, Emerging Role of the Data Scientist and the Art of Data Science, which I authored with colleague Lisa Kart, just hit the Gartner wires this week. Since most of the data scientist role dissenters we come across seem to believe that the role’s title is nothing more than a pretentious moniker for a statistician or business intelligence (BI) analyst, we decided to take an…er…scientific approach to making that determination. We thought it would be entirely fitting to perform text analysis of hundreds of job descriptions for “data scientist,” “statistician,” and “BI analyst” to learn what the commonalities and differences are according to those actually hiring for the role.

Data Scientist Job Description Wordcloud

I’d like to believe that these findings led us to define and distinguish the role of the data scientist more clearly, and with less speculation, than anyone else to date. Through our research we learned that data scientists are expected to work more in teams, have comfort and experience with “big data” sets, and be skilled at communication. They are also frequently required to have experience in machine learning, computing and algorithms, and are required to have a PhD nearly twice as often as statisticians. Even the technology requirements for each role differed, with data scientist job descriptions more frequently mentioning Hadoop, Pig, Python and Java among others.
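For readers curious what this kind of comparison can look like in practice, here is a minimal sketch of a term-frequency comparison between two sets of job descriptions. It is not the method used in the research note; the function names, the threshold and the sample descriptions are all made up for illustration.

```python
import re
from collections import Counter


def term_frequencies(descriptions):
    """Share of descriptions that mention each term at least once."""
    total = len(descriptions)
    if total == 0:
        return {}
    counts = Counter()
    for text in descriptions:
        # Count each term once per description so long postings don't dominate.
        counts.update(set(re.findall(r"[a-z]+", text.lower())))
    return {term: n / total for term, n in counts.items()}


def distinguishing_terms(role_a, role_b, min_gap=0.5):
    """Terms that appear in role_a descriptions notably more often than in role_b."""
    freq_a = term_frequencies(role_a)
    freq_b = term_frequencies(role_b)
    gaps = {term: freq_a[term] - freq_b.get(term, 0.0) for term in freq_a}
    selected = [term for term, gap in gaps.items() if gap >= min_gap]
    return sorted(selected, key=lambda term: (-gaps[term], term))


if __name__ == "__main__":
    # Hypothetical snippets, not real job postings.
    data_scientist = [
        "Build machine learning models in Python on Hadoop",
        "PhD preferred; experience with Hadoop, Pig and Java",
    ]
    statistician = [
        "Design surveys and analyze results in SAS",
        "Statistical modeling and reporting in SAS",
    ]
    print(distinguishing_terms(data_scientist, statistician))
```

A real analysis would need hundreds of postings per role, stop-word removal and some handling of multi-word phrases such as “machine learning” before the differences mean much.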

The piece then goes on to define and describe the three core data science skills: data management, analytics modeling and business analysis. But beyond these, there’s an art to data science. We detail several soft skills that our research showed are also critical to success, i.e., communication, collaboration, leadership, creativity, discipline and passion (for information and truth).

With the need for data scientists growing at about 3x that for statisticians and BI analysts, and an anticipated 100,000+ person analytic talent shortage through 2020, we also included a listing of university programs around the world offering degrees in advanced analytics.

Big League Business Influence: The Super Bowl versus the Super PAC

by Doug Laney  |  February 6, 2012  |  1 Comment

Yesterday during the on-air buildup to the Super Bowl a reporter mentioned that over one billion people were expected to watch this year’s big game. It occurred to me how few of these individuals, including some Americans, fully understand what the Super Bowl really means.  The next news story was about Super PACs (a new form of political action committee), and it occurred to me how, despite Stephen Colbert’s best efforts, even fewer people understand what a Super PAC is. So for both fun and education I created a little side-by-side comparison of the Super Bowl (and American football) versus a Super PAC (and the American elections).

Super Bowl  |  Super PAC
Enabled by antitrust exemption under the Sports Broadcasting Act of 1961  |  Enabled by expenditure exception under the revised Federal Election laws of 2010
Enables players to run for touchdowns  |  Enables candidates to run for office
Money comes from citizens and businesses  |  Money comes from citizens and businesses
Funds players’ lifestyles  |  Funds candidates’ campaigns…and lifestyles
Pays for hysterical ads  |  Pays for histrionic ads
Helps players get enshrined in Hall of Fame  |  Helps a candidate get ensconced in Oval Office
Players communicate with fans through the media  |  Candidates communicate with funders through the media
Fans can bestow with unlimited fame  |  Fans can bestow with unlimited funding
As a result of their fame, many individual players become corporations  |  As a result of the courts, laws don’t discriminate between individuals and corporations
Foreign teams not allowed to participate in US football  |  Foreign businesses allowed to participate in US elections
Initial goal is winning a series of playoff games in multiple cities; ultimate goal is winning the national championship  |  Initial goal is winning multiple primary elections in multiple states; ultimate goal is winning the general election
Offense wins games; defense wins championships  |  Being offensive wins primaries; being on the defensive loses general elections
Halftimes are spectacular  |  Debates are spectacles
Required to disclose injuries  |  Required to disclose donors
Trash-talking  |  Trash-talking
Players wear eye black  |  Candidates get black-eyes
Players leave it all on the field for their teammates and fans  |  Candidates leave a little left over for themselves
Coaches stand on the sidelines and call plays; quarterbacks audible  |  Fund manager stands on the sidelines and calls plays; candidates are audible
Players make a bit more money each playoff game they win  |  Candidates raise a lot more money each primary election they win
Sports networks are the real winners  |  News networks are the real winners


Ultimately the larger story for both the Super Bowl and Super PACs is about corporate influence. Super Bowl ads may be expensive, but the cost per second per viewer is on par with any other TV show. Moreover, due to social media these Super Bowl ads often take on a life in the Twittersphere, on YouTube and in Facebook after (and even before) they air, thereby enabling a business to reach a much larger audience than those viewing the ad when it aired. Many businesses also use the power of social media to actively engage potential customers by drawing them to their website or Facebook page. Think: Danica Patrick. Similarly, US elections are expensive, and reaching voters today also requires a social multichannel approach. Super PACs now provide the unbounded means for individuals and corporations from anywhere on the planet to influence US elections. So if your business wants to and has the financial means to reach a large swath of both consumers and voters, the Super Bowl and the Super PAC have got you covered.

Highlights from Today’s #GartnerChat on Big Data

by Doug Laney  |  January 28, 2012  |  Comments Off

Today the Gartner Information Management and Analytics Community held its weekly Twitter Chat, (Tweetchat, Tweetjam, TweetUp, whichever you prefer) to discuss concepts around big data, the role of the data scientist, and data quality. Over a half dozen Gartner analysts shared their ideas and research. (Where else can you get access to that many Gartner analysts in one place at the same time?) And dozens more individuals from other organizations also shared their perspectives and questions.

Big Data—Hey What’s the Big Idea?

First we discussed whether “Big Data” is an animal, vegetable or mineral, concluding that it has become very much a marketing term. Gartner analyst Andy Bitterer (@bitterer) jabbed, “Is Big Data nothing but a marketing play, since many organizations had ‘big data’ for a long time?” Tim Elliott (@timoelliott) concurred, stating that “new terms arise because of new technology, not new business problems.” Esteban Kolsky (@ekolsky) thought the term was a more specific “marketing word used to describe the incredible volume coming out of social [networks].”

Yves de Montcheuil (@ydemontcheuil) suggested that organizations “have had Big Data all along but couldn’t get value out of it, except with lots of $$$,” and Gartner analyst Doug Laney (@doug_laney) agreed with a quip about Big Data being relative: “Big Data is merely data that’s an order of magnitude greater than data you’re accustomed to…Grasshopper.”

Hadoop was mentioned more than a few times as both an enabler and also a driver of big data, with Mark Troester (@mtroester) summing it up that the “hype of Hadoop is driving pressure on people to keep everything.” Some suggested archiving or even unloading data that is unused, but John Haddad (@JohnM_Haddad) and Martin Schneider (@mschneider718) both reminded everyone that data retention may depend on industry regulations and government mandates.

Some inquired about how to find value in data, so Doug Laney offered that there are two sides to that equation: 1) “looking beyond basic BI to advanced analytics” and 2) “quantifying data’s potential and actual value.” Doug also summarized one of Gartner’s strategic planning assumptions for 2012: “Through 2015, >90% of business leaders say info is a strategic asset, yet <10% will quantify its economic value.” Gartner analyst Merv Adrian (@merv) admittedly had some fun with the notion of hidden value in data, asking, “Would it be a bad thing for organizations to say ‘Maybe there is value in the dark fiber of our information fabric?’”

The Art of Data Science

This led into a discussion about data science and the realization of data value. Gartner analyst Ted Friedman (@ted_friedman) wrote that it’s “good that analytics roles are becoming key, but ‘data scientist’ is a little bit elitist IMO.” Esteban disagreed, contending that the term “scientist is not elitist, it defines a specific role.” Gartner analyst Carol Rozwell (@CRozwell) responded by suggesting, “But shouldn’t the average person be able to derive value from data?…[even though] some people refuse to see the truth in data.”

Nenshad Bardoliwalla (@nenshad) contended that the need for data scientists may be overblown. He believes that “Purpose-built apps can democratize making sense of Big Data for business folks without the need for data scientists (in some domains).” @Brett2point0 agreed, offering that “ideally end users should be empowered to explore their own data, seek their own insights through self-service.”

Gartner’s Doug Laney shared his analysis of current job descriptions for “data scientist” versus those for “BI analyst”. Key words in “data scientist” job descriptions include: design, knowledge, research, complex, learning, machine, models, problems, and performance; whereas top words used in “BI analyst” job descriptions are reporting/reports, company, technical, industry, user, sql, applications, and metrics. Tony Baer (@TonyBaer) and Doug agreed that communication is the skill that differentiates applied from theoretical science.

Mark Troester argued that someone needs to have “real intelligence to identify relevance and rationalize data,” and Jill Hulme (@jill_hulme) chimed in that “a data scientist needs skills in math, engineering, writing, and a healthy dose of skepticism.” Adrian Bowles (@ajbowles) philosophized that a data scientist is like “a sculptor, finding a figure in material,” and that “Science is discovery, but not all who discover are scientists.”

Mopping Up with Data Quality

Finally we wrapped up with some thoughts on data quality in a Big Data context. Esteban claimed that “Big Data has compounded the [data quality] problem” and that 40% of the data he now sees is bad. Seth Grimes (@SethGrimes) similarly lamented that “questionable data is the rule rather than the exception in my specialization areas: text and sentiment analysis.”

Yves thinks that “data volumes make it hard for traditional data quality architectures to keep up with big data.” However, Gartner’s Ted Friedman offered up another perspective that “data quality problems can be eased by big volumes in that individual flaws may have less impact when the data set is bigger.”

Mark Troester turned the idea of analytics on its head, recommending, “We shouldn’t just apply data quality for analytics, we should use analytics to help with quality.” He said he’s also “seen people so aggressive about cleansing that they cleanse away insight.”

When some participants suggested that data should ideally be cleansed at the source or when received, Doug Laney cautioned that “you can’t always cleanse data before storing it because of performance and the need to integrate and analyze it first.” Ted Friedman added that data quality is a “harder problem when organizations wish to use data they didn’t produce or don’t own. The greater competency is assessing data quality…but that depending upon the usage and type of data, some you will still have to get nearly perfect.”

———-

Thanks again to the following individuals and organizations for their participation:
@ajbowles @arbeiza @berkson0 @bgassman @bikespoke @bitterer @Brett2point0 @briellenikaido @chirag_mehta @cpreston64 @cpydimuk @CRozwell @datachick @DataIntegrate @DavideCamera @decisionmgt @DivineParty @donloden @doug_laney @eIQnetworks @ekolsky @erao @EventCloudPro @furukama @howarddresner @iam_joshd @infanteAL @InformaticaCorp @jamet123 @JayMOza @jessewilkins @jill_hulme @johndavidstutts @johnlmyers44 @JohnM_Haddad @JSussin @juliebhunt @loranstefani @marciamarcia @merv @mschneider718 @mtroester @Natasha_D_G @NeilRaden @NekkidTech @nenshad @OhThisBloodyPC @pishabh @RobertsPaige @RomanStanek @rqtaylor @ryanprociuk @s_pritchard @seamuswalsh @SethGrimes @SocialMediaJeff @StacyLeidwinger @stevesarsfield @Tanvi_MR @techguerilla @ted_friedman @timoelliott @TonyBaer @userevents @ValaAfshar @Vivisimo_Inc @wiseanalytics @XeroxDocuShare
@ydemontcheuil

Please join or follow Gartner’s BI, analytics and information management analysts each Friday at 12:00pm ET on Twitter at #GartnerChat.

Note: Some tweets have been edited slightly in this blog to improve their comprehension and/or enhance context.

Comments Off

Category: Uncategorized

Blunderfunding: How Organizations Use Failure as a Basis for Budgeting

by Doug Laney  |  January 17, 2012  |  3 Comments

A major Wall Street securities ratings firm ignores the recommendations of a consultant report it paid for on rating collateralized debt obligations (CDOs)–contributing to the collapse of the mortgage industry, near-collapse of the banking industry and a multi-year global recession requiring trillions in government (taxpayer) dollars to avoid a full-blown Depression.

A major video game maker has millions of user IDs and credit card numbers pilfered, and spends many times more than was actually lost in revenue on bolstering its online security.

Thousands of credit cards belonging to Israeli citizens are exposed, resulting in an actual military build-up in response.

A major retailer gets slammed by a Twitter and Facebook barrage then decides to implement a social media program.

A shipping line suffers numerous attacks by pirates off the Somali coast. It spends millions paying ransom, beefing up security and reconfiguring routes.

The US Post Office continues to borrow from government coffers to run at a financial loss without making changes to its business model. Raising postage rates only exacerbates the problem.

And an online shoe retailer announced yesterday the potential exposure of account information for as many as 24 million customers. What level of investment will it have to make to prevent this kind of event, let alone to identify and tie up other loose ends?

True, major snafus are a part of business life, but knee-jerk budgeting in their immediate aftermath to prevent similar future incidents shouldn’t be.  In a recent online discussion of the topic I referred to this kind of behavior as “blunderfunding.” So let’s make it official:

blunderfunding [BLUHND-er-FUHND-ing]
verb

1. basing the level of investment in a business initiative upon the amount of loss incurred from a recent mistake or mishap
2. making a hasty outlay for a project to deflect or cover up for those responsible for a mistake
3. allocating monies or budget to fix a problem symptom rather than its actual cause

Origin:
Tweet by Gartner analyst Doug Laney on 13 Jan 2012

Etymology:
“blunder”: n. a mistake, v. to make a mistake
“funding”: [fund] n. a collection of money for a specific purpose, v. to allocate money for a specific purpose

While examples of enterprise-scale blunderfunding make regular headlines, it is also pervasive throughout lower levels of most organizations. Examples include buying “caution cones” to mark recently washed, slippery floors only after a hurried person or two has done a back-side plant, or overhauling server farm air conditioning only after overheating has degraded online customer response times.

Some of these blunderfunded investments may be perfectly justified. That is, the outlay is less than the risk-adjusted cost of a recurrence, and it addresses the actual cause. In other cases the risk-adjusted loss (financial loss × the probability of recurrence) is much lower than the budget allocated to prevent any such problems in the future. Worse, and perhaps more frequently, money is allocated to fix, repair or even hide the symptom rather than resolve the root cause of the problem.
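To make the arithmetic concrete, here is a minimal sketch of that comparison in Python. The function names and the dollar figures are hypothetical, chosen only for illustration; estimating the loss and the probability of recurrence is the hard part and is not addressed here.

```python
def risk_adjusted_loss(financial_loss: float, recurrence_probability: float) -> float:
    """Financial loss multiplied by the probability that the incident recurs."""
    if not 0.0 <= recurrence_probability <= 1.0:
        raise ValueError("recurrence_probability must be between 0 and 1")
    return financial_loss * recurrence_probability


def outlay_is_justified(outlay: float, financial_loss: float,
                        recurrence_probability: float) -> bool:
    """True when the proposed spend is below the risk-adjusted loss.

    This checks only the financial half of the argument; whether the spend
    addresses the root cause still needs a human judgment.
    """
    return outlay < risk_adjusted_loss(financial_loss, recurrence_probability)


# Hypothetical incident: a $400,000 loss with a 25% chance of recurring.
exposure = risk_adjusted_loss(400_000, 0.25)                   # 100000.0
knee_jerk_budget = outlay_is_justified(500_000, 400_000, 0.25)  # False
measured_budget = outlay_is_justified(60_000, 400_000, 0.25)    # True
```

In this made-up case a $500,000 response would be blunderfunding on financial grounds alone, while a $60,000 response would pass the test.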

Organizations tend to compound the damage by neglecting to:

  • calculate the actual economic loss
  • estimate the likelihood of re-occurrence
  • identify similar possible incidents
  • compute the risk-based loss potential of future incidents
  • discover the factors that led to this incident
  • deal directly with the root cause(s), and fund their resolution rather than a patch for the symptoms

What we’ve got here is also a recipe to avoid blunderfunding.

So why is it that most of the blunderfunding we see is related to information mishandling, misappropriation and misuse? I believe this is because information assets are more easily accessed, more often in motion, and more easily transported. In addition, since information “theft” or “usage” almost never actually involves its depletion in any way (i.e., it’s merely copied, not deleted), instances of information breach are that much harder to recognize. Finally, because information assets are not regularly covered by property rights laws, perpetrators, if caught, can get off easier than if they’d stolen actual “balance sheet” assets.

Just imagine, if you’re a criminal, what kind of loot would be better to heist than one in which:

  • You steal it by sitting at your desk rather than scaling walls, dealing with armed guards or blowing up safes
  • After you steal it, it still remains in place (as if nothing happened)
  • You don’t need a fast truck to carry it off
  • It is the kind of asset that increasingly makes up a large part of a company’s overall valuation
  • Companies don’t measure its economic value, so they typically fail to manage or secure it with the same discipline as their traditional assets
  • You can sell it multiple times to multiple black-market buyers (even on Amazon-like marketplaces)
  • The courts only sometimes consider it to be covered under property laws

I’m not advocating cyber crime, merely stating why organizations need to be proactive rather than reactive in securing their information assets, and to do so based on these assets’ actual computed value. The alternative is blunderfunding…and potentially more unwelcome headlines.

You can follow Doug on Twitter @doug_laney

3 Comments »

Category: Uncategorized

Deja VVVu: Others Claiming Gartner’s Construct for Big Data

by Doug Laney  |  January 14, 2012  |  4 Comments

In the late 1990s, while I was a META Group analyst (Note: META is now part of Gartner), it was becoming evident that our clients increasingly were encumbered by their data assets. While many analysts were talking about these fast-growing data stores, many clients were lamenting them, and many vendors were seizing the opportunity they presented, I also realized that something else was going on. Sea changes in the speed at which data was flowing, mainly due to electronic commerce, along with the increasing breadth of data sources, structures and formats due to the post-Y2K ERP application boom, were as challenging to data management teams as the increasing quantity of data, if not more so.

In an attempt to help our clients get a handle on how to recognize, and more importantly, deal with these challenges I began first speaking at industry conferences on this 3-dimensional data challenge of increasing data volume, velocity and variety.  Then in late 2000 I drafted a research note published in February 2001 entitled 3-D Data Management: Controlling Data Volume, Velocity and Variety.

Fast forward to today:  The “3V’s” framework for understanding and dealing with “big data” has now become ubiquitous.  In fact, other research firms, major vendors and consulting firms have even posited the 3Vs (or an unmistakable variant) as their own concept.  Since the original piece is no longer available in Gartner archives but is in increasing demand, I wanted to make it available here for anyone to reference and attribute:

Original Research Note PDF: 3-D Data Management: Controlling Data Volume, Velocity and Variety

Date: 6 February 2001     Author: Doug Laney

3-D Data Management: Controlling Data Volume, Velocity and Variety. Current business conditions and mediums are pushing traditional data management principles to their limits, giving rise to novel and more formalized approaches.

META Trend: During 2001/02, leading enterprises will increasingly use a centralized data warehouse to define a common business vocabulary that improves internal and external collaboration. Through 2003/04, data quality and integration woes will be tempered by data profiling technologies (for generating metadata, consolidated schemas, and integration logic) and information logistics agents. By 2005/06, data, document, and knowledge management will coalesce, driven by schema-agnostic indexing strategies and portal maturity.

The effect of the e-commerce surge, a rise in merger & acquisition activity, increased collaboration, and the drive for harnessing information as a competitive catalyst is driving enterprises to higher levels of consciousness about how data is managed at its most basic level.  In 2001-02, historical, integrated databases (e.g. data warehouses, operational data stores, data marts), will be leveraged not only for intended analytical purposes, but increasingly for intra-enterprise consistency and coordination. By 2003-04, these structures (including their associated metadata) will be on par with application portfolios, organization charts and procedure manuals for defining a business to its employees and affiliates.

Data records, data structures, and definitions commonly accepted throughout an enterprise reduce fiefdoms pulling against each other due to differences in the way each perceives where the enterprise has been, is presently, and is headed.  Readily accessible current and historical records of transactions, affiliates (partners, employees, customers, suppliers), business processes (or rules), along with definitional and navigational metadata (see ADS Delta 896, 21st Century Metadata: Mapping the Enterprise Genome, 7 Aug 2000) enable employees to paddle in the same direction.  Conversely, application-specific data stores (e.g. accounts receivable versus order status), geographic-specific data stores (e.g. North American sales vs. International sales), offer conflicting, or insular views of the enterprise, that while important for feeding transactional systems, provide no “single version of the truth,” giving rise to inconsistency in the way enterprise factions function.

While enterprises struggle to consolidate systems and collapse redundant databases to enable greater operational, analytical, and collaborative consistencies, changing economic conditions have made this job more difficult.  E-commerce, in particular, has exploded data management challenges along three dimensions: volumes, velocity and variety.  In 2001/02, IT organizations must compile a variety of approaches to have at their disposal for dealing with each.

Data Volume

E-commerce channels increase the depth and breadth of data available about a transaction (or any point of interaction). The lower cost of e-channels enables an enterprise to offer its goods or services to more individuals or trading partners, and up to 10x the quantity of data about an individual transaction may be collected, thereby increasing the overall volume of data to be managed.  Furthermore, as enterprises come to see information as a tangible asset, they become reluctant to discard it.

Typically, increases in data volume are handled by purchasing additional online storage.  However as data volume increases, the relative value of each data point decreases proportionately—resulting in a poor financial justification for merely incrementing online storage. Viable alternates and supplements to hanging new disk include:

  • Implementing tiered storage systems (see SIS Delta 860, 19 Apr 2000) that cost effectively balance levels of data utility with data availability using a variety of media.
  • Limiting data collected to that which will be leveraged by current or imminent business processes
  • Limiting certain analytic structures to a percentage of statistically valid sample data.
  • Profiling data sources to identify and subsequently eliminate redundancies
  • Monitoring data usage to determine “cold spots” of unused data that can be eliminated or offloaded to tape (e.g. Ambeo, BEZ Systems, Teleran)
  • Outsourcing data management altogether (e.g. EDS, IBM)
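As a present-day illustration of the usage-monitoring idea in the list above, the sketch below flags “cold spots” from a record of when each data set was last read. It is not how any of the products named in the note work; the function name, the 180-day threshold and the table names are all assumptions made for the example.

```python
from datetime import date, timedelta
from typing import Dict, List


def find_cold_spots(last_access: Dict[str, date], today: date,
                    max_idle_days: int = 180) -> List[str]:
    """Return the names of data sets not read within max_idle_days.

    last_access maps a data set name to the date it was last queried
    (hypothetical input; in practice it would come from query logs).
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, accessed in last_access.items()
                  if accessed < cutoff)


# Hypothetical usage log as of the note's publication date.
usage = {
    "orders": date(2001, 1, 30),
    "clickstream_1999": date(1999, 12, 31),
    "promo_archive": date(2000, 3, 1),
}
candidates = find_cold_spots(usage, today=date(2001, 2, 6))
# candidates == ["clickstream_1999", "promo_archive"]
```

Data sets on the resulting list are only candidates for elimination or offloading; retention rules may still require keeping them.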

Data Velocity

E-commerce has also increased point-of-interaction (POI) speed, and consequently the pace of data used to support interactions and generated by interactions. As POI performance is increasingly perceived as a competitive differentiator (e.g. Web site response, inventory availability analysis, transaction execution, order tracking update, product/service delivery, etc.) so too is an organization’s ability to manage data velocity.  Recognizing that data velocity management is much more than a physical bandwidth and protocol issue, enterprises are implementing architectural solutions such as:

  • Operational data stores (ODSs) that periodically extract, integrate and re-organize production data for operational inquiry or tactical analysis
  • Caches that provide instant access to transaction data while buffering back-end systems from additional load and performance degradation. (Unlike ODSs, caches are updated according to adaptive business rules and have schemas that mimic the back-end source.)
  • Point-to-point (P2P) data routing between databases and applications (e.g. D2K, DataMirror) that circumvents high-latency hub-and-spoke models that are more appropriate for strategic analysis
  • Designing architectures that balance data latency with application data requirements and decision cycles, without assuming the entire information supply chain must be near real-time.
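The caching item in the list above can be sketched as a small read-through cache. This is only an illustration: a fixed maximum age stands in for the “adaptive business rules” the note mentions, and the class name, the `fetch_from_backend` callable and the order-status example are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """Serve repeat reads from memory so the back-end system sees less load."""

    def __init__(self, fetch_from_backend: Callable[[str], Any],
                 max_age_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_backend
        self._max_age = max_age_seconds   # assumption: fixed age, not an adaptive rule
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._max_age:
            return entry[0]               # fresh enough: back end is not touched
        value = self._fetch(key)          # missing or stale: go to the back end
        self._entries[key] = (value, now)
        return value


def lookup_order_status(order_id: str) -> str:
    """Stand-in for a slow query against the system of record."""
    return f"status-of-{order_id}"


cache = ReadThroughCache(lookup_order_status, max_age_seconds=30)
status = cache.get("order-42")   # first read hits the back end
status = cache.get("order-42")   # repeat read within 30 seconds is served from memory
```

The maximum age is the latency trade-off from the last bullet in miniature: a longer age spares the back end more but serves older data.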

Data Variety

Through 2003/04, no greater barrier to effective data management will exist than the variety of incompatible data formats, non-aligned data structures, and inconsistent data semantics.  By this time, interchange and translation mechanisms will be built into most DBMSs. But until then, application portfolio sprawl (particularly when based on a “strategy” of autonomous software implementations due to e-commerce solution immaturity), increased partnerships, and M&A activity intensify data variety challenges. Attempts to resolve data variety issues must be approached as an ongoing endeavor encompassing the following techniques:

  • Data profiling (e.g. Data Mentors, Metagenix) to discover hidden relationships and resolve inconsistencies across multiple data sources (see ADS898)
  • XML-based data format “universal translators” that import data into standard XML documents for export into another data format (e.g. infoShark, XML Solutions)
  • Enterprise application integration (EAI) predefined adapters (e.g. NEON, Tibco, Mercator) for acquiring and delivering data between known applications via message queues, or EAI development kits for building custom adapters.
  • Data access middleware (e.g. Information Builders’ EDA/SQL, SAS Access, OLE DB, ODBC) for direct connectivity between applications and databases
  • Distributed query management (DQM) software (e.g. Enth, InfoRay, Metagon) that adds a data routing and integration intelligence layer above “dumb” data access middleware
  • Metadata management solutions (i.e. repositories and schema standards) to capture and make available definitional metadata that can help provide contextual consistency to enterprise data
  • Advanced indexing techniques for relating (if not physically integrating) data of various incompatible types (e.g. multimedia, documents, structured data, business rules).
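To illustrate the “universal translator” pattern from the list above, the sketch below imports CSV data into a common XML document and then exports that document as JSON. It uses only the Python standard library, not any of the products named in the note; the function names and sample data are hypothetical, and it assumes the CSV column names are valid XML element names.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str, record_tag: str = "record") -> ET.Element:
    """Import CSV rows into a common XML document (one element per field)."""
    root = ET.Element("records")
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = ET.SubElement(root, record_tag)
        for field, value in row.items():
            # Assumption: each column name is a valid XML element name.
            ET.SubElement(record, field).text = value
    return root


def xml_to_json(root: ET.Element) -> str:
    """Export the common XML document into another format, here JSON."""
    rows = [{field.tag: field.text for field in record} for record in root]
    return json.dumps(rows)


source = "id,name\n1,Acme\n2,Globex\n"
exported = xml_to_json(csv_to_xml(source))
```

Routing every format through one intermediate document means each new format needs one importer and one exporter rather than a converter for every pair of formats.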

As with any sufficiently fashionable technology, users should expect the data management market place ebb-and-flow to yield solutions that consolidate multiple techniques and solutions that are increasingly application/environment specific. (See Figure 1 – Data Management Solutions) In selecting a technique or technology, enterprises should first perform an information audit assessing the status of their information supply chain to identify and prioritize particular data management issues.

Business Impact: Attention to data management, particularly in a climate of e-commerce and greater need for collaboration, can enable enterprises to achieve greater returns on their information assets.

Bottom Line: In 2001/02, IT organizations must look beyond traditional direct brute force physical approaches to data management.  Through 2003/04, practices for resolving e-commerce accelerated data volume, velocity and variety issues will become more formalized and diverse.  Increasingly, these techniques involve trade-offs and architectural solutions that involve and impact application portfolios and business strategy decisions.

###

Over the past decade, Gartner analysts including Regina Casonato, Anne Lapkin, Mark A. Beyer, Yvonne Genovese, and Ted Friedman have continued to expand our research on this topic, identifying and refining other “big data” and “extreme data” concepts. In September 2011 they published the tremendous research note Information Management in the 21st Century.  Only time will tell how long it takes for other organizations to seize upon these great new ideas as their own!

Follow Doug on Twitter: @Doug_Laney

4 Comments »

Category: Uncategorized