by Svetlana Sicular | April 2, 2013 | Comments Off
This is my article, published in Forbes last week.
The volume, velocity and variety of information assets are not the three parts of Gartner’s definition of big data; together they are only part one, and an often misunderstood one. Most people only retain about one-third of what they read — that explains the truncation. However, to get to the essence of the definition, an effort to comprehend and retain more than what fits in a single tweet is well-advised, even in our fast-paced time, especially given that Gartner’s big data definition is not much longer than a tweet:
“Big data” is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.
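For anyone who wants to check how tweet-sized the sentence above really is, here is a minimal Python sketch that counts its words and characters. The variable names are made up for illustration, and the 140-character figure is the tweet limit as of this writing (2013), included only for comparison.

```python
# The definition exactly as quoted above, including the curly quotation marks.
definition = (
    "“Big data” is high-volume, -velocity and -variety information assets "
    "that demand cost-effective, innovative forms of information processing "
    "for enhanced insight and decision making."
)

TWEET_LIMIT = 140  # assumed for comparison: Twitter's limit in 2013

word_count = len(definition.split())  # whitespace-separated words
char_count = len(definition)          # every character, quotes and spaces included

print(f"{word_count} words, {char_count} characters")
print(f"{char_count - TWEET_LIMIT} characters over a single tweet")
```

Counting on whitespace treats “-velocity” and “cost-effective,” as one word each, which is how a reader would count them by hand.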
The definition consists of 23 words, 181 characters with quotation marks. The latter is a hint that Gartner believes “big data” will be the new normal in the very foreseeable future. I also like that this definition reflects the relativity of big data. I use it in many dialogs with my clients not just to set a common ground, but to point out where big data challenges and opportunities are. This is how I usually explain it.
Part One: 3Vs
Gartner analyst Doug Laney came up with the famous three Vs back in 2001. In 2011, Gartner identified twelve dimensions of data management, all of which interact with and confound each other. We have four dimensions of Management & Control and four dimensions of Qualification. The three Vs are the driving dimensions of big data Quantification (there is a fourth too).
The most interesting of the 3Vs is variety: companies are digging out amazing insights from text, locations or log files. Elevator logs help to predict vacated real estate; shoplifters tweet about stolen goods right next to the store; emails contain communication patterns of successful projects. Most of this data already belongs to organizations, but it is sitting there unused — that’s why Gartner calls it dark data. Similar to dark matter in physics, dark data cannot be seen directly, yet it is the bulk of the organizational universe.
Velocity is the most misunderstood data characteristic: it is frequently equated with real-time analytics. Yet velocity is also about the rate of change, about linking data sets that arrive at different speeds and about bursts of activity, rather than a habitual steady tempo. It is important to realize that events in data arise out of the available data and that available data forms its own “social network”. This means that some data serves as a “canary”, other data influences and yet more data results in decisions. When the temporal relationship between two or more data sets changes (more data suddenly becomes less data), then everything else changes, even the definition of a “data event”.
Volume is about the number of big data mentions in the press and social media. I contribute!
Part Two: Cost-Effective, Innovative Forms of Information Processing
This picture illustrates a typical situation when all problems are labeled as big data problems.
To sort out what can indeed be solved by the new technologies — and this is not one technology — apply part two of our big data definition. Think about technology capabilities to store and process unstructured data; to link data of various types, origins and rates of change; and to perform comprehensive analysis, which has become possible for many, rather than for a select few. Don’t expect inexpensive solutions, but expect cost-effective and appropriate answers to your problems.
One of my clients even asked about “big processing of small data.” That counts.
Part Three: Enhanced Insight and Decision Making
Part Three is the ultimate goal. Business value is in the insights, which were not available before. Acting upon the insights is imperative. Missing part three is the most laborious and painful path to the bottom of the Trough of Disillusionment in the Gartner Hype Cycle, especially when parts one and two are present. Other paths to the Trough are also thorny, but necessary on the way to the Slope of Enlightenment.
I tell my clients that their main goals for now are to learn how to identify and formulate big data problems, and to grow their own skills and experience with big data technologies, while these technologies are evolving and maturing. Good solutions are possible although not easy. Just this week I have had several briefings with the companies that deliver unique innovations. These innovations combined represent the power of big data. May its force be with you.
Additional analysis is available in the Gartner Special Report, “Big Data, Bigger Opportunities: Investing in Information and Analytics” at http://www.gartner.com/technology/research/big-data.
Follow Svetlana on Twitter @Sve_Sic
Category: "Data Scientist" Big Data data revolution innovation Trough of Disillusionment Uncategorized Tags: big data, Gartner predicts, Information Everywhere, innovation
by Svetlana Sicular | March 27, 2013 | 1 Comment
I always thought that if all the effort, energy and money spent on technology were invested in studying humans, we would be equally advanced through mastering our own capabilities. I realized just recently that maybe harnessing the incredible human potential is not a direct process, and technology is only a step of the journey to our true, limitless selves.
There are already huge advancements in genomics and biology, which discover who we are at the lowest levels of granularity, through molecules and DNA. There are small steps toward using data to learn about human behavior, conditions and capabilities at scale. The Quantified Self movement already exists. It tells us how much time we spend doing chores, it reminds us to get in touch with our loved ones if we forget, and it shows that by breathing like Buddhists, we can become more energetic and less stressed. Technology gives us hints here and there how to inquire within.
A data-driven society is becoming real thanks to the Nexus of Forces, defined by Gartner as the convergence and mutual reinforcement of social, mobile, cloud and information patterns that drive new scenarios. Google demonstrated technology success as a B2C model, as opposed to the typical BI and analytics models. The focus is shifting from the enterprise to the individual; the technology scope is shifting from the virtual to the physical world. Business analytics is shifting from descriptive and diagnostic to predictive and prescriptive, driven by the liberated data that keeps many answers, but not all.
The question “What’s next?” never ends. I think personal analytics is next. It will tell people about themselves. It will be descriptive, diagnostic, predictive, prescriptive and, most of all, empowering. New googles and oracles of personal analytics are yet to come.
Category: analytics Big Data data revolution Inquire Within life-logging Tags: analytics, Gartner predicts
by Svetlana Sicular | March 11, 2013 | 1 Comment
From my esteemed colleague Mark Beyer, the co-lead for Gartner big data research.
The intent of Gartner’s advice is to restore order while we provide guidance to a panicking throng scrambling to buy limited edition big data toys. Gartner has advised caution to the market. But panic in the streets and chaos often drown out reasonable advice. Gartner believes big data is headed for the Trough of Disillusionment. This has given rise to cries of “foul” and “wrong”. There is an amusement factor here. Overall, it is proof that some vendors simply do not understand exactly what the Trough means. Even more fun is that Gartner’s presumptive market pundit competitors don’t understand market dynamics as well as the Hype Cycle portrays. Perhaps some remedial education is in order. Gartner’s hype cycle keeps proving itself year after year (Yes, it is Gartner’s trough and Our Hype Cycle. Effectively, all those nay-sayers don’t get votes anyway).
But, first things first. To our end-user and professional organizations, we say simply “Implementers take heed!” System integrators and in-house developers alike should know how hype and Our Hype Cycle work. The Trough means that market dynamics have changed. There are false claims of simplicity and promises beyond reason which should be carefully vetted and even ignored in favor of maturing solutions. It means that among the honest vendors, a few charlatans selling “cheap knock-off merchandise” from the back of their shady parking lot truck have arrived. Legitimate vendors and legitimate solutions will be confounded by these itinerant peddlers who offer lower prices for shiny trinkets that glitter, then break under the first strain. As the Trough deepens, everyone will start to think the practices and solutions are failures because of this unwarranted dilution in the market offerings. And then the very real and highly valuable progress big data contributes to IT and computer science could be lost. It is the job of the hype cycle to warn implementers to be wary of poor solutions — but to NOT give up hope.
Gartner is not saying big data is dead, or gone. To the contrary — we say it becomes the new normal and does so somewhere between 2015 and 2017.
For those vendors that proffer legitimate solutions, we call on them to join Gartner to help big data mature into normal IT practices. Experienced market vendors and implementers KNOW what it takes for a solution to mature and reach enterprise capacity. When the market starts to reach 15-20% adoption, then big data will have reached the Plateau: that’s the end of “hype” and the beginning of productivity. So, for something to move into the Trough is a maturation process. Implementers and organizations will begin to choose the winning solution architectures and technologies that support them. The chaff will be separated from the wheat. By the way, the definition of hype is over-promising without a basis of market experience and proof. The Trough is what does that. So, were I a vendor offering a solution, I would be GLAD to see the Trough arrive. My confidence in my own engineering and practices will prove out — in the Trough. Then it will rise along the Slope of Enlightenment while others drop by the wayside. You see, if big data were a red-cross dot on the hype cycle, that would mean it is doomed to never reach maturity. But big data is not a red-cross dot. Hmmm? So, when you hear the salesman on the street offering some new, cheap big data trick — give us a call. We’re already hearing from organizations that are happy with their solutions and from those that are getting incomplete solutions. Of course, Gartner is not the only place to get answers — but we talk to literally thousands of end-users and implementers every quarter.
And, our customers pay us to tell them the good, the bad and the ugly. “Psst! You wanna buy a big data watch?”
Category: Big Data big data market Crossing the Chasm Gartner hype cycle Trough of Disillusionment Tags: big data, big data adoption, data paprazzi, Gartner predicts, hype, Information Everywhere, innovation, Trough of Disillusionment
by Svetlana Sicular | March 4, 2013 | Comments Off
The most re-tweeted phrase at the SAS analyst conference (#sassb) today is “A data scientist is a business analyst that lives in California.” Every joke has a bit of truth behind it.
Here is a paradox: some simple but essential things slip our attention. What is the last name of Queen Elizabeth II? When did World War II end? Recently, I asked my brother-in-law over a barbeque what exactly he does for a living. I knew vaguely that he is a negotiator. The barbeque juices played their role, and he suddenly started telling me about California employment laws, settlements and people who make their living by accusing companies in … I learned from him that, since 2008, in California, exempt and non-exempt employees are treated equally under certain conditions, and therefore, everyone has to have the right job title and many other things right.
Under the influence (of my brother-in-law), I recalled that around four years ago, my husband received a note from his previous company about a hefty settlement amount for working overtime. I also realized that ever since, my husband’s title has been periodically changing during his employer’s campaigns to assign the right titles. Because my husband does something unique at work and lives in California, his latest title is five words long.
And suddenly I realized that DJ Patil, in his book Building Data Science Teams, wrote that he and Jeff Hammerbacher, under the pressure of LinkedIn’s HR, invented the title of data scientist in 2008. Yes, this was the same law which caused my husband’s multiple titles and which infuriates my brother-in-law, who turned out to be a head of corporate affairs.
Category: "Data Scientist" data paprazzi Information Everywhere skills Uncategorized Tags:
by Svetlana Sicular | March 1, 2013 | Comments Off
I have a multiple choice question: What kind of data revolution are we in?
- Industrial
- Proletarian
- Social
- All of the above
The industrial revolution of the 18th and 19th centuries transitioned the world from hand production methods to machines. The ability to affordably analyze data is similar: first-generation Internet companies, the Amazons and Googles, publicly performed data alchemy – turning data into gold by building analytic factories. IBM Watson’s spectacular victory on Jeopardy demonstrated a new role for technology. Big data analytics revolutionized advertising, securities trading and retail. Healthcare is about to be transformed by using data to engage the most underutilized resource – the patient.
The proletarian revolution of 1917 devastated a huge country, Russia. The proletariat is the social class that does not own the means of production and therefore sells its labor power. Similarly, I observe a class of data proletarians who don’t have tools and sweat over manual coding, unaware of classical culture, such as data quality, master data management or even ETL. (I recently told one of them, when he was grappling with customer data: “You need an MDM,” – “Remind me what it stands for.”) A proletarian idea that “a cook can govern the state” can undermine the credibility of new technologies.
Social Data Revolution is a term coined by Dr. Andreas Weigend. For the first time in history we can learn about humans at scale. Now, we have an unprecedented knowledge of what they think and how they are connected. The most important aspect of the social data revolution is in connecting the virtual and physical worlds, and making both of them different.
When I don’t have a good answer, I usually pick the longest among multiple choices – all of the above. It includes our social responsibility to remember that a proletarian revolution is devastating. And an industrial revolution can last decades. It is just the beginning.
Category: analytics Big Data data data governance data paprazzi data revolution Information Everywhere market analysis open source Uncategorized Tags: big data, data paprazzi, hadoop, Information Everywhere, Silicon Valley
by Svetlana Sicular | February 12, 2013 | 1 Comment
Like Lilliputians from Gulliver’s Travels, divided over which end of a boiled egg to crack, modern technologists are debating whether big data is a revolution or an evolution. Roughly 60% of the delegates at last week’s Gartner BI Summit in Europe voted for evolution. It looks more like an evolution to these BI and analytics practitioners, because data is part of their job routine. It is a revolution for the others, who have not dealt with data before, and now they must. Gartner predicts an enormous number of big data jobs, only one-third of which will be filled. Those who voted for evolution are already a part of this one-third.
Why are we on the brink of data revolution?
- First of all, revolutions disrupt. The signs of data liberation and democratization are everywhere. Organizations are interested in dark data, trapped in dungeons and silos, because they want to free it and use it. People type text messages, blog and tweet — they democratize data. Machines talk to machines via data. Cars send information back to the manufacturers — they liberate data (which disrupts the automobile industry). Only non-disruptive effects of big data are evolutionary; the game-changing consequences of data are revolutionary.
- Second, revolutions happen when the “lower classes” do not want to continue in the old way and the “upper classes” cannot carry on in the old way. Enterprise information management and analytics practitioners do not want to process new and liberated data in the old way — they start using new technologies, sometimes in the cloud closet, but more often, openly. Upper management cannot carry on in the old way: companies have to compete and win in the data-driven economy. They are attracted by the shine of gold, a pure result of data alchemy.
- Finally, as it always happens during revolutions, most people are confused about what is going on. They struggle to understand where, when and how to use data. They have ideas for big data or consider possibilities, but knowing which to pursue is difficult. These people are exactly the majority that usually decides the fate of revolutions.
If you are currently dealing with the dilemma of data revolution vs. evolution, keep in mind a Lilliputian prophet, who has said, “All true believers shall break their eggs at the convenient end.”
Category: "Data Scientist" Big Data cloud data data paprazzi data revolution EIM Information Everywhere Inquire Within market analysis Uncategorized Tags: data revolution
by Svetlana Sicular | January 31, 2013 | 2 Comments
Big data is moving from closets full of hidden Hadoop clusters into corner offices, where CEOs and boards of directors issue orders to deliver “big data strategy,” whatever it means. This is called “Activity beyond early adopters” in Gartner’s Hype Cycle.
Once in a while, I get requests to help explain big data so that it can be understood by any curious pedestrian who is eager to learn the meaning of that very technical term, which has something “big” in it.
Gartner Hype Cycle Explained
Power of Distributed Processing of Large Datasets In-place Explained
Some years ago, a friend’s brother wanted a mail order bride, but his capacity for bringing brides in from overseas was limited, so he decided to travel to a strange land and “go to catalog,” so that his choice of brides would radically scale up. The ability to process data in-place is similar: processing moves “to catalog,” the initial overhead is high and the land is strange until you get used to it. Some people enjoy it, some – go back home.
Microsegmentation — the segment of one instead of clusters of many: a 32-year-old blonde who likes coffee with sweet cream is a more vivid marketing target than a cluster of soccer moms between 30 and 40. The best explanation of microsegmentation I can think of is by the Argentinean writer Julio Cortazar, a master of magic realism (a literary genre in which magic elements are a natural part of an otherwise mundane, realistic environment). Magic realism sounds to me like big data analytics, as presented by many vendors. I couldn’t find Cortazar’s “Cronopios and Famas” online, so I re-typed this little story below from a book. Even its title alludes to data sciences!
HIS FAITH IN THE SCIENCES (Julio Cortazar on microsegmentation)
An esperanza believed in physiognomical types, such as for instance the pug-nosed type, the fish-faced type, those with a large air intake, the jaundiced type, the beetle-browed, those with an intellectual face, the hairdresser type, etc. Ready to classify these groups definitively, he began by making long lists of acquaintances and dividing them into the categories cited above.
He took the first group, consisting of eight pug-nosed types, and noticed that surprisingly these boys divided actually into 3 subgroups, namely pugnoses of the mustaches type, pugnoses of the pugilist type, and pugnoses of the ministry-appointee sort, composed respectively of 3, 3, and 2 pugnoses in each particularized category. Hardly had he separated them into their new groupings (at the Paulista Bar in the calle San Martin, where he had gathered them together at great pains and no small amount of coffee with sweet cream, well whipped) when he noticed that the first subgroup was not homogenous, since two of the mustache-type pugnoses belonged to the rodent variety while the remaining one was most certainly a pugnose of Japanese-court sort. Well. Putting this latter aside, with the help of a hefty sandwich of anchovies and hard-boiled eggs, he organized a subgroup of the two rodent types, and was getting ready to set it down in his notebook of scientific data when one rodent type looked to one side and the other turned in the opposite direction, with the result that the esperanza, and furthermore everyone there, could perceive quite clearly that, while the first of the rodent types was evidently a brachycephalic pugnose, the other exhibited a cranium much more suited to hanging a hat on than to wearing one.
So it was the subgroup dissolved, and as for the rest, better not to mention it, since the remainder of the subjects had graduated from coffee with sweet cream to coffee with flaming cognac, and the only way in which they seemed to resemble one another at the height of these festivities was in their common and well-entrenched desire to continue getting drunk at the expense of the esperanza.
Category: "Data Scientist" analytics Big Data Crossing the Chasm data data paprazzi market analysis Uncategorized Tags: analytics, big data, big data adoption, data paprazzi, Information Everywhere, vendors
by Svetlana Sicular | January 22, 2013 | 36 Comments
My presentation on big data for the upcoming BI Summit in Barcelona is obsolete. In this presentation, I use the Gartner Hype Cycle curve to show that big data is at the peak of inflated expectations. And, as it happens with quickly developing technologies, I am already behind and big data goes ahead.
The last several weeks show that big data is falling into the trough of disillusionment. I realized it earlier today, when I was describing a recent Elephant Riders meetup to my colleagues at Gartner. MapR, Hortonworks and Cloudera were debating the state of Hadoop. And I heard from the very core of the Hadoop movement that MapReduce has always been Hadoop’s bottleneck or that Hadoop is “primitive and old-fashioned.” This is the video of the event. If you watch it, you can notice more points that signal the beginning of disillusionment (and get a lot of useful information too). Congratulations, big data technology is maturing fast!
Gartner Hype Cycle: Where is Big Data Now?
Meanwhile, my clients who are most advanced with Hadoop are also getting disillusioned. They do not realize that they are ahead of others and think that someone else is successful while they are struggling. These organizations have fascinating ideas, but they are disappointed with the difficulty of figuring out reliable solutions. Their disappointment applies to more advanced cases of sentiment analysis, which go beyond traditional vendor offerings. Difficulties are also abundant when organizations work on new ideas that depend on factors that have been traditionally outside of their industry competence, e.g. linking a variety of unstructured data sources. Several days ago, a financial industry client told me that framing the right question to express a game-changing idea is extremely challenging: first, selecting a question from multiple candidates; second, breaking it down into many sub-questions; and, third, answering even one of them reliably. It is hard.
Formulating the right question is always hard, but with big data, it is an order of magnitude harder, because you are blazing the trail (not grazing on the green field). At the upcoming BI Summit in Barcelona, I will facilitate a user round table exactly about this — From “Satisficing” to Satisfying Business Requirements. Validating answers is also a tough job — big data analytics deals with uncertainty: you do not deduce the number and say that the meaning of life is 42 — you get a proof of your hypothesis with a certain degree of confidence. And it is up to you to decide what level of confidence is satisfying and what is “satisficing.” (A “satisficing” solution is the first solution that appears good enough.)
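To make the point about confidence concrete, here is a minimal Python sketch of one way to attach a degree of confidence to a hypothesis: a permutation test. It is only an illustration under stated assumptions. The function name `permutation_p_value`, the made-up sign-up numbers and the 0.05 cut-off are all hypothetical and not taken from any client case.

```python
import random


def permutation_p_value(group_a, group_b, trials=10_000, seed=0):
    """Share of random relabelings whose gap in means is at least as large
    as the observed gap. A small value means that chance alone rarely
    produces a gap this big."""
    rng = random.Random(seed)
    n_a = len(group_a)
    pooled = list(group_a) + list(group_b)

    def mean_gap(values):
        first, second = values[:n_a], values[n_a:]
        return abs(sum(first) / len(first) - sum(second) / len(second))

    observed = mean_gap(pooled)  # gap with the real labels
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend the labels do not matter
        if mean_gap(pooled) >= observed:
            hits += 1
    return hits / trials


# Hypothetical daily sign-ups before and after a change (made-up numbers).
before = [12, 15, 11, 14, 13, 12, 16, 14]
after = [18, 17, 21, 16, 19, 20, 18, 22]

p_value = permutation_p_value(before, after)
SATISFICING_THRESHOLD = 0.05  # an assumed cut-off; choosing it is a judgment call

print(f"p-value: {p_value:.4f}")
print("good enough to act on" if p_value < SATISFICING_THRESHOLD else "keep digging")
```

The code never says the hypothesis is true. It only reports how surprising the data would be if there were no real difference, and the threshold that turns that number into a decision is exactly the “satisficing” choice left to you.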
Back to the trough of disillusionment. Or, rather, forward to the trough. To minimize the depth of the fall, companies must be at a high enough (satisficing) level of analytical and enterprise information management maturity combined with organizational support of innovation. Oops, I promised myself to be a reporter, not an analyst in my blogs.
The only consistent success reported by my clients is with log analysis using Splunk. Why? Because Splunk is a (nice) tool. And the plateau of productivity will be reached when tools and product suites saturate the market. Meanwhile, according to the Gartner Hype Cycle, the next stop for big data is negative press. Does this blog post count as such?
Category: "Data Scientist" analytics Big Data big data market Crossing the Chasm data paprazzi EIM events Hadoop Information Everywhere innovation Local News Uncategorized Tags: BI Summit, big data, big data adoption, data paprazzi, data scientist, data spy, end users, hadoop, Hadoop distribution, Information Everywhere, innovation, Silicon Valley, vendors
by Svetlana Sicular | January 15, 2013 | Comments Off
A list of my top five Gartner favorites could be a top 25, but I decided to make it short (Gartner’s New Year resolution is to write shorter documents and fewer of them too). In the past year, the prefix “un-” was quite popular: un-learn, un-conference, un-remarkable. My un-analysis compares apples to oranges, and I love all five of them in no particular order.
Top speaker – Mark Beyer, a father of the Logical Data Warehouse. His presentation, a piece of art, “How will we survive without EDW?” at the Gartner BI Summit in London and the encore in Los Angeles, was convincing, thought-provoking and artistic. For those who can access BI Summit recordings, I recommend listening to Mark when they are lost in a maze of their data warehouse or just in need of inspiration. This is Mark’s latest prediction, “Through 2014, 20% of enterprise warehouses will add distributed processes into their production like MapReduce to begin deploying the logical data warehouse.”
Top politician – Tina Nunno, who preaches to the silent choir that IT is about politics. Picture this:
Top big data authority – Merv Adrian, @Merv. He is not just a thinker, but a man who cares. Without Merv, big data would have been smaller.
Top overall content – by Mike Rollings, who writes about organizational issues in the context of enterprise architecture. “The transformation from … to … cannot be purchased,” fill in the blanks, “it is a mindset change that demands new thinking and new practices.” The necessity of the mindset shift, often overlooked and even denied, is the key to success in embracing big data in particular.
Top research agenda – 2013 Planning Guide: Data Management. This is our team agenda, hence, it is on my top five list. My future contributions are on the subjects of big data analytics, governance, enablement and monetization. And also on customer centricity (my spell checker prompts eccentricity instead, which is not far-fetched, given that every single customer is rational but aggregations turn them into an irrational crowd just to boomerang unforeseen insights as personalization). Combined, my research topics make up another top five list.
To finish, I’ll quote our 2013 Planning Guide’s bottom line, “There has never been a better time to be an information management professional.” For those who might not remember, this was the bottom line last year too, i.e. it’s a better and better time for information management professionals.
Category: analytics Big Data data data governance data paprazzi Uncategorized Tags: Gartner, Top five
by Svetlana Sicular | December 18, 2012 | Comments Off
“Data scientist is the sexiest job of the 21st century” (Harvard Business Review)
“Cybernetics is a whore of imperialism” (Stalin, 20th century)
I cannot quite catch an association between these two sentences. Maybe it’s in the transitions: from a whore — to the sexiest job, from East — to West, from socialism — to capitalism, from the past — to the present.
The most significant science, associated with the sexiest occupation of the 21st century, is math. This gets me thinking about what happens with mathematicians over time. In my high school for advanced math studies, we subscribed to a magazine for teenagers who were interested in physics and math. The name of the magazine was “The Quant.” And coincidentally, these were the teenagers some of whom later became the Wall Street quants. My schoolmates with advanced math degrees turned into actuaries, underwriters, hedge fund managers, entrepreneurs, quants, analytics experts, programmers and just mathematicians.
Here is the story of one of them. His dream was to be a math professor, and he was almost there. He got a job at a good East Coast university. But his wife, who worked on Wall Street, kept nagging him about great possibilities in the business world. He loved math, not money. Finally, he agreed to go to a job interview. The result was peculiar: my hero came back home super-excited because he met more advanced and accomplished mathematicians in the corporate world than in academia. The rest is history with a happy ending and impressive bonuses.
Last month, I met my classmate, who was the best in math among us, math kids. He dedicated his life to this science. I told him excitedly how sexy data science is, and consequently, about math with its sex appeal to data science. He did not understand despite his diligent attempts. I said I regret they did not teach us in college how math applies to various manifestations of life. He shrugged and said that it does not. I said there are new data paradigms that could use theoretical math in practice, for example, Banach spaces might expose intricacies of big data — that would be sexy. He told me that there are two maths: one is pure and high, and another one… Well, it also has a right to existence.
Bottom line. If we apply the math that is pure and out of this world to data, we might get unseen insights, not accessible otherwise. Some mathematicians can see with their mind’s eye spaces way more complex than just 3D — big data is not 3D either (it is viewed as 2D matrices at best right now). But there is a disconnect between two maths, the pure and the other, worse than a disconnect between the business and IT.
Category: "Data Scientist" analytics Big Data data Information Everywhere innovation Inquire Within Mathematics skills Uncategorized Tags: analysis, analytics, big data, data, data paprazzi, data scientist, hiring, Information Everywhere, innovation, math, pseudo-tweets, scientists