by Doug Laney | May 24, 2013 | 1 Comment
As we watch America’s greatest auto racing spectacle this Memorial Day weekend, what we won’t see is even bigger than the event itself, faster than the cars themselves, and more varied than the driver personalities. Of course I’m talking about the data. Racing teams now eat Big Data for breakfast, lunch and dinner. And for snacks in-between.
Outside, Indy cars and their cousin Formula 1 cars may be covered with dozens of sponsor logos, but inside they're packed with nearly 200 sensors constantly measuring the performance of the engine, clutch, gearbox, differential, fuel system, oil, steering, tires, drag reduction system (DRS), and dozens of other components, as well as the drivers' health. These sensors spew about 1GB of telemetry per race to engineers poring over it during the race and data scientists crunching it between races. According to McLaren, its computers run a thousand simulations during the race. After just a couple of laps they can predict the performance of each subsystem with up to 90% accuracy. And since most of these subsystems can be tuned during the race, engineers, pit crews and drivers can proactively make minute adjustments throughout the race as the car and conditions change.
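To make that concrete, here's a minimal sketch of the idea (illustrative only, with made-up channel values and limits, not any team's actual model): fit a trend on a telemetry channel over the first few laps, then project where that subsystem is heading so the pit wall can adjust before a limit is reached.

```python
# Illustrative sketch only: hypothetical telemetry numbers and limits, not a real race model.
# Fit a simple trend on the first few laps of one channel, then project it forward
# so engineers can anticipate a problem and adjust before it occurs.
from statistics import linear_regression

laps = [1, 2, 3, 4]
gearbox_oil_temp_c = [96.0, 98.5, 101.2, 103.6]    # hypothetical per-lap samples

fit = linear_regression(laps, gearbox_oil_temp_c)  # least-squares slope/intercept
project = lambda lap: fit.intercept + fit.slope * lap

TEMP_LIMIT_C = 125.0                               # hypothetical operating limit
for lap in (10, 20, 30):
    status = "ADJUST NOW" if project(lap) > TEMP_LIMIT_C else "ok"
    print(f"lap {lap}: projected {project(lap):.1f} C  [{status}]")
```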
Throughout the season, based on this accumulated data warehouse of information on car performance, driver performance, tracks and conditions, racing teams will make 50 or more mods per day. And for each season, new cars are built from the ground up using 95% new parts designed using this data.
Of course all these modifications need to adhere to fluctuating, fastidious and unforgiving racing league specifications. So analytics to ensure compliance is just as important.
Telemetry Tech on the Track
So what’s behind all this Big Data wizardry? Here’s a summary of some of what McLaren Electronics has built and baked into and around its team’s cars:
- Its latest data collection device, the TAG-320, features 4000MIPS of processing power, 512MB internal RAM, 8GB of logged data capacity, 13 buses, up to 100kHz analog sampling rate, internal accelerometer, 4000 logging channels, and a 1Gbps Ethernet link speed. Most of these characteristics are a 5-10x improvement over the previous 2008 TAG-310b model.
- The ATLAS (Advanced Telemetry & Linked Acquisition System) is a suite of analytics tools for real-time storage, analysis, visualization and manipulation of data. It provides a customizable workbook, graphical timelines and other comparative visualization, heuristic car system checks, automated data alignment and sequencing, and a Microsoft SQL Server API. ATLAS offers analysis features called functions to combine parameters and develop sophisticated analytics, checks to automatically assess any car component, and markers to automatically or manually pinpoint the time when an anomaly happens (see the sketch after this list).
- Accelerated data analytics is achieved using SAP’s HANA in-memory database
- Its Remote Data Server (RDS) enables live telemetry to be viewed simultaneously anywhere in the world by factory engineers, parts suppliers and data analysts
- Simulation capabilities using MATLAB (Simulink) can determine what might happen under different track or race situations, or if a driver behavior or car system were changed
- Special servers are used for collecting and integrating weather and other external data
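As a rough illustration of the function/check/marker pattern described for ATLAS above (hypothetical names and thresholds only; this is not the actual ATLAS API), a derived channel can be computed from raw parameters, tested against a limit, and each excursion stamped with a marker:

```python
# Hypothetical sketch of the function/check/marker pattern (not the real ATLAS API):
# a "function" derives a new channel from raw parameters, a "check" assesses it
# against a limit, and "markers" pinpoint the moments when anomalies occur.
from dataclasses import dataclass

@dataclass
class Marker:
    time_s: float     # when the anomaly happened
    channel: str      # which derived channel flagged it
    value: float      # the offending value

def clutch_slip(engine_rpm: float, gearbox_rpm: float) -> float:
    """A derived 'function' channel combining two raw parameters."""
    return engine_rpm - gearbox_rpm

def slip_check(samples, limit: float = 150.0):
    """A 'check': scan (time, engine_rpm, gearbox_rpm) samples and mark excursions."""
    return [Marker(t, "clutch_slip", clutch_slip(e, g))
            for t, e, g in samples if clutch_slip(e, g) > limit]

samples = [(12.0, 11200, 11150), (12.1, 11400, 11180), (12.2, 11600, 11190)]
for marker in slip_check(samples):
    print(marker)   # markers pinpoint exactly when the anomaly occurred
```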
Is Your Business on Track with Big Data?
All the excitement of auto racing aside, consider the key underlying components of what racing teams are doing to accelerate the performance of their cars and drivers, and how these techniques can and should apply to your own, albeit relatively mundane, business.
Use this checklist to see if your business will have a checkered future or get the checkered flag:
- Are you sufficiently monitoring key business processes, systems and personnel using available sensors and instrumentation?
- Are your data streams collected frequently enough for real-time process adjustments (i.e. complex event processing)?
- Do your business processes support real-time or near real-time inputs to adjust their operation or performance?
- Can you anticipate business process or system failures before they occur, or are you doing too much reactive maintenance?
- Do you centrally collect data about business function performance?
- Do you make use of advances in high-performance analytics such as in-memory databases, NoSQL databases, data warehouse appliances, etc.?
- Do you gather important external data (e.g. weather, economic) to supplement and integrate with your own data?
- Do you synchronize, align and integrate data that comes from different streams?
- Do you make your data available to key business partners, suppliers and customers to help them provide better products and services to you?
- Do you have a common, sophisticated analytics platform that includes the ability to establish new analytic functions, alerts, triggers, visualizations?
- Can you run simulations on business systems while they’re operating and also between events to adjust strategies?
- Does your architecture support multiple users around the world seeing real-time business performance simultaneously?
- Do you have teams of business experts, product/service experts and data scientists collaborating on making sense of the data?
- Do you modify your products or services as frequently as you could or should based on available data?
- Do you also use data you collect to develop new products or services as frequently as you could or should?
Racing teams are able to invest in advanced analytics because millions of dollars and euros are on the line from hundreds of sponsors. Hopefully your own big data project sponsors appreciate that big money is on the line for your business as well. Winning the race in your industry now probably depends on it.
Also follow Doug on Twitter @Doug_Laney
Category: Uncategorized Tags: analytics, auto racing, big data, business intelligence, indianapolis 500, indy 500, operational technology, performance management, racing, telemetry
by Doug Laney | April 3, 2013 | 3 Comments
Given all the hype over Big Data and concerns about data ownership, I thought it would be interesting to explore who actually owns Big Data. No, I mean who really owns "Big Data." Yes, the trademark. Next stop, the United States Patent and Trademark Office online database.
Talk about Big Data. The database contains a treasure trove of over 8 million patents and 16 million filings dating back to Samuel Hopkins's 1790 registered process of making potash, an ingredient used in fertilizer (signed by President George Washington, no less), along with nearly 3 million trademarks, including the oldest active one, SAMSON, registered for a brand of rope in 1884. With almost 200,000 patent applications and 100,000 trademark applications filed each year and growing, the ranks of the examiners are growing too, having nearly doubled since 2005.
But back to "Big Data." The term has been in use since at least the mid-1990s, seemingly coined by Silicon Graphics chief scientist John Mashey, who gave a seminar entitled "Big Data & the Next Wave of InfraStress." However, since he never trademarked it, who did?
Those of you who pioneered data warehousing will remember a boutique consulting firm, often joined at the hip with Teradata, based in Chicago and called Knightsbridge Solutions. Knightsbridge specialized in building large databases and data warehouses before it was absorbed into HP. On January 9, 2001, a Knightsbridge attorney filed the trademark, and "big data" became a US citizen, or whatever. However, they must have liked the term about as much as most of the industry does today (despite its popularity), as they abandoned the trademark less than a year later.
It wasn't until ten years had passed that an enterprising man in Texas reclaimed it, only to abandon it again months later. Poor Big Data! It's been declared dead twice even before it slides into the Gartner® Hype Cycle™ Trough of Disillusionment™. Not to worry: a fledgling VC called Big Data Boston Ventures nabbed the mark last summer. Until they launch, it seems to be the only asset in their portfolio.
Good news for those of you feeling like you missed the boat: there are plenty of variants still available. The USPTO site lists only 44 related marks, including clever ones such as "Bigdata", "Making Big Data Small", "Big Data for the Little Guy", "Rocket Fuel for Big Data Apps", "Dominating Big Data", "Wala! Big Data Simplified", and my personal favorite, which integrates large information and lager libation: "Big Data on Tap."
Here’s to you Big Data! You’ve made your mark.
Follow Doug on Twitter: @Doug_Laney
Category: Uncategorized Tags: big data, intellectual property, patent, trademark
by Doug Laney | December 26, 2012 | 2 Comments
2012 has seen an acknowledgement and mainstream awareness of the challenges of managing the burgeoning streams of information generated and available to organizations, particularly big data. In 2013, I expect the focus to shift to the challenges of developing and implementing enterprise strategies for making use of all this data.
Opportunities abound for deploying information in transformative ways. Gartner’s 2013 research agenda will help IT and business leaders develop and execute strategies for achieving higher returns on their information assets. This includes leveraging big data, enhancing analytic capabilities, achieving more disciplined information asset management approaches, and incorporating new and expanded information-related roles:
The volume, velocity, and variety of information sources available to organizations today represent more than just an information management challenge. Rather, this phenomenon is an incredible opportunity for organizations to improve enterprise performance significantly and even transform their businesses or industries. More than a mere vehicle for reporting or basic decision support, information assets are an instrument for innovation. Making this strategic shift quickly enough to create competitive advantage is the real challenge for most businesses.
Key issues Gartner will be exploring throughout the coming year are also questions business and IT leaders should be asking themselves:
Business uses and sources of information
- What is the range of internal and external data sources available to us, starting with our own underutilized "dark data"?
- How can information be used, not just for decision-making, but for greater business insights and process automation?
- How can information be used to foster relationships and improve collaboration with our employees, partners, customers and/or suppliers?
- How can information facilitate business transformation and innovation, beyond just incremental performance improvements?
- How can information be monetized by packaging, sharing and/or selling it?
- How can we evolve to a more information-centric culture?
- How can our IT and business groups organize for achieving higher levels of information performance?
- What emerging information-related skills and methods should be considered, planned for, used or acquired?
Value and economics of information (infonomics)
- Why should, and how can, we inventory, measure and quantify our information assets?
- How can information's value be used to justify and gauge the ROI of information-related initiatives, as well as other IT and business initiatives?
- How can we manage information as an actual corporate asset?
So if you’re looking to make a corporate New Year’s resolution to do more with data for driving corporate value, consider developing answers to each of these questions. And keep an eye on Gartner’s Information Innovation research throughout 2013.
Follow Doug on Twitter: @doug_laney
Category: Uncategorized Tags: analytics, big data, bigdata, infonomics, information management, innovation, new year's, planning, strategy, vision
by Doug Laney | December 20, 2012 | 7 Comments
Going into the 2012 holiday season, North Pole Inc. (ticker: XMAS), the leading global distributor of presents to good girls and boys, called upon Gartner to assess and advise on its information related needs and opportunities.
STAMFORD, Conn., December 18, 2012—
Over the past quarter, Gartner was given exclusive access to the operations and information systems of North Pole Inc. (NPI), to help it set a strategic path for improved information management and analytic capabilities. For nearly two centuries NPI has struggled to support its growing operation and respond proactively to competitive pressures through the use of emerging technologies and best practices.
“We do a jolly good job year after year,” claims NPI’s Founder and CEO, Santa Claus, “but I have really put the pressure on my IT management team to achieve better efficiencies and creatively use information to innovate.”
As a long-time Gartner client, NPI has read about how other enterprises have selectively adopted information technologies, embraced new architectures and approaches, and acquired the necessary skills. “Now it’s our turn,” exclaimed NPI’s CIO Frederick Ellefsen. “We’ve heard a lot about the term ‘big data’ and the significant opportunities indicated by the confluence of mobile, cloud, social and information—Gartner’s Nexus of Forces—so we didn’t want to be left out in the cold, so to speak.”
“This is a unique opportunity for Gartner to be exposed to the inner workings of one of the world’s most secretive yet successful enterprises,” said Peter Sondergaard, Gartner SVP Research. “We were pleased to be able to offer our services and insights to NPI.”
Gartner’s review of NPI’s systems revealed an operation not too dissimilar to other distributors and some major retailers, but on a much larger scale. However due to NPI’s unique legal status it has no finance department, nor does it have a sales or marketing function.
Figure 1 - North Pole Inc. Operations
Santa’s Systems Portfolio
Key systems in NPI's portfolio manage orders, inventory, quality testing, elfin performance and activities, along with tracking human behavior, correspondence, wish lists and contact information, and also environmental impact data. To achieve NPI's objective of managing and leveraging information as an actual enterprise asset, Gartner first completed an inventory of NPI's extensive wealth of information assets:
- Toy Order Management System ("Tommy") – Toy orders and order tracking of 5.5 billion orders; supplier, 2nd-level supply chain and parts-level visibility of 4.6 million suppliers
- Toy Inventory Management System (“Timmy”) – Receiving and inventory data on 6.9 billion toys
- Toy Assurance Management System (“Tammy”) – Test results and repairs/returns data on all toys received (average of three safety and quality tests per toy) totaling 21 billion tests annually
- Content system for Relations, Inbound Gift Request and Letters (CRINGLE) – Processing, scanning, content extraction and analysis of 6.5 million letters, emails and calls, and recording 19.5 million gifts requested
- Naughty or Nice Information Tracking System (NITS) – Processing and tagging of 16.8 trillion person-to-person interactions throughout the year
- Scheduling, Logistics and Expedited Distribution System (SLEDS) — Handling of 500,000 appearance requests and 280,000 actual mall and other appearances; the operation of 7700 gift express hubs and the logistics and maintenance of the half-million sleighs servicing them; and night-of-delivery (NOD) routing
- Kontact Information & Directory System (KIDS) – Basic contact, rooftop and chimney configuration information on 2.3 billion gift recipients and their 880 million households
- Helper Organization, Operations and Orchestration (HO-HO-HO) – Scheduling and coordination of elf workforce job responsibilities and activities; also coordinates elf housing and food service
- Job Information, Guidance, Learning & Elf Management System (JINGLES) – General elf resource (ER) system for tracking the performance, benefits and training activities of 230 million elves, along with ongoing recruiting activities
- Study for Negating the Outcome of Warming (SNOW) – A longitudinal study as part of NPI’s sustainability efforts. Millions of climate, atmospheric, emissions, deforestation, and animal and human population data points are collected annually to help NPI achieve its target of carbon neutrality by 2020
[See bottom of article for North Pole Inc. Core Data Requirements and Database Sizing]
Data Quality as Pure as the Driven Snow
Due to impeccable data governance and quality processes, a world-class master data management program, an impressive team of data elves, robust data quality technology, and unwavering executive-level commitment and involvement, NPI’s information assets show no signs of significant completeness, accuracy, integrity or other quality issues according to sample data profiling using Gartner’s data quality assessment toolkit.
Analytic Opportunities Beyond Just “Naughty or Nice”
From a business intelligence perspective, Gartner found that NPI is lagging behind others in the shipping and distribution industry. Its enterprise data warehouse, called "Chimneys," is really a collection of stovepipe query and reporting systems, some still relying on first-generation BI tools like Red Brick. Gartner recommended evolving to a logical data warehouse architecture for most low-frequency queries to enable more insightful cross-functional, federated analytics.
Some predictive analytics is done to select appropriate toys based on NITS behavior modeling, demographics and prior-year presents. Gartner recommended that this system be enhanced to account for factors such as sibling response, damage/loss propensity, and social content analysis. NPI, however, is working on mobile-enabling Santa in the field during mall appearances so he can advise on toy availability and alternatives (as necessary) in real time while a child is on his lap. This system is expected to be in place for the 2013 holiday season. Gartner analysts pointed out that this new capability would also require enhancing the "Tommy" toy order management system to capture full catalog and supply chain information from its suppliers. Today NPI maintains this tracking data only for actual orders.
Although NPI does a great job of social media participation, including a multi-channel Twitter strategy (e.g. @santa, @officialsanta, @santaclaus, @santa_claus, etc.), Gartner recommended that NPI begin tapping and analyzing social media streams. Social sentiment analysis will help NPI identify emerging "hot toys" for pre-ordering, and spot early warning signals of quality-related issues. NPI also considered integrating global economic data to better focus its gift giving on those in the greatest need. However, NPI, like many organizations, is struggling to hire or train a team of data scientists. "Advanced analytics just isn't a core elfin competency," lamented Mr. Ellefsen. "We're definitely going to have to fly up outside talent for a period of time."
Operational Efficiency at Times Glacial
Gartner also advised NPI on how to consolidate its ordering process and information. Since the late 1970s, NPI has been consolidating inbound shipments using its gift express hubs scattered secretly in forests around the world. However, it still orders and inventories gifts from suppliers one by one. "Our 'Tommy' system is definitely outmoded," admitted Mr. Ellefsen. With sophisticated demand analysis, order pattern matching and smart RFID-enabled inventory management, Gartner believes NPI could save 70-80% of its current TOM processing expense.
No More Cookie Cutter Approaches to Data Management
Regarding the human behavior tracking system (NITS), Gartner suggested that in today’s world perhaps both online interactions (text, email, social media) and human-to-animal interactions should also be captured and tagged as “naughty” or “nice”, and that a broader 5-point Likert scale or automated video/audio analysis might improve measurement precision. NPI is obviously concerned by the size and performance of this already 168 terabyte system, but will be looking into HDFS or other NoSQL alternatives to support expanded tracking ideas. “For obvious reasons, we got away from inverted tree data management structures years ago,” Mr. Ellefsen chuckled.
Gartner and NPI also discussed a long-term cloud strategy. But with over 200 terabytes of online operational data, austere personally identifiable information (PII) privacy and security requirements, and spotty connectivity at its arctic headquarters, Gartner recommended that at this time NPI only consider hosted data solutions for its 7700 gift express hubs.
A Big Sack of New Ideas for Big Data
During the “Workshop at the Workshop” session as it was called, Gartner and NPI generated many innovative ways to use information, including:
- selecting toys that would encourage naughtier kids to be nicer
- putting de-identified data online for suppliers to analyze
- real-time NOD (night-of-delivery) routing and navigation via integrated weather, GPS and air traffic data to optimize Santa's 10,200 takeoffs, landings and deliveries per second
However the entire NPI management team was quick to squash the subject of transitioning to an outsourced, mobile-enabled parental workforce. “Elves have magical capabilities beyond those of most humans,” Mr. Claus interrupted, “Not to mention a tremendously strong union.”
Doug Laney, VP Analytics and Information Management
Gartner, Inc. (NYSE: IT) is the world’s leading information technology research and advisory company. Gartner delivers the technology-related insight necessary for its clients to make the right decisions, every day. From CIOs and senior IT leaders in corporations and government agencies, to business leaders in high-tech and telecom enterprises and professional services firms, to technology investors, Gartner is the valuable partner to clients in 12,000 distinct organizations. Through the resources of Gartner Research, Gartner Executive Programs, Gartner Consulting and Gartner Events, Gartner works with every client to research, analyze and interpret the business of IT within the context of their individual role. Founded in 1979, Gartner is headquartered in Stamford, Connecticut, U.S.A., and has 5,000 associates, including 1,280 research analysts and consultants, and clients in 85 countries. For more information, www.gartner.com.
North Pole Inc. Core Data Requirements and Database Sizing*
* For non-believers, these data sizings were derived from various sources: Population data used to determine the number of worldwide Christians (2.3B) and Christian households (884M) is from the US Census, the Catholic Education Resource Center, the Christian Post, and the Global Population Clock. The average number of presents from Santa (3, excluding stocking stuffers) is from Babycenter.com and CircleofMoms.com. The number of person-to-person interactions (20/day) for calculating the volume of "naughty/nice" data comes from the Tilted Forum Project on Humanity, Sexuality and Philosophy. The amount of correspondence Santa receives is from a Wired Magazine article (500K letters annually) and extrapolated to include emails and worldwide correspondence. The number of toy makers (1547 in US) is from toydirectory.com and is extrapolated to include worldwide toy makers, suppliers and parts. The number of shopping malls (105,000 in US) is from the International Council of Shopping Centers. And package delivery, transportation and personnel numbers are extrapolated from public FedEx data.
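For the arithmetically inclined, here is a quick sanity-check sketch showing how the headline volumes in the systems portfolio above follow from these footnoted inputs (the 24-hour delivery window is my own assumption; everything else comes from the figures cited above):

```python
# Sanity-checking the portfolio volumes against the footnoted inputs above.
recipients = 2.3e9               # worldwide gift recipients
presents_per_recipient = 3       # average presents from Santa
tests_per_toy = 3                # safety/quality tests per toy
interactions_per_day = 20        # person-to-person interactions tracked by NITS
households = 880e6               # gift-receiving households

toys = recipients * presents_per_recipient               # ~6.9 billion  ("Timmy")
tests = toys * tests_per_toy                              # ~21 billion   ("Tammy")
interactions = recipients * interactions_per_day * 365    # ~16.8 trillion (NITS)

# Assuming one stop per household spread over a 24-hour night of delivery:
deliveries_per_second = households / (24 * 3600)          # ~10,200 (SLEDS/NOD routing)

print(f"{toys:,.0f} toys, {tests:,.0f} tests, {interactions:,.0f} interactions")
print(f"{deliveries_per_second:,.0f} deliveries per second")
```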
Category: Uncategorized Tags: analytics, BI, big data, bigdata, business intelligence, christmas, cloud, data management, data warehouse, humor, information management, mobile, predictive analytics, santa, social media
by Doug Laney | December 18, 2012 | 3 Comments
To understand the significance of December 21, 2012 to the Mayans (and today’s mass media) it’s necessary to recognize and understand the Mayan numbering system, theology and astronomical prowess.
First, the Mayans had two numbering systems, which are more or less akin to our distinct decimal system for counting things and our Gregorian system for counting dates. However, their numerical system is base-20 (vigesimal), not base-10 (decimal). This owes to the fact that they felt perfectly comfortable using their toes for counting, and relished the ability to represent petabyte-scale numbers like faraway dates efficiently. The downside of this, along with some unfortunate anomalies they introduced, was that they were never able to master multiplication or division. Unlike the ancient Romans though, Mayan data modelers did invent a symbol for the number zero, which turns out to be an important part of the story.
However, unlike most of our cultures, the Mayans also had two distinct calendar systems: the "Short Count" and the "Long Count." The Short Count derives from a sacred count of 260 days known as the tzolkin, munged with Venus's relatively protracted year. Although based in part upon astronomical observations, this calendar was purely for ritualistic purposes, is still used by Guatemalan highlanders today, and bears no relevance to our imminent ominous occasion. The Long Count calendar is also based on astronomical observations and cycles, and multiples thereof.
The longest of the five nested Long Count cycles is the Baktun, which is 144,000 days, or about 400 years – interestingly, about the same as our present-day quadricentennial leap year cycle. The 13-Baktun "Great Cycle" spans 5125.36 years, completing (and iterating, I hope) on December 21, or 13.0.0.0.0 in Mayan nomenclature.
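For those who want to check the math, here is a minimal sketch (assuming the widely used GMT correlation, under which December 21, 2012 falls exactly 1,872,000 days, or 13 Baktun, after the Long Count epoch) that converts a Gregorian date into Long Count notation:

```python
# A minimal sketch of Long Count arithmetic, assuming the standard GMT correlation.
from datetime import date

# Nested cycles (in days): 1 uinal = 20 kin, 1 tun = 18 uinal = 360 days,
# 1 katun = 20 tun = 7,200 days, 1 baktun = 20 katun = 144,000 days.
PLACE_VALUES = (144_000, 7_200, 360, 20, 1)

# Per the GMT correlation, 2012-12-21 is exactly 13 baktun after epoch 0.0.0.0.0.
ANCHOR_DATE, ANCHOR_DAYS = date(2012, 12, 21), 13 * 144_000

def long_count(d: date) -> str:
    """Convert a Gregorian date into baktun.katun.tun.uinal.kin notation."""
    days = ANCHOR_DAYS + (d - ANCHOR_DATE).days
    digits = []
    for size in PLACE_VALUES:
        digits.append(days // size)
        days %= size
    return ".".join(map(str, digits))

print(long_count(date(2012, 12, 21)))              # 13.0.0.0.0
print(round(13 * 144_000 / 365.2425, 2), "years")  # ~5,125 solar years in the Great Cycle
```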
But why December 21st? What happened 5125 years ago on 0.0.0.0.0? The answer that perplexed scholars until recently is: nothing. Nothing happened on that date—which happens to predate the Mayan civilization by some 3,000 years. Unlike most modern-day cultures, whose ethnocentric calendars begin on an important date in their own history, the Mayans saw themselves as part of a much bigger and longer picture…one of astronomical scale. It wasn't until scholars determined that the date 13.0.0.0.0 coincides with a confluence of Mayan theology and rare astronomical events (due to the precession caused by the slow wobbling of the Earth's axis) that they realized the Mayan calendar was reverse-engineered.
After decades and centuries of data collection (i.e. ancient Big Data curating methods), the Mayans' best data scientists projected that on December 21, 2012, the Sun's ecliptic will pass through the center ("dark region" or "dark road") of the Milky Way, not just on any old day, but on the winter solstice. It is on this day that the Mayans depict their sun god Pacal (no relation to Blaise) traveling into the underworld to do battle with the lords of Xibalba.
So if you want to really impress someone this holiday season, wish them a Happy 14th Baktun or “May you have a renewed Great Cycle!”
Follow Doug on Twitter: @Doug_Laney
Category: Uncategorized Tags: analytics, big data, data scientist, mayan
by Doug Laney | August 15, 2012 | 4 Comments
Tobin's q is a simple ratio first posited by Nobel-winning American economist James Tobin in the 1960s to understand the relationship between a company's market value and the replacement value of its assets. Analysis shows that this quotient has been growing since financial statements were standardized following the Great Depression. Smoothing economic boom and bust cycles via linear regression, Tobin's q has more than doubled, from 0.4 in 1945 to a predicted 1.1 today.
This means that in general markets now value companies more than the sum of their tangible assets. How can this be? Non-reportable intangible assets of course.
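As a back-of-the-envelope illustration (hypothetical numbers, not any specific company's), the ratio itself is trivial to compute, and it is the shift from roughly 0.4 to 1.1 that points to unreported assets:

```python
# Tobin's q = market value of the firm / replacement cost of its assets.
# Hypothetical numbers, chosen only to mirror the 1945 vs. present-day ratios above.
def tobins_q(market_value: float, replacement_cost: float) -> float:
    return market_value / replacement_cost

print(tobins_q(market_value=400, replacement_cost=1_000))    # 0.4  (circa 1945)
print(tobins_q(market_value=1_100, replacement_cost=1_000))  # 1.1  (today)
# A q above 1 means the market prices the firm above its tangible asset base;
# the excess is attributed to intangibles absent from the balance sheet.
```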
We know that due to 75-year-old accounting standards, certain intangibles cannot be valued and reported. The unreportable intangibles most frequently cited include human capital and intellectual capital. Yet could these alone have doubled over seven decades? Do corporations of similar revenue have twice the number of employees they once did? No, quite the opposite, as we've become more efficient and reliant on technology. Do humans have twice the knowledge capacity we did back in the day? My teenager is not the only one who would fervently disagree with that.
Then what is it that companies have so much more of, that has been accumulating for over half a century, and that is hidden from balance sheets?
Ever since Arthur Andersen computerized payroll at a GE plant in 1953, companies have become better and better at amassing information assets (leading up to this age of Big Data) and finding ways to leverage them. Yet the value of information isn't quantified or reported in any way. Even today's infocentric companies whose business models revolve around collecting, buying and selling data (e.g. Facebook, Google, Experian, Nielsen, etc.) have balance sheets devoid of their most valuable asset.
Furthermore, a study by intellectual capital research firm, Ocean Tomo, shows that the portion of corporate market value attributable to intangibles has grown from 17% in 1975 to a whopping 81% in 2010. Indeed, information accumulation has not only increased dramatically in businesses, but the importance of information itself has supplanted traditional assets in generating revenue, and therefore in contributing to market value as well.
So what are CEOs to do knowing that information comprises a majority of their corporate value? First, forget what the accountants say, and listen to what the market is saying. Stop just talking about information as such an important asset and start valuing and managing it like one.
For further reading on the topic of infonomics:
Category: Uncategorized Tags: big data, data, infonomics, information, information assets, information management
by Doug Laney | March 25, 2012 | 5 Comments
The research note, "Emerging Role of the Data Scientist and the Art of Data Science," which I authored with colleague Lisa Kart, just hit the Gartner wires this week. Since most of the data scientist role dissenters we come across seem to believe that the role's title is nothing more than a pretentious moniker for a statistician or business intelligence (BI) analyst, we decided to take an…er…scientific approach to making that determination. We thought it would be entirely fitting to perform text analysis of hundreds of job descriptions for "data scientist," "statistician," and "BI analyst" to learn what the commonalities and differences are according to those actually hiring for the role.
Data Scientist Job Description Wordcloud
I'd like to believe that these findings led us to more clearly define and distinguish the role of the data scientist, without speculation, than anyone else to date. Through our research we learned that data scientists are expected to work more in teams, to be comfortable and experienced with "big data" sets, and to be skilled at communication. They also frequently require experience in machine learning, computing and algorithms, and are required to have a PhD nearly twice as often as statisticians. Even the technology requirements for each role differed, with data scientist job descriptions more frequently mentioning Hadoop, Pig, Python and Java, among others.
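As a rough sketch of the kind of comparison involved (not Gartner's actual methodology or corpus), one can tokenize two sets of postings and rank the terms whose relative frequency most separates them:

```python
# A rough sketch (not the actual study methodology): rank the terms whose relative
# frequency in one set of job postings most exceeds that in another set.
from collections import Counter
import re

def term_frequencies(postings):
    """Lowercase, tokenize, and count words across a corpus of job descriptions."""
    counts = Counter()
    for text in postings:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def distinguishing_terms(corpus_a, corpus_b, top_n=10):
    """Terms over-represented in corpus_a relative to corpus_b."""
    a, b = term_frequencies(corpus_a), term_frequencies(corpus_b)
    total_a, total_b = sum(a.values()) or 1, sum(b.values()) or 1
    scores = {w: a[w] / total_a - b.get(w, 0) / total_b for w in a}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical postings, for illustration only:
data_scientist_ads = ["PhD preferred; machine learning, Hadoop, Python, algorithms"]
bi_analyst_ads = ["SQL reporting and dashboards for business users and metrics"]
print(distinguishing_terms(data_scientist_ads, bi_analyst_ads))
```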
The piece then goes on to define and describe the three core data science skills: data management, analytics modeling and business analysis. But beyond these, there’s an art to data science. We detail several soft skills that our research showed are also critical to success, i.e., communication, collaboration, leadership, creativity, discipline and passion (for information and truth).
With the need for data scientists growing at about three times the rate for statisticians and BI analysts, and an anticipated analytic talent shortage of more than 100,000 people through 2020, we also included a listing of university programs around the world offering degrees in advanced analytics.
Category: Uncategorized Tags: analytics, big data, business intelligence, data science, data scientist, statistician
by Doug Laney | February 6, 2012 | 1 Comment
Yesterday during the on-air buildup to the Super Bowl a reporter mentioned that over one billion people were expected to watch this year’s big game. It occurred to me how few of these individuals, including some Americans, fully understand what the Super Bowl really means. The next news story was about Super PACs (a new form of political action committee), and it occurred to me how, despite Stephen Colbert’s best efforts, even fewer people understand what a Super PAC is. So for both fun and education I created a little side-by-side comparison of the Super Bowl (and American football) versus a Super PAC (and the American elections).
| Super Bowl (American football) | Super PAC (American elections) |
|---|---|
| Enabled by antitrust exemption under the Sports Broadcasting Act of 1961 | Enabled by expenditure exception under the revised Federal Election laws of 2010 |
| Enables players to run for touchdowns | Enables candidates to run for office |
| Money comes from citizens and businesses | Money comes from citizens and businesses |
| Funds players' lifestyles | Funds candidates' campaigns…and lifestyles |
| Pays for hysterical ads | Pays for histrionic ads |
| Helps players get enshrined in the Hall of Fame | Helps a candidate get ensconced in the Oval Office |
| Players communicate with fans through the media | Candidates communicate with funders through the media |
| Fans can bestow unlimited fame | Fans can bestow unlimited funding |
| As a result of their fame, many individual players become corporations | As a result of the courts, laws don't discriminate between individuals and corporations |
| Foreign teams not allowed to participate in US football | Foreign businesses allowed to participate in US elections |
| Initial goal is winning a series of playoff games in multiple cities; ultimate goal is winning the national championship | Initial goal is winning multiple primary elections in multiple states; ultimate goal is winning the general election |
| Offense wins games; defense wins championships | Being offensive wins primaries; being on the defensive loses general elections |
| Halftimes are spectacular | Debates are spectacles |
| Required to disclose injuries | Required to disclose donors |
| Players wear eye black | Candidates get black eyes |
| Players leave it all on the field for their teammates and fans | Candidates leave a little left over for themselves |
| Coaches stand on the sidelines and call plays; quarterbacks audible | Fund managers stand on the sidelines and call plays; candidates audible |
| Players make a bit more money each playoff game they win | Candidates raise a lot more money each primary election they win |
| Sports networks are the real winners | News networks are the real winners |
Ultimately the larger story for both the Super Bowl and Super PACs is about corporate influence. Super Bowl ads may be expensive, but the cost per second per viewer is on par with any other TV show. Moreover, due to social media these Super Bowl ads often take on a life in the Twittersphere, on YouTube and in Facebook after (and even before) they air, thereby enabling a business to reach a much larger audience than those viewing the ad when it aired. Many businesses also use the power of social media to actively engage potential customers by drawing them to their website or Facebook page. Think: Danica Patrick. Similarly, US elections are expensive, and reaching voters today also requires a social multichannel approach. Super PACs now provide the unbounded means for individuals and corporations from anywhere on the planet to influence US elections. So if your business wants to and has the financial means to reach a large swath of both consumers and voters, the Super Bowl and the Super PAC have got you covered.
Category: Uncategorized Tags: pac, politics, super bowl, superbowl, superpac
by Doug Laney | January 28, 2012 | Comments Off
Today the Gartner Information Management and Analytics Community held its weekly Twitter Chat (Tweetchat, Tweetjam, TweetUp, whichever you prefer) to discuss concepts around big data, the role of the data scientist, and data quality. Over a half dozen Gartner analysts shared their ideas and research. (Where else can you get access to that many Gartner analysts in one place at the same time?) And dozens more individuals from other organizations also shared their perspectives and questions.
Big Data—Hey What’s the Big Idea?
First we discussed whether "Big Data" is an animal, vegetable or mineral, concluding that it has become very much a marketing term. Gartner analyst Andy Bitterer (@bitterer) jabbed, "Is Big Data nothing but a marketing play, since many organizations had 'big data' for a long time?" Timo Elliott (@timoelliott) concurred, stating that "new terms arise because of new technology, not new business problems." Esteban Kolsky (@ekolsky) thought the term was a more specific "marketing word used to describe the incredible volume coming out of social [networks]."
Yves de Montcheuil (@ydemontcheuil) suggested that organizations “have had Big Data all along but couldn’t get value out of it, except with lots of $$$,” and Gartner analyst Doug Laney (@doug_laney) agreed with a quip about Big Data being relative: “Big Data is merely data that’s an order of magnitude greater than data you’re accustomed to…Grasshopper.”
Hadoop was mentioned more than a few times as both an enabler and also a driver of big data, with Mark Troester (@mtroester) summing it up that the “hype of Hadoop is driving pressure on people to keep everything.” Some suggested archiving or even unloading data that is unused, but John Haddad (@JohnM_Haddad) and Martin Schneider (@mschneider718) both reminded everyone that data retention may depend on industry regulations and government mandates.
Some inquired about how to find value in data, so Doug Laney offered that there are two sides to that equation: 1) "looking beyond basic BI to advanced analytics" and 2) "quantifying data's potential and actual value." Doug also summarized one of Gartner's strategic planning assumptions for 2012: "Through 2015, >90% of business leaders say info is a strategic asset, yet <10% will quantify its economic value." Gartner analyst Merv Adrian (@merv) admittedly had some fun with the notion of hidden value in data, asking, "Would it be a bad thing for organizations to say 'Maybe there is value in the dark fiber of our information fabric?'"
The Art of Data Science
This led into a discussion about data science and the realization of data value. Gartner analyst Ted Friedman (@ted_friedman) wrote that it's "good that analytics roles are becoming key, but 'data scientist' is a little bit elitist IMO." Esteban disagreed, contending that the term "scientist is not elitist, it defines a specific role." Gartner analyst Carol Rozwell (@CRozwell) responded by suggesting, "But shouldn't the average person be able to derive value from data?…[even though] some people refuse to see the truth in data."
Nenshad Bardoliwalla (@nenshad) contended that the need for data scientists may be overblown. He believes that “Purpose-built apps can democratize making sense of Big Data for business folks without the need for data scientists (in some domains).” @Brett2point0 agreed, offering that “ideally end users should be empowered to explore their own data, seek their own insights through self-service.”
Gartner's Doug Laney shared his analysis of current job descriptions for "data scientist" versus those for "BI analyst." Key words in "data scientist" job descriptions include: design, knowledge, research, complex, learning, machine, models, problems, and performance; whereas top words used in "BI analyst" job descriptions are reporting/reports, company, technical, industry, user, sql, applications, and metrics. Tony Baer (@TonyBaer) and Doug agreed that communication is the skill that distinguishes theoretical from applied science.
Mark Troester argued that someone needs to have “real intelligence to identify relevance and rationalize data,” and Jill Hulme (@jill_hulme) chimed that “a data scientist needs skills in math, engineering, writing, and a healthy dose of skepticism.” Adrian Bowles (@ajbowles) philosophized that a data scientist is like “a sculptor, finding a figure in material,” and that “Science is discovery, but not all who discover are scientists.”
Mopping Up with Data Quality
Finally we wrapped up with some thoughts on data quality in a Big Data context. Esteban claimed that "Big Data has compounded the [data quality] problem" and that 40% of the data he sees now is bad. Seth Grimes (@SethGrimes) similarly lamented that "questionable data is the rule rather than the exception in my specialization areas: text and sentiment analysis."
Yves thinks that “data volumes make it hard for traditional data quality architectures to keep up with big data.” However, Gartner’s Ted Friedman offered up another perspective that “data quality problems can be eased by big volumes in that individual flaws may have less impact when the data set is bigger.”
Mark Troester turned the idea of analytics on its head, recommending, “We shouldn’t just apply data quality for analytics, we should use analytics to help with quality.” He said he’s also “seen people so aggressive about cleansing that they cleanse away insight.”
When some participants suggested that data should ideally be cleansed at the source or when received, Doug Laney cautioned that "you can't always cleanse data before storing it because of performance and the need to integrate and analyze it first." Ted Friedman added that data quality is a "harder problem when organizations wish to use data they didn't produce or don't own. The greater competency is assessing data quality…but depending upon the usage and type of data, some of it you will still have to get nearly perfect."
Thanks again to the following individuals and organizations for their participation:
@ajbowles @arbeiza @berkson0 @bgassman @bikespoke @bitterer @Brett2point0 @briellenikaido @chirag_mehta @cpreston64 @cpydimuk @CRozwell @datachick @DataIntegrate @DavideCamera @decisionmgt @DivineParty @donloden @doug_laney @eIQnetworks @ekolsky @erao @EventCloudPro @furukama @howarddresner @iam_joshd @infanteAL @InformaticaCorp @jamet123 @JayMOza @jessewilkins @jill_hulme @johndavidstutts @johnlmyers44 @JohnM_Haddad @JSussin @juliebhunt @loranstefani @marciamarcia @merv @mschneider718 @mtroester @Natasha_D_G @NeilRaden @NekkidTech @nenshad @OhThisBloodyPC @pishabh @RobertsPaige @RomanStanek @rqtaylor @ryanprociuk @s_pritchard @seamuswalsh @SethGrimes @SocialMediaJeff @StacyLeidwinger @stevesarsfield @Tanvi_MR @techguerilla @ted_friedman @timoelliott @TonyBaer @userevents @ValaAfshar @Vivisimo_Inc @wiseanalytics @XeroxDocuShare
Please join or follow Gartner’s BI, analytics and information management analysts each Friday at 12:00pm ET on Twitter at #GartnerChat.
Note: Some tweets have been edited slightly in this blog to improve their comprehension and/or enhance context.
Category: Uncategorized Tags: analytics, BI, big data, business intelligence, data quality, data science, data scientist, hadoop, information management
by Doug Laney | January 17, 2012 | 3 Comments
A major Wall Street securities ratings firm ignores the recommendations of a consultant report it paid for on rating collateralized debt obligations (CDOs), contributing to the collapse of the mortgage industry, the near-collapse of the banking industry and a multi-year global recession requiring trillions of government (taxpayer) dollars to avoid a full-blown Depression.
A major video game maker has millions of user IDs and credit card numbers pilfered, and spends many times more than was actually lost in revenue on bolstering its online security.
Thousands of credit cards belonging to Israeli citizens are exposed, resulting in an actual military build-up in response.
A major retailer gets slammed by a Twitter and Facebook barrage then decides to implement a social media program.
A shipping line suffers numerous attacks by pirates off the Somali coast, and spends millions paying ransoms, beefing up security and reconfiguring routes.
The US Post Office continues to borrow from government coffers to run at a financial loss without making changes to its business model. Raising postage rates only exacerbates the problem.
And an online shoe retailer announced yesterday the potential exposure of account information for as many as 24 million customers. What level of investment will they have to make to prevent this kind of event, let alone to identify and tie up other loose ends?
True, major snafus are a part of business life, but knee-jerk budgeting in their immediate aftermath to prevent similar future incidents shouldn’t be. In a recent online discussion of the topic I referred to this kind of behavior as “blunderfunding.” So let’s make it official:
1. basing the level of investment in a business initiative upon the amount of loss incurred from a recent mistake or mishap
2. making a hasty outlay for a project to deflect or cover up for those responsible for a mistake
3. allocating monies or budget to fix a problem symptom rather than its actual cause
Tweet by Gartner analyst Doug Laney on 13 Jan 2012
“blunder”: n. a mistake, v. to make a mistake
“funding”: [fund] n. a collection of money for a specific purpose, v. to allocate money for a specific purpose
While examples of enterprise-scale blunderfunding make regular headlines, it is also pervasive throughout the lower levels of most organizations: e.g. buying "caution cones" to place when recently washed floors may be slippery, but only after a hurried person or two did a back-side plant; or overhauling server farm air conditioning after overheating resulted in degraded online customer response times.
Some of these blunderfunded investments may be perfectly justified. That is, the outlay is less than the risk-adjusted cost of re-occurrence and addresses the actual cause. In other cases the risk-adjusted loss (financial loss x the probability of re-occurrence) is much lower than the budget allocated to prevent any such problems in the future. Worse, and perhaps more frequently, money is allocated to fix, repair or even hide the symptom rather than resolve the root cause of the problem.
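Put in numbers (hypothetical figures, for illustration of the comparison only), the test is simple: weigh the proposed outlay against the risk-adjusted loss, and question any spend that dwarfs it or targets only the symptom:

```python
# Hypothetical figures, illustrating the risk-adjusted comparison described above.
def risk_adjusted_loss(loss_per_incident: float, annual_probability: float) -> float:
    """Expected annual loss = financial loss per incident x probability of re-occurrence."""
    return loss_per_incident * annual_probability

expected_loss = risk_adjusted_loss(loss_per_incident=5_000_000, annual_probability=0.10)
proposed_budget = 2_000_000   # the knee-jerk, post-incident remediation spend

print(f"risk-adjusted loss: ${expected_loss:,.0f}")        # $500,000
print("blunderfunding?", proposed_budget > expected_loss)  # True -> worth questioning
```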
Organizations tend to compound the damage by neglecting to:
- calculate the actual economic loss
- estimate the likelihood of re-occurrence
- identify similar possible incidents
- compute the risk-based loss potential of future incidents
- discover the factors that led to this incident
- deal directly with the root cause(s) and fund their resolution
What we’ve got here is also a recipe to avoid blunderfunding.
So why is it that most blunderfunding seems to be related to information mishandling, misappropriation and misuse? I believe this is because information assets are more easily accessed, more often in motion, and more easily transported. In addition, since information "theft" or "usage" almost never actually involves its depletion in any way (i.e. it's merely copied, not deleted), instances of information breach are that much harder to recognize. Finally, because information assets are not regularly covered by property rights laws, perpetrators, if caught, can get off more easily than if they'd stolen actual "balance sheet" assets.
Just imagine, if you’re a criminal, what kind of loot would be better to heist than one in which:
- You steal it by sitting at your desk rather than scaling walls, dealing with armed guards or blowing up safes
- After you steal it, it still remains in place (as if nothing happened)
- You don’t need a fast truck to carry it off
- It is the kind of asset that increasingly makes up a large part of a company’s overall valuation
- Companies don’t measure its economic value, so they typically fail to manage or secure it with the same discipline as their traditional assets
- You can sell it multiple times to multiple black-market buyers (even on Amazon-like marketplaces)
- The courts only sometimes consider it to be covered under property laws
I'm not advocating cyber crime, merely stating why organizations need to be proactive rather than reactive in securing their information assets, and to do so based on these assets' actual computed value. The alternative is blunderfunding…and potentially more unwelcome headlines.
You can follow Doug on Twitter @doug_laney
Category: Uncategorized Tags: asset management, data security, funding, infonomics, information assets, information security, investment