Darin Stewart

A member of the Gartner Blog Network

Darin Stewart
Research Director
1 year with Gartner
15 years IT industry

Darin Stewart is a research director for Gartner in the Collaboration and Content Strategies service. He covers a broad range of technologies that together comprise enterprise content management.

Google’s Knowledge Graph: Yeah, that’s the Semantic Web (sort of)

by Darin Stewart  |  May 17, 2012  |  2 Comments

Google is about to get a whole lot more useful. Yesterday, the search titan announced the “Knowledge Graph,” a functional enhancement that attempts to provide actual information about the subject of your query rather than just a list of links. This might be helpful, but the really interesting bit is the part about the graph. As Google SVP Amit Singhal put it in his blog post:

 

“The Knowledge Graph also helps us understand the relationships between things. Marie Curie is a person in the Knowledge Graph, and she had two children, one of whom also won a Nobel Prize, as well as a husband, Pierre Curie, who claimed a third Nobel Prize for the family. All of these are linked in our graph. It’s not just a catalog of objects; it also models all these inter-relationships. It’s the intelligence between these different entities that’s the key.”

 

That’s what a graph is, a structured set of meaningful relationships. The great challenge of the web is to bring some sort of useful order to the chaos of available online resources. Search is pretty good at finding stuff, but does little to show how things relate to each other. I am likely to miss huge swaths of useful information just because I don’t know enough to ask the right questions. I need a guide, something like a knowledgeable clerk in a bookstore or a good librarian who can point me to important titles and authors I would have otherwise missed. This is what Google is attempting to provide with the Knowledge Graph. Not just the answer to what you asked, but also the answers to the questions you probably should have asked. They are linking information together in a meaningful way and presenting the integrated results to the user. Pretty neat trick. Of course, the dirty little secret of the Knowledge Graph is that you don’t need to be Google to create one. You just need to know a little about how the Semantic Web works.
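To make “a structured set of meaningful relationships” concrete, here is a minimal Python sketch. It stores the Marie Curie example from Singhal’s post as subject-predicate-object triples and follows links outward from one entity. The predicate names (`spouse`, `child`, `won`) are hypothetical illustrations, not Google’s or Freebase’s actual schema.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples modeled on the Curie example.
TRIPLES = [
    ("Marie Curie", "spouse", "Pierre Curie"),
    ("Marie Curie", "child", "Irène Joliot-Curie"),
    ("Marie Curie", "child", "Ève Curie"),
    ("Marie Curie", "won", "Nobel Prize"),
    ("Pierre Curie", "won", "Nobel Prize"),
    ("Irène Joliot-Curie", "won", "Nobel Prize"),
]

def build_index(triples):
    """Index outgoing edges by subject so an entity's links can be looked up directly."""
    index = defaultdict(list)
    for subject, predicate, obj in triples:
        index[subject].append((predicate, obj))
    return index

def related(index, entity, predicate):
    """Return every object linked to `entity` by `predicate`, in insertion order."""
    return [obj for pred, obj in index[entity] if pred == predicate]

def family_laureates(index, person):
    """Follow spouse and child links, keeping relatives who are also linked to a Nobel Prize."""
    relatives = related(index, person, "spouse") + related(index, person, "child")
    return [r for r in relatives if "Nobel Prize" in related(index, r, "won")]

index = build_index(TRIPLES)
print(family_laureates(index, "Marie Curie"))  # ['Pierre Curie', 'Irène Joliot-Curie']
```

The query “who else in this family won?” is answered by walking edges between entities rather than by matching keywords. That is the difference the post draws between a list of links and a graph.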

A couple of years ago, Google purchased a company called Metaweb. As part of the deal, Google took ownership of Freebase, a massive public database of Linked Open Data: data that is structured in a semantically meaningful way and linked to other useful information. In other words, Freebase was a huge graph of knowledge available to the public, one of many. With a few tools, some semantic know-how and a bit of elbow grease, you could create your own knowledge graph that integrated these public sources with your own internal, proprietary data. The biotech and intelligence industries have been doing it for years.
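The integration step works because Linked Data names things with shared URIs. When an internal record points at the same URI a public source uses, the two graphs join with no extra mapping. Below is a minimal sketch assuming hypothetical `example.org` and `example.com` namespaces; the first stands in for a public Linked Open Data source and the second for proprietary enterprise data. The compound, study and team identifiers are invented for illustration.

```python
# Hypothetical namespaces: PUBLIC stands in for a Linked Open Data source,
# INTERNAL for an organization's own proprietary data.
PUBLIC = "http://example.org/public/"
INTERNAL = "http://example.com/internal/"

public_triples = {
    (PUBLIC + "compound/c1", PUBLIC + "label", "compound c1"),
    (PUBLIC + "compound/c1", PUBLIC + "inhibits", PUBLIC + "protein/p7"),
}

internal_triples = {
    # The internal study reuses the public URI for the compound; that is the link.
    (INTERNAL + "study/42", INTERNAL + "tests", PUBLIC + "compound/c1"),
    (INTERNAL + "study/42", INTERNAL + "owner", INTERNAL + "team/discovery"),
}

def merge(*graphs):
    """Union of triple sets; shared URIs are what connect the graphs."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged

def objects(graph, subject, predicate):
    """All objects for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)

def describe(graph, subject):
    """Every (predicate, object) pair known about a subject, from any source."""
    return sorted((p, o) for s, p, o in graph if s == subject)

graph = merge(public_triples, internal_triples)
for compound in objects(graph, INTERNAL + "study/42", INTERNAL + "tests"):
    # Public facts about the compound now sit alongside the internal study.
    print(compound, describe(graph, compound))
```

A production system would use an RDF store and standard vocabularies rather than Python sets. The joining principle is the same: reuse the public identifier instead of minting a private one.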

Google mentions Freebase in passing, but otherwise doesn’t say much about the semantic sources they are leveraging. I think this is the result of a couple of trends in the semantic realm. Last year I wrote a document for Gartner entitled “Finding Meaning in the Enterprise: A Semantic Web and Linked Data Primer.” In a section on the future of the Semantic Web, I said:

 

“Semantic technology vendors … are beginning to learn that their customers don’t want to hear about ontologies, inference rules, and other nuances of the semantic technologies underlying their products. … As a result of this dynamic, semantic technologies are being absorbed into the platform and hidden from users. This trend will continue as more and more platforms add semantic capabilities and adopt semantic standards.”

 

When published, this document was received with the deafening sound of … crickets. I shouldn’t have been surprised. Unless you are an information science geek, it can be hard to relate to this stuff. One vendor recently reported that, during a meeting with a potential customer, “the client put a hat in the middle of the table and said that anyone who used the word ‘ontology’ would have to put a dollar into it.” Google understands this and is using it to its advantage, and potentially to our disadvantage.

The Knowledge Graph is not on a par with PageRank and the rest of the Google secret sauce. While they have certainly invested a lot of resources and brain sweat in the Knowledge Graph, Google didn’t invent Linked Data and certainly didn’t create the vast majority of the information they are exposing. Linked Open Data is a public resource created through countless hours of effort by anonymous stewards. Acknowledging that contribution would not only be respectful; it would also incentivize the creation of even more Linked Data, which would in turn make the Knowledge Graph even more powerful and valuable. The potential for a virtuous cycle is being missed here. Google has done a tremendous service in exposing some Linked Data to the end user. They could do a much greater service if they exposed it as a SPARQL endpoint. Somehow I don’t expect it to show up in the Google API anytime soon.
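For readers who haven’t used one, a SPARQL endpoint is simply an HTTP service that accepts graph queries. Under the SPARQL 1.1 Protocol, a client can send the query URL-encoded in a `query` parameter of a GET request. The sketch below only builds such a request with Python’s standard library; it does not send it. The endpoint URL and the `ex:` vocabulary are hypothetical, since Google offers no such endpoint.

```python
from urllib.parse import urlencode, urlsplit, parse_qs
from urllib.request import Request

# Hypothetical endpoint and vocabulary, for illustration only.
ENDPOINT = "https://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/schema/>
SELECT ?child WHERE {
  ?person ex:name "Marie Curie" .
  ?person ex:child ?child .
}
"""

def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_get_request(ENDPOINT, QUERY)
# Round-trip check: the encoded URL carries the original query text.
assert parse_qs(urlsplit(req.full_url).query)["query"] == [QUERY]
```

Against a real public endpoint, passing `req` to `urllib.request.urlopen` would return JSON bindings for `?child`.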

I’ve expressed concern over the privatization of the semantic web before. I don’t think this is quite the same thing. Maybe this is more of a “don’t show us how the sausage is made” dynamic. It’s hard to blame Google for letting people assume the Knowledge Graph is more of their magic. But if IT leaders and practitioners continue to think they can’t do this stuff because they aren’t Google, opportunities are going to be missed. In fact, they already are. I find it ironic that one of the objections raised to the Semantic Web is that it all sounds too much like science fiction. In his blog post, Singhal hails the Knowledge Graph as Google’s first baby step towards the Star Trek computer. If we don’t start to step up, when that computer eventually materializes it will be ad-driven. We need to get more comfortable with semantic technologies and with bringing them into the enterprise. The more Linked Open Data is available, the more powerful the graph becomes for all of us. It’s time to get more involved or, as Jean-Luc Picard might say, “Engage!”

Category: Knowledge Management, search, Semantic Web

Mobile Devices are the Convenience Stores of the Web

by Darin Stewart  |  May 14, 2012  |  1 Comment

Adobe conducted a study last year that found customers visiting a website from a tablet are more likely to make a purchase than those visiting from a desktop. They also spend more per purchase, as much as 21% more. This trend has not gone unnoticed by retailers and other companies looking for ways to expand their market reach. Improving mobile presence, including location-based services, personalization and tracking capabilities, will be the core focus of most online retailers over the next 18 months.

Despite this commitment, most companies struggle with effectively engaging a mobile audience. They find that their homepage, carefully crafted for a desktop browser, drives away mobile visitors. Layering on the out-of-the-box mobile profile provided by the WCM system only makes matters worse. The reason is that a “shrink to fit” approach to the mobile web is not a viable solution. Mobile users have different goals and behaviors than their desktop-bound equivalents. This is true even if the mobile visitor and the desktop visitor are the same person accessing your website in a different context at a different time. For example, desktop visitors spend more time on the homepage, and bounce from it less often, than they do on other content and pages across the site. For mobile visitors, this trend is reversed: they make longer visits to content pages and give little attention to the homepage.

This is largely due to when mobile users come to your website: in spare moments, such as while standing in line at the grocery store. Eighty percent of mobile web access happens during a user’s miscellaneous downtime. People pull out their mobile devices when they have just a little time to kill. Whether they are waiting for a meeting to start (or under the table after the meeting has started), waiting for their kids after school or in line for a bank teller, that is when they are most likely to go to the mobile web. This leads to a convenience store approach to web surfing.

When a visitor is comfortable in their office or den, with a luxurious screen and full-sized keyboard in front of them, they are likely to spend some quality time on your website, as they would perusing the aisles of a full-service, brick-and-mortar retail store. When they only have a minute with a tiny display and cramped keypad, they want to dash in, get what they need and move on. A mobile web presence needs to fulfill this need. At the same time, it should encourage the user to return to your flagship store, the desktop-oriented website, when they have more time. 59% of visitors to a mobile website later follow up with a visit to the main website on a PC. When they do, you should be able to greet them at the door by name.

Content Mobility is about more than simply ensuring that your content is readable on a two-inch screen. It is about precision content delivery. When users are interacting with your content for shorter periods of time, with a smaller display and more distractions, it is critical to make the most of that limited window of opportunity. This requires more than optimizing content for display on a small screen. It requires leveraging the unique affordances of a mobile device and creating a cohesive, focused, conversational experience for the user that crosses channels, platforms and sessions.

Category: Mobile, web content management

Leveraging Expertise Beyond The Enterprise

by Darin Stewart  |  May 8, 2012  |  Comments Off

I recently received an invitation to attend the VIVO Implementation Fest being held this month in Boulder, Colorado. VIVO is an open source expertise discovery platform for the semantic web. It enables the discovery of research and scholarship across disciplinary and administrative boundaries, including across institutions. It does this through interlinked public profiles of people and other research-related information. The goal is to create a national network of scientists. In the three years since its creation, VIVO has gone a long way toward doing so, and innovations have followed. I think industry could learn a lot from this effort.

Science is complicated; multi-disciplinary science even more so. No single company, no matter how large, is going to have all the expertise and resources necessary to exploit every innovation or discover the next blockbuster product. Pharmaceutical giant Merck recognized this over a decade ago in its 2000 annual report:

“Merck accounts for about 1 percent of the biomedical research in the world. To tap into the remaining 99 percent, we must actively reach out to universities, research institutions and companies worldwide to bring the best of technology and potential products into Merck. The cascade of knowledge…is far too complex for any one company to handle alone.”

This is as true for manufacturing and merchandising as it is for medicine. Innovation in all industries still tends to follow the traditional “man of genius” model, which has held sway from Edison’s Menlo Park to AT&T’s Bell Labs. We hire smart people (hopefully a few of them are really smart), give them resources and hope for the best. We stick to a vertical integration model in which internal research and development activities lead to internally developed products that are then distributed by our own company through our own channels. This model is no longer competitive. The world has moved from an environment of knowledge scarcity to one of knowledge abundance, but most of that knowledge doesn’t reside in our own firm. We go to great lengths to hire the best and the brightest, but inevitably, as Sun Microsystems cofounder Bill Joy is fond of pointing out, the smartest and most talented people still work for someone else. That doesn’t mean they can’t also work for you.

Universities and academic research centers exchange faculty and share facilities all the time. This quickly and inexpensively expands the capabilities of a team and institution for the situation at hand, enabling them to undertake projects that would otherwise be out of reach. The exchange of knowledge enriches all participating institutions long after the project ends and the team disbands. The social networks established and strengthened by the collaboration tend to outlive the project that facilitated their creation. Why should this dynamic be restricted to the ivory tower of academia? Companies make a lot of noise and spend a lot of effort trying to foster collaboration among their own teams and departments, but that is usually where it ends. We rarely look beyond our own staff, resources and business models to find non-obvious opportunity.

It shouldn’t be this way. The building blocks for expertise discovery and exchange already exist in most organizations. They just aren’t being leveraged. I discussed this in a previous post, “Knowing What You Know: Expertise Discovery and Management,” nearly a year ago. In the interim, the tools have improved, the opportunities have grown and the available relevant data has exploded. We should now take the next step. In addition to exchanging information, we should start exchanging bodies. We’ve embraced this approach for decades in the form of consultants and professional services. We get their expertise when we need it and they get our billable hours when they want it. The side effect of this arrangement is that we pay consultants to increase their own knowledge and competencies. We can get a bit of that too if we are willing to pay extra for “knowledge transfer.”

With a bit more openness and coordination, it is possible to move toward a less mercenary footing. Consultancies will always be useful, but loaning and receiving staff with partnering firms results in a much richer collaboration and deeper knowledge transfer. Billable hours can be replaced with simple reciprocity or even, in some cases, by jointly owned intellectual property. The terms of the exchange will, of course, be negotiated and tailored to circumstances, but the end result is that you may gain access to a person you could really use but can’t hire. In return, that person gains new experience, new context and the all-important social ties that form the bedrock of professional networks. The future of business is collaboration, not just between departments, but between companies. Documenting the expertise that exists in your organization, and the expertise that you need, is a good first step. Publishing that information, or a circumscribed portion of it, in the manner of VIVO and its compatriots is a good next step. The smartest people may work for someone else, but that doesn’t mean they can’t help out once in a while.
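The “document, then publish” sequence above can be sketched as a small matching problem. Each organization keeps expertise profiles, shares only the portion it marks as public, and looks outside its own walls for people who cover the skills it needs. The `Profile` fields, the people and the company names below are all hypothetical; they are not VIVO’s ontology or data model.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    """A hypothetical expertise profile; fields are illustrative, not VIVO's schema."""
    name: str
    organization: str
    skills: set[str]
    public: bool = True  # only profiles an organization chooses to publish are shared

def publish(profiles):
    """Share only the circumscribed, public portion of a directory."""
    return [p for p in profiles if p.public]

def find_experts(directory, needed, home_org):
    """Rank people outside `home_org` by how many of the needed skills they cover."""
    matches = []
    for p in directory:
        if p.organization == home_org:
            continue  # we are looking beyond our own staff
        overlap = needed & p.skills
        if overlap:
            matches.append((len(overlap), p.name, sorted(overlap)))
    # Most overlapping skills first; names break ties alphabetically.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [(name, skills) for _, name, skills in matches]

directory = publish([
    Profile("A. Rivera", "Acme", {"polymer chemistry", "spectroscopy"}),
    Profile("B. Chen", "Globex", {"spectroscopy", "machine learning"}),
    Profile("C. Okafor", "Initech", {"polymer chemistry", "spectroscopy"}),
    Profile("D. Patel", "Globex", {"supply chain"}, public=False),
])
print(find_experts(directory, {"spectroscopy", "machine learning"}, home_org="Acme"))
```

Simple skill overlap is a deliberately crude ranking. Real expertise discovery would draw on richer evidence such as publications, projects and co-authorship links.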

Category: Collaboration, Knowledge Management, Semantic Web

The Real Problem with ECM (hint: it isn’t the platform)

by Darin Stewart  |  May 4, 2012  |  3 Comments

I talk daily to companies from a broad range of industries. These organizations run the whole gamut of company sizes, from small boutique operations to huge distributed enterprises. Even with that diversity, everyone wants to talk about content management. That’s probably for two reasons. First, regardless of your industry or company size, you depend on information and content to do business. To a greater or lesser extent, every organization is a content-intensive enterprise. The second reason is that everyone thinks their content management practice is broken. They’re usually right. Content has become so voluminous and diverse, both in its forms and in how it comes into the enterprise, that pretty much every organization experiences some level of content-related dysfunction. This is a big problem, which is why they call Gartner.

Knowledge and information are among the most valuable assets any organization possesses. Most of those assets (Gartner pegs it at 80-90%) exist in the form of unstructured content, such as documents, rich media and web assets. Companies sense that there is untapped value to be had from those resources. Intelligence and insight are trapped in forgotten and inaccessible documents. Money is lost due to inefficiencies in content creation and use. Without consistent and reliable access to these assets it is difficult for an organization to function efficiently and impossible to perform optimally. Companies want to get control of their content, but don’t know how to go about doing so.

So we blame the platform. Actually, we blame the IT guys and then they blame the platform. We start looking at all the moving parts: the search engine, the repository (or more likely repositories; they tend to proliferate like mushrooms) and the authoring tools. But the platform is only part of the problem, and often isn’t to blame at all. The real problem is primarily the content itself and the processes and practices surrounding its lifecycle. That is what ECM is really about. Not the technology.

At the root of this issue is the fact that most enterprises simply don’t know what unstructured content they have. Interestingly, they often do have a handle on structured content. For example, they usually know how many customer databases they have and which systems maintain them. However, most information managers would be hard pressed to provide definitive answers to basic questions about their unstructured resources. Where is a particular piece of content? Who owns it? What version is current? How long should we keep it? Answers to such rudimentary questions remain out of reach for most organizations. It is easy to say that there is just too much content to be managed, but this misses the point. If you don’t know what you have, you cannot say you have too much. It is entirely possible you have too little content, or too much of the wrong sort. The real issue is that most organizations have too much unmanaged content.

Unstructured content tends to grow in an uncontrolled, ungoverned manner. Users create, distribute and store information according to their own needs. When they cannot find information, they recreate it. This leads to the ongoing proliferation of redundant and often conflicting content. Organizations in general and IT departments in particular do not know how to arrest and reverse the situation. The most common response when leadership complains is simply to provide more storage, thus kicking the can down the road. They never directly address the content problem.
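A first pass at “knowing what you have” can be as simple as an inventory that answers two of the questions above. It shows which items have no owner or retention rule, and which items are redundant copies. The sketch below assumes a hypothetical inventory record and uses a SHA-256 hash of each item’s bytes to group duplicates. It is an illustration, not a feature of any particular ECM product.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContentItem:
    """One row of a hypothetical content inventory."""
    path: str
    body: bytes
    owner: Optional[str] = None
    retention_years: Optional[int] = None

def fingerprint(item):
    """Hash the content itself, so renamed or relocated copies still match."""
    return hashlib.sha256(item.body).hexdigest()

def find_duplicates(items):
    """Group paths whose content is byte-for-byte identical."""
    groups = defaultdict(list)
    for item in items:
        groups[fingerprint(item)].append(item.path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]

def find_unmanaged(items):
    """Items with no owner or no retention rule are, in effect, unmanaged."""
    return sorted(i.path for i in items if i.owner is None or i.retention_years is None)

inventory = [
    ContentItem("hr/policy_v3.docx", b"travel policy", owner="HR", retention_years=7),
    ContentItem("desktop/policy_copy.docx", b"travel policy"),
    ContentItem("sales/q2_deck.pptx", b"q2 forecast", owner="Sales"),
]
print(find_duplicates(inventory))
print(find_unmanaged(inventory))
```

Exact hashing catches only identical copies. Conflicting versions of “the same” document would need near-duplicate detection, which is beyond this sketch.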

I recently took a look at this problem of creating an effective ECM environment and boiled the process down into six steps. (These are elaborated in the Gartner Solution Path “Creating an Effective ECM Environment.”)

  1. Review Content Lifecycle and Define Requirements.
  2. Determine Appropriate Form of Content Management.
  3. Evaluate Current State of Your Content.
  4. Establish ECM Governance.
  5. Establish Content Management Environment.
  6. Perform Ongoing Content Hygiene and Enhancement.

 

Each of these steps is applicable to all ECM environments. The extent to which they are implemented will depend on resources and circumstances. The most important thing to remember is that you don’t create an effective ECM environment overnight and you don’t do it all at once. Too many companies start with a vague sense that things aren’t working and try to boil the ocean. I’ve seen a lot of rip-and-replace exercises triggered by a single, highly visible (and often unrelated) incident, ranging from a failed discovery request to a CEO with an iPad. Fire drills and knee-jerk reactions are never the foundation of a solid content strategy. You have to know in advance what you are trying to accomplish and what the desired end state should look like. Once you have that vision articulated, stick to your roadmap and you’ll get there one step at a time.

Category: Enterprise Content Management

Cloud Content Management Is Not A Cure-All

by Darin Stewart  |  May 2, 2012  |  1 Comment

I am not yet fully converted to the gospel of Cloud Content Management. The data center is undeniably in decline, but it’s not quite dead yet. Despite the evangelism of cloud-oriented vendors, moving things off-premises is not always a good idea. This is especially true for content management.  Don’t get me wrong. In many cases moving to the cloud is a slam dunk.  Web Content Management is a good example.

Serving up basic web content from an on-premise data center is more or less indefensible these days. Devoting scarce resources to the care and feeding of a WCM platform and the supporting infrastructure makes no sense when the content is intended for a broad public audience beyond the firewall. A SaaS WCM solution (as opposed to simply moving your platform to some outsourced hosting environment) can handle the traffic spikes, emergency content changes and platform upgrades that would otherwise consume your staff and frustrate your users. Remember, though, that “basic web content” is the operative phrase here. This rosy picture starts to break down a bit once your web offerings move beyond brochure-ware. Tight integration with on-premise backend systems can still present a challenge for cloud solutions. For example, driving web personalization off of a CRM system can create a powerful web experience, but integrating a legacy CRM platform entrenched in the enterprise infrastructure with a WCM personalization engine floating around the cloud can be a nightmare. Sometimes our old plumbing gets dragged along with us when we move outside our own walls.

Then there is the matter of security. Yes, technically this is a non-issue.  If you layer on the right security protocols, ensure the content is encrypted both in motion and at rest and manage access appropriately, keeping your content in the cloud isn’t a whole lot different from keeping it in your own data center down the street.  The big difference is that in your own data center, you’re the only one with the keys.  The Patriot Act empowers the U.S. government to compel any organization to turn over any and all data they may possess, including yours, without informing the data owner that they have done so.  For many non-U.S. companies, this makes the cloud a non-starter. If your content management provider also provides your encryption, your data is effectively wide open to Uncle Sam. Companies like CipherCloud have seized on this as a selling point for their cloud-based encryption services.  Your data may not be in your own data center, but at least you are the only one who can decrypt it.  The government can still demand your data, but at least you’ll know about it. This is more of a concern with documents than public web content, but it is a legitimate concern.

The list of concerns, exceptions and corner cases goes on and on. The point is that while the issues of security and privacy can be addressed in the cloud as well as they can on-premises, the cost and complexity of the solution can potentially outweigh the benefits and savings. What is emerging is a multi-tiered hybrid approach to content management that leverages both the cloud and the data center.  Content is being segregated into two tiers: critical and collaborative.  These broad categories provide a reasonable principle of division for the current state of cloud content management.

Critical content presents either high value or high risk to the enterprise, and in many cases both. It tends to be stable and finalized in its form. Depending on your industry, you may be legally required to declare certain content as critical and therefore subject to compliance requirements and records management. In other cases, content may be critical simply because of its role in the enterprise. Collaborative content tends to be of “lesser” value and represents a lower risk to the enterprise than critical content. It also tends to be much more volatile. Collaborative content is meant to be shared, either while it is being developed and reviewed or as the end result of that process. At some point in its lifecycle, collaborative content may become critical content. When this happens, there is a well-defined point, process and procedure for the transformation. Or at least there should be. That hand-off can also provide a very nice interface between the data center and the cloud.
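The two-tier split and its hand-off can be expressed as a simple classification rule plus a re-check whenever a document’s status changes. The sketch below is hypothetical in every detail. It uses three boolean attributes and defaults to critical content on-premises and collaborative content in the cloud, although the post notes that either placement can sometimes be the right one.

```python
from dataclasses import dataclass
from enum import Enum

class Tier(Enum):
    CRITICAL = "critical"
    COLLABORATIVE = "collaborative"

# Hypothetical default placement for each tier; real placement is case by case.
LOCATION = {Tier.CRITICAL: "data center", Tier.COLLABORATIVE: "cloud"}

@dataclass
class Document:
    title: str
    high_value: bool = False
    high_risk: bool = False
    regulated: bool = False  # e.g. legally subject to records management

def classify(doc):
    """Critical if high value, high risk or legally required; otherwise collaborative."""
    if doc.regulated or doc.high_value or doc.high_risk:
        return Tier.CRITICAL
    return Tier.COLLABORATIVE

def hand_off(doc, **changes):
    """Apply status changes; return (from, to) locations if the document must move."""
    before = classify(doc)
    for name, value in changes.items():
        if not hasattr(doc, name):
            raise AttributeError(f"unknown field: {name}")
        setattr(doc, name, value)
    after = classify(doc)
    return None if before is after else (LOCATION[before], LOCATION[after])

draft = Document("Supplier contract draft")
print(classify(draft).value, LOCATION[classify(draft)])  # collaborative cloud
print(hand_off(draft, regulated=True))                   # ('cloud', 'data center')
```

Routing every status change through one `hand_off` function mirrors the single, well-defined transition point described above.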

The paper-free office never materialized. The data-center-free infrastructure has yet to materialize. It may happen, but we are not there yet, especially in the context of content management. At this point, it is still necessary to look very closely at what you need to do with your content and assess the costs and risks associated with both cloud and on-premise solutions. Sometimes it makes sense to move critical content into the cloud. Sometimes collaborative content needs to stay behind the firewall. Neither approach effectively meets all needs under all circumstances. For now, it’s best to have a foot firmly planted in both worlds and to keep your options open.

Category: Collaboration, Enterprise Content Management, web content management

Musings on eBook Publishing

by Darin Stewart  |  February 16, 2012  |  3 Comments

I own a lot of books. Our family library (an enclosed bay of our garage) is lined floor to ceiling with shelves sagging under the weight of a few thousand cloth-bound volumes. I’m also an avid fan of electronic readers. At one time or another, I have owned just about every eReader ever produced. Last night, I purchased a new Barnes & Noble Nook Simple Touch. We’ve come a long way from Sony’s original DD-1 Electronic Book Player with its ASCII texts on mini-CDs, but not as far as I’d like. As I was loading eBooks onto my new device, I had to explain (read: justify…again) to my wife why I often buy both the electronic and physical versions of any given title. The short answer is, “I need both.”

I prefer the reading experience on an eReader. When lying in bed, a 1,000-page novel can be a bit cumbersome, and carrying a dozen books through airport security is not fun. My Nook weighs about as much as a hearty sandwich, holds hundreds of titles and fits in my pocket. This is great for casual reading and traveling, but eReaders still fall flat when it comes to research and reference. When I need to find a specific passage or some annotation I made while reading (I am a compulsive margin scribbler), I’ll take hardcopy every time. It just isn’t possible to “flip through” an eBook. So I buy the electronic copy to read and the physical copy to reference. (Yeah, my wife doesn’t buy the argument either.)

So why doesn’t the publishing industry offer me a package deal? Sell me the physical book and throw in an access code that lets me download the electronic version. I would be much happier paying the full retail price of the book, maybe even a bit of a premium, if I didn’t have to make a separate purchase to take it with me on a plane. This is already the model with many Blu-ray discs: buy the physical disc and get a code to download the movie to your iPad. Not only would this make me (and my wife) much happier and less concerned about discount prices, it could also bolster brick-and-mortar bookstores. I’d be much more likely to make the trip to a physical store to peruse titles if I knew that any book I buy would also be available on my Nook or iPad (I’ve given up on the Kindle ever being open). Don’t make me choose between books and bits.

While there, I’d probably buy a biscotti and cappuccino as well. Hmm…lots of cross-selling potential…

Category: Uncategorized

Sometimes it’s good to be a little evil

by Darin Stewart  |  February 2, 2012  |  2 Comments

The Wall Street Journal maintains a list of websites that collect information about their visitors and sell it to marketers. The associated “What They Know” infographic ranks the 50 most popular U.S. websites according to an “exposure index” determined by the degree to which each site exposes visitors to monitoring. The top site, dictionary.com, boasts 239 trackers for each visitor: 159 cookies, 23 Flash, 41 beacons and 11 first party. I mentioned this to my wife, who is a dictionary.com addict. Her only comment pretty much sums up the reaction of most people when they learn their online activities and interests are monitored: “That is so evil.” As husbands are expected to do, I adopted a solemn expression and nodded my head in agreement. Secretly, I was thinking about all the cool ways that information could be used to improve the online experience.

Overly aggressive and intrusive marketing is not my idea of an improved online experience. However, when I visit a news portal, it should know that I’m a science junkie and have never read a sports-related article in my life. When I visit a technology vendor’s website, it should remember that I’m an analyst, not a consumer. It should present me with technical and functional details rather than shill the vendor’s products. With a little user history and the judicious use of metadata, it’s really not that hard. Unfortunately, this just doesn’t seem to occur to most website publishers, and that treasure trove of tracking data is wasted.
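As a rough illustration of “a little user history and the judicious use of metadata,” a site could count the topic tags on articles a visitor has already read. It would then order new articles by how well their tags match that profile. The tag vocabulary and article titles below are hypothetical, and summing tag counts is only one simple scoring choice among many.

```python
from collections import Counter

def interest_profile(history):
    """Count metadata tags across the articles a visitor has read."""
    return Counter(tag for article in history for tag in article["tags"])

def rank(articles, profile):
    """Order articles by the summed affinity of their tags; ties keep input order."""
    # Counter returns 0 for tags the visitor has never read (e.g. "sports").
    return sorted(articles, key=lambda a: -sum(profile[t] for t in a["tags"]))

history = [
    {"title": "Exoplanet found", "tags": ["science", "astronomy"]},
    {"title": "CRISPR trial", "tags": ["science", "medicine"]},
]
candidates = [
    {"title": "Playoff recap", "tags": ["sports"]},
    {"title": "Fusion milestone", "tags": ["science", "energy"]},
    {"title": "Telescope upgrade", "tags": ["astronomy", "science"]},
]
profile = interest_profile(history)
print([a["title"] for a in rank(candidates, profile)])
```

Nothing here depends on third-party trackers. The site’s own first-party reading history and consistent article metadata are enough.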

The missed opportunity is even more tragic when mobile devices enter the picture (and at this point, mobile devices ARE the picture). A smartphone or a tablet bends over backward to tell a website where it is, what it can do and what type of content it wants. You can and should do more with that information than simply serve up a stripped-down version of your homepage. If I visit a public transit website from my iPhone, chances are I’m not looking for annual pass options or a history of the Portland bus system. I want to know where the nearest stop for the 96 express is located and when the next bus arrives (and I don’t want to install a dedicated app to do so!). When I visit that same website from home, it should know that I always seem to ride the 96 and that I usually just miss it. That little bit of tacit information, gleaned from my history and mobile habits, can facilitate a tailored online experience that goes beyond micro-segmentation to make true personalization practical.
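The transit scenario combines two signals: device context (a phone that reports its location) and history (the route this rider takes most). Here is a minimal sketch of that decision. The stop list and coordinates are hypothetical, and device detection and location permission are assumed to have already happened upstream. Distance uses the standard haversine great-circle formula.

```python
import math
from collections import Counter

# Hypothetical stops: (name, route, latitude, longitude).
STOPS = [
    ("Main & 5th", "96", 45.5200, -122.6800),
    ("Oak & 12th", "96", 45.5300, -122.6900),
    ("Elm & 3rd", "12", 45.5205, -122.6805),
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def favorite_route(trip_history):
    """The route this rider takes most often, inferred from past trips."""
    return Counter(trip_history).most_common(1)[0][0] if trip_history else None

def choose_view(device, trip_history, location=None):
    """Pick a page layout from device context plus rider history."""
    route = favorite_route(trip_history)
    if device == "mobile" and location and route:
        stops = [s for s in STOPS if s[1] == route]
        if stops:
            # On the go: only the nearest stop on the usual route.
            nearest = min(stops, key=lambda s: haversine_km(*location, s[2], s[3]))
            return {"view": "next-stop", "route": route, "stop": nearest[0]}
    # At a desk (or no location): the fuller page, with the usual route featured.
    return {"view": "full", "featured_route": route}

history = ["96", "96", "12"]
print(choose_view("mobile", history, (45.5201, -122.6801)))
print(choose_view("desktop", history))
```

The Elm & 3rd stop is closest to the rider but is skipped because it serves a different route. History narrows the choice before location does.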

When I access an online resource from a mobile device, I want quick, targeted information relevant to my immediate situation. When I access that same resource from my desktop, I want more details, more options and more aesthetics. Most importantly, I want the two experiences linked together into one ongoing, conversational relationship. If I have to reintroduce myself every time we meet, chances are we are not going to become friends. A comprehensive cross-channel strategy can leverage user history and contextual information to provide a cohesive experience across devices and across sessions. If this is the goal of your tracking cookies and beacons, it’s okay to be a little evil.

Category: web content management

Searching in the Echo Chamber

by Darin Stewart  |  January 4, 2012  |  1 Comment

At a time when technology should be broadening our information and knowledge, it may in fact be narrowing our minds. Attention management is becoming an instinctive self-preservation behavior. We don’t have the time and energy (or the interest) to read and evaluate everything presented about every issue. So we filter. Whether we admit it or not, we tend to filter out what does not reinforce our worldview. This dynamic is nothing new, but in the past this selective openness was largely self-inflicted. Now we are beginning to bake bias into our technology.

This dynamic played out recently when PolitiFact.com announced its annual “Lie of the Year” selection. The fact-checking website chose the Democratic Congressional Campaign Committee’s claims about Medicare as the most egregious distortion of 2011. This news isn’t particularly interesting in and of itself. As in war, the first victim of an election year is always the truth. Politicians lie. Always have. Probably always will. Thanks to services like PolitiFact.com and FactCheck.org, it’s now a bit easier to call them on it. We just don’t like it when a foul is called on our team.

The major fact-checking sites are equal-opportunity whistle-blowers. The Republicans held the Lie of the Year title for the previous two years. During those two years, conservative pundits vilified the site while progressives praised it. Now that the Democrats wear the “pants-on-fire” distinction, PolitiFact’s former champions are on the attack. Gawker’s Jim Newell went so far as to call it “dangerous,” and the New York Times proclaimed it “dead.” A few days later, PolitiFact editor Bill Adair responded to the uproar. In a brief article, he laments that most discourse now takes place in an echo chamber:

At a Republican campaign rally a few years ago, I asked one of the attendees how he got his news. “I listen to Rush and read NewsMax,” he said. “And to make sure I’m getting a balanced view, I watch Fox.”

My liberal friends get their information from distinctly different sources – Huffington Post, Daily Kos and Rachel Maddow.  To make sure they get a balanced view, they click Facebook links – from like minded liberal friends.

This is life in our echo chamber nation.  We protect ourselves from opinions we don’t like and seek reinforcement from like-minded allies.

We tend to reinforce our views and values by surrounding ourselves with people who see things our way.  As Bill Bishop notes in his book The Big Sort, “we have segregated ourselves into enclaves of people who look like us, talk like us and act like us.”  I would add that too often this translates to people who think like us and believe like us.  This isn’t just reflected in our neighborhoods and social clubs, but increasingly in our online activities. 

Personalization has been a feature of Google for years. If the search engine knows who you are (and it usually does), it is going to tailor the SERP (Search Engine Results Page) to your history, behaviors and preferences. Facebook amplifies the filtering effect of your self-selected social network by presenting only “important” updates and allowing you to “hide all stories by…”. This, of course, lets you publish your views without having to see any response or rebuttal. Eli Pariser has written an excellent introduction to the subject in his book The Filter Bubble: What the Internet Is Hiding from You.

Personalization is now moving to the next level in the form of social search. Search engines consider many factors when matching search terms to content and ranking the results by relevance. Social search adds a new ingredient to this secret sauce. In addition to more or less objective relevance criteria (yes, I’m ignoring for now how sponsored searches cook the books), social search takes your social graph into account. If your Facebook BFF likes a story, it’s likely that you will like it too, and so it gets boosted in the SERP. When searching for music, restaurants and movies, this can be useful. When searching for news or competitive intelligence, it can be myopic. If our friends and colleagues look, act and think like us, our social search engine results are unlikely to be “fair and balanced.”
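The boosting step can be illustrated with a deliberately simple re-ranker. Search engines do not publish how they weight social signals, so the additive bonus per friend and the default `boost_per_friend` value below are assumptions chosen for clarity; `Result` and `social_rerank` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Result:
    url: str
    relevance: float      # score from the ordinary query-matching ranker
    liked_by: Set[str]    # users who liked or shared this result


def social_rerank(results: List[Result], friends: Set[str],
                  boost_per_friend: float = 0.2) -> List[Result]:
    """Order results by relevance plus a bonus for each friend who liked them."""
    def score(r: Result) -> float:
        # Only likes from people in the searcher's social graph count.
        return r.relevance + boost_per_friend * len(r.liked_by & friends)
    return sorted(results, key=score, reverse=True)
```

With Alice in my friends list, her like lifts a result with relevance 0.8 above one with relevance 0.9; with an empty friends list the ordinary relevance order returns. Two people typing the same query can therefore see different SERPs, which is exactly where the echo-chamber risk comes from.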

Microsoft’s FUSE Labs is developing what is potentially the most ambitious manifestation of social search to date. The new site, called So.cl, has noble aspirations. In a Technology Review interview, Lili Cheng, the Microsoft researcher who led development of So.cl, says "So.cl is really an experimental research project focused on how social networking and search can be used for the purpose of learning." So.cl made a brief public debut last July, but access was quickly limited to the University of Washington, Syracuse University and New York University. This was likely due to Microsoft’s insistence that So.cl is about learning how people learn. Cheng went on to say "The project isn’t specifically for formal learning, but learning as a general activity on any topic." So keeping it on campus might make sense. Then again, isn’t that how Facebook got started?

Like any technology, search personalization and social search can be used for good or evil. They can be a bug or a feature. The trick is to be aware of what your search engine of choice is doing behind the curtain and compensate for bias as necessary. I may be a bit hypersensitive to this issue as we head into an election year, but we would do well to heed the advice of John Stuart Mill at all times:

"It is hardly possible to overrate the value…of placing human beings in contact with persons dissimilar to themselves, and with modes of thought and action unlike those with which they are familiar…Such communication has always been, and is peculiarly in the present age, one of the primary sources of progress."


Category: search, Social Computing

It’s time for taxonomy

by Darin Stewart  |  October 12, 2011  |  4 Comments

I’ve been writing and speaking about taxonomies and metadata for a little over a decade. In the early days, my audiences consisted mostly of library science refugees seeking shelter in corporate IT departments. I considered myself lucky if there were a dozen people in the room. Last week I attended the annual Microsoft SharePoint Conference in Anaheim, California, and realized things have changed a bit. The session on taxonomies was held in a room with a capacity of 900 people. It was standing room only. People are finally starting to “get it”. Five years ago, taxonomy was all about “findability”. Consistent terminology and tagging make search engines work better and navigation easier to… well, navigate. This is as true as ever, but today taxonomy and metadata are more about content lifecycle management, running the gamut from content creation to disposal. They are finding their way into every corner of the enterprise.

With popularization comes the increased likelihood of dilution. As people, vendors in particular, jump on the buzzword bandwagon and co-opt terminology for their own nefarious purposes, concepts get muddled and best practices are lost. At the conference, I heard the phrase “unstructured taxonomy” being thrown around. This is an oxymoron at best and utter nonsense at worst. A taxonomy, by definition, is a structured vocabulary. The hierarchy is the whole point. There are other forms of vocabulary that are unstructured, but they are not taxonomies. The offending vendor in this case was attempting a neologism for “folksonomy” and in the process confusing his audience and annoying the analysts. (Maybe it was just me.)
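The practical difference between a taxonomy and a flat tag list shows up as soon as you search. In the minimal sketch below, the terms, documents and function names are invented for illustration. The hierarchy lets a query for a broad term also match documents tagged with any narrower term, something an unstructured list of tags cannot do on its own.

```python
from typing import Dict, List, Set

# Hypothetical taxonomy: each term maps to its narrower (child) terms.
TAXONOMY: Dict[str, List[str]] = {
    "vehicles": ["cars", "trucks"],
    "cars": ["sedans", "coupes"],
    "trucks": [],
    "sedans": [],
    "coupes": [],
}


def narrower_closure(term: str, taxonomy: Dict[str, List[str]]) -> Set[str]:
    """Return the term plus every term beneath it in the hierarchy."""
    found = {term}
    stack = [term]
    while stack:
        for child in taxonomy.get(stack.pop(), []):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def expanded_search(query: str, docs: Dict[str, Set[str]]) -> Set[str]:
    """Match documents tagged with the query term or anything narrower."""
    terms = narrower_closure(query, TAXONOMY)
    return {doc for doc, tags in docs.items() if tags & terms}
```

Given documents tagged "sedans", "trucks" and "boats", a search for "vehicles" returns the first two. A folksonomy-style exact tag match on "vehicles" returns nothing, because no document carries that literal tag.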

As people start to get religion with metadata, other heresies are sneaking in as well.  The most common I’ve encountered recently is managers placing artificial and arbitrary constraints on vocabularies.  I’ve heard teams say things like “we are not allowed to have more than 200 terms in the vocabulary” or “a document can’t have more than two tags”.  When pressed for the motivation behind such strictures the answer usually amounts to “we want to keep it simple.”  A noble goal, but too often one issued as a fiat rather than as the result of analysis.  Simple means the least amount of work necessary, but no less.  It should be driven by functional requirements (and possibly platform limitations), not by artificial mandates from on high. Decision makers are starting to understand the benefits and potential of managed vocabularies and metadata, but don’t yet understand the practice of managed vocabularies and metadata. 

Start with the standards. Z39.19 and ISO 2788 are as close to scripture as it gets in the taxonomy world, though ISO 25964 “Thesauri and Interoperability with other Vocabularies” should soon be canonized as well. Invest in training. The practice is mature enough and the community large enough that you no longer need to go it alone. Don’t reinvent the wheel. Vocabularies and metadata frameworks are available for most common domains. Licensing and modifying an existing vocabulary is usually more effective than building one from scratch. And of course, call Gartner for guidance.
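One thing these standards make explicit is that relationships between terms must mirror each other: a broader term (BT) implies a narrower term (NT) in the other direction, a USE reference implies a used-for (UF) entry, and related terms (RT) point both ways. The sketch below checks a vocabulary for missing reciprocals. It covers only these basic relationship indicators, not everything Z39.19 defines, and the function name and sample terms are hypothetical.

```python
from typing import List, Set, Tuple

Relation = Tuple[str, str, str]  # (term, relationship indicator, term)

# Each basic thesaurus relationship and the one it implies in reverse.
RECIPROCAL = {"BT": "NT", "NT": "BT", "USE": "UF", "UF": "USE", "RT": "RT"}


def missing_reciprocals(relations: List[Relation]) -> Set[Relation]:
    """Return the reciprocal statements the vocabulary does not yet contain."""
    present = set(relations)
    missing = set()
    for a, rel, b in present:
        expected = (b, RECIPROCAL[rel], a)
        if expected not in present:
            missing.add(expected)
    return missing
```

For a vocabulary stating "Sedans BT Cars", "Cars NT Sedans", "Autos USE Cars" and "Cars RT Drivers", the check reports that "Cars UF Autos" and "Drivers RT Cars" are missing. Most vocabulary management tools enforce this automatically; the point is that the structure carries rules a flat tag list does not.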

It was gratifying to see so many people packed into a session on taxonomy at the SharePoint conference.  The practice has come a long way, but some things never seem to change.  In the early days, metadata champions were a small group of oddballs that couldn’t get funding for their projects.  Now, it is managers, architects and business analysts who still can’t get funding for their projects.  The practitioners have seen the light.  Now we need to convince the people who sign the cheques.


Category: Collaboration, Enterprise Content Management, metadata, taxonomy

Schema.org: Webmaster One-Stop or Linked Data Land Grab?

by Darin Stewart  |  June 4, 2011  |  6 Comments

Yesterday, Google, Microsoft and Yahoo! jointly announced schema.org, a new service intended to “create and support a common vocabulary for structured data markup on web pages.” The idea is to provide a library of vocabularies that can be used in conjunction with the W3C HTML Microdata format to embed machine-readable data into webpages in a manner that can be fully exploited across search engines. This is being pitched as a breakthrough among the big search engines, namely Google, Bing and Yahoo! A shared vocabulary should make life simpler for everyone. Developers now only have to deal with one flavor of markup and should have the foundation for richer search functionality in the future. Search engines know what to expect and how to leverage it. Users get a slicker and more meaningful result set when they search. My initial reaction: don’t drink the Kool-Aid.

My concern is the announcement’s claim that “the site aims to be a one stop resource for webmasters looking to add markup to their pages.” This may simplify some coding, but it also locks you into a system that is not under the direction or control of the web community as a whole. Rather, the vocabularies are driven and controlled by the objectives of a small group of corporations. This has rarely proven to be a good thing for the web. It is also unnecessary.

All of the capabilities promised by schema.org are already fully supported in a richer, more scalable manner in the form of RDFa and the Linked Data approach to the Open Web. As I discussed in an earlier post (When does “Semantic” really mean Semantic?), Linked Data leverages four fundamental principles to provide consistent, machine-readable access to structured (and semi-structured) content on the web. Schema.org appears to be Linked Data Lite, with extremely limited support for vocabularies outside of the service. It may be more comfortable for webmasters, as the microdata approach keeps things squarely in the HTML world (or at least the HTML5 world). However, that familiarity may come at the cost of flexibility and functionality. At first brush, schema.org seems like little more than semantic search engine optimization. I may be proven wrong. It’s happened before.

Google makes an attempt at being “Big Tent” with the RDFa and microformat camps. They indicate that RDFa will still work if your markup is “currently supported by rich snippets.” The implication seems to be that they’ll let you use what’s in place, but if you want to extend it you’re on your own. There is a subtle air of intimidation throughout the schema.org announcements and documentation. While not stated overtly, the implication is that if you adopt the microdata approach you will be well treated by their search algorithms. Those who stick to RDFa and microformats are likely to get lost in the crowd or even pushed to the bottom. Again, I could just be paranoid, but this is Microsoft and Google we’re talking about. Whatever happened to “don’t be evil”?

Then there is the concern about competing formats. Does the schema.org Person definition (http://schema.org/Person) compete with the Person class in the RDF Friend of a Friend vocabulary (http://xmlns.com/foaf/0.1/)? Support of multiple vocabularies is baked into Linked Data, and alternate definitions are nicely accommodated. Schema.org appears to be of a more jealous sort, demanding exclusivity with your markup. Again, there appear to be some limited provisions for extending their vocabulary and continuing with your current markup, but they seem to be more for pacification purposes than true integration and interoperability.
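In Linked Data, reconciling two Person classes can be a single statement rather than a migration. The sketch below represents a graph as plain Python tuples instead of using an RDF library, and the `example.org` resources are hypothetical; the rdf:type, owl:equivalentClass, FOAF and schema.org URIs are the real vocabulary identifiers. Once one triple declares the two classes equivalent, a query for FOAF people also returns resources typed with the schema.org class.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_EQUIVALENT_CLASS = "http://www.w3.org/2002/07/owl#equivalentClass"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"
SCHEMA_PERSON = "http://schema.org/Person"


def equivalent_classes(cls: str, graph: Set[Triple]) -> Set[str]:
    """cls plus every class linked to it by owl:equivalentClass, either direction."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for s, p, o in graph:
            # Pull in the other end of any equivalence touching a known class.
            if p == OWL_EQUIVALENT_CLASS and (s in classes) != (o in classes):
                classes.update((s, o))
                changed = True
    return classes


def instances_of(cls: str, graph: Set[Triple]) -> Set[str]:
    """Resources whose rdf:type is cls or any class equivalent to it."""
    classes = equivalent_classes(cls, graph)
    return {s for s, p, o in graph if p == RDF_TYPE and o in classes}


ALICE = "http://example.org/people/alice"  # hypothetical, typed with FOAF
BOB = "http://example.org/people/bob"      # hypothetical, typed with schema.org
graph = {
    (ALICE, RDF_TYPE, FOAF_PERSON),
    (BOB, RDF_TYPE, SCHEMA_PERSON),
}
```

Without the equivalence triple, a FOAF query finds only Alice; adding `(SCHEMA_PERSON, OWL_EQUIVALENT_CLASS, FOAF_PERSON)` makes both vocabularies answer for both people. The loop follows links in both directions because owl:equivalentClass is symmetric and transitive.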

This is all a first (admittedly knee-jerk) response to what could potentially be a boon to webmasters and search users. Microsoft, Google and Yahoo! could have only the best interests of the web in mind, and all could turn out just swell. Things seem to have worked out okay with sitemaps, after all. Here’s hoping. It will be interesting to see how the semantic web and search communities react to this development. Next week, I will be attending the Semantic Technologies 2011 conference in San Francisco, where I’m sure schema.org will be one of the hot topics of conversation. Tune in next week for reports from the front lines at SemTech2011.


Category: Semantic Web