Gartner Blog Network


Enterprise Search and Distributed Computing Are the New NoSQL

by Darin Stewart  |  April 7, 2014

In my last post I discussed how enterprise search can bring big data within reach. To achieve this, however, crawling and indexing must move beyond traditional vertical scaling to a truly distributed model à la Hadoop and its cousins. The end product of integrating enterprise search and distributed computing is a scalable, flexible, responsive environment for information discovery and analytics. The system scales easily and efficiently in both content volume and query-handling capacity, and it can support both batch-oriented content processing and near-real-time information access. The key-value orientation of MapReduce, together with the search engine's flexible schema and dynamic field support, allows any form of content (structured, unstructured or semi-structured) to be fully leveraged. In short, the enterprise search infrastructure becomes a powerful NoSQL database.
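To illustrate how the key-value model of MapReduce lines up with search indexing, here is a toy, single-process Python sketch. It is only an assumption-laden simulation, not Hadoop's API or any search engine's: documents are partitioned across a hypothetical number of shards by a stable hash of their id, each shard runs map (emit `(term, doc_id)` pairs), shuffle (group by term) and reduce (build a sorted postings list), and a query is scattered to every shard and the results gathered.

```python
"""Toy, single-process simulation of MapReduce-style distributed indexing.

Hypothetical sketch: a real deployment would run Hadoop (or similar) jobs that
write shard indexes for a search engine. Nothing here is a Hadoop or search API.
"""
import re
import zlib
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Set, Tuple

Doc = Tuple[str, str]  # (doc_id, text)


def map_phase(doc: Doc) -> Iterator[Tuple[str, str]]:
    """Emit one (term, doc_id) key-value pair per distinct term in the document."""
    doc_id, text = doc
    for term in set(re.findall(r"\w+", text.lower())):
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group values by key, as the MapReduce framework does between phases."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term: str, doc_ids: List[str]) -> Tuple[str, List[str]]:
    """Collapse a term's document ids into a sorted, de-duplicated postings list."""
    return term, sorted(set(doc_ids))


def shard_for(doc_id: str, num_shards: int) -> int:
    """Route a document to a shard with a stable hash (crc32, not Python's salted hash())."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def build_sharded_index(docs: List[Doc], num_shards: int) -> List[Dict[str, List[str]]]:
    """Partition documents by id, then run map -> shuffle -> reduce independently per shard."""
    partitions: List[List[Doc]] = [[] for _ in range(num_shards)]
    for doc in docs:
        partitions[shard_for(doc[0], num_shards)].append(doc)
    shards = []
    for partition in partitions:
        pairs = (pair for doc in partition for pair in map_phase(doc))
        shards.append(dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items()))
    return shards


def search(shards: List[Dict[str, List[str]]], term: str) -> List[str]:
    """Scatter the query to every shard and gather (merge) the postings."""
    hits: Set[str] = set()
    for shard in shards:
        hits.update(shard.get(term.lower(), []))
    return sorted(hits)


if __name__ == "__main__":
    corpus = [
        ("d1", "Hadoop scales horizontally"),
        ("d2", "Search engines index text"),
        ("d3", "Hadoop feeds the search index"),
    ]
    index = build_sharded_index(corpus, num_shards=2)
    print(search(index, "hadoop"))  # ['d1', 'd3']
    print(search(index, "index"))   # ['d2', 'd3']
```

Because each shard is indexed independently and queries are merged afterwards, adding shards (or machines) scales both content volume and query capacity, which is the horizontal-scaling property the post describes.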

There is currently no canonical definition of what constitutes a NoSQL database. According to Wikipedia, “NoSQL (Not only SQL) is a movement promoting a loosely defined class of non-relational data stores that break with a long history of relational databases. These data stores may not require fixed table schemas, usually avoid join operations and typically scale horizontally.” Even so, NoSQL databases share certain characteristics. First and foremost, NoSQL supports schema-on-read, which means that any form of information can be written to the database, with a schema applied only when that information is retrieved for use. This reverses the schema-on-write approach of traditional relational databases, in which information must be made to conform to a particular schema before it can be stored. In addition, NoSQL databases generally favor eventual consistency over ACID compliance and strict consistency. A search-driven approach to Big Data can support each of the four NoSQL information management models described in the Gartner research document cited below: key-value stores, document databases, table-style databases and graph databases.
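The schema-on-write versus schema-on-read distinction can be made concrete with a minimal Python sketch. The class names `SchemaOnWriteTable` and `SchemaOnReadStore` are hypothetical illustrations, not any product's API. The sketch assumes records are kept as raw JSON strings and that a read-time "schema" is simply a mapping from field name to a conversion function.

```python
"""Hypothetical sketch contrasting schema-on-write with schema-on-read."""
import json
from typing import Any, Callable, Dict, List


class SchemaOnWriteTable:
    """Relational-style: every row must match the declared columns before it is stored."""

    def __init__(self, columns: Dict[str, type]):
        self.columns = columns
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> None:
        # Reject rows whose field set differs from the declared columns.
        if set(row) != set(self.columns):
            raise ValueError(f"row fields {sorted(row)} do not match {sorted(self.columns)}")
        # Reject values of the wrong type.
        for name, typ in self.columns.items():
            if not isinstance(row[name], typ):
                raise TypeError(f"{name} must be {typ.__name__}")
        self.rows.append(row)


class SchemaOnReadStore:
    """NoSQL-style: anything is written as-is; a schema is applied only when reading."""

    def __init__(self) -> None:
        self.raw: List[str] = []

    def write(self, record: Dict[str, Any]) -> None:
        self.raw.append(json.dumps(record))  # no validation at write time

    def read(self, schema: Dict[str, Callable[[Any], Any]]) -> List[Dict[str, Any]]:
        """Project each record onto the requested fields; missing fields become None."""
        out = []
        for line in self.raw:
            record = json.loads(line)
            out.append({f: (conv(record[f]) if f in record else None) for f, conv in schema.items()})
        return out


if __name__ == "__main__":
    store = SchemaOnReadStore()
    store.write({"id": "d1", "title": "Q1 report", "pages": "12"})
    store.write({"id": "d2", "author": "Lee"})  # a different shape is accepted
    print(store.read({"id": str, "pages": int}))
    # [{'id': 'd1', 'pages': 12}, {'id': 'd2', 'pages': None}]

    table = SchemaOnWriteTable({"id": str, "pages": int})
    table.insert({"id": "d1", "pages": 12})
    try:
        table.insert({"id": "d2", "author": "Lee"})
    except ValueError as err:
        print("rejected:", err)
```

The same raw records can be read later through a different schema (for example `{"author": str}`) without rewriting them, which is the practical payoff of deferring the schema to read time.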

The search-driven approach I describe in the Gartner document “The New NoSQL: How Enterprise Search and Distributed Computing Bring Big Data Within Reach” offers several additional advantages that are likely not standard in pure-play NoSQL solutions. An enterprise search platform offers text-oriented features that simplify index generation. These standard features include free-text search, faceting, spell checking, vocabulary management, similar-item search, hit highlighting, recommendation engines, visualizations, content rating and many other capabilities that augment and enhance an analytical index. Combining NoSQL capabilities with the near-real-time information access afforded by enterprise search, and doing so in the cloud, has the potential to unlock the full value of enterprise information assets and finally bring Big Data within reach.
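As a rough illustration of three of the built-in features listed above (faceting, hit highlighting and spell checking), here is a Python sketch over an in-memory list of documents. It is a hypothetical toy, not the API of Solr, Elasticsearch or any other product. The function names are invented for illustration, and the "did you mean?" suggestion is swapped in from the standard library's `difflib.get_close_matches` rather than a search engine's spell checker.

```python
"""Toy versions of faceting, hit highlighting and spell suggestions.

Hypothetical sketch over an in-memory list of dicts; not any product's API.
"""
import difflib
import re
from collections import Counter
from typing import Dict, List, Tuple

Doc = Dict[str, str]


def tokens(text: str) -> List[str]:
    """Lowercased word tokens, used for both matching and suggestions."""
    return re.findall(r"\w+", text.lower())


def search_with_facets(docs: List[Doc], term: str, facet_field: str) -> Tuple[List[Doc], Dict[str, int]]:
    """Return documents whose body contains the term, plus hit counts per facet value."""
    hits = [d for d in docs if term.lower() in tokens(d["body"])]
    facets = Counter(d[facet_field] for d in hits if facet_field in d)
    return hits, dict(facets)


def highlight(text: str, term: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap whole-word, case-insensitive occurrences of the term in highlight markers."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)


def suggest(docs: List[Doc], term: str) -> List[str]:
    """'Did you mean?': the closest indexed token by difflib's similarity ratio."""
    vocabulary = sorted({t for d in docs for t in tokens(d["body"])})
    return difflib.get_close_matches(term.lower(), vocabulary, n=1)


if __name__ == "__main__":
    docs = [
        {"id": "d1", "type": "contract", "body": "Hadoop cluster contract renewal"},
        {"id": "d2", "type": "report", "body": "Quarterly report on the Hadoop rollout"},
        {"id": "d3", "type": "report", "body": "Search relevance report"},
    ]
    hits, facets = search_with_facets(docs, "hadoop", "type")
    print([d["id"] for d in hits], facets)  # ['d1', 'd2'] {'contract': 1, 'report': 1}
    print(highlight(docs[1]["body"], "hadoop"))  # Quarterly report on the <em>Hadoop</em> rollout
    print(suggest(docs, "hadop"))  # ['hadoop']
```

In a real search platform these features come from the engine's index structures rather than linear scans; the sketch only shows what each feature returns to the user.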

 

Category: big-content  

Tags: big-content  big-data  enterprise-search  nosql  

Darin Stewart
Research Director
1 year with Gartner
15 years IT industry

Darin Stewart is a research director for Gartner in the Collaboration and Content Strategies service. He covers a broad range of technologies that together comprise enterprise content management.


Thoughts on Enterprise Search and Distributed Computing Are the New NoSQL


  1. Nice to see this.
    I just gave a talk at the BazaarVoice IO conference/hackathon in Austin a week ago. The title of the talk was Open Source Search Evolution. One of the things I pointed out about tools like Solr and Elasticsearch is that they are regularly used as key-value stores and NoSQL DBs and that, if you look at them, they really do provide what a lot of pure NoSQL solutions provide (sharding, replication, horizontal scaling, etc.), with the added benefit of (full-text) search, faceting, and all the usual search features being built in. That avoids having to bolt “Secondary Indexes” onto non-search NoSQL DBs, which we regularly see in the industry (e.g. Cassandra+Solr from DataStax, Riak+Solr from Basho, Neo4j+Lucene, etc.).




Comments or opinions expressed on this blog are those of the individual contributors only, and do not necessarily represent the views of Gartner, Inc. or its management. Readers may copy and redistribute blog postings on other blogs, or otherwise for private, non-commercial or journalistic purposes, with attribution to Gartner. This content may not be used for any other purposes in any other formats or media. The content on this blog is provided on an "as-is" basis. Gartner shall not be liable for any damages whatsoever arising out of the content or use of this blog.