
OK, I Give Up! Unstructured Wins!

By Donald Feinberg | October 04, 2014 | 11 Comments

General, DBMS, Data Management


Another post from the DBMS Curmudgeon

For the past (many) years, I have waged the battle over the use of Structured vs. Unstructured in data management. I have tried every logical argument and many other terms to describe Unstructured Data, as have many of my colleagues at Gartner and throughout the industry. I have even used the phrase “The ‘U’ Word” for Unstructured to imply it is similar to the Seven Dirty Words, a routine from one of my favorite comedians, George Carlin. Regardless of how often some of us have tried, the word Unstructured continues to be used widely to describe all the data that cannot simply be described as Relational Data. For some it is XML or text data. For others, it covers the spectrum from XML to Voice and Video, including e-mail and SMS (sometimes referred to as noise data). In simple terms, it is all the “other stuff” we would store in files or a database.

According to Wikipedia, “Unstructured Data (or unstructured information) refers to (usually) computerized information that either does not have a data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated (semantically tagged) in documents.” My problem is that XML does have a data model; see XML Schema. In addition, a JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format) or other image file is easily usable in a computer program, for example in Adobe Photoshop. Wikipedia even says, “The term is imprecise for several reasons…” This has always been the underlying basis for my argument against using it: it is imprecise, with no formal definition of the type of data to which it refers.

So why do we use Unstructured to describe all of the data that does not fit nicely into a data model? Because it has become generally accepted throughout the industry. When one uses the words Unstructured Data, everyone understands that we are describing data that is not a column of numbers, characters or dates. In reality, data falls along a continuum from Structured to Unstructured: from relational numbers, dates and characters, through XML, to unstructured data such as voice, video and e-mail. Some data is simply more structured than other data.

Therefore, I give up. Some battles are simply not worth having, and I am finished fighting this one. The age-old proverb “If you can’t beat ’em, join ’em” wins. I will now use Unstructured to describe all the “other stuff” that is not Structured. Of course, now we call this Big Data – oops, let’s not go there (at least not today).

The Gartner Blog Network provides an opportunity for Gartner analysts to test ideas and move research forward. Because the content posted by Gartner analysts on this site does not undergo our standard editorial review, all comments or opinions expressed hereunder are those of the individual contributors and do not represent the views of Gartner, Inc. or its management.

Comments


  • Vanessa says:

    Such a catchy title that I had to read the whole story. You, giving up? 🙂 On the other hand, you are always right after all…

  • Mike Olson says:

    That’s the problem with marketing — it doesn’t matter if you’re right, it only matters if you can get people to say what you say.

    I agree wholeheartedly on “unstructured.” Data without any structure is just noise, but nobody cares, so it’s not worth the fight.

  • Nancy says:

    Hahaha! You giving up? That’s scary!

  • Unstructured is horribly imprecise but still so much easier to use than the alternatives. It just shows that simplicity trumps everything, even accuracy.

  • Wonder when Gartner will now begin referring to NoSQL databases as Unstructured databases. -:)

  • Bruce Robertson says:

    Funny, we have the same word problem over in the process world at Gartner and in the industry. We’ve tried non-routine, but that doesn’t cover it all either. Oh well. I guess we will just have to explain a bit more or go with the flow.

    PS: that was a process joke.

  • Merv Adrian says:

    Three words I never thought I’d hear from your mouth: “I give up.” And you call yourself a curmudgeon.
    (by the way, I agree)

  • Donald Feinberg says:

    LOL – NOT!

  • Valsoir Tronchin says:

    “Human suffering has been caused because too many of us cannot grasp that words are only tools for our use, and that the mere presence in the dictionary of a word … does not mean it necessarily has to refer to something definite in the real world.” (Richard Dawkins, The Selfish Gene, 30th Anniversary Edition, page 18)

    A big hug, my friend! 😉

  • I suppose labeling complex structured information as unstructured is easier for some people to grasp. IMHO, it shifts the onus of understanding, or at least of trying to understand. I like the idea of simplification, but not to the point where simplification means a significant loss of meaning.

  • Alain says:

    From a computer’s point of view, the only useful thing in unstructured data is its own metadata.