Another post from the DBMS Curmudgeon
For many years, I have waged a battle over the use of Structured vs. Unstructured in data management. I have tried every logical argument and many other terms to describe Unstructured Data, as have many of my colleagues at Gartner and throughout the industry. I have even used the phrase “The ‘U’ Word” for Unstructured to imply it is similar to the Seven Dirty Words, a routine from one of my favorite comedians, George Carlin. No matter how often some of us have tried to change it, the word Unstructured continues to be used widely to describe all the data that cannot be simply described as Relational Data. For some, it is XML or text data. For others, it covers the spectrum from XML to Voice and Video, including e-mail and SMS (sometimes referred to as noise data). In simple terms, it is all the “other stuff” we would store in files or a database.
According to Wikipedia, “Unstructured Data (or unstructured information) refers to (usually) computerized information that either does not have a data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated (semantically tagged) in documents.” My problem with this is that XML does have a data model (see XML Schema). In addition, a JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format) or other image file is easily usable in a computer program – for example, in Adobe Photoshop. Wikipedia even says, “The term is imprecise for several reasons…” This has always been the underlying basis for my argument against using it: the term is imprecise, with no formal definition of the type of data to which it refers.
So why do we use Unstructured to describe all of the data that does not fit nicely into a data model? Because it has become generally accepted throughout the industry. When one uses the words Unstructured Data, everyone understands that we are describing data that is not a column of numbers, characters or dates. In reality, data falls along a continuum from Structured to Unstructured: from relational numbers, dates and characters, through XML, to unstructured data such as voice, video and e-mail. Some data is simply more structured than other data.
Therefore, I give up. Some battles are simply not worth having. I am finished fighting this one. The age-old proverb “If you can’t beat ’em, join ’em” wins. I will now use Unstructured to describe all the “other stuff” that is not Structured. Of course, now we call this Big Data – Oops, let’s not go there (at least today).