by Donald Feinberg | December 22, 2014 | 2 Comments
Another post from the DBMS Curmudgeon
“What’s in a name? That which we call a rose by any other name would smell as sweet,” from Shakespeare’s Romeo and Juliet, 1594. Not true for a Database Management System (DBMS).
The IT industry uses the words database and DBMS interchangeably. They are not interchangeable! Simply put, a DBMS is software used to create and manage a database. A database contains data; a DBMS is a software product. First, let me define the two terms.
Gartner defines DBMS and database as follows: A database management system (DBMS) is the software used to organize, support and maintain the information or data in a structure stored in a computer. Although the data is normally stored on persistent storage media such as disk or flash, it can also be stored in memory. The software includes the rules to organize the data and enforce the model (for example, relational, network or hierarchical); insert, update and delete data; provide security for the data; enforce persistence; and facilitate backup and recovery of the data. Vendors such as IBM, Microsoft, Oracle and SAP sell DBMSs.
A database is a structured collection of records or data. A computerised database relies upon software to organize the storage of data. The software models the database structure in what are known as database models. The model in most common use today is the relational model. Other models, such as the hierarchical model and the network model, use a more explicit representation of relationships. Vendors such as Dun & Bradstreet sell databases.
My problem is that many professionals in the DBMS software world continue to call the DBMS product a database product. Are they referring to the DBMS or the database? Some might think I am being picky; however, when reading documents that misuse the terminology, it is often difficult to understand whether the author is referring to the software or the data. Take “My database crashed”: does this imply a software problem or a hard disk crash? We all must be more careful to use the correct terminology.
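To make the distinction concrete, here is a minimal Python sketch using SQLite, which ships with Python's standard library. SQLite stands in for the DBMS (the software), and the file it creates stands in for the database (the data). The file name customers.db, the table and the sample row are all hypothetical, chosen only for illustration.

```python
import os
import sqlite3
import tempfile

# The DBMS is software: here, the SQLite library bundled with Python.
dbms_version = sqlite3.sqlite_version
print("DBMS (software) version:", dbms_version)

# The database is data: here, one file that the DBMS creates and manages.
# "customers.db" is a hypothetical name used only for this sketch.
db_path = os.path.join(tempfile.mkdtemp(), "customers.db")

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO customer (name) VALUES (?)", ("Example Corp",))
conn.commit()
conn.close()

# The software and the data are separate things: the same DBMS can reopen
# the same database later and read back what was stored.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT id, name FROM customer").fetchall()
conn.close()

database_size = os.path.getsize(db_path)
print("Database (data) file:", db_path, "-", database_size, "bytes")
```

In this sketch, “my database crashed” could mean an error raised by the sqlite3 library (the software) or a damaged customers.db file (the data), which is exactly the ambiguity described above.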
So let’s make a New Year’s resolution: Try to use the correct terminology.
If you sea someone using terminology in air, you should let them no.
Category: Banco de Dados Data Management DBMS IT Infrastructure Tags: Banco de Dados, Curmudgeon, Data Management, Database Management System, DBMS, In-Memory, Structured Data
by Donald Feinberg | December 18, 2014 | 5 Comments
Donald Feinberg (@Brazingo) & Merv Adrian (@merv)
Every so often, there’s a wave of interest in the “imminent retirement” of one or more legacy database management systems (DBMS). Usually, it’s because someone with very little knowledge of the actual use and distribution of the products becomes enthusiastic about someone’s sales pitch, or an anecdote or two. Sometimes it’s the result of a “replacement” marketing campaign by a competitor. It takes longer than 40 years for DBMS technology to die, and for a (competing) marketer, it is like the villain in a horror story who just keeps coming back. And so far, retirement is usually as illusive – and as far off – as the “death of the mainframe”.
Recently, a financial analyst report stated that in 2015, the industry would begin retiring Sybase products (owned now by SAP) and Informix (owned now by IBM). We and our colleagues have since had several inquiries about this and our response is simple: poppycock. DBMS market data, and our thousands of interactions with customers, do not support any such assertions.
Let’s start with Sybase, or specifically, SAP ASE and SAP IQ, acquired by SAP from Sybase in 2010. (Full disclosure: Merv worked at Sybase in the 1990s.)
Since its acquisition of Sybase, SAP has released several enhanced versions of both SAP ASE and SAP IQ (including recently in 2014), and there’s no reason to question its intent to continue development and support of both.
Generally, the customers using these products are happy, and are not looking to replace them. We receive a steady stream of inquiries from Gartner clients asking about them, and these have not changed in character or in volume. They are not typically or disproportionately about removing these products, though we regularly get inquiries about replacing all the “legacy” RDBMS offerings with new products. It is true that customers ask the question; however, the vendor’s intent is not questioned.
SAP IQ is the oldest and most widely installed column-store DBMS on the market. It is used for both analytics and as a general purpose data warehouse; it’s also part of the SAP HANA infrastructure, used as a near-line storage engine for cooler data not required in-memory in SAP HANA.
SAP ASE has retained a sizable loyal customer base on Wall Street, where it is part of the infrastructure used for trading systems, and elsewhere. It’s been certified as a DBMS platform for SAP Applications for about two years, and its use there is growing: Gartner estimates over 6000 instances of SAP Applications using SAP ASE as a platform at the beginning of 2014 [Edited Dec 19 to change number to 6000 – see below for comment from SAP]. That rate of growth for SAP ASE is actually faster than it had been in the 10 years before SAP acquired it – most likely because now SAP ASE is an alternative to Oracle, as a platform for SAP Applications.
Given the SAP sales force’s focus on SAP HANA, and the minimal marketing of SAP ASE and SAP IQ, we do understand how a misconception around the future of these products could happen. But it is just that – a misconception.
What about Informix, acquired by IBM in 2001? Over a decade later, it remains an integral part of an IBM information management portfolio that includes three primary DBMSs – DB2, IMS and Informix – and newer entrants such as Cloudant. IBM has continued to release new enhanced versions of Informix since the acquisition; for example, it has recently added JSON support with MongoDB JSON Drivers. Due to the implementation of embedded indexes, Informix is a good choice for audio and video indexing. Finally, the number of IBM Informix customers has continued to increase and its user base is very loyal, with one of the largest and most active User Groups.
IBM positions Informix for three primary use cases:
- High-speed processing in verticals like retail (point of sale systems) and manufacturing
- Time-series DBMS – one of its primary features, and a “timely” one
- The Internet of Things, where its high-speed ingest capabilities and small footprint are well-suited
So, it’s our opinion that the report referenced above is erroneous, and is not based in fact. At the end of the day, one of the most powerful forces in DBMS is inertia. Just ask Oracle, whose 2Q15 financial results press release on 17-Dec-2014 noted that “software updates and product support revenues drove nearly half of total company revenue.” Legacies are sticky – if it works, people don’t take lightly to changing it. In all these cases, legacy products are not only holding their own, but finding new markets in the hands of large companies with loyal customer bases.
Don’t believe everything you read (unless, of course, we wrote it.)
Category: Analyst Banco de Dados Data Management DBMS Operational DBMS Tags: Banco de Dados, Data Management, Database Management System, DBMS, Informix, Online Transaction Processing, Operational DBMS, RDBMS, Relational DBMS, SAP ASE, SAP IQ, Sybase
by Donald Feinberg | October 9, 2014 | 2 Comments
For the past (many) years, I have waged the battle over the use of the term “structured” versus “unstructured” in data management. I have tried every logical argument and tried many other terms to describe unstructured data, as have many of my colleagues at Gartner and throughout the industry. I have even used the phrase “the U word” for unstructured data to imply that it is similar to the Seven Dirty Words, a routine from one of my favorite comedians, George Carlin. Regardless of how often some of us have tried, the term “unstructured” continues to be used widely to describe all the data that cannot be simply described as relational data. For some, it is XML or text. For others, it covers the whole spectrum from XML to voice and video, including e-mail and SMS (sometimes referred to as noise data). In simple terms, it is everything we refer to as “the other stuff” that we would store in files or a database.
According to Wikipedia, “unstructured data (or unstructured information) refers to (usually) computerized information that either does not have a data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated (semantically tagged) in documents.” Where I have a problem is that XML does have a data model (see XML Schema). In addition, a JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format) or other image file is easily usable in a computer program – for example, in Adobe Photoshop. Wikipedia even says, “The term [unstructured] is imprecise for several reasons…” This has always been my basis for not using it – it is imprecise, with no formal definition of the type of data to which it refers.
So why do we use “unstructured” to describe all of the data that does not fit nicely into a data model? Because it has become generally accepted throughout the industry. When someone uses the term “unstructured data”, everyone understands that we are describing data that is not a column of numbers, characters or dates. In reality, data actually fits in a continuum from structured to unstructured, from relational numbers, dates and characters, through XML, to unstructured data such as voice, video and e-mail. Some data is more structured than other data.
Therefore, I give up. Some battles are simply not worth having. I am finished fighting this battle. The age-old proverb “If you can’t beat ’em, join ’em” wins. I will now use “unstructured” to describe all the “other stuff” that is not structured. Of course, now we call this Big Data – oops, let’s not go there (at least today).
Many thanks to my friend and Gartner colleague, Cássio Dreyfuss, for his help with my Portuguese.
Category: Analyst Banco de Dados Big Data Data Management DBMS Tags: Banco de Dados, Database Management System, Structured Data, Unstructured Data, XML
by Donald Feinberg | October 4, 2014 | 11 Comments
Another post from the DBMS Curmudgeon
For the past (many) years, I have waged the battle over the use of Structured vs. Unstructured in data management. I have tried every logical argument and tried many other terms to describe Unstructured Data, as have many of my colleagues at Gartner and throughout the industry. I have even used the phrase “The ‘U’ Word” for Unstructured to imply it is similar to the Seven Dirty Words, a routine from one of my favorite comedians, George Carlin. Regardless of how often some of us have tried, the word Unstructured continues to be used widely to describe all the data that cannot be simply described as Relational Data. For some, it is XML or text data. For others, it covers the spectrum from XML to Voice and Video, including e-mail and SMS (sometimes referred to as noise data). In simple terms, it is all the “other stuff” we would store in files or a database.
According to Wikipedia, “Unstructured Data (or unstructured information) refers to (usually) computerized information that either does not have a data model or has one that is not easily usable by a computer program. The term distinguishes such information from data stored in fielded form in databases or annotated (semantically tagged) in documents.” Where I have a problem is that XML does have a data model (see XML Schema). In addition, a JPEG (Joint Photographic Experts Group), TIFF (Tagged Image File Format) or other image file is easily usable in a computer program – for example, in Adobe Photoshop. Wikipedia even says, “The term is imprecise for several reasons…” This has always been the underlying basis for my argument against using it – it is imprecise, with no formal definition of the type of data to which it refers.
So why do we use Unstructured to describe all of the data that does not fit nicely into a data model? Because it has become generally accepted throughout the industry. When one uses the words Unstructured Data, everyone understands that we are describing data that is not a column of numbers, characters or dates. In reality, data actually fits in a continuum from Structured to Unstructured, from relational numbers, dates and characters, through XML, to unstructured data such as voice, video and e-mail. Some data is more structured than other data.
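The continuum can be sketched in a few lines of Python using only the standard library. This is an illustration under assumptions, not a formal classification: the table, the XML snippet and the byte string are all made-up sample data, and the three points shown are just representative stops along the continuum.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Structured end: typed columns in a relational table. The DBMS knows the
# name and type of every field, so a query can address a field directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, placed TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, '2014-10-04', 19.99)")
amount = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()[0]
conn.close()

# Middle of the continuum: XML. The structure travels with the data as tags,
# so a program can still navigate to a named element.
doc = ET.fromstring("<order id='1'><note>Deliver after 5 pm</note></order>")
order_id = doc.get("id")
note = doc.find("note").text

# Unstructured end: raw bytes standing in for voice or video. Without a
# codec or other outside knowledge, all a generic program sees is length.
audio = bytes([0x52, 0x49, 0x46, 0x46, 0x00, 0x01])  # hypothetical payload
audio_length = len(audio)

print(amount, order_id, note, audio_length)
```

Note that even the “unstructured” bytes have some structure to a program that knows the format, which is the point made above about JPEG and TIFF files.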
Therefore, I give up. Some battles are simply not worth having. I am finished fighting this battle. The age-old proverb “If you can’t beat ’em, join ’em” wins. I will now use Unstructured to describe all the “other stuff” that is not Structured. Of course, now we call this Big Data – oops, let’s not go there (at least today).
Category: Data Management DBMS General Tags: Data Management, Database Management System, RDBMS, Relational DBMS, Structured Data, Unstructured Data, XML
by Donald Feinberg | September 28, 2014 | 3 Comments
Recently, we published a Market Guide for In-Memory Computing. The document covers all forms of IMC, including Database Management Systems (DBMS). Gartner defines In-Memory Computing (IMC) as a computing style where applications assume all the data required for processing is located in the main memory of their computing environment. Although we define many styles of IMC (Application Servers, Data Grids, Messaging and Complex Event Processing), I want to concentrate specifically on DBMS technology in-memory. Why? There appears to be some level of misconception about what does and does not qualify as an In-Memory DBMS (IMDBMS).
Our definition of IMDBMS requires the database structure to be in-memory, specifically the main memory of the server. Data in the database is accessed through instructions for accessing memory and not using I/O instructions. This should not be confused with products that buffer data in a disk-block cache. Disk-block caching has been used in the industry for many years, pre-dating relational technology. For example, IBM’s IMS DBMS was, from its introduction in 1968, able to cache data in memory, also referred to as pre-fetch or read-ahead; however, it is not an IMDBMS. While we agree that caching does improve performance over accessing disk or flash, it is not IMC.
One major difference between traditional disk-based DBMS engines and IMDBMS is the implementation of the consistency model. IMDBMS covers all DBMS consistency models from ACID consistency to eventually consistent models, the latter found in many of the NoSQL DBMS engines. However, regardless of the consistency model, a commit operation will be performed. Disk-based systems, even if all the data is cached in memory buffers, require the transaction to be written to disk or flash. Regardless of the length of time taken to perform this operation, it is greater than zero. With IMDBMS products, the commit operation takes place in memory. Although this requires unique methods for assuring the persistence of the data, due to the volatility of memory, such as synchronous writing of data to a second server using Remote Direct Memory Access (RDMA), the latency is less than writing to external media. This illustrates why the performance of IMDBMS is higher, even over using a disk-block buffer.
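The commit-path difference can be illustrated with a deliberately simplified Python sketch. It is not how any particular DBMS implements commit: the function names commit_to_disk and commit_in_memory, the log file name and the 100 sample records are all hypothetical, and the in-memory path here omits the persistence mechanism (for example, synchronous replication to a second server) that a real IMDBMS would need.

```python
import os
import tempfile
import time


def commit_to_disk(log_file, record):
    """Disk-based commit: the record must reach external media before returning."""
    log_file.write(record)
    log_file.flush()              # push Python's buffer to the operating system
    os.fsync(log_file.fileno())   # ask the OS to force the data to the device


def commit_in_memory(log, record):
    """In-memory commit: the record is only appended to a structure in RAM."""
    log.append(record)


records = [f"txn-{i}\n".encode() for i in range(100)]

# Commit every record in memory and time it.
memory_log = []
start = time.perf_counter()
for record in records:
    commit_in_memory(memory_log, record)
memory_seconds = time.perf_counter() - start

# Commit every record to a log file on disk or flash and time it.
log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
start = time.perf_counter()
with open(log_path, "wb") as log_file:
    for record in records:
        commit_to_disk(log_file, record)
disk_seconds = time.perf_counter() - start

print(f"in-memory commits: {memory_seconds:.6f} s")
print(f"disk commits:      {disk_seconds:.6f} s")
```

The absolute timings depend entirely on the hardware and operating system, so none are quoted here; the sketch only shows that the disk path includes a forced write to external media on every commit, while the memory path does not.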
With our precise definition of true IMDBMS, we seek to dissipate the hype in the market over IMDBMS and claims made by some vendors that their technology is IMDBMS when, in fact, it is not.
Category: Analyst Data Management DBMS General In-Memory Computing In-Memory DBMS Operational DBMS Tags: ACID, Database Management System, IMDBMS, In-Memory, Online Transaction Processing, Operational DBMS, RDBMS, Relational DBMS
by Donald Feinberg | June 15, 2009 | 1 Comment
On June 9, Google Labs announced Google Fusion Tables, a new system for managing data in the Google cloud. I want to be clear about one point – this is an experiment from Google Research, not exactly ready for production systems (Google is clear about this also). The issue I have is how the press exaggerates the announcement by warning the Database Management System (DBMS) vendors to watch out as they are being blindsided by Google. You must be kidding!
First, what is Fusion Tables? It is a system for managing data in the cloud for collaboration with data from disparate sources in a simple way, including the ability to “drill-down” to the sources of the data. It allows the user to “join” (in a loose definition) data without the constraints of the data model, normally found in a relational DBMS. What it is not is a DBMS to manage data for an On-Line Transaction Processing (OLTP) system or a Data Warehouse. Fusion Tables is based on Data Spaces, defined in Wikipedia as “a container for domain specific data” and further “A Data Space system is a multi-model data management system that manages data sourced from a variety of local or external sources”. Data Spaces were originally defined in the early 1990’s during the Object Oriented DBMS (OODBMS) era.
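The idea of a loose “join” with drill-down to the sources can be sketched in plain Python. This is not the Fusion Tables API or its algorithm; the function loose_join, the two sample sources and every value in them are hypothetical, made up only to show data of different shapes being combined on a shared key without a fixed schema.

```python
def loose_join(key, **sources):
    """Merge rows from several named sources on a shared key.

    No schema is enforced: each row contributes whatever fields it has,
    and each merged entry remembers which sources it came from.
    """
    merged = {}
    for source_name, rows in sources.items():
        for row in rows:
            entry = merged.setdefault(row[key], {key: row[key], "_sources": []})
            for field, value in row.items():
                if field != key:
                    entry[field] = value
            entry["_sources"].append(source_name)
    return list(merged.values())


# Two hypothetical sources whose rows do not share the same fields.
sales = [
    {"store": "A", "units": 120},
    {"store": "B", "units": 75},
]
surveys = [
    {"store": "A", "rating": 4.5, "comment": "friendly staff"},
]

result = loose_join("store", sales=sales, surveys=surveys)
for entry in result:
    print(entry)
```

The "_sources" list is a crude stand-in for drill-down: it records where each merged row came from. A relational DBMS would instead require both tables to be declared with a schema before they could be joined.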
As with many new ideas, there are elements of the technology that may have value. When this happens, we find that the original relational model is evolved to incorporate this new technology or model. We saw this occur with OODBMS – the modern DBMS does use inheritance and user-defined classes. We saw this happen with XML – now the modern DBMS has full native XML as a data type, as robust as the original pure-play XML DBMSs. Today we are seeing this happen with MapReduce, as several DBMS vendors have incorporated it into their DBMS engines. We will see this happen also with the column-store construct, which we believe will be incorporated into many modern DBMS engines as an indexing technique for optimization. As to the validity of Fusion Tables and the ability to mix disparate data sources and types, there is little question as to the usefulness of this. Oracle has already put a capability in its current release (11g) as SecureFiles, and Microsoft in SQL Server 2008 has a feature called FILESTREAM. These are not experimental or beta test features but are implemented in full production.
Is Fusion Tables worth watching? Of course! The concept of easily combining disparate sources of data for analysis and collaboration is important and has been around since the inception of IT. Mashups and other Web 2.0 constructs have made some of this available today (see The Rise of Collaborative Decision Making). Google has a good start on this with the ability to use data from Google Apps and other spreadsheet-style data with the initial version of Fusion Tables. Organizations must take care, or these types of applications will cause additional turmoil in the governance and security space (see Developing a Strategy for Dealing With Desktop Database Management System Proliferation). Will this technology replace your DBMS for OLTP and DW systems? Not soon, nor in the future. Many have tried (e.g., OODBMS). There are other new techniques and systems being researched today that have promise (e.g., Akiba); however, the relational model continues to demonstrate flexibility and resiliency (over 30 years) and you can expect that to continue. Products like DB2, Informix, Ingres, MySQL, Oracle, PostgreSQL, SQL Server and Sybase ASE will be used in new IT systems for many years to come.
Category: DBMS Tags: Akiba, Collaborative Decision Making, Data Spaces, Data Warehouse, Database Management System, DBMS, Desktop DBMS, DW, FILESTREAM, Fusion Tables, Google, Google Labs, MapReduce, Object Oriented Database Management System, Online Transaction Processing, OODBMS, OPTP, Oracle, RDBMS, Relational DBMS, SecureFiles, SQL Server, SQL Server 2008, XML
by Donald Feinberg | April 17, 2009 | Submit a Comment
The Merriam-Webster dictionary defines curmudgeon as “Archaic”. That’s me – sometimes. Many of the open source bloggers might agree. In years past, they all thought I was a curmudgeon. Go ahead, Tony – laugh. But I can come around – although I still believe that companies like to make money and developers like a paycheck. When it comes to blogging – that is not me – or so I thought. I do not read blogs and this is my first attempt at writing one. Funny that two years ago, I won an award in our group for being mentioned more than anyone else in blogs that year! And this, when I never read blogs or comment on them. So the curmudgeon changes again and here I am with my own blog.
Category: Analyst General Tags: Curmudgeon