
In-Memory DBMS vs In-Memory Marketing

By Donald Feinberg | September 28, 2014 | 3 Comments


Recently, we published a Market Guide for In-Memory Computing (IMC). The document covers all forms of IMC, including Database Management Systems (DBMS). Gartner defines IMC as a computing style in which applications assume all the data required for processing is located in the main memory of their computing environment. Although we define many styles of IMC (Application Servers, Data Grids, Messaging and Complex Event Processing), I want to concentrate specifically on in-memory DBMS technology. Why? There appears to be some misconception about what does and does not qualify as an In-Memory DBMS (IMDBMS).

Our definition of IMDBMS requires the database structure to be in-memory, specifically in the main memory of the server. Data in the database is accessed through memory-access instructions, not I/O instructions. This should not be confused with products that buffer data in a disk-block cache. Disk-block caching has been used in the industry for many years, pre-dating relational technology. For example, IBM’s IMS DBMS was, from its introduction in 1968, able to cache data in memory (also referred to as pre-fetch or read-ahead); however, it is not an IMDBMS. While we agree that caching improves performance over accessing disk or flash, it is not IMC.
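To make the distinction concrete, here is a minimal Python sketch, not taken from any product. The class names, the fixed record layout and the tiny block size are all assumptions made for illustration. The first store still addresses data by disk block and falls back to file I/O on a cache miss; the second holds its primary structure in main memory and never takes an I/O path on a read.

```python
import os
import tempfile

BLOCK_SIZE = 64   # bytes per block (hypothetical, tiny for illustration)
RECORD_SIZE = 16  # fixed-width records, so 4 records fit in one block


def write_records(path, records):
    """Lay the records out on disk as fixed-width, zero-padded slots."""
    with open(path, "wb") as f:
        for record in records:
            f.write(record.encode().ljust(RECORD_SIZE, b"\0"))


class BlockCachedStore:
    """Disk-based store with a block cache: data is still addressed by block."""

    def __init__(self, path):
        self.path = path
        self.cache = {}    # block number -> raw bytes of that block
        self.io_reads = 0  # how many lookups had to go to the file

    def get(self, record_id):
        block_no, offset = divmod(record_id * RECORD_SIZE, BLOCK_SIZE)
        block = self.cache.get(block_no)
        if block is None:
            # Cache miss: an I/O call is needed to fetch the whole block.
            with open(self.path, "rb") as f:
                f.seek(block_no * BLOCK_SIZE)
                block = f.read(BLOCK_SIZE)
            self.io_reads += 1
            self.cache[block_no] = block
        return block[offset:offset + RECORD_SIZE].rstrip(b"\0").decode()


class InMemoryStore:
    """In-memory store: the primary structure itself lives in main memory."""

    def __init__(self, records):
        self.rows = dict(enumerate(records))  # record id -> value

    def get(self, record_id):
        return self.rows[record_id]  # plain memory access, no I/O path


if __name__ == "__main__":
    records = [f"row-{i}" for i in range(8)]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "table.dat")
        write_records(path, records)
        cached = BlockCachedStore(path)
        print(cached.get(0), cached.get(5), "file reads:", cached.io_reads)
    print(InMemoryStore(records).get(5))
```

Even when every block is already cached, the first store keeps the block-oriented layout and the miss path; that is the sense in which a cache sits in front of a disk-based design rather than replacing it.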

One major difference between traditional disk-based DBMS engines and IMDBMS is the implementation of the consistency model. IMDBMS products span the full range of DBMS consistency models, from ACID to eventual consistency, the latter found in many NoSQL DBMS engines. However, regardless of the consistency model, a commit operation will be performed. Disk-based systems, even if all the data is cached in memory buffers, require the transaction to be written to disk or flash. However short that write is, it takes more than zero time. With IMDBMS products, the commit operation takes place in memory. Because memory is volatile, this requires special methods of assuring the persistence of the data, such as synchronously writing the data to a second server using Remote Direct Memory Access (RDMA), but the latency is still less than that of writing to external media. This illustrates why IMDBMS performance is higher, even compared with a disk-based DBMS that uses a disk-block buffer.
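The two commit paths can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than how any particular product works: `DiskCommitStore`, `InMemoryCommitStore` and `Replica` are hypothetical names, and the replica is an in-process object standing in for a second server's memory, so no real RDMA or network transfer is involved.

```python
import os
import tempfile


class DiskCommitStore:
    """Commit returns only after the log record is forced to storage."""

    def __init__(self, log_path):
        self.data = {}
        self.log = open(log_path, "ab")

    def commit(self, key, value):
        self.log.write(f"{key}={value}\n".encode())
        self.log.flush()
        os.fsync(self.log.fileno())  # wait for the storage device
        self.data[key] = value

    def close(self):
        self.log.close()


class Replica:
    """Stand-in for a second server's memory (hypothetical, no real RDMA)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, value):
        self.data[key] = value
        return True  # acknowledgement back to the primary


class InMemoryCommitStore:
    """Commit happens in memory and is replicated synchronously."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica

    def commit(self, key, value):
        # Do not acknowledge the commit until the replica holds the change.
        if not self.replica.apply(key, value):
            raise RuntimeError("replica did not acknowledge the write")
        self.data[key] = value


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        disk_store = DiskCommitStore(os.path.join(tmp, "commit.log"))
        disk_store.commit("order-1", "paid")
        disk_store.close()

    replica = Replica()
    memory_store = InMemoryCommitStore(replica)
    memory_store.commit("order-1", "paid")
    print(memory_store.data, replica.data)
```

In the first class the commit cannot return before `os.fsync` does; in the second the only wait is for the replica's acknowledgement. Writing to disk or flash for recovery is deliberately left out of the second class.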

With our precise definition of true IMDBMS, we seek to dissipate the hype in the market over IMDBMS and claims made by some vendors that their technology is IMDBMS when, in fact, it is not.

 


3 Comments

  • Donald,

    Would a system like NuoDB (www.nuodb.com) fit the criteria you describe here for an IMDBMS?

    We’ve been working on leveraging it to replace our existing storage-based DBMS systems (primarily MySQL) and from our initial work it does seem to be a true DBMS that is based primarily on in-memory storage of data, with a disk-based persistent layer.

    It appears to meet the definition you provide, but I’m curious if their notion of a durable distributed cache fits within this framework.

    Sincerely,

    Bryan “BJ” Hoffpauir
    Chief Architect & Problem Solver
    Comit Developers, LLC.

    http://www.comitdevelopers.com

  • Donald Feinberg says:

    First, all IMDBMSs require some form of disk-based persistence. Memory is volatile, hence something must be written to flash or disk for recovery purposes. Having disk-based persistence would not stop an IMDBMS from being truly in-memory.
    If NuoDB is storing data directly in an in-memory database structure, and not only updating disk buffers, it is an IMDBMS. It must, however, meet the definition of a DBMS. If it is a distributed in-memory cache, it could be categorized as an In-Memory Data Grid, such as Memcached, Oracle Coherence and GigaSpaces. Either way, it is a true in-memory technology.

  • Henry Cook says:

    Donald, thank you for clarifying this, as there is a lot of confusion in the marketplace. As you state, simply using memory to avoid I/O by using a cache does not constitute an in-memory database.
    Being an IMDB means that all of the computation you do takes place in memory (the DRAM of the server, the Level 1, 2 and 3 caches of the processors, and the registers), maximising the performance of each without being held up by disk accesses.
    Of course, when you need to persist data it needs, by definition, to be recorded on non-volatile storage, in this case disk (which can be SSD, of course). However, this is not a performance problem, as the disk subsystem is relegated to simply accepting high-volume sequential writes to the logs. It is not affected by having to do random I/O, and you can write many parallel logs to sustain any throughput that might be necessary. Plus, as you also rightly point out, you can replicate changes to another active system.
    So, using disks for persistence is normal for IMDBs; it's where the database processing happens, rather than where changes are logged, that makes the difference. Thanks for making the point clear and for providing a useful place to link to in order to answer this question.
    Regards, Henry Cook – SAP