NoSQL will “Eventually” be Consistent with SQL
"The database market is in need of a big change.” So says MongoDB’s Max Schireson. And we couldn’t agree more. We just don’t agree on what that change is.
MongoDB is on a campaign to tell the world the “reign of the relational database” is over. Anyone who says that clearly doesn’t understand what “realm” relational databases “reign” over. The SQL database market has become a $24 billion-a-year industry because it delivers the data consistency at the heart of critical business processes. It enables real use cases that impact our lives more than any other category of software we know today.
Considering how MongoDB and the rest of the NoSQL ecosystem are evolving, the gentleman doesn’t seem to recognize that it is just a matter of time until MongoDB itself becomes a pretty complete “semi-relational database” and inherits the same limitations and constraints that bind today’s SQL databases.
NoSQL databases are faster for basic operations such as single-item lookups, bulk inserts or bulk reads because they do a lot less hard work than SQL databases, leaving all that hard work to the application layer.
Most NoSQL databases offer very little in the way of “data integrity.” Some do not keep a full transaction log that can save you after a failure or corruption of the core database storage; others do not enforce data consistency through the “ACID” properties that have defined SQL database design for ages and have helped build the foundation of most systems we use today, from banking to web ads to CRM. Add that integrity back, and NoSQL systems start to slow down (look up benchmarks of Riak with strong consistency vs. eventual consistency).
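To make the “A” in ACID concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The accounts table, the names in it and the transfer helper are all assumptions made up for illustration; they are not taken from any vendor’s product. The point is only that the database, not the application, undoes the half-finished transfer when a constraint is violated.

```python
import sqlite3


def make_db():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany(
        "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        # The connection context manager commits if the block succeeds
        # and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit would push the balance below zero, the CHECK
            # constraint fails and the credit above is rolled back too.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A store without transactions would leave the application to notice the failed debit and reverse the credit itself, which is exactly the hard work described above.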
Then comes the second source of performance gain: simplistic, unstructured database object design. Data, by its very nature, eventually has to take on some sort of structure or shape to be analyzable or usable. Ask most Hadoop / Big Data BI users where their fully churned data ends up, and the answer will be SQL. If an application’s data needs are as simple as storing individual objects with no relationship to one another, NoSQL will work just fine (and so would a bulk-store file system, if you could get performance out of it).
Sadly, most applications need some sort of structure and relationships in their data, and need to query that data along different variables for different use cases. Since most NoSQL databases can’t do that today, they push the work onto the application layer and its developers, increasing application complexity and requiring significant code rewrites whenever the data structure changes. It’s also not uncommon to hear of NoSQL users dumping the entire dataset to flat files and writing it back again just to realign it to a new data retrieval model, or even moving the data to an intermediate SQL database just to run reports. Some NoSQL datastores, such as MongoDB, are adding relational / data-type-based queries, but their performance on such queries is currently significantly worse than that of SQL databases.
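As a rough illustration of what pushing relationships onto the application looks like, the sketch below answers the same question (“how much has each customer spent?”) twice. The first version joins and aggregates by hand over plain Python dicts, which stand in for a store with no query engine; the second hands the same data to SQLite and lets a JOIN do the work. The data, table names and function names are all hypothetical, and the dicts are a stand-in rather than any real NoSQL client API.

```python
import sqlite3

# Stand-in for a key-value / document store: plain dicts, no query engine.
customers = {1: {"name": "Ada"}, 2: {"name": "Linus"}}
orders = [
    {"customer_id": 1, "total": 40},
    {"customer_id": 2, "total": 15},
    {"customer_id": 1, "total": 60},
]


def totals_in_application(customers, orders):
    """Join and aggregate by hand, as the application must when the store cannot."""
    totals = {}
    for order in orders:
        name = customers[order["customer_id"]]["name"]
        totals[name] = totals.get(name, 0) + order["total"]
    return totals


def totals_in_sql(customers, orders):
    """Load the same data into SQLite and let the database do the join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders ("
        "customer_id INTEGER REFERENCES customers(id), total INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(cid, c["name"]) for cid, c in customers.items()],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(o["customer_id"], o["total"]) for o in orders],
    )
    rows = conn.execute(
        "SELECT c.name, SUM(o.total) "
        "FROM customers c JOIN orders o ON o.customer_id = c.id "
        "GROUP BY c.name"
    )
    return dict(rows)
```

Both functions return the same totals. The difference shows up when the question changes: the SQL version needs a new query string, while the hand-written version needs new loop logic for every new way of slicing the data.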
It’s that added application complexity, and the desire for consistency, that led Google’s big data team (the very same team that invented BigTable, arguably the father of the NoSQL crowd) to invent Spanner, a distributed, consistent SQL database. Google’s AdWords engine, the world’s largest application by sheer volume of transactions, also runs on a relational database engine called F1. Facebook, Twitter and Google are now major contributors to both MariaDB and MySQL, and run more SQL servers than NoSQL servers, despite being at the cutting edge of tech. SQL isn’t going anywhere!
In fact, if you look at the last five years of NoSQL development, you’ll see clear signs of NoSQL datastores adding more SQL-like functionality (such as MongoDB’s indexes, or Couch’s UnQL language) and losing a bit of their performance advantage in the process. On the other hand, SQL databases have been adding more NoSQL-like features (some in-memory / column store features in SAP HANA, Oracle 12c and Microsoft’s Hekaton, and ScaleArc’s in-memory SQL cache, are straight out of the NoSQL playbook) and gaining in performance. This leads to a pretty obvious conclusion: NoSQL and SQL databases will soon enough converge, not only in features but also in performance. NoSQL is gaining features and losing performance to be more like SQL, while SQL is gaining performance, at the cost of a restricted feature set in certain areas, to be more like NoSQL.
Once again: SQL isn’t going anywhere. If anything, NoSQL itself is going to become very SQL-like in the near future, and the NoSQL vendors should know that better than anybody else, as they’re the ones doing it.
In the meantime, the change we see coming for SQL databases is the application of database traffic management software, which improves availability and performance of applications hitting SQL Server databases.