ScaleArc

Why Databases Will Never Solve App Uptime Alone

Database vendors have spent decades trying to build transparent automatic failover into their database software, aiming for zero downtime on mission-critical transactional (OLTP) apps. The challenge is that apps and databases are intimately connected, and ultimately apps drive databases – not the other way around. All a database can do is respond to the queries apps send and stay as “synchronized” as possible using advanced replication technologies. Oracle RAC is a great example: billions of dollars of database R&D spent in an effort to deliver zero downtime.

But it’s the apps that have to be able to withstand database failures. If an app is not meticulously engineered to withstand the burst of connection errors that follows the death of a database server, the app chokes on that burst and goes down. Most apps are not engineered to this degree – it’s incredibly difficult and time consuming to hard code into the app the logic to handle every type of failure scenario. The approach requires multiple app servers to act in unison, handling and managing connection errors while the database software adjusts the cluster to account for the failure. And of course, engineering one app to this degree helps only that particular app – every other app needs the same repetitive, meticulous engineering. For third-party apps, where enterprises don’t have access to the source code, this level of custom engineering isn’t possible at all. These customers are left at the mercy of whatever work the app vendor has done, and they are forced to use the database environment the app vendor picked.

I know this pain all too well. During the years I was CTO at one of India’s largest media companies, my 10 best engineers, out of 200, did nothing but this kind of meticulous app engineering to enable our company’s very popular websites (CNN, CNBC, Home Shopping Network, etc.) to handle database scale out and failover. When they finished tuning the app for one site, they went on to the next one and repeated the very same process, without being able to reuse common code.

Database vendors recognize this problem and have attempted to solve it. They’ve worked, for example, to build zero downtime logic and functionality into drivers, such as JDBC drivers. At best, though, the most advanced drivers have only rudimentary retry logic, and making it work still requires detailed custom application engineering. A driver also can’t be designed to understand how every app behaves in every failure scenario across multiple app servers in a transactional environment.

The pain I experienced, and the contrast with how we architected our web tier, inspired us to create ScaleArc. Just as web load balancers, acting as a proxy, are essential to delivering zero downtime at the web tier, a transparent abstraction layer is essential to delivering zero downtime for transactional apps. The abstraction layer approach is the only way to enable zero downtime for all of an enterprise’s apps – with that architecture, the technology can deploy in a transparent plug-and-play manner, with no need for app changes.

How do I know this approach is right? All the “always on” consumer companies have built their own abstraction layers – Facebook, Google, Twitter, etc. They often call this layer a Data Access Layer, and many have published research about the technology.

The database itself cannot deliver this functionality – you need the abstraction layer. Why? To achieve transparent auto failover, the connectivity driver on the app needs to be aware of the connection state and needs to be able to provide a new database connection to the existing client connection following a database failure. So failover must happen for two distinct kinds of connection: 1) new connections to the database server, and 2) existing connections already established to the database server. Existing drivers do not provide this functionality – they return error messages when the database server becomes unavailable. Even if a driver has retry logic, it still mandates that customers modify their application code to work with that logic – there’s that meticulous app engineering work again, and it is possible only when source code access is available.
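To make the two cases concrete, here is a minimal Python sketch of the kind of wrapper an app team ends up writing around a driver. Everything in it is hypothetical: `FailoverClient`, the `connect` callable, and the host list are illustrative names, not part of any real driver. It covers only the simplest scenario, and even so it has to be wired into every place the app touches the database.

```python
import time


class FailoverClient:
    """Hypothetical app-side wrapper that retries across database hosts."""

    def __init__(self, hosts, connect, retries=3, delay=0.0):
        self.hosts = list(hosts)
        self.connect = connect    # callable(host) -> connection; raises ConnectionError
        self.retries = retries
        self.delay = delay
        self.conn = None

    def _open(self):
        # Case 1: a new connection. Try each host in order until one answers.
        last_error = None
        for host in self.hosts:
            try:
                self.conn = self.connect(host)
                return
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("no database host reachable") from last_error

    def execute(self, sql):
        # Case 2: an established connection dies mid-use. Drop it, reconnect,
        # and retry. Blind retries are unsafe for non-idempotent writes and for
        # open transactions; handling those is the hard, app-specific part.
        for _ in range(self.retries):
            try:
                if self.conn is None:
                    self._open()
                return self.conn.execute(sql)
            except ConnectionError:
                self.conn = None
                time.sleep(self.delay)
        raise ConnectionError(f"gave up after {self.retries} attempts: {sql}")
```

Note what the sketch leaves out: session state, transactions in flight, and coordination between app servers. Each of those needs more custom code, per app.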

An abstraction layer, such as the one the ScaleArc software provides, handles failover differently. The ScaleArc software acts as a full wireline SQL protocol proxy – clients perceive the software as the database server, and database servers see it as the client. Sitting in between, this software is aware of the state of each connection and can perform a stateful connection failover to a new database server without letting the client know about the failure. ScaleArc failover also does not depend on any specific feature of either the database server or the connectivity driver – it is truly transparent and so works with all apps. It also supports all the drivers and database servers without requiring any changes to your application code.
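The idea of a stateful failover in a proxy can be illustrated with a toy, in-process Python model. This is a sketch under heavy assumptions and not how ScaleArc is implemented: `Backend` and `Proxy` are hypothetical classes, there is no real network or SQL protocol involved, and "session state" is reduced to simple `SET key=value` statements that the proxy records and replays on the new server.

```python
class Backend:
    """Stand-in for a database server (hypothetical; no real network I/O)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.session = {}    # session state as the server sees it

    def run(self, statement):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        if statement.startswith("SET "):
            key, value = statement[4:].split("=", 1)
            self.session[key.strip()] = value.strip()
            return "OK"
        return f"{self.name}: {statement}"


class Proxy:
    """Sits between client and backends and hides backend failures."""

    def __init__(self, backends):
        self.backends = backends
        self.current = backends[0]
        self.session_log = []    # state-setting statements, replayed on failover

    def run(self, statement):
        try:
            result = self.current.run(statement)
        except ConnectionError:
            # The client never sees this error: switch servers, then retry once.
            self._fail_over()
            result = self.current.run(statement)
        if statement.startswith("SET "):
            self.session_log.append(statement)
        return result

    def _fail_over(self):
        for candidate in self.backends:
            if candidate is self.current:
                continue
            try:
                candidate.run("SELECT 1")        # health probe
                for statement in self.session_log:
                    candidate.run(statement)     # rebuild session state
            except ConnectionError:
                continue
            self.current = candidate
            return
        raise ConnectionError("no healthy backend available")
```

A short usage example: the client keeps calling `proxy.run(...)` and the switch from `db1` to `db2` happens behind it.

```python
db1, db2 = Backend("db1"), Backend("db2")
proxy = Proxy([db1, db2])
proxy.run("SET tz=UTC")
db1.up = False                  # simulate a database server failure
print(proxy.run("SELECT 2"))    # served by db2, with tz=UTC already replayed
```

The design point is that the retry and state-replay logic lives in one place, outside the apps. A real proxy would also have to deal with transactions in flight, which this sketch ignores.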

Only database abstraction layer software works in this manner – it’s the only path to zero downtime without app engineering. It’ll simply never be the case that the database guys can solve app uptime on their own.
