ScaleArc

The Web that Grows

I just boarded a 16-hour flight from Dubai to SFO, so I’d like to take this opportunity to share what I’ve learned about the database scalability landscape over the last decade or so, and explain what molded ScaleArc into the company it is today.

When we started ScaleArc, one of the guiding principles of the company was to “commoditize database scalability.” Database scalability, I believe, is one of the fundamental building blocks of all successful web and enterprise apps, and it is becoming even more critical as the number of users, apps and interactions on the open web continues to grow.

Yet database scalability remains relatively exotic: the domain of a few “database experts” who tend to be revered and employed at very high salaries by the relatively few companies that are the bastions of the “open web.” Examples include the hundreds-strong MySQL ops & maintenance team at Facebook, which is the envy of the MySQL world, and the VTOCC team at Google/YouTube, which has found new and exciting ways to make MySQL scale. These teams contribute a lot of code to MySQL, so you may think this is spreading the promise of database scalability to everyone on the open web. In reality, a lot of the changes they push into the community are fairly specific to their own use cases, so they have limited appeal to the more conventional “top 100” apps out there. And what they push out is but a fraction of the set of tools they have built for themselves to scale MySQL.

At the Percona Live MySQL Expo this past April, I attended presentations by both Facebook and Google on how they achieve scale. They talked about the same things we do at ScaleArc: monitoring server response times in micro-detail to make effective load management decisions, profiling SQL traffic to identify even the smallest areas for improvement, focusing more on the “frequent” than the “infrequent and slow,” replication awareness, how high availability can be achieved by controlling replication and, last but not least, how multi-level caching can really help speed up SQL performance. All this is done with a multitude of “tools and frameworks” – tools and frameworks that aren’t for public use.
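
To make the “frequent” versus “infrequent and slow” point concrete, here is a minimal sketch of the idea in Python. It is not ScaleArc's, Facebook's or Google's code: the `fingerprint` and `rank_by_total_time` helpers and the sample log are made up for illustration, and the literal-stripping regexes are deliberately crude. It groups queries that differ only in their literal values and ranks each group by total time spent rather than by the slowest single execution.

```python
import re
from collections import defaultdict


def fingerprint(sql: str) -> str:
    """Collapse literals so queries that differ only in values share one pattern.

    Hypothetical helper: a crude normalizer, not a real SQL parser.
    """
    sql = re.sub(r"'[^']*'", "?", sql)   # replace quoted string literals
    sql = re.sub(r"\b\d+\b", "?", sql)   # replace standalone numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def rank_by_total_time(log):
    """Take (sql, seconds) pairs; return (pattern, count, total_seconds) rows,
    sorted so the pattern with the largest cumulative time comes first."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in log:
        entry = totals[fingerprint(sql)]
        entry[0] += 1
        entry[1] += seconds
    rows = [(pattern, count, total) for pattern, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Made-up sample: one fast query run 5,000 times, one slow query run once.
sample_log = [(f"SELECT name FROM users WHERE id = {i}", 0.002) for i in range(5000)]
sample_log.append(("SELECT * FROM orders WHERE note LIKE '%refund%'", 3.0))

ranking = rank_by_total_time(sample_log)
for pattern, count, total in ranking:
    print(f"{total:8.3f}s  {count:5d}x  {pattern}")
```

In this made-up sample the 2 ms lookup costs roughly 10 seconds in aggregate, more than three times the single 3-second query, so it ranks first even though no individual execution would ever show up in a slow-query log.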

Now, I don’t question Google’s or Facebook’s reasons for not releasing these tools in the open. I realize they would if they could, or if they felt doing so would make the world a better place. These tools come in many shapes and forms, but most are proprietary scripts and frameworks that stay inside these companies because they contain a lot of infrastructure or app-specific code that might reveal a bit too much to the rest of the world about how these companies operate. Other pieces might just be too fragmented (hack / patch jobs) to be usable by anyone else. Unless an engineer decides to polish these tools for the rest of the world as his/her 20% time project, they will forever stay internal. Facebook and Google have no vested interest in cleaning up those tools for you.

Shipping software is a lot harder than shipping services. Hence these tools remain exotic things that Facebook and Google talk about on stage at conferences; they can’t help the independent designer who just fired up Magento and started selling “cool tees,” or the talented teen who wants her literary blog to show the world that she’s the next J.K. Rowling.

This is all very familiar to me because we did something very similar at my previous organization – we built a lot of homegrown solutions for everything from managing and monitoring MySQL replication and backups, to managing high availability, to building frameworks within our apps to use NoSQL datastores such as MongoDB and Memcached to improve the performance of the database stack. All this was completely DIY, completely open source, all the way. Of a team of 150 engineers, my 10 brightest did nothing but DB scalability.

It was a great, liberating experience, but as things started to grow, and as we went from a few dozen to a few hundred, to nearing thousands of servers and billions of page views, it became clear that we needed even more code to manage the code we had created to manage database performance. Which meant finding even more smart engineers, and in a hurry, because the issues were becoming bigger and more critical with each passing day.

We needed tools to make sense of the data our frameworks were generating. We needed a whole lot of automation to eliminate repetitive tasks, and to reduce the time taken to respond to failures. We needed, most of all, an easy way to visualize all of the information so we could see patterns in the data which would help us make more improvements. We tried hacks and patches, and many third-party solutions such as Splunk, to satisfy some of these needs. But to achieve all these items, and do them well, we would have needed a team many times the size of a typical startup, and my previous organization, which was a media organization, wouldn’t have footed the bill, as this wasn’t a ‘core’ strength!

So with a mission statement as broad as “commoditize database scalability,” we founded ScaleArc. We needed to find some guiding rules to define what we should build, and here’s what we came up with:

  • Plug and Play – If you want something to be widely adopted, it has to be effortless to try out, and should work with your existing apps and databases (hence the fact that ScaleArc is the only company in the database scalability space that supports MySQL, Microsoft SQL Server, AND Oracle).
  • Easy to deploy and operate – We took a comment by the author of Perl as our guiding principle on this. Ordinary day-to-day things should be easy, and hard things should be possible. It takes less than 15 minutes to set up ScaleArc iDB and create a cluster that can take application traffic on a bare x86 VM or box. It takes 5 minutes on AWS or Azure.
  • Transparent and Open – We chose to design the entire system around the concept that your data belongs to you. All config is stored unencrypted in SQLite and is open for editing even without our UI; all our logs are open, plain text; and the spec for which columns do what is also completely open. The same goes for the data we generate. All of our analytics magic produces 100% open data in SQLite. Also, starting with our next release, all functions are exposed as JSON APIs, and there are NO private APIs, which means everything we can do, you can do.
  • Great out of the box and extremely customizable – When you build a solution for the whole world, it’s got to work out of the box, and do what it’s supposed to do easily. You’ve also got to realize that the world is diverse, and hence it has to have a lot of adaptability to adjust to changing needs. This is a fundamental part of ScaleArc iDB, and the customization offered by our Rules Engine allows you to handle a very wide variety of database loads, and do what you need with ease.

With that, and a whole lot more that we discovered over the last four years, we continue our journey towards achieving that goal of making database scalability easier for the whole world. We’ve come a long way, and our customers would vouch for how ScaleArc has made it easier for them to operate their database infrastructure at scale, and given them the ability to spend more time on their core business goals, rather than writing more and more code to make their databases do what they’re supposed to do!