The Web that Grows
I just boarded a 16-hour flight from Dubai to SFO, so I’d like to take this opportunity to share what I’ve learned about the database scalability landscape over the last decade or so, and explain what molded ScaleArc into the company it is today.
When we started ScaleArc, one of the guiding principles of the company was to “commoditize database scalability.” Database scalability, I believe, is one of the fundamental building blocks of every successful web and enterprise app, and it becomes only more critical as the number of users, apps, and interactions on the open web continues to grow.
Yet database scalability remains relatively exotic: the domain of a few revered, highly paid “database experts” employed by the relatively few companies that form the bastion of the “open web.” Examples include the hundreds-strong MySQL ops and maintenance team at Facebook, which is the envy of the MySQL world, and the VTOCC team at Google/YouTube, which has found new and exciting ways to make MySQL scale. These teams contribute a lot of code to MySQL, so you might think they are spreading the promise of database scalability to everyone on the open web. In reality, though, much of what they push into the community is specific to their own use cases and has limited appeal to the more conventional “top 100” apps out there. And what they do release is only a fraction of the tooling they have built for themselves to scale MySQL.
At the Percona Live MySQL Expo this past April, I attended presentations by both Facebook and Google on how they achieve scale. They talked about the same things we do at ScaleArc: monitoring server response times in micro-detail to make effective load-management decisions, profiling SQL traffic to identify the smallest areas of improvement, focusing more on the “frequent” than on the “infrequent and slow,” replication awareness, how high availability can be achieved by controlling replication and, last but not least, how multi-level caching can really speed up SQL performance. All of this is done with a multitude of “tools and frameworks” – tools and frameworks that aren’t available for public use.
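To make the first of those ideas concrete, here is a minimal sketch of latency-aware read routing: track a sliding window of recent response times per replica and send each read to the replica with the lowest recent average. This is an illustration of the general technique, not any vendor’s actual implementation; the class name, window size, and replica labels are all made up for the example.

```python
from collections import deque


class ResponseTimeBalancer:
    """Route reads to the replica with the lowest recent average
    response time. A simplified sketch of latency-aware load
    management; real systems also weigh errors, lag, and capacity."""

    def __init__(self, replicas, window=50):
        self.replicas = list(replicas)
        # Sliding window of recent latency samples (ms) per replica.
        self.samples = {r: deque(maxlen=window) for r in self.replicas}

    def record(self, replica, latency_ms):
        """Feed in an observed query latency for a replica."""
        self.samples[replica].append(latency_ms)

    def average(self, replica):
        s = self.samples[replica]
        # Treat an unmeasured replica as fast so it gets probed.
        return sum(s) / len(s) if s else 0.0

    def pick(self):
        """Choose the replica with the lowest recent average latency."""
        return min(self.replicas, key=self.average)


balancer = ResponseTimeBalancer(["replica-1", "replica-2"])
balancer.record("replica-1", 12.0)
balancer.record("replica-1", 14.0)
balancer.record("replica-2", 3.0)
print(balancer.pick())  # replica-2: lower average latency
```

The sliding window is what makes this adaptive: a replica that slows down (backup job, replication catch-up) naturally stops receiving traffic as its recent average rises, without any explicit health-check logic.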
Now, I don’t question why Google and Facebook don’t release these tools in the open. I realize they would if they could, or if they felt doing so would make the world a better place. The tools come in many shapes and forms, but most are proprietary scripts and frameworks that will forever stay inside these companies, because they contain so much infrastructure- or app-specific code that releasing them might reveal a bit too much about how these companies operate. Other pieces are simply too fragmented (hacks and patch jobs) to be usable by anyone else. Unless an engineer decides to polish them for the rest of the world as his or her 20%-time project, they will stay internal. Facebook and Google have no vested interest in cleaning up those tools for you.
Shipping software is a lot harder than shipping services, so these tools remain exotic things Facebook and Google talk about on stage at conferences. They can’t help the independent designer who just fired up Magento and started selling “cool tees,” or the talented teen who wants her literary blog to show the world that she’s the next J.K. Rowling.
This is all very familiar to me because we did something very similar at my previous organization: we built homegrown solutions for everything from managing and monitoring MySQL replication and backups, to managing high availability, to building frameworks within our apps that used NoSQL datastores such as MongoDB and Memcached to improve the performance of the database stack. All of it was completely DIY and completely open source, all the way. Of a team of 150 engineers, my 10 brightest did nothing but DB scalability.
It was a great, liberating experience. But as we grew from a few dozen servers to a few hundred, and then toward thousands of servers and billions of page views, it became clear that we needed even more code to manage the code we had created to manage database performance. That meant finding even more smart engineers, and in a hurry, because the issues grew bigger and more critical with each passing day.
We needed tools to make sense of the data our frameworks were generating. We needed a whole lot of automation to eliminate repetitive tasks and to cut the time it took to respond to failures. Most of all, we needed an easy way to visualize all of this information, so we could see patterns in the data that would point to further improvements. We tried hacks and patches, and third-party solutions such as Splunk, to satisfy some of these needs. But to achieve all of this, and do it well, we would have needed a team many times the size of a typical startup, and my previous organization, a media company, wouldn’t have footed the bill, because this wasn’t a ‘core’ strength!
So with a mission statement as broad as “commoditize database scalability,” we founded ScaleArc. We needed to find some guiding rules to define what we should build, and here’s what we came up with: