However, when a site grow large, amount of traffic may exceed the replication's capacity. In such a case, people may use memcached. It's an in-memory, very fast and well-known KVS, key value store, and its read throughput is far better than MySQL. It's been used as a cache for web applications to store 'hot' data with MySQL as a back-end storage, as it can reduce read requests to MySQL dramatically.
While 1:N replication can scale read workload and memcached can reduce read requests, it cannot ease write load well. So, write traffic gets higher and higher when a web site becomes huge. On such web sites, a technique called "Sharding" has been used; it's a technique that the application choose an appropriate MySQL server from several servers.
In that way, MySQL+memcached has been a de-fact standard data store on huge web sites for long time.
Since web applications are getting larger still, especially on social media sites, write load is getting higher and higher as people communicate in real-time. In such area, yet another technique is required to handle the write load. Then, some people have chosen NoSQL solutions instead of MySQL+memcached. NoSQL is a kind of buzz word, IMHO, which represents non-relational databases which doesn't require SQL access. Despite lack of SQL access, some NoSQL softwares are suitable for huge scale web applications, like Cassandra. Although people cannot JOIN records on NoSQL system, it is not possible on RDBMS over the shards as well. So, MySQL isn't used as a RDBMS, is used as a data store without joins in other words, on such a web application in the first place.
For further information of this kind of thoughts, I recommend you to read Mark Calleghan's post: http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html
and this post: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king
Technically, it is possible to handle huge amount of traffics using MySQL, but a running cost gets expensive, Twitter says. As these techniques are separate ones, so those people have to spent their time to learn all of three who implement the application over them and manage them. On the other hand, Cassandra can handle more traffics as a single database management system, so people only have to learn it instead of three. Sounds great? But, is it a really good choice?
No! They're not aware of yet another solution, say SPIDER storage engine!
SPIDER for MySQL
SPIDER is a storage engine developed by a Japanese MySQL hacker, Mr. Kentoku Shiba, it makes use of MySQL's partitioning functionality and store partitioned data onto remote servers. I may say it's a Sharding storage engine. While flexibility of MySQL's storage engine API enables such an engine, but I value Kentoku's design a lot.
The following picture depicts how SPIDER storage engine works. (This is a snippet from the site above.)
Giuseppe Maxia's post.
Please look at the following graph, which represents an INSERT performance comparing a single MySQL server (InnoDB), 2 SPIDER node + 2 backend MySQL server and 4 SPIDER node + 4 backend MySQL Server. You can see how good it scales.
For more information about SPIDER's performance test, please refer to Kentoku's slide. It's surprising.
The most significant problem for twitter is to scale out read/write load with less running cost. Unfortunately, they had chosen NoSQL solution due to the fact that "MySQL replication + memcached + sharding" cannot handle write intensive workload well. However, such a problem can be resolved using SPIDER storage engine with MySQL!
Generally, KVS cannot solve certain problems like below:
- Sort (ORDER BY)
- Aggregation (GROUP BY)
I think that the most significant benefit to use RDB is its usefulness and flexibility. It is a very important characteristic for developers in order to keep the application competitive, especially for those developers who have to add new features/functionalities day by day, like web services. I develop SPIDER storage engine in order to provide developers such useful and flexible RDB's characteristics, even on the environment where the traffic and data is huge thus Sharding is required.I 100% agree with his opinion. If you are facing the problem caused by high traffic and huge data just like twitter, please consider to use SPIDER storage engine before migrating to NoSQL solutions.