Samurai MySQL: Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine

2010-03-30

Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine

The history tells that a single RDBMS node cannot handle tons of traffics on web system which come from all over the world, no matter how the database is tuned. MySQL has implemented a master/slave style replication built-in for long time, and it has enabled web applications to handle traffics using a scale-out strategy. Having many slaves has been suitable for web sites where most of traffics are reads. Thus, MySQL's master/slave replication has been used on many web sites, and is being used still.

However, when a site grow large, amount of traffic may exceed the replication's capacity. In such a case, people may use memcached. It's an in-memory, very fast and well-known KVS, key value store, and its read throughput is far better than MySQL. It's been used as a cache for web applications to store 'hot' data with MySQL as a back-end storage, as it can reduce read requests to MySQL dramatically.

While 1:N replication can scale read workload and memcached can reduce read requests, it cannot ease write load well. So, write traffic gets higher and higher when a web site becomes huge. On such web sites, a technique called "Sharding" has been used; it's a technique that the application choose an appropriate MySQL server from several servers.

In that way, MySQL+memcached has been a de-fact standard data store on huge web sites for long time.

Since web applications are getting larger still, especially on social media sites, write load is getting higher and higher as people communicate in real-time. In such area, yet another technique is required to handle the write load. Then, some people have chosen NoSQL solutions instead of MySQL+memcached. NoSQL is a kind of buzz word, IMHO, which represents non-relational databases which doesn't require SQL access. Despite lack of SQL access, some NoSQL softwares are suitable for huge scale web applications, like Cassandra. Although people cannot JOIN records on NoSQL system, it is not possible on RDBMS over the shards as well. So, MySQL isn't used as a RDBMS, is used as a data store without joins in other words, on such a web application in the first place.

For further information of this kind of thoughts, I recommend you to read Mark Calleghan's post: http://mysqlha.blogspot.com/2010/03/plays-well-with-others.html
and this post: http://nosql.mypopescu.com/post/407159447/cassandra-twitter-an-interview-with-ryan-king

Technically, it is possible to handle huge amount of traffics using MySQL, but a running cost gets expensive, Twitter says. As these techniques are separate ones, so those people have to spent their time to learn all of three who implement the application over them and manage them. On the other hand, Cassandra can handle more traffics as a single database management system, so people only have to learn it instead of three. Sounds great? But, is it a really good choice?

No! They're not aware of yet another solution, say SPIDER storage engine!

SPIDER for MySQL
http://spiderformysql.com/

SPIDER is a storage engine developed by a Japanese MySQL hacker, Mr. Kentoku Shiba, it makes use of MySQL's partitioning functionality and store partitioned data onto remote servers. I may say it's a Sharding storage engine. While flexibility of MySQL's storage engine API enables such an engine, but I value Kentoku's design a lot.

The following picture depicts how SPIDER storage engine works. (This is a snippet from the site above.)

In this entry, I do not explain how to use SPIDER storage engine, but I tell you how great its ability is. If you want to try it out, please refer to Giuseppe Maxia's post.

Please look at the following graph, which represents an INSERT performance comparing a single MySQL server (InnoDB), 2 SPIDER node + 2 backend MySQL server and 4 SPIDER node + 4 backend MySQL Server. You can see how good it scales.

The next graph is a SELECT performance. Read scales pretty good as well.

Red circles indicate where working set sizes exceed memory sizes. While performance drops when a working set size exceeds the available memory size, SPIDER is able to expand the memory so that a working set fits in it. SPIDER can make use of memory on all remote servers, as if there is a huge buffer pool in total.

For more information about SPIDER's performance test, please refer to Kentoku's slide. It's surprising.

The most significant problem for twitter is to scale out read/write load with less running cost. Unfortunately, they had chosen NoSQL solution due to the fact that "MySQL replication + memcached + sharding" cannot handle write intensive workload well. However, such a problem can be resolved using SPIDER storage engine with MySQL!

Generally, KVS cannot solve certain problems like below:

JOIN
Sort (ORDER BY)
Aggregation (GROUP BY)

When using KVS, these problems can be handled using MapReduce, however, we can process the same task using a very simple SQL in general. Thus, SQL allows us to develop a complex logic very efficiently. When I ask Kentoku permission to write an article about his storage engine, he told me his philosophy like below:

I think that the most significant benefit to use RDB is its usefulness and flexibility. It is a very important characteristic for developers in order to keep the application competitive, especially for those developers who have to add new features/functionalities day by day, like web services. I develop SPIDER storage engine in order to provide developers such useful and flexible RDB's characteristics, even on the environment where the traffic and data is huge thus Sharding is required.

I 100% agree with his opinion. If you are facing the problem caused by high traffic and huge data just like twitter, please consider to use SPIDER storage engine before migrating to NoSQL solutions.

19 comments:

AnonymousMarch 30, 2010 at 8:45 PM
Are there QPS stats for how Spider stacks up against other engines?

Jonas' stats for NDB have set the bar pretty high:
http://jonasoreland.blogspot.com/2010/03/fast-io-and-ndb.html
ReplyDelete
Replies
Mikiya OkunoApril 9, 2010 at 10:06 AM
Hi Gary,

No. While only InnoDB is compared to SPIDER+InnoDB in the slides, SPIDER can utilize any storage engines. Even NDB can be an underlying storage engine combined with SPIDER. So, comparison will not make sense.

Kind regards,
--
Mikiya
ReplyDelete
Replies
Internet MarketingJuly 26, 2010 at 12:56 PM
Thanks for sharing.Did you know that Spider is the first step when accessing a remote database and a storage engine?
ReplyDelete
Replies
AnonymousSeptember 21, 2010 at 6:44 AM
what about this thing called data integrity? if you're using sharding, there is no longer referential or transactional integrity, the results could be disastrous in an environment that's anything more than some blog site
ReplyDelete
Replies
comment systemJune 15, 2011 at 5:23 PM
I never heard about Spider storage engine, looks pretty good, do you think it can handle a traffic of, lets say, 1000 people per day?
ReplyDelete
Replies
UnknownJanuary 24, 2012 at 4:25 PM
Packers Movers Online is the free Packers Movers Directory to get the information about Packers And Movers, Movers And Packers in india, packing moving services providers in India.
ReplyDelete
Replies
Todaydownload.comJuly 18, 2012 at 9:02 PM
These facts are really interesting. Few of them were well known for me but many of them were brand new for me too!
I will print this one out and show to my friends because they will be definitely interested in that. Thanks!
MySQL
ReplyDelete
Replies
HammedJanuary 25, 2017 at 4:33 PM
Thank you for taking the time to provide us with your valuable information. We strive to provide our candidates with excellent care and we take your comments to heart.As always, we appreciate your confidence and trust in us.
SEO Company in Chennai
ReplyDelete
Replies
UnknownMarch 14, 2017 at 7:07 PM
Nice article and explanation is good,Thank you for sharing your experience on mysql.you have clearly explained about the process thus it is very much interesting and i got more information from your blog.For more information please visit our website..
Oracle Fusion Training Institute
ReplyDelete
Replies
UnknownJune 27, 2018 at 2:29 PM
Wow, this blog is very nice I really like your blog and i am Impressed thank you very much for posting this blog.
Please Click Here for more details...
Oracle ASCP Training in Al Karama, Dubai.
ReplyDelete
Replies
AnonymousJuly 18, 2019 at 7:56 AM
kindly check below blog for more details about spider storage engine

http://www.mafiree.com/blogs.php
ReplyDelete
Replies
Digital ShoutDecember 23, 2020 at 7:25 PM
Superb post, we enjoyed each and everything as per written in your post. Thank you for this informative article because it’s really helpful, We are the Best Digital marketing company who will help you with your plan’s to execute & optimize your marketing challenges. We will work & collaborate with senior members to track and measure performance, and provide results for all your Digital Marketing challenges
ReplyDelete
Replies
NRI LIFEApril 11, 2021 at 2:08 AM
It’s an amazing blog regarding the topic Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine.
I’m here to share with you a tip to Expand your Business @ Zero Cost! Post Free Ads For B2B & B2C Needs at nrilife.com!
ReplyDelete
Replies
NRI LIFEApril 27, 2021 at 2:57 AM
It’s an amazing blog regarding the topic Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine.
I’m here to share with you a tip to Expand your Business @ Zero Cost! Post Free Ads For B2B & B2C Needs at nrilife.com!
ReplyDelete
Replies
PhotoMama.inMay 12, 2021 at 5:04 PM
It’s an amazing blog regarding the topic regarding the topic Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine.
Photomama.in is one of The Best Candid Wedding Photographers in Hyderabad.
ReplyDelete
Replies
Edit Point IndiaMay 18, 2021 at 2:57 AM
Thanks For The Marvelous Blog Regarding The topic Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine.

For Any Wedding Invitation Video Templates
Visit editpointindia.in
ReplyDelete
Replies
AnonymousNovember 17, 2024 at 6:43 PM
Great and that i have a keen present: Where Do You Get The Money To Renovate A House home exterior makeover
ReplyDelete
Replies
DigisupermartAugust 6, 2025 at 1:31 AM
Your detailed exploration of sharding strategies, performance tuning, and trade‑offs between relational and NoSQL setups is both insightful and practical. Real‑world context around Uber’s architecture is particularly valuable.

At https://digisupermart.com, our digital services platform also focuses on scalable technical solutions, and this article offers excellent guidance for architects working on high‑traffic environments. Thank you for sharing such comprehensive content!

Digital Marketing Agency In Hyderabad
ReplyDelete
Replies
breekellyDecember 18, 2025 at 8:13 PM
TechSprint Innovations is the best digital marketing company in Warangal, offering result-driven solutions that help businesses grow online. With expertise in SEO, social media marketing, PPC advertising, content marketing, web design, and brand strategy, TechSprint Innovations delivers measurable results tailored to every client’s needs.
ReplyDelete
Replies

Add comment

Samurai MySQL

2010-03-30

Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine

19 comments:

Mikiya Okuno

Followers

Blog Archive

Labels

Privacy Policy