Samurai MySQL

A Japanese Geek's MySQL Blog.

2010-03-30

Not Only NoSQL!! Uber Scaling-Out with SPIDER storage engine

History tells us that a single RDBMS node cannot handle the tons of traffic that a worldwide web system receives, no matter how well the database is tuned. MySQL has had master/slave replication built in for a long time, and it has enabled web applications to handle such traffic using a scale-out strategy. Having many slaves is well suited to web sites where most of the traffic is reads. Thus, MySQL's master/slave replication has been used on many web sites, and is still in use.

However, when a site grows large, the amount of traffic may exceed the replication's capacity. In such a case, people may use memcached. It's a very fast, well-known in-memory KVS (key-value store), and its read throughput is far better than MySQL's. It has been used as a cache for web applications to store 'hot' data, with MySQL as the back-end store, as it can reduce read requests to MySQL dramatically.

While 1:N replication can scale the read workload and memcached can reduce read requests, neither eases the write load much. So write traffic gets higher and higher as a web site becomes huge. On such web sites, a technique called "sharding" has been used: the application chooses an appropriate MySQL server from among several servers.

In that way, MySQL+memcached has been the de facto standard data store on huge web sites for a long time.

2009-12-30

awk and mysqldump

Awk, which has existed for a long time on UNIX-like operating systems, may sound legacy to people who live in the modern web-based programming world. However, awk is a really sophisticated tool that is useful for various purposes. Today, I'll show you how to use it with mysqldump ;)
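
For example, awk's range patterns make it easy to carve a single table out of a full dump. Here is a minimal sketch (the database name `mydb` and table name `customer` are placeholders); it relies on the `-- Table structure for table ...` comment and the `UNLOCK TABLES;` line that mysqldump emits around each table:

shell> mysqldump --single-transaction mydb > mydb.sql
shell> awk '/^-- Table structure for table `customer`/,/^UNLOCK TABLES;/' mydb.sql > customer.sql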

2009-12-28

MySQL Cheat Sheet v1.0

For those who are new to MySQL (or who have friends who are), I'm releasing a simple "MySQL Cheat Sheet". It prints nicely at A4 size. Print it out and put it on the wall in front of you ;)

You can download it from here: http://www.mysqlpracticewiki.com/files/cheat-sheet-en.pdf

The license is CC-BY-SA. You can redistribute it and modify it under the terms of that license.

Happy New Year!!

2009-07-15

Expectation For Chrome OS

I was really impressed by the news that Google plans to release "Chrome OS". When I heard it, I felt as if a big puzzle in my mind had been solved, as if I had been holding a keyhole and the news was the key. Most people may think that Chrome OS will simply be a Linux distribution that ships Chrome Browser. But I don't think so at all. It will never be such a boring OS, I guess!! When I imagine the upcoming OS, I get a bit excited.

While the technical details have not been published, I expect that the Chrome OS architecture will look like the figure below:

[Figure: my imagined Chrome OS architecture]
Please note that this is my personal expectation, so the actual architecture will differ ;) If it turns out to be totally different from this, please go ahead and laugh at this picture!

Anyway, let's go into the details of the imaginary Chrome OS architecture in my mind.

1. UI = Chrome Browser

I'm confident that Chrome Browser will be the only UI a user can access. There will be none of the UNIX shells we are familiar with. Geeks may say, "An OS without shells is not UNIX!" That's true. But Google doesn't target such geeks at all. All operations will be performed via Chrome Browser.

2. Local Web Server

If my assumption above is correct, Chrome OS will have a web server inside, like lighttpd or Apache, for its own use. Why? If the only UI a user can access is Chrome Browser, how does he or she shut down the computer? I expect that Chrome OS will have a system management page which allows the user to manage and configure the system, a kind of web application running on the local web server.

3. User Application Runs On Web Server

It goes without saying that Chrome OS will make use of Google's cloud. However, it's not a good idea to do everything, even trivial things, in the cloud. Will Chrome OS run a calculator application in the cloud? I don't think so. If Chrome OS did such a stupid thing, the Google cloud would not be able to handle the megatons of requests. Instead, such trivial applications will run on the client, using JavaScript or as web applications on the local web server. In either case, Chrome OS will have rich applications, because there are lots of web programmers all over the world who can develop them.

What makes this interesting is that if Chrome OS can open up a network port to the internal web server, users can access the applications running on each other's computers. This means that a desktop application suddenly turns out to be a web application! Sounds exciting? In order to keep this secure, Chrome OS may use OpenIDs provided by Google.

4. Good Bye Files

Probably, this will be the most significant change in Chrome OS. I expect that Chrome OS will not have any method for accessing files on the computer, like "Windows Explorer". Using files as a data store and arranging them on a tree-based filesystem is really primitive; such a method has been in use since dinosaurs ruled the earth. One has to guess the content of a file from its filename, but a filename does not always describe the content correctly, so managing data as files is messy. Do people really want to do that? No! They simply want to store data, keep it, and search it quickly when needed. Files on a tree-based filesystem are not the only way to manage data, and they are an inefficient one.

Instead, Chrome OS will store data in a database management system. Local web applications will access the database just as ordinary web applications make use of an underlying database today. Then people will move away from managing files.

If so, what kind of RDBMS will be used on Chrome OS? The most probable answer is SQLite. However, the problem is that SQLite cannot handle queries quickly when the data grows large. So, IMHO, Chrome OS should employ a more scalable RDBMS like MySQL, Firebird or PostgreSQL. As a MySQLer, I strongly recommend that Google use MySQL. If Google thinks that GPLv2 is not suitable for Chrome OS, and that's why MySQL would not be employed, please consider applying for the FOSS Exception, which enables an OSS project to release its software under a license other than GPLv2 if that license is listed in: http://www.mysql.com/about/legal/licensing/foss-exception/

5. Robust Security

On Chrome OS, a user will be unable to invoke programs the way we do now, and it will be unnecessary. Since the only program a user will access is Chrome Browser, the OS can invoke it in advance. The OS can start the browser as the "root" user, then drop privileges afterward using the setuid() system call. In this way, the Chrome Browser program image on the filesystem can be owned by root and be executable by root only. An ordinary user doesn't have to invoke any programs on Chrome OS, because everything can be done within the browser. An ordinary user doesn't even have to access any files on the computer at all.

In this case, even if the browser is cracked and operated by a malicious user, it cannot invoke any other programs! How secure such an OS would be! In addition, people will be able to give up scanning for viruses every day, because there will be no user programs to scan!

.....

The above is completely my personal imagination. I'm lost in wild fancies like this; however, if it comes true, Chrome OS can be recognized as "our attempt to re-think what operating systems should be," as described in the official blog. AFAICT, I will be a fan of Chrome OS if it is true :)

2009-04-06

Distribution Awareness - Must For MySQL Cluster

"Performance will go down when number of data nodes increases on MySQL Cluster setup"

Have you ever heard such a rumor? It is a little bit true, but mostly false. In most cases, such performance disadvantages are caused by a badly designed schema. Schema design is really important in order to bring out MySQL Cluster's performance.

MySQL Cluster stores and distributes rows among its node groups according to each row's PRIMARY KEY. If there is no bias in the PRIMARY KEY values, tables are distributed evenly. In other words, MySQL Cluster stores data much like sharding does.
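
For illustration, here is a minimal sketch (the table and columns are hypothetical). By default, an NDB table is distributed by a hash over its entire primary key:

mysql> CREATE TABLE t1 (
    ->   user_id INT UNSIGNED NOT NULL,
    ->   name VARCHAR(64),
    ->   PRIMARY KEY (user_id)
    -> ) ENGINE=NDBCLUSTER;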

Lookups by PRIMARY KEY are lightning fast on MySQL Cluster, because the SQL node knows which data node has the target row. On the other hand, if a table is accessed without the PK, the SQL node doesn't know where the target rows are stored and has to access all the data nodes, so performance goes down. Look at the two figures below:

[Figure: Lookup by PK]

[Figure: Read without PK]
When looking up by PK, the SQL node sends a single request to the data node which has the row, and that data node responds. However, when looking up without the PK specified, the data node which receives the request forwards it to the other data nodes. This increases the network traffic between nodes significantly. (These two operations are handled completely differently within the NDB API.)
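
To make this concrete with the hypothetical t1 table above, the two access patterns look like this:

mysql> SELECT name FROM t1 WHERE user_id = 42;     -- PK lookup: routed to a single data node
mysql> SELECT user_id FROM t1 WHERE name = 'Bob';  -- no PK in the condition: every data node is scanned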

When an SQL node starts a transaction, it selects one data node as the transaction coordinator (TC). The TC is responsible for the transaction until it finishes, and all communication between the SQL node and the data nodes goes through the TC. So, even if lookups are based on the PK, additional communication between data nodes may be required when accessing the second and later tables within a transaction. See the figure below.

[Figure: transaction communication through the TC]

These problems are exposed when the number of data nodes increases; the network then becomes the bottleneck. The former case generates a lot of network traffic, and the latter case hurts when more than one table is updated within a transaction.

MySQL Cluster, as of 6.3, implements a feature which can mitigate these problems, called Distribution Awareness. With Distribution Awareness, developers can design the schema so that tables are distributed by any column included in the PK. We can expect tables to be accessed more efficiently with Distribution Awareness.
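
For example (again a hypothetical table, not part of the benchmark below), a table whose PK is (userid, seq) can be distributed by userid alone, so that all rows for one user land in the same node group:

mysql> CREATE TABLE message (
    ->   userid INT UNSIGNED NOT NULL,
    ->   seq INT UNSIGNED NOT NULL,
    ->   body VARCHAR(255),
    ->   PRIMARY KEY (userid, seq)
    -> ) ENGINE=NDBCLUSTER PARTITION BY KEY (userid);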

We can see the effect of Distribution Awareness with benchmark tests such as DBT-2. When I tested it on my PC, I got the following results:

[Figure: DBT-2 results with and without Distribution Awareness]

This benchmark was measured on a single machine, with all data nodes running on the same OS instance. While it doesn't reflect real-world performance for that reason, it should help in understanding how Distribution Awareness impacts performance.

Benchmark environment overview:
  • OS: Ubuntu 8.10
  • MySQL Cluster: 6.3.23
  • CPU: AMD, 2 cores, 2.3GHz
  • Physical Memory: 8GB
  • HDD: SATA 7200rpm
  • DataMemory=400MB
  • IndexMemory=64MB
  • NoOfReplicas=2
  • Data Nodes: 10 (5 node groups)
  • SQL Node: 1
  • Benchmark Software: DBT-2 (# warehouses = 5)

Each result stands for the following.

InnoDB

This is just for comparison. MySQL Server is configured with innodb_buffer_pool_size=4G so that all data is cached in memory.

NDB w/o DA

Distribution Awareness does not take effect. Data is loaded as MyISAM first, then simply converted to NDB using ALTER TABLE tbl_name ENGINE=NDB statements.

NDB w/DA

Distribution Awareness takes effect. Most tables in the DBT-2 test have a column that refers to the PK of the `warehouse` table, and that column is included in their PK. So I altered the tables using the following statements so that they are distributed by warehouse id.
mysql> alter table customer engine ndb partition by key (c_w_id);
mysql> alter table district engine ndb partition by key (d_w_id);
mysql> alter table history add h_rowid int unsigned not null auto_increment, add primary key (h_rowid, h_w_id);
mysql> alter table history engine ndb partition by key (h_w_id);
mysql> alter table item engine ndb;
mysql> alter table new_order engine ndb partition by key (no_w_id);
mysql> alter table order_line engine ndb partition by key (ol_w_id);
mysql> alter table orders engine ndb partition by key (o_w_id);

You can distribute rows among data nodes based on a given key using the PARTITION BY KEY clause. Since the key must be included in the explicitly defined PK in order to use PARTITION BY KEY, I added a new AUTO_INCREMENT column to the `history` table.

NDB w/full DA

Since the `item` table is not correlated with the `warehouse` table at all, Distribution Awareness will not take effect when `item` and other tables are accessed within a single transaction. In order to make Distribution Awareness always effective, I converted the `item` and `warehouse` tables to MyISAM.
mysql> alter table item engine myisam;
mysql> alter table warehouse engine myisam;

NDB single node

This is just for comparison, with the following configuration.
DataMemory=4GB
IndexMemory=640MB
NoOfReplicas=1


Considerations

You can see a big difference with and without Distribution Awareness. The result was best when all transactions were fully distribution aware, which was achieved by converting the uncorrelated tables to MyISAM. Since `item` is essentially a read-only table, deploying it as a MyISAM table is practical even in production.

MySQL Cluster's shared-nothing architecture is very similar to sharding. So when tables are defined like shards and accessed like shards, MySQL Cluster demonstrates its best performance. For example, a sharded application accesses its underlying databases by using a user id to determine which database node has the required data; in that case, the user id is always required when accessing the database. Similarly, in order to make an SQL query distribution aware on MySQL Cluster, a condition like "AND userid = 12345" should always be specified.
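
Continuing the hypothetical message table from above, the difference looks like this:

mysql> SELECT body FROM message WHERE userid = 12345 AND seq = 1; -- distribution aware: one node group
mysql> SELECT body FROM message WHERE seq = 1;                    -- not aware: touches all data nodes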

If Distribution Awareness does not take effect, the network becomes a bottleneck as the number of data nodes increases, and performance degrades as nodes are added. On the other hand, if Distribution Awareness takes effect properly, the network will rarely be the bottleneck, so performance is likely to scale as data nodes are added. Whether your tables and queries can be made distribution aware is therefore a very important point in deciding whether to use MySQL Cluster for your application.

You can see whether Distribution Awareness is effective using the EXPLAIN PARTITIONS statement. If only a single partition is listed, the query is distribution aware!

mysql> EXPLAIN PARTITIONS SELECT no_o_id FROM new_order WHERE no_w_id = 2 AND no_d_id = 7\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: new_order
   partitions: p5    <--- OK if only a single partition is listed like this
         type: ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 8
          ref: const,const
         rows: 10
        Extra:
1 row in set (0.00 sec)

The benchmark was done on a single machine, so communication between nodes is faster than in a real deployment because it doesn't go over a network. If the cluster is configured across multiple machines, the difference between with and without Distribution Awareness will be even more significant.

You may think, "MySQL Cluster is messy because I have to take care of its distribution... Sharding is easier, isn't it?" However, with sharding you have to implement your own logic to spread the load between databases, while on MySQL Cluster you can achieve the same thing by tuning your tables and queries a little. So the trade-off is "implementing your own sharding logic" vs. "tuning tables and queries". Which is suitable for you? ;)


Conclusion

As the number of data nodes increases...
  • Performance goes down without Distribution Awareness.
  • Performance scales with Distribution Awareness.

In order to make use of Distribution Awareness, you have to tune both tables and queries:
  • CREATE TABLE ... ENGINE NDB PARTITION BY KEY (userid);
  • SELECT ... WHERE ... AND userid = 12345;

You can determine whether Distribution Awareness takes effect using:
  • EXPLAIN PARTITIONS SELECT ...\G

Your application may scale if it is distribution aware. MySQL Cluster's Distribution Awareness comes in handy because you don't have to implement additional logic in your application. So please consider using MySQL Cluster if you are going to shard your application.

Enjoy!

2009-03-20

Setting Up DBT-2

DBT-2 is a TPC-C-like OLTP benchmark that is very popular among MySQL users. It is used by the MySQL QA team to test stability and performance before each release. However, the steps to set up DBT-2 are a little bit messy, and its README files include some incorrect information. So I'll walk you through the steps below:

1. Download it!

You can download the source code from here: http://osdldbt.sourceforge.net/

2. Required packages

The following Perl packages are required to build DBT-2. Unfortunately, the configure script doesn't complain even if they are missing. Install them using, e.g., CPAN.

shell> sudo cpan Statistics::Descriptive
shell> sudo cpan Test::Parser
shell> sudo cpan Test::Reporter

If you want to make graphs from the output, you have to install gnuplot in advance. E.g., Ubuntu users can install it like below:

shell> sudo apt-get install gnuplot

3. Compile it

You can build the software in the very standard way: configure and make. For MySQL users, the --with-mysql option is mandatory. If you want to specify the install path, use the --prefix option.

shell> ./configure --with-mysql=/usr/local/mysql [options] && make
shell> sudo make install

Ubuntu users may need to install the build-essential package in advance.

shell> sudo apt-get install build-essential

4. Generate and load data

shell> datagen -w 10 -d /var/tmp/dbt2 --mysql

datagen is a command that generates data files under the given directory. The -w option specifies the number of warehouses; a larger number means more data. If you want to run a CPU-bound test, use a small value.

shell> scripts/mysql/build_db.sh \
-w 3 -d dbt2 -f /var/tmp/dbt2 -s /tmp/mysql.sock \
-h localhost -u user -p password -e INNODB

Then you can load the data generated by the datagen program using build_db.sh, located in the scripts/mysql directory of the source tree. The following shows the size of the data with 10 warehouses, as an example.

mysql> select table_name, sum(data_length) from information_schema.tables where table_schema='dbt2' group by table_name with rollup;
+------------+------------------+
| table_name | sum(data_length) |
+------------+------------------+
| customer   |        200081408 |
| district   |            16384 |
| history    |         25755648 |
| item       |         11026432 |
| new_order  |          4702208 |
| orders     |         22626304 |
| order_line |        314572800 |
| stock      |        381648896 |
| warehouse  |            16384 |
| NULL       |        960446464 |
+------------+------------------+
10 rows in set (0.54 sec)

5. Run benchmark

shell> export USE_PGPOOL=0
shell> export LD_LIBRARY_PATH=/usr/local/mysql/lib
shell> scripts/run_workload.sh -c 16 -d 600 -n -w 10 -s 10 -u user -x password

The benchmark is run using the run_workload.sh script. This script accepts the following options.

-c ... thread concurrency
-d ... duration
-n ... no thinking time; do not miss this option if you want to see the database's real performance ;)
-w ... number of warehouses
-s ... delay before starting each new thread, in milliseconds
-u ... MySQL user name
-x ... MySQL password
-l ... MySQL TCP/IP port number
-H ... MySQL server hostname

Then the benchmark runs for the period specified by the -d option and generates data under the ./output directory. You may see an error like the following. This error indicates that no data was written to the ./output directory; it is not a Test::Parser::Dbt2.pm problem at all!

Can't use an undefined value as an ARRAY reference at /usr/local/share/perl/5.10.0/Test/Parser/Dbt2.pm line 521.

One reason this error appears is that libmysqlclient.so wasn't loaded correctly. Don't forget to specify the right path to the library before running the benchmark.

shell> export LD_LIBRARY_PATH=/usr/local/mysql/lib


Enjoy!!