返回首页

Table Design and MySQL Index Details—表的设计以及MYSQ

时间:2010-08-28 21:07来源:未知 作者:admin 点击:
Its not often I get to work with a true homemade database design. In this case the customer (or their developer to be accurate) had designed the whole database backend of their website from scratch. However, problems arose as usual: An overly powerful ser
  

It’s not often I get to work with a true homemade database design. In this case the customer (or their developer to be accurate) had designed the whole database backend of their website from scratch. However, problems arose as usual: An overly powerful server seeing sustained high loads and CPU usage by MySQL—which is actually an understatement given that the load would spike to 50+. After enabling slow query logging and examining the log a day later one query stood out in terms of frequency:

我工作时经常不用自己设计的数据库.在这种情况客户(准确的说是他们的开发者)已经为他们的后台设计了整个后端数据库.然而,问题还是像平常一样出 现:一个很强大的服务器看上去仍然超负荷了,并且CPU的使用率实际上是MYSQL的使用提高引起的.使用慢查询日志之后,检查一天的LOG后,频繁的出 现一个 SQL QUERY.

# Query_time: 7 Lock_time: 1 Rows_sent: 2 Rows_examined: 45023
SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;

Seven seconds is bad for a query this simple, as is examining 45,000 rows to return 2. Furthermore, this query was logged over 4,000 times in one day, meaning MySQL spent about 28,000 seconds a day on this query, or 7 hours. Let’s first consider design and later we’ll see how it effects the details of indexes. (You may notice the numbers between examples don’t match; this is because some of the examples were really taken from the production server and others were re-created lab experiements.)

7秒对于这么简单的查询是不理想的,它查询了45000条数据,仅返回2条数据.此外,这个请求一条被记录4000次,意味着MYSQL一天花费了 28000秒在这个请求上,大约7个小时.让我们首先考虑一下设计,然后我们将看一下怎么建立更有效的索引.(你也许注意到事例中的数字不对;那是因为一 些例子)
mysql> DESCRIBE hints;
+————+——————+——+—–+———+—————-+
| Field      | Type             | Null | Key | Default | Extra          |
+————+——————+——+—–+———+—————-+
| hints_id   | int(10) unsigned |      | PRI | NULL    | auto_increment |
| game_id    | int(11)          |      |     | 0       |                |
| hint_title | text             |      |     |         |                |
| hint       | text             |      |     |         |                |
| status     | text             |      |     |         |                |
| date       | text             |      |     |         |                |
+————+——————+——+—–+———+—————-+
Unless there’s some special condition I was not made aware of, the date column should not be type TEXT. What complicates the issue more is that the data in the date column is not normalized. That is, there are different representations of data (dates) in the column: Some dates are typical YYYY-MM-DD format and others are Unix timestamps. The first issue this creates is increased code complexity: Having to account for data that may be in different formats. Secondly, it’s space inefficient. A date as TEXT will require 10 + 2 bytes or, times 45,000 rows, 540k. As a DATE column type, 3 bytes or 135k. The third issue involves fixinig the query: How to index a date column as TEXT. Indexes on TEXT columns require a prefix length; that is, since TEXT columns are truly variable you have to tell MySQL how much of it you want to index. In this case, since the data is not normalized and everything counts down to the last second for dates and times, we would have to index 10 bytes, another 450k for the index (minus compression). Finally, it’s error prone: A DATE column type will reliaibly have a date. A TEXT column may have a DATE and a recipe for biscuits. Perhaps I’m just being overly zealous and picky; afterall, what’s another 450k here or there, or a little more code to determine if the date is DATE or a Unix timestamp? Nothing, actually, but the point is: Why add these complications when they’re completely unnecessary? Just use a DATE colum type. In short: KISS.

除非有一些我不知道的情况,否则DATE列不应该是text类型的.使问题更复杂的是date列的数据不规范.有同一个列中有不同的数据描述:一些 日期是典型的YYYY-MM-DD格式,然而另一些是timstamps格式.首先这个问题产生代码复杂度:必须描述不同的日期格式.第二,浪费空间.一 个TEXT类型 的日期要12个字节,45000倍就是540K.如果是DATE列,3个字节,135K.第三
是问题包含确定请求:怎么去索引一个TEXT类型的日期列.在TEXT列上索引需要前缀长度.因为TEXT是可变的,你必须告诉MYSQL你索引它的多少位.另外450K.最后,它很容易出错.

Index Details
mysql> SHOW INDEXES FROM hints;
+——-+————+———-+————–+————-+————-+———-+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Cardinality | Sub_part |
+——-+————+———-+————–+————-+————-+———-+
| hints |          0 | PRIMARY |            1 | hints_id    |       45021 |     NULL |
+——-+————+———-+————–+————-+————-+———-+

mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;
+——-+——+—————+——+———+——+——-+—————————–+
| table | type | possible_keys | key | key_len | ref | rows | Extra                       |
+——-+——+—————+——+———+——+——-+—————————–+
| hints | ALL | NULL          | NULL |    NULL | NULL | 45021 | Using where; Using filesort |
+——-+——+—————+——+———+——+——-+—————————–+
MySQL executes the query by doing a full table scan (type: ALL) plus a filesort (caused by ORDER BY). Not surprising since there are no indexes MySQL can use (possible_keys: NULL). We at least want to prevent a table scan and reduce the number of rows examined. To do this we can create an index, but the result isn’t going to be as spectacular as in was in Case 1: Basic Indexes:

MSYQL以全表扫描方式再加上filesort(order by 引起的)执行请求 .     如果MYSQL没有使用索引,你不要吃惊.我们至少要阻止全表扫描, 减少检查的行数.为达到目的,我们建一个索引,但是结果并没有达到在第一例中的那么大的效果.

mysql> CREATE INDEX game_id_date ON hints (game_id, date(10));
Query OK, 1001 rows affected (0.04 sec)
Records: 1001 Duplicates: 0 Warnings: 0

mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;
+——-+——+—————+————–+———+——-+——+—————————–+
| table | type | possible_keys | key          | key_len | ref   | rows | Extra                       |
+——-+——+—————+————–+———+——-+——+—————————–+
| hints | ref | game_id_date | game_id_date |       5 | const |    1 | Using where; Using filesort |
+——-+——+—————+————–+———+——-+——+—————————–+

Success: No table scan and only one row examined. However, Extra doesn’t say “Using index” even though our multi-column key includes both game_id and date. MySQL won’t retrieve the values from the index because “SELECT * ” requires more values: It requires all 6 columns for every matching row but the index only has 2 columns. In Case 1 only columns that were part of the key were included in the SELECT criteria, which is why things worked out better there than here. If we alter the table we can at least get rid of the filesort. The filesort is caused by date being TEXT. If we drop the index, normalize the data, change the column type to DATE, and re-add the index (these commands left out for brevity):

成功:没有全表扫描,仅有一行检查.然而,extra 没有说’usring index’,尽管我们的多列键包含了game_id,date.MYSQL没有从索引中检索值,因为”SELECT *” 需要列多的值:它需要所有的6列为每一个匹配行,但是索引仅有2列.在CASE 1(第一个讲座中)仅有一的列是包含的SELECT 列的一部分.如果我们改变表,我们至少能去掉usring filesort.filesort是因为date的TEXT引起的.如果我们删除索引,格式化数据,把text改成date,然后再重新加上索引:
mysql> EXPLAIN SELECT * FROM hints WHERE game_id = 374 ORDER BY date DESC;
+——-+——+—————+————–+———+——-+——+————-+
| table | type | possible_keys | key          | key_len | ref   | rows | Extra       |
+——-+——+—————+————–+———+——-+——+————-+
| hints | ref | game_id_date | game_id_date |       5 | const |    1 | Using where |
+——-+——+—————+————–+———+——-+——+————-+
Not a slow query anymore. And it goes to show that using efficient column types is important, unless you like filesorts.

将不再是一个慢查询.它展示出了使用一个高效的列类型是多么重要.

A Bigger, Slower Query

一个更大的,更慢的查询.
The following query was logged about 3,700 in the same day:

下列的查询请求在一天内被记录了3700次.

# Query_time: 8 Lock_time: 0 Rows_sent: 1 Rows_examined: 303908
SELECT * FROM links WHERE
game_id = 5 OR (other_id = 200 AND class = ‘articles’) AND
has_link = ‘true’ LIMIT 0,1;

Of the slow queries logged (there were four total), this query was first in terms of the number of rows examined (the previous query we looked at was second). Let’s start by looking at the existing indexes and EXPLAIN (the table structure won’t help us and it’s sufficient to say that any column ending in “_id” is an INTEGER and all others are, unfortunately, TEXT):

这个查询被记录的很多,检查的行也很多.让我们首先来看一下是否有索引,然再再EXPLAIN.
mysql> SHOW INDEXES FROM links;
+——-+————+———-+————–+————-+————-+———-+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Cardinality | Sub_part |
+——-+————+———-+————–+————-+————-+———-+
| links |          0 | PRIMARY |            1 | links_id    |      307551 |     NULL |
| links |          1 | other_id |            1 | other_id    |        NULL |     NULL |
| links |          1 | game_id |            1 | game_id     |        NULL |     NULL |
+——-+————+———-+————–+————-+————-+———-+

mysql> EXPLAIN SELECT * FROM links WHERE
-> game_id = 5 OR (other_id = 200 AND class = ‘articles’) AND
-> has_link = ‘true’ LIMIT 0,1;
+——-+——+——————+——+———+——+——–+————-+
| table | type | possible_keys    | key | key_len | ref | rows   | Extra       |
+——-+——+——————+——+———+——+——–+————-+
| links | ALL | other_id,game_id | NULL |    NULL | NULL | 307551 | Using where |
+——-+——+——————+——+———+——+——–+————-+
Of course EXPLAIN predicts a full table scan (type: ALL) because there’s no suitable key to do otherwise (key: NULL). Something has to be done about this query but simply adding an index like we usually do isn’t so easy anymore because there’s multiple conditions. In this case we get lucky:

当然EXPLAIN预计一个整表扫描,因为没有合适的键.这个请求有一些事情要做了,而不是简单的像我们以前那样加个索引,因为它有多个条件.在这样情况下,我们得到幸运:
mysql> SELECT DISTINCT(has_link) FROM links;
+———-+
| has_link |
+———-+
| true     |
+———-+
Ah ha! If every has_link = ‘true’ then we probably don’t need that AND condition. (Not to mention, referring to effecient column types again, has_link as TEXT 1.8 Megs, as BOOL (TINYINT) 308k.) Now the query becomes:

哈哈!如果每一个has_link=’true’,我们可以不需要AND条件了.请求变成:
SELECT * FROM links WHERE game_id = 5 OR (other_id = 200 AND class = ‘articles’) LIMIT 0,1;

We would like MySQL to use an index to find matching rows (avoid a table scan) but this won’t be possible given the OR conditions. For it to be possible, the conditions have to form a leftmost prefix of an index. In this case there are two conditions, one on each side of OR. You can’t have one leftmost prefix OR another leftmost prefix for the same index. In other words, an index can’t start with game_id OR other_id. Therefore, MySQL 4.x cannot use an index with this query. We’re left with two options: Reduce the query even further or use MySQL 5.x.

我们将让MYSQL使用一个索引去找到匹配的行,来避免全表扫描,但这样OR条件将不能使用索引.为了可能,条件必须以一个索引的最左前缀的形式. 在这个例子中,有两个条件,OR的两边各一个条件.同一个索引中,你不可能把OR的左边和右边都设为最多前缀.换句话说,一个索引要么以game_id开 头要么以other_id.所以,MYSQL4不能在这个请求中用索引.

In this case we can simply make the query into two queries:
mysql> EXPLAIN SELECT * FROM links WHERE game_id = 5 LIMIT 0,1;
+——-+——+—————+——+———+——-+——+————-+
| table | type | possible_keys | key | key_len | ref   | rows | Extra       |
+——-+——+—————+——+———+——-+——+————-+
| links | ref | gid           | gid |       5 | const |   36 | Using where |
+——-+——+—————+——+———+——-+——+————-+

mysql> EXPLAIN SELECT * FROM links WHERE other_id = 200 AND class = ‘articles’ LIMIT 0,1;
+——-+——+—————+——+———+——-+——+————-+
| table | type | possible_keys | key | key_len | ref   | rows | Extra       |
+——-+——+—————+——+———+——-+——+————-+
| links | ref | oid           | oid |       5 | const |    1 | Using where |
+——-+——+—————+——+———+——-+——+————-+
This is a much better approach: These two queries can execute in a fraction of the time than the original. And since either or will match what we’re looking for we can try the first query, if it matches we’re done, if not try the second.

这样更加接近:

在这种情况下,在MYQL5 中支持 index merge.

Or we can use the index merge ability in MySQL 5. Don’t feel alone if you’ve ever wanted to yell at MySQL 4 for not just using multiple indexes when it was clear that was the solution. MySQL 5 has heard our lament:
mysql> EXPLAIN SELECT * FROM links WHERE game_id = 5 OR (other_id = 200 AND class = ‘articles’) LIMIT 0,1;
+—-+————-+——-+————-+—————+———+———+——+——+————-+
| id | select_type | table | type        | possible_keys | key     | key_len | ref | rows | Extra       |
+—-+————-+——-+————-+—————+———+———+——+——+————-+
| 1 | SIMPLE      | links | index_merge | gid,oid       | gid,oid | 5,5     | NULL |   37 | Using where |
+—-+————-+——-+————-+—————+———+———+——+——+————-+
Notice in the previous two queries the first matched 36 rows, the second 1 for 37 total and this index merge matches 37 rows—fun stuff huh? Makes you feel better that “it all adds up.”

Final Thoughts
The customer had the correct idea: Redesign the whole database. Although the queries could have been fixed well enough to keep the server load reasonable, this is a case where form hinders function. A well designed database—following normal form rules, normalizing data, using efficient column types, reducing query conditions—is much easier to optimze.

顶一下
(1)
100%
踩一下
(0)
0%
------分隔线----------------------------
最新评论 查看所有评论
发表评论 查看所有评论
请自觉遵守互联网相关的政策法规,严禁发布色情、暴力、反动的言论。
评价:
表情:
用户名: 密码: 验证码:
发布者资料
小朱 查看详细资料 发送留言 加为好友 用户等级:超级会员 注册时间:2008-11-18 17:11 最后登录:2012-02-06 13:02
推荐内容
热点内容