新聞中心
本文轉(zhuǎn)載自微信公眾號(hào)「DBA隨筆」,作者DBA隨筆。轉(zhuǎn)載本文請(qǐng)聯(lián)系DBA隨筆公眾號(hào)。

讓客戶滿意是我們工作的目標(biāo),不斷超越客戶的期望值來自于我們對(duì)這個(gè)行業(yè)的熱愛。我們立志把好的技術(shù)通過有效、簡(jiǎn)單的方式提供給客戶,將通過不懈努力成為客戶在信息化領(lǐng)域值得信任、有價(jià)值的長(zhǎng)期合作伙伴,公司提供的服務(wù)項(xiàng)目有:域名申請(qǐng)、虛擬主機(jī)、營(yíng)銷軟件、網(wǎng)站建設(shè)、棲霞網(wǎng)站維護(hù)、網(wǎng)站推廣。
今天在線上遇到了一個(gè)MySQL字符比較的問題,感覺很有意思,專門研究了下,估計(jì)大家都沒有遇到過,這里跟大家分享一下。
1.背景
背景介紹:
MySQL里面有一張表,根據(jù)where條件匹配查詢某一條記錄的時(shí)候,手誤輸入了一個(gè)空格,發(fā)現(xiàn)這一條數(shù)據(jù)仍然能查出來,我建了一個(gè)測(cè)試表,還原如下:
22:57:02> create table t00 (id int primary key,name varchar(10));
Query OK, 0 rows affected (0.01 sec)
22:57:11> insert into t00 values (1,'aaa'),(2,'bbb');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
22:57:22> select * from t00 where name='aaa';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
22:57:32> select * from t00 where name='aaa ';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
插入(1,'aaa')這條記錄,使用where='aaa'和'aaa '這兩個(gè)條件去匹配,居然都能夠查到這條記錄。
一開始我懷疑是這個(gè)8.0.19版本MySQL實(shí)例配置有問題,換了一個(gè)5.5低版本的MySQL實(shí)例,再次測(cè)試,還是復(fù)現(xiàn)這個(gè)問題??磥聿皇前姹旧系膯栴},一定是某種配置的問題。
晚上回到家,又用了自己搭建的一個(gè)8.0.22版本的MySQL實(shí)例重新執(zhí)行上面的命令,竟然驚奇的發(fā)現(xiàn),不復(fù)現(xiàn)了。暈死。8.0.22版本測(cè)試的結(jié)果是:
23:35:30>>select * from t0;
+------+------+
| id | name |
+------+------+
| 1 | aaa |
| 2 | bbb |
+------+------+
2 rows in set (0.01 sec)
23:35:34>>select * from t0 where name='aaa';
+------+------+
| id | name |
+------+------+
| 1 | aaa |
+------+------+
1 row in set (0.00 sec)
23:35:46>>select * from t0 where name='aaa ';
Empty set (0.00 sec)
2.分析思路
1)為什么'aaa'和'aaa '一樣?
首先我用命令在MySQL上檢測(cè)了一下這兩個(gè)字符串在MySQL中是否一樣:
### MySQL實(shí)例一
23:39:09> select 'aaa' = 'aaa ';
+------------------+
| 'aaa' = 'aaa ' |
+------------------+
| 1 |
+------------------+
1 row in set (0.00 sec)
### MySQL實(shí)例二
23:35:54>>select 'aaa' = 'aaa ';
+------------------+
| 'aaa' = 'aaa ' |
+------------------+
| 0 |
+------------------+
1 row in set (0.00 sec)
從上面的結(jié)果可以看出來,這兩個(gè)實(shí)例上,關(guān)于字符的比較規(guī)則不一樣。
到這里,可能部分同學(xué)就已經(jīng)知道答案了。不過還是往下再看看。
2)比較規(guī)則哪里不一樣?
我們可以用下面的命令,先看一下utf8相關(guān)的字符集下的比較規(guī)則,如下:
23:45:18> show collation like 'utf8%';
+----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation | Charset | Id | Default | Compiled | Sortlen | Pad_attribute |
+----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | 255 | Yes | Yes | 0 | NO PAD |
| utf8mb4_0900_as_ci | utf8mb4 | 305 | | Yes | 0 | NO PAD |
| utf8mb4_0900_as_cs | utf8mb4 | 278 | | Yes | 0 | NO PAD |
| utf8mb4_0900_bin | utf8mb4 | 309 | | Yes | 1 | NO PAD |
| utf8mb4_bin | utf8mb4 | 46 | | Yes | 1 | PAD SPACE |
| utf8mb4_croatian_ci | utf8mb4 | 245 | | Yes | 8 | PAD SPACE |
| utf8mb4_cs_0900_ai_ci | utf8mb4 | 266 | | Yes | 0 | NO PAD |
| utf8mb4_cs_0900_as_cs | utf8mb4 | 289 | | Yes | 0 | NO PAD |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 | PAD SPACE |
........
| utf8_vietnamese_ci | utf8 | 215 | | Yes | 8 | PAD SPACE |
+----------------------------+---------+-----+---------+----------+---------+---------------+
103 rows in set (0.00 sec)
在最后一列,我們可以看到一個(gè)pad屬性,這個(gè)屬性里面包含2個(gè)值,分別是no pad 和pad space。
3)嘗試去官方文檔中查找這倆屬性的意思
果然,不出意外,找到了一些蛛絲馬跡:
https://dev.mysql.com/doc/refman/8.0/en/char.html
To determine the pad attribute for a collation, use the INFORMATION_SCHEMA COLLATIONS table, which has a PAD_ATTRIBUTE column.
For nonbinary strings (CHAR, VARCHAR, and TEXT values), the string collation pad attribute determines treatment in comparisons of trailing spaces at the end of strings. NO PAD collations treat trailing spaces as significant in comparisons, like any other character. PAD SPACE collations treat trailing spaces as insignificant in comparisons; strings are compared without regard to trailing spaces.
上面這段話描述的意思大概是:
要確定排序規(guī)則的填充屬性,請(qǐng)使用 information_schema.collations 表,該表具有 pad_attribute 列。
對(duì)于非二進(jìn)制字符串(char,varchar和text),字符串的填充屬性決定了比較字符串末尾空格時(shí)的處理方式。
NO PAD 排序規(guī)則將尾隨空格視為重要的比較,更加嚴(yán)格,就像任何其他字符一樣;
PAD SPACE 排序規(guī)則在比較中將尾隨空格視為無關(guān)緊要,比較字符串時(shí)不考慮尾隨空格,也就是有無空格一個(gè)樣。
這里我們就可以根據(jù)實(shí)際使用的比較規(guī)則來查看對(duì)應(yīng)的pad屬性了:
先看實(shí)例一:
### MySQL實(shí)例一
00:01:31>show variables like '%colla%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.01 sec)
00:01:45>select collation_name,character_set_name,pad_attribute from information_schema.collations where collation_name like 'utf8_gen
eral_ci';
+-----------------+--------------------+---------------+
| collation_name | character_set_name | pad_attribute |
+-----------------+--------------------+---------------+
| utf8_general_ci | utf8 | PAD SPACE |
+-----------------+--------------------+---------------+
1 row in set (0.00 sec)
再來看實(shí)例二:
### 實(shí)例二
mysql--root@localhost:(none) 23:53:52>>show variables like '%colla%';
+-------------------------------+--------------------+
| Variable_name | Value |
+-------------------------------+--------------------+
| collation_connection | utf8mb4_0900_ai_ci |
| collation_database | utf8mb4_0900_ai_ci |
| collation_server | utf8mb4_0900_ai_ci |
| default_collation_for_utf8mb4 | utf8mb4_0900_ai_ci |
+-------------------------------+--------------------+
4 rows in set (0.00 sec)
00:03:47>>select collation_name,character_set_name,pad_attribute from information_schema.collations where collation_name like 'utf8mb4_0900_ai_ci';
+--------------------+--------------------+---------------+
| collation_name | character_set_name | pad_attribute |
+--------------------+--------------------+---------------+
| utf8mb4_0900_ai_ci | utf8mb4 | NO PAD |
+--------------------+--------------------+---------------+
1 row in set (0.00 sec)
到這里,真相大白。
實(shí)例一的連接比較規(guī)則是utf8_general_ci,對(duì)應(yīng)的填充規(guī)則是pad space屬性,代表字符比較過程中,末尾空格不重要,所以加不加空格結(jié)果都是一樣的;
實(shí)例二的連接比較規(guī)則是utf8mb4_0900_ai_ci,對(duì)應(yīng)的填充規(guī)則是no pad屬性,代表字符比較過程中,末尾空格重要,所以加不加空格結(jié)果不一樣。
3.如何讓字符匹配更嚴(yán)格?
1)修改連接的比較規(guī)則為utf8mb4_0900_ai_ci,當(dāng)然,這個(gè)修改需要搭配默認(rèn)字符集
這個(gè)方案比較容易理解,不贅述。
2)使用like模糊匹配進(jìn)行比較
3)where條件之前,添加binary關(guān)鍵字
上述2、3兩種方法可見下面的測(cè)試:
00:19:13>select * from t00;
+----+------+
| id | name |
+----+------+
| 1 | aaa |
| 2 | bbb |
+----+------+
2 rows in set (0.00 sec)
00:19:18>select * from t00 where name='aaa';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
00:19:28>select * from t00 where name='aaa ';
+----+------+
| id | name |
+----+------+
| 1 | aaa |
+----+------+
1 row in set (0.00 sec)
### 下面兩種方案,可以防止'aaa '匹配到'aaa'
00:19:31>select * from t00 where name like 'aaa ';
Empty set (0.00 sec)
00:19:57>select * from t00 where binary name = 'aaa ';
Empty set (0.00 sec)
今天文章就到這里吧。
新聞名稱:MySQL關(guān)閉,kill還是kill-9?
文章路徑:http://www.5511xx.com/article/dhjsepi.html


咨詢
建站咨詢
