MySQL Ignores Right Padded Spaces?


When working on an established Rails project, we noticed that trailing spaces weren’t getting trimmed off. Discussion ensued about whether to do data cleanup and then someone noticed that the where method in Rails didn’t care about the spaces. Not having seen this in my PostgreSQL days, I was assuming that there was some magic happening in either the Rails codebase or the MySQL adapter. So I went to the underlying DB: No, searching for column='hi' found values where the column value was actually 'hi '.

In searching on StackOverflow I found this answer states that MySQL ignores trailing spaces in string comparison, echoed in the MySQL 5.7 documentation:

All MySQL collations are of type PAD SPACE. This means that all CHARVARCHAR, and TEXT values are compared without regard to any trailing spaces. “Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant. 

Of course, I’m using MySQL 8.0 and so is the app. So I dug into the collation, setting up the following table:

mysql> select concat('"', thing, '"'), unthing from surelynot;
+-------------------------+---------+
| concat('"', thing, '"') | unthing |
+-------------------------+---------+
| "hi "                   | what    |
| " hi "                  | what2   |
| " hi"                   | what3   |
| "hi"                    | what4   |
+-------------------------+---------+
4 rows in set (0.02 sec)

Setting the collation to MySQL 8.x’s default, I get one row for 'hi':

mysql> alter table surelynot convert to character set utf8mb4 collate utf8mb4_0900_ai_ci;
Query OK, 4 rows affected (0.16 sec)
Records: 4  Duplicates: 0  Warnings: 0

mysql> select concat('"', thing, '"'), unthing from surelynot where thing = 'hi';
+-------------------------+---------+
| concat('"', thing, '"') | unthing |
+-------------------------+---------+
| "hi"                    | what4   |
+-------------------------+---------+
1 row in set (0.05 sec)

However, using the latin1 or latin2 collation that was on the other database, I get two results for 'hi', one for the exact match, and one for the right-padded. Note that left-padding has no impact on the results:

mysql> select concat('"', thing, '"'), unthing from surelynot where thing = 'hi';
+-------------------------+---------+
| concat('"', thing, '"') | unthing |
+-------------------------+---------+
| "hi "                   | what    |
| "hi"                    | what4   |
+-------------------------+---------+
2 rows in set (0.00 sec)

The ucs2_general_ci also has this property as does the utf8_general_ci collation. Ultimately, only the default utf8mb4_0900_ai_ci paid attention to right padded spaces when looking for string equality in the where clause.


Leave a Reply

%d bloggers like this: