Diagnosing Special Character Issues in Mysql

Posted by jason on June 15, 2011, 6:57 p.m.
Tags: mysql sysadmin utf8

Any webmaster who deals with Mysql and some website that uses PHP has probably run into character encoding issues at some point. Whether it's a user copying and pasting from Microsoft Word and getting those "curly quotes", converting a database from latin1 to utf8, importing data from another source, or just using the application normally--all of these things can lead to garbage characters showing up.

I'll show you a basic way to help diagnose the problem. The goal here is to find out what character set those garbage characters are in and to make them display correctly.

Log into your mysql database using the command line:

mysql -u  -p -h 

Let's see what character encoding the connection you just made is using:

mysql> show variables LIKE 'character_set_connection';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| character_set_connection | latin1 |
+--------------------------+--------+
1 row in set (0.00 sec)

mysql>

So we're using latin1, eh? Good to know. Now SELECT a row from your database that you know shows up as garbage:

mysql> select name from user WHERE id='1';
+----------------------+
| name                 |
+----------------------+
| Jos? Ram?n          |
+----------------------+
1 row in set (0.02 sec)

mysql>

OK, so we now that the data stored is not latin1--that's good. Let's change the character_set_connection (and collation) to utf8:

mysql> SET NAMES utf8;
Query OK, 0 rows affected (0.00 sec)

Press the 'up' key a few times and run that problematic query again:

mysql> select name from user WHERE id='1';
+----------------------+
| name                 |
+----------------------+
| José Ramón           |
+----------------------+
1 row in set (0.02 sec)

mysql>

Nice! The garbage characters show up properly now, which means that the data I just selected is utf-8 encoded--regardless of what the table or database encoding claims to be. So by doing these tests, it is a safe bet that the garbage characters are showing up because the connection's character set is wrong when fetching the data. The easy solution in this case it to run 'SET NAMES utf8;' when your database handle is initialized.


0 comments