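So apparently everyone but me knew that in MySQL, the utf8 character set is not true UTF-8 but a broken subset that supports only characters of up to three bytes, so it cannot store four-byte characters such as emoji. Also, apparently utf8_general_ci is hopelessly defective (its simplified comparison rules don't follow the Unicode Collation Algorithm) and should never be used.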
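To see which characters fall outside MySQL's three-byte utf8 (also called utf8mb3), it is enough to check their UTF-8 encoded length. The sketch below assumes that any character needing four bytes, which means any code point above U+FFFF, is one that utf8 cannot hold. The helper names are made up for illustration.

```python
# Sketch: which characters would MySQL's 3-byte "utf8" (utf8mb3) be unable to store?
# Assumption: a character is a problem if its UTF-8 encoding needs 4 bytes,
# which is the case for every code point above U+FFFF (emoji, many CJK extensions, etc.).


def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character in text needs 4 bytes, i.e. the column must be utf8mb4."""
    return any(utf8_length(ch) == 4 for ch in text)


if __name__ == "__main__":
    # Print each sample's code point and encoded size.
    for sample in ["A", "é", "€", "😀", "𝄞"]:
        print(repr(sample), hex(ord(sample)), utf8_length(sample), "bytes")
    print(needs_utf8mb4("Hello, world"))
    print(needs_utf8mb4("Hello 😀"))
```

Plain ASCII, accented Latin letters and the euro sign all fit in three bytes or fewer; the emoji and the musical symbol need four, which is exactly the range the old utf8 charset leaves out.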
Hmmmph.
What you really want is utf8mb4 and utf8mb4_unicode_ci. This gives you true UTF-8 support and standards-compliant sorting.
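As a rough illustration of what "using utf8mb4 and utf8mb4_unicode_ci" looks like in practice, here is a sketch that builds the SQL for creating a new database with those defaults. create_database_sql() and quote_identifier() are hypothetical helpers, not part of any library; the CREATE DATABASE and SET NAMES syntax is standard MySQL.

```python
# Sketch of database-creation SQL using utf8mb4 and utf8mb4_unicode_ci.
# create_database_sql() and quote_identifier() are hypothetical helpers.

CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_database_sql(name: str) -> str:
    """Return a CREATE DATABASE statement whose defaults are true UTF-8."""
    return (
        f"CREATE DATABASE {quote_identifier(name)} "
        f"CHARACTER SET {CHARSET} COLLATE {COLLATION};"
    )


if __name__ == "__main__":
    print(create_database_sql("myapp"))
    # The client connection should use utf8mb4 too; otherwise 4-byte
    # characters can still be mangled on the way in or out.
    print(f"SET NAMES {CHARSET} COLLATE {COLLATION};")
```

The database default only applies to tables created afterwards, which is why the connection character set is set separately.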
Spent part of an afternoon updating all my code and converting my databases to the correct charset. Should be good to go now.
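Converting an existing database takes two kinds of statements: one to change the database default, and one per table to rewrite its existing columns. The sketch below generates both, assuming you already have the list of table names (for example from SHOW TABLES). conversion_sql() is a hypothetical helper; ALTER DATABASE and ALTER TABLE ... CONVERT TO CHARACTER SET are real MySQL syntax.

```python
# Sketch: generate the statements to convert an existing database and its
# tables to utf8mb4 / utf8mb4_unicode_ci.
# Assumption: the caller supplies the table names; conversion_sql() is hypothetical.

from typing import Iterable, List

CHARSET = "utf8mb4"
COLLATION = "utf8mb4_unicode_ci"


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def conversion_sql(database: str, tables: Iterable[str]) -> List[str]:
    """Return ALTER statements that switch a database and its tables to utf8mb4."""
    db = quote_identifier(database)
    statements = [
        # Changes the database default, which only affects tables created later.
        f"ALTER DATABASE {db} CHARACTER SET {CHARSET} COLLATE {COLLATION};"
    ]
    for table in tables:
        # CONVERT TO rewrites the existing character columns, not just the table default.
        statements.append(
            f"ALTER TABLE {db}.{quote_identifier(table)} "
            f"CONVERT TO CHARACTER SET {CHARSET} COLLATE {COLLATION};"
        )
    return statements


if __name__ == "__main__":
    for stmt in conversion_sql("myapp", ["users", "posts"]):
        print(stmt)
```

CONVERT TO rewrites every row of the table, so on large tables it can take a while.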
Also updated the database-creation code in my LAMP Server Setup Guide.