UTF-8 is an encoding of Unicode. It uses a variable number of bytes per character: plain ASCII characters take a single byte (7 significant bits), and anything beyond ASCII takes two to four bytes, so 32 bits at most. Unicode is what you need in order to represent Asian characters.
PHP itself will pass UTF-8 through untouched, though, since it is just bytes when it comes down to it. The DBMS is configured with a character set as well, so it knows how to store and represent the text, and PHP is capable of accepting the "text" from the DB and sending a UTF-8 Content-Type header to the browser so that it also interprets it correctly. The problem is the middle man: if you try to manipulate the string in PHP, you have to tell it which encoding is in use so it knows how many bytes make up a character. When I ask for $str[0], by default it takes the first byte off the string, which is 8 bits. If the character is encoded in two bytes, then I only end up with "half" of the character I need, which when presented as text will likely appear as nothing but rubbish.
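To make the variable length concrete, here is a rough sketch that compares byte counts with character counts. The helper name utf8_lengths and the sample characters are my own, not from anyone's code in this thread, and it assumes the mbstring extension is loaded. The samples are written as byte escapes so the result does not depend on how the file is saved.

```php
<?php
// Hypothetical helper: returns array(bytes, characters) for a UTF-8 string.
// strlen() counts bytes; mb_strlen() with 'UTF-8' counts characters.
function utf8_lengths($s) {
    return array(strlen($s), mb_strlen($s, 'UTF-8'));
}

// Sample characters chosen for illustration, one per UTF-8 width.
$samples = array(
    array('U+0061',  "a"),                // ASCII letter
    array('U+00E9',  "\xC3\xA9"),         // e with acute accent
    array('U+65E5',  "\xE6\x97\xA5"),     // CJK ideograph
    array('U+1F600', "\xF0\x9F\x98\x80"), // emoji outside the BMP
);

foreach ($samples as $sample) {
    list($bytes, $chars) = utf8_lengths($sample[1]);
    printf("%-8s %d byte(s), %d character(s)\n", $sample[0], $bytes, $chars);
}
```

Each sample is one character, but the byte count climbs from 1 to 4 as the code point gets larger.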
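Something like the following shows the "half a character" problem and one way around it. This is only a sketch: the sample string is made up, the header line is just the usual way to declare UTF-8 to a browser, and mb_substr needs the mbstring extension.

```php
<?php
// Declare UTF-8 to the browser before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Sample string (an assumption for illustration): 5 bytes, 3 characters.
// The first character is U+00E9, encoded as the two bytes C3 A9.
$str = "\xC3\xA9t\xC3\xA9";

// Byte offset access: only the first byte of the two-byte character.
$firstByte = $str[0];

// Encoding-aware access: the whole first character.
$firstChar = mb_substr($str, 0, 1, 'UTF-8');

echo bin2hex($firstByte), "\n"; // hex of the lone lead byte
echo bin2hex($firstChar), "\n"; // hex of the complete character

// The lone lead byte is not valid UTF-8 on its own.
var_dump(mb_check_encoding($firstByte, 'UTF-8'));
```

The lone byte is what ends up rendered as rubbish in the browser, because it is not a complete UTF-8 sequence.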
__________________
As of PHP 5.5, the MySQL library has been officially deprecated. It is recommended to move to either MySQLi or PDO libraries for your mysql connectivity. See here for help choosing which interface you prefer: http://php.net/manual/en/mysqlinfo.api.choosing.php
I was doing some reading, and I feel like there is something wrong with my last post.
On line 10, where I used mb_strtolower, don't I have to mark the variable $string as UTF-8? Or is that only needed when converting from lower case to upper case, since with UTF-8 mb_strtolower only converts upper-case characters that are marked with the Unicode property?
Can I add trim() to line 10 and have everything be okay? Or is there a better way? I want to eliminate the errors caused by a string starting with a space, but I would think trim() might strip away part of the first letter as well. I know I can just try it, but I want to be correct. Just because something might work doesn't mean it is the appropriate way to do it.
I don't think trim() will be a problem. By default it only removes ASCII whitespace bytes, and those never occur inside a multi-byte UTF-8 sequence, so it can't cut into your first letter. It may be wiser to stick with PCRE, though, since you can represent whitespace with \s. Actually, pretty much everything can be done with PCRE match/replace calls, which may be less tedious than using the mbstring functions in general.
You've already marked the mb_strtolower call as using UTF-8. That is what $e represents in this function. No, it's not required, *but* if you don't give it, it will default to the internal encoding (see mb_internal_encoding()), which may or may not be UTF-8.
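For what it's worth, a PCRE version of the trim could look roughly like this. The function name utf8_trim is made up for the example, and the /u modifier is there so the pattern and subject are treated as UTF-8.

```php
<?php
// Hypothetical helper: strip leading and trailing whitespace with PCRE.
// The /u modifier makes PCRE treat the pattern and subject as UTF-8.
function utf8_trim($s) {
    $out = preg_replace('/^\s+|\s+$/u', '', $s);
    // preg_replace() returns null on error (for example, invalid UTF-8
    // input), so fall back to the untouched string in that case.
    return $out === null ? $s : $out;
}

// Sample input (an assumption): whitespace around a 3-character word
// whose first and last characters are two bytes each.
$input = "  \xC3\xA9t\xC3\xA9 \n";

var_dump(utf8_trim($input) === "\xC3\xA9t\xC3\xA9");
var_dump(trim($input) === utf8_trim($input));
```

Plain trim() gives the same result on this input, so the PCRE route is mostly worth it when you are already doing other matching or replacing on the string.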
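As a quick sketch of passing the encoding explicitly: $string and $e follow the names used in this thread, but the sample value is made up and this is not the code from the earlier post.

```php
<?php
// Sample value (an assumption): an upper-case word with two accented
// capitals, each encoded as two bytes (C3 89).
$string = "\xC3\x89T\xC3\x89";

// $e names the encoding. If it is left out, mb_strtolower() falls back to
// the internal encoding reported by mb_internal_encoding().
$e = 'UTF-8';

$lower = mb_strtolower($string, $e);

echo bin2hex($lower), "\n";           // bytes of the lower-cased string
echo mb_internal_encoding(), "\n";    // what the default would have been
```

Passing $e every time means the result does not change if the script is moved to a server with a different internal encoding.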
Oh, I see what you mean. Yes, it should be; if you don't specify the encoding to use, then it will default to the internal encoding, which may or may not be UTF-8.