
UTF-8 chars and preg_replace



MattF
01-12-2010, 02:16 AM
Just wondered if some of you chaps more familiar with regex and unicode could enlighten me. Would the following two expressions:



'/</'
'/\x3c/u'


match the less-than symbol whether it's encoded in UTF-8, Latin-1, etc.?


Cheers.

gsnedders
01-12-2010, 04:47 PM
The former will match the literal octet 0x3C (which in ASCII represents "<"). That works for all ASCII supersets such as ISO-8859-*, UTF-8, Windows-1252 and others, but not for encodings that aren't ASCII supersets, like UTF-16, where "<" takes two octets.
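
As a rough illustration of the byte-level behaviour, here is a minimal sketch. The sample strings and variable names are made up for the example, and the UTF-16 case assumes little-endian byte order:

```php
<?php
// No u flag: the pattern and subject are treated as raw octets.
$utf8   = "a < b \xC3\xA9";  // "é" is 0xC3 0xA9 in UTF-8
$latin1 = "a < b \xE9";      // "é" is 0xE9 in ISO-8859-1
$utf16  = "\x3C\x00";        // "<" in UTF-16LE: 0x3C followed by 0x00

// The 0x3C octet is replaced and the other bytes are left untouched.
$fromUtf8   = preg_replace('/</', '&lt;', $utf8);
$fromLatin1 = preg_replace('/</', '&lt;', $latin1);

// The 0x3C octet is still found here, but only half of the UTF-16 code
// unit is replaced, so the result is no longer valid UTF-16.
$fromUtf16 = preg_replace('/</', '&lt;', $utf16);
```

In the UTF-16 case the pattern still finds the 0x3C byte, which is arguably worse than not matching at all: the stray 0x00 stays behind and the output has an odd number of bytes.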

The latter requires the input string to be valid UTF-8, and will always fail if it is not (preg_replace returns NULL rather than the string, and preg_last_error() reports PREG_BAD_UTF8_ERROR). If you're only dealing with ASCII characters and you don't want the UTF-8 validation, you're better off without the u flag.
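
A quick sketch of the difference, using made-up sample strings. As far as I know, the failure shows up as a NULL return plus an error code from preg_last_error(), so treat the exact reporting as something to verify on your PHP version:

```php
<?php
$valid   = "a < b \xC3\xA9";  // valid UTF-8
$invalid = "a < b \xE9";      // lone 0xE9: fine as ISO-8859-1, invalid as UTF-8

// With the u flag the subject is validated as UTF-8 before matching.
$ok = preg_replace('/\x3c/u', '&lt;', $valid);

// Invalid UTF-8 makes the whole call fail: the result is NULL, not the
// original string, so check the return value before using it.
$bad = preg_replace('/\x3c/u', '&lt;', $invalid);
$err = preg_last_error();     // expected to be PREG_BAD_UTF8_ERROR

// Without the u flag there is no validation and the byte match still works.
$plain = preg_replace('/</', '&lt;', $invalid);
```

Note that preg_last_error() only reflects the most recent preg_* call, so it has to be read straight after the one that failed.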

MattF
01-13-2010, 02:50 AM
Cheers. :)


