MattF
01-12-2010, 01:16 AM
Just wondered if some of you chaps more familiar with regex and unicode could enlighten me. Would the following two expressions:
'/</'
'/\x3c/u'
match the less-than symbol whether it's encoded in UTF-8, Latin-1, etc.?
Cheers.
gsnedders
01-12-2010, 03:47 PM
The former will match the literal octet 0x3C (which in ASCII represents "<"), so it works for all ASCII supersets such as ISO-8859-*, UTF-8, Windows-1252 and others, but it won't work reliably for encodings that are not ASCII supersets, such as UTF-16, where 0x3C can also be one byte of a different character.
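To make the byte-level point concrete, here is a minimal sketch. It is written in Python rather than with the preg_* functions, purely to inspect raw bytes; the helper name contains_3c_byte is made up for illustration, and U+013C is just one convenient example of a character whose UTF-16 form contains the byte 0x3C.

```python
import re

# Byte-oriented pattern, comparable to the first expression in the question.
LESS_THAN = re.compile(b"\x3c")

def contains_3c_byte(text, encoding):
    """Encode text and report whether the raw byte 0x3C occurs in it.

    Hypothetical helper, used here for illustration only.
    """
    return LESS_THAN.search(text.encode(encoding)) is not None

# ASCII supersets all encode "<" as the single byte 0x3C.
for enc in ("ascii", "iso-8859-1", "cp1252", "utf-8"):
    print(enc, "<".encode(enc), contains_3c_byte("<", enc))

# UTF-16 uses two-byte code units, so 0x3C can be half of another character.
print("<".encode("utf-16-le"))       # the bytes 3C 00
print("\u013c".encode("utf-16-le"))  # the bytes 3C 01: U+013C, not "<"

# A byte-level search for 0x3C therefore reports a false positive here.
print(contains_3c_byte("\u013c", "utf-16-le"))
```

So against UTF-16 data a byte-level pattern does find 0x3C, but it cannot tell a real "<" from another character that happens to contain that byte.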
The latter requires the input string to be valid UTF-8, and will always fail (and throw a warning, IIRC) if it is not. If you're only dealing with ASCII characters and you don't want the UTF-8 validation, you're better off without the u flag.
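A rough analogue of that trade-off, again sketched in Python and not using the regex engine from the question: decoding the input as strict UTF-8 before matching stands in for the validation the u flag performs, and a plain bytes pattern stands in for matching without it. The function name match_validated and the sample inputs are assumptions made up for this example.

```python
import re

BYTE_PATTERN = re.compile(b"\x3c")  # matches raw bytes, no validation
TEXT_PATTERN = re.compile("\x3c")   # matches decoded text

def match_validated(raw):
    """Decode as strict UTF-8 first; return None if the input is invalid.

    Hypothetical helper standing in for a validating match.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
    return TEXT_PATTERN.search(text) is not None

valid = b"a < b"
invalid = b"a < b \xff"  # the byte 0xFF never occurs in valid UTF-8

print(match_validated(valid))    # validated input: the match is attempted
print(match_validated(invalid))  # invalid input: rejected before matching
print(BYTE_PATTERN.search(invalid) is not None)  # byte match ignores validity
```

The validating path refuses the whole input because of one bad byte, even though the "<" itself is fine, which is why skipping validation can be the simpler choice when only ASCII characters matter.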