Just wondered if some of you chaps more familiar with regex and unicode could enlighten me. Would the following two expressions:


match the less-than symbol whether it's encoded in UTF-8, Latin-1, etc.?


The former will match the literal octet 0x3C (which is "<" in ASCII). That works for all ASCII supersets such as ISO-8859-*, UTF-8, Windows-1252 and others, but it won't work reliably for encodings that aren't ASCII-compatible, like UTF-16, where "<" takes two bytes and the octet 0x3C can also turn up as half of some other character.
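To make the byte-level point concrete, here's a rough sketch in Python (not the regex flavour from the question, since the original expressions aren't shown here). The names `LESS_THAN` and `encoded_less_than` are made up for illustration.

```python
import re

# Byte-level pattern for the single octet 0x3C ("<" in ASCII).
LESS_THAN = re.compile(rb"\x3c")


def encoded_less_than(encoding: str) -> bytes:
    """Return the bytes that represent "<" in the given encoding."""
    return "<".encode(encoding)


# In ASCII-compatible encodings, "<" is always the single octet 0x3C.
for name in ("ascii", "latin-1", "utf-8", "cp1252"):
    assert encoded_less_than(name) == b"\x3c"

# In UTF-16 it is a two-byte code unit instead.
utf16_less_than = encoded_less_than("utf-16-le")  # 0x3C 0x00

# The octet 0x3C also appears inside unrelated UTF-16 characters,
# e.g. U+013C encodes as 0x3C 0x01 in little-endian order,
# so a byte-level match reports a "<" that is not there.
not_less_than = "\u013c".encode("utf-16-le")
false_positive = LESS_THAN.search(not_less_than) is not None
```

So on UTF-16 input the byte pattern can't tell a real "<" from a stray 0x3C byte; you'd want to decode the text first and match on characters.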

The latter requires the input string to be valid UTF-8, and will always fail (and throw a warning, IIRC) if it is not. If you're only dealing with ASCII characters and you don't want the UTF-8 validation, you're better off without the u flag.
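As an analogy only (this is Python, not the u flag itself, and the function names are hypothetical), here's a sketch of the trade-off: a byte-level search doesn't care whether the input is valid UTF-8, while a search that insists on decoding as UTF-8 first gives up on invalid input.

```python
import re


def find_less_than_bytes(data: bytes) -> bool:
    """Byte-level search: no validation of the input's encoding."""
    return re.search(rb"<", data) is not None


def find_less_than_utf8(data: bytes) -> bool:
    """Search that requires valid UTF-8, failing if decoding fails."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption for this sketch: treat invalid UTF-8 as "no match".
        return False
    return re.search(r"<", text) is not None


# "é<" encoded as Latin-1: 0xE9 followed by 0x3C is not valid UTF-8.
latin1_input = "é<".encode("latin-1")
utf8_input = "é<".encode("utf-8")
```

With `latin1_input`, the byte-level search still finds the "<" but the UTF-8 one fails; with `utf8_input`, both find it.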

