  1. #1
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts

    UTF-8 chars and preg_replace

    Just wondered if some of you chaps more familiar with regex and unicode could enlighten me. Would the following two expressions:

    Code:
    '/</'
    '/\x3c/u'
    match the less-than symbol whether it's encoded in UTF-8, Latin-1, etc.?


    Cheers.

  • #2
    Senior Coder gsnedders's Avatar
    Join Date
    Jan 2004
    Posts
    2,340
    Thanks
    1
    Thanked 7 Times in 7 Posts
    The former will match the literal octet 0x3C (which in ASCII has the representation "<"), which will work for all ASCII-supersets such as ISO-8859-*, UTF-8, Windows-1252 and others, but won't work in other character sets like UTF-16.

    The latter requires the input string to be valid UTF-8, and will always fail (and throw a warning, IIRC) if it is not. If you're only dealing with ASCII characters and you don't want the UTF-8 validation, you're better off without the u flag.
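
    A minimal sketch of the difference, assuming PHP's PCRE functions and two made-up sample strings (the variable names and inputs are only for illustration). It runs both patterns against a Latin-1 string and a UTF-8 string, then checks preg_last_error() after the u-flag call on the Latin-1 input:

```php
<?php
// Hypothetical sample inputs. "\xE9" is "é" in ISO-8859-1; on its own it is
// not a valid UTF-8 sequence. "\xC3\xA9" is the same character in UTF-8.
$latin1 = "caf\xE9 <b>";
$utf8   = "caf\xC3\xA9 <b>";

// No u flag: the pattern matches the raw octet 0x3C, so both inputs work.
$a = preg_replace('/</', '&lt;', $latin1);
$b = preg_replace('/</', '&lt;', $utf8);

// u flag: the subject is validated as UTF-8 before matching.
$c = preg_replace('/\x3c/u', '&lt;', $utf8);

// Invalid UTF-8 subject: the call fails and returns NULL.
// The @ only silences any diagnostic a given PHP version might emit.
$d = @preg_replace('/\x3c/u', '&lt;', $latin1);
$err = preg_last_error(); // read immediately, before another preg_* call

var_dump($a === "caf\xE9 &lt;b>");        // bool(true)
var_dump($c === "caf\xC3\xA9 &lt;b>");    // bool(true)
var_dump($d);                             // NULL
var_dump($err === PREG_BAD_UTF8_ERROR);   // bool(true)
```

    Since a failed call returns NULL rather than the original string, it's worth checking the return value before using it if the input encoding isn't guaranteed.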

  • Users who have thanked gsnedders for this post:

    MattF (01-13-2010)

  • #3
    Senior Coder
    Join Date
    Jul 2009
    Location
    South Yorkshire, England
    Posts
    2,318
    Thanks
    6
    Thanked 304 Times in 303 Posts
    Cheers.

