  1. #1
    Regular Coder
    Join Date
    Apr 2004
    Posts
    684
    Thanks
    24
    Thanked 1 Time in 1 Post

    SimpleXML & html entities = strange characters

    I am getting a feed like this:

    PHP Code:
    $posts = new SimpleXMLElement(WP_ROOT_URL . 'feed/', 0, true); 
    One of the items in this feed contains an HTML entity, the one for the "hyphen character":
    PHP Code:
    &#8211; 
    However, when this is returned from SimpleXML, all I get is "â€“". I have read other similar questions, and some mention making sure your page is set to "UTF-8", though I'm not sure how that would stop SimpleXML from returning the strange characters.

    Either way, I do have this on the page where the data is output:

    PHP Code:
    <meta http-equiv="content-type" content="text/html; charset=utf-8" /> 
    What can I do here to get the correct entity?

    Thanks!
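For what it's worth, here is a minimal sketch of what is probably going on, assuming PHP 7+ with the mbstring extension. The one-item feed is a made-up stand-in for the WordPress feed, not the real one. SimpleXML appears to decode `&#8211;` into the UTF-8 bytes for an en dash, and the strange characters only show up when those bytes are read as Windows-1252:

```php
<?php
// Hypothetical one-item feed standing in for the WordPress feed.
$xml = '<rss><channel><item><title>Foo &#8211; Bar</title></item></channel></rss>';

$feed  = new SimpleXMLElement($xml);
$title = (string) $feed->channel->item->title;

// SimpleXML has already turned the entity into the UTF-8 en dash (U+2013).
var_dump(strpos($title, "\u{2013}") !== false);

// The en dash is three bytes in UTF-8.
echo bin2hex("\u{2013}"), "\n"; // e28093

// Reading those UTF-8 bytes as if they were Windows-1252 yields the garbage.
$garbled = mb_convert_encoding($title, 'UTF-8', 'Windows-1252');
echo $garbled, "\n";
```

If that holds for the real feed, the data PHP has is fine and the problem is in how the output is labelled for the browser.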

  • #2
    New Coder
    Join Date
    Dec 2011
    Posts
    84
    Thanks
    5
    Thanked 13 Times in 13 Posts
    PHP Code:
    <?php
    $orig = "–";

    $a = htmlentities($orig);

    $b = html_entity_decode($a);

    echo $a;
    echo '<br>';
    echo $b;
    ?>

  • #3
    Regular Coder
    Join Date
    Apr 2004
    Posts
    684
    Thanks
    24
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Strider64 View Post
    PHP Code:
    <?php
    $orig = "–";

    $a = htmlentities($orig);

    $b = html_entity_decode($a);

    echo $a;
    echo '<br>';
    echo $b;
    ?>
    I'm not quite sure you understand my problem. The FEED I am reading contains the HTML entity form of the hyphen character, that being:

    Code:
    &#8211;
    Now, when I get this data returned from SimpleXML I get the following:

    Code:
    â€“
    How do I stop this happening?

  • #4
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    header('Content-type: text/html; charset=utf-8'); Try the header approach: it looks like the UTF-8 output is being interpreted in a Latin charset.
    PHP Code:
    header('HTTP/1.1 420 Enhance Your Calm'); 
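A minimal sketch of the header approach, assuming PHP 7+ and that nothing has been sent to the browser yet (header() has to be called before any output). The $title value is a made-up example, not data from the real feed:

```php
<?php
// Declare the charset in the HTTP response header, before any output.
header('Content-Type: text/html; charset=utf-8');

// Hypothetical feed title containing a UTF-8 en dash (U+2013).
$title = "Some post \u{2013} continued";

// Escape for HTML, telling PHP that the string is UTF-8.
$safe = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

echo '<h1>' . $safe . '</h1>';
```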

  • Users who have thanked Fou-Lu for this post:

    cyphix (04-22-2013)

  • #5
    Regular Coder
    Join Date
    Apr 2004
    Posts
    684
    Thanks
    24
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Fou-Lu View Post
    header('Content-type: text/html; charset=utf-8'); Try the header approach: it looks like the UTF-8 output is being interpreted in a Latin charset.
    That did it, thanks very much!!

    Any idea why the meta charset I had already put in there didn't work?

  • #6
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    I haven't a clue how the browsers deal with all that. IMO neither should make a difference, as it's the client interpreting it, but I've always found that meta is unreliable compared to charset headers.
    Glad I don't do client development work, their job is so much harder O.o
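One likely explanation, offered as an assumption since the server's response headers are not shown in this thread: browsers give a charset declared in the HTTP Content-Type header priority over a meta tag, so if the server was already sending a Latin charset in that header, the meta tag would be ignored. Below is a simplified sketch of that precedence. It ignores byte order marks and browser sniffing, and effective_charset() is a hypothetical helper, not a PHP built-in:

```php
<?php
/**
 * Hypothetical helper: roughly which charset a browser would pick.
 * Simplified order: HTTP header first, then <meta>, otherwise null.
 */
function effective_charset(?string $contentTypeHeader, string $html): ?string
{
    // 1. A charset in the HTTP Content-Type header wins.
    if ($contentTypeHeader !== null
        && preg_match('/charset\s*=\s*"?([\w-]+)/i', $contentTypeHeader, $m)) {
        return strtolower($m[1]);
    }

    // 2. Otherwise fall back to a charset declared in a <meta> tag.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w-]+)/i', $html, $m)) {
        return strtolower($m[1]);
    }

    return null;
}

$html = '<meta http-equiv="content-type" content="text/html; charset=utf-8" />';

// Header carries a charset, so the meta tag is never consulted.
var_dump(effective_charset('text/html; charset=ISO-8859-1', $html));

// Header has no charset, so the meta tag decides.
var_dump(effective_charset('text/html', $html));
```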

  • #7
    Regular Coder
    Join Date
    Apr 2004
    Posts
    684
    Thanks
    24
    Thanked 1 Time in 1 Post
    No worries

    However, on this same note, I can get it to output to the browser correctly now, but it is still returned to PHP as that latin character, which makes trying to do anything with it in PHP a pain.

    For example, I am running the content through a function that trims the string after a certain number of characters. I don't want it to end up with a hyphen on the end, such as "some text - ", but I can't check for that because PHP still has the data as the latin character.

  • #9
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    You can't do much about that at a string comparator level. PHP isn't natively UTF8, so all you can do is look at using the mbstring extension and iconv for any work to do with multibyte strings. If the entity is actually in the feed as #8211, then I'd suggest replacing the entity with the actual hyphen character instead.
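A minimal sketch along those lines, assuming PHP 7+ with mbstring and that the string coming from SimpleXML is UTF-8. trim_excerpt() is a hypothetical name for the trimming function mentioned earlier in the thread:

```php
<?php
/**
 * Hypothetical helper: cut a UTF-8 string to $max characters and
 * drop any dash or whitespace left dangling at the end.
 */
function trim_excerpt(string $text, int $max): string
{
    // Turn any remaining entities (e.g. &#8211;) into real UTF-8 characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Count characters, not bytes, so multibyte characters are not split.
    $cut = mb_substr($text, 0, $max, 'UTF-8');

    // Strip trailing whitespace, hyphens, en dashes and em dashes.
    // The /u modifier makes the pattern treat the string as UTF-8.
    return preg_replace('/[\s\x{2013}\x{2014}-]+$/u', '', $cut);
}

echo trim_excerpt("some text \u{2013} and more", 12), "\n"; // some text
echo trim_excerpt('some text &#8211; and more', 12), "\n";  // some text
```

The point of the pattern is that the trailing character is an en dash (U+2013), not an ASCII hyphen, so a plain comparison against "-" would never match it.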

  • #10
    Regular Coder
    Join Date
    Apr 2004
    Posts
    684
    Thanks
    24
    Thanked 1 Time in 1 Post
    Quote Originally Posted by Fou-Lu View Post
    You can't do much about that at a string comparator level. PHP isn't natively UTF8, so all you can do is look at using the mbstring extension and iconv for any work to do with multibyte strings. If the entity is actually in the feed as #8211, then I'd suggest replacing the entity with the actual hyphen character instead.
    Ok thanks. I don't have much control over the feed as it's done by WP, but I have thought about formatting the data before it's passed to SimpleXML, guess that's the way I will have to do it.

    Cheers!
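A rough sketch of that pre-processing idea, assuming PHP 7+ and that the feed is first fetched as a string (for example with file_get_contents()) instead of letting SimpleXMLElement load the URL itself. The inline XML is a made-up stand-in for the WordPress feed, and swapping the en dash for a plain hyphen is just one possible choice:

```php
<?php
// Stand-in for: $raw = file_get_contents(WP_ROOT_URL . 'feed/');
$raw = '<rss><channel><item><title>Foo &#8211; Bar</title></item></channel></rss>';

// Replace the en dash entity, and any literal en dash, with an ASCII hyphen
// before the XML is parsed.
$clean = str_replace(['&#8211;', "\u{2013}"], '-', $raw);

$posts = new SimpleXMLElement($clean);
$title = (string) $posts->channel->item->title;

echo $title, "\n"; // Foo - Bar
```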

