  1. #1
    New Coder
    Join Date
    Dec 2005
    Posts
    80
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Special Characters Best Practice

    I'm trying to sanitize my code and keep it consistent throughout.
    What is the best practice when dealing with special characters to submit form data to a MySQL database?

    For example I have a form:

    <input name="city" value="Montréal">

    When I submit this to the DB, I use htmlentities:

    $h_city = htmlentities($city, ENT_QUOTES);
    INSERT INTO tablename SET city = '$h_city'

    To display this field on a web page I use:

    $h_city = html_entity_decode($city, ENT_QUOTES);
    echo $h_city;

    and I get Montréal which is perfect.

    1) Is this the best way to deal with special characters?

    2) When I look at the data in phpMyAdmin, the field becomes Montr&Atilde;&copy;al. Is this normal?

  • #2
    New Coder
    Join Date
    Dec 2005
    Posts
    80
    Thanks
    0
    Thanked 0 Times in 0 Posts
    PS: Shouldn't it be encoded as Montr&eacute;al?

  • #3
    Senior Coder
    Join Date
    Jan 2007
    Posts
    1,648
    Thanks
    1
    Thanked 58 Times in 54 Posts
    When I look at the data in phpMyAdmin, the field becomes Montr&Atilde;&copy;al. Is this normal?
    The sequence '&Atilde;&copy;' is what usually shows up after text is mangled by incorrect character encoding (for instance, modifying a UTF-8 string with non-UTF-8 methods). UTF-8 stores é as two bytes (0xC3 0xA9); a function that assumes a single-byte encoding such as ISO-8859-1 treats each byte as a separate character, which gives Ã and ©. The fact that they were turned into their proper HTML entities means the misreading happens no later than the htmlentities call.

    Do some debugging (for instance, put out a few echoes) while you are processing the strings. And consider using the mb_* (multi-byte) string functions.

    EDIT: It's very possible that one of the functions you're using to encode the HTML entities does not do well with multibyte encoding.
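
    The mangling described above can be reproduced in a few lines. This is a minimal sketch, assuming the form value arrives as UTF-8 bytes; the variable names are illustrative, and the charset is passed explicitly so the result does not depend on the default of a particular PHP version.

    ```php
    <?php
    // "Montréal" as UTF-8 bytes: é is the two-byte sequence 0xC3 0xA9.
    $city = "Montr\xC3\xA9al";

    // Told the input is ISO-8859-1, htmlentities() converts each byte separately.
    $mangled = htmlentities($city, ENT_QUOTES, 'ISO-8859-1');

    // Told the input is UTF-8, it sees a single character.
    $correct = htmlentities($city, ENT_QUOTES, 'UTF-8');

    echo $mangled, "\n"; // Montr&Atilde;&copy;al
    echo $correct, "\n"; // Montr&eacute;al
    ```

    Echoing both results side by side is one quick way to check which charset a given call is assuming.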

    What is the best practice when dealing with special characters to submit form data to a MySQL database?
    What is your purpose? Displaying proper output or preventing injection attacks?

    Displaying proper output, you would probably not use htmlentities until you actually display the text. Otherwise, if someone wants to edit their post/profile they will get &eacute; in their field, which doesn't look nice.
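
    A rough sketch of escaping only at display time, assuming the database holds the raw UTF-8 value. The helper name h() is made up for this example, and htmlspecialchars() is used instead of htmlentities() because it leaves é alone and only escapes the characters that are special in HTML.

    ```php
    <?php
    // Hypothetical helper: escape a raw UTF-8 value for HTML output.
    function h($value)
    {
        return htmlspecialchars($value, ENT_QUOTES, 'UTF-8');
    }

    // Raw value as it would come back from the database, stored unescaped.
    $city = "Montr\xC3\xA9al <b>";

    // Escape only when writing into the page, e.g. to refill the edit form.
    $field = '<input name="city" value="' . h($city) . '">';
    echo $field, "\n";
    ```

    Because the stored value is never entity-encoded, the edit form shows é rather than &eacute;.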

    Preventing injection attacks is relatively easy and very secure when using prepared statements or, failing that, mysql_real_escape_string().
    Last edited by aedrin; 08-24-2007 at 04:15 PM.
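
    For illustration, a prepared statement might look like the sketch below. It assumes PDO with the pdo_sqlite driver is available and uses an in-memory SQLite database as a stand-in for MySQL so that it runs without a server; the table and column names come from the first post, everything else is made up for the example.

    ```php
    <?php
    // Assumption: pdo_sqlite is installed; SQLite stands in for MySQL here.
    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE tablename (city TEXT)');

    // A value that would break a hand-built query string.
    $city = "Montr\xC3\xA9al'; DROP TABLE tablename; --";

    // The placeholder keeps the value out of the SQL text entirely.
    $insert = $pdo->prepare('INSERT INTO tablename (city) VALUES (:city)');
    $insert->execute(array(':city' => $city));

    // Read it back: the value is stored byte for byte, with no entity encoding.
    $stored = $pdo->query('SELECT city FROM tablename')->fetchColumn();
    echo $stored, "\n";
    ```

    With MySQL the same code should work after swapping the DSN for something like 'mysql:host=localhost;dbname=example;charset=utf8mb4' plus a user name and password (placeholder values, adjust to your setup).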

  • #4
    New Coder
    Join Date
    Dec 2005
    Posts
    80
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for your reply aedrin.

    I was able to locate where the problem was happening. I added UTF-8 to the htmlentities function and the translation worked properly:

    $h_city = htmlentities($uc_city, ENT_QUOTES, "UTF-8");

    Montréal becomes Montr&eacute;al

    Now when I display this entry, the special characters are shown as question marks. I tried changing $h_city = html_entity_decode($city, ENT_QUOTES); to $h_city = html_entity_decode($city, ENT_QUOTES, "UTF-8"); but it gives me an error:

    Warning: cannot yet handle MBCS in html_entity_decode()! in /home/httpd/vhosts/xxxxx.com/httpdocs/0new/result_parties.php on line 466

    To answer your last question, the purpose is to display proper output.
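
    As far as I know, that warning comes from older PHP builds in which html_entity_decode() could not produce multibyte output; PHP 5 and later accept "UTF-8" as the third argument. A minimal sketch, assuming such a version and a value stored as in the post above:

    ```php
    <?php
    // Entity-encoded value as stored by the earlier htmlentities() call.
    $stored = 'Montr&eacute;al';

    // Decode back to UTF-8 bytes.
    $city = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

    // Question marks in the browser often mean the page charset does not match
    // the bytes sent, so declare UTF-8 when running under a web server.
    if (PHP_SAPI !== 'cli' && !headers_sent()) {
        header('Content-Type: text/html; charset=UTF-8');
    }

    echo $city, "\n";
    ```

    The header() call is skipped on the command line, where it has no effect anyway.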

  • #5
    Senior Coder rafiki's Avatar
    Join Date
    Aug 2006
    Location
    Floating around somewhere...
    Posts
    2,046
    Thanks
    19
    Thanked 42 Times in 42 Posts
    Here's a good fix:
    PHP Code:
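    <?php
    function html_entity_decode_utf8($string)
    {
        // translation table, built on the first call and reused afterwards
        static $trans_tbl;

        // replace numeric entities
        // (the /e modifier evaluates the replacement as PHP code; it was removed in PHP 7)
        $string = preg_replace('~&#x([0-9a-f]+);~ei', 'code2utf(hexdec("\\1"))', $string);
        $string = preg_replace('~&#([0-9]+);~e', 'code2utf(\\1)', $string);

        // replace literal entities
        if (!isset($trans_tbl))
        {
            $trans_tbl = array();

            // map each named entity (e.g. &eacute;) to its UTF-8 character
            foreach (get_html_translation_table(HTML_ENTITIES) as $val=>$key)
                $trans_tbl[$key] = utf8_encode($val);
        }

        return strtr($string, $trans_tbl);
    }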

  • #6
    New Coder
    Join Date
    Dec 2005
    Posts
    80
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for this. It's a bit beyond my abilities in terms of complexity, but I went ahead and tested it, and I am getting a parse error at the first curly bracket.

    So is this something I need to do for each field I want to output?

  • #7
    Senior Coder rafiki's Avatar
    Join Date
    Aug 2006
    Location
    Floating around somewhere...
    Posts
    2,046
    Thanks
    19
    Thanked 42 Times in 42 Posts
    Sorry, you need this too:
    PHP Code:
    // Convert a Unicode code point to its UTF-8 byte sequence (1 to 4 bytes).
    function code2utf($num)
    {
        if ($num < 128) return chr($num);
        if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
        if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        return '';
    }
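
    The /e modifier that html_entity_decode_utf8() relies on was deprecated in PHP 5.5 and removed in PHP 7. On those versions the numeric-entity step could be sketched with preg_replace_callback() instead. The helper name decode_numeric_entities() is made up for this example, closures need PHP 5.3 or later, and code2utf() is repeated so the block runs on its own.

    ```php
    <?php
    // Same arithmetic as code2utf() above: code point -> UTF-8 byte sequence.
    function code2utf($num)
    {
        if ($num < 128) return chr($num);
        if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);
        if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        if ($num < 2097152) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
        return '';
    }

    // Hypothetical helper: decode numeric entities without the /e modifier.
    function decode_numeric_entities($string)
    {
        // Hexadecimal form, e.g. &#xE9;
        $string = preg_replace_callback('~&#x([0-9a-f]+);~i', function ($m) {
            return code2utf(hexdec($m[1]));
        }, $string);

        // Decimal form, e.g. &#233;
        return preg_replace_callback('~&#([0-9]+);~', function ($m) {
            return code2utf((int) $m[1]);
        }, $string);
    }

    echo decode_numeric_entities('Montr&#233;al, Montr&#xE9;al, 5 &#8364;'), "\n";
    ```

    This only covers numeric entities; named ones such as &eacute; still go through the translation table or html_entity_decode().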


  • #8
    Senior Coder
    Join Date
    Jan 2007
    Posts
    1,648
    Thanks
    1
    Thanked 58 Times in 54 Posts
    Rafiki, rather than posting random functions, explain what they do (or what they do differently) and how to use them.

    And if you did not write them yourself, give credit where credit is due.

