
Special Characters Best Practice



ro1960
08-24-2007, 01:19 PM
I'm trying to sanitize my code and make it consistent throughout.
What is the best practice when dealing with special characters to submit form data to a MySQL database?

For example I have a form:

<input name="city" value="Montréal">

When I submit this to the DB, I use htmlentities:

$h_city = htmlentities($city, ENT_QUOTES);
INSERT INTO tablename SET city = '$h_city'

To display this field on a web page I use:

$h_city = html_entity_decode($city, ENT_QUOTES);
echo $h_city;

and I get Montréal, which is perfect.

1) Is this the best way to deal with special characters?

2) When I look at the data in phpMyAdmin, the field becomes Montr&Atilde;&copy;al. Is this normal?

ro1960
08-24-2007, 01:27 PM
PS: Shouldn't it be encoded as Montr&eacute;al?

aedrin
08-24-2007, 04:11 PM
When I look at the data in phpMyAdmin, the field becomes Montr&Atilde;&copy;al. Is this normal?

The sequence '&Atilde;&copy;' is what usually shows up after text has been mangled by an incorrect character encoding (for instance, processing a UTF-8 string with functions that are not UTF-8 aware). UTF-8 stores é as two bytes, 0xC3 0xA9. A function that assumes a single-byte encoding such as ISO-8859-1 treats each byte as a separate character, so you get Ã (0xC3) followed by © (0xA9). The fact that those two characters were turned into their proper HTML entities means the misreading happens no later than the point where you apply htmlentities().
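To see where the Ã© pair comes from, here is a minimal sketch, assuming the form data arrives as UTF-8 (typical when the page is served as UTF-8). It runs the same UTF-8 bytes through htmlentities() twice: once with the charset set to ISO-8859-1, which was the function's default before PHP 5.4, and once with UTF-8. The variable names are illustrative only.

```php
<?php
// "Montréal" spelled with explicit UTF-8 bytes: é is 0xC3 0xA9.
// (Written with \x escapes so the result doesn't depend on this file's encoding.)
$city = "Montr\xC3\xA9al";

// Misread as ISO-8859-1: each byte becomes its own character,
// 0xC3 -> Ã (&Atilde;) and 0xA9 -> © (&copy;).
$as_latin1 = htmlentities($city, ENT_QUOTES, 'ISO-8859-1');

// Read as UTF-8: the two bytes are decoded together as é (&eacute;).
$as_utf8 = htmlentities($city, ENT_QUOTES, 'UTF-8');

echo $as_latin1, "\n"; // Montr&Atilde;&copy;al
echo $as_utf8, "\n";   // Montr&eacute;al
```

The first line reproduces exactly what phpMyAdmin shows, which is why passing the charset explicitly is the first thing to check.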

Do some debugging (for instance, add a few echo statements) while you are processing the strings, and consider using the mb_* (multibyte) string functions.
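One way to make those debugging echoes informative is to print the raw bytes rather than the rendered text. The sketch below uses a hypothetical helper, describe_bytes(), that reports the byte length, the UTF-8 character count, and a hex dump. It counts characters with preg_match_all() and the /u modifier so it runs without the mbstring extension; if mbstring is installed, mb_strlen($s, 'UTF-8') gives the same count.

```php
<?php
// Hypothetical debugging helper: summarise a string's bytes so you can
// echo it after each processing step and spot where the encoding changes.
function describe_bytes(string $label, string $s): string
{
    // With the /u modifier, preg_match_all() counts UTF-8 characters and
    // returns false if $s is not valid UTF-8.
    $count = preg_match_all('/./us', $s);
    $chars = ($count === false) ? 'invalid UTF-8' : $count . ' chars';
    return sprintf('%s: %d bytes, %s, hex %s', $label, strlen($s), $chars, bin2hex($s));
}

echo describe_bytes('from form', "Montr\xC3\xA9al"), "\n";
// from form: 9 bytes, 8 chars, hex 4d6f6e7472c3a9616c
```

If the byte count stops matching what you expect between two steps (for example é turning into four bytes, c383c2a9), the step in between is the one treating UTF-8 as a single-byte encoding.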

EDIT: It's quite possible that one of the functions you're using to encode the HTML entities does not handle multibyte encodings correctly.


What is the best practice when dealing with special characters to submit form data to a MySQL database?

What is your purpose? Displaying proper output or preventing injection attacks?

If the goal is proper output, you would normally not call htmlentities() until you actually display the text. Otherwise, if someone wants to edit their post or profile, they will see &eacute; in the form field, which doesn't look nice.
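A minimal sketch of "store raw, escape on output", assuming the database column and connection both use UTF-8 so é can be stored as-is. render_city_field() is a hypothetical helper name; it uses htmlspecialchars(), which only escapes the characters that are special in HTML (& " ' < >), so accented letters pass through untouched and the user sees é when editing.

```php
<?php
// Sketch: keep the raw UTF-8 text in the database and escape only at output.
// render_city_field() is a hypothetical helper name.
function render_city_field(string $city): string
{
    // htmlspecialchars() escapes only & " ' < >, so é passes through unchanged
    // as long as the page itself is served as UTF-8.
    $safe = htmlspecialchars($city, ENT_QUOTES, 'UTF-8');
    return '<input name="city" value="' . $safe . '">';
}

$stored = "Montr\xC3\xA9al";           // what the database holds: plain UTF-8
echo render_city_field($stored), "\n"; // <input name="city" value="Montréal">
```

Because the stored value is never entity-encoded, it can be reused in e-mails, CSV exports or search queries without first decoding it.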

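Preventing injection attacks is straightforward: use prepared statements, or at least escape every value with mysql_real_escape_string() (not the deprecated mysql_escape_string(), which ignores the connection's character set).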
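For later readers: the mysql_* extension was removed in PHP 7.0, and both PDO and mysqli support prepared statements. The sketch below shows a prepared INSERT/SELECT round trip with PDO. It uses an in-memory SQLite database purely so it runs without a MySQL server (it assumes the pdo_sqlite driver is available); the table and column names come from the original post, and the MySQL DSN in the comment uses placeholder host and database names.

```php
<?php
// Sketch of a prepared INSERT/SELECT round trip with PDO.
// In-memory SQLite is a stand-in so this runs without a MySQL server; for MySQL
// the DSN would look like 'mysql:host=localhost;dbname=mydb;charset=utf8mb4'
// (placeholder values).
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tablename (city TEXT)');

$city = "Montr\xC3\xA9al"; // raw form value, no htmlentities()

// The value is sent separately from the SQL text, so quotes in it
// cannot change the statement and no manual escaping is needed.
$insert = $pdo->prepare('INSERT INTO tablename (city) VALUES (?)');
$insert->execute([$city]);

// Read it back and escape only for HTML output.
$stored = $pdo->query('SELECT city FROM tablename')->fetchColumn();
echo htmlspecialchars($stored, ENT_QUOTES, 'UTF-8'), "\n"; // Montréal
```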

ro1960
08-25-2007, 09:49 AM
Thanks for your reply aedrin.

I found where the problem was happening. I passed "UTF-8" as the charset argument to htmlentities() and the conversion worked properly:

$h_city = htmlentities($uc_city, ENT_QUOTES, "UTF-8");

Montréal becomes Montr&eacute;al

Now when I display this entry, the special characters are shown as question marks. I tried changing $h_city = html_entity_decode($city, ENT_QUOTES); to $h_city = html_entity_decode($city, ENT_QUOTES, "UTF-8"); but that gives me an error:

Warning: cannot yet handle MBCS in html_entity_decode()! in /home/httpd/vhosts/xxxxx.com/httpdocs/0new/result_parties.php on line 466

To answer your last question, the purpose is to display proper output.
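That warning comes from PHP 4, whose html_entity_decode() could not handle multi-byte charsets; multi-byte support was added in PHP 5.0, so on PHP 5 or later the call ro1960 tried works. The question marks are most likely the browser's replacement character: without a charset argument, &eacute; is decoded to the single ISO-8859-1 byte 0xE9, which is not valid UTF-8 on a page served as UTF-8. A minimal sketch of both decodings, assuming PHP 5 or later:

```php
<?php
$stored = 'Montr&eacute;al'; // what is now in the database

// ISO-8859-1 (the default before PHP 5.4): é becomes the single byte 0xE9,
// which a UTF-8 page cannot display (browsers show a replacement character).
$latin1 = html_entity_decode($stored, ENT_QUOTES, 'ISO-8859-1');

// With UTF-8 passed explicitly, é comes back as the two bytes 0xC3 0xA9.
$utf8 = html_entity_decode($stored, ENT_QUOTES, 'UTF-8');

echo bin2hex($latin1), "\n"; // 4d6f6e7472e9616c
echo bin2hex($utf8), "\n";   // 4d6f6e7472c3a9616c
```

The page should also declare the same charset it sends, for example with header('Content-Type: text/html; charset=UTF-8') before any output.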

rafiki
08-25-2007, 12:26 PM
Here's a good fix:


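<?php
// Decodes HTML entities into UTF-8 text. This works around PHP 4's
// html_entity_decode(), which cannot handle multi-byte charsets such as UTF-8.
// Requires code2utf() (posted below).
function html_entity_decode_utf8($string)
{
    static $trans_tbl;

    // replace numeric entities: hex (&#xE9;) and decimal (&#233;)
    $string = preg_replace('~&#x([0-9a-f]+);~ei', 'code2utf(hexdec("\\1"))', $string);
    $string = preg_replace('~&#([0-9]+);~e', 'code2utf(\\1)', $string);

    // replace literal (named) entities such as &eacute;; the table is built once
    // and cached in $trans_tbl. Before PHP 5.4 the translation table holds
    // ISO-8859-1 characters, so each one is converted with utf8_encode().
    if (!isset($trans_tbl))
    {
        $trans_tbl = array();

        foreach (get_html_translation_table(HTML_ENTITIES) as $val=>$key)
            $trans_tbl[$key] = utf8_encode($val);
    }

    return strtr($string, $trans_tbl);
}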

ro1960
08-25-2007, 01:01 PM
Thanks for this. It's a bit beyond me in terms of complexity, but I went ahead and tested it, and I am getting a parse error at the first curly bracket.

So is this something I need to do for each field I want to output?

rafiki
08-25-2007, 02:46 PM
Sorry, you need this too:


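// Converts a Unicode code point to its UTF-8 byte sequence.
function code2utf($num)
{
    if ($num < 128) return chr($num);                                         // 1 byte (ASCII)
    if ($num < 2048) return chr(($num >> 6) + 192) . chr(($num & 63) + 128);  // 2 bytes
    if ($num < 65536) return chr(($num >> 12) + 224) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);  // 3 bytes
    // 4 bytes; 1114112 (0x110000) is one past the highest code point, U+10FFFF
    if ($num < 1114112) return chr(($num >> 18) + 240) . chr((($num >> 12) & 63) + 128) . chr((($num >> 6) & 63) + 128) . chr(($num & 63) + 128);
    return '';  // not a valid Unicode code point
}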
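rafiki's helper relies on the /e modifier of preg_replace(), which was deprecated in PHP 5.5 and removed in PHP 7.0, so it will not run on current PHP. Below is a sketch of a port, assuming PHP 7 or later: the numeric entities go through preg_replace_callback(), and the named-entity table is requested in UTF-8 directly instead of being converted with utf8_encode(). The _port function names are hypothetical. On PHP 5 and later, html_entity_decode($s, ENT_QUOTES, 'UTF-8') already covers the same ground, so this is only useful if you want the helper's exact behavior.

```php
<?php
// Port of code2utf(): encode one Unicode code point as UTF-8 bytes.
function code2utf_port(int $num): string
{
    if ($num < 0x80) return chr($num);
    if ($num < 0x800) return chr(($num >> 6) + 0xC0) . chr(($num & 0x3F) + 0x80);
    if ($num < 0x10000) return chr(($num >> 12) + 0xE0) . chr((($num >> 6) & 0x3F) + 0x80) . chr(($num & 0x3F) + 0x80);
    if ($num < 0x110000) return chr(($num >> 18) + 0xF0) . chr((($num >> 12) & 0x3F) + 0x80)
        . chr((($num >> 6) & 0x3F) + 0x80) . chr(($num & 0x3F) + 0x80);
    return ''; // outside the Unicode range
}

// Port of html_entity_decode_utf8() without the removed /e modifier.
function html_entity_decode_utf8_port(string $string): string
{
    static $trans_tbl = null;

    // Numeric entities: hex (&#xE9;) and decimal (&#233;). Digit counts are
    // capped so hexdec() and the (int) cast always yield an int.
    $string = preg_replace_callback('~&#x([0-9a-f]{1,6});~i', function (array $m): string {
        return code2utf_port(hexdec($m[1]));
    }, $string);
    $string = preg_replace_callback('~&#([0-9]{1,7});~', function (array $m): string {
        return code2utf_port((int) $m[1]);
    }, $string);

    // Named entities: ask PHP for its UTF-8 table and flip it to entity => character.
    if ($trans_tbl === null) {
        $trans_tbl = array_flip(get_html_translation_table(HTML_ENTITIES, ENT_QUOTES, 'UTF-8'));
    }
    return strtr($string, $trans_tbl);
}

echo html_entity_decode_utf8_port('Montr&eacute;al &#233; &#xE9;'), "\n"; // Montréal é é
```

As in the original, numeric entities are handled before named ones, so an escaped sequence like &amp;#233; ends up as the literal text &#233; rather than é.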

aedrin
08-27-2007, 04:48 PM
Rafiki, rather than posting random functions, explain what they do (or what they do differently) and how to use them.

And if you did not write them yourself, give credit where credit is due.


