Special Characters and MySQL database
Jabbamonkey
07-13-2006, 04:09 PM
I have several PHP webforms that submit information to my MySQL database. However, when a user cuts & pastes from a WORD document, and clicks submit, all of the special characters (e.g. em-dashes) get changed into weird symbols.
Is there some simple line of code that I can use to convert these special symbols into web characters? Or some other simple way to fix this?
chump2877
07-13-2006, 06:54 PM
use htmlentities() (http://us2.php.net/manual/en/function.htmlentities.php)
The Perl code in the source of the page linked here looks relevant:
http://tech.huffingtonpost.com/2006/01/naughtywordchars.html
sub convert_word_chars {
    my($s) = @_;
    return '' unless $s;
    if ($plugin->get_config_value( 'smart_replace' ))
    {
        # html character entity replacements
        $s =~ s/\342\200\231/&rsquo;/g;
        $s =~ s/\342\200\230/&lsquo;/g;
        $s =~ s/\342\200\246/&hellip;/g;
        $s =~ s/\342\200\223/&ndash;/g;
        $s =~ s/\342\200\224/&mdash;/g;
        $s =~ s/\342\200\234/&ldquo;/g;
        $s =~ s/\342\200\235/&rdquo;/g;
    }
    else
    {
        # ascii equivalent replacements
        $s =~ s/\342\200[\230\231]/'/g;
        $s =~ s/\342\200\246/.../g;
        $s =~ s/\342\200\223/-/g;
        $s =~ s/\342\200\224/--/g;
        $s =~ s/\342\200[\234\235]/"/g;
    }
    $s;
}
you would need to change it to use preg_replace
alternatively, you could try using tidy (http://tidy.sourceforge.net/docs/tidy_man.html)
check out the tail end of the discussion I found here for some ideas on how to do it:
http://drupal.org/node/46329
chump2877
07-13-2006, 07:33 PM
I get the feeling he's not talking about Word-generated HTML, but I could be wrong...
Jabbamonkey
07-13-2006, 08:08 PM
This problem is pretty confusing ... It is caused when we PASTE text from a WORD document into the form field. The text appears fine in the form field, but gets all messed up when it's entered into the database (and then viewed on the live site) ...
... but here's the big issue ... we copied the text from the WORD document and tried PASTING the text (& submit it into the database) from our office computer. When we view the text in the database, it is all messed up. HOWEVER, when I copied the text from the same document, and pasted the text from my HOME COMPUTER (and submit it) ... IT WORKED FINE!!!! Is there a reason why it works from my home and not the office???
It may relate to the character encoding of your browser, which would explain why it worked at home. I believe the first part of my earlier post answers the question; it just needs to be tested to be sure.
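If the browser encoding is the culprit, one thing to try is making the page, the form and the database connection all state the same encoding. The following is only a sketch under that assumption: UTF-8 is my choice, not something confirmed in this thread, `render_form()` and `/submit.php` are hypothetical names, and the field name is borrowed from the `$news_text` variable used later in the thread.

```php
<?php
// Sketch: declare one encoding (UTF-8 assumed) for the page and the form.
// render_form() is a hypothetical helper, not code from this thread.
function render_form(string $action): string {
    $action = htmlspecialchars($action, ENT_QUOTES, 'UTF-8');
    return '<form method="post" action="' . $action . '" accept-charset="UTF-8">'
         . '<textarea name="news_text"></textarea>'
         . '<input type="submit" value="Submit">'
         . '</form>';
}

// Tell the browser how the page is encoded so it submits the form the same way.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}
echo render_form('/submit.php');

// The database connection should use the same encoding. With mysqli
// (credentials below are placeholders):
//   $db = new mysqli($host, $user, $pass, $name);
//   $db->set_charset('utf8mb4');
```

With all three in agreement, the bytes pasted from Word should reach the database unchanged regardless of how each machine's browser is configured.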
chump2877
07-13-2006, 08:19 PM
The only thing I can think of is that your computer at work and your computer at home have different fonts loaded onto them, different character sets, something like that. How is this causing your problem? It's just a guess because I don't know for sure, but since PHP is server side and this is obviously a client-side issue, I figure it has to do with how your individual machines are interpreting and displaying the data before the form is actually submitted and the database is updated.
Did you try to use htmlentities() on the text before you submit it to your database? Or you could try converting all of your Word text to a standard font like Times New Roman, and then cut and paste that text into the form and see if there's still a discrepancy between your computer at home and work... worth a shot I guess...
Fumigator
07-13-2006, 08:32 PM
Are the versions of Word different? My guess is one version of Word is embedding control codes for formatting etc. when the "copy" is done; the other version copies only text. I'm using Word97 (yes, Word97 :p ) and it works fine plugging into a database.
Maybe try pasting the text to notepad, then re-copying, and then pasting to browser form.
Jabbamonkey
07-13-2006, 08:46 PM
Fumigator, as a temporary solution, we have been pasting it into notepad and then pasting it into the form. It's that extra step that they are complaining about.
DJ
Jabbamonkey
07-13-2006, 08:50 PM
Chump, htmlentities didn't work (thanks for the idea though). Tried the TIMES idea and that didn't work (I'm home now, and someone is at the office trying it).
Anyone, is there a way to change the character encoding of my browser? Would that work?
I've tried this in FireFox and IE, and both work from home. Just tried it in IE at the office ..... not sure about the different character encoding though.
could you try this then? this is a port of the thing I posted earlier on. not sure if it'll help or not but hopefully (entirely untested)..
function word_clean($str) {
$matches = array(
array( 'pattern' => "\342\200[\230\231]", 'character' => "'"),
array( 'pattern' => "\342\200\246", 'character' => "..."),
array( 'pattern' => "\342\200\223", 'character' => "-"),
array( 'pattern' => "\342\200\224", 'character' => "--"),
array( 'pattern' => "\342\200[\234\235]", 'character' => '"'));
foreach ($matches as $match)
$str = preg_replace("~$matches[pattern]~m", $match['character'], $str);
return $str;
}
In Firefox you can find out your character encoding by clicking 'View' > 'Character Encoding'. In Internet Explorer you can do it via 'View' > 'Encoding'.
Jabbamonkey
07-13-2006, 09:30 PM
Tried the following before updating/inserting into the database ...
function word_clean($str) {
$matches = array(
array( 'pattern' => "\342\200[\230\231]", 'character' => "'"),
array( 'pattern' => "\342\200\246", 'character' => "..."),
array( 'pattern' => "\342\200\223", 'character' => "-"),
array( 'pattern' => "\342\200\224", 'character' => "--"),
array( 'pattern' => "\342\200[\234\235]", 'character' => '"'));
foreach ($matches as $match)
$str = preg_replace("~$matches[pattern]~m", $match['character'], $str);
return $str;
}
$news_title = word_clean($news_title);
$news_subtitle = word_clean($news_subtitle);
$news_brief = word_clean($news_brief);
$news_text = word_clean($news_text);
$news_url = word_clean($news_url);
... and this messed everything up. It wouldn't even insert any new rows ... If I echo the $news_title variable after this function ... I got....
Title: "-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"t"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"e"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"s"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"a"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"t"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"s"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"'"-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"."-"-"-"-"-"
mmm.. I had a typo..
change it to
~$match[pattern]~m
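As far as I can tell, the garbled title comes from the typo leaving the pattern empty: `$matches` is the outer list, so it has no 'pattern' key and interpolates as an empty string, which makes the regex `~~m`. An empty pattern matches at every position, so each replacement gets inserted between every character. A minimal sketch of that effect (the sample string is made up):

```php
<?php
// An empty pattern matches before every character and at the end of the
// string, so the replacement is inserted at each of those positions.
$result = preg_replace('~~m', '-', 'test');
echo $result, "\n"; // -t-e-s-t-
```

Running five such replacements in a row, each over the output of the previous one, compounds quickly, which would explain the long runs of quotes, dots and dashes around each letter of the title.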
chump2877
07-14-2006, 10:45 AM
Anyone, is there a way to change the character encoding of my browser? Would that work?
Don't know if it would work, but worth a try... As far as I can see, to change the character encoding:
In IE: View => Encoding => then pick one
In FF: View => Character Encoding => then pick one
Another thing that might help us is to see an example of what it is you are trying to cut and paste into the form, and then show us what that same text looks like inside your database (with all the "crazy" symbols you described)
Jabbamonkey
07-17-2006, 02:04 PM
Was away all weekend, so I didn't have time to try any of this. I'll run through everything mentioned above and let you know if any of these work.
The encoding seems like the easiest solution, so I will try that first. Then I will try the "matches" code above again. And then, if that doesn't work, I'll post the characters and their matching "crazy" symbols and see if anyone has anything else to try...
Jabbamonkey
Jabbamonkey
07-17-2006, 03:52 PM
Tried changing the encoding, and that didn't work. Tried all the WESTERN encodings, and tried UNICODE. The only difference was which substitute characters showed up.
Here are some of the conversions.......
In Western ISO:
Single Curly Quote (') = ¡¦
Start quote (") = ¡§
End quote (") = ¡¨
Bullet = "X
In Unicode:
Single Curly Quote (') = â??
Start quote (") = â??
End quote (") = â??
Bullet = ï?§
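The "Unicode" results above look like UTF-8 bytes being displayed one byte at a time as ISO-8859-1. That is my interpretation rather than something established in the thread. The sketch below uses a hypothetical helper, `as_latin1_view()`, to show the effect: the three-byte UTF-8 curly quote starts with 0xE2 ("â" in ISO-8859-1), followed by two bytes in the 0x80-0x9F control range, which are shown here as "?".

```php
<?php
// Hypothetical helper: render each byte of a string as the ISO-8859-1
// character it would be read as, emitted as UTF-8 so it can be echoed.
function as_latin1_view(string $bytes): string {
    $out = '';
    foreach (str_split($bytes) as $byte) {
        $code = ord($byte);
        if ($code < 0x80) {
            $out .= $byte;   // plain ASCII is unchanged
        } elseif ($code <= 0x9F) {
            $out .= '?';     // control codes in ISO-8859-1, nothing printable
        } else {
            // Encode the Latin-1 code point (0xA0-0xFF) as two UTF-8 bytes.
            $out .= chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
        }
    }
    return $out;
}

echo as_latin1_view("\xE2\x80\x99"), "\n"; // curly quote U+2019 in UTF-8: â??
echo as_latin1_view("\xEF\x82\xA7"), "\n"; // U+F0A7 in UTF-8: ï?§
```

The second sequence decodes to U+F0A7, a private-use code point, which would fit a bullet taken from a symbol font in Word. If this reading is right, the data in the database is valid UTF-8 and the page displaying it is declaring a different encoding.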
Jabbamonkey
07-21-2006, 02:16 PM
I tried the following....
function word_clean($str) {
    // Each pattern is a UTF-8 byte sequence written as octal escapes
    $matches = array(
        array( 'pattern' => "\342\200[\230\231]", 'character' => "'"),
        array( 'pattern' => "\342\200\246", 'character' => "..."),
        array( 'pattern' => "\342\200\223", 'character' => "-"),
        array( 'pattern' => "\342\200\224", 'character' => "--"),
        array( 'pattern' => "\342\200[\234\235]", 'character' => '"'));
    foreach ($matches as $match)
        $str = preg_replace("~$match[pattern]~m", $match['character'], $str);
    return $str;
}
$news_title = word_clean($news_title);
$news_subtitle = word_clean($news_subtitle);
$news_brief = word_clean($news_brief);
$news_text = word_clean($news_text);
$news_url = word_clean($news_url);
And although I didn't get any errors this time around, the form failed to switch out the special characters.
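If the function runs without errors but replaces nothing, one possibility is that the submitted text does not contain the UTF-8 byte sequences the patterns look for. A quick way to check is to print the raw bytes that actually arrive. This is only a diagnostic sketch; `show_bytes()` is a hypothetical helper and the sample strings are made up.

```php
<?php
// Hypothetical helper: make every non-ASCII byte visible as a \xNN escape
// so the exact bytes in a submitted field can be inspected.
function show_bytes(string $s): string {
    return preg_replace_callback('/[\x80-\xFF]/', function (array $m): string {
        return sprintf('\\x%02X', ord($m[0]));
    }, $s);
}

echo show_bytes("It\xE2\x80\x99s"), "\n"; // UTF-8 curly apostrophe: It\xE2\x80\x99s
echo show_bytes("It\x92s"), "\n";         // Windows-1252 curly apostrophe: It\x92s
```

Echoing `show_bytes($news_title)` for a submission from each computer would show whether the two machines really send different bytes for the same pasted text.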
Jabbamonkey
Jabbamonkey
07-21-2006, 02:40 PM
-----
maybe you can just use str_replace on the bad characters instead ?
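A rough sketch of the str_replace idea follows. It assumes the text arrives either as UTF-8 or as Windows-1252 (the single-byte encoding Word's "smart" punctuation commonly ends up in); that assumption is mine and has not been verified against the data in this thread. `word_clean_simple()` is a hypothetical name.

```php
<?php
// Hypothetical helper: swap Word's typographic punctuation for ASCII.
function word_clean_simple(string $str): string {
    // The same characters as the regex version, as UTF-8 byte sequences.
    $utf8 = array(
        "\xE2\x80\x98" => "'",  "\xE2\x80\x99" => "'",
        "\xE2\x80\x9C" => '"',  "\xE2\x80\x9D" => '"',
        "\xE2\x80\x93" => '-',  "\xE2\x80\x94" => '--',
        "\xE2\x80\xA6" => '...',
    );
    // The same characters as single Windows-1252 bytes.
    $cp1252 = array(
        "\x91" => "'",  "\x92" => "'",
        "\x93" => '"',  "\x94" => '"',
        "\x96" => '-',  "\x97" => '--',
        "\x85" => '...',
    );
    // preg_match with the /u flag only succeeds on valid UTF-8. Applying the
    // single-byte map to valid UTF-8 could damage other multi-byte characters,
    // so it is used only when the input is not valid UTF-8.
    $map = (preg_match('//u', $str) === 1) ? $utf8 : $cp1252;
    return str_replace(array_keys($map), array_values($map), $str);
}

echo word_clean_simple("It\xE2\x80\x99s here"), "\n"; // It's here
echo word_clean_simple("It\x92s here"), "\n";         // It's here
```

It would be called the same way as word_clean() above, e.g. `$news_title = word_clean_simple($news_title);`. Text submitted in some other encoding would pass through unchanged.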