  1. #1
    New to the CF scene

    Unicode in forms & PHP

    I'm not sure whether my problem is with HTML forms or PHP, but I hope someone here can help.

    I need to create a web form that accepts words with diacritical marks such as ê or ü. I have a text field that accepts these characters, and I can even successfully write them into and read them from a database. However, whenever I re-populate the form from $_POST data after submitting it (so that the text input persists in the form), the diacritical marks are lost.

    Here is a skeleton PHP program that illustrates the problem. It is a standalone script that creates a form and lets the user enter a word or phrase; when the form is submitted, it simply recreates the form. I use this template whenever I need to create a form-based application.

    Try it out and enter a word like "têst". I always get "tÃªst" back out...

    PHP Code:
    <?php

    // Print the HTML header

    print '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">';
    print '<html>';
    print '<head>';
    print '<meta http-equiv="content-type" content="text/html;charset=utf-8">';
    print '<title>Form Test</title>';
    print '</head>';
    print '<body>';

    // Check the dummy/hidden form field to see whether we are entering this page for the first time
    // or as a result of clicking the submit button on the form. This allows us to use a single file
    // both to present the initial form and to process it.

    if (isset($_POST['_submit_check'])) {
        // If validate_form() returns errors, pass them to show_form()
        if ($form_errors = validate_form()) {
            show_form($form_errors);
        } else {
            process_form(); // Process the form with data coming from the form
        }
    } else {
        show_form();
    }

    // Construct the form

    function show_form($errors = '') {

        $defaults = $_POST;

        if ($errors) {
            print 'Please correct these errors: <ul><li>';
            print implode('</li><li>', $errors);
            print '</li></ul>';
        }

        // Escape PHP_SELF, which can contain user-supplied path data
        print '<form accept-charset="utf-8" id="FormTest" action="'
            . htmlspecialchars($_SERVER['PHP_SELF']) . '" method="post" name="FormTest">';

        print '<label>Enter text:</label>';
        form_text('test', $defaults);

        form_submit('submitButton', 'Go');

        print '<input type="hidden" name="_submit_check" value="1">';

        print '</form>';

    }

    // Process a submitted form

    function process_form() {

        show_form();

    }

    // Check for errors in the form and do some security checking

    function validate_form() {

        $errors = array();

        // Trim leading or trailing white space

        $_POST['test'] = trim($_POST['test']);

        // Remove any (probably malicious) HTML markup

        $_POST['test'] = strip_tags($_POST['test']);

        // Return the possibly empty array of errors

        return $errors;

    }

    // Form Helpers

    // Print a text box
    function form_text($element_name, $values) {
        $value = isset($values[$element_name]) ? $values[$element_name] : '';
        print '<input type="text" name="' . $element_name . '" value="';
        print htmlentities($value) . '">';
    }

    // Print a submit button
    function form_submit($element_name, $label) {
        print '<input type="submit" name="' . $element_name . '" value="';
        print htmlentities($label) . '"/>';
    }

    // Print a textarea
    function form_textarea($element_name, $values) {
        $value = isset($values[$element_name]) ? $values[$element_name] : '';
        print '<textarea name="' . $element_name . '">';
        print htmlentities($value) . '</textarea>';
    }

    // Print the footers

    print '</body>';
    print '</html>';

    ?>

  • #2
    UE Antagonizer Fumigator
    I didn't try it myself on your particular problem, but I believe the function htmlentities() will help you.

    html_entity_decode() is the reverse.
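    The round trip suggested here can be sketched as follows. This is a minimal sketch assuming the submitted text is UTF-8; the charset is passed explicitly as the third argument to both functions, because PHP versions before 5.4 default to ISO-8859-1 and would split "ê" into two bytes. The sample string is made up for illustration.

```php
<?php
// Encode with an explicit charset so each multibyte UTF-8 character is
// treated as one character rather than as separate bytes.
$original = "têst & <b>";
$encoded  = htmlentities($original, ENT_QUOTES, 'UTF-8');
echo $encoded, "\n"; // t&ecirc;st &amp; &lt;b&gt;

// html_entity_decode() reverses the conversion when given the same settings.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
var_dump($decoded === $original); // bool(true)
```

    Note that this only works if the same charset is used in both directions; as the later replies point out, it is a workaround rather than a fix for the root cause.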

  • #3
    Senior Coder
    The problem has nothing to do with that.

    The problem is that PHP is in single-byte character mode, and it received a multibyte string.

    Hence it turns the multibyte ê into the two single-byte characters Ã and ª.
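    The byte-level effect described above can be reproduced directly. This sketch assumes the PHP source file itself is saved as UTF-8; it forces the ISO-8859-1 interpretation explicitly so the result does not depend on the PHP version's default charset.

```php
<?php
// "ê" (U+00EA) is stored in UTF-8 as two bytes: 0xC3 0xAA.
$word = "têst";            // assumes this file is saved as UTF-8
echo strlen($word), "\n";  // 5 bytes, although only 4 characters are visible
echo bin2hex("ê"), "\n";   // c3aa

// Reading the same bytes as ISO-8859-1 gives one character per byte:
// 0xC3 is "Ã" and 0xAA is "ª" in that encoding.
echo htmlentities($word, ENT_QUOTES, 'ISO-8859-1'), "\n"; // t&Atilde;&ordf;st
```

    A browser renders `t&Atilde;&ordf;st` as "tÃªst", which is exactly the corrupted value the original poster saw.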

    I've forgotten which setting configures PHP to deal with these properly, though I've had the same problem before.

  • #4
    UE Antagonizer Fumigator
    Encoding and decoding the string would solve the problem.

  • #5
    Senior Coder
    You can fill a bucket of water by going to the ocean and coming back.

    Or you can walk over to the tap :P

    I think it's better to solve the root cause, instead of working around the problem.

  • #6
    Senior Coder koyama
    I think the problem is that you are using htmlentities(). It converts more than just ampersands, double quotes and angle brackets into entities. By default it reads your string as ISO-8859-1, one byte at a time, unless you supply the third argument specifying the character encoding. Check your generated HTML source and you'll see what I mean.

    Use htmlspecialchars() instead. It only converts ampersands, double quotes and angle brackets, so your HTML won't break.
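    The difference between the two functions shows up when both are run on the same string. In this sketch ENT_QUOTES and 'UTF-8' are passed explicitly so the output does not depend on the PHP version's defaults; the sample string is hypothetical.

```php
<?php
$input = 'têst "quoted" <tag>';

// htmlspecialchars() escapes only &, ", ', < and > (the single quote only
// with ENT_QUOTES), so the UTF-8 bytes of "ê" pass through unchanged.
echo htmlspecialchars($input, ENT_QUOTES, 'UTF-8'), "\n";
// têst &quot;quoted&quot; &lt;tag&gt;

// htmlentities() converts every character that has a named entity.
echo htmlentities($input, ENT_QUOTES, 'UTF-8'), "\n";
// t&ecirc;st &quot;quoted&quot; &lt;tag&gt;
```

    Both outputs display the same text in a UTF-8 page; htmlspecialchars() simply leaves the non-ASCII characters alone, so a wrong charset assumption cannot corrupt them.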

  • #7
    New to the CF scene
    htmlentities() was the problem. I used it in my form helpers. If I remove it, everything works fine; even the encoded data in the db comes out okay. It was left over from when I wrote the form helpers, before I was using Unicode...
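    Removing the escaping entirely means a double quote in the submitted text can break out of the value attribute. A safer variant, sketched here as a hypothetical revision of the thread's form_text() helper (not the poster's actual code), keeps the escaping but switches to htmlspecialchars() with an explicit UTF-8 charset, as suggested above.

```php
<?php
// Hypothetical revision of form_text(): escape for an HTML attribute without
// touching non-ASCII characters such as "ê".
function form_text($element_name, $values) {
    $value = isset($values[$element_name]) ? $values[$element_name] : '';
    print '<input type="text" name="' . htmlspecialchars($element_name, ENT_QUOTES, 'UTF-8')
        . '" value="' . htmlspecialchars($value, ENT_QUOTES, 'UTF-8') . '">';
}

form_text('test', array('test' => 'têst "quoted"'));
// <input type="text" name="test" value="têst &quot;quoted&quot;">
```

    The same change applies to form_submit() and form_textarea(); only the function name needs swapping.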

  • #8
    UE Antagonizer Fumigator
    Whoops, htmlspecialchars(), yeah, that's what I meant.

