08-20-2007, 01:10 PM | #1
Spudhead
Character encoding, cleaning CMS input etc.

I just don't understand this character encoding thing. Never have. ANSI, ASCII, UTF, Unicode... it may as well be Greek...

So: my simple CMS lets admin users type content into a textarea. Before it goes into the database, I take care of any dodgy chars (e.g. single quotes) with escape(). When it comes out, I replace "%0D%0A" with a couple of <br/> tags, then unescape() it and dump it all on the page.

Generally, that's fine. However - one client uses a Mac to update her site. It's not causing me any problems as such, although the double quotes look a bit...odd. But she's saying that some chars are getting replaced with those "I don't know what char this is supposed to be" question mark symbols.

To clarify (hopefully - I hope the forum software doesn't do exactly what I'm trying to and fixes the dodgy char):

- Client pastes a “ into textarea.
- I escape() it. Apparently <%=escape("“")%> returns %E2%u20AC%u0153.
- <%=asc("“")%> returns 226
- I try to fix it with output= replace(output,"“", "&ldquo;") - which does, it seems, nothing.

So... can anyone explain to me, preferably in words of two syllables or less, what the nuts is going on and how to fix it? Is it character encodings? Is it locale IDs? Is it ANSI or Unicode? What is it?

How the chuff do I find these things and replace them with something... standard?
08-20-2007, 01:34 PM | #2
Daemonspyre
Welcome to MS Word as an HTML editor...

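Word (on Windows and on the Mac) swaps straight quotes for typographic "smart" quotes as you type. Those are Unicode characters outside the ASCII range, and they produce the effect that you are experiencing.

How do you fix it? Use UTF-8 everywhere. Don't assume the server does this for you: Windows works in UTF-16 internally, and classic ASP falls back to the server's ANSI code page (typically Windows-1252 on a Western European install) unless you set the code page yourself.

Set that encoding on your form page, and on the page that displays the content.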
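
To make the mismatch concrete, here is a rough sketch in Python (not ASP). The helper `js_style_escape` is made up for illustration and only approximates what escape() does. The sketch assumes the browser sent the quote as UTF-8 and the server read those bytes as Windows-1252, which would account for both the escape() output and the asc() value of 226 reported in the first post.

```python
def js_style_escape(text):
    """Approximate JavaScript-style escape(): %XX below 256, %uXXXX above."""
    safe = set(
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789@*_+-./"
    )
    out = []
    for ch in text:
        code = ord(ch)
        if ch in safe:
            out.append(ch)
        elif code < 256:
            out.append("%{:02X}".format(code))
        else:
            out.append("%u{:04X}".format(code))
    return "".join(out)


left_quote = "\u201c"                      # the curly quote the client pasted
utf8_bytes = left_quote.encode("utf-8")    # three bytes: E2 80 9C
misread = utf8_bytes.decode("cp1252")      # assumption: same bytes read as Windows-1252

print(utf8_bytes.hex())                    # e2809c
print(len(misread), ord(misread[0]))       # 3 226
print(js_style_escape(misread))            # %E2%u20AC%u0153
print(js_style_escape(left_quote))         # %u201C
```

If that assumption holds, the string on the server never contains a curly quote at all, only three unrelated characters, which would also explain why a replace() looking for the quote finds nothing.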

To see the characters that they are using, go to START > RUN > charmap (or, Start > All Programs > Accessories > Character Map)

Font: Times New Roman

The first character to look at is double quotes, first line, second character in.

Now use the GO TO UNICODE Box: Type in 02DD, 201C, 201D, and 2033.

This will show you all the different types of double quotes (although not all are named 'double quotes').
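
If you don't have Character Map to hand, the same lookup can be sketched in Python (not ASP) with the standard `unicodedata` module. The clean-up table underneath is hypothetical: the two single curly quotes (U+2018, U+2019) are my addition, and the function name is made up for illustration.

```python
import unicodedata

# The four code points named above, written as hex.
code_points = [0x02DD, 0x201C, 0x201D, 0x2033]

for cp in code_points:
    print("U+{:04X}  {}".format(cp, unicodedata.name(chr(cp))))

# Hypothetical clean-up table: typographic quotes -> named HTML entities.
ENTITY_FOR = {
    "\u201c": "&ldquo;",
    "\u201d": "&rdquo;",
    "\u2018": "&lsquo;",   # not in the list above, added for illustration
    "\u2019": "&rsquo;",   # not in the list above, added for illustration
}


def quotes_to_entities(text):
    """Replace curly quotes with named HTML entities; leave the rest alone."""
    return "".join(ENTITY_FOR.get(ch, ch) for ch in text)


print(quotes_to_entities("\u201cHello\u201d"))   # &ldquo;Hello&rdquo;
```

A replacement like this only has something to match once the input has been decoded with the right encoding, so that the string really contains U+201C rather than its misread bytes.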
__________________
Quote:
To say my fate is not tied to your fate is like saying, 'Your end of the boat is sinking.' -- Hugh Downs
Please, if you found my post helpful, pay it forward. Go and help someone else today.
08-20-2007, 03:59 PM | #3
Spudhead
It's getting a little clearer, thanks

So... what you're saying is that I need to take the user input and UTF-8 encode it?

The web seems awash with UTF-8 encoding functions: here's some I found at CodeToad:

Code:
<%
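' Converts a string of UTF-8 bytes (one byte per character) back to
' single-byte Latin-1 characters. Only two-byte sequences led by &HC2 or
' &HC3 (code points U+0080 to U+00FF) fit in one byte; anything else,
' such as a three-byte curly quote, becomes an inverted question mark (191).
function DecodeUTF8(s)
  dim i
  dim c
  dim n
  i = 1
  do while i <= len(s)
    c = asc(mid(s,i,1))
    if c and &H80 then
      ' n = lead byte plus the continuation bytes (10xxxxxx) that follow it
      n = 1
      ' <= rather than <, so a sequence ending on the last character is seen
      do while i + n <= len(s)
        if (asc(mid(s,i+n,1)) and &HC0) <> &H80 then
          exit do
        end if
        n = n + 1
      loop
      if n = 2 and ((c and &HFE) = &HC2) then
        c = asc(mid(s,i+1,1)) + &H40 * (c and &H01)
      else
        c = 191
      end if
      s = left(s,i-1) & chr(c) & mid(s,i+n)
    end if
    i = i + 1
  loop
  DecodeUTF8 = s
end function


' Converts single-byte characters (&H80 to &HFF) to two-byte UTF-8
' sequences. It treats every character as Latin-1, so characters such as
' curly quotes do not come out as their real UTF-8 form.
function EncodeUTF8(s)
  dim i
  dim c
  i = 1
  do while i <= len(s)
    c = asc(mid(s,i,1))
    if c >= &H80 then
      ' Lead byte is &HC2 or &HC3; continuation byte is c with bit 6 cleared
      s = left(s,i-1) & chr(&HC2 + ((c and &H40) \ &H40)) & chr(c and &HBF) & mid(s,i+1)
      i = i + 1   ' skip the continuation byte just inserted
    end if
    i = i + 1
  loop
  EncodeUTF8 = s
end function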
%>
That look about right to you? If so... integrating this into my current code would be something like:

- take user input
- UTF-8 encode
- escape()
- drop into database

... and exactly the same in reverse for displaying on a page?
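
One caution on the steps above, sketched in Python rather than ASP. It assumes, as the escape() output in the first post suggests, that the text already reaches the server as UTF-8 bytes and is merely being read as Windows-1252. Under that assumption an extra "UTF-8 encode" step makes things worse, whereas reversing the misread gets the original character back.

```python
pasted = "\u201c"                                   # curly quote from the textarea
received = pasted.encode("utf-8").decode("cp1252")  # assumed server-side misread

# Encoding the misread text again does not help: each of the three
# wrong characters gets its own multi-byte sequence.
double_encoded = received.encode("utf-8")
print(len(double_encoded))                          # 7

# Reversing the misread recovers the original character.
repaired = received.encode("cp1252").decode("utf-8")
print(repaired == pasted)                           # True
```

The cleaner route is to avoid the misread in the first place by having the form page, the server code page and the display page all agree on UTF-8.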

God knows why I've never come up against this one before...


ps. Just to clarify, all pages (admin forms and front-end display) have the following:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Last edited by Spudhead; 08-20-2007 at 04:05 PM.. Reason: clarifimification
08-20-2007, 04:07 PM | #4
Daemonspyre
You have probably not come up against this before because TEXTAREAs are not the same as WYSIWYG editors.

If the client is copying out of Office 2000 and above, the text brings Office's typographic characters with it (and, in a WYSIWYG editor, Office's XML-flavoured markup too), and that can screw up your input. COPY AND PASTE is a blessing and a curse.

Word content is Unicode. A textarea submits its text in the encoding of the page that holds the form (or the form's accept-charset, if set), and the server then reads those bytes using whatever code page ASP is running with, which is UTF-8 only if you tell it so.

Happened to me the first time I created one, and I haven't looked back since.

Your code looks right for plain Latin-1 text, but you may be able to use the ASP method Server.HTMLEncode to do the work for you.

You might want to try that, but I cannot guarantee that will work.
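
The general idea of encoding on output can be sketched in Python (not ASP). This is only an analogy: `to_ascii_html` is a made-up helper, and I am not claiming Server.HTMLEncode produces the same output for any given code page. It escapes the markup characters and then writes every non-ASCII character as a numeric character reference, so the result survives whatever charset the page is served in.

```python
import html


def to_ascii_html(text):
    """Escape markup characters, then turn non-ASCII into numeric references."""
    escaped = html.escape(text, quote=True)
    return escaped.encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_ascii_html("\u201cFish & chips\u201d <b>"))
# &#8220;Fish &amp; chips&#8221; &lt;b&gt;
```

As with any replacement, this assumes the string was decoded correctly on the way in; escaping text that is already garbled just preserves the garbling.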
08-20-2007, 04:13 PM | #5
Daemonspyre
More data for you:

http://msdn2.microsoft.com/en-us/library/ms525789.aspx
08-20-2007, 04:41 PM | #6
Spudhead
Ok, thanks for the info. Will look into altering the Codepage. Have taken interim measure of emailing client with "stop pasting stuff out of Word, it's screwing everything up".
