non valid characters displayed
09-10-2011, 03:13 AM
I am having a problem with my web page. In several places a space character is being displayed as an unprintable character; the symbol shown is the little square. I submitted the page to the W3C Markup Validation Service. It will not check my code, saying it does not recognize the apostrophe character I have inserted as the phrase: So I changed all instances to &#39;, the other apostrophe, and got the same results.
I used this page to verify those characters: http://www.tedmontgomery.com/tutorial/htmlchrc.html
I tried the validator here: http://htmlhelp.com/tools/validator/
and in several places it does not like the > symbol as in </p>
My page is: http://www.bkelly.ws/software/think_positive.htm
I am using Windows 7 and the standard Internet Explorer.
Can someone clue me in here?
09-10-2011, 01:16 PM
The validator at http://validator.w3.org says clearly where the issue is:
<META name="robots" content="index, follow">
<META name="copyright" CONTENT="Copyright � 1998�2002 by Ted M. Montgomery">
Do you have static HTML files? If so, you may want to check in your text/code editor which character encoding the files are saved in. You can probably set the encoding in the editor preferences/settings. If you save the files as ISO-8859-1, it doesn’t matter if you write UTF-8 into the meta element in the head, because the bytes on disk are still ISO-8859-1 and will not decode correctly as the UTF-8 you declared.
So, make sure you save the file as UTF-8, too. Or encode special characters with HTML entities, even in the document head.
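Both suggestions can be sketched in a few lines of Python. This is only an illustration, assuming the file was really saved as windows-1252; the sample text and the helper names `resave_as_utf8` and `to_entities` are made up for the example.

```python
# Hypothetical sample text with a copyright sign and an en dash.
text = "Copyright \u00a9 1998\u20132002"

# What an editor saving as windows-1252 (cp1252) would write to disk.
raw = text.encode("cp1252")


def resave_as_utf8(data: bytes, actual_encoding: str) -> bytes:
    """Decode with the encoding the file was really saved in, re-encode as UTF-8."""
    return data.decode(actual_encoding).encode("utf-8")


def to_entities(s: str) -> bytes:
    """Replace every non-ASCII character with a numeric character reference."""
    return s.encode("ascii", errors="xmlcharrefreplace")


# The cp1252 bytes are not valid UTF-8, which is what a validator trips over
# when the declared charset does not match the saved bytes.
try:
    raw.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

utf8_bytes = resave_as_utf8(raw, "cp1252")
entity_bytes = to_entities(text)
```

The entity version is pure ASCII, so it survives whatever encoding the editor or server applies.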
09-10-2011, 11:27 PM
When I run the checker at W3C it says:
Sorry, I am unable to validate this document because on line 26 it contained one or more bytes that I cannot interpret as us-ascii (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.
The error was: ascii "\xA0" does not map to Unicode
Lines 24 through 27 of my web page source code are:
When you write code, create definitions, and name variables,
it's best to think positive.*Avoid temptations to think negative.
Line 26 has the ' and nothing else but plain text.
BTW: In that quotes section just to the left of the word "best" are the following characters: & # 3 9 ; s
and they are being displayed as an apostrophe. That is telling me the phrase I used for an apostrophe is correct.
I am not able to determine what to do from the previous reply.
BTW: I am using Windows 7 and UltraEdit. When saving a file there are no options such as ISO-anything or UTF-8.
However after posting and reading my post, in the quoted section I see
I don't know where the asterisk came from, as I did a cut and paste from my source file. When viewing the web page there is a little square there instead of the space that is in the source code. Hmmm. Why would a few spaces be replaced with odd symbols and not all?
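One way to see which "spaces" are different is to scan the raw bytes of the file. The sketch below assumes the odd characters are the 0xA0 byte named in the validator error, which is the non-breaking space in ISO-8859-1 and windows-1252. The function name `find_nbsp` and the sample bytes are hypothetical. It only makes sense for single-byte encoded files, since in UTF-8 the byte 0xA0 also occurs as part of longer sequences.

```python
NBSP = 0xA0  # non-breaking space in ISO-8859-1 and windows-1252


def find_nbsp(raw: bytes) -> list:
    """Return 1-based (line, column) pairs for every 0xA0 byte in raw."""
    hits = []
    for line_no, line in enumerate(raw.split(b"\n"), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == NBSP:
                hits.append((line_no, col))
    return hits


# Hypothetical sample: one ordinary line with a 0xA0 where a space is expected.
sample = b"it's best to think positive.\xa0Avoid temptations\nplain line\n"
positions = find_nbsp(sample)
print(positions)
```

Ordinary spaces are byte 0x20 and are not reported, which would explain why only some of the spaces show up as squares.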
09-11-2011, 12:02 AM
OK, I saved your page as an HTML file and opened it in Coda. There it tells me that the text encoding is “Western (Windows Latin 1)”. Now, in Coda I have the option of converting the text to other encodings but that doesn’t matter now. Knowing that the HTML file is “Western (Windows Latin 1)” I went back to the W3C validator, validated your page and then in the combo box in the line entitled “Encoding” I chose “windows-1252 (Western Europe)” and there it was able to validate (showing 28 errors and 39 warnings).
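For anyone without an editor that reports the encoding, a rough check can be done in Python. This is a heuristic sketch, not what Coda or the validator actually do: it assumes that anything which is neither pure ASCII nor valid UTF-8 is windows-1252, which is a guess that fits Western European pages only. The function name `guess_encoding` is made up.

```python
def guess_encoding(raw: bytes) -> str:
    """Very rough guess: us-ascii, utf-8, or (as an assumed fallback) windows-1252."""
    if raw.isascii():
        return "us-ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: a single-byte Western encoding is the most likely alternative.
        return "windows-1252"


print(guess_encoding(b"plain text"))
print(guess_encoding("caf\u00e9".encode("utf-8")))
print(guess_encoding(b"think positive.\xa0Avoid"))
```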
So, the conclusion is that UltraEdit seems to have saved the file as “windows-1252”. I don’t know UltraEdit but a web search brought up a discussion from 2006 with the following information that could be useful:
My suggestions for the configuration for UTF-8 webpage writers:
First read the FAQ about UTF-8, UTF-16, UTF-32 & BOM and the Character encodings to get the basic knowledge you need.
Second in UltraEdit or UEStudio open Configuration - File Handling and set following options:
Uncheck the 2 EBCDIC options if you are not editing EBCDIC files, but check the option On Paste convert line ending to destination type (UNIX/MAC/DOS).
Set the Default file type for new files to whatever you prefer. If your host server is a Linux/Unix server, you should use Unix to avoid problems while downloading or uploading via FTP. If your host server is a Windows server, use DOS.
Set the Unix/Mac file detection/conversion to Automatically convert to DOS format to avoid problems with copy and paste with other windows applications.
Uncheck Only recognize DOS terminated lines (CR/LF) as new lines for editing.
Uncheck Write UTF-8 BOM header to ALL UTF-8 files when saved.
If Write UTF-8 BOM on new files created within this program (if above is not set) should be enabled or not depends on the type of Unicode files you are creating. If you create for example only XML and HTML type files (HTML, PHP, ASP, ...) in UTF-8, you should uncheck this option, because then the encoding should be defined inside the file with encoding="utf-8" (XML) or with content="text/html; charset=utf-8" (HTML). See FAQ above for details about BOM and when it should be used.
Enable Save file as input format (UNIX/MAC/DOS). That's important because we convert every file automatically to DOS for editing, but we want to save it in the original format and not in DOS format. This option is moved from the Save to the DOS/UNIX/MAC Handling configuration dialog in v12.10 of UltraEdit!
You can set option Trim trailing spaces on file save to whatever you prefer. Normally it is good to activate it because it can reduce the file size a little bit which is interesting for HTML files.
Use the second option Open file without temp file but prompt for each file and set the Threshold for example to 4096 (4 MB). You can set the threshold value to a higher value if your computer has enough performance, your hard disk is fast, and you often edit large files.
Enable Auto detect UTF-8 files, Detect Unicode(UTF-16) files without BOM and Detect ASCII/ANSI files with Escaped Unicode. You can disable for example the UTF-16 detection if you are sure that you will never edit a UTF-16 file. Every enabled detection increases the file load time of normal ASCII files. But if you don't know what format your files have, it is better to let UE/UES automatically detect it.
The 3rd option Disable automatic detection of HEX file format on reload is not important for handling Unicode files.
And as already explained above also enable the option Always create new files as UNICODE at Editor - New File Creation.
Last, if you download/upload the files via the FTP client of UE/UES, always use the binary transfer mode and not the text mode. If your files on your Apache (Unix/Linux) host server are already Unix files, then with the settings above UE/UES converts a file to DOS only temporarily for editing, after loading it from FTP and before opening it in the editor, and converts it back to Unix before saving. So there is no need to do the conversion while transferring the file content. Local copies are then also Unix files and so are 100% identical to the files on the server. Using binary transfer mode is faster than the text/ASCII mode. Even if you don't use the FTP client of UE/UES and use a different FTP tool, you should always create and edit files with Unix line termination and use the binary transfer mode and the automatic conversion to DOS feature of UE/UES, unless your host server is a Windows server.
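Several of the settings above concern two byte-level properties of a file: whether it starts with a UTF-8 BOM, and which line terminator it uses. As a rough illustration independent of UltraEdit, the Python sketch below reports both and converts to Unix line endings. The function names `describe_bytes` and `to_unix` are hypothetical, and the DOS/UNIX/MAC labels simply mirror the wording of the options.

```python
import codecs


def describe_bytes(raw: bytes) -> dict:
    """Report whether raw starts with a UTF-8 BOM and which line terminator it uses."""
    crlf = raw.count(b"\r\n")
    lf = raw.count(b"\n") - crlf   # bare LF only
    cr = raw.count(b"\r") - crlf   # bare CR only
    if not (crlf or lf or cr):
        style = "none"
    elif crlf and not (lf or cr):
        style = "DOS"
    elif lf and not (crlf or cr):
        style = "UNIX"
    elif cr and not (crlf or lf):
        style = "MAC"
    else:
        style = "mixed"
    return {"utf8_bom": raw.startswith(codecs.BOM_UTF8), "line_endings": style}


def to_unix(raw: bytes) -> bytes:
    """Convert CR/LF and bare CR line terminators to LF."""
    return raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


dos_with_bom = codecs.BOM_UTF8 + b"<p>a</p>\r\n<p>b</p>\r\n"
print(describe_bytes(dos_with_bom))
print(describe_bytes(to_unix(dos_with_bom)))
```

A text-mode FTP transfer may rewrite line terminators on the way, which is why a binary transfer keeps the local and remote copies byte-identical.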
Added on 2009-11-09: I have found an undocumented setting in uedit32.exe of v11.10c and later. With manually adding to uedit32.ini
you can force all non Unicode files (not UTF-16 files) to be read/saved as UTF-8 encoded files. But new files are nevertheless created and saved either as Unicode (UTF-16 LE) or ASCII/ANSI files. So this special setting is only for already named files. However, creating a new file in ASCII/ANSI, save it with a name, close it and re-open it results in a new file encoded in UTF-8. Be careful with that setting. Even real ANSI files are loaded with this setting as UTF-8 encoded file causing all ANSI characters to be interpreted wrong.
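The warning about ANSI files being interpreted wrongly can be reproduced outside the editor. The Python sketch below uses a made-up sample string and treats ANSI as windows-1252, which is an assumption. It shows what happens when single-byte text is read as UTF-8, and the reverse case.

```python
# Hypothetical sample text containing one non-ASCII character.
text = "caf\u00e9 au lait"

ansi_bytes = text.encode("cp1252")   # assumed "ANSI" (windows-1252) file content
utf8_bytes = text.encode("utf-8")

# ANSI bytes forced through a UTF-8 decoder: the lone 0xE9 byte is invalid
# and becomes U+FFFD, the replacement character.
ansi_read_as_utf8 = ansi_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as windows-1252: each byte of the two-byte sequence
# turns into its own character.
utf8_read_as_ansi = utf8_bytes.decode("cp1252")

print(ansi_read_as_utf8)
print(utf8_read_as_ansi)
```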
Added on 2010-03-28: With UltraEdit v16.00 instead of Create new files as Unicode there are now the choices
Create new files as ANSI
Create new files as UTF-8
Create new files as UTF-16
at Advanced - Configuration - Editor - New File Creation. Therefore users of UltraEdit 16.00 and later can set the default encoding for new files to UTF-8. With this change the option Format of the Save As dialog is not remembered and preset anymore in UE v16.00 and later. Format of the Save As dialog is now always set to Default on opening of the dialog.
Another useful thread seems to be http://www.ultraedit.com/forums/viewtopic.php?f=7&t=7015
Hope that helps.
09-13-2011, 03:30 AM
I did not suspect there were this many options in just selecting the text format of a web page. I am not ready for a knowledgeable reply, but I am ready to say: Wow. Thank you for the time you took to reply to me.
PS: Are the FAQs you are thinking of on a different site? I went to the FAQ pages here and a search for "utf-" yielded zero results.