
XML Parsing Error: undefined entity



reddem0n
10-20-2008, 08:38 PM
I'm having trouble getting an XML file to load properly. I query the database for jobs, then write the results into the XML file so I can send it to RSS feeds. Based on my research, the problem seems to be that the XML parser can't handle anything containing &nbsp;. I am fairly new to how this works. Below is the code, with some parts (mainly the SQL queries) snipped out for security reasons. I read somewhere that you can reference three .ent files in a doctype declaration at the top of the feed so that browsers can parse it correctly, but I had very little success with that.

Here is the error message:



XML Parsing Error: undefined entity
Location: juju.xml
Line Number 13, Column 130:
<description><p style="margin: 0in 0in 0pt"><span>Provides technical and organizational support for the Director of Marketing, &nbsp;including but not limited to data entry, vendor trafficking, daily redemptions and project tracking.<span>&nbsp; </span>Responsible for maintaining databases and creating queries/reports.<span>&nbsp; </span>Assists in evaluating and analyzing database information and making recommendations.</span></p></description>
-----------------------------------------------------------------------------------------------------------------------------------------------^




Here is the .cfm file that updates the xml file whenever I run it. Remember I deleted the cf queries, don't let that distract you.



<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<CFQUERY NAME="Getjob" DATASOURCE="#ODBC_DATASOURCE#">
DELETED
</CFQUERY>

<cfquery name="GetCats" datasource="#ODBC_DATASOURCE#">
DELETED
</cfquery>

<CFQUERY NAME="GetTypes" DATASOURCE="#ODBC_DATASOURCE#">
DELETED
</CFQUERY>

<CFQUERY NAME="NumJobs" DATASOURCE="#ODBC_DATASOURCE#">
DELETED
</CFQUERY>

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Running Batch Query.....</title>
<link rel="stylesheet" href="CSS/shell.css" />
</head>
<body>

<div id="main">
<cfset jobs='
<positionfeed
xmlns="http://www.juju.com/employers/positionfeed-namespace/"
xmlns:xsi="http://www.w3.org/TR/xmlschema-1/"
xsi:schemaLocation="http://www.juju.com/employers/positionfeed-namespace/ http://www.juju.com/employers/positionfeed.xsd"
version="2006-04">
<source>Title</source>
<sourceurl>company url here/</sourceurl>
<feeddate>#DateFormat("# Now()#", "yyyy-mm-dd")#</feeddate>'>
<cfoutput query="Getjob">

<cfset description = replace(description, "<[^>]*>", "", "all")>
<cfset description = ReReplace(description, "&nbsp;", " ", "all")>


<cfset jobs='#jobs#
<job id="#jobid#">
<employer> </employer>
<title>#title#</title>
<description>#description#</description>
<postingdate>#date_entered#</postingdate>
<location>
<city>#location#</city>
<state>#statecode#</state>
<zip>#zipcode#</zip>
<nation>#countrycode#</nation>
</location>
<jobsourceurl>job source url</jobsourceurl>
</job>
'>
</cfoutput>
<cfset jobs='#jobs#</positionfeed>'>

<cffile action="write" addnewline="yes" charset="utf-8" file="D:\\Clients\job source url" output="#jobs#" fixnewline="no">
</div>

</body>
</html>



Any advice would be very much appreciated.

oesxyl
10-20-2008, 11:05 PM
You have a few solutions:
- include the missing entity declarations in an internal doctype at the head of the file
- use some preprocessing to remove &nbsp; and, at the same time, replace special chars like &, <, >, " and ' with entities known to XML where you need them.

I prefer the last solution.
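
A minimal sketch of the first option, written in CFML to match the original post. The variable name feedHeader is hypothetical, and it is an assumption (not confirmed anywhere in this thread) that the program consuming the feed reads an internal DTD subset.

```cfm
<!--- Sketch only. feedHeader is a hypothetical variable name. --->
<!--- The pound sign is doubled because CFML evaluates single pound signs inside strings. --->
<cfset feedHeader = '<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE positionfeed [
    <!ENTITY nbsp "&##160;">
]>'>

<!--- Prepend the header to the already built feed string before cffile writes it. --->
<cfset jobs = feedHeader & jobs>
```

A parser that processes the internal subset then expands &nbsp; to character 160 instead of reporting an undefined entity. Only one doctype may appear in the file, and it has to come before the root element.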

regards

reddem0n
10-20-2008, 11:26 PM
Hello, this is what I have... I was basically working from this site to create this page... For some reason the doctype is not being read, though.

Here is what I have:
http://pastebin.com/m74bfbd2a

Taken from this page:
http://www.alexatnet.com/node/19

oesxyl
10-20-2008, 11:50 PM
First of all, you have two doctypes; there must be only one.
Second, this is not a solution; the author is improvising.

I see something like this in your script:


<cfset description = ReReplace(description, "&nbsp;", " ", "all")>

Does that mean you replace &nbsp; with ' ' in the output? (I have no idea what cfset is. Is that ColdFusion?)

If the answer to my question is yes:
- remove any doctype; you don't need one if you replace &nbsp; with ' '
- add further rules after that, if you need them, to replace:
& -> &amp; // take care that this one comes first, to avoid things like &amp;gt;
< -> &lt;
> -> &gt;
' -> &apos;
" -> &quot;

regards
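
The replacement rules in the post above could be sketched in CFML roughly as follows, placed inside the existing cfoutput query="Getjob" loop. The variable name cleanDescription is hypothetical. A separate name is used on the assumption that, inside a query loop, an unscoped description resolves to the query column before the Variables scope, so reusing the column name could leave the written output unchanged. Note that &quot; is the predefined XML entity name for the double quote.

```cfm
<!--- Sketch only. cleanDescription is a hypothetical variable name. --->
<!--- REReplace is used for the tag pattern because Replace treats its pattern as a literal string. --->
<cfset cleanDescription = REReplace(Getjob.description, "<[^>]*>", "", "all")>

<!--- Turn the HTML-only entity into a plain space before any escaping. --->
<cfset cleanDescription = Replace(cleanDescription, "&nbsp;", " ", "all")>

<!--- Escape & first, otherwise the & of a later &lt; would itself become &amp;lt; --->
<cfset cleanDescription = Replace(cleanDescription, "&", "&amp;", "all")>
<cfset cleanDescription = Replace(cleanDescription, "<", "&lt;", "all")>
<cfset cleanDescription = Replace(cleanDescription, ">", "&gt;", "all")>
<cfset cleanDescription = Replace(cleanDescription, "'", "&apos;", "all")>
<cfset cleanDescription = Replace(cleanDescription, '"', "&quot;", "all")>
```

The job element would then use #cleanDescription# in place of #description#. ColdFusion also provides XmlFormat(), which escapes these special characters in a single call and may be simpler than the five separate Replace steps. Any other HTML entities already present in the stored text (for example &amp;) would be escaped a second time by this sketch, so they may need the same treatment as &nbsp; first.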

Yay
10-21-2008, 04:23 PM
Does that mean you replace &nbsp; with ' ' in the output? (I have no idea what cfset is. Is that ColdFusion?)
Yes, cfset relates to ColdFusion. It's used to set a value.
When you use the cfset tag to call a function, you do not need to assign the function return value to a variable if the function does not return a value or you do not need to use the value returned by the function. For example, the following line is a valid ColdFusion cfset tag for deleting the MyVariable variable from the Application scope:

<cfset StructDelete(Application, "MyVariable")>


