CodingForums.com


qwertyjjj 12-06-2012 10:33 AM

compare XML files, text vs numbers
 
AFAIK, when comparing data in two XML files, everything is treated as text (character data).
This means that 0.5 in one file will appear different from .5 in the second file, because the strings differ even though the values are equal.

Is there a way to convert data to numbers/decimals first in XML and compare?

sunfighter 12-06-2012 03:08 PM

What are you using to compare the two files? Maybe things would be a little easier if you included the XML files you're talking about.

Because "data in the format 0.5" makes no sense to me, and neither does "convert data to numbers/decimals".

qwertyjjj 12-06-2012 04:37 PM

Quote:

Originally Posted by sunfighter (Post 1297736)
What are you using to compare the two files? Maybe things would be a little easier if you included the XML files you're talking about.

Because "data in the format 0.5" makes no sense to me, and neither does "convert data to numbers/decimals".

file 1
<multiplyfactor>0.5</multiplyfactor>

file2
<multiplyfactor>.5</multiplyfactor>

When you compare these as text in xml they are different.
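
A minimal sketch of the mismatch, assuming Python with only the standard library (the thread does not name a language, so that choice is an assumption). It parses both snippets, then compares the element text once as strings and once as Decimal values:

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

file1 = "<multiplyfactor>0.5</multiplyfactor>"
file2 = "<multiplyfactor>.5</multiplyfactor>"

# The parser hands back the element content as plain strings.
text1 = ET.fromstring(file1).text
text2 = ET.fromstring(file2).text

same_as_text = text1 == text2                      # "0.5" vs ".5"
same_as_number = Decimal(text1) == Decimal(text2)  # one half vs one half

print(same_as_text, same_as_number)  # False True
```

Decimal is used rather than float so that values such as 0.10 and 0.1 compare exactly, without binary rounding.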

Alex Vincent 12-06-2012 07:53 PM

qwertyjjj, XML has no concept of numbers, believe it or not. A parsed DOM of your document will look like this:

multiplyfactor (element)
-- .5 (text node, whose value is a string)

Your best bet would be to have some custom scripting to identify elements containing numbers, and then convert their contents to actual numbers using parseFloat (not parseInt, which would drop the fractional part) or whatever your language's equivalent is.
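
A rough sketch of that kind of custom scripting, assuming Python and its standard decimal and xml.etree.ElementTree modules. The helper names to_number and numeric_leaves, and the sample document, are made up for illustration:

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

def to_number(text):
    """Return a Decimal if text parses as a finite number, else None."""
    if text is None:
        return None
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Decimal also accepts "NaN" and "Infinity"; treat those as non-numeric.
    return value if value.is_finite() else None

def numeric_leaves(root):
    """List (tag, Decimal) for each childless element whose text is numeric."""
    return [
        (el.tag, to_number(el.text))
        for el in root.iter()
        if len(el) == 0 and to_number(el.text) is not None
    ]

doc = ET.fromstring(
    "<settings><name>pump</name><multiplyfactor>.5</multiplyfactor></settings>"
)
print(numeric_leaves(doc))
```

Guessing the type from the content is only a heuristic: a part number such as 0042 would also be picked up as a number, so this works best when the set of numeric elements is small and well known.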

qwertyjjj 12-07-2012 09:37 AM

Quote:

Originally Posted by Alex Vincent (Post 1297833)
qwertyjjj, XML has no concept of numbers, believe it or not. A parsed DOM of your document will look like this:

multiplyfactor (element)
-- .5 (text node, whose value is a string)

Your best bet would be to have some custom scripting to identify elements containing numbers, and then convert their contents to actual numbers using parseFloat (not parseInt, which would drop the fractional part) or whatever your language's equivalent is.

What is the best way to do this?
Should each XML element carry an attribute naming its datatype, or should the schema declare each element's datatype so that the software can look it up there?

xmlguy 12-07-2012 12:52 PM

Have you tried using an XML diff tool?
I use Liquid Studio, and that has a fairly decent compare / diff tool.
http://www.liquid-technologies.com/Compare-XML.aspx

qwertyjjj 12-07-2012 03:03 PM

Quote:

Originally Posted by xmlguy (Post 1298036)
Have you tried using an XML diff tool?
I use Liquid Studio, and that has a fairly decent compare / diff tool.
http://www.liquid-technologies.com/Compare-XML.aspx

But what about doing it programmatically, in software?
Surely in code you can check a schema to get a datatype?

tracknut 12-07-2012 04:00 PM

Quote:

Originally Posted by qwertyjjj (Post 1298068)
Surely in code you can check a schema to get a datatype?

I don't know how that's going to help, though. Both your examples (0.5 and .5) would be valid against the same schema, so what would that tell you? I agree, the way to do this seems to be to load the two xml files into a program and "walk" the objects comparing them.
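
One way the "walk and compare" idea might look, sketched in Python with the standard library. The function names same_text and same_tree and the sample documents are assumptions for illustration, and the sketch ignores tail text, comments, and child reordering:

```python
from decimal import Decimal, InvalidOperation
import xml.etree.ElementTree as ET

def same_text(a, b):
    """Equal as strings, or equal as numbers when both values parse."""
    a, b = (a or "").strip(), (b or "").strip()
    if a == b:
        return True
    try:
        return Decimal(a) == Decimal(b)
    except InvalidOperation:
        # At least one side is not a number, and the strings already differ.
        return False

def same_tree(x, y):
    """Recursively compare tag, attributes, text, and children in order."""
    return (
        x.tag == y.tag
        and x.attrib == y.attrib
        and same_text(x.text, y.text)
        and len(x) == len(y)
        and all(same_tree(cx, cy) for cx, cy in zip(x, y))
    )

a = ET.fromstring("<cfg><multiplyfactor>0.5</multiplyfactor><unit>kg</unit></cfg>")
b = ET.fromstring("<cfg><multiplyfactor>.5</multiplyfactor><unit>kg</unit></cfg>")
c = ET.fromstring("<cfg><multiplyfactor>0.6</multiplyfactor><unit>kg</unit></cfg>")

print(same_tree(a, b), same_tree(a, c))  # True False
```

Here every text node is tried as a number. A stricter version would only do that for elements known to be numeric.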

Dave

qwertyjjj 12-07-2012 04:05 PM

Quote:

Originally Posted by tracknut (Post 1298081)
I don't know how that's going to help, though. Both your examples (0.5 and .5) would be valid against the same schema, so what would that tell you? I agree, the way to do this seems to be to load the two xml files into a program and "walk" the objects comparing them.

Dave

Couldn't the schema specify precision and scale?
I.e., require that every value has a digit before the decimal point?

tracknut 12-07-2012 04:22 PM

Quote:

Originally Posted by qwertyjjj (Post 1298082)
Couldn't the schema specify precision and scale?
I.e., require that every value has a digit before the decimal point?

Unfortunately I'm no master of the schema, but logically I'm going to guess that this is not a "schema issue" as both those numbers are completely legitimate representations of "one half". You may need to write a little test example and see if there's a way to get a schema validation to fail one and accept the other.
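
For what it's worth, XML Schema does have a pattern facet (xs:pattern) that restricts how a value is written, not just what it means, so a schema could in principle accept 0.5 and reject .5. The sketch below is not schema validation. It only emulates such a rule with Python's re module, and the specific pattern is a hypothetical choice:

```python
import re

# Hypothetical lexical rule: optional sign, at least one digit before the
# decimal point, then an optional fractional part.
LEADING_DIGIT = re.compile(r"[+-]?[0-9]+(\.[0-9]+)?")

def accepted(lexical):
    """XSD patterns must match the whole value, so fullmatch is the analogue."""
    return LEADING_DIGIT.fullmatch(lexical) is not None

for value in ["0.5", ".5", "5", "-12.75"]:
    print(value, accepted(value))
```

A rule like this forces both files to use one spelling, which makes a plain text diff work, but it does not help when comparing files that were already written in different styles.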

Dave

qwertyjjj 12-07-2012 04:25 PM

Quote:

Originally Posted by tracknut (Post 1298086)
Unfortunately I'm no master of the schema, but logically I'm going to guess that this is not a "schema issue" as both those numbers are completely legitimate representations of "one half". You may need to write a little test example and see if there's a way to get a schema validation to fail one and accept the other.

Dave

I guess the problem is how to tell the software to check them as numbers rather than strings, so that it doesn't find a difference when it compares them.
If it knows it's a decimal, then it sees 0.5 as the same as .5, which is correct.

Alex Vincent 12-07-2012 04:27 PM

Let's step back a bit. First and foremost: what is going to consume the XML? Specifically, what programming or scripting language is that consumer written in?

This is most important, since XML without something to parse it is just a string of characters. :) The language will place constraints and expose capabilities that XML itself doesn't have.

sunfighter 12-07-2012 04:43 PM

It's easy enough, after parsing the XML, to ensure that you have numeric values when you do math by converting them to numbers.

qwertyjjj 12-11-2012 09:06 AM

Quote:

Originally Posted by sunfighter (Post 1298093)
It's easy enough, after parsing the XML, to ensure that you have numeric values when you do math by converting them to numbers.

OK, but imagine that the XML file has 100 different elements.
How does the parser know which is meant to be a decimal, which a string, which a date, etc.?

You either hardcode it in the software or you check the schema?

Alex Vincent 12-11-2012 07:54 PM

Quote:

Originally Posted by qwertyjjj (Post 1299010)
OK, but imagine that the XML file has 100 different elements.
How does the parser know which is meant to be a decimal, which a string, which a date, etc.?

You either hardcode it in the software or you check the schema?

Pretty much.
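
The "hardcode it" option can be as small as a lookup table from element name to converter. A sketch in Python, assuming made-up element names (startdate, label) alongside multiplyfactor. The same table could instead be generated from the schema's type declarations:

```python
from datetime import date
from decimal import Decimal
import xml.etree.ElementTree as ET

# Hypothetical element names; this table would be written by hand
# or derived from the schema.
CONVERTERS = {
    "multiplyfactor": Decimal,
    "startdate": date.fromisoformat,
    "label": str,
}

def typed_values(root):
    """Convert each child's text with its declared converter (default: str)."""
    return {
        el.tag: CONVERTERS.get(el.tag, str)((el.text or "").strip())
        for el in root
    }

doc = ET.fromstring(
    "<item>"
    "<multiplyfactor>.5</multiplyfactor>"
    "<startdate>2012-12-11</startdate>"
    "<label>0.50</label>"
    "</item>"
)
print(typed_values(doc))
```

Note that label stays the string "0.50" even though it looks numeric, which is the point of declaring the types instead of guessing them from the content.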


All times are GMT +1.