...

How to capture the content between open and close tags



manijs
07-08-2005, 08:40 AM
Hi, I have an HTML file with huge contents, including multiple open and close tags. For one of the tags, shown below, I am trying to capture the date content between the open and close tags, but I am not sure a regular expression would help. If a regular expression would work for this, then how?

<mso:Date_x0020_updated msdt:dt="string">2005-04-06T00:00:00Z</mso:Date_x0020_updated>


Please help!

Thanks so much

Many

jscheuer1
07-08-2005, 10:28 AM
My first thought was to grab the tag's innerHTML, but IE6 does not consider it a tag in the normal sense (what look like its opening and closing tags are treated as two separate tags), so I figured out this convoluted method:

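function getMsoDateTagData(){
// IE6 sees the opening and closing mso: tags as two separate elements, so read the
// parent's markup as a string and cut out the text that follows the opening tag
var msoTest=document.getElementsByTagName('mso:Date_x0020_updated')[0];
var marker='mso:date_x0020_updated msdt:dt="string">';   // 40 characters long
var blah=msoTest.parentNode.innerHTML;
blah=blah.substr(blah.toLowerCase().lastIndexOf(marker)+marker.length);
// keep everything up to the "<" that starts the closing tag
blah=blah.substr(0,blah.indexOf('<'));
return blah;
}

As long as you don't have more than one of these tags on the page, this will give you the string you are looking for. One possible usage: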

alert (getMsoDateTagData())

One caveat: you must wait until the page has loaded before using this function. One other thing: this all assumes that the name of the tag never changes.

enumerator
07-08-2005, 11:07 AM
If you can, declare mso as a namespace (see the MSDN reference "CUSTOM Element | custom Object", http://msdn.microsoft.com/workshop/author/dhtml/reference/objects/custom.asp); its tagName would then be "Date_x0020_updated" (or whatever follows the mso: prefix).

Kor
07-08-2005, 12:09 PM
object.firstChild.nodeValue
or
object.firstChild.data

if the content is a textNode

enumerator
07-08-2005, 05:20 PM
If the object is an object.

Kor
07-08-2005, 05:29 PM
Everything can be considered an object (well, almost :D). The problem is, as jscheuer1 noticed, how to refer to the object and, I should add, which methods are allowed to act upon it. From this point of view, the XML DOM is not quite the same as the HTML DOM.

enumerator
07-08-2005, 06:16 PM
A custom object is created when its namespace is declared. "Otherwise, the custom tag is treated as an unknown tag when the document is parsed." ;)

manijs
07-08-2005, 09:26 PM
Hi jscheuer1, the script you posted earlier was very useful. How do I also grab the content between <mso:Prime_x0020_SME msdt:dt="string">mailto:Many, Many</mso:Prime_x0020_SME>
and write both values out on the same page?

please see below:

<mso:Date_x0020_updated msdt:dt="string">2005-04-06T00:00:00Z</mso:Date_x0020_updated>
<br>
<mso:Prime_x0020_SME msdt:dt="string">mailto:Many, Many</mso:Prime_x0020_SME>


Thanks,
I appreciate your help

Many

jscheuer1
07-08-2005, 11:37 PM
After checking the DOM inspector, I thought firstChild.data looked very promising. It worked great in FF, but neither it nor firstChild.nodeValue worked in IE. I'm hesitant to declare a namespace for these tags because I suspect that they are proprietary MS tags to begin with. Anyway, thanks for the ideas. Now, for manijs, here is what works here (stick this in the head):

<script type="text/javascript">
function getMsoDateTagData(){
var msoTest=document.getElementsByTagName('mso:date_x0020_updated')[0]
var blah=msoTest.parentNode.innerHTML;
blah=blah.substr(blah.toLowerCase().lastIndexOf('mso:date_x0020_updated msdt:dt="string">')+40)
blah=blah.substr(0,blah.indexOf('<'))
return blah;
}

function getMsoPrimeTagData(){
var msoTest=document.getElementsByTagName('mso:prime_x0020_sme')[0]
var blah=msoTest.parentNode.innerHTML;
blah=blah.substr(blah.toLowerCase().lastIndexOf('mso:prime_x0020_sme msdt:dt="string">')+37)
blah=blah.substr(0,blah.indexOf('<'))
return blah;
}

window.onload=function(){
document.getElementById('datum').innerHTML=getMsoDateTagData()+' '+getMsoPrimeTagData()
}
</script>

and this goes in the body:

<span id="datum"></span>
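The two functions above differ only in the tag name and the number added to the search position (40 and 37 are simply the lengths of the two search strings). A minimal sketch of a single parameterized helper, assuming the markup is available as a plain string (for example, an element's innerHTML) and that no attribute value contains a ">". The name getMsoTagText is hypothetical, and unlike the original it does not depend on the exact attribute text msdt:dt="string":

```javascript
// Hypothetical helper: returns the text between <mso:TAG ...> and the next "<",
// "" for an empty or self-closing tag, or null when the tag is not found.
function getMsoTagText(markup, tagName) {
  // Search a lowercased copy, as jscheuer1's functions do, so the letter case
  // the browser uses for tag names in innerHTML does not matter
  const lower = markup.toLowerCase();
  const needle = '<mso:' + tagName.toLowerCase();
  let open = lower.indexOf(needle);
  while (open !== -1) {
    const next = lower.charAt(open + needle.length);
    // Accept only the whole tag name (so "Title" does not match "<mso:Title0")
    if (next === '>' || next === '/' || /\s/.test(next)) {
      const close = lower.indexOf('>', open);          // end of the opening tag
      if (close === -1) return null;
      if (lower.charAt(close - 1) === '/') return '';  // <mso:TAG ... />
      const end = markup.indexOf('<', close + 1);
      // Slice the original string so the value keeps its own letter case
      return markup.substring(close + 1, end === -1 ? markup.length : end);
    }
    open = lower.indexOf(needle, open + 1);
  }
  return null;
}
```

In the page it could be called as, for example, getMsoTagText(document.getElementsByTagName('head')[0].innerHTML, 'Date_x0020_updated') after the page has loaded, and again with 'Prime_x0020_SME' for the second value.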

Kor
07-11-2005, 09:44 AM
It worked great in FF but neither it nor firstChild.nodeValue worked in IE.

It might be the so-called "gap problem": where there is a possible text node (even an empty one, thus a "gap"), Mozilla considers it a child, while IE ignores it...

jscheuer1
07-11-2005, 11:38 AM
Kor, I've run across the gap problem before, but this seems to be the opposite situation: FF sees it as the firstChild whereas IE does not. In my experience, the intervening blank text node throws off FF, not IE.

Kor
07-11-2005, 12:36 PM
Well, both data and nodeValue return the text of a text node (except that, at least in theory, nodeValue is a read-only attribute), so I guess that innerHTML (even though it is not a standard DOM method) must have solved the problem... If not, as you have said, then I guess you should loop through the object's children and clone them into a collection of objects.
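Kor's last suggestion, walking through the element's children instead of relying on firstChild, might look like this minimal sketch. It only assumes node objects shaped like DOM nodes (nodeType, data, childNodes); in a browser you would pass a real element, while here plain objects stand in for one. The name collectText is hypothetical:

```javascript
// DOM node type constants (values from the W3C DOM Core specification)
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: concatenates the data of every text node below `node`,
// whether or not the browser inserted whitespace-only "gap" nodes.
function collectText(node) {
  let text = '';
  const children = node.childNodes || [];
  for (let i = 0; i < children.length; i++) {
    const child = children[i];
    if (child.nodeType === TEXT_NODE) {
      text += child.data;
    } else if (child.nodeType === ELEMENT_NODE) {
      text += collectText(child);   // descend into nested elements
    }
    // other node types (comments etc.) are skipped
  }
  return text;
}

// Plain objects standing in for <mso:Date_x0020_updated> and its text child
const fakeElement = {
  nodeType: ELEMENT_NODE,
  childNodes: [{ nodeType: TEXT_NODE, data: '2005-04-06T00:00:00Z' }]
};
```

Here collectText(fakeElement) gives "2005-04-06T00:00:00Z". Note that this would not help in the IE6 situation jscheuer1 described, where the text is not a child of the mso element at all.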

manijs
07-13-2005, 04:18 AM
Hi jscheuer1, regarding the script you posted: what if the open and close tags whose content I am trying to grab are in the <head></head> instead of the <body></body>? How do I modify the code to grab the contents as before? Is it necessary to declare the <span id="datum"></span> in the body?


Thanks a lot,

Many

enumerator
07-13-2005, 11:05 AM
You guys are discussing this as if it were a legitimate technical issue; it isn't. Forget about Firefox: this is a Microsoft Office document. Just use IE to transform it, and be done already...

jscheuer1
07-13-2005, 12:08 PM
Enumerator, I'm not sure how that gets the OP the data desired.

Manijs, I don't see why the tags being in the head would make a difference. The document.getElementsByTagName() method scans the entire document. If these are proprietary tags, perhaps IE will not see them as tags at all when placed in the head. Did you test it out and find that to be the case? If so, we can try a different method.

<span id="datum"></span>

It must be in the body, yes. But it needn't be anywhere if you have other uses for the data. Once the tags in question have been parsed by the browser (or, at the very latest, once the whole page has been parsed), the values returned by:

getMsoDateTagData()

and

getMsoPrimeTagData()

can be used as strings in any other code you like.

enumerator
07-13-2005, 12:21 PM
Enumerator, I'm not sure how that gets the OP the data desired.
The document needs to be rebuilt as valid HTML markup (if that's the goal). getElementsByTagName() doesn't apply, because those are not HTML elements. They can be declared as custom tags in IE, and their data extracted and reassigned to proper tags while creating a new source file.

Ramu sivam
07-13-2005, 12:23 PM
hi all,

I have one question. Is there any possibility of connecting to a database in JavaScript without the help of another programming language? I was asked this question in an interview last week.


regards,
Ramu sivam

Kor
07-13-2005, 12:25 PM
Nope. JavaScript is a client-side language; it cannot read, write, or store data in a database by itself. You can only use JavaScript to manipulate, "locally" during the session, the values that a server-side application/language has already brought into the web page. You can also substitute the HTML form's submit action, but that is not really a connection to the database...

enumerator
07-13-2005, 12:51 PM
Is there any possiblity to connect database in javascript
JS can automate database objects (such as ADO), under special circumstances.

enumerator
07-13-2005, 01:45 PM
Hi I have html file with huge contents included multiple open and close tage and one of the tag that I am trying to capture the date contents between the open and close tags show below, but I am not sure regular expression would help. If regular expression would work for this then how?

<mso:Date_x0020_updated msdt:dt="string">2005-04-06T00:00:00Z</mso:Date_x0020_updated>
Likewise, if you merely want to transform source code, a regular expression could work directly on a file string (via a file reader), without requiring a DOM. Whether this approach is applicable also depends on details you haven't yet described (such as the whole point of this)...
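A minimal sketch of the regular-expression route described above, assuming the page source is already available as a string and that the tag contains no nested tag of the same name. The name extractTagContent is hypothetical. The pattern matches the opening tag with any attributes, lazily captures the content, and requires the matching closing tag:

```javascript
// Escape characters that have special meaning inside a regular expression
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: returns the text between <tagName ...> and </tagName>,
// or null if the tag is absent. Case-insensitive, like HTML tag names.
function extractTagContent(source, tagName) {
  const name = escapeRegExp(tagName);
  // "<name", optional whitespace + attributes, ">", lazy capture, "</name>"
  const pattern = new RegExp(
    '<' + name + '(?:\\s[^>]*)?>([\\s\\S]*?)</' + name + '\\s*>', 'i');
  const match = pattern.exec(source);
  return match ? match[1] : null;
}
```

With Node.js, for example, the source string could come from require('fs').readFileSync('page.htm', 'utf8'), where the file name is only a placeholder; extractTagContent(source, 'mso:Date_x0020_updated') then returns the date text.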

manijs
07-13-2005, 10:35 PM
Hi jscheuer1,

Here is the structure of the actual page. The contents I am trying to grab are the ones I've highlighted in blue below, between the tags.

<HEAD>

<xml>

<mso:CustomDocumentProperties>

<mso:Title0 msdt:dt="string">About My Website</mso:Title0>
<mso:Date_x0020_created msdt:dt="string"></mso:Date_x0020_created>
<mso:Approval_x0020_Level msdt:dt="string"></mso:Approval_x0020_Level>
<mso:Categories msdt:dt="string"></mso:Categories>
<mso:Assigned_x0020_To msdt:dt="string"></mso:Assigned_x0020_To>
<mso:Date_x0020_updated msdt:dt="string">2005-04-06T00:00:00Z</mso:Date_x0020_updated>
<mso:Author0 msdt:dt="string"></mso:Author0>
<mso:SME msdt:dt="string">Many</mso:SME>
<mso:Order msdt:dt="string">2900.00000000000</mso:Order>
<mso:Build_x0020_Status msdt:dt="string">Ready for Review</mso:Build_x0020_Status>
<mso:Assigned_x0020_To0 msdt:dt="string">Many</mso:Assigned_x0020_To0>
<mso:Item_x0020_Type msdt:dt="string">Page</mso:Item_x0020_Type>
<mso:Build_x0020_Comments msdt:dt="string"></mso:Build_x0020_Comments>
<mso:Sitemap msdt:dt="string">About Us</mso:Sitemap>
<mso:Title_x0020_Change msdt:dt="string">0</mso:Title_x0020_Change>
<mso:Prime_x0020_SME msdt:dt="string">mailto:Many, Many</mso:Prime_x0020_SME>
<mso:New_x0020_Content msdt:dt="string">0</mso:New_x0020_Content>
<mso:Secondary_x0020_SME msdt:dt="string"></mso:Secondary_x0020_SME>

</mso:CustomDocumentProperties>

</xml>

</HEAD>
Thanks,

Many

jscheuer1
07-13-2005, 11:49 PM
If that is the first <xml> tag on the page, these functions work here:

function getMsoDateTagData(){
var msoTest=document.getElementsByTagName('xml')[0]
var blah=msoTest.innerHTML;
blah=blah.substr(blah.toLowerCase().lastIndexOf('mso:date_x0020_updated msdt:dt="string">')+40)
blah=blah.substr(0,blah.indexOf('<'))
return blah;
}

function getMsoPrimeTagData(){
var msoTest=document.getElementsByTagName('xml')[0]
var blah=msoTest.innerHTML;
blah=blah.substr(blah.toLowerCase().lastIndexOf('mso:prime_x0020_sme msdt:dt="string">')+37)
blah=blah.substr(0,blah.indexOf('<'))
return blah;
}

Use with the span tag as before, or with whatever you like, as I mentioned in my previous post.

enumerator
07-14-2005, 01:07 AM
Twilight Zone...

manijs
07-14-2005, 02:56 AM
Hi jscheuer1, the code you provided earlier was very useful. I am new to JavaScript, and what I am trying to do is not as easy as I thought. I just have a last couple of questions for you, and I really appreciate your support.

1. For the date "2005-04-06T00:00:00Z", is there any way I could write it out in a format like June 4, 2005?

2. For the mail value "mailto:Many@yahoo.com, Many", I would like to write it out as a hyperlink, something like <a href="mailto:Many@yahoo.com">Many</a>. This should happen dynamically.

Thanks,

Many

jscheuer1
07-14-2005, 06:22 AM
That shouldn't be a problem; we just have to construct an array of month names and parse out the data using it and your format. I've got a question for you, though. Why @yahoo.com? Other than the fact that I suspect it is your email address's domain, what makes it relevant? It is not referred to in the tag, so at that rate, why not just insert a conventional email link on the page? I have one possible idea, but what's yours? And, in actual practice, will the mso tag contain the domain name?
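The plan above (an array of month names plus string slicing) can be sketched as follows. This assumes the value is an ISO 8601 timestamp, where the digits are year-month-day and months run from 01 to 12, so "2005-04-06T00:00:00Z" is April 6, 2005. The name formatMsoDate is hypothetical:

```javascript
const MONTH_NAMES = ['January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December'];

// Hypothetical helper: turns "YYYY-MM-DDThh:mm:ssZ" into "Month D, YYYY".
// Returns null for strings that do not start with a valid YYYY-MM-DD date.
function formatMsoDate(iso) {
  const m = /^(\d{4})-(\d{2})-(\d{2})/.exec(iso);
  if (!m) return null;
  const monthIndex = parseInt(m[2], 10) - 1;   // 01..12 -> array index 0..11
  if (monthIndex < 0 || monthIndex > 11) return null;
  const day = parseInt(m[3], 10);              // drops the leading zero
  return MONTH_NAMES[monthIndex] + ' ' + day + ', ' + m[1];
}
```

Slicing the string avoids new Date(...), which would read the trailing Z as UTC and could display the previous day when formatted in a time zone west of UTC.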

manijs
07-14-2005, 09:01 PM
Hi jscheuer1, what I meant was the email alias, not the Yahoo mail account. So the tag format will be something like this:

<mso:Prime_x0020_SME msdt:dt="string">mailto:myemailalias, My name here</mso:Prime_x0020_SME>

so I would like to grab it and write it out as a hyperlink for sending mail. Hope this makes sense.
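A minimal sketch of turning the "mailto:alias, Display Name" value into a link, assuming the first comma always separates the address from the name. The names buildMailLink and escapeHtml are hypothetical. The pieces are HTML-escaped because the result ends up in innerHTML:

```javascript
// Escape the characters that are special in HTML text and attribute values
function escapeHtml(s) {
  return s.replace(/&/g, '&amp;').replace(/</g, '&lt;')
          .replace(/>/g, '&gt;').replace(/"/g, '&quot;');
}

// Hypothetical helper: "mailto:alias, Name" -> <a href="mailto:alias">Name</a>
function buildMailLink(value) {
  const comma = value.indexOf(',');
  const href = (comma === -1 ? value : value.substring(0, comma)).trim();
  // Without a comma, fall back to the bare address as the link text
  const name = comma === -1
    ? href.replace(/^mailto:/i, '')
    : value.substring(comma + 1).trim();
  return '<a href="' + escapeHtml(href) + '">' + escapeHtml(name) + '</a>';
}
```

Splitting with substring() on explicit start and end positions keeps the link text free of any leftover markup from the tag.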


Thanks,

--Many

jscheuer1
07-15-2005, 12:26 PM
I've experimented a bit more with this idea. It probably will not work in browsers other than IE unless you can follow the xml tag with:

<!--[if IE]>

and preface the </xml> tag with:

<![endif]-->

Also, a note on the date:

2005-04-06T00:00:00Z

is an ISO 8601 timestamp (year-month-day, then the time in UTC), so it refers to April 6, 2005, not June 4 or July 4. The month in the string runs from 01 to 12, while the array of month names below is indexed from 0 to 11, so one has to be subtracted before looking up the name. Another thing: in order to accommodate other browsers, we must know if this script will be placed before or after the xml code section. I am assuming after, for now. With all that in mind, here it is:

<script type="text/javascript">
function getMsoDateTagData(){
var msoTest=document.getElementsByTagName('head')[0]
var blah=msoTest.innerHTML.toString();
blah=blah.substr(blah.toLowerCase().indexOf('mso:date_x0020_updated msdt:dt="string">')+40)
blah=blah.substr(0,blah.indexOf('<'))   // e.g. "2005-04-06T00:00:00Z"
// ISO 8601: characters 0-3 hold the year, 5-6 the month (01-12), 8-9 the day
var yr=blah.substr(0,4);
var month=months[parseInt(blah.substr(5,2),10)-1]
var date=parseInt(blah.substr(8,2),10)
blah=month+' '+date+', '+yr   // e.g. "April 6, 2005"
return blah;
}

function getMsoPrimeTagData(){
var msoTest=document.getElementsByTagName('head')[0]
var blah=msoTest.innerHTML.toString();
blah=blah.substr(blah.toLowerCase().indexOf('mso:prime_x0020_sme msdt:dt="string">')+37)
// Before the comma: the mailto: address; from the comma to the next "<": the name.
// substring() takes an end index, whereas substr() would take a length.
blah='<br><a href="'+blah.substr(0,blah.indexOf(','))+'">'+blah.substring(blah.indexOf(',')+1,blah.indexOf('<'))+'</a>'
return blah;
}

// Month names; JavaScript arrays start at 0, so January is months[0]
var months=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

window.onload=function(){
document.getElementById('datum').innerHTML=getMsoDateTagData()+' '+getMsoPrimeTagData()
}
</script>

manijs
08-20-2005, 12:41 AM
Hi jscheuer1, how do I attach a cc address to the line below?



blah='<br><a href="'+blah.substr(0,blah.indexOf(','))+'">'+blah.substr(blah.indexOf(',')+1,blah.indexOf('<'))+'</a>'
return blah;


Thanks,

maniJs

jscheuer1
08-20-2005, 07:39 PM
I don't have the code for a cc handy at the moment, but to append anything to the mailto href contained in the variable 'blah' before it is returned, just do this:

blah='<br><a href="'+blah.substr(0,blah.indexOf(','))+'insert literal string here">'+blah.substring(blah.indexOf(',')+1,blah.indexOf('<'))+'</a>'
return blah;

If that isn't enough, a regular expression replace can often be used, or the 'blah' string can be parsed out further with the indexOf() and substr() methods, slipping in the desired string at the required location. To save me some trouble, as it has been some time since I worked on this, give me literal examples of what 'blah' currently contains and of what you want it to contain.

martin_narg
08-20-2005, 10:21 PM

My two pence:



<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>Untitled Document</title>
<script type="text/javascript">
function parseText(str) {
var strOut = str.replace(/<[^>]*>/g, "");
strOut = strOut.replace(/[\f\n\r]+/g, "\n").substring(1, strOut.length);

document.frm.txtOutput.value = strOut;

/* Array required? */
var arrOutput = strOut.split("\n");
alert(arrOutput);
}
</script>
</head>

<body>
<form name="frm" onsubmit="parseText(this.txtInput.value); return false;">
<textarea name="txtInput" cols="100" rows="10">
<xml>
<mso:CustomDocumentProperties>
<mso:Title0 msdt:dt="string">About My Website</mso:Title0>
<mso:Date_x0020_created msdt:dt="string"></mso:Date_x0020_created>
<mso:Approval_x0020_Level msdt:dt="string"></mso:Approval_x0020_Level>
<mso:Categories msdt:dt="string"></mso:Categories>
<mso:Assigned_x0020_To msdt:dt="string"></mso:Assigned_x0020_To>
<mso:Date_x0020_updated msdt:dt="string">2005-04-06T00:00:00Z</mso:Date_x0020_updated>
<mso:Author0 msdt:dt="string"></mso:Author0>
<mso:SME msdt:dt="string">Many</mso:SME>
<mso:Order msdt:dt="string">2900.00000000000</mso:Order>
<mso:Build_x0020_Status msdt:dt="string">Ready for Review</mso:Build_x0020_Status>
<mso:Assigned_x0020_To0 msdt:dt="string">Many</mso:Assigned_x0020_To0>
<mso:Item_x0020_Type msdt:dt="string">Page</mso:Item_x0020_Type>
<mso:Build_x0020_Comments msdt:dt="string"></mso:Build_x0020_Comments>
<mso:Sitemap msdt:dt="string">About Us</mso:Sitemap>
<mso:Title_x0020_Change msdt:dt="string">0</mso:Title_x0020_Change>
<mso:Prime_x0020_SME msdt:dt="string">mailto:Many, Many</mso:Prime_x0020_SME>
<mso:New_x0020_Content msdt:dt="string">0</mso:New_x0020_Content>
<mso:Secondary_x0020_SME msdt:dt="string"></mso:Secondary_x0020_SME>
</mso:CustomDocumentProperties>
</xml>
</textarea>
<br><br>
<textarea name="txtOutput" cols="100" rows="10"></textarea>
<br><br>
<input type="submit" value="parse the text">
</form>
</body>
</html>
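The tag-stripping approach above keeps only the values, so the property names are lost. A variant that keeps each name paired with its value might look like this minimal sketch. It assumes every property is a simple <mso:Name ...>value</mso:Name> element with no nested tags; the name parseMsoProperties is hypothetical, and values come back as raw markup text (entities such as &amp; are not decoded):

```javascript
// Hypothetical helper: collects every <mso:Name ...>value</mso:Name> pair
// into a plain object, e.g. { Date_x0020_updated: "2005-04-06T00:00:00Z", ... }
function parseMsoProperties(source) {
  const props = {};
  // \1 (backreference) makes the closing tag name match the opening one;
  // [^<]* skips container tags such as mso:CustomDocumentProperties
  const pattern = /<mso:([\w.-]+)(?:\s[^>]*)?>([^<]*)<\/mso:\1\s*>/gi;
  let match;
  while ((match = pattern.exec(source)) !== null) {
    props[match[1]] = match[2];
  }
  return props;
}
```

With the textarea contents as input, props.Date_x0020_updated and props.Prime_x0020_SME would hold the two values discussed earlier in the thread.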

manijs
08-21-2005, 05:04 AM


Hello jscheuer1,
blah currently contains an email address with a hyperlink, and this works fine at this point, but I also want to send a cc at the same time. I know I could just concatenate the cc address, like var strcc = "?cc=ccmail@yahoo.com", as you mentioned above. I tried it and it doesn't work.

Thanks,
Many

jscheuer1
08-21-2005, 03:54 PM
Well, that's not exactly what I asked for, but it will do. This works here, just as I mentioned in my previous post:

blah='<br><a href="'+blah.substr(0,blah.indexOf(','))+'?cc=ccmail@yahoo.com">'+blah.substring(blah.indexOf(',')+1,blah.indexOf('<'))+'</a>'
return blah;
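If the cc address is not always the same literal, the query part can be built by a small helper instead. A minimal sketch, assuming the cc address is held in a variable; the name addCc is hypothetical. It percent-encodes the address so spaces or "&" cannot break the URL, and uses "&" instead of "?" when the href already has a query:

```javascript
// Hypothetical helper: appends a cc=... parameter to a mailto: URL
function addCc(mailtoHref, ccAddress) {
  const separator = mailtoHref.indexOf('?') === -1 ? '?' : '&';
  // encodeURIComponent also encodes "@"; RFC 6068 allows a literal "@"
  // in mailto header values, so it is restored for readability
  const value = encodeURIComponent(ccAddress).replace(/%40/g, '@');
  return mailtoHref + separator + 'cc=' + value;
}
```

When the result is written into an href through innerHTML, any "&" in it should strictly be written as &amp;.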

manijs
08-22-2005, 08:03 AM
Hi jscheuer1,

Thank you very much, this works perfectly. But I just noticed that every time some of my content has the open and close tags shown below, the script does not work at all unless I remove those tags by hand first. Is there any way to extend the script you've been helping me with so that it checks for, or removes, these tags first? It looks like these tags came with the Office documents.

Thanks

Many

<!--[if gte mso 9]>

<xml>
<mso:.../>

.......


</xml>


<![endif]-->

jscheuer1
08-22-2005, 09:31 PM
I looked into this: that tag is invisible to IE, and it makes the contents between its open and close markers invisible to IE as well. You could put an additional <xml></xml> tag set outside of and around it, but that would amount to editing it out in the first place. And if you attempt to insert this xml tag set using script in IE, there is no point of reference from which to do so, since IE cannot see the if/endif tags.
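Since the page's own script cannot see the markers, one option is to remove them before the page reaches the browser, treating the source as a file string as enumerator suggested earlier. A minimal sketch under that assumption, keeping the XML between the markers; the name stripConditionalComments is hypothetical:

```javascript
// Hypothetical helper: removes "<!--[if ...]>" and "<![endif]-->" markers
// but keeps whatever markup sits between them.
function stripConditionalComments(source) {
  return source
    .replace(/<!--\[if[^\]]*\]>/gi, '')   // e.g. <!--[if gte mso 9]>
    .replace(/<!\[endif\]-->/gi, '');     // the matching closing marker
}
```

This only handles the comment-style markers shown in the post; files would need to be processed once (or on the server) before the extraction script runs in the page.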

manijs
08-23-2005, 09:23 PM
That's OK then, I'm just going to remove it manually.

Thanks, I really appreciate your help.
You are :thumbsup:
--Many



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum