View Full Version : surround all words w/ spans



mindlessLemming
12-15-2004, 05:27 AM
I want to take this chunk of html:


<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>


...and enclose any word which is not already inside a span, inside one.

I'm totally open to ideas on this one, it doesn't have to be done in javascript (the user will always have JS on). The content is starting out life as XML, so if anyone thinks this would be easier with XSL or PHP, I'm all ears.
Here's the XML block, if anyone wants to see that:

<text id="t002">
<name>The Tortoise and the Hare</name>
<content>The <n>Hare</n> <v>ran</v> past the <adj>lazy</adj> <n>Tortoise</n>.</content>
</text>


Thanks in advance to anyone who can offer suggestions/advice :)

codegoboom
12-15-2004, 06:00 AM
Does the lonely period at the end count as a word? :D

I'd think a regular expression would be the quickest way...

hemebond
12-15-2004, 06:10 AM
I'm attempting to do something very similar. This is the code I use to surround keywords
var re = new RegExp("([^A-Za-z0-9_])("+keywords[x]+")([^A-Za-z0-9_])","gi");
txt = txt.replace(re, "$1<span class=\"keyword\">$2</span>$3");
Now, the problem I'm having, is the fact that innerHTML does not work in XHTML (and I'm assuming XML). I've been trying all day to get createContextualFragment to work, but Firefox has now decided that my document no longer even has a body element.

So do a search on that, and I'll keep you posted on any progress I make.
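
A possible variation on the keyword regex above, offered only as a sketch: the posted pattern needs a non-word character on both sides of the keyword, so a keyword at the very start or end of the text is never matched. The `\b` word-boundary assertion avoids that. `highlightKeywords` and `escapeRegExp` are made-up names for illustration, not part of hemebond's script.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Wrap every whole-word occurrence of each keyword in a span.
function highlightKeywords(txt, keywords) {
  for (var x = 0; x < keywords.length; x++) {
    // \b matches between a word character and a non-word character
    // (or the start/end of the string) without consuming anything.
    var re = new RegExp('\\b(' + escapeRegExp(keywords[x]) + ')\\b', 'gi');
    txt = txt.replace(re, '<span class="keyword">$1</span>');
  }
  return txt;
}
```

Two caveats: a keyword such as `class` would also match inside the markup inserted by an earlier pass, and `\b` only helps for keywords that begin and end with a word character.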

joh6nn
12-15-2004, 06:45 AM
if i were gonna do it in javascript, i'd use the String.split() method to split sentences at white space, iterate through the resulting array, and then put it back together with the Array.join() method.

i'd probably end up doing the equivalent thing with php, though, because i think php would actually do it faster, and i'd also rather be able to see the result in the generated html. but that's just me.

php equivalents:
http://us2.php.net/manual/en/function.split.php OR http://us2.php.net/manual/en/function.explode.php
http://us2.php.net/manual/en/function.implode.php
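
The split/join idea could look roughly like this in JavaScript. This is a minimal sketch that assumes the input is plain text with no markup in it; `wrapWords` is a hypothetical name.

```javascript
// Split on runs of white space, wrap each piece, and join with single spaces.
function wrapWords(text) {
  var words = text.split(/\s+/);
  var out = [];
  for (var i = 0; i < words.length; i++) {
    // Leading or trailing white space produces empty strings; skip those.
    if (words[i] !== '') {
      out.push('<span>' + words[i] + '</span>');
    }
  }
  return out.join(' ');
}
```

Punctuation stays attached to its word with this approach, and running it over a string that already contains tags would split the tags themselves at their internal spaces.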

mindlessLemming
12-15-2004, 06:48 AM
Nice one hemebond - thanks for sharing :)
You're way past me in the js skills department, but I'll let you know if/when I make more progress :thumbsup:


Oooh, joh6nn comin' through with the goods! :D I'll have to try that too.

Willy Duitt
12-15-2004, 10:20 AM
I really do not know what you are doing here when a class used on the p tags would format any text not included within a span tag... But, below is an example on how to use the nodeType to target the text not included within any other element within the p tag...

Please note, I use a return to print it back out onto the page so you can see what is going on... You will need to work the rest of the code into your application...



<script type="text/javascript">
function format(){
var p = document.getElementsByTagName('p');
for(var i=0; i<p.length; i++){
var words = p[i].childNodes;
for(var j=0; j<words.length; j++){
if(words[j].nodeType == 3 && words[j].nodeValue.match(/.*[^\s]/g)){
words[j].nodeValue = '<span>'+words[j].nodeValue+'</span>';
}
}
} return words;
} window.onload = format;

</script>
</head>

<body>
<div>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>

</div>
</body>


.....Willy

Willy Duitt
12-15-2004, 11:49 AM
BTW: It occurred to me that this is what you are looking for:



<script type="text/javascript">
function format(){
var p = document.getElementsByTagName('p');
for(var i=0; i<p.length; i++){
var words = p[i].childNodes;
for(var j=0; j<words.length; j++){
if(words[j].nodeType == 3 && words[j].nodeValue.match(/.*[^\s]/g)){
var span = document.createElement('span');
span.style.color = 'red'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.appendChild(document.createTextNode(words[j].nodeValue));
words[j].parentNode.replaceChild(span,words[j]);
}
}
} alert(document.body.innerHTML); // TESTING PURPOSES ONLY, PLEASE REMOVE //;
} window.onload = format;

</script>
</head>

<body>
<div>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>


.....Willy

Edit: Added a style color to make it easier to see something happening...

hemebond
12-15-2004, 09:24 PM
Thanks Willy. I don't suppose you have a variant that will replace part of a text node do you?

Oh you piece of crap! I've spent the last 2 days trying to figure out how to use createContextualFragment in Mozilla Firefox, but was getting errors every time. I just tried it in Seamonkey and the bloody thing works fine. I'll post some code soon.

mindlessLemming
12-15-2004, 11:03 PM
Awesome Willy, thank you! The only problem is that each word needs to be in its own span, with no spaces or punctuation inside the span either. I'm going to do my best to convert what you've already provided, and I'll post the final result once I get there.


I really do not know what you are doing here...

Then I'll tell you :) I'm building a web based linguistics tool for post-graduate students at the Uni I work for. This section is where a student selects all the nouns, verbs, clauses, conjunctions, etc within the extract. Don't ask me why it's in xhtml/javascript instead of Flash -- that decision came from above ;) I need to surround every word with spans so I have something to attach the onclick behaviours to. I'm probably going about it all wrong, but this is only the mockup stage and everything is working well so far :D

Willy Duitt
12-16-2004, 02:08 AM
Oh, you may try to kick the tires on this example then...
Although once I wrote it I realized that I should have worked with the childNodes of the <p> tag, and worried only about its contents rather than the tag itself. Still, the loops and regexp may help show how to ignore words which are attributes within tags...



<script type="text/javascript">
var str = '<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>';

var words = (/^<p>(.*)<\/p>$/g).test(str);
words = RegExp.$1.split(/<[^>]*>\w*<\/[^>]*>/gi);
words = words.toString().replace(/\,/g,'').split(/\s+/g);

var temp = (/^<p>(.*)<\/p>$/g).test(str);
temp = RegExp.$1.replace(/\,/g,'').split(/\s+/g);

for(var i=0; i<words.length; i++){
for(var j=0; j<temp.length; j++){
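// String.match() takes a single argument, so build the case-insensitive RegExp explicitly
if(new RegExp('^'+words[i]+'$','i').test(temp[j])){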
temp[j] = '<span>'+temp[j]+'</span>';
}
}
}

str = '<p>'+temp.join(' ')+'</p>';
alert(str)

</script>


.....Willy

BTW: hemebond, I hope this helps you also... :)

mindlessLemming
12-16-2004, 04:06 AM
Hot d@mn Willy, you've sure got my vote for most helpful member :thumbsup: (that'll be the third time I've voted for you...why haven't you won yet? :confused: )

Now I just need to get the value of 'str' from the page itself, instead of writing it in the JS.
innerHTML works, of course, but I can't use that. (DOM scripts only)
My next attempt was this:


// 'textQ' is the id of the div containing the text
var str = "";
var hold = document.getElementById('textQ').childNodes;
for(var j=0; j<hold.length; j++){
str += hold[j].nodeValue;
}

...but that comes back with 'nullnull'

I'm guessing I'm going to have to walk through each childNode, check its nodeType, loop through again if it's an element node, blah blah blah... That's gonna suck. heh.
Back to work...
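
The walk described above is shorter than it sounds if it is written recursively. A minimal sketch, assuming only the standard `nodeType`, `nodeValue` and `childNodes` properties; `collectText` is a made-up name. Element nodes have a `nodeValue` of null, which is where the 'null's in the attempt above come from.

```javascript
// Gather the text of a node and all of its descendants, in document order.
function collectText(node) {
  if (node.nodeType === 3) {           // text node: has a real nodeValue
    return node.nodeValue;
  }
  var text = '';
  var kids = node.childNodes || [];    // anything else: descend into children
  for (var i = 0; i < kids.length; i++) {
    text += collectText(kids[i]);
  }
  return text;
}

// Possible use in the page (assumes the div with id 'textQ' exists):
// var str = collectText(document.getElementById('textQ'));
```

Note that this returns the text only; the span tags and their classes are not part of the result.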

hemebond
12-16-2004, 05:02 AM
Actually, my problem was getting it back into the document. I managed to get it in the end, but the methods I use are broken in Firefox, which is why I had so much trouble. Here it is for anyone who wants to see it:
function format()
{
var code = document.getElementsByTagName("pre");
for(var i = 0; i < code.length; i++)
{
// I'm assuming there is no existing markup
var txt = code[i].childNodes[0].nodeValue;

// encode entities again (ampersand first, so the & in &lt; and &gt; is not encoded twice)
txt = txt.replace(/&/gi, "&amp;");
txt = txt.replace(/</gi, "&lt;");
txt = txt.replace(/>/gi, "&gt;");

// basic text replacement to add surrounding span tags
txt = txt.replace(/\"([^\"]*)\"/gi, "<span class=\"string\">&quot;$1&quot;</span>");
txt = txt.replace(/\'([^\']*)\'/gi, "<span class=\"string\">'$1'</span>");
for(var x = 0; x < keywords.length; x++)
{
var re = new RegExp("([^A-Za-z0-9_])("+keywords[x]+")([^A-Za-z0-9_])","gi");
txt = txt.replace(re, "$1<span class=\"keyword\">$2</span>$3");
}
txt = txt.replace(/(\/\/.*)/gi, "<span class=\"comment\">$1</span>");
txt = txt.replace(/\/\*(.*)\*\//gi, "<span class=\"comment\">/*$1*\/</span>");
txt = txt.replace(/([0-9]+)/gi, "<span class=\"number\">$1</span>");
txt = txt.replace(/(#.*)/gi, "<span class=\"prepro\">$1</span>");

// I don't really understand this part
// I have to create a range
// set its contents to the contents of the pre element
// then create a cCF which doesn't seem to have any connection to the range object
// then replace the code
var r = document.createRange();
r.selectNodeContents(code[i]);

var f = r.createContextualFragment(txt);

code[i].replaceChild(f, code[i].childNodes[0]);
}
}

It's used to mark up C++ code within a document, so that a stylesheet can be used to show syntax highlighting. It only works in Gecko browsers (well, Seamonkey at least) because of createContextualFragment. The only reason I've had to do it this way is because innerHTML is read-only in XHTML.

Willy Duitt
12-16-2004, 06:15 AM
This may help in reaching the #text of the nodeTypes == 1 (elements)....



<script type="text/javascript">
function addSpans(){
var p = document.getElementsByTagName('p');
for(var i=0; i<p.length; i++){
var node = p[i].childNodes;
for(var j=0; j<node.length; j++){
if(node[j].nodeType == 1 && node[j].hasChildNodes() == true){
for(var k=0; k<node[j].childNodes.length; k++){
var span = document.createElement('span');
span.style.cursor = 'pointer';
span.style.color = 'blue'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.onclick = function(){ alert(this.innerHTML+'==This word has class!') };
span.appendChild(document.createTextNode(node[j].childNodes[k].nodeValue));
node[j].childNodes[k].parentNode.replaceChild(span,node[j].childNodes[k]);
}
}
if(node[j].nodeType == 3 && node[j].nodeValue.match(/.*[^\s\.]/gi)){
var span = document.createElement('span');
span.style.cursor = 'pointer';
span.style.color = 'red'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.onclick = function(){ alert(this.innerHTML+'==This word has no class!') };
span.appendChild(document.createTextNode(node[j].nodeValue));
node[j].parentNode.replaceChild(span,node[j]);

}
}
}
} window.onload = addSpans;


</script>
</head>

<body>
<div>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>
</div>
</body>


However, I still am having problems splitting up more than one word in nodeType == 3, wrapping each individual word in spans and returning them to the nodeValue... somehow, whenever I try, I lose the nodeValue completely...

......Willy
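
One way to split a multi-word text node without losing its value, sketched under a few assumptions: a "word" here means a run of `\w` characters (ASCII letters, digits and underscore), so an apostrophe or accented letter would break a word apart. `tokenize` and `wrapWordsInTextNode` are hypothetical names, not from the scripts in this thread.

```javascript
// Split text into alternating runs of word characters and everything else.
function tokenize(text) {
  return text.match(/\w+|\W+/g) || [];
}

// Replace one text node with a fragment in which every word has its own span.
// Spaces and punctuation are kept as plain text nodes between the spans.
function wrapWordsInTextNode(textNode) {
  var doc = textNode.ownerDocument;
  var frag = doc.createDocumentFragment();
  var tokens = tokenize(textNode.nodeValue);
  for (var i = 0; i < tokens.length; i++) {
    if (/^\w+$/.test(tokens[i])) {
      var span = doc.createElement('span');
      span.appendChild(doc.createTextNode(tokens[i]));
      frag.appendChild(span);
    } else {
      frag.appendChild(doc.createTextNode(tokens[i]));
    }
  }
  // The fragment is built completely before the original node is swapped out.
  textNode.parentNode.replaceChild(frag, textNode);
}
```

Because the new nodes are assembled in a fragment first, the original text is still intact while it is being read, and only one replaceChild call touches the document.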

Willy Duitt
12-16-2004, 02:18 PM
Alright, I found a method to split up node type #text ....
I'm not sure how efficient this is and I have only tested it with IE but I would assume it should work cross-browser... The method I used was splitText() and I'm not even sure if I understand it... But I got it to do what I was wanting... :D



<script type="text/javascript">
<!--//
function addSpans(){ // written by: WillyDuitt@hotmail.com //;
var p = document.getElementsByTagName('p');
for(var i=0; i<p.length; i++){
for(var j=p[i].childNodes.length-1; j>-1; j--){
var node=p[i].childNodes[j];

if(node.nodeType == 1){
for(var k=0; k<node.childNodes.length; k++){
var span = document.createElement('span');
span.style.cursor = 'pointer';
span.className = node.className;
span.style.color = 'red'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.onclick = function(){ alert(this.innerHTML+':className='+this.className) };
span.appendChild(document.createTextNode(node.childNodes[k].nodeValue));
node.childNodes[k].parentNode.replaceChild(span,node.childNodes[k]);
}
}

if(node.nodeType == 3){
while(node.nodeValue.lastIndexOf(' ')>-1){
var span = document.createElement('span');
span.style.cursor = 'pointer';
span.style.color = 'blue'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.onclick = function(){ alert(this.innerHTML+': This word has no class!') };
span.appendChild(node.splitText(node.nodeValue.lastIndexOf(' ')));
p[i].insertBefore(span,node.nextSibling);
}
}
}

var span = document.createElement('span');
span.style.cursor = 'pointer';
span.style.color = 'blue'; // TESTING PURPOSES ONLY, PLEASE REMOVE //;
span.onclick = function(){ alert(this.innerHTML+': This word has no class!') };
span.appendChild(document.createTextNode(node.nodeValue));
node.parentNode.replaceChild(span,node);
}
} window.onload = addSpans;
//-->
</script>
</head>

<body>
<div>
<p>The test <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span>.</p>
<p>The <span class="noun">Hare</span> <span class="verb">ran</span> past the <span class="adj">lazy</span> <span class="noun">Tortoise</span> test.</p>
</div>



I'm not very good at 'splainin, but if there is something you do not understand I will try my best to make sense of it...

.....Willy

BTW: Thanks for the interesting question...
I really had to apply myself and I learned quite a bit... :cool:

Edit: Per Andrew's recommendation, I have replaced params with square bracket notation here: p[i].childNodes[j]

mindlessLemming
12-16-2004, 11:55 PM
That is perfect :D:D:D
Heck Willy, you've gone all the way and provided me with more features than I was asking for help with (and the features you added just happen to be the next bits in line...Good guess! ;))

I ran it in IE and it worked nicely, unfortunately it did nothing in FF/Moz and threw this error:


p[i].childNodes is not a function

Luckily it was only a minor error in your script (wow, you're human too) and i fixed it easily:


// replace this line
var node=p[i].childNodes(j);
//with this one
var node=p[i].childNodes[j];


You've saved me days of work here Willy -- if there's anything I can do for you, pass it along (except it won't be till the beginning of 2005...I'm working 7 days/week until then :()
Also, I'll make sure the Uni never tries to lay claim to your intellectual property :)

Brandoe85
12-17-2004, 12:14 AM
Nobody can compete with Willy :D I've tried this code too and it works perfectly...Thanks Willy :)

Willy Duitt
12-17-2004, 01:42 AM
Aye... I know not how that err in using params instead of square bracket notation got there (childNodes(j))... :eek:

I can only assume I overlooked it because it was working (why, I don't know) but thanks for pointing that out... :thumbsup:

I do not even begin to believe that there are not more bugs in that script... For one, I never looked at the generated DOM fragment to check if, in fact, when I replaced the innerText (used for the lack of a better word) of the element textNodes, that I actually replaced the original span with the one I created...

And I'm sure there is a better and more efficient way to use splitText()... But I never heard of that method before, and was happy to see I got it to work as I liked!! Besides, improvements can always be made now that I know the theory works, and hopefully someone more knowledgeable than I will come along and add to the script, teaching us all something... But yeah!! I got it to work!! :D

Anyways, Andrew, I do thank you for posting such an interesting question... Because I have learned quite a bit trying to get my idea for a solution to actually work...

Cheers;
.....Willy

mindlessLemming
12-17-2004, 02:50 AM
Anyways, Andrew, I do thank you for posting such an interesting question... Because I have learned quite a bit trying to get my idea for a solution to actually work...

If finding the answer has taught you a tenth of what your answer has taught me, I'd be surprised ;)

Thanks again mate :thumbsup:

Willy Duitt
12-17-2004, 03:18 AM
Andrew;

I only thought today while I was Christmas shopping (actually while I was standing there while my lady was Christmas shopping) to check the generated DOM fragment and, as I suspected, there is a bug when the #text of an element is replaced with the one which was created that includes the span, style, event handler, etc...

Below is an example of what I am referring to....


<P><SPAN style="CURSOR: pointer; COLOR: blue">The</SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> test</SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> </SPAN><SPAN class=noun><SPAN class=noun style="CURSOR: pointer; COLOR: red">Hare</SPAN></SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> </SPAN><SPAN class=verb><SPAN class=verb style="CURSOR: pointer; COLOR: red">ran</SPAN></SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> past</SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> the</SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> </SPAN><SPAN class=adj><SPAN class=adj style="CURSOR: pointer; COLOR: red">lazy</SPAN></SPAN><SPAN style="CURSOR: pointer; COLOR: blue"> </SPAN><SPAN class=noun><SPAN class=noun style="CURSOR: pointer; COLOR: red">Tortoise</SPAN></SPAN>.</P>

Not to mention that #text nodes containing only a space appear to be wrapped with span tags, albeit without styling or the onclick event...

These problems should not be hard to fix by removing the parentNode of the #text node....

i.e.: removeChild(node.childNodes[k].parentNode)

And, in regards to the #text nodes which only contain spaces, a regular expression checking for the nodeValue for a match, and if so, ignore... should fix this but, I do not think this is much of a problem... And, perhaps the double span tags which both contain the same class will not be much of an issue... But I would fix the former at least...

Let me know if you need help with this...
But, not tonight... :)

.....Willy

BTW: I just noticed that the #text nodes which only contain spaces do have an onclick event handler attached to them... It's only hard to target the spaces and not the words on either side...

Anyways, I think the theory is sound, but there are some bugs which need to be worked out...
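
A rough sketch of one way around both problems, not a tested fix for the script above: attach the handler to the spans that already exist instead of nesting a new span inside each one, and test a text node for non-space content before wrapping it. `makeExistingSpansClickable` and `isWhitespaceOnly` are made-up names.

```javascript
// True when a string contains nothing but white space (or is empty).
function isWhitespaceOnly(text) {
  return !/\S/.test(text);
}

// Reuse the element children of p rather than wrapping their text again.
// Returns how many elements received the handler.
function makeExistingSpansClickable(p, handler) {
  var count = 0;
  for (var i = 0; i < p.childNodes.length; i++) {
    var node = p.childNodes[i];
    if (node.nodeType === 1) {   // element node, e.g. <span class="noun">
      node.style.cursor = 'pointer';
      node.onclick = handler;
      count++;
    }
  }
  return count;
}
```

Text nodes for which `isWhitespaceOnly(node.nodeValue)` is true can simply be skipped in the wrapping loop, so the spaces between words never become clickable.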

mindlessLemming
12-17-2004, 03:46 AM
Thanks for the follow-up Willy, I had just noticed the clickable space problem...
How are you viewing the generated fragment? If you can tell me that, I *should* be able to tweak it a little more...

Cheers,
Andrew.

Willy Duitt
12-17-2004, 04:05 AM
You can view the generated DOM code by using Mozilla's DOM Inspector...
But I used a snippet of javascript to view it in IE:
(either paste into the address bar or save as a favorite/bookmark)



javascript:var%20o=document.documentElement,p,w=window.open('','_blank'),d=w.document;d.write('<html><body><pre>');z('<'+o.tagName);for(var%20i=0;p=o.attributes[i],i!=o.attributes.length;i++)if(p.specified)z('%20'+p.nodeName+'="'+p.nodeValue+'"');z('>'+o.innerHTML+'</'+o.tagName+'>');d.write('</pre></body></html>');d.close();function%20z(s){d.write(s.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;').replace(/"/g,'&quot;'));}


Please note: It must all be on one line... sometimes the forum software breaks it up and adds line breaks... Make sure your popup blocker, if any, does not prevent the new window from opening...

Javascript is Kewl... :thumbsup:

.....Willy

BTW: Brandoe85;
Thank you for the kind words...
I really appreciate it... :)

.....Willy

mindlessLemming
12-17-2004, 04:16 AM
Javascript is Kewl... :thumbsup:

Six months ago I would have argued that point.. :rolleyes: :o
Alright, I've been working on this thing for two days. (not so much this bit, but the project in general - now I discover our PHP servers don't have XSL support...AAAAAH! :mad: ).
It's friday afternoon, so I think I'll wait till tomorrow or maybe even monday before tackling this some more.
It sure does make a heck-of-a-lotta spans :eek:

You're a champ and a gentleman Willy, no doubt about it :)

hemebond
12-17-2004, 05:01 AM
It sure does make a heck-of-a-lotta spans :eek:

I was wondering about that. Is there a reason you don't just use XML + XSL?

mindlessLemming
12-17-2004, 11:39 PM
Well, mostly because I have no idea how one goes about referring to text nodes (and excluding spaces) via XSL... I had a couple of tries, but the results were far from encouraging. Since the functionality is entirely reliant on JS, I may as well use JS for the formatting as well.

As I said in the first post though, I'm totally open to suggestions :) Feel free to PM me if you think there's another avenue I should be exploring.


