storing references to nodes during recursion
I don't know much about the DOM so forgive my ignorance.
At the moment, my code recurses all the nodes in a div and pushes the text nodes onto an array that I later access.
Which works fine, but if the document is very large it takes a few seconds to do this. The document I am working with at the moment ends up with an array of 3,880 text nodes in it, which takes about 5 seconds to compile.
So what I was thinking is it would be better, in the recursion, to just be storing a reference to those nodes rather than the nodes themselves.
I know that there is a nodeList (although I don't know if that includes nested elements or if each child node has its separate nodeList), but to me I think ideal would be to push the index of the node onto the array during recursion and then later to access it by something like
(and I guess more importantly, will it be significantly quicker than storing the nodes themselves?)
thanks for any thoughts...
nodelists will only contain the direct child nodes of the parent, not nested nodes in sub-tags.
it would be much faster to store strings in the array instead of nodes. if you can pluck the properties of the textNodes into a new soft object, that object should be much faster to operate on in a loop of many.
textnodes are an older interface, what are you doing with the array that you need the actual textNodes for?
hmmm. let's see if I can do this briefly. The reason why I (think I) need the actual nodes is:
- once I have them I start looping through each node, and looping through each word in each node, feeding it to the spellcheck engine (fair enough, could just use the nodeValue for that)
- then if it comes up misspelt, the user gets the option to change it, either by direct input or from a list (ok, that bit was irrelevant)
- but then, the page only registers a change under two conditions (and these I can't do anything about):
1. the text has actually changed
2. the parent node of the text node has gained and lost focus
yeah, so that's the bit - when/if the change does occur I dispatch a focus and blur event to the parentNode of the text node that got changed, thereby registering the change. Which is why I need to know, at the time of that change, which node the text belongs to.
I think. I mean, I could just store a reference to the parentNode that the text belongs to at the time of pushing it onto the array, but then I'm back where I started...
unless there's some better way of going about this?
a couple of things.
maybe you were using data or new text nodes?
it seems that if you set the .textContent property of a node, it updates on-screen right away, no bluring required. seems to work even in contentEditable.
2. presumably, all the text you are scanning is in an html element container.
rather than looping through the dom and spell-checking each word, it would likely be much faster to find all the misspellings at once using a unique word list from the textContent of the container.
once you have a short list of misspelled words, you then iterate your nodes, checking for words in the node that appear in your misspelled list. This means you have to search from a dozen words each node word, rather than from all the words each node word.
this gives you the ability to basically instantly skip a node if nothing is mispelled.
it's faster because we save a lot of looping on each node using a unique word list and the look-up-table, comparing only (#words * #misspellings) to find a bad text, instead of (#words * #dictionaryWords) to find a bad one. if there is a bad one, your normal id/suggestion routine takes over.
this pre-routine sweep also allows us to display a list of all the found bad words before the replacement wizard UI is shown. Even if it somehow took longer (which it wont), the user won't mind as much because we can say "fixing 12 of 14" somewhere.
this can save a lot of user frustration/doubt if their misspellings occur early in the text, since using just document-order selections to indicate progress would make the process appear to be taking a long time.
re-reading the OP, i think i might have a faster textNode collector for you:
i see you've been working on this project quite a bit. hopefully you can use some of the above snips or concepts to get the performance where you want it to be.
if you are using indexOf() anywhere or an array loop, a compatible look-up-table version can run dozens of times faster...
Thanks for your thoughts. It's kind of complicated to describe what's going on here, as this is an extension designed to run over the top of another extension (and the base extension is password protected, otherwise I would just point you at it). But basically:
1) It's not the on-screen display that is the problem, it's a question of the page's editor registering that changes have been made. I tried it with textContent, but it's the same result - it appears we have to spoof the page into thinking that human changes have been made, and the least obtrusive way I can find is with the focus/blur routine. My guess is that the editor has some sort of "contentchanged" event that doesn't get fired if the text is altered programmatically. This bit works OK, anyway, so I'm not that worried about it.
2) The pre-lookup idea certainly seems interesting and makes a whole lot of sense. I gather that this is to reduce waiting time while the code finds the next misspelt word. This is actually not such a big deal, either - it skips along quite nicely once it actually starts running, and part of the requirement for running the base extension is a certain amount of RAM and processing speed, so we can assume that everybody will more or less get the same results (I have a slow computer, too, which I guess is good for benchmarking). I am going to look into this later, though, as it's an interesting idea and would make the code really click rather than just amble along at an acceptable pace.
3) Your node collector does appear to be seriously faster than what I am using at the moment. The only thing is that during collection I filter out some elements that aren't of interest, but plonking that filter down in the middle of your function of course just broke it, and being that I don't really understand how yours works I am at a bit of a loss. Here's what I'm using at the moment, which works fine in terms of grabbing what I want and ignoring the rest. If you could help me incorporate the "if" into your function I'd be really grateful:
Thanks again for your thoughts, and sorry it gets kind of abstract. The code's heavily reliant on elements in the base extension and some NSI components, so I guess just dumping code here wouldn't be that useful. If you're really interested, I can send you the xpi file and an html page to run it on, but either way I appreciate the input.
to be honest, that code i wrote is older and bit magical, i don't fully understand how it all works...
it's made to support a ton of filter condtions, maybe you can use one.
i posted it here, along with some usage examples. Checkout post#1 for the examples.
the corralling is flexible on that utility, but it's still a bit stiff customizing multi-criteria filtering. since it gathers the nodes a lot faster, we're going to be ahead of ourselves if we simply gather them all and then filter the complete set to make the subset we're interested in.
i would try something like this:
if i had the isMispelled() function working, i would add "&&" to the last line and add a new last line saying isMispelled(node.textContent) or something to find only the misspelled nodes, or invert it if i needed the good nodes, you get the idea...
thinking about that getComputedStyle overhead, i see that it's applied on elements only.
if we check every node, we check a lot more often than if we check every element, which there are fewer of.
therefore, it should be faster to break our problem into two steps.
1. mark all elements as hidden or visible using a fast-accessible property
2. check each node against the element's property instead of running the method each node
i made a couple other optimizations on the last function for performance and lower garbage generation, sorry if it's harder to read...
the only test I really need to do is if the parent node is editable. If it is then it needs to be spellchecked. If it's not then it doesn't. Simple.
So (still not understanding really what goes on in your function) I hacked this together:
nodes, ms, nodes/sec
(and just for comparison, without the filter, the results are)
so the filter obviously slows stuff down. But unless there's some other way around it, a three second startup time on an 80k+ word document seems to be acceptable.
Then I had other stuff to do, so I turned the computer off (I should really do this more often - I'm sure it would make me write better code, like those Russian filmmakers who didn't have any film) and thought on and off about the idea of pre-compiling a list of misspelt words, and how to find the node the word belongs to in that case.
Which made me think I could use nsIWebBrowserFind, because once I have the word highlighted (which is just a selection range) I can get the node it belongs to, etc, etc.
But then I thought, really, I'm just reinventing all this stuff that XPCOM already has, because nsIEditor has an nsIEditorSpellCheck which has all sorts of funky stuff, including a GetNextMisspelledWord method... and as much as I love rolling my own, when you see how fast the find bar in Firefox finds a word within a massive document, it's kind of hard to resist...
all of which is to say that I think I am about to do what I have been annoyed by in the past: The Massive Mid-Thread U-Turn.
And I only really describe the above process to let you know that while it may look like this thread was a waste of time, it was actually very useful to me in making me think about a really sensible way to be doing this that will (I hope - I haven't even begun to write it yet) get rid of all the bugs and annoyances encountered so far.
So thanks again, rnd me :thumbsup:
the object was only to test many possibilities at once, so we can simply a single comparison down to the statement.
window.find(strToSearchFor) works in firefox to select a phrase in the dom; perhaps you can just call it on each misspelled word to jump to the word in a wizard...
|All times are GMT +1. The time now is 07:52 PM.|
Powered by vBulletin®
Copyright ©2000 - 2013, Jelsoft Enterprises Ltd.