Quote:
Originally Posted by Logic Ali
I just threw this together to try to retrieve all visible text. It could be substantially refined, but seems to work if run as the last item in the document.
Presumably you have the server-side code to retrieve data from another domain.
Code:
<script type='text/javascript'>
// Gather the text nodes that are direct children of every element.
var e = document.getElementsByTagName('*'),
    t = '',
    tagElem,
    nodes,
    cn;

for( var i = 0; i < e.length; i++ )
{
  tagElem = e[ i ];
  nodes = tagElem.childNodes;

  // Ignore elements whose node name contains "SCRIPT".
  if( !/SCRIPT/i.test( tagElem.nodeName ) )
    for( var j = 0; j < nodes.length; j++ )
      if( ( cn = nodes[ j ] ).nodeType == 3 )  // 3 = text node
        t += ' ' + cn.textContent;
}

alert(t);
</script>
That will dredge up <script>, <iframe>, <noscript>, and <style> tag text; not cool.
If you want visible text, at least start in document.body instead of the HTML element.
Before you grab the text, loop through and run element.parentNode.removeChild(element) on every script and style tag.
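A rough sketch of that idea, with one deliberate change: instead of removing the script and style elements from the page, it walks down from a root node and skips them, so the live document is left untouched. The helper name collectText and the SKIPPED list are made up for illustration; the list just mirrors the tags named above.

```javascript
// Element names whose text should not be collected.
// Assumption: this list mirrors the tags mentioned in the reply; extend as needed.
var SKIPPED = { SCRIPT: true, STYLE: true, NOSCRIPT: true, IFRAME: true };

// Hypothetical helper: recursively gather text under `root`,
// ignoring any subtree rooted at a skipped element.
function collectText(root) {
  var parts = [];
  (function walk(node) {
    var kids = node.childNodes || [];
    for (var i = 0; i < kids.length; i++) {
      var kid = kids[i];
      if (kid.nodeType === 3) {                  // 3 = text node
        var text = kid.nodeValue.replace(/\s+/g, ' ').trim();
        if (text) parts.push(text);              // drop whitespace-only nodes
      } else if (kid.nodeType === 1 &&           // 1 = element node
                 !SKIPPED[String(kid.nodeName).toUpperCase()]) {
        walk(kid);
      }
    }
  })(root);
  return parts.join(' ');
}

// In a browser, run this as the last item in the document:
if (typeof document !== 'undefined' && document.body) {
  console.log(collectText(document.body));
}
```

Note this still only filters by tag name. Text inside elements hidden with CSS (display:none and so on) will come through, so "visible" is approximate.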