11-03-2012, 11:11 PM | #6
rnd me
Quote:
Originally Posted by Logic Ali
I just threw this together to try to retrieve all visible text. It could be substantially refined, but seems to work if run as the last item in the document.
Presumably you have the server-side code to retrieve data from another domain.

Code:
<script type='text/javascript'>

var e = document.getElementsByTagName('*'),
    t = '',
    tagElem,
    nodes,
    cn;
    
for( var i = 0; i < e.length; i++ )
{
  tagElem = e[ i ];
  nodes = tagElem.childNodes;
  
  if( !/SCRIPT/i.test( tagElem.nodeName ) )
    for( var j = 0; j < nodes.length; j++ )
      if( ( cn = nodes[ j ] ).nodeType == 3 )
        t += ' ' + cn.textContent;   
}        
    
alert(t)

</script>
that will still dredge up <style> and <iframe> tag text, not cool. (the /SCRIPT/i test only skips <script> and <noscript>.)

if you want visible text, at least start in document.body instead of the HTML element...

first, loop through and run element.parentNode.removeChild(element) on every script and style tag, then grab the text.
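
a rough sketch of that remove-then-collect idea, in case it helps. getVisibleText and SKIP_TAGS are names i made up for illustration, and the list of tags to drop is an assumption you may want to adjust. it only strips tags; it does not check CSS, so text hidden with display:none would still come through.

```javascript
// Sketch only: getVisibleText and SKIP_TAGS are illustrative names,
// not part of the original post.
var SKIP_TAGS = ['script', 'style', 'noscript', 'iframe'];

function getVisibleText(root) {
  // 1. Remove the unwanted elements from the subtree.
  for (var i = 0; i < SKIP_TAGS.length; i++) {
    // Copy to a plain array first: in browsers getElementsByTagName
    // returns a live list that shrinks as elements are removed.
    var found = Array.prototype.slice.call(
      root.getElementsByTagName(SKIP_TAGS[i])
    );
    for (var j = 0; j < found.length; j++) {
      var el = found[j];
      if (el.parentNode) {
        el.parentNode.removeChild(el);
      }
    }
  }

  // 2. Walk what is left and collect the text nodes (nodeType 3).
  var parts = [];
  (function walk(node) {
    for (var k = 0; k < node.childNodes.length; k++) {
      var child = node.childNodes[k];
      if (child.nodeType === 3) {
        parts.push(child.nodeValue);
      } else if (child.nodeType === 1) {
        walk(child);
      }
    }
  })(root);

  // 3. Collapse runs of whitespace and trim the ends.
  return parts.join(' ').replace(/\s+/g, ' ').replace(/^ | $/g, '');
}
```

the function removes nodes from whatever root you hand it, so on a live page you would probably pass a copy, e.g. getVisibleText(document.body.cloneNode(true)), rather than document.body itself.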