How do I get awk/grep-like behaviour from JavaScript?


mackman
07-03-2008, 06:27 PM
Hi

I've just started playing around with JavaScript, and I have a problem:
I want to be able to search for a particular phrase on several webpages* and print out only the lines which contain it. Is this reasonable to do in JavaScript?
If so, please tell me how (Or better yet, hint!:)).

Edited to add: *in the source of several webpages.

rnd me
07-03-2008, 09:21 PM
1. how do you define a line? many webpages don't break up their source into traditional lines.

2. are these pages on the same domain as the page your script is on?
-if not, then you will be blocked by the same-origin policy: see (http://www.mozilla.org/projects/security/components/same-origin.html) and (http://en.wikipedia.org/wiki/Same_origin_policy)



if your pages are formatted well, and you're doing this to your own pages, it's pretty easy to wrap your head around:

the basic procedure would be to
1 make a list of the pages you want to search,
2 retrieve their source,
3 search each one,
4 report the results.
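
The four steps above can be sketched roughly as follows. This is only a sketch: `grepPages` and its `fetchSource` argument are hypothetical names introduced here, and how the source is actually retrieved (for example with an XMLHttpRequest) is left to the caller. Line numbers are reported 1-based, the way grep does.

```javascript
// Hypothetical sketch of the four steps. fetchSource is any function
// that takes a URL and returns that page's source as a string.
function grepPages(pages, pattern, fetchSource) {
  var results = [];
  for (var p = 0; p < pages.length; p++) {       // 1. the list of pages
    var source = fetchSource(pages[p]);          // 2. retrieve the source
    var lines = source.split(/\r?\n/);
    for (var i = 0; i < lines.length; i++) {     // 3. search each line
      if (lines[i].search(pattern) !== -1) {
        results.push(pages[p] + ":" + (i + 1) + ":" + lines[i]);
      }
    }
  }
  return results;                                // 4. report the results
}
```

Passing the fetcher in as an argument keeps the searching part independent of the browser, so it can be tried out on plain strings first.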

mackman
07-04-2008, 01:41 AM
Thanks for the reply.
The pages I'm looking at are formatted in the way I want. I was using a bash script to do this (loop over wget to retrieve pages, awk to locate desired lines, pipe to single output file).

I'm not sure what impact the same domain origin problem will have.

What's going on here is that I no longer have my linux laptop, only access to my wife's windows xp machine (and no, unfortunately I'm not allowed to have it dual-booting:() and I wanted to integrate this selective reporting of lines into my browser by making a firefox extension for it. Does this make any sense, or am I missing an obvious solution?

rnd me
07-04-2008, 05:00 AM
in your case, only needing firefox capability, you can work around the same domain limit by injecting your script into an existing page on the site you want to search, using a bookmarklet (http://en.wikipedia.org/wiki/Bookmarklet). you don't need to make an extension.

after making your script, place it in a .js file.
you would then go to any page in the directory you want to search, and click the bookmark to launch the app.

here is a basic injection template you can modify. all you need to change is the absolute path of the javascript file. even a file:/// address will work, so you don't even need a server to do this.




bookmark this link to "install"
<a href="javascript:(function(){d=document;t=d.getElementsByTagName('head')[0];sc1=d.createElement('script');sc1.src='http://mysite.com/harvest.js?'+(new%20Date()).getTime();t.appendChild(sc1);}())">launch</a>
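
Unpacked, the one-line bookmarklet above does roughly the following. `injectScript` is a name introduced here for readability, and taking the document as a parameter is also an addition, so that the function can be exercised outside a browser.

```javascript
// Readable version of the bookmarklet body (injectScript is a hypothetical name).
function injectScript(doc, url) {
  var head = doc.getElementsByTagName('head')[0];
  var script = doc.createElement('script');
  // The timestamp query string defeats caching, so edits to the file are picked up.
  script.src = url + '?' + new Date().getTime();
  head.appendChild(script);
  return script;
}

// In a browser: injectScript(document, 'http://mysite.com/harvest.js');
```

Because the script element is added to the page you are currently viewing, the loaded file runs with that page's permissions.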





then in a file called, eg "harvest.js":


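// IO(U, V): a tiny synchronous ajax helper (LA MOD String Version, by DanDavis).
// With one argument it GETs the URL U; with a second argument it PUTs V to U.
// Returns the response body as a string.
function IO(U, V) {
    var X = !window.XMLHttpRequest ? new ActiveXObject('Microsoft.XMLHTTP') : new XMLHttpRequest();
    X.open(V ? 'PUT' : 'GET', U, false); // false = synchronous request
    X.setRequestHeader('Content-Type', 'text/html');
    X.send(V ? V : '');
    return X.responseText;
}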



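// pages to search, relative to the page the bookmarklet is launched from
var pages2search = "showthread.php, usercp.php".split(/,\s?/g);

function searchPages(term, fileName, lineNumbers) {
    var results = []; // declared locally so it does not leak into the global scope
    function srcFun(a) {
        var lines = IO(a).split(/\n/g);
        var mx = lines.length;
        for (var i = 0; i < mx; i++) {
            var it = lines[i];
            if (it.match(term)) {
                // optional filename and zero-based line index, each followed by a space
                results.push((fileName ? a + " " : "") + (lineNumbers ? i + " " : "") + it);
            }
        }
    }
    pages2search.map(srcFun);
    return results;
}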



//call the search function now:
//arguments[0] is the term to be looked for, a RegExp
//arguments[1] is an optional boolean: prepend filename to line?
//arguments[2] is an optional boolean: prepend line# to line?

alert( searchPages(/rnd/ig , 1,1).join("\n") ) ;
/*shows:

"showthread.php 147 <strong>Welcome, rnd me.</strong><br />
usercp.php 144 <strong>Welcome, rnd me.</strong><br /> "

*/

mackman
07-04-2008, 11:48 AM
Thanks. I'll need a while to wrap my head around that. Will reply when I've made sense of it.

fside
07-06-2008, 11:09 AM
A bookmarklet is a bookmark, but instead of going to a URL when you select it from the list, it runs a javascript program. He was suggesting you get this search behavior from a dropdown list in the browser, specifically from your organized list of bookmarks.

rnd me
07-06-2008, 10:32 PM
Quote (fside): A bookmarklet is a bookmark, but instead of going to a URL when you select it from the list, it runs a javascript program. He was suggesting you get this search behavior from a dropdown list in the browser, specifically from your organized list of bookmarks.

yes. the reason is that, as mentioned above, the bookmark runs code.
One great effect of doing this is that the code runs in the context of the current page. this includes full same-domain file access (eg: ajax methods can download any publicly available file on that domain).

without the address bar showing a page on the site you need files from,
you will be blocked by security from downloading the URLs you need.
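
To make the rule concrete, here is a minimal sketch of the comparison the browser is effectively making: two addresses count as the same origin only when scheme, host and port all match. `sameOrigin` is a hypothetical helper, not something harvest.js needs. It relies on the standard `URL` class found in current browsers and Node, and it is only meaningful for absolute http(s) addresses.

```javascript
// Hypothetical helper: true when two absolute http(s) URLs share
// scheme, host and port, i.e. have the same origin.
function sameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// In a browser, a page could be checked against the current location with:
// sameOrigin(location.href, 'http://mysite.com/showthread.php')
```

This is why the bookmark has to be clicked while a page from the target site is open: the ajax requests are judged against the address of that page.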


i was a little vague as to the implementation of my above script.
so to use the above setup:

1. save the second code box to a file named "harvest.js".
2. navigate firefox to the newly saved file.
-it can be at an http or file:/// url.
-find it, and copy the full address from the address bar.
3. edit the code in the first code box (the link), replacing the sc1.src='...' url with the url on the clipboard (the saved script location).
-save it to a .htm file and view it in firefox.
-bookmark the link shown.
4. navigate to a page on the site containing the files you want to search.
5. click the bookmark to execute the javascript file you saved.


--------
the way i had it, the search term was hard-coded into the .js file.
you probably want to be able to do different searches.
replace the alert(...) call at the end of harvest.js with the following two lines:


var searchFor = new RegExp(prompt("enter the search string"), "ig"); //add this line
alert( searchPages(searchFor , 1,1).join("\n") ) ; //replace this line

you will also have to modify the list of pages to search (pages2search) in the .js file.
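
One caveat with building the RegExp straight from the prompt: characters such as . ? ( and [ are treated as pattern syntax, and cancelling the prompt returns null. If a plain phrase search is what you want (closer to grep -F), something like the following sketch could be used. `escapeRegExp` and `literalPattern` are hypothetical helpers, not part of the original harvest.js.

```javascript
// Hypothetical helpers, not part of the original harvest.js.

// Backslash-escape every character that has a special meaning in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a case-insensitive pattern that matches the input literally.
// Returns null when the prompt was cancelled or left empty.
function literalPattern(input) {
  if (input === null || input === '') { return null; }
  return new RegExp(escapeRegExp(input), 'i');
}

// In the browser:
// var searchFor = literalPattern(prompt("enter the search string"));
// if (searchFor) { alert(searchPages(searchFor, 1, 1).join("\n")); }
```

Leave the escaping out if you actually want to type regular expressions into the prompt.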

i had the code set up to run from this forum page in firebug, so you can try it out here.


let me know if anything is still hazy.
it's a few steps, i admit, but still far simpler than an extension would be...