...

Pulling information from the source of another site



ltabdiel
03-20-2007, 02:05 PM
Hi, what I am trying to do is pull information from the Generated source code of this site: http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran
What I want to do is then manipulate it within my own code to show certain elements pulled from the source of that site. I am somewhat experienced with PHP and would like to use it for the development. I am sorry if I didn't explain my issue well enough; just let me know. Oh, and if I am asking a question that has already been answered, please let me know how to find it. I tried searching for the answer but couldn't find it.

Thanks in advance

mlseim
03-20-2007, 02:57 PM
You'll be looking at something similar to below ... where you open the file and begin parsing-out various things between tags. The example is part of an RSS feed generator, but the idea is the same ...



<?php

// Get the page as one string
$url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
$data = file_get_contents($url);

// Defaults, so the echo below still works if nothing matches
$title = $date = $text = "";

// Get content between <html> and </html>
// ([^`]*? means "any characters except a backtick", so it also spans newlines)
preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

// Loop through each item
foreach ($matches[0] as $match) {
    // Get title
    if (preg_match("/<title>([^`]*?)<\/title>/", $match, $temp)) {
        $title = trim(strip_tags($temp[1]));
    }

    // Get an item in the <h4> header area of the page
    if (preg_match("/<h4>([^`]*?)<\/h4>/", $match, $temp)) {
        $date = trim($temp[1]);
    }

    // Get some text between <p> and </p>
    if (preg_match("/<p>([^`]*?)<\/p>/", $match, $temp)) {
        $text = trim($temp[1]);
    }
}

// Output the things you found
echo "Title: $title <br>\n";

?>

aedrin
03-20-2007, 03:36 PM
Remember to mention on the page that you are grabbing data from that website.

Stealing content is bad. Borrowing is okay ;)

ltabdiel
03-20-2007, 03:52 PM
Thanks for the help, it is exactly what I was trying to do, or at least a start. The problem is that I am having a hard time figuring out how to find the element I want. When I view the generated source in Firefox, one of the variables I want, say his level, appears in the source as

...
var theClassId = 2;
var theRaceId = 10;
var theClassName = "Paladin";
var theLevel = 68;
var theCharUrl = "r=Gul%27dan&n=Palaran";
...

I want to pull this info from there to my site. Any ideas? lol sorry for the questions. :)

Thanks in advance again.

ltabdiel
03-20-2007, 04:19 PM
Here's my latest attempt. I can't figure out how to get it to find that text. Maybe someone can show me where I am making the mistake. Thanks again.


<?php

// Get the page as one string
$url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
$data = file_get_contents($url);

// Defaults, so the echoes below still work if nothing matches
$title = $classid = "";

// Get content between <html> and </html>
preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

// Loop through each item
foreach ($matches[0] as $match) {
    // Get title
    if (preg_match("/<title>([^`]*?)<\/title>/", $match, $temp)) {
        $title = trim(strip_tags($temp[1]));
    }

    // Get some text for the classid
    if (preg_match("/var theClassId = ([^`]*?);/", $match, $temp)) {
        $classid = trim($temp[1]);
    }
}

// Output the things you found
echo "Title: $title <br>\n";
echo "Class ID: $classid <br>\n";

?>

mlseim
03-20-2007, 04:56 PM
I think I see the problem.

Run the sample code below and look at everything it matched (the whole page). The part you're looking for isn't in the match, probably because they use JavaScript to populate that portion of the page.

Fetching the URL and parsing the HTML only gives you what the server sent; none of the page's JavaScript gets run.



<?php

// Get the page as one string
$url = "http://armory.worldofwarcraft.com/#character-sheet.xml?r=Gul%27dan&n=Palaran";
$data = file_get_contents($url);

// Get content between <html> and </html>
preg_match_all("/<html>([^`]*?)<\/html>/", $data, $matches);

foreach ($matches[0] as $match) {
    // don't match anything further, just output the whole match
    echo $match;
}

?>
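
A quick way to confirm this before writing any regex, sketched under assumptions: search the fetched string for the text you expect. The helper name sourceContains and both sample strings are made up for illustration; in practice $data would be whatever was fetched from the URL.

```php
<?php
// Hypothetical helper: report whether the fetched source contains a marker string.
function sourceContains(string $data, string $marker): bool
{
    return strpos($data, $marker) !== false;
}

// Stand-in strings for illustration only; neither is real output from the site.
$rawWithVars    = "<html><script>var theLevel = 68;</script></html>";
$rawWithoutVars = "<html><script src=\"loader.js\"></script></html>";

var_dump(sourceContains($rawWithVars, "var theLevel"));    // bool(true)
var_dump(sourceContains($rawWithoutVars, "var theLevel")); // bool(false)
```

If the check comes back false for the real page, no pattern will find the value in that response.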


Now, if they generated an RSS feed, which would be a great idea on their part, you would have everything you need. But many sites either don't know how or feel it lets people steal information. I disagree with the latter: RSS feeds generate interest among people who would not normally see your site, and so draw more visitors to it.

ltabdiel
03-20-2007, 05:09 PM
Very interesting. Well, I guess the final answer is that I can't pull the info, because everything I want is either part of the JavaScript or is written to the page by JavaScript with the write command. Thanks for all the help. Unless you have a cure for that issue, in which case I will call you God. lol

hessodreamy
03-20-2007, 06:07 PM
Just jumping in here. If the values you want are in the JavaScript, there are two things you could do:
1. When doing your pattern matching, just match the stuff between <script> tags. That should make things easier for you.
2. Just grab the URL (as you are doing), then add a bit of JavaScript to the end of the fetched page that either writes the required variables to the screen (via document.write or alert()) or redirects to a PHP page, passing the values of the variables in the URL, e.g. window.location="output.php?theLevel="+theLevel+"&theRaceId="+theRaceId
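
Option 1 could look roughly like the sketch below. It assumes the variables really are present in the fetched source inside a <script> element; the $data string here is a made-up stand-in, not the real page, and the patterns are only one way to write this.

```php
<?php
// Stand-in for the fetched page; the real markup may differ.
$data = <<<HTML
<html><head><title>Demo</title>
<script type="text/javascript">
var theClassId = 2;
var theClassName = "Paladin";
var theLevel = 68;
</script>
</head><body><p>var notAScript = 1;</p></body></html>
HTML;

// Step 1: collect the body of every <script> element
// (s = dot also matches newlines, i = case-insensitive tag names).
preg_match_all('/<script\b[^>]*>(.*?)<\/script>/is', $data, $scripts);

// Step 2: inside those bodies only, pick out "var name = value;" assignments.
$vars = [];
foreach ($scripts[1] as $js) {
    preg_match_all('/var\s+(\w+)\s*=\s*(.*?);/', $js, $found, PREG_SET_ORDER);
    foreach ($found as $m) {
        // Strip surrounding double quotes from string values.
        $vars[$m[1]] = trim($m[2], '"');
    }
}

print_r($vars);
```

Restricting step 2 to the script bodies means look-alike text elsewhere on the page, such as the paragraph in the stand-in, is ignored. All values come back as strings.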

aedrin
03-20-2007, 06:30 PM
var theClassId = 2;
var theRaceId = 10;
var theClassName = "Paladin";
var theLevel = 68;
var theCharUrl = "r=Gul%27dan&n=Palaran";

There should be no problem grabbing this information.

Just have a regular expression for this pattern: var * = *; (the * standing for any run of characters). Make sure that you make it lazy, and not greedy, otherwise you will get:


theClassId = 2;
var theRaceId = 10;
var theClassName = "Paladin";
var theLevel = 68;
var theCharUrl

As the first variable name :P
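
To see the lazy versus greedy difference for yourself, here is a small sketch run against the lines quoted above. The exact patterns are an assumption about how "var * = *;" might be written as a PHP regex, not the only way to do it.

```php
<?php
// The sample lines quoted in the thread, joined into one string.
$js = 'var theClassId = 2;
var theRaceId = 10;
var theClassName = "Paladin";
var theLevel = 68;
var theCharUrl = "r=Gul%27dan&n=Palaran";';

// Lazy (.*?) stops at the first " = " and the first ";" it can reach.
// The s flag lets the dot cross newlines, which is what makes greediness matter here.
preg_match_all('/var (.*?) = (.*?);/s', $js, $lazy, PREG_SET_ORDER);

// Greedy (.*) runs on to the last " = " in the input, giving one oversized match.
preg_match_all('/var (.*) = (.*);/s', $js, $greedy, PREG_SET_ORDER);

echo count($lazy), " lazy matches, first name: ", $lazy[0][1], "\n";
echo count($greedy), " greedy match, name spans ", substr_count($greedy[0][1], "\n") + 1, " lines\n";
```

With the lazy pattern, each match holds the name in index 1 and the value in index 2, so $lazy[3][1] is theLevel and $lazy[3][2] is 68.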


