Pull info from an HTML file



telavir
04-23-2008, 06:11 PM
I'm a dabbler, usually grabbing snippets of code here and there for my own personal use, but I can't seem to figure out how to do this one:

I have an HTML page that displays an image (among other things). I want to pull the source of that image from the HTML file. I know that it is the only src=" attribute between lines 140 and 145. The value of that attribute looks like this: "/blog/images/trip07.jpg".

Once I get that src into a variable I can manipulate it all over the place. I just don't know how to retrieve it.

chump2877
04-23-2008, 06:19 PM
<img src="image.jpg" alt="" id="image1" />

<script type="text/javascript">
// Get image src
alert(document.getElementById('image1').src);
</script>

telavir
04-23-2008, 06:24 PM
Great idea, but the page is on a system I cannot edit. Sorry, I didn't think to mention that. I can probably use this code for something else though.

chump2877
04-23-2008, 06:31 PM
Then you're going to need some server-side code to read in the source of the remote page and get the image's src attribute value. What server-side languages are you familiar with?

telavir
04-23-2008, 06:40 PM
I've got a basic understanding of PHP, but the files are not on the same server. Is there a load string I can run to load the HTML file, then search a section of the file for the src=" string and return the rest of the line?

chump2877
04-23-2008, 08:15 PM
This code gets you all the images' "src" attributes on the CodingForums main page. The best way to pinpoint a particular image src is if the image has a unique id attribute (or even other unique attributes). Relying on line numbers is no good, since line numbers can easily change if code is added, subtracted, or modified.

What is the URL you are looking at? What is the exact source code for the image you want?


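<?php

// Fetch the remote page (needs allow_url_fopen enabled in php.ini)
$file_content = file_get_contents("http://www.codingforums.com/index.php");
if ($file_content === false)
{
    die("Could not load the page.");
}

// Collect the attribute string of every <img> tag
// (i = case-insensitive, s = let a tag span several lines)
preg_match_all("/(<img )(.+?)( \/)?(>)/is",$file_content,$images);
foreach ($images[2] as $val)
{
    // Pull the src value out of the attribute string
    if (preg_match("/(src=)('|\")(.+?)('|\")/i",$val,$matches) == 1)
        echo $matches[3] . "<br />";
}

?>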
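
If the image does carry a unique id, the DOM extension can pick it out directly instead of scanning every tag. What follows is only a minimal sketch: the helper name get_img_src_by_id is made up for illustration, the id "image1" is borrowed from the JavaScript example earlier in the thread, and the sample markup stands in for whatever file_get_contents() would return from the real page.

```php
<?php

// Hypothetical helper: return the src of the <img> with the given id,
// or null when no such image exists.
function get_img_src_by_id($html, $id)
{
    $dom = new DOMDocument();

    // Real pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Assumption: $id does not contain a double quote
    $xpath = new DOMXPath($dom);
    $nodes = $xpath->query('//img[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length == 0) {
        return null;
    }
    return $nodes->item(0)->getAttribute("src");
}

// Sample markup standing in for the remote page
$html = '<html><body><img src="logo.gif" alt="" />'
      . '<img src="/blog/images/trip07.jpg" alt="" id="image1" /></body></html>';

echo get_img_src_by_id($html, "image1");
```

Unlike a line-number search, this keeps working when markup is added or removed around the image, as long as the id stays the same.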

telavir
04-23-2008, 10:00 PM
I appreciate the time you are taking with this. Here is an example (http://www.wapsisquare.com/d/20080414.html) of the page. I would be trying to pull the comic. I don't have the actual page up yet, but it would update in a similar way.

I know how to generate the URLs based on the date, but I don't know how to compensate for the actual name of the file, which has no consistency (other than being displayed on a predictably consistent page).
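
For anyone following along, the date-based URL can be sketched roughly like this. The /d/YYYYMMDD.html pattern is only inferred from the single example link above, and the function name comic_page_url is hypothetical:

```php
<?php

// Hypothetical helper: build the page URL for a given Unix timestamp.
// Assumption: pages follow the /d/YYYYMMDD.html pattern of the example link.
function comic_page_url($timestamp)
{
    return "http://www.wapsisquare.com/d/" . date("Ymd", $timestamp) . ".html";
}

// mktime(hour, minute, second, month, day, year)
echo comic_page_url(mktime(12, 0, 0, 4, 14, 2008));
```

Calling comic_page_url(time()) would give today's page under the same assumption.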

chump2877
04-24-2008, 06:26 AM
If the width and height attributes of the image are always present, and their values are always constant, you could find the image in that way. Or you could find the image based on the directory in which the image file resides. Or both.

Also, you can use regular expressions (http://www.regular-expressions.info/) or you can use PHP's DOM extension (http://www.php.net/manual/en/intro.dom.php) (to navigate the HTML DOM (http://www.w3.org/TR/DOM-Level-2-Core/introduction.html) as you would with JavaScript).

See the following code...the output can be found here (http://www.mediamogulsweb.com/client/getImageNames.php):


<?php

// Define your constants
define("WIDTH", 756);
define("HEIGHT", 287);
define("FILE_NAME", "http://www.wapsisquare.com/d/20080414.html");
define("IMG_DIR", "comics/");


// Get file name using regular expressions
$file_content = file_get_contents(FILE_NAME);
preg_match_all("/(<img )(.+?)( \/)?(>)/i",$file_content,$images);

foreach ($images[2] as $val)
{
    if (preg_match("/(width=)('|\")(".WIDTH.")('|\")/i",$val,$matches) == 1 && preg_match("/(height=)('|\")(".HEIGHT.")('|\")/i",$val,$matches) == 1)
    {
        if (preg_match("/(src=)('|\")(.+?)('|\")/i",$val,$matches) == 1)
        {
            echo "Using the Width and Height attributes of the image: <b>";
            echo $matches[3] . "</b><br /><br />";
            break;
        }
    }
}

foreach ($images[2] as $val)
{
    if (preg_match("/(src=)('|\")(.+?)('|\")/i",$val,$matches) == 1)
    {
        $pos = stripos($matches[3],IMG_DIR);
        if ($pos !== false)
        {
            echo "Using the ".IMG_DIR." directory in which the image is located: <b>";
            echo $matches[3] . "</b><br /><br />";
            break;
        }
    }
}

foreach ($images[2] as $val)
{
    if (preg_match("/(width=)('|\")(".WIDTH.")('|\")/i",$val,$matches) == 1 && preg_match("/(height=)('|\")(".HEIGHT.")('|\")/i",$val,$matches) == 1)
    {
        if (preg_match("/(src=)('|\")(.+?)('|\")/i",$val,$matches) == 1)
        {
            $pos = stripos($matches[3],IMG_DIR);
            if ($pos !== false)
            {
                echo "Using BOTH the Width and Height attributes of the image AND the ".IMG_DIR." directory in which the image is located: <b>";
                echo $matches[3] . "</b><br /><br />";
                break;
            }
        }
    }
}


// Get file name using the DOM
$dom = new DOMDocument();
// Real pages often contain invalid markup; keep parser warnings out of the output
libxml_use_internal_errors(true);
$dom->loadHTMLFile(FILE_NAME);
$images = $dom->getElementsByTagName("img");
for ($i=0; $i<$images->length; $i++)
{
    // getAttribute() returns an empty string when an attribute is missing,
    // so images without width or height do not raise errors
    $image = $images->item($i);
    $imageFile = $image->getAttribute("src");
    $imageWidth = $image->getAttribute("width");
    $imageHeight = $image->getAttribute("height");
    $pos = stripos($imageFile,IMG_DIR);
    if ($pos !== false && $imageWidth == WIDTH && $imageHeight == HEIGHT)
    {
        echo "(Navigating the DOM instead of using regular expressions -- Requires PHP 5)<br />";
        echo "Using BOTH the Width and Height attributes of the image AND the ".IMG_DIR." directory in which the image is located: <b>";
        echo $imageFile . "</b><br /><br />";
        break;
    }
}

?>
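
One follow-up step: the src you get back may be relative (like the "/blog/images/trip07.jpg" value from the first post), so it has to be resolved against the page's URL before it can be used from another server. Here is a rough sketch of that. The helper name absolute_img_url and the example.com address are placeholders, and it only covers absolute, root-relative, and simple document-relative values (no "../" handling, no port numbers, no "//host/..." values):

```php
<?php

// Hypothetical helper: resolve an extracted src against the URL of the
// page it came from. Does not normalise "../" segments.
function absolute_img_url($pageUrl, $src)
{
    // Already absolute (starts with a scheme such as http://)
    if (preg_match("/^[a-z][a-z0-9+.-]*:\/\//i", $src) == 1) {
        return $src;
    }

    $parts = parse_url($pageUrl);
    $root = $parts["scheme"] . "://" . $parts["host"];

    // Root-relative, e.g. "/blog/images/trip07.jpg"
    if (substr($src, 0, 1) == "/") {
        return $root . $src;
    }

    // Document-relative: swap the file name at the end of the page path for $src
    $path = isset($parts["path"]) ? $parts["path"] : "/";
    $dir = substr($path, 0, strrpos($path, "/") + 1);
    return $root . $dir . $src;
}

// Placeholder page URL, not a real site from this thread
echo absolute_img_url("http://www.example.com/d/20080414.html", "/blog/images/trip07.jpg");
```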

telavir
04-24-2008, 06:23 PM
You so totally rock! It'll take me a while to process this.


