preg_match Help Please



Jinxy
Oct 17th, 2011, 09:48 PM
I'm using this regex to return the links on a page.
This code:


$src = "http://page_of_links.html";
$regexp = "<a\s[^>]*href=(\"??)([^\">]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $src, $matches);

$matches = $matches[3];
foreach ($matches as $var)
{
    echo "$var<br>";
}


returns this:

Search
Home
Profile
Friends
Messages
[email protected]
privacy settings page
Home
Profile
Friends
Messages
Survey
Help
Settings
Log out
Simplified site

Now I want to single out just one of the links, this one: [email protected]


Could somebody show me a simple way of doing this? Maybe using preg_match again?

Thanks
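
A minimal sketch of one way to single out the mail link, assuming it is the only anchor whose href starts with mailto:. The sample HTML and the address in it are made up for illustration; the link regex is the one from the post, written in single quotes with # as the delimiter so fewer escapes are needed.

```php
<?php
// Hypothetical stand-in for the fetched page; the address is invented.
$html = '<a href="/home.php">Home</a>'
      . '<a href="mailto:upload123@m.example.com">upload123@m.example.com</a>'
      . '<a href="/help.php">Help</a>';

$regexp = '#<a\s[^>]*href=("??)([^">]*?)\1[^>]*>(.*)</a>#siU';
preg_match_all($regexp, $html, $matches);

// $matches[2] holds the href values, $matches[3] the link text.
// preg_grep keeps only the hrefs that start with "mailto:".
$mailLinks = array_values(preg_grep('#^mailto:#i', $matches[2]));

// Strip the "mailto:" prefix from the first hit, if there is one.
$address = $mailLinks ? substr($mailLinks[0], strlen('mailto:')) : null;
echo $address, "\n";
```

Filtering on the href rather than the link text avoids depending on what the page chooses to display inside the anchor.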

Jinxy
Oct 18th, 2011, 05:43 AM
This is what I ended up with. Seems to work ok.



$src = "http://page_of_links.html";

$regexp = "<a\s[^>]*href=(\"??)([^\">]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $src, $matches);

$matches = $matches[3];
foreach ($matches as $var)
{
    if ( preg_match("#m.facebook.com$#", $var, $match) )
    {
        echo $var;
    }
}


I wanted to match on the @ sign as well, but for some reason I couldn't get this to work:


preg_match("#\@m.facebook.com$#", $var, $match)

Can somebody tell me why?

gvre
Oct 18th, 2011, 10:00 AM
Try this


$s = 'Search<br>
Home<br>
Profile<br>
Friends<br>
Messages<br>
<a href="mailto:[email protected]">[email protected]m.facebook.com</a><br>
privacy settings page<br>
Home<br>
Profile<br>
Friends<br>
Messages<br>
Survey<br>
Help<br>
Settings<br>
Log out<br>
Simplified site<br>';

$pattern = '#mailto:([^@][email protected])#i';
if (preg_match($pattern, $s, $m))
{
echo $m[1];
}

Jinxy
Oct 18th, 2011, 05:01 PM

That works great until I try to add it to what I have, lol.



$regexp = "<a\s[^>]*href=(\"??)([^\">]*?)\\1[^>]*>(.*)<\/a>";
preg_match_all("/$regexp/siU", $src, $matches);

$matches = $matches[2];
foreach ($matches as $var)
{
    $pattern = '#mailto:([^@][email protected])#i';
    if (preg_match($pattern, $var, $m))
    {
        echo $m[1];
    }
}


That doesn't return anything. $var is returning mailto:[email protected] in the first match. So what am I doing wrong?

gvre
Oct 18th, 2011, 07:12 PM
Could you post the content of page_of_links.html?

Jinxy
Oct 18th, 2011, 08:03 PM
This is from the body tag down:


<body><div class="mfsm" id="viewport"><div id="mainContent"><noscript><meta http-equiv="X-Frame-Options" content="deny" /></noscript><div id="root" class="acw"><div class="contents"><div class="mPageHeader acb aps" id="fb_header"><table cellspacing="0" cellpadding="0" class="lr"><tr><td valign="top"><div class="header_logo"><a href="/home.php?refid=0"><img class="img" src="http://static.ak.fbcdn.net/rsrc.php/v1/yE/r/QbujCdyP44b.png" alt="Facebook" id="facebook_logo" width="76" height="20" title="forceimage" /></a></div></td><td valign="top" class="r"><a class="inv" href="#search">Search</a></td></tr></table></div><div class="marquee acb"><span class="mfss"><a class="inv" href="/home.php?refid=0" accesskey="0"><span>Home</span></a></span><span class="mfss"><a class="inv" href="/jinxy.1?refid=0" accesskey="1"><span>Profile</span></a></span><span class="mfss"><a class="inv" href="/friends.php?refid=0" accesskey="2"><span>Friends</span></a></span><span class="mfss"><a class="inv" href="/messages/?refid=0" accesskey="3"><span>Messages</span></a></span></div><div id="rootContent"><div id="body"><div><div class="al aps">Upload via email</div><div class="acw apm"><div class="subsection"><span class="mfsm">Send an email with a photo attachment to your personal publishing address:</span></div><div><strong><a href="mailto:[email protected]">[email protected]</a></strong></div><br /><span class="mfss">To control the privacy of photos uploaded through email, visit your <a href="/privacy/?refid=0">privacy settings page</a> and update the default privacy.</span></div></div></div><div class="abt acy apm"><a href="/download.php?cr=int_mf&refid=0">Install Facebook on your QuickTime Agent and browse faster</a></div><div id="footer"><div class="abt acw apm"><span class="mfss fcg"><a class="sec" href="/home.php?refid=0">Home</a>*·*<a class="sec" href="/jinxy.1?refid=0">Profile</a>*·*<a class="sec" href="/friends.php?refid=0">Friends</a>*·*<a class="sec" 
href="/messages/?refid=0">Messages</a></span></div><div class="abt acw apm" id="search"><form method="post" action="/search?refid=0"><input type="hidden" name="fb_dtsg" value="AQAYftDY" autocomplete="off" /><input type="hidden" name="post_form_id" value="44fc2012ababfca8afe39853525fbf0d" /><input type="hidden" name="charset_test" value="€,´,€,´,水,",Є" /><table cellspacing="0" cellpadding="0" class="comboInput"><tr><td class="inputCell"><input class="input" name="query" size="15" type="text" /></td><td class="btnCell"><input value="Search" type="submit" class="btn btnD" /></td></tr></table></form></div><div class="acw apm"><span class="mfss fcg"><a class="sec" href="/survey.php?refid=0">Survey</a>*·*<a class="sec" href="/help.php?refid=0">Help</a>*·*<a class="sec" href="/settings.php?refid=0">Settings</a></span><br /><span class="mfss fcg"><a href="/logout.php?h=b28e30857f5138b6435d5075da56e47d&t=1318963870&refid=0" data-sigil="logout">Log out</a> Jinxy</span><br /><span class="mfss fcg">Having trouble? Try the <a href="/a/preferences.php?site=mini&gfid=AQB4VLbYOz7wV2VT&refid=0">Simplified site</a></span></div></div></div></div></div></div></div></body></html>


This is what my first match returns:

/home.php?refid=0
#search
/home.php?refid=0
/jinxy.1?refid=0
/friends.php?refid=0
/messages/?refid=0
mailto:[email protected]
/privacy/?refid=0
/download.php?cr=int_mf&refid=0
/home.php?refid=0
/jinxy.1?refid=0
/friends.php?refid=0
/messages/?refid=0
/survey.php?refid=0
/help.php?refid=0
/settings.php?refid=0
/logout.php?h=b28e30857f5138b6435d5075da56e47d&t=1318964330&refid=0
/a/preferences.php?site=mini&gfid=AQB4VLbYOz7wV2VT&refid=0

gvre
Oct 18th, 2011, 09:25 PM
Try this


$content = @file_get_contents("page_of_links.html");
if ($content)
{
    $pattern = '#mailto:([^@][email protected])#i';
    if (preg_match($pattern, $content, $m))
    {
        echo $m[1];
    }
}

Jinxy
Oct 18th, 2011, 10:15 PM
Still nothing. This is what I have. I'm using curl because I need to log in:



<?php

$user_name = "user_name";
$user_pass = "pass";

$url = "http://www.facebook.com/login.php?m=m&next=http://m.facebook.com/upload.php&refsrc=http://m.facebook.com/upload.php&refid=9&b=56";

$agent = "Apple iPhone OS v2.0 CoreMedia v1.0.0.5A347";
$cook_file = "login_cookie.txt";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_COOKIESESSION, TRUE);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $agent);
curl_setopt($ch, CURLOPT_COOKIEFILE, $cook_file);
curl_setopt($ch, CURLOPT_COOKIEJAR, $cook_file);
curl_setopt($ch, CURLOPT_TIMEOUT, 8);
curl_setopt($ch, CURLOPT_POSTFIELDS, "&email=$user_name&pass=$user_pass&login=Log+In");
$src = curl_exec($ch);
curl_close($ch);

if ($src)
{
    $pattern = '#mailto:([^@][email protected])#i';
    if (preg_match($pattern, $src, $m))
    {
        echo $m[1];
    }
}
?>
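
The POST body in the script above is built by interpolating the user name and password straight into a string, so characters such as @, & or spaces go out unencoded. A minimal sketch of letting http_build_query do the URL-encoding instead; the credentials are made up, and whether this particular login form needs the encoding is an assumption.

```php
<?php
// Made-up credentials for illustration only.
$user_name = 'user@example.com';
$user_pass = 'p&ss word';

// http_build_query URL-encodes each value and joins the pairs with "&".
$fields = http_build_query([
    'email' => $user_name,
    'pass'  => $user_pass,
    'login' => 'Log In',
]);

echo $fields, "\n";

// The encoded string could then be handed to curl in place of the
// interpolated one:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $fields);
```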

gvre
Oct 18th, 2011, 10:19 PM
Add

var_dump($src); exit;
right before

if ($src)
and post the result.
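
One caveat when dumping a value to a browser: anything that looks like an HTML entity is rendered, so the output can show an @ that is not literally in the string. A small sketch of two ways to check what is really there, assuming (hypothetically) that the value contains a numeric entity in place of the @; the address is invented.

```php
<?php
// Hypothetical value as it might appear in the fetched source.
$var = 'mailto:upload123&#64;m.example.com';

// Escaping the output makes a browser display the entity text itself
// instead of rendering it as "@".
$visible = htmlspecialchars($var, ENT_QUOTES, 'UTF-8');
echo $visible, "\n";

// A direct check for the literal character is another quick test.
$hasAt = strpos($var, '@') !== false;
var_dump($hasAt);
```

Viewing the page source in the browser, or running the script from the command line, shows the same raw text without any extra code.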

Jinxy
Oct 18th, 2011, 10:38 PM
Ok I uploaded the source to my site just like curl gets it:

http://area51.heliohost.org/facebook_src.html

gvre
Oct 18th, 2011, 10:54 PM
$pattern = '#mailto:(.+?m.facebook.com)#i';
if (preg_match($pattern, $content, $m))
{
    echo html_entity_decode($m[1]);
}

Jinxy
Oct 18th, 2011, 11:13 PM
I see what the problem was now. The source wasn't showing an @ sign. I guess I should have looked at my source a little more closely and I wouldn't have had to bother you so much. Thank you so much for solving this for me. :)
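
For anyone landing here with the same symptom, a minimal sketch of the other way round: decode the entities first, then match on a literal @. It assumes the @ arrives as a numeric entity, which the thread suggests but does not show, and it uses an invented address and domain.

```php
<?php
// Assumed sample: "@" written as a numeric entity; the address is hypothetical.
$src = '<a href="mailto:upload123&#64;m.example.com">upload123&#64;m.example.com</a>';

// The raw source contains no literal "@", so this finds nothing.
$rawHit = preg_match('#@m\.example\.com#', $src);

// Decode entities first, then a pattern with a literal "@" can match.
$decoded = html_entity_decode($src, ENT_QUOTES | ENT_HTML5, 'UTF-8');
if (preg_match('#mailto:([^@"]+@m\.example\.com)#i', $decoded, $m)) {
    echo $m[1], "\n";
}
```

The dots in the domain are escaped here so that they match only a literal dot; an unescaped dot matches any character.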