RegEx help
Terry
05-10-2003, 12:42 AM
Hi,
I'm opening a file that has the results of a search. I'm trying to get only the URLs out of it. This should be simple since every URL is enclosed by these characters: * and 1. Example:
*<a href="http://allafrica.com/stories/200305090757.html">Tanzania: Jitters Over
Tainted Fish Reports Calmed</a> 1
When I open the file I chop it up with explode() etc. on the <dl> and </dl> until I get the portion of the results I'm looking for. Now I just want to be able to get the URLs between the two characters * and 1.
I don't know that much about regular expressions. I want a pattern that can get all the substrings between these 2 characters. Something like:
$pattern = "\*([a-zA-Z0-9])*1";
Any help would be appreciated, thanks.
Terry
Mhtml
05-10-2003, 08:04 AM
Hello. :)
Without knowing exactly what you want to do, I can only give you a pattern:
$pattern = "/\\*(.*)1/is";
* is a meta character so it must be escaped!
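One thing to watch with the pattern above: (.*) is greedy, so when the page holds several results it runs from the first * to the last 1 in the text. A lazy (.*?) alone is not enough either, because the URLs themselves can contain the digit 1. Below is a minimal sketch that anchors on the closing </a> before the 1 instead. The sample text is cut down from the examples in this thread, and the variable names are only illustrative.

```php
<?php
// Sample text in the shape described in the thread:
// each link sits between "*" and a trailing "1".
$text = '*<a href="http://allafrica.com/stories/200305090757.html">Tanzania: Jitters Over
Tainted Fish Reports Calmed</a> 1
*<a href="http://www.usnewswire.com/topnews/qtr2_2003/0507-168.html">168-0507 Defenders of
Wildlife</a> 1';

// \*         literal asterisk
// (<a\b.*?<\/a>)  lazy capture of one whole link, up to the first closing tag
// \s*1       the trailing "1" marker
// i, s       ignore case; let "." cross the line breaks inside the link text
$pattern = '/\*\s*(<a\b.*?<\/a>)\s*1/is';

preg_match_all($pattern, $text, $matches);
$links = $matches[1]; // one entry per result
print_r($links);
```

preg_match_all() fills $matches[1] with the contents of the first capture group for every match, so $links ends up holding one complete link per result.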
Terry
05-10-2003, 04:21 PM
Thanks for the response. I think I understand your pattern except for the "is" at the end of it. It doesn't seem to work for me though. Maybe I should try and explain what I'm trying to do.
I'm trying to make a ticker for a client on the search word "fisheries".
<?php
echo "<h1>News Search on 'fisheries'</h1>";
// connect to URL and read information
$theurl = "http://www.newsindex.com/cgi-bin/process.cgi?"
."query=fisheries&mode=any";
echo $theurl."<br />";
if (!($fp = fopen($theurl, "r")))
{
echo "Could not open URL";
exit;
}
$contents = stream_get_contents($fp); // fread() on a network stream may return only the first chunk
fclose($fp);
// find the part of the page we want and output it
$news = explode("<dl>", $contents);
$news = explode("</dl>", $news[1]);
print_r ($news);
// acknowledge source
echo "<br>"
."This information retrieved from <br>"
."<a href=\"$theurl\">$theurl</a><br>"
."on ".(date("l jS F Y g:i a T"));
?>
If you run the code you might get an idea of what I'm trying to do. I managed to chop the query results out of the search page that I fopened, and now they're stored in $news[0]. The next step is to get only the URLs that sit between the two characters * and 1. Each URL looks like this in the code:
*
<a href="http://www.usnewswire.com/topnews/qtr2_2003/0507-168.html">168-0507 Defenders of
Wildlife: Congress Widens
Scope of Military Assault on Environmental Protectio</a>
1
Once I can get just the urls I will pass it into a javascript array so I can make a dynamic ticker on the client side.
Terry
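For the last step (handing the URLs to a client-side ticker), one possible approach is json_encode(), which produces a valid JavaScript array literal. This is only a sketch: the $urls array is assumed to have been extracted already, and the name tickerUrls is made up for illustration.

```php
<?php
// Assumed input: URLs already pulled out of the results page.
$urls = array(
    'http://allafrica.com/stories/200305090757.html',
    'http://www.usnewswire.com/topnews/qtr2_2003/0507-168.html',
);

// json_encode() takes care of quoting and escaping,
// so the result can be dropped straight into a script block.
$json = json_encode($urls);
$js   = 'var tickerUrls = ' . $json . ';';

echo "<script type=\"text/javascript\">\n", $js, "\n</script>\n";
```

By default json_encode() writes each slash as \/, which is harmless in JavaScript and keeps a stray "</script>" inside a string from closing the block early.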
missing-score
05-10-2003, 11:58 PM
I think I know why it may not work. It could be that you are using eregi_replace or ereg_replace.
You are better off using preg_replace(). Try the pattern in that, it should work:
preg_replace($pattern, $replace, $string);
http://www.php.net/preg-replace
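As a rough illustration of that call, the sketch below replaces a whole "* link 1" entry with just its href value. The pattern and the example.com URL are assumptions made up for the example, not something taken from the search page.

```php
<?php
// Hypothetical entry in the "* ... 1" format described above.
$string = "*<a href=\"http://example.com/a.html\">First\nstory</a> 1";

// Capture the href value; match (and so discard) everything else in the entry.
$pattern = '/\*\s*<a\s+href="([^"]+)".*?<\/a>\s*1/is';
$replace = '$1'; // $1 refers to the first capture group

$result = preg_replace($pattern, $replace, $string);
echo $result, "\n";
```

preg_replace() rewrites the string in place; if the goal is to collect the matches into an array rather than edit the text, preg_match_all() is usually the better fit.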
Mhtml
05-11-2003, 04:08 AM
Yeah, I'm not too sure about the ereg functions and syntax, so I'll have to read up on them in five mins....
From what I do know though preg is faster.
i & s are pattern modifiers.
i = Ignore case
s = dot matches all characters
(I don't see your usage of the pattern anywhere in that script)
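To see what the two modifiers change, here is a small sketch with a made-up test string that has upper-case tags and a line break inside the link text:

```php
<?php
// Made-up input: upper-case tags, link text split over two lines.
$text = "*<A HREF=\"x\">two\nlines</A> 1";

// Without "s", "." stops at the newline, so the closing tag is never reached.
$withoutS = preg_match('/<a\b.*<\/a>/i', $text);

// Without "i", the lower-case pattern does not match the upper-case tags.
$withoutI = preg_match('/<a\b.*<\/a>/s', $text);

// With both modifiers the link is found.
$withBoth = preg_match('/<a\b.*<\/a>/is', $text);

var_dump($withoutS, $withoutI, $withBoth);
```

preg_match() returns 1 when the pattern matches and 0 when it does not, so only the last call reports a match.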
missing-score
05-11-2003, 09:34 AM
I (think) ereg syntax is basically the same as Perl regex syntax, with the exception that Perl syntax has the / at the start and end.
Yes, preg is faster.
Mhtml
05-11-2003, 03:12 PM
The delimiters do not have to be /.
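A quick sketch of why that matters when matching URLs: with # as the delimiter, the slashes in the pattern no longer need escaping. The sample link is just for illustration.

```php
<?php
$text = '<a href="http://www.php.net/preg-replace">docs</a>';

// "/" as delimiter: every slash inside the pattern needs a backslash.
$slashDelim = '/http:\/\/[^"]+/';

// "#" as delimiter: the slashes can stay as they are.
$hashDelim = '#http://[^"]+#';

preg_match($slashDelim, $text, $a);
preg_match($hashDelim, $text, $b);

var_dump($a[0] === $b[0]); // both patterns find the same URL
```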
mordred
05-12-2003, 01:39 AM
ereg uses POSIX regular expressions which are kind of standardized. MySQL regexes are also POSIX-compliant.
preg, as the name suggests, is Perl-compatible regular expressions. The syntax and modifiers are much more advanced and versatile than the ereg ones, i.e. you can do much more with preg regexes. There are also handy functions such as preg_match_all and preg_replace_callback that have no ereg counterparts.
Perl-like regular expression syntax is also used in JavaScript and Java (>= 1.4), AFAIR.
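As an example of what preg_replace_callback allows, the sketch below turns each link into "title (url)" and tidies the line breaks in the title along the way. The input string and the example.com URLs are invented for the illustration, and the anonymous function needs a PHP version that supports closures.

```php
<?php
// Invented input with two links, one of them wrapped over two lines.
$html = "<a href=\"http://example.com/a.html\">First\nstory</a>"
      . " and <a href=\"http://example.com/b.html\">Second</a>";

$plain = preg_replace_callback(
    '#<a\s+href="([^"]+)"[^>]*>(.*?)</a>#is',
    function (array $m) {
        // $m[1] is the URL, $m[2] the link text.
        // Collapse any run of whitespace (including newlines) to one space.
        $title = trim(preg_replace('/\s+/', ' ', $m[2]));
        return $title . ' (' . $m[1] . ')';
    },
    $html
);

echo $plain, "\n";
```

The callback runs once per match and its return value replaces that match, which is something a plain replacement string cannot do when the replacement needs extra processing.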