View Full Version : Regex help!



CyberPirate
08-13-2012, 01:34 PM
Hello!

I'm currently practicing with PHP's DOMDocument and I've come across an issue that I just can't solve on my own :D

I've managed to parse some data from Wikipedia's special page, but I've got a lot of characters around my data which I really don't need.

This is a sample of the output I'm getting:
*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''

My desired result:
10 Things I Hate About You (TV series)

If anyone could help me out with a regular expression that gets this result, and also explain the pattern to me, I would be forever grateful!

Sincerely
Cyb :p

rerryn
08-13-2012, 02:17 PM
Hi CyberPirate,
the regex pattern I'd suggest is \*\'\'\[\[(.*?)\|(.*?)\]\]\'\'


$string = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";
preg_match('/\*\'\'\[\[(.*?)\|(.*?)\]\]\'\'/',$string,$m);
print_r($m);

It's probably not the best solution, but it works.

Assuming all of your parsed output is in the same format, it will always look like *''[[something here|something here]]''. Because *, [, ] and | have special meanings in a regex, they need to be escaped with a backslash to be matched literally. The single quotes are not special to the regex engine, but the example encloses the pattern in a single-quoted PHP string, so leaving them unescaped there would cause a syntax error.
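If escaping by hand feels error-prone, PHP's preg_quote() can do it for the literal parts. This is only a sketch of that idea; the $prefix and $suffix variable names are made up for illustration and the sample string is the one from the question.

```php
<?php
// Literal text that surrounds each link in the sample output.
$prefix = "*''[[";
$suffix = "]]''";

// preg_quote() escapes regex metacharacters; passing '/' as the
// second argument also escapes the pattern delimiter if it occurs.
$pattern = '/' . preg_quote($prefix, '/') . '(.*?)\|(.*?)' . preg_quote($suffix, '/') . '/';

echo $pattern, "\n"; // /\*''\[\[(.*?)\|(.*?)\]\]''/

$string = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";
preg_match($pattern, $string, $m);
echo $m[1], "\n"; // 10 Things I Hate About You (TV series)
```

Note that preg_quote() leaves the quotes alone, since they only matter to the PHP string literal and not to the regex engine.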

Next up is (.*?). The . matches any character (except a newline), the * repeats it zero or more times, and the ? makes the quantifier ungreedy (lazy), so it stops at the first | instead of running on to the last one. You could write .*? without the brackets, but then that part of the match wouldn't be captured. preg_match puts the whole match in index 0 of its third argument (in this case $m) and each bracketed group in its own index after that, in order.
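To see what the ungreedy ? changes, here is a small comparison. The input line is hypothetical (two links on one line, which the original sample does not have) and was chosen only to make the difference visible.

```php
<?php
// Hypothetical input: two links on the same line.
$line = "[[A (film)|A]] and [[B (film)|B]]";

// Greedy: .* grabs as much as it can, so it runs up to the LAST pipe.
preg_match('/\[\[(.*)\|/', $line, $greedy);

// Lazy: .*? grabs as little as it can, so it stops at the FIRST pipe.
preg_match('/\[\[(.*?)\|/', $line, $lazy);

echo $greedy[1], "\n"; // A (film)|A]] and [[B (film)
echo $lazy[1], "\n";   // A (film)
```

With only one link per line, as in the sample, both forms give the same result; the lazy form is just the safer default.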

So the output from print_r is:

Array
(
    [0] => *''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''
    [1] => 10 Things I Hate About You (TV series)
    [2] => 10 Things I Hate About You
)


Which would leave the part you want in [1], but give you the option to use the other entry if you wanted to.
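If the parsed output contains many such lines, preg_match_all can collect every title in one pass. The following is a rough sketch, not a tested solution for the real page: the $wikitext sample is invented, and it assumes some links may have no | part at all (like [[30 Rock]]), which the original sample does not show.

```php
<?php
// Invented sample: three list entries, the last one without a pipe.
$wikitext = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''\n"
          . "*''[[24 (TV series)|24]]''\n"
          . "*''[[30 Rock]]''";

// Group 1: link target (anything up to | or ]).
// The (?:\|...)? part makes the display text optional.
$pattern = '/\[\[([^|\]]+)(?:\|([^\]]*))?\]\]/';

preg_match_all($pattern, $wikitext, $matches);
$titles = $matches[1];

echo implode("\n", $titles), "\n";
// 10 Things I Hate About You (TV series)
// 24 (TV series)
// 30 Rock
```

Using negated character classes such as [^|\]] instead of .*? is a design choice here: the match cannot run past the pipe or the closing brackets, even when several links share a line.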

Hope this helps.

CyberPirate
08-14-2012, 01:44 PM
Worked like a charm!

Thanks :) +1

safeservicejt
08-16-2012, 04:21 AM
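<?php
$text = "*''[[10 Things I Hate About You (TV series)|10 Things I Hate About You]]''";

// preg_match returns 1 or 0, so don't assign its result back to $text.
// Capture everything between [[ and the first |.
if (preg_match('/\[\[(.*?)\|.*?\]\]/', $text, $matches)) {
    echo $matches[1]; // 10 Things I Hate About You (TV series)
}
?>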


