regex help



cosmicsea
03-19-2010, 01:47 PM
Hi, I'm trying to get my crawler to work right and need a little help.
I'm using this code to extract tokens such as "10.pr", "10_pr", "10pr", "pr.10", "pr_10", "pr10" and so on.


preg_match('/(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})/', $t, $m);

My problem is that the second half of the pattern,
(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})
does not seem to index at all. I used a regex tester and all my examples match there, but when I use this with the crawler, it seems only the first half of the pattern works:
(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)
It indexes my examples "10.pr", "10_pr", "10pr" perfectly, but when the crawler tries to index "pr.10", "pr_10", "pr10", it does not seem to work. I have tried reformatting this regex several times, so I'm out of ideas. Is the | in the middle of the pattern causing it? Is there a way to make sure the whole regex is read? I used the | in the middle because it was all I could get to work in the regex tester. Can somebody help, please? Thanks.

MattF
03-19-2010, 03:10 PM
Untested. Try:



preg_match('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);

cosmicsea
03-20-2010, 12:01 AM
Nope, that didn't seem to do it either.

MattF
03-20-2010, 12:09 AM
Does it match anything if you just use the latter of the two expressions on its own?



preg_match('/(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[\._]?(\d{1,3})/', $t, $m);


Try changing preg_match to preg_match_all too, if there may be more than one match per string.

cosmicsea
03-20-2010, 12:33 AM
It does match. I'm just trying to make it run the whole regex and grab "10.pr" and "pr.10" in any combination.

Here is what I just tried, and it almost works.

preg_match_all('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
The problem is, it is supposed to group similar types, and it does, but for the "part" table in the db it gives everything a 1 automatically, even if it has distinguished multiple types.

If I could just get this to index part numbers correctly, it would be working just fine.

Here is my whole function.

function part_search($t)
{
    preg_match_all('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
    if (isset($m[1]))
    {
        return (int)$m[1];
    } else {
        return 0;
    }
}

Is there anything you can think of that might help further?

MattF
03-20-2010, 12:44 AM
What do you see printed if you change this line:



return (int)$m[1];


to:



print_r($m[1]);


?

cosmicsea
03-20-2010, 12:47 AM
The weird thing is, if I swap preg_match_all for preg_match, it stops marking everything 1 and works correctly, but the original problem comes back: it only handles the "10.pr" format and not "pr.10".

cosmicsea
03-20-2010, 12:48 AM
OK, hold on, I'll test it.

cosmicsea
03-20-2010, 12:57 AM
Here is what it output on a sample test, which looks good. It's just that those 0's are supposed to be the part numbers.

Array
(
[0] => pr.1
)
Array
(
[0] => pr.2
)
Array
(
[0] => pr.3
)
Array
(
[0] => pr.4
)
Array
(
[0] => 01.pr
)
Array
(
[0] => 02.pr
)

MattF
03-20-2010, 01:06 AM
You're using preg_match_all now though, hence you're getting a multi-dimensional array rather than a single-level array. Do print_r($m); and it will show you the full array output. Can't remember offhand which one the relevant parts will be in.
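
For what it's worth, the difference in array shape can be seen in a short sketch. It assumes the pattern from the thread and two made-up sample strings; the variable names are only for illustration. Casting the group-1 string from preg_match to int keeps only any leading digits, and casting the group-1 array from preg_match_all to int gives 1 for any non-empty array in practice (the PHP manual does not define that conversion, so treat it as observed behavior rather than a guarantee). That would account for both the 0's and the 1's reported here.

```php
<?php
// Pattern from the thread, unchanged.
$regex = '/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/';

// preg_match(): $one[1] is a string holding the whole matched token.
preg_match($regex, '01.pr', $one);
$fromNumberFirst = (int) $one[1];   // "01.pr" starts with digits

preg_match($regex, 'pr.01', $one);
$fromLettersFirst = (int) $one[1];  // "pr.01" starts with letters

// preg_match_all(): $all[1] is an array with one entry per match.
preg_match_all($regex, '01.pr pr.02', $all);
$groupOne = $all[1];
$castOfArray = (int) $all[1];       // array cast to int, not a part number

var_dump($fromNumberFirst, $fromLettersFirst, $groupOne, $castOfArray);
```

So $m[1] is never the bare number with this pattern: it is the whole token under preg_match and a list of tokens under preg_match_all.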

cosmicsea
03-20-2010, 01:24 AM
When I do that with preg_match_all I get


Array
(
[0] => Array
(
)

[1] => Array
(
)

[2] => Array
(
)

[3] => Array
(
)

[4] => Array
(
)

[5] => Array
(
)

)
And when I do a preg_match I get


Array
(
[0] => pr.2
[1] => pr.2
[2] =>
[3] =>
[4] => pr
[5] => 2
)
Array
(
[0] => 01.pr
[1] => 01.pr
[2] => 01
[3] => pr
)
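
The preg_match output above suggests where the number lands: group 2 when the digits come first, group 5 when the letters come first. A minimal sketch of part_search along those lines, assuming the pattern is kept unchanged and only one token per string is needed (the early return of 0 for "no match" mirrors the original function):

```php
<?php
function part_search($t)
{
    // Same pattern as in the thread; groups 2 and 5 hold the digits.
    $regex = '/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/';

    if (!preg_match($regex, $t, $m)) {
        return 0; // no part token found
    }

    // Letters-first branch ("pr.10"): the number is in group 5.
    if (isset($m[5]) && $m[5] !== '') {
        return (int) $m[5];
    }

    // Number-first branch ("10.pr"): the number is in group 2.
    return (int) $m[2];
}
```

With this, part_search('pr.2') and part_search('02.pr') should both give 2, since the cast is applied to the digits only and not to the whole token.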

cosmicsea
03-20-2010, 01:27 AM
I just can't understand why the first bit works correctly with a regular preg_match, yet with preg_match_all it indexes every part as 1.

MattF
03-20-2010, 02:11 AM
You need to find out which arrays the respective parts are in and then use those arrays. Run this code.



foreach ($m as $key => $array)
{
    print('Key: '.$key."\n");
    print_r($array);
}

cosmicsea
03-20-2010, 02:12 AM
I think I know what will fix it, but I can't figure out the regex correctly. If I do this

(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})?
it will correctly get "01.pr" but won't get "pr.01", and if I do

(\d{1,3})?[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})
it will get "pr.01" and not "01.pr". Is there a way to make this regex grab either order of text? I'm sorry for all the questions; this is just getting frustrating for me.

cosmicsea
03-20-2010, 02:13 AM
OK, I will try.

MattF
03-20-2010, 02:20 AM
this is just getting frustrating for me.

Your code merely needs some adjustments. That's all. No need to worry about it.

cosmicsea
03-20-2010, 02:47 AM
OK, here is the output of
function part_search($t)
{
    preg_match_all('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
    foreach ($m as $key => $array)
    {
        print('Key: '.$key."\n");
        print_r($array);
    }
}


Key: 0
Array
(
[0] => 01.pr
)
Key: 1
Array
(
[0] => 01.pr
)
Key: 2
Array
(
[0] => 01
)
Key: 3
Array
(
[0] => pr
)
Key: 4
Array
(
[0] =>
)
Key: 5
Array
(
[0] =>
)
Key: 0
Array
(
[0] => pr.01
)
Key: 1
Array
(
[0] => pr.01
)
Key: 2
Array
(
[0] =>
)
Key: 3
Array
(
[0] =>
)
Key: 4
Array
(
[0] => pr
)
Key: 5
Array
(
[0] => 01
)

MattF
03-20-2010, 03:07 AM
Are you wanting to capture the letters too, or just the numbers?

cosmicsea
03-20-2010, 03:12 AM
I just need to capture the numbers, such as 1, 2, 3, 4, 5, for the parts in any of the text: pr.01 = 1, pr.02 = 2, etc. Like I said, preg_match will do the first half of the pattern correctly, but preg_match_all grabs everything and labels every part as 1.

MattF
03-20-2010, 03:40 AM
If all you need are the numbers:



$link = 'pr9 pr.10 10pr 8.pr';
$regex = '#(?:(\d{1,3})[._]?(?:pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr))|(?:(?:pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))#';
preg_match_all($regex, $link, $match);
$array = array_merge($match[1], $match[2]);
print_r($array);


They're all combined into a single array. You'll obviously just need to replace the test vars with your own.


Edit: You could probably reduce that regex to the following (although it may create false positives):



$regex = '#(?:(\d{1,3})[._]?)?(?:pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)(?:[._]?(\d{1,3}))?#';
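
One wrinkle with the array_merge approach: with preg_match_all, a group that did not take part in a match comes back as an empty string, so the merged array carries blanks and loses the order of appearance. A possible alternative, sketched here under the assumption that the PCRE build supports branch-reset groups, is to wrap the alternation in (?| ... ) so both branches store the number in group 1. The $types and $parts names are only for illustration.

```php
<?php
$types = 'pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr';

// (?| ... ) is a branch-reset group: each alternative numbers its
// capture groups from 1, so the digits always end up in group 1.
$regex = '#(?|(\d{1,3})[._]?(?:' . $types . ')|(?:' . $types . ')[._]?(\d{1,3}))#';

$link = 'pr9 pr.10 10pr 8.pr';
preg_match_all($regex, $link, $match);

// $match[1] holds one number per token, in order of appearance.
$parts = array_map('intval', $match[1]);
print_r($parts);
```

For the test string above this should give 9, 10, 10, 8 in that order, with no empty entries to filter out.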

cosmicsea
03-20-2010, 04:11 AM
I can't seem to get it to work. I am about to give up on this. It's not a big deal; it's just something extra I was trying to add to my crawler. Thanks.

MattF
03-20-2010, 04:18 AM
Post your code. What do you mean by 'not working', btw?


