Old 03-19-2010, 01:47 PM   PM User | #1
cosmicsea
Regular Coder

 
Join Date: Jan 2010
Location: Washington
Posts: 223
Thanks: 34
Thanked 0 Times in 0 Posts
cosmicsea is an unknown quantity at this point
Regex help

Hi, I'm trying to get my crawler to work correctly and need a little help.
I'm using this code to extract strings such as "10.pr", "10_pr", "10pr", "pr.10", "pr_10", "pr10" and so on.

Code:
   preg_match('/(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})/', $t, $m);
My problem is that the second half of the pattern
PHP Code:
(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})
does not seem to index anything. I used a regex tester and all of my examples match correctly there, but when I use the pattern in the crawler, only the first half seems to work:
Code:
(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)
It indexes my examples "10.pr", "10_pr", "10pr" perfectly, but when the crawler tries to index "pr.10", "pr_10", "pr10", it does not seem to work. I have tried reformatting this regex several times, so I'm out of ideas. Could the | in the middle of the pattern be causing it? Is there a way to make sure the whole regex is read? I used the | in the middle because it was the only thing I could get to work in the regex tester. Can somebody help, please? Thanks.
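A possible explanation, offered as a sketch rather than a confirmed diagnosis of the crawler: the alternation is read in full, but every capture group keeps its own number. When the second branch matches, groups 1 and 2 stay empty and the values land in groups 3 and 4. The snippet below runs the pattern from this post against two made-up sample strings; the variable names $a and $b are illustrative only.

```php
<?php
// Pattern copied from the post above; only the sample inputs are made up.
$pattern = '/(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})/';

preg_match($pattern, '10.pr', $a); // the first branch matches
preg_match($pattern, 'pr.10', $b); // the second branch matches

print_r($a); // number in $a[1], type in $a[2]
print_r($b); // $b[1] and $b[2] are empty strings; type in $b[3], number in $b[4]
```

If the surrounding code only ever reads $m[1], it would see an empty string for "pr.10", which would look as if the second half of the pattern never matched.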
Old 03-19-2010, 03:10 PM   PM User | #2
MattF
Senior Coder

 
Join Date: Jul 2009
Location: South Yorkshire, England
Posts: 2,322
Thanks: 6
Thanked 304 Times in 303 Posts
MattF will become famous soon enough
Untested. Try:

Code:
preg_match('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
Old 03-20-2010, 12:01 AM   PM User | #3
cosmicsea
Quote:
Originally Posted by MattF View Post
Untested. Try:

Code:
preg_match('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
Nope, that didn't seem to do it either.
Old 03-20-2010, 12:09 AM   PM User | #4
MattF
Does it match anything if you just use the latter of the two expressions on its own?

Code:
preg_match('/(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[\._]?(\d{1,3})/', $t, $m);
Also try changing preg_match to preg_match_all if there may be more than one match in each string.
Old 03-20-2010, 12:33 AM   PM User | #5
cosmicsea
Quote:
Originally Posted by MattF View Post
Does it match anything if you just use the latter of the two expressions on its own?

Code:
preg_match_all('/(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[\._]?(\d{1,3})/', $t, $m);
Also try changing preg_match to preg_match_all if there may be more than one match in each string.
It does match. I'm just trying to make it go through the whole regex and grab both "10.pr" and "pr.10" in any combination.

Here is what I just tried, and it almost works.
PHP Code:
preg_match_all('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
The problem is that it is supposed to group similar types, and it does, but for the "part" table in the db it gives everything a 1 automatically, even when it has distinguished multiple types.

If I could just get this to index part numbers correctly, it would be working just fine.

Here is my whole function.
PHP Code:
function part_search($t)
{
    preg_match_all('/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/', $t, $m);
    if (isset($m[1]))
    {
        return (int) $m[1];
    } else {
        return 0;
    }
}

Is there anything you can think of that might help further?
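One way the function could be reshaped, given only as a sketch under assumptions: it uses preg_match (one match per string), makes the type lists non-capturing so that each branch has exactly one number group, and returns whichever of the two is filled. The name part_number_from() is hypothetical and is not part of the original crawler.

```php
<?php
// Hypothetical rewrite of part_search(); the function name is an assumption.
function part_number_from(string $t): int
{
    $types = 'pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr';
    // (?:...) groups do not capture, so group 1 is the number of the
    // "10.pr" branch and group 2 is the number of the "pr.10" branch.
    $pattern = '/(\d{1,3})[._]?(?:' . $types . ')|(?:' . $types . ')[._]?(\d{1,3})/';

    if (!preg_match($pattern, $t, $m)) {
        return 0; // nothing that looks like a part number
    }

    // An unused leading group is an empty string; an unused trailing group
    // is missing from $m, hence the ?? fallback.
    $digits = ($m[1] ?? '') !== '' ? $m[1] : ($m[2] ?? '');

    return (int) $digits;
}

var_dump(part_number_from('10.pr')); // int(10)
var_dump(part_number_from('pr.10')); // int(10)
var_dump(part_number_from('xyz'));   // int(0)
```

Whether returning 0 for "no part number" is the right convention depends on how the crawler stores the value; the sketch simply keeps the behaviour of the original function.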

Last edited by cosmicsea; 03-20-2010 at 12:35 AM.. Reason: typo
Old 03-20-2010, 12:44 AM   PM User | #6
MattF
What do you see printed if you change this line:

Code:
return (int)$m[1];
to:

Code:
print_r($m[1]);
?
Old 03-20-2010, 12:47 AM   PM User | #7
cosmicsea
The weird thing is that if I replace preg_match_all with preg_match, it stops marking everything as 1 and works correctly, but the original problem comes back: it only handles the "10.pr" format and not "pr.10".
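A hedged guess at what is going on here, assuming the function still returns (int) $m[1]: with preg_match_all, $m[1] is an array, and casting a non-empty array to int gives 1; with preg_match, $m[1] is the whole "pr.10"-style string, which casts to 0 because it does not start with a digit. The snippet below uses the pattern from the thread and a made-up input to show both casts.

```php
<?php
// Pattern from the thread (with the extra outer group); 'pr.7' is a made-up input.
$pattern = '/((\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)|(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3}))/';

preg_match_all($pattern, 'pr.7', $all);
preg_match($pattern, 'pr.7', $one);

var_dump($all[1]);       // an array holding the string "pr.7"
var_dump((int) $all[1]); // int(1): a non-empty array casts to 1
var_dump($one[1]);       // string "pr.7", captured by the outer group
var_dump((int) $one[1]); // int(0): the string does not start with digits
var_dump((int) $one[5]); // int(7): the number captured by the second branch
```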
Old 03-20-2010, 12:48 AM   PM User | #8
cosmicsea
Quote:
Originally Posted by MattF View Post
What do you see printed if you change this line:

Code:
return (int)$m[1];
to:

Code:
print_r($m[1]);
?
OK, hold on, I'll test it.
Old 03-20-2010, 12:57 AM   PM User | #9
cosmicsea
Quote:
Originally Posted by MattF View Post
What do you see printed if you change this line:

Code:
return (int)$m[1];
to:

Code:
print_r($m[1]);
?
Here is what it output on a sample test, which looks good. It's just that those 0's are supposed to be the part numbers.
PHP Code:
Array
(
    [0] => pr.1
)
Array
(
    [0] => pr.2
)
Array
(
    [0] => pr.3
)
Array
(
    [0] => pr.4
)
Array
(
    [0] => 01.pr
)
Array
(
    [0] => 02.pr
)

Old 03-20-2010, 01:06 AM   PM User | #10
MattF
You're using preg_match_all now, though, hence you're getting a multi-dimensional array rather than a single-level array. Do print_r($m); and it will show you the full array output. I can't remember offhand which sub-arrays the relevant parts will be in.
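For reference, a small sketch of the two array layouts preg_match_all can return. The pattern is the original one from post #1; the input string is made up, and the variable names $byGroup and $byMatch are illustrative.

```php
<?php
$types   = 'pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr';
$pattern = '/(\d{1,3})[._]?(' . $types . ')|(' . $types . ')[._]?(\d{1,3})/';
$text    = '01.pr pr.2'; // made-up sample with one match per branch

// Default layout (PREG_PATTERN_ORDER): $byGroup[n] lists group n for every match.
preg_match_all($pattern, $text, $byGroup);

// PREG_SET_ORDER: $byMatch[i] holds all groups of the i-th match.
preg_match_all($pattern, $text, $byMatch, PREG_SET_ORDER);

print_r($byGroup[1]); // numbers from the first branch: "01" and an empty string
print_r($byGroup[4]); // numbers from the second branch: an empty string and "2"
print_r($byMatch[1]); // the "pr.2" match: groups 1 and 2 empty, "pr" in 3, "2" in 4
```

With PREG_SET_ORDER each match can be handled on its own, which tends to be easier to loop over than the per-group layout.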
Old 03-20-2010, 01:24 AM   PM User | #11
cosmicsea
Quote:
Originally Posted by MattF View Post
You're using preg_match_all now though, hence you're getting a multi-dimensional array rather than a single level array. do print_r($m); and it will show you the full array output. Can't remember offhand which one the relevant parts will be in.
When I do that with preg_match_all, I get

PHP Code:
Array
(
    [0] => Array
        (
        )

    [1] => Array
        (
        )

    [2] => Array
        (
        )

    [3] => Array
        (
        )

    [4] => Array
        (
        )

    [5] => Array
        (
        )

)
and when I use preg_match, I get
PHP Code:
Array
(
    [0] => pr.2
    [1] => pr.2
    [2] => 
    [3] => 
    [4] => pr
    [5] => 2
)
Array
(
    [0] => 01.pr
    [1] => 01.pr
    [2] => 01
    [3] => pr
)

Old 03-20-2010, 01:27 AM   PM User | #12
cosmicsea
I just can't understand why I can get the first part to work correctly with a regular preg_match, yet when I use preg_match_all it just indexes all 1's for the parts.

Last edited by cosmicsea; 03-20-2010 at 01:28 AM.. Reason: typo
Old 03-20-2010, 02:11 AM   PM User | #13
MattF
You need to find out which arrays the respective parts are in and then use those arrays. Run this code.

Code:
foreach ($m as $key => $array)
{
    print('Key: '.$key."\n");
    print_r($array);
}
Old 03-20-2010, 02:12 AM   PM User | #14
cosmicsea
Quote:
Originally Posted by MattF View Post
You're using preg_match_all now though, hence you're getting a multi-dimensional array rather than a single level array. do print_r($m); and it will show you the full array output. Can't remember offhand which one the relevant parts will be in.
I think I know what will fix it, but I can't get the regex right. If I do this
PHP Code:
(\d{1,3})[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})?
it will correctly get "01.pr" but won't get "pr.01", and if I do
PHP Code:
(\d{1,3})?[._]?(pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr)[._]?(\d{1,3})
it will get "pr.01" and not "01.pr". Is there a way to write this regex so that it grabs either order of text? I'm sorry for all the questions; this is just getting frustrating for me.
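One approach that might work for either order, sketched under assumptions: keep the two-branch alternation, give each group a distinct name, and pick the branch that actually matched. The helper name extract_parts(), the group names num_a, type_a, type_b, num_b, and the shape of the returned array are all hypothetical.

```php
<?php
// Hypothetical helper; its name and return shape are assumptions.
function extract_parts(string $text): array
{
    $types   = 'pr|tr|gr|zr|amc|mp|o|iv|is|ve|sr';
    // Branch A: number first ("01.pr"); branch B: type first ("pr.01").
    $pattern = '/(?<num_a>\d{1,3})[._]?(?<type_a>' . $types . ')'
             . '|(?<type_b>' . $types . ')[._]?(?<num_b>\d{1,3})/';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $parts = [];
    foreach ($matches as $m) {
        // num_a is non-empty only when branch A produced this match.
        $numberFirst = ($m['num_a'] ?? '') !== '';
        $parts[] = [
            'type'   => $numberFirst ? $m['type_a'] : $m['type_b'],
            'number' => (int) ($numberFirst ? $m['num_a'] : $m['num_b']),
        ];
    }

    return $parts;
}

print_r(extract_parts('01.pr pr.2 tr_10'));
```

Because short types such as "o" and "is" can also appear inside ordinary words next to digits, real crawler input may need word boundaries or stricter separators; that is left out of the sketch.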
Old 03-20-2010, 02:13 AM   PM User | #15
cosmicsea
Quote:
Originally Posted by MattF View Post
You need to find out which arrays the respective parts are in and then use those arrays. Run this code.

Code:
foreach ($m as $key => $array)
{
    print('Key: '.$key."\n");
    print_r($array);
}
OK, I will try.