Old 12-17-2010, 04:19 PM   PM User | #1
MrBiggZ
ISO regex guru for help in multi-line pattern match

Hi!

I'm hoping there is a regex guru in the house. I've been getting pretty good at this on a novice level and now it's time to step up the game a little bit.

From this little section of HTML code below I'd like to pull the addresses out. I tried this:

PHP Code:
#<div style="margin-bottom:.*?>\s(.*>)<br>\s(.*?)\s<br>\s(\d{3}-\d{3}-\d{4}\s<br>#m 
Alas .. my results yield nothing =( I do know that the end of each line in the HTML is a LF, not CR/LF. I did try using the s modifier instead of the m. Still no luck! Oh .. I'm using preg_match_all and not just preg_match.
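
For reference, here is a minimal sketch of what the two modifiers actually change. The sample string and variable names are made up for illustration. The m modifier only affects where ^ and $ may match, and the s modifier only lets the dot match a newline. Neither one changes \s, which without a quantifier matches exactly one whitespace character, so a single \s cannot get across a LF followed by a run of tabs. Separately, the last group in the pattern as posted is never closed, and an unbalanced parenthesis is enough on its own to stop the pattern from compiling.

```php
<?php
// Hypothetical sample: two lines separated by a bare LF, the second indented with tabs.
$text = "3810 W. Washington<br>\n\t\t\tIndianapolis, IN 46241 <br>";

// m only changes ^ and $: they may match at every line start/end, not just the string's.
$caretPlain = preg_match('#^\s*Indianapolis#', $text);    // 0: ^ is only at offset 0
$caretMulti = preg_match('#^\s*Indianapolis#m', $text);   // 1: ^ also matches after the LF

// s only changes the dot: it may match a newline as well.
$dotPlain  = preg_match('#<br>.*Indianapolis#', $text);   // 0: the dot stops at the LF
$dotDotall = preg_match('#<br>.*Indianapolis#s', $text);  // 1: the dot crosses the LF

// \s without a quantifier is exactly one whitespace character.
$oneSpace = preg_match('#<br>\sIndianapolis#', $text);    // 0: LF matches, the tabs do not
$anySpace = preg_match('#<br>\s*Indianapolis#', $text);   // 1: \s* swallows LF and tabs

var_dump($caretPlain, $caretMulti, $dotPlain, $dotDotall, $oneSpace, $anySpace);
```

So for this markup the quantifier on \s matters more than the choice between m and s.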

I've seen multi-line patterns done, but they were so congested with \/.*+ and other things that I couldn't follow them. Could you please help me with this one and break it down for me so I can understand what is going on? I think once I get this one under my belt and can follow it, I can do others.

Hope you can help! I'll consider it an xmas present! =) Probably the only one I'll be getting this year!

Thanks much in advance!

Code:
			<div id="directions">
				<div class="item">	
						
						<div class="number">1</div> Steak 'N Shake
							<div style="margin-bottom: 20px;">
								3810 W. Washington<br>
								Indianapolis, IN 46241 <br>
								317-241-0483 <br><br>
								
								Hours: <br>
								Dining Room:
								<div>Monday: Anytime</div>
														<div>Tuesday: Anytime</div>
														<div>Wednesday: Anytime</div>
														<div>Thursday: Anytime</div>
														<div>Friday: Anytime</div>
														<div>Saturday: Anytime</div>
														<div>Sunday: Anytime</div>
								<div class="show">Drivethrough: </div>
														<div class="show">Monday: </div>
														<div class="show">Tuesday: </div>
														<div class="show">Wednesday: </div>
														<div class="show">Thursday: </div>
														<div class="show">Friday: </div>
														<div class="show">Saturday: </div>
														<div class="show">Sunday: </div>
								
							</div>
__________________
“No matter how slick the demo is in rehearsal, when you do it in front of a live audience, the probability of a flawless presentation is inversely proportional to the number of people watching, raised to the power of the amount of money involved.” ~ Mark Gibbs
Old 12-17-2010, 04:50 PM   PM User | #2
timgolding
DOM would probably be better for dealing with parsing html documents
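
To give an idea of what the DOM route could look like, here is a minimal sketch using PHP's DOMDocument and DOMXPath. The markup is a trimmed copy of the snippet from the first post. The XPath expression and the "first three text lines are the address" rule are assumptions about the page layout, so treat this as a starting point rather than a finished scraper.

```php
<?php
// Trimmed copy of the markup from the first post (only one listing, hours cut short).
$html = <<<HTML
<div id="directions">
  <div class="item">
    <div class="number">1</div> Steak 'N Shake
    <div style="margin-bottom: 20px;">
      3810 W. Washington<br>
      Indianapolis, IN 46241 <br>
      317-241-0483 <br><br>
      Hours: <br>
    </div>
  </div>
</div>
HTML;

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$addresses = [];

// Assumption: each address sits in the styled <div> directly inside a div.item.
foreach ($xpath->query('//div[@class="item"]/div[@style]') as $block) {
    $lines = [];
    foreach ($block->childNodes as $node) {
        // Keep only non-empty text nodes; the <br> elements are skipped.
        if ($node->nodeType === XML_TEXT_NODE) {
            $text = trim($node->nodeValue);
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }
    // Assumption: the first three text lines are street, city/state/zip and phone.
    $addresses[] = array_slice($lines, 0, 3);
}

print_r($addresses);
```

The whitespace and line-ending questions disappear here because the parser hands back the text between the br tags directly.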
__________________
You can not say you know how to do something, until you can teach it to someone else.
Old 12-17-2010, 08:02 PM   PM User | #3
Inigoesdr
Tim is right, using the DOM is a lot more reliable than regular expressions for parsing HTML. That being said, I believe something like this is what you're looking for:
PHP Code:
$input = <<<END
<div id="directions">
    <div class="item">    
            
            <div class="number">1</div> Steak 'N Shake
                <div style="margin-bottom: 20px;">
                    3810 W. Washington<br>
                    Indianapolis, IN 46241 <br>
                    317-241-0483 <br><br>
                    
                    Hours: <br>
                    Dining Room:
                    <div>Monday: Anytime</div>
                    <div>Tuesday: Anytime</div>
                    <div>Wednesday: Anytime</div>
                    <div>Thursday: Anytime</div>
                    <div>Friday: Anytime</div>
                    <div>Saturday: Anytime</div>
                    <div>Sunday: Anytime</div>
                    <div class="show">Drivethrough: </div>
                    <div class="show">Monday: </div>
                    <div class="show">Tuesday: </div>
                    <div class="show">Wednesday: </div>
                    <div class="show">Thursday: </div>
                    <div class="show">Friday: </div>
                    <div class="show">Saturday: </div>
                    <div class="show">Sunday: </div>
                    
                </div>    <div class="item">    
            
            <div class="number">2</div> Steak 'N Shake2
                <div style="margin-bottom: 20px;">
                    3810 W. Washington2<br>
                    Indianapolis, IN 462412 <br>
                    317-241-04832 <br><br>
END;

$count = preg_match_all('#<div\s*class="item">\s*<div[^>]*>[^<]*</div>\s*([^\r\n\t]*?)[\r\n\t]*<div[^>]*>\s*([^\r\n\t]*?)<br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)\s*<br[^>]*>\s*([0-9\-]*)#si', $input, $matches);

// Drop the full-match entries; only the capture groups are of interest.
unset($matches[0]);

var_dump($matches); 
That gives output like this:
Code:
array(4) {
  [1]=>
  array(2) {
    [0]=>
    string(14) "Steak 'N Shake"
    [1]=>
    string(15) "Steak 'N Shake2"
  }
  [2]=>
  array(2) {
    [0]=>
    string(18) "3810 W. Washington"
    [1]=>
    string(19) "3810 W. Washington2"
  }
  [3]=>
  array(2) {
    [0]=>
    string(22) "Indianapolis, IN 46241"
    [1]=>
    string(23) "Indianapolis, IN 462412"
  }
  [4]=>
  array(2) {
    [0]=>
    string(12) "317-241-0483"
    [1]=>
    string(13) "317-241-04832"
  }
}
I'm sure there are edge cases that you will have to tweak for, so keep that in mind.
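
The output above is grouped by capture group rather than by restaurant. If one array per listing is easier to work with, a possible variation is the same pattern with named groups plus the PREG_SET_ORDER flag. The group names (name, street, city, phone) are made up for this sketch, and the sample input is trimmed and unindented.

```php
<?php
// Trimmed, unindented copy of the two sample listings used above.
$input = <<<END
<div class="item">
<div class="number">1</div> Steak 'N Shake
<div style="margin-bottom: 20px;">
3810 W. Washington<br>
Indianapolis, IN 46241 <br>
317-241-0483 <br><br>
</div>
<div class="item">
<div class="number">2</div> Steak 'N Shake2
<div style="margin-bottom: 20px;">
3810 W. Washington2<br>
Indianapolis, IN 462412 <br>
317-241-04832 <br><br>
END;

// Same pattern as above; only the four groups have been given (hypothetical) names.
$pattern = '#<div\s*class="item">\s*<div[^>]*>[^<]*</div>\s*(?<name>[^\r\n\t]*?)[\r\n\t]*'
         . '<div[^>]*>\s*(?<street>[^\r\n\t]*?)<br[^>]*>[\s\r\n\t]*(?<city>[^\r\n\t]*?)\s*'
         . '<br[^>]*>\s*(?<phone>[0-9\-]*)#si';

// PREG_SET_ORDER returns one entry per match instead of one entry per group.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

$places = [];
foreach ($matches as $m) {
    $places[] = [
        'name'   => $m['name'],
        'street' => $m['street'],
        'city'   => $m['city'],
        'phone'  => $m['phone'],
    ];
}

print_r($places);
```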
Old 12-18-2010, 12:50 AM   PM User | #4
Lamped
Super Moderator


 
Join Date: Feb 2009
Location: England
Posts: 539
Thanks: 8
Thanked 63 Times in 54 Posts
Lamped will become famous soon enough
A little tip for you: .* is almost always a bad idea. Instead of "margin-bottom:.*?>", use "margin-bottom:[^>]*>"; it leads to less confusion and fewer errors.
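
A small sketch of why that matters, using a made-up input string. When the rest of the pattern fails to match, a lazy .*? simply keeps growing and can run past the end of the tag it was meant to stay inside, while [^>]* can never cross a ">".

```php
<?php
// Made-up input: the styled div holds text, and digits only appear in a later tag.
$html = '<div style="margin-bottom: 20px;">abc</div><p>123</p>';

// Lazy dot: after the first ">" is followed by "abc", .*? keeps expanding
// until it finds a ">" that IS followed by digits, two tags further on.
$lazy = preg_match('#<div style="margin-bottom:.*?>(\d+)#s', $html, $a);

// Negated class: [^>]* must stop at the div's own ">", so there is no match.
$negated = preg_match('#<div style="margin-bottom:[^>]*>(\d+)#', $html, $b);

var_dump($lazy, $a[1] ?? null);   // int(1), string(3) "123"
var_dump($negated);               // int(0)
```

The lazy version reports a match that has nothing to do with the div it started in, which is exactly the kind of confusing result the negated class avoids.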

I use "The Regex Coach" from http://weitz.de/regex-coach/ for developing and testing complex expressions; I strongly recommend it. The Windows version runs perfectly in Wine too.
__________________
lamped.co.uk :: Design, Development & Hosting
marcgray.co.uk :: Technical blog
Old 12-18-2010, 06:09 AM   PM User | #5
MrBiggZ
Thanks for your replies!

Ok! This:

Code:
#<div\s*class="item">\s*<div[^>]*>[^<]*</div>\s*([^\r\n\t]*?)[\r\n\t]*<div[^>]*>\s*([^\r\n\t]*?)<br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)\s*<br[^>]*>\s*([0-9\-]*)#si
Loses me! =(

Tell me if I'm right or not! This <div[^>]*>[^<]*</div> means: after the <div, anything that is not a > zero or more times, then a >, then anything that is not a < zero or more times, then a </div>.

Now, does \s* mean more than one space, tab or line break?

I'm confused on this one: <br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)

Dumb question is .. how do you train your brain to think this way? I just haven't found the ins and outs of it yet. If I had a good teacher I'd be better off. I'm semi-noobish so go ahead and beat me up!
Old 12-18-2010, 08:25 AM   PM User | #6
Inigoesdr
Quote:
Originally Posted by MrBiggZ View Post
This <div[^>]*>[^<]*</div> means: after the <div, anything that is not a > zero or more times, then a >, then anything that is not a < zero or more times, then a </div>.
Correct.
Quote:
Originally Posted by MrBiggZ View Post
Now, does \s* mean more than one space, tab or line break?

I'm confused on this one: <br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)
Yeah, you don't really need \r\n\t in that one. I left them in there because I added the \s last, and... lazy.
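
Since a piece-by-piece breakdown was asked for, here is one way to write the same pattern with the x modifier, which ignores whitespace in the pattern and allows # comments. This is only a sketch: the delimiter is switched to ~ (so that # is free for comments), [\s\r\n\t]* is shortened to \s*, and the sample input is a trimmed, unindented copy of the first listing. It also answers the \s* question: the * means zero or more, so \s* matches any run of whitespace, including none at all.

```php
<?php
$pattern = <<<'REGEX'
~
<div\s*class="item">   # opening tag of one listing
\s*                    # zero or more whitespace characters (space, tab, CR, LF)
<div[^>]*>             # the next div tag: "<div", any run of non-">" characters, ">"
[^<]*                  # its text (the listing number): everything up to the next "<"
</div>
\s*
([^\r\n\t]*?)          # group 1, name: rest of the line, as few characters as possible
[\r\n\t]*              # line break plus tab indentation (a plain space is not in this class)
<div[^>]*>             # the styled div that wraps the address
\s*
([^\r\n\t]*?)          # group 2, street
<br[^>]*>
\s*                    # was [\s\r\n\t]*; \s already covers \r, \n and \t
([^\r\n\t]*?)          # group 3, city, state and zip
\s*<br[^>]*>
\s*
([0-9\-]*)             # group 4, phone: digits and hyphens
~six
REGEX;

$input = <<<END
<div class="item">
<div class="number">1</div> Steak 'N Shake
<div style="margin-bottom: 20px;">
3810 W. Washington<br>
Indianapolis, IN 46241 <br>
317-241-0483 <br><br>
END;

$found = preg_match($pattern, $input, $m);
var_dump($found);
print_r(array_slice($m, 1));   // the four captured groups
```

Note that the s modifier is harmless but unused here, because the pattern contains no dot.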
Quote:
Originally Posted by MrBiggZ View Post
Dumb question is .. how do you train your brain to think this way? I just haven't found the ins and outs of it yet. If I had a good teacher I'd be better off. I'm semi-noobish so go ahead and beat me up!
You seem to have a better knowledge of regex than most people who ask questions about it. As far as getting your brain to think like that... I'm not sure. The easiest way I have found to develop/test regular expressions is to use Regex Buddy (commercial). I have previously used The Regex Coach mentioned earlier, and it's a good free solution, but Regex Buddy is the best piece of software I've used for regex thus far.

Old 12-19-2010, 04:31 AM   PM User | #7
MrBiggZ
Yes sir, I've seen Regex Buddy, but my wallet cried at 40 bucks. I guess if I was doing this for a living .. it would be a good investment. But I mainly do this just to keep my mind sharp and to put that degree that hangs on the wall behind me, which has NEVER been used, to some use. =((

Cobol programmer by schooling. When I graduated in '88, PCs cost about what a POS used car costs now. The interwebz didn't even exist in the civilian world yet.

I guess it's going to have to be repetition learning to get this down. *sigh*

Thanks for all your help!
Old 12-19-2010, 04:43 AM   PM User | #8
low tech
Hi

http://www.gskinner.com/RegExr/

FREE online learning tool:-)

LT
Old 12-20-2010, 12:49 AM   PM User | #9
MrBiggZ
Quote:
Originally Posted by low tech View Post
Hi

http://www.gskinner.com/RegExr/

FREE online learning tool:-)

LT
Thanks bud! Haven't run across that one yet! Once the torture, I mean holidays, are over I'll have to apply myself a bit more on it! Consider it bookmarked!
Tags
multi-line regex, pattern search, regex
