  1. #1
    Regular Coder MrBiggZ's Avatar
    Join Date
    Apr 2005
    Location
    Indianapolis IN
    Posts
    275
    Thanks
    39
    Thanked 0 Times in 0 Posts

    ISO regex guru for help in multi-line pattern match

    Hi!

    I'm hoping there is a regex guru in the house. I've been getting pretty good at this on a novice level and now it's time to step up the game a little bit.

    From this little section of HTML code below I'd like to pull the addresses out. I tried this:

    PHP Code:
    #<div style="margin-bottom:.*?>\s(.*>)<br>\s(.*?)\s<br>\s(\d{3}-\d{3}-\d{4}\s<br>#m 
    Alas .. my results yield nothing =( I do know that the end of each line in the HTML is an LF, not CR/LF. I did try using the s modifier instead of the m. Still no luck! Oh .. I'm using preg_match_all and not just preg_match.
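
    For anyone following along, here is a minimal sketch of what the two modifiers mentioned above change. The sample string is made up for illustration: `s` lets `.` match a line feed, while `m` only changes where `^` and `$` are allowed to match.

```php
<?php
// Made-up two-line sample; the line break is a bare LF, as described in the post.
$text = "first\nsecond";

// Without modifiers, "." does not match a line feed.
$dotPlain = preg_match('~first.second~', $text);    // 0
// "s" (dotall) lets "." match the line feed.
$dotAll = preg_match('~first.second~s', $text);     // 1

// Without modifiers, "^" only matches at the very start of the subject.
$caretPlain = preg_match('~^second~', $text);       // 0
// "m" (multiline) lets "^" match right after a line feed as well.
$caretMulti = preg_match('~^second~m', $text);      // 1

var_dump($dotPlain, $dotAll, $caretPlain, $caretMulti);
```

    Neither modifier affects `\s`, which matches line breaks either way.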

    I've seen multi-line patterns done, but they were congested with \/.*+ and other things I couldn't follow. Could you please help me with this one and break it down for me so I can understand what is going on? I think once I get this one under my belt and can follow it, I can do others.

    Hope you can help! I'll consider it an xmas present! =) Probably the only one I'll be getting this year!

    Thanks much in advance!

    Code:
    			<div id="directions">
    				<div class="item">	
    						
    						<div class="number">1</div> Steak 'N Shake
    							<div style="margin-bottom: 20px;">
    								3810 W. Washington<br>
    								Indianapolis, IN 46241 <br>
    								317-241-0483 <br><br>
    								
    								Hours: <br>
    								Dining Room:
    								<div>Monday: Anytime</div>
    														<div>Tuesday: Anytime</div>
    														<div>Wednesday: Anytime</div>
    														<div>Thursday: Anytime</div>
    														<div>Friday: Anytime</div>
    														<div>Saturday: Anytime</div>
    														<div>Sunday: Anytime</div>
    								<div class="show">Drivethrough: </div>
    														<div class="show">Monday: </div>
    														<div class="show">Tuesday: </div>
    														<div class="show">Wednesday: </div>
    														<div class="show">Thursday: </div>
    														<div class="show">Friday: </div>
    														<div class="show">Saturday: </div>
    														<div class="show">Sunday: </div>
    								
    							</div>
    “No matter how slick the demo is in rehearsal, when you do it in front of a live audience, the probability of a flawless presentation is inversely proportional to the number of people watching, raised to the power of the amount of money involved.” ~ Mark Gibbs

  • #2
    Senior Coder timgolding's Avatar
    Join Date
    Aug 2006
    Location
    Southampton
    Posts
    1,519
    Thanks
    114
    Thanked 110 Times in 109 Posts
    The DOM would probably be better for parsing HTML documents.
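
    A minimal sketch of what a DOM-based approach could look like, assuming PHP's bundled DOM extension. The sample markup is a trimmed-down version of the HTML in the first post, and the XPath expression and the "first three text lines" rule are assumptions about that layout, not a general solution.

```php
<?php
// Trimmed-down sample modeled on the markup in the first post.
$html = <<<'HTML'
<div id="directions">
    <div class="item">
        <div class="number">1</div> Steak 'N Shake
        <div style="margin-bottom: 20px;">
            3810 W. Washington<br>
            Indianapolis, IN 46241 <br>
            317-241-0483 <br><br>
            Hours: <br>
        </div>
    </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$addresses = [];

// Assumption: each address lives in the styled div directly inside an "item" div.
foreach ($xpath->query('//div[@class="item"]/div[@style]') as $div) {
    $lines = [];
    foreach ($div->childNodes as $node) {
        // Only direct text nodes; the <br> and nested <div> elements are skipped.
        if ($node->nodeType === XML_TEXT_NODE) {
            $text = trim($node->nodeValue);
            if ($text !== '') {
                $lines[] = $text;
            }
        }
    }
    // Assumption: the first three text lines are street, city/state/zip, phone.
    $addresses[] = array_slice($lines, 0, 3);
}

print_r($addresses);
```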
    You can not say you know how to do something, until you can teach it to someone else.

  • #3
    Super Moderator Inigoesdr's Avatar
    Join Date
    Mar 2007
    Location
    Florida, USA
    Posts
    3,642
    Thanks
    2
    Thanked 405 Times in 397 Posts
    Tim is right, using the DOM is a lot more reliable than regular expressions for parsing HTML. That being said, I believe something like this is what you're looking for:
    PHP Code:
    $input = <<<END
    <div id="directions">
        <div class="item">    
                
                <div class="number">1</div> Steak 'N Shake
                    <div style="margin-bottom: 20px;">
                        3810 W. Washington<br>
                        Indianapolis, IN 46241 <br>
                        317-241-0483 <br><br>
                        
                        Hours: <br>
                        Dining Room:
                        <div>Monday: Anytime</div>
                        <div>Tuesday: Anytime</div>
                        <div>Wednesday: Anytime</div>
                        <div>Thursday: Anytime</div>
                        <div>Friday: Anytime</div>
                        <div>Saturday: Anytime</div>
                        <div>Sunday: Anytime</div>
                        <div class="show">Drivethrough: </div>
                        <div class="show">Monday: </div>
                        <div class="show">Tuesday: </div>
                        <div class="show">Wednesday: </div>
                        <div class="show">Thursday: </div>
                        <div class="show">Friday: </div>
                        <div class="show">Saturday: </div>
                        <div class="show">Sunday: </div>
                        
                    </div>    <div class="item">    
                
                <div class="number">2</div> Steak 'N Shake2
                    <div style="margin-bottom: 20px;">
                        3810 W. Washington2<br>
                        Indianapolis, IN 462412 <br>
                        317-241-04832 <br><br>
    END;

    $count = preg_match_all('#<div\s*class="item">\s*<div[^>]*>[^<]*</div>\s*([^\r\n\t]*?)[\r\n\t]*<div[^>]*>\s*([^\r\n\t]*?)<br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)\s*<br[^>]*>\s*([0-9\-]*)#si', $input, $matches);

    unset($matches[0]);

    var_dump($matches); 
    That gives output like this:
    Code:
    array(4) {
      [1]=>
      array(2) {
        [0]=>
        string(14) "Steak 'N Shake"
        [1]=>
        string(15) "Steak 'N Shake2"
      }
      [2]=>
      array(2) {
        [0]=>
        string(18) "3810 W. Washington"
        [1]=>
        string(19) "3810 W. Washington2"
      }
      [3]=>
      array(2) {
        [0]=>
        string(22) "Indianapolis, IN 46241"
        [1]=>
        string(23) "Indianapolis, IN 462412"
      }
      [4]=>
      array(2) {
        [0]=>
        string(12) "317-241-0483"
        [1]=>
        string(13) "317-241-04832"
      }
    }
    I'm sure there are edge cases that you will have to tweak for, so keep that in mind.
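
    Since the original request was for a breakdown, here is a commented variant written with the `x` modifier, which lets a pattern span several lines and carry comments. This is a sketch, not the exact expression above: it swaps `[\r\n\t]*` for `\s*`, uses `~` as the delimiter, and runs against a trimmed-down sample.

```php
<?php
// Trimmed-down sample modeled on the HTML in the first post.
$input = <<<'HTML'
<div class="item">
    <div class="number">1</div> Steak 'N Shake
    <div style="margin-bottom: 20px;">
        3810 W. Washington<br>
        Indianapolis, IN 46241 <br>
        317-241-0483 <br><br>
    </div>
</div>
HTML;

// "~" is the delimiter here because "#" starts a comment in x mode.
$pattern = '~
    <div\s+class="item">\s*       # opening tag of one listing
    <div[^>]*>[^<]*</div>\s*      # the number badge: a div, its text, its closing tag
    ([^<\r\n]*?)\s*               # group 1: name, the rest of that line
    <div[^>]*>\s*                 # the div carrying the inline style
    ([^<\r\n]*?)\s*<br[^>]*>\s*   # group 2: street, up to its <br>
    ([^<\r\n]*?)\s*<br[^>]*>\s*   # group 3: city, state and zip
    ([0-9-]+)                     # group 4: phone number
~x';

// PREG_SET_ORDER gives one array per listing instead of one array per group.
preg_match_all($pattern, $input, $matches, PREG_SET_ORDER);

foreach ($matches as $m) {
    echo "$m[1] | $m[2] | $m[3] | $m[4]\n";
}
```

    Each lazy group such as `([^<\r\n]*?)` grows one character at a time until the piece that follows it can match, which is why the captured text ends right before the trailing whitespace or `<br>`.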

  • #4
    Super Moderator
    Join Date
    Feb 2009
    Location
    England
    Posts
    539
    Thanks
    8
    Thanked 63 Times in 54 Posts
    A little tip for you: .* is almost always a bad idea. Instead of "margin-bottom:.*?>", use "margin-bottom:[^>]*>"; it leads to less confusion and fewer errors.
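
    A small sketch of why the negated class is safer. The HTML string is made up for illustration. When the rest of the pattern fails, the lazy `.*?` is allowed to stretch past the first `>` and into the next tag, while `[^>]*` can never leave the tag it started in.

```php
<?php
// Made-up sample: the styled div contains no digits, the second div does.
$html = '<div style="margin-bottom: 20px;">Hours</div><div id="other">42</div>';

$lazyDot = '~<div style="margin-bottom:.*?>(\d+)~';
$negated = '~<div style="margin-bottom:[^>]*>(\d+)~';

// The lazy dot backtracks across two tag boundaries and picks up "42"
// from a different div.
$lazyHit = preg_match($lazyDot, $html, $m);

// The negated class stays inside the opening tag, so there is no match.
$negatedHit = preg_match($negated, $html, $n);

var_dump($lazyHit, $m[1] ?? null, $negatedHit);
```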

    I use "The Regex Coach" from http://weitz.de/regex-coach/ for developing and testing complex expressions, I strongly recommend it. The Windows version runs perfectly in Wine too.
    lamped.co.uk :: Design, Development & Hosting
    marcgray.co.uk :: Technical blog

  • #5
    Regular Coder MrBiggZ's Avatar
    Thanks for your replies!

    Ok! This:

    Code:
    #<div\s*class="item">\s*<div[^>]*>[^<]*</div>\s*([^\r\n\t]*?)[\r\n\t]*<div[^>]*>\s*([^\r\n\t]*?)<br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)\s*<br[^>]*>\s*([0-9\-]*)#si
    Loses me! =(

    Tell me if I'm right or not! This <div[^>]*>[^<]*</div> means: after the <div, anything that is not a > 0 or more times, then a >, then anything that is not a < 0 or more times, then a </div>.

    Now does \s* mean more than one space, tab or line break?

    I'm confused on this one: <br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)

    Dumb question is .. how do you train your brain to think this way? I just haven't found the ins and outs of it yet. If I had a good teacher I'd be better off. I'm semi-noobish so go ahead and beat me up!

  • #6
    Super Moderator Inigoesdr's Avatar
    Quote Originally Posted by MrBiggZ View Post
    This <div[^>]*>[^<]*</div> means: after the <div, anything that is not a > 0 or more times, then a >, then anything that is not a < 0 or more times, then a </div>.
    Correct.
    Quote Originally Posted by MrBiggZ View Post
    Now does \s* mean more than one space, tab or line break?

    I'm confused on this one: <br[^>]*>[\s\r\n\t]*([^\r\n\t]*?)
    Yeah, you don't really need \r\n\t in that one. I left them in there because I added the \s last, and... lazy.
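
    A quick sketch to back that up, using made-up whitespace samples: `\s` already covers `\r`, `\n` and `\t`, so `[\s\r\n\t]*` behaves the same as `\s*`. It also shows that `*` means zero or more, so `\s*` matches even when there is no whitespace at all, unlike `\s+`.

```php
<?php
// Made-up whitespace samples, including the empty string.
$samples = ["", " ", "\t", "\r\n", " \t\r\n "];

$results = [];
foreach ($samples as $s) {
    $results[] = [
        preg_match('~\A\s*\z~', $s),          // zero or more whitespace characters
        preg_match('~\A[\s\r\n\t]*\z~', $s),  // same class with the redundant extras
        preg_match('~\A\s+\z~', $s),          // one or more whitespace characters
    ];
}

foreach ($results as $i => $row) {
    printf("sample %d: \\s* = %d, [\\s\\r\\n\\t]* = %d, \\s+ = %d\n", $i, $row[0], $row[1], $row[2]);
}
```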
    Quote Originally Posted by MrBiggZ View Post
    Dumb question is .. how do you train your brain to think this way? I just haven't found the ins and outs of it yet. If I had a good teacher I'd be better off. I'm semi-noobish so go ahead and beat me up!
    You seem to have a better knowledge of regex than most people who ask questions about it. As far as getting your brain to think like that... I'm not sure. The easiest way I have found to develop and test regular expressions is to use Regex Buddy (commercial). I have used The Regex Coach, mentioned earlier, and it's a good free solution, but Regex Buddy is the best piece of software I've used for regex thus far.


  • #7
    Regular Coder MrBiggZ's Avatar
    Yes sir, I've seen Regex Buddy, but my wallet cried at 40 bucks. I guess if I was doing this for a living .. it would be a good investment. But I mainly do this just to keep my mind sharp and to put that degree that hangs on the wall behind me, which has NEVER been used, to some use. =((

    Cobol programmer by schooling. When I graduated in '88, PCs cost about what a POS used car costs now. The interwebz didn't even exist in the civilian world yet.

    I guess it's going to have to be repetition learning to get this down. *sigh*

    Thanks for all your help!

  • #8
    Regular Coder low tech's Avatar
    Join Date
    Dec 2009
    Posts
    851
    Thanks
    172
    Thanked 93 Times in 93 Posts
    Hi

    http://www.gskinner.com/RegExr/

    FREE online learning tool:-)

    LT


  • #9
    Regular Coder MrBiggZ's Avatar
    Quote Originally Posted by low tech View Post
    Hi

    http://www.gskinner.com/RegExr/

    FREE online learning tool:-)

    LT
    Thanks bud! Haven't run across that one yet! Once the torture, I mean holidays, are over I'll have to apply myself a bit more on it! Consider it bookmarked!