  #1
    Super Moderator, Perth, Australia (joined May 2002)

    Regex help please

    I am regex-challenged. There, I said it; that feels better.

    <tAblE bgcolor=BLUE BORDER=0 width=100 >

    I am parsing what should be valid XHTML, but it isn't always. I can handle most things apart from unquoted attributes. A single attribute I can handle (albeit in an amateurish way):

    preg_replace('|<(.*) (.*)=(.*)>|U', '<$1 $2="$3">', $html);

    but I don't have a clue how to handle multiple instances in the same string. E.g. with the above, the output should be:

    <table bgcolor="blue" border="0" width="100">

    Can this be handled in one regex statement, or should I be using preg_replace_callback() etc.?



    Anyone want to share?
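    For comparison, a minimal sketch (assuming an unquoted value runs until whitespace, a quote or '>') showing that quoting alone fits in one preg_replace() call: preg_replace() replaces every match by default, and each attribute is matched on its own. Names and values keep their original case, and already-quoted attributes are left alone because the value class excludes quotes.

```php
<?php
// Sketch: quote every unquoted attribute value in a single preg_replace().
// Assumption: an unquoted value ends at whitespace, a quote or '>'.
$html = '<tAblE bgcolor=BLUE BORDER=0 width=100 >';

// (\s)          the whitespace separating attributes (put back via $1)
// ([\w-]+)      the attribute name
// =             the literal equals sign
// ([^\s"\'>]+)  an unquoted value: no whitespace, quotes or '>'
$quoted = preg_replace('/(\s)([\w-]+)=([^\s"\'>]+)/', '$1$2="$3"', $html);

echo $quoted, "\n";
// <tAblE bgcolor="BLUE" BORDER="0" width="100" >
```

    Lowercasing as well, as in the desired output above, is where a callback comes in: plain preg_replace() cannot transform the captured text, and the old /e modifier that once allowed it was removed in PHP 7.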

    Also, could someone explain how to handle optional bits, like the fact that there may or may not be whitespace before the closing bracket: <table blah=blah >
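    Optional bits are usually expressed with quantifiers: "*" means zero or more, "?" means zero or one. A small sketch, assuming a simplified tag grammar (the pattern and sample tags are illustrative only), that accepts the closing bracket with or without whitespace before it, plus an optional self-closing slash:

```php
<?php
// Sketch: optional parts of a pattern are written with quantifiers.
//   ((?:...)*)  a group repeated zero or more times (here: the attributes)
//   \s*         zero or more whitespace characters, so the space may be absent
//   \/?         '?' makes the preceding item optional (here: a self-closing '/')
$pattern = '/<(\w+)((?:\s+[\w-]+=[^\s>]+)*)\s*\/?>/';

$samples = ['<table>', '<table >', '<table blah=blah>', '<table blah=blah >', '<br />'];

foreach ($samples as $tag) {
    if (preg_match($pattern, $tag, $m)) {
        // $m[1] is the tag name, $m[2] the (possibly empty) run of attributes
        echo $tag, ' => name: ', $m[1], ', attributes: "', trim($m[2]), "\"\n";
    } else {
        echo $tag, " => no match\n";
    }
}
```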

    I am getting through it to a point, but with a lot of str_replace()ing and messy stuff, which in this case is probably less efficient than a decent bit of regex.
    resistance is...

    MVC is the current buzz in web application architectures. It comes from event-driven desktop application design and doesn't fit into web application design very well. But luckily nobody really knows what MVC means, so we can call our presentation layer separation mechanism MVC and move on. (Rasmus Lerdorf)

  #2
    Senior Coder, Frankfurt, German banana republic (joined Jun 2002)
    Writing parsers seems to be popular at the moment. I would use preg_replace_callback() to do the hard work for you, as it provides you with more flexibility regarding the replacement part. Here's my suggestion:

    Code:
    <?php
    $html = '<tAblE bgcolor=BLUE BORDER=0 width=100  > ';
    
    // Called once per match; $tokens holds the full match plus its groups.
    function tidyAttribute($tokens) {
    	$out = '';
    	switch (count($tokens)) {
    		case 2: // first branch matched: only group 1 (tag name) is set
    			$out .= '<' . strtolower($tokens[1]);
    			break;	
    
    		case 4: // second branch: group 1 is empty, groups 2/3 are name/value
    			$out .= " " . strtolower($tokens[2]) . '="';
    			$out .= strtolower($tokens[3]) . '"';
    			break;
    	}
    	return $out;
    }
    
    $html = preg_replace_callback(
    	'/
    	<(\w+)\s			# tagname
    	|(?:(?:\s*)			# starting whitespace 
    	(\S+?)=				# attribute name
    	([^>\s"\']+)		# attribute value
    	(?:\s*))			# trailing whitespace
    	/x', 
    	"tidyAttribute", 	
    	$html				
    );
    
    print $html;
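    One possible refinement, sketched with a hypothetical tidyTag() helper: the pattern above can also fire on text outside tags (e.g. "x=1" in body copy becomes x="1"), because nothing ties the attribute branch to a tag. Matching whole tags first and normalising attributes only inside them avoids that; already-quoted values are kept as they are. The sketch assumes no '>' appears inside attribute values.

```php
<?php
// Sketch, not a drop-in replacement: tidyTag() is a hypothetical helper.
// Only text inside <...> is touched, so "x=1" in the page body survives.
function tidyTag(array $tag): string
{
    // $tag[1] = tag name, $tag[2] = everything between the name and '>'
    $attrs = preg_replace_callback(
        // optional leading space, name, '=', then a double-quoted,
        // single-quoted or unquoted value
        '/\s*([\w-]+)\s*=\s*("[^"]*"|\'[^\']*\'|[^\s"\'>]+)/',
        function (array $a): string {
            $value = $a[2];
            // Unquoted values are lowercased and quoted; quoted ones are kept.
            if ($value[0] !== '"' && $value[0] !== "'") {
                $value = '"' . strtolower($value) . '"';
            }
            return ' ' . strtolower($a[1]) . '=' . $value;
        },
        $tag[2]
    );
    // rtrim() drops any whitespace left before the closing '>'.
    return '<' . strtolower($tag[1]) . rtrim($attrs) . '>';
}

$html = '<tAblE bgcolor=BLUE BORDER=0 width=100  > x=1 <td class="Keep">';
$tidy = preg_replace_callback('/<(\w+)([^>]*)>/', 'tidyTag', $html);
echo $tidy, "\n";
// <table bgcolor="blue" border="0" width="100"> x=1 <td class="Keep">
```

    Closing tags such as </TD> are not matched by <(\w+) and pass through unchanged; attributes without a value (e.g. nowrap) are also left as they are.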
    De gustibus non est disputandum.

  #3
    Super Moderator, Perth, Australia (joined May 2002)
    Mordred, that's excellent, thank you!

    I would love to add that I have learnt from the experience and would be able to do it myself tomorrow...

    but I would be fibbing.

    (But I am trying.) Cheers again.