  1. #1
    Regular Coder
    Join Date
    Sep 2011
    Posts
    408
    Thanks
    18
    Thanked 26 Times in 26 Posts

    Regular Expression: find matching elements within tags

    I'm trying to create a regular expression that matches certain text only when it appears within a pair of HTML tags.

    More specifically, I'm trying to match jQuery ID selectors (in the format ('#this-is-my-ID')), but only when they appear between <script and </script>, and the pattern needs to handle more than one set of script tags.


    So let's say we have the following code:
    PHP Code:
    <!doctype html>
    <html>

    <body>
        <div id="main">
            <p id="my_info" style="display:none;" class="some-class">
                Peter Griffin<BR>
                123 Spooner St.<BR>
                Quahog, Rhode Island
            </p>
        </div>
        <script type="text/javascript">
        $(document).ready(function() {
            $('#main').do_stuff();
        });
        </script>
        <script>
        $(function() {
            $('#my_info').fadeIn(1000);
            $(".some-class").do_nothing();
        });
        </script>
    </body>
    </html>
    I need a regex that can take this entire document and replace main and my_info within the script tags.

    Notice as well that one tag is <script type="text/javascript"> and the other is just <script>, which is why I just said <script earlier.


    The regex I've come up with can find it, but it selects everything, including the script tags.

    I guess the question now is: how can I match a pattern between two tags but replace only the matched selector rather than everything between them? Keep in mind that the page source outside the script tags may contain text that looks like an ID selector, and I don't want that replaced, even though quotes there should appear as &apos; and &quot; rather than the actual characters.
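
    One way to replace only the selectors and leave the tags alone is to nest two preg_replace_callback() calls: the outer one isolates each script block, the inner one rewrites IDs in the script body only. This is a minimal sketch, not a definitive implementation. mask_id() and mask_script_ids() are hypothetical names, mask_id() is a placeholder for whatever the real callback does, and the inner pattern assumes selectors written as $('#id') or $("#id").

```php
<?php
// Hypothetical stand-in for the real ID-rewriting logic.
function mask_id(string $id): string
{
    return 'x' . md5($id);
}

// Rewrites jQuery ID selectors, but only inside <script>...</script> blocks.
function mask_script_ids(string $html): string
{
    // Outer pass: one match per script block.
    // "s" lets "." span newlines, "i" ignores tag case, ".*?" is lazy.
    return preg_replace_callback(
        '~(<script\b[^>]*>)(.*?)(</script>)~is',
        function (array $block): string {
            // Inner pass: search only the script body ($block[2]).
            // Group 2 is the opening quote, \2 requires the same closing quote.
            $body = preg_replace_callback(
                '~(\$\(\s*)([\'"])#([\w-]+)\2~',
                function (array $m): string {
                    return $m[1] . $m[2] . '#' . mask_id($m[3]) . $m[2];
                },
                $block[2]
            );
            // Put the untouched tags back around the rewritten body.
            return $block[1] . $body . $block[3];
        },
        $html
    );
}
```

    Because the tags are captured in their own groups and re-emitted unchanged, text outside the script blocks (such as id="main" in the markup) is never passed to the inner pattern.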
    If I've helped you out, show your appreciation by clicking the "Thanks" link as well as a link below!

    AdFly
    Facebook | Twitter
    Google | YouTube

  • #2
    New Coder
    Join Date
    Jun 2005
    Location
    Blackpool. UK
    Posts
    98
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Use #main to search and #my_info as the replace.
    Chris

    Indifference will be the downfall of mankind, but who cares?

  • #3
    Regular Coder
    Join Date
    Sep 2011
    Posts
    408
    Thanks
    18
    Thanked 26 Times in 26 Posts
    Quote Originally Posted by chrishirst View Post
    Use #main to search and #my_info as the replace.
    Not even close to what I need to do. Well actually the first half of that answer was what I need, the second half butchered it. I know how to do a simple string replacement, in fact I'd be able to do what you said if that's all I needed to do, but that's not what I need.

    I need to perform a preg_replace_callback() on every ID in the jQuery of the page. My goal is to change every ID on an HTML page as well as the selectors in the script tags. I don't need to prepend or append strings to it; each ID has to change completely, which is why I'm using a callback function. The function itself already works, so that's not the issue. The issue is writing a regex that matches this.

  • #4
    Senior Coder Dormilich
    Join Date
    Jan 2010
    Location
    Behind the Wall
    Posts
    3,237
    Thanks
    12
    Thanked 340 Times in 336 Posts
    question: do you need the jQuery to use IDs? I have the feeling that IDs might be too restrictive for your purpose.
    The computer is always right. The computer is always right. The computer is always right. Take it from someone who has programmed for over ten years: not once has the computational mechanism of the machine malfunctioned.
    André Behrens, NY Times Software Developer

  • #5
    New Coder
    Join Date
    Jun 2005
    Location
    Blackpool. UK
    Posts
    98
    Thanks
    0
    Thanked 4 Times in 4 Posts
    Quote Originally Posted by Dubz View Post
    Not even close to what I need to do. Well actually the first half of that answer was what I need, the second half butchered it. I know how to do a simple string replacement, in fact I'd be able to do what you said if that's all I needed to do, but that's not what I need.

    I need to perform a preg_replace_callback() on every ID in the jQuery of the page. My goal is to change every ID on an HTML page as well as the selectors in the script tags. I don't need to prepend or append strings to it; each ID has to change completely, which is why I'm using a callback function. The function itself already works, so that's not the issue. The issue is writing a regex that matches this.
    So not what you actually asked for originally then, as it would work in the example code you provided.

    What you need to be aware of is that regular expressions are 'greedy' by nature (quantifiers such as * match as much as they can), so trying to simply match <script(.*)</script> WILL result in the entire document text from the first '<script' to the last '</script>' being selected.

    You will have to parse the document using "string slicing" and create blocks of data that can be modified, then rebuild them into the whole document when complete.
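
    The slicing idea can be sketched with preg_split() and PREG_SPLIT_DELIM_CAPTURE, which cuts the document at every script tag while keeping the tags as their own pieces. This is only an illustration under assumptions: rewrite_script_body() and rewrite_scripts_by_slicing() are hypothetical names, the body rewrite is a fixed placeholder substitution, and a literal "<script" inside a JavaScript string would confuse the split.

```php
<?php
// Hypothetical placeholder for the real per-block modification.
function rewrite_script_body(string $js): string
{
    return preg_replace('~#main\b~', '#renamed', $js);
}

function rewrite_scripts_by_slicing(string $html): string
{
    // Slice the document at script tags; DELIM_CAPTURE keeps the tags
    // themselves as separate entries in the result.
    $parts = preg_split(
        '~(<script\b[^>]*>|</script>)~i',
        $html,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );

    $inScript = false;
    foreach ($parts as $i => $part) {
        if (preg_match('~^<script\b~i', $part)) {
            $inScript = true;               // opening tag: next slice is a script body
        } elseif (strcasecmp($part, '</script>') === 0) {
            $inScript = false;              // closing tag: back to plain markup
        } elseif ($inScript) {
            $parts[$i] = rewrite_script_body($part);
        }
    }

    // Rebuild the whole document from the (partly modified) slices.
    return implode('', $parts);
}
```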
    Chris


  • #6
    Regular Coder
    Join Date
    Sep 2011
    Posts
    408
    Thanks
    18
    Thanked 26 Times in 26 Posts
    Quote Originally Posted by Dormilich View Post
    question: do you need the jQuery to use IDs? I have the feeling that IDs might be too restrictive for your purpose.
    My goal is to "mask" the IDs so people can't make extensions that modify content and whatnot. It's just something I wanted to try out to see if I could do it.


    Quote Originally Posted by chrishirst View Post
    So not what you actually asked for originally then, as it would work in the example code you provided.

    What you need to be aware of is that regular expressions are 'greedy' by nature (quantifiers such as * match as much as they can), so trying to simply match <script(.*)</script> WILL result in the entire document text from the first '<script' to the last '</script>' being selected.

    You will have to parse the document using "string slicing" and create blocks of data that can be modified, then rebuild them into the whole document when complete.
    If you do this: \<script(.*?)\> then it will match any opening script tag. The question mark at the end makes the greedy asterisk lazy and stop as soon as it can, which would be when the next part of the pattern is able to match.
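
    A small comparison of the greedy and lazy forms, as a sketch on a made-up three-line input. The "s" flag is added so that "." can cross newlines; without it neither form would span a multi-line script body.

```php
<?php
$html = "<script>a();</script>\n<p>text</p>\n<script>b();</script>";

// Greedy: ".*" runs on to the LAST </script>, so the whole span is one match.
$greedy = preg_match_all('~<script\b[^>]*>(.*)</script>~is', $html, $g);

// Lazy: ".*?" stops at the FIRST </script> it can, so each block matches on its own.
$lazy = preg_match_all('~<script\b[^>]*>(.*?)</script>~is', $html, $l);

// $greedy is 1; $lazy is 2 and $l[1] is ['a();', 'b();'].
```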

    I found that what I need to do is, first off, have the regular expression start and end with script tag matches. From there, I have to put capture groups within another group and make that outer group repeatable, so that it covers every ID in the script. The one I did only grabbed the first.

    Basically, in a more pseudo code way:
    Code:
    Search for opening script tag
    {
        Get any character leading up to the jQuery ID selector
        Capture the ID
        Continue searching for any other characters
    } < Repeat
    Search for end script tag
    I think taking off the second search for random characters, after getting the ID, would fix this issue and would help more. The gears are turning; it's just a matter of me shifting out of neutral and finishing the race now.
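
    One caveat with the repeat-the-group plan: in PCRE, a capture group inside a repeated group keeps only the text from its last iteration, so a single preg_match() cannot hand back every ID that way. The sketch below shows that behaviour on a made-up one-line input, next to a two-step alternative that isolates the script body and then collects all IDs from it. The selector pattern is an assumption about how the selectors are written.

```php
<?php
$html = '<script>$("#main").a(); $("#my_info").b();</script>';

// The pseudo code as one pattern: the ID group sits inside a repeated group.
$repeated = '~<script\b[^>]*>(?:.*?\$\(["\']#([\w-]+)["\']\))+.*?</script>~is';
preg_match($repeated, $html, $m);
// $m[1] holds only the LAST ID the repeated group matched ("my_info").

// Alternative: grab the script body first, then collect every ID inside it.
preg_match('~<script\b[^>]*>(.*?)</script>~is', $html, $block);
preg_match_all('~\$\(["\']#([\w-]+)["\']\)~', $block[1], $all);
// $all[1] is ['main', 'my_info'].
```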

  • #7
    New Coder
    Join Date
    Jun 2005
    Location
    Blackpool. UK
    Posts
    98
    Thanks
    0
    Thanked 4 Times in 4 Posts
    If you do this: \<script(.*?)\> then it will match any opening script tag. The question mark at the end makes the greedy asterisk lazy and stop as soon as it can, which would be when the next part of the pattern is able to match.
    Yep, I am well aware of that, which is why there is no single regular expression that will do what you want in one pass through the document.
    Chris


