  1. #1
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Help with preg_replace

    I'm new to this regular expression stuff. I'd like to use preg_replace to eliminate a known multi-line signature from the body of incoming E-mails. Say the body text is in $body, and the sig is this:

    ---
    Sig line1
    Sig line2
    Sig line3
    If I could just get rid of that, it would be pretty good. But I also get this kind of junk a lot, since messages are being quoted:

    > ---
    > Sig line1
    > Sig line2
    > Sig line3
    or

    >> ---
    >> Sig line1
    >> Sig line2
    >> Sig line3
    so I thought I'd be smart and tried:

    Code:
    $body = preg_replace("/---.*?Sig line1.*?Sig line2.*?Sig line3/","",$body);
    but this erased the entire message somehow. So I thought I'd go back to basics and tried:

    Code:
    $body = preg_replace("---","",$body);
    $body = preg_replace("Sig line1","",$body);
    $body = preg_replace("Sig line2","",$body);
    $body = preg_replace("Sig line3","",$body);
    but this erased everything too.

    I'm kinda stumped. Why are these erasing the entire message? And what's the actual smart way to erase this signature when it can have any amount of white space and >'s between lines?

    Thanks for any help.
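
    On the second attempt, a likely explanation (a sketch, not something confirmed in the thread): preg_replace expects the pattern to be wrapped in delimiters such as /.../. PHP treats the first character of the string as the delimiter, so neither "---" nor "Sig line1" compiles as a pattern. The call raises a warning and returns NULL, and assigning that back to $body wipes the message. The sample text below is made up for illustration.

```php
<?php
// Hypothetical sample body, only for demonstration.
$body = "Hello list,\n\n---\nSig line1\n";

// No delimiters: the pattern fails to compile and preg_replace returns null.
// The @ only hides the warning so the output stays readable.
$broken = @preg_replace("Sig line1", "", $body);
var_dump($broken);

// Same pattern wrapped in / delimiters: only the matched text is removed.
$fixed = preg_replace('/Sig line1/', '', $body);
var_dump($fixed);
```

    Checking the return value for null before overwriting $body would keep the original text when a pattern is malformed.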

  • #2
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,978
    Thanks
    4
    Thanked 2,659 Times in 2,628 Posts
    Uh, what exactly are you attempting to remove from that?
    It sounds like you wanted to get rid of it all, but based on your examples this is not the case.
    Also, does it always contain the words sigline in it?

  • #3
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Fou-Lu
    Uh, what exactly are you attempting to remove from that?
    It sounds like you wanted to get rid of it all, but based on your examples this is not the case.
    Also, does it always contain the words sigline in it?
    Oh, sorry, it's a signature in the body of an E-mail from a mailing list. The actual content of the mail is always different, but the signature is at the end of every mail. Since people hit reply on their E-mail programs, the signatures end up all over the place. I'm writing a thing to grab the messages and archive the actual content.

    For example:

    Sounds good Bob, I'll see you Friday.

    Fred

    >> Bob,
    >>
    >> Can we meet up some time next week?
    >>
    >> --
    >> Sig1
    >> Sig2
    >
    > Fred,
    >
    > Sure, Friday would work best for me.
    >
    > Bob
    >
    > ---
    > Sig1
    > Sig2

    ---
    Sig1
    Sig2
    It doesn't need to be perfect, but if I could make that be something like:

    Sounds good Bob, I'll see you Friday.

    Fred

    >> Bob,
    >>
    >> Can we meet up some time next week?
    >>
    >
    > Fred,
    >
    > Sure, Friday would work best for me.
    >
    > Bob
    >
    That's the idea.

    Thanks.

  • #4
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,978
    Thanks
    4
    Thanked 2,659 Times in 2,628 Posts
    Hmm, this won't be a simple task for preg techniques, though it can be done. I'll take a look at what I can do for you tonight.
    Honestly, I would think there would be an easier way to do this. I mean, what we do know is that each reply generates another set of > symbols, and the signature always starts with at least one hyphen.
    So it's possible to get the data, since it steps down from the > to >-- each time.
    It may be easier to force something of the sorts into an xml based document and extract it from there. I'll test with a few different methods and let you know what I think would be the best way to do it.

  • #5
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks for the help. Don't rely on the > symbols too much though. I put those in the example, since they are the most common, but some E-mail programs use other symbols, like "|", or just indents.

    If I could deal with the > symbols, that would cover most cases and I'd be pretty happy, but don't rely on a totally predictable pattern every time.

    I guess the ideal solution would be where it looks for the signature lines separated by any amount of white space plus any number of reply characters that could be specified (I'd start with >, and |). Beggars can't be choosers of course.

  • #6
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Simple but works:
    Code:
    Regex: /[>]* ?---.*?Sig line1.*?Sig line2.*?Sig line3/ims
    Tested with:
    Code:
    some text
    >> ---
    >> Sig line1
    >> Sig line2
    >> Sig line3 
    more text
    > ---
    > Sig line1
    > Sig line2
    > Sig line3  
    yet more text
    ---
    Sig line1
    Sig line2
    Sig line3
    I'm not sure if this was any help, but I hope it didn't make you stupider.

    Experience is something you get just after you really need it.
    PHP Installation Guide Feedback welcome.

  • #7
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Thanks. Would you mind explaining how it works since I'm still learning? The part I need help with is the
    Code:
    [>]* ?
    at the beginning.

    Code:
    /[>]* ?---.*?Sig line1.*?Sig line2.*?Sig line3/ims

  • #8
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    [>]* matches the > sign any number of times (including none), then " ?" matches a space once or 0 times...
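
    To make each piece visible, here is the same regex spelled out with the x (extended) modifier, which allows whitespace and comments inside the pattern. This is only a sketch: the x modifier and the [ ] and "\ " spellings of a literal space are my additions, and the test text is the one from post #6 without trailing spaces.

```php
<?php
$pattern = '/
    [>]*            # the ">" quote marker, any number of times (including none)
    [ ]?            # one optional space after the markers
    ---             # the separator that opens the signature
    .*?Sig\ line1   # lazily skip ahead to the first signature line
    .*?Sig\ line2   # then to the second
    .*?Sig\ line3   # then to the third
/imsx';

$text = "some text\n>> ---\n>> Sig line1\n>> Sig line2\n>> Sig line3\n"
      . "more text\n> ---\n> Sig line1\n> Sig line2\n> Sig line3\n"
      . "yet more text\n---\nSig line1\nSig line2\nSig line3";

// The s modifier lets "." cross line breaks, so one match covers a whole block.
$cleaned = preg_replace($pattern, '', $text);
echo $cleaned;
```

    Each removed block leaves its trailing newline behind, so an empty line remains where a quoted signature used to be.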

  • #9
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    Hey, I tried this out and it's not quite working for me. Here's what happened I think. The text was like this:

    Code:
    Some text
    
    > more text
    > ---- Original Message ----
    > From: blah blah
    > To: Blah blah
    >
    >> More text
    >> More text
    >> More text
    >> More text
    >> ---
    >> Sig1
    >> Sig2
    >> Sig3
    >
    > ---
    > Sig1
    > Sig2
    > Sig3
    The output:
    Code:
    Some text
    
    > more text
    > -
    I think it spots the "----" around "original message" and then wildcards everything until the last "Sig1". Any smart way around this?
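
    A small sketch of that over-match, assuming the post #6 pattern with the names changed to Sig1..Sig3 and a shortened, made-up version of the text above. The "---" in the pattern is satisfied by the first three hyphens of "----", and because of the s modifier the lazy .*? keeps crossing line breaks until it reaches the first "Sig1".

```php
<?php
// Shortened, hypothetical version of the message from this post.
$body = "> more text\n"
      . "> ---- Original Message ----\n"
      . "> From: blah blah\n"
      . ">> More text\n"
      . ">> ---\n>> Sig1\n>> Sig2\n>> Sig3\n";

// preg_match captures what a replace would remove, without modifying $body.
preg_match('/[>]* ?---.*?Sig1.*?Sig2.*?Sig3/ims', $body, $m);

// The match begins on the "Original Message" line, so real content is swallowed.
echo $m[0];
```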

  • #10
    Senior Coder
    Join Date
    Aug 2003
    Location
    One step ahead of you.
    Posts
    2,815
    Thanks
    0
    Thanked 3 Times in 3 Posts
    Try this regex:
    Code:
    /[>]* ?---[^a-z]*?Sig1.*?Sig2.*?Sig3/mis
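
    A quick check of why this helps, as a sketch on a shortened, made-up message: [^a-z]*? allows no letters between the hyphens and "Sig1" (the i modifier makes that cover capitals too), so a line like "---- Original Message ----" can no longer open a match. The .*? between the Sig lines is still unrestricted, so this only guards the first gap.

```php
<?php
// Shortened, hypothetical message containing a forwarded-mail header.
$body = "> more text\n"
      . "> ---- Original Message ----\n"
      . "> From: blah blah\n"
      . ">> More text\n"
      . ">> ---\n>> Sig1\n>> Sig2\n>> Sig3\n";

// Only the real signature block matches; the header line is left alone.
$cleaned = preg_replace('/[>]* ?---[^a-z]*?Sig1.*?Sig2.*?Sig3/mis', '', $body);
echo $cleaned;
```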

  • #11
    New to the CF scene
    Join Date
    Jun 2005
    Posts
    6
    Thanks
    0
    Thanked 0 Times in 0 Posts
    This one seems to be working. Does it make sense?

    Code:
    $body = preg_replace("/^[> ]*---[> \n]*Sig1[> \n]*Sig2[> \n]*Sig3[ \n]*$/ims","",$body);
    I think it says: start at the beginning of a line and end at the end of a line; allow any number of >'s or spaces at the start; allow any number of >'s, spaces or newlines between the Sig lines; and allow any number of spaces and newlines before stopping. This prevents any good text from getting sucked in between the Sig lines. And I ought to be able to add in |'s and anything else that comes up later.
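
    One way the "add in |'s later" idea could look, as a rough sketch rather than a tested solution: build the pattern from a list of signature lines and a string of quote characters. The function name strip_signature and its parameters are made up for illustration, and the sample message is invented.

```php
<?php
// Hypothetical helper: removes a known signature block at any quoting depth.
function strip_signature(string $body, array $sigLines, string $quoteChars = '>|'): string
{
    // Escape every piece so it is matched literally inside the pattern.
    $quoted = array_map(function ($line) {
        return preg_quote($line, '/');
    }, $sigLines);
    $chars = preg_quote($quoteChars, '/');

    $prefix  = '[' . $chars . ' \t]*';  // quote markers, spaces, tabs before the first line
    $between = '[' . $chars . '\s]*';   // quote markers and any whitespace between lines

    // ^ and $ with the m modifier anchor the block to whole lines.
    $pattern = '/^' . $prefix . implode($between, $quoted) . '[ \t\r]*$/m';

    $result = preg_replace($pattern, '', $body);
    return $result === null ? $body : $result;  // keep the text if the regex fails
}

$body = "Sounds good Bob, I'll see you Friday.\n\n"
      . "> ---- Original Message ----\n"
      . "> Sure, Friday would work best for me.\n>\n"
      . "> ---\n> Sig1\n> Sig2\n\n"
      . "| ---\n| Sig1\n| Sig2\n\n"
      . "---\nSig1\nSig2\n";

$clean = strip_signature($body, ['---', 'Sig1', 'Sig2']);
echo $clean;
```

    Because only quote characters and whitespace are allowed between the signature lines, ordinary text such as the "Original Message" header cannot be pulled into a match. The removed blocks leave blank lines behind, which could be collapsed in a separate pass if that matters.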

