  1. #1
    New Coder
    Join Date
    Mar 2006
    Posts
    66
    Thanks
    3
    Thanked 0 Times in 0 Posts

    Stripping Unwanted Characters

    Hi,

    I know there must be a simple way of doing this, but I just can't find it. I am developing a system that allows users to upload Word docs and download them. The upload script uses file_get_contents() to read the file into a string and store it in a database for full-text searching. The users never actually see the text; it's just used to ascertain the relevance of the Word doc before they download it.

    The problem is that before and after the main text I'm getting random characters. Any ideas how I can strip these away? Example:

    ╨╧рб▒с����������������>��■  ���������������W����������Y������■   ����V���                                                                                                                                                                                                                                                                                                                                                                                                                                                    ье┴� @ ��°┐�������������6!���bjbj0ж0ж������������������
    Chris Holbrook
    Freelance Designer and Musician
    Freelance Web Designer and Musician: Bristol, UK
    Visit my site: http://www.chrisholbrook.com

  • #2
    UE Antagonizer Fumigator's Avatar
    Join Date
    Dec 2005
    Location
    Utah, USA, Northwestern hemisphere, Earth, Solar System, Milky Way Galaxy, Alpha Quadrant
    Posts
    7,691
    Thanks
    42
    Thanked 637 Times in 625 Posts
    Word documents aren't plain text files, so file_get_contents() just hands you the raw bytes of the file. You can't treat that as text unless you are able to reliably parse out the control codes (all those crazy characters you want to strip).

    There are utilities out there that can create Word documents from PHP... you might try googling to see if you can find something that does the reverse.
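
To illustrate the point about Word files not being plain text, here is a minimal sketch that looks at the first few bytes before deciding how to treat an upload. The function name `detect_word_format` is hypothetical, not part of PHP. It relies on two well-known file signatures: legacy binary .doc files are OLE2 compound files, and .docx files are ZIP archives. The junk at the start of the example above looks consistent with the OLE2 signature, though that is a guess from the pasted characters.

```php
<?php
// Hypothetical helper: guess the container format from the leading bytes.
function detect_word_format(string $bytes): string
{
    // Legacy binary .doc files are OLE2 compound files with this 8-byte signature.
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'doc';
    }
    // .docx files are ZIP archives, which start with "PK\x03\x04".
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'docx';
    }
    // Anything else (plain text, RTF, PDF, ...) is not identified by this sketch.
    return 'unknown';
}
```

You would call it as `detect_word_format(file_get_contents($path))`, where `$path` is whatever your upload script already uses. This only identifies the container; it does not extract any text.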

  • #3
    New Coder
    Join Date
    Mar 2006
    Posts
    66
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Thanks. I'll have a look.

    I was thinking that there might be a generic function for stripping out characters, like trim(), but it doesn't seem to work.
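
For what a "generic" stripping function could look like, here is a rough sketch that keeps only runs of printable ASCII and throws the rest away. The name `crude_printable_text` and the `$minRun` threshold are made up for illustration. It is a heuristic, not a Word parser: it will let through stray fragments such as font names, and it can miss text that the file stores in a 16-bit encoding, so treat the result as approximate search fodder at best.

```php
<?php
// Hypothetical helper: keep only runs of printable ASCII, a crude heuristic.
function crude_printable_text(string $bytes, int $minRun = 4): string
{
    // Find runs of at least $minRun printable ASCII characters (space through tilde).
    preg_match_all('/[\x20-\x7E]{' . $minRun . ',}/', $bytes, $matches);

    // Join the runs with single spaces, then collapse repeated whitespace.
    $text = implode(' ', $matches[0]);
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

Short runs are dropped on purpose, since two or three printable bytes in a row are usually noise rather than words.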

  • #4
    Senior Coder rafiki's Avatar
    Join Date
    Aug 2006
    Location
    Floating around somewhere...
    Posts
    2,046
    Thanks
    19
    Thanked 42 Times in 42 Posts
    By default, trim() only removes whitespace from the start and end of a string.
    Are you trying to add an image or something without the appropriate tags? That's what was causing those strange characters to print while I was adding a CAPTCHA.
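
A quick sketch of why trim() does not help here. The strings are invented for the demonstration. trim() only touches the two ends of a string, and only for characters in its list, so binary bytes around the text are left alone unless you name them explicitly.

```php
<?php
// Default behaviour: whitespace at both ends is removed.
$a = trim("  hello \n");

// Bytes that are not in the default list stay put, so nothing changes here.
$b = trim("\xD0\xCFhello\xFF");

// The optional second argument accepts a character range written with "..",
// here every control character from 0x00 to 0x1F.
$c = trim("\x01\x02hello\x03", "\x00..\x1F");
```

Even with a custom character list, anything in the middle of the string is never removed, which is why trim() is the wrong tool for cleaning up a binary file.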

  • #5
    New Coder
    Join Date
    Mar 2006
    Posts
    66
    Thanks
    3
    Thanked 0 Times in 0 Posts
    Hi,

    I'm using file_get_contents() to convert a Word doc into text so that I can store it in the db and do a full-text search on it.

