  1. #1
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts

    Finding stuff in a txt file

    Hi,

    I am building my little parser to fill up
    my database records.

    I have some text files in a directory on my server.

    The directory name is "fitness"
    and I have several files in the directory which I want to process.

    They are called:
    article1.txt
    article2.txt
    article3.txt
    article4.txt
    article5.txt

    Now each article file has tags in it so that I can extract the text for
    different records.

    e.g:

    #Intro#
    This is the text for the introduction
    #End-Intro#

    #Body1#
    This is the text for the body section number one
    #End-Body1#

    So what is the best way to open the text file called "article1.txt"
    in directory "fitness" and then find and extract the data into variables
    which I can then insert into my database ??

    After processing "article1.txt",
    I want to move on to "article2.txt"

    would that involve a loop where the file name is:
    $filename = "article".$n.".txt"; ???


    Thanks for any suggestions



    .
    Last edited by jeddi; 08-14-2013 at 03:55 PM.
    If you want to attract and keep more clients, then offer great customer support.

    Support-Focus.com. automates the process and gives you a trust seal to place on your website.
    I recommend that you at least take the 30 day free trial.

  • #2
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    So the glob function is great for getting through files.

    As for finding the text between the tags, I'd probably say regex? In general, this isn't the easiest of tasks.

  • #3
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    Thanks for the push in the right direction

    I have not used "glob" before.

    This is the code I have.

    PHP Code:

    if (isset($_POST['AP_form1']) && $_POST['AP_form1'] == "AP65pC") {

        $N_directory  = $_POST['x_directory'];
        $Db_directory = safe_sql($N_directory);

        foreach (glob("*.txt") as $filename) {
            $article = file_get_contents($filename);

            $start  = '#Intro#';
            $pos1   = strpos($article, '#Intro#');
            $pos1   = $pos1 + 7;
            $pos2   = strpos($article, '#End-Intro#');
            $length = $pos2 - $pos1;
            $intro  = substr($article, $pos1, $length);

            $sql    = "INSERT INTO pages (intro) VALUES ('$intro')";
            $result = mysql_query($sql) or die("could not CREATE PAGE." . mysql_error());
        }

    }
    // END IF        END OF PROCESS FORM
    A couple of questions.

    I am inputting the directory in the form, and it is a sub-directory of the
    one that the script is running in.

    So how do I do a "glob" on the passed directory, $Db_directory?

    Am I using file_get_contents(), strpos() and substr()
    properly to get my data?

    I added 7 here, $pos1 = $pos1+7;, so that I start the position after the
    #Intro# tag. Is that correct?

    If this is correct, then
    I just add in the same thing for the other data elements?


    Thanks.



    .
    Last edited by jeddi; 08-15-2013 at 10:39 AM.

  • #4
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    So as for the glob thing, you can pass it a directory, using * as a wildcard. So glob('/var/www/*.txt') will find all text files within /var/www.
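
    A minimal sketch of that idea, assuming the files sit in one sub-directory. The names $baseDir and $subDir are made up for illustration, and a temporary directory is created so the snippet runs on its own rather than touching a real path:

```php
<?php
// Hypothetical layout: a throwaway base directory with a "fitness" sub-directory.
$baseDir = sys_get_temp_dir() . '/glob_demo_' . uniqid();
$subDir  = 'fitness';
mkdir("$baseDir/$subDir", 0777, true);
file_put_contents("$baseDir/$subDir/article1.txt", "#Intro#\nFirst\n#End-Intro#\n");
file_put_contents("$baseDir/$subDir/article2.txt", "#Intro#\nSecond\n#End-Intro#\n");

// glob() returns an array of matching paths, or false on error.
$files = glob("$baseDir/$subDir/*.txt");
if ($files === false) {
    $files = [];
}

// Read each file, keyed by its bare file name.
$articles = [];
foreach ($files as $filename) {
    $articles[basename($filename)] = file_get_contents($filename);
}

print_r(array_keys($articles));

// Clean up the demo files again.
foreach ($files as $filename) {
    unlink($filename);
}
rmdir("$baseDir/$subDir");
rmdir($baseDir);
```

    Because glob() hands back whatever matches the pattern, there is no need to build "article".$n.".txt" names by hand or to know how many files exist.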

    Have you worked with regex? It's perfect for these sorts of situations. Rather than all your position finding:

    PHP Code:
    preg_match('/#Intro#(.*)#End-Intro#/m', $article, $matches);
    echo $matches[1];
    Untested, but should work. You've already done it the hard way: learn regex! It's fantastic in all sorts of places.

  • #5
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    Hi

    My script is running in /home/com55/public_html

    So I probably need to use the /home/com55

    If I put my directories in that one as:
    "fitness"
    "sport"

    So I will have /home/com55/fitness
    article1.txt
    article2.txt
    article3.txt
    article4.txt
    article5.txt


    i.e.
    /home/com55/fitness/article1.txt
    /home/com55/fitness/article2.txt
    /home/com55/fitness/article3.txt
    /home/com55/fitness/article4.txt
    /home/com55/fitness/article5.txt

    Then pass my selected directory
    so that $Db_directory = "fitness";
    ( using the form )


    Then if I go like:
    PHP Code:

    $path = "/home/com55/$Db_directory/*.txt";

    foreach (glob($path) as $filename) {
        $article = file_get_contents($filename);
        // the regex etc.
    }


    That should work ???

    Thanks.


    .
    Last edited by jeddi; 08-15-2013 at 03:41 PM.

  • #6
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Yah, that looks good to me.

  • #7
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    Hi,

    What is the "/m" for in:

    preg_match('/#Intro#(.*)#End-Intro#/m', $article, $matches);

    And as I only need to find the first occurrence of the #Intro#
    is this the most efficient way?

    Thanks.


    .

  • #8
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Well, the /m isn't what counts. The two slashes delimit the regex you're looking for; the trailing m means multiline.

    And you only need to find the first occurrence? I thought it's the only occurrence?

    Either way, regex is your best way here. Otherwise, you're just being messy and leaving plenty of room for math errors.
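
    If the strpos()/substr() route is kept anyway, the usual sources of those math errors are the hard-coded tag length and an unchecked false return from strpos(). A rough sketch of a guarded version follows; the helper name extract_between() is invented for this example and is not from the thread:

```php
<?php
// Hypothetical helper: returns the text between $open and $close,
// or null when either tag is missing.
function extract_between($text, $open, $close)
{
    $start = strpos($text, $open);
    if ($start === false) {
        return null;
    }
    // Skip past the opening tag using its real length instead of a literal 7.
    $start += strlen($open);

    // Search for the closing tag only after the opening tag.
    $end = strpos($text, $close, $start);
    if ($end === false) {
        return null;
    }
    return trim(substr($text, $start, $end - $start));
}

$article = "#Intro#\nThis is the text for the introduction\n#End-Intro#\n";
$intro   = extract_between($article, '#Intro#', '#End-Intro#');
$missing = extract_between($article, '#Body1#', '#End-Body1#');

var_dump($intro, $missing);
```

    The strict === false comparison matters because strpos() can legitimately return 0 when the tag is the very first thing in the file.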

  • #9
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    Hi,

    When using the regex, I get an error:

    This is my code:

    PHP Code:
    if (file_exists($filename)) {
        $article = file_get_contents($filename);
        preg_match('/#summary#(.*)##summary##/', $article, $matches);
        $summary = $matches[1];
    }
    // end if

    The error I get is :

    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 147
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 150

    I understood that $matches will be
    an array automatically.

    What have I done wrong ?

    Thanks.


    .

  • #10
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Because it didn't match anything, because you didn't have the m at the end of the regex. It's looking for that pattern on one line. Like I said above, the m means multiline; otherwise it's single line.
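
    Whatever the reason for a failed match, the notice itself comes from reading $matches[1] without looking at what preg_match() returned: 1 for a match, 0 for no match, false on a regex error. A small sketch of guarding against that, using a made-up two-line sample in place of the real file:

```php
<?php
// Sample text standing in for the contents of one article file.
$article = "#summary#\nline one\nline two\n##summary##\n";

// This pattern finds nothing in the multi-line subject, so preg_match() returns 0
// and $matches is left as an empty array.
$found = preg_match('/#summary#(.*)##summary##/', $article, $matches);

if ($found === 1) {
    $summary = trim($matches[1]);
} else {
    // No match (0) or regex error (false): fall back instead of reading $matches[1].
    $summary = null;
}

var_dump($found, $summary);
```

    With the check in place, a file with a missing or misspelled tag yields a null that can be logged or skipped, rather than a notice followed by an empty database column.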

  • #11
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    OK, I had tried it with "m" as well.

    Anyway I tried it again.

    This is my code:

    PHP Code:
    $filename = "{$art}.txt";
    $filename = $path . $filename;

    echo "<br>Processing filename: $filename<br>";

    if (file_exists($filename)) {
        $article = file_get_contents($filename);

        preg_match('/#summary#(.*)##summary##/m', $article, $matches);
        $summary = $matches[1];

        preg_match('/#descrip#(.*)##descrip##/m', $article, $matches);
        $descrip = $matches[1];

        preg_match('/#pullquote#(.*)##pullquote##/m', $article, $matches);
        $pullquote = $matches[1];


    This is the error:

    Processing filename: /home/com567b/property/0.txt
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 147
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 150
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 153
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 156
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 159
    Notice: Undefined offset: 1 in /home/com567b/public_html/auto_page.php on line 162

    this is the contents of 0.txt:

    #summary#
    North London $key_wd1 offering Free of cost Wi-fi in addition to the Cable television.
    Short-term Lease ranging from only £ 500 a week. All-inclusive.

    Master bedroom is complete with large en-suite
    bathroom. Second bedroom also has a considerable bathroom opposing. This particular $key_wd2 is within North London and available today upon short-run let from five-hundred each week. Amazon Bk


    There is also usage of Member Only Sauna coupled with Gymnasium.

    Transportation: Hornsey train rail station Five min stroll. Stress-free immediate path to Center along with the West End.

    CURRENTLY AVAILABLE
    Talk to Doug directly on 0787- 941- 0356.
    ##summary##


    #descrip#
    $key_wd1 N8 The following terrific $key_wd2 possesses Free of cost Wi-fi as well as Cable television. Short-term Lease simply just 500 per week.
    ##descrip##

    #pullquote#
    Terrific : Short-term lease Two bedroom Two bathrooms apartment North London incorporation. 100 % free Fitness centre Along with Sauna Area
    ##pullquote##

    So, does the error mean that no match was found?

    If so, what have I done wrong?
    The tags should be getting found!

    Thanks for your help.



    .

  • #12
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    Oh, sorry, I forgot that . doesn't match newlines. Try:

    Code:
    /#summary#(.*)##summary##/ms
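
    To make the difference between the two modifiers concrete: in PCRE, m only changes where ^ and $ match, while s is the one that lets . cross newlines. A short sketch on a made-up sample (not the real 0.txt):

```php
<?php
// Sample text standing in for one tagged section of an article file.
$article = "#summary#\nline one\nline two\n##summary##\n";

// m alone: ^ and $ match at line boundaries, but "." still stops at "\n".
$withM = preg_match('/#summary#(.*)##summary##/m', $article, $m1);

// s (dot-all): "." also matches newlines, so the capture can span several lines.
$withS = preg_match('/#summary#(.*)##summary##/s', $article, $m2);

var_dump($withM, $withS);
echo trim($m2[1]), "\n";
```

    Since the patterns in this thread contain no ^ or $, the m in /ms is harmless but not needed. If the same tag pair could ever appear more than once in a file, a lazy quantifier, (.*?), would stop at the first closing tag instead of the last.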

  • #13
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    That's good.

    But it fails now, because in one of the paragraphs there is
    an apostrophe: Château D'Esclans

    That seems to be breaking the Insert query

    Error:
    You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'Esclans Côtes Rose
    My code is :

    PHP Code:
    $sql = "INSERT INTO prepages (summary,pullquote,intro,body1,body2,body3,body4)
    VALUES
    ('$summary','$pullquote','$intro','$body1','$body2','$body3','$body4')";

    The problem was in the $intro

    Since there will often be apostrophes in the text
    eg it's or that's. Is it better to use something else in
    the query ?

    What could it be ??

    Maybe ` - not sure if that would work ??

    EDIT: TRIED THAT AND IT DIDN'T WORK
    Got a different error, like it didn't accept the variable
    in (`$summary`,`$pullquote` ...


    This must be a common problem.
    There must be a "standard" to deal with it ?

    Thanks.

    .
    Last edited by jeddi; 08-21-2013 at 04:19 PM.

  • #14
    Senior Coder
    Join Date
    May 2006
    Posts
    1,673
    Thanks
    28
    Thanked 4 Times in 4 Posts
    OK

    Just realised,

    I usually INSERT stuff from a form, which I sanitize.

    This time I was extracting from a file ... so I forgot

    Fixed it

  • #15
    Senior Coder
    Join Date
    Jun 2008
    Location
    New Jersey
    Posts
    2,530
    Thanks
    45
    Thanked 259 Times in 256 Posts
    So, a different issue, but now would be a good time to learn something like mysqli or PDO. PDO is currently my preference, but either of them makes handling complex inserts much easier. Using a handler object like mysqli or PDO, you can bind variables into the query, and they will do (some of) the sanitizing, so you just have to worry about WHAT you're inserting, rather than the details of what it contains.

    But all in all, it works now? The regex particularly?
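
    For anyone landing here later, a rough sketch of what that looks like with a PDO prepared statement. To keep it runnable without credentials, an in-memory SQLite database stands in for the real MySQL server (that is an assumption of this example and needs the pdo_sqlite extension), and the table is cut down to three of the columns from the query above:

```php
<?php
// Assumption: SQLite in memory replaces the MySQL server used in the thread.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE prepages (summary TEXT, pullquote TEXT, intro TEXT)');

// Illustrative values; the intro carries the apostrophe that broke the hand-built query.
$summary   = 'Short-term lease';
$pullquote = 'Terrific';
$intro     = "Château D'Esclans";

// Named placeholders keep the values out of the SQL string,
// so quotes in the text need no manual escaping.
$stmt = $pdo->prepare(
    'INSERT INTO prepages (summary, pullquote, intro)
     VALUES (:summary, :pullquote, :intro)'
);
$stmt->execute([
    ':summary'   => $summary,
    ':pullquote' => $pullquote,
    ':intro'     => $intro,
]);

// Read the row back to confirm the apostrophe survived intact.
$stored = $pdo->query('SELECT intro FROM prepages')->fetchColumn();
echo $stored, "\n";
```

    Against MySQL only the connection line should need to change, to a DSN of the form mysql:host=...;dbname=...;charset=utf8mb4 plus a user name and password; the prepare() and execute() calls stay the same.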

