Parsing text into categories

10-24-2011, 04:39 PM
I have about 1000 records like this one:

Alpha Psi Beta
Type of Organization: Honor
President: xxxxxxxxxxxxx
Advisor: xxxxxxxxxxxxxxxxxxxxxxx
Requirements for Membership: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Meeting Information: Every other Tuesday at 5:00PM in ######
Amount of Dues: $30 one time National fee; $5 semester fee
Description of Organization: Alpha Psi Beta is an honor society which recognized college-level theatre. We run workshops, fundraisers, field trips, and events that revolve around theatre.
Website: www.alphapsiomega.org

I need to break these into variables so that I can write them into a database. What kind of regex can I use to get the whole text after an exact label, such as Type of Organization or Meeting Information, when that text can be long?

Any help will be appreciated. I need to do this soon.
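One way to answer the regex question, assuming the label names are fixed and match the sample above, is to split the text at each known label that begins a line. This is an untested sketch rather than a definitive solution; `parse_record()` is a hypothetical helper name. Because the split happens only at label positions, a value that wraps onto several lines stays in one piece.

```php
<?php
// Sketch: assumes $labels is the complete, fixed list of labels used in every
// record (taken from the sample post). parse_record() is a hypothetical name.
function parse_record(string $text): array
{
    $labels = [
        'Type of Organization',
        'President',
        'Advisor',
        'Requirements for Membership',
        'Meeting Information',
        'Amount of Dues',
        'Description of Organization',
    ];
    // Normalize Windows/old-Mac line endings so "^" with /m works per line.
    $text = str_replace(["\r\n", "\r"], "\n", trim($text));

    // Build an alternation of the escaped labels: (Type of Organization|President|...)
    $alternation = implode('|', array_map(function ($label) {
        return preg_quote($label, '/');
    }, $labels));

    // Split wherever a line starts with "Label:". PREG_SPLIT_DELIM_CAPTURE keeps
    // the captured label, so the result alternates label, value, label, value.
    $pieces = preg_split('/^(' . $alternation . '):[ \t]*/m', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    // Everything before the first label is the organization name.
    $record = ['Name' => trim($pieces[0])];
    for ($i = 1; $i + 1 < count($pieces); $i += 2) {
        // Collapse line breaks and runs of spaces so multi-line values become one string.
        $record[$pieces[$i]] = trim(preg_replace('/\s+/', ' ', $pieces[$i + 1]));
    }
    return $record;
}
```

Calling `parse_record($text)` on the sample should give an array keyed by label, with `Name` holding the first line. Since `Website` is not in `$labels`, a trailing `Website:` line stays attached to the description; add it to the list if you want it as a separate field.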

10-24-2011, 04:49 PM
Is each one of the 1000 a separate file?
For example, is the one you showed us xyz.txt (a plain .txt file)?
And are they all stored in the same directory?

... and is the database you're transferring them to MySQL?


10-24-2011, 05:05 PM
Yes, they are in a file for now, but I need to create a text field where people will copy and paste this type of data, and I have to separate the fields and write them to the database.

I tried exploding the text on spaces, then searching through the array with a compare string and changing the compare string whenever there was a match, but it does not work properly because of the line feeds and the long multi-word matches. My ugly code is below; it just takes the submitted text field, and there is a separate compare-string function. But I need to do this more simply, probably using preg_match or something.



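// $text_Received is the submitted text field; addTo() and $compareString
// belong to the rest of my code (not shown).
$text=explode(" ",$text_Received);
for ($i=0;$i<count($text);$i++){
    if ($i==count($text)-1){
        echo "reached end";
    }
    if ($text[$i]!=$compareString){
        //echo " ".$text[$i];
        addTo($text[$i]." "); // PHP joins strings with ".", not "+"
        echo $GLOBALS['compareString'];
        //echo $text[$i];
    }else{
        //echo $text[$i+2];
    }
}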

I just need the fields in an array. Thank you so much for looking.
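Part of the trouble with exploding on spaces is that the line feeds stay glued to the words. Browsers submit `<textarea>` contents with `\r\n` line breaks, so a sketch like the one below normalizes those first and then splits on lines instead of spaces. `pasted_lines()` is a hypothetical helper name.

```php
<?php
// Sketch: turn pasted textarea contents into an array of non-empty lines.
// pasted_lines() is a hypothetical name, not a PHP built-in.
function pasted_lines(string $raw): array
{
    // Convert "\r\n" (textarea submissions) and lone "\r" to "\n".
    $raw = str_replace(["\r\n", "\r"], "\n", $raw);
    $lines = array_map('trim', explode("\n", $raw));
    // Drop blank lines; array_values() re-indexes the result from 0.
    return array_values(array_filter($lines, function ($line) {
        return $line !== '';
    }));
}
```

For example, `$lines = pasted_lines($_POST['records'] ?? '');`, where `records` is an assumed name for the form field.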

10-24-2011, 05:08 PM
Yes, and the database I am using is MySQL. I have another PHP script that writes to my database, and it works for any variables I send via POST. I am stuck on separating these fields; I am very bad with regex.

10-24-2011, 05:22 PM
Is every 'record' 9 lines long, and do they all contain the same categories in the same order? Is there any separation between the records?
Aside from the first line, every line follows the same "Category: Data" format. You don't need a regex here at all if the records are all consistent.
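Under that assumption (the first line is the name, and every other field fits on one "Category: Data" line), a minimal regex-free sketch that collects the fields into an associative array might look like this. `parse_simple_record()` is a hypothetical name; it takes the lines as an array, for example from `file()` with `FILE_IGNORE_NEW_LINES`.

```php
<?php
// Sketch: assumes one "Category: Data" pair per line after the name line.
// parse_simple_record() is a hypothetical name.
function parse_simple_record(array $lines): array
{
    $name = array_shift($lines);
    $record = ['Name' => trim((string) $name)];
    foreach ($lines as $line) {
        // A limit of 2 splits only on the first ": ", so a later ": " inside
        // the value is kept; "5:00PM" has no space after the colon anyway.
        $parts = explode(': ', $line, 2);
        if (count($parts) === 2) {
            $record[trim($parts[0])] = trim($parts[1]);
        }
    }
    return $record;
}
```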

10-24-2011, 05:28 PM
Here's basically what you can do, as a proof of concept (I haven't tested it, though).

Save your paragraph as a text file; call it "alpha_psi_beta.txt".
Then run this test script (call it "test.php") and see what it does:


<?php
// This puts the file into an array ... each line is one array element.
$data = file("alpha_psi_beta.txt", FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Now loop through and explode each line:
foreach($data as $line){

    // Let's first change the proper : character to something not normally used, like pipe |
    // This is important because you sometimes use : for time (eg. 5:00)
    // You know that each line has a : followed by a space ... so we'll look for those.
    $string=str_replace(": ","|",$line);

    // Now separate the string by the pipe |
    $parts=explode("|",$string);

    // Only pick the lines with 2 parts (because the first line doesn't have : in it).
    if(count($parts)==2){
        // You now have 2 parts ... let's show them. At this point you would be adding the
        // 2nd part to your database ...
        echo "Part 1: ".$parts[0]." <br />";
        echo "Part 2: ".$parts[1]." <br /><br />";
    }
}
?>



10-24-2011, 05:38 PM
Yes, nearly all the data is consistent, but some records have an additional website and contact information after the description, so I want the last variable to hold all the text after "Description of Organization". Also, the Requirements and Meeting Information text can be long, running over two or more lines, so I cannot separate the fields just by finding the line feed.

This is the result I was trying to achieve:
[Name] => Array
    [0] => Alpha Psi Beta
    [2] => xxxxxxxxxxxxx
    [3] => xxxxxxxxxxxxxxxxxxxxxxx
    [5] => Every other Tuesday at 5:00PM in ######
    [6] => $30 one time National fee; $5 semester fee
    [7] => Alpha Psi Beta is an honor society which recognized college-level theatre. We run workshops, fundraisers, field trips, and events that revolve around theatre. Website: www.alphapsiomega.org
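A sketch of that rule, under two stated assumptions: a new field begins only on a line that starts with a capitalized label made of letters and spaces followed by ": ", and everything after `Description of Organization` belongs to the description. Unlike splitting on a fixed list of labels, this does not need the field names in advance, but a wrapped line that happens to start with something like "Note: " would be taken as a new field. `parse_with_tail()` is a hypothetical name.

```php
<?php
// Sketch: any other line continues the previous field, and once the tail
// label starts, all remaining lines are appended to it.
// parse_with_tail() is a hypothetical name.
function parse_with_tail(string $text, string $tailLabel = 'Description of Organization'): array
{
    // \R matches any line break sequence (\r\n, \n or \r).
    $lines = preg_split('/\R/', trim($text));
    $record = ['Name' => trim(array_shift($lines))];
    $current = null;
    foreach ($lines as $line) {
        $line = trim($line);
        if ($line === '') {
            continue;
        }
        $inTail = ($current === $tailLabel);
        if (!$inTail && preg_match('/^([A-Z][A-Za-z ]*):\s+(.*)$/', $line, $m)) {
            // Start of a new "Label: value" field.
            $current = $m[1];
            $record[$current] = $m[2];
        } elseif ($current !== null) {
            // Continuation line (or anything after the tail label).
            $record[$current] .= ' ' . $line;
        }
        // Lines between the name and the first label are ignored.
    }
    return $record;
}
```

The result is keyed by label rather than by the numeric indexes shown above; `array_values()` on it gives a numerically indexed array if you need one.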

10-24-2011, 07:05 PM
Thank you, mlseim, it works great and serves the purpose.

10-24-2011, 08:54 PM
An FYI ...
If all of your paragraphs of data (or whatever you call them) are each
saved as .txt files, you can have PHP scan a directory and automatically
open each .txt file one at a time, process it, and then move on to the next one.
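A minimal sketch of that loop, assuming the files all sit in one directory and end in `.txt`. `process_directory()` is a hypothetical name, and `$parse` stands for whatever single-record parser you use.

```php
<?php
// Sketch: run a parser over every .txt file in one directory.
// process_directory() is a hypothetical name.
function process_directory(string $dir, callable $parse): array
{
    $results = [];
    // glob() returns the matching paths sorted alphabetically, or false on
    // error on some systems, hence the "?: []".
    foreach (glob(rtrim($dir, '/') . '/*.txt') ?: [] as $path) {
        $results[basename($path)] = $parse(file_get_contents($path));
    }
    return $results;
}
```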

If your paragraphs are clumped together in one file, you'll need to have a way to
find the separation between them. If the first line always has some text without
a colon : ... that might be the way to separate them.
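A sketch of the "first line has no colon" idea, tightened with one extra assumption: a record's name line is always directly followed by a `Type of Organization:` line, so colon-free continuation lines inside a long field are not mistaken for new records. `split_records()` is a hypothetical name.

```php
<?php
// Sketch: split one big text into per-record strings.
// split_records() is a hypothetical name.
function split_records(string $text): array
{
    $lines = preg_split('/\R/', trim($text));
    $records = [];
    $current = [];
    foreach ($lines as $i => $line) {
        $next = $lines[$i + 1] ?? '';
        // A name line has no colon and is followed by "Type of Organization:".
        $startsRecord = strpos($line, ':') === false
            && strpos($next, 'Type of Organization:') === 0;
        if ($startsRecord && $current !== []) {
            $records[] = implode("\n", $current);
            $current = [];
        }
        if (trim($line) !== '') {
            $current[] = $line;
        }
    }
    if ($current !== []) {
        $records[] = implode("\n", $current);
    }
    return $records;
}
```

Each returned string can then go through a single-record parser.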