regex preg_match_all question



gilgalbiblewhee
10-15-2009, 05:07 PM
I was working on a method of scraping my own file into another page:

$contents_of_page = file_get_contents('bible.html');

function getTextBetweenTags($string, $tagname) {
    $pattern = "/<$tagname ?.*>(.*)<\/$tagname>/";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

$str = '<table><tr><td>1</td><td>1</td><td>1</td><td>gn</td><td>Genesis</td><td>1</td><td>1</td><td>1</td><td>1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>';
$txt = getTextBetweenTags($str, "td");
print_r($txt);

$txt contains only one element: In the beginning God created the heaven and the earth.
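
The single result comes from greedy matching: the `.*` in `<$tagname ?.*>` runs as far right as it can, so the opening part of the pattern swallows every cell up to the last `<td>`. Below is a minimal sketch of a lazy variant, assuming the cells are not nested. The `[^>]*`, `.*?`, `\b`, `/s` and `preg_quote` changes are suggestions of mine, not part of the original function, and the shortened `$str` is only for illustration.

```php
<?php
// Lazy variant of getTextBetweenTags(): [^>]* keeps the opening tag from
// running past its own '>', and (.*?) stops at the first closing tag.
function getTextBetweenTags($string, $tagname) {
    $tag = preg_quote($tagname, '/');
    $pattern = "/<{$tag}\b[^>]*>(.*?)<\/{$tag}>/s";
    preg_match_all($pattern, $string, $matches);
    return $matches[1];
}

$str = '<table><tr><td>1</td><td>gn</td><td>Genesis</td></tr></table>';
$txt = getTextBetweenTags($str, 'td');
print_r($txt); // three elements: 1, gn, Genesis
```

For anything beyond simple, flat markup like this table row, an HTML parser is usually a safer choice than a regex.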

What I want is to turn this:

$txt = '<table><tr><td>1</td><td>1</td><td>1</td><td>gn</td><td>Genesis</td><td>1</td><td>1</td><td>1</td><td>1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>';
into this:

$txt = '<table><tr><td class="book">1</td><td class="chapter">1</td><td class="verse">1</td><td class="recordType">gn</td><td class="book_title">Genesis</td><td class="book_spoke">1</td><td class="chapter_spoke">1</td><td class="verse_spoke">1</td><td class="something_else">1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>';

gilgalbiblewhee
10-16-2009, 12:16 AM
I was working on it and got this so far:

$string = "<table><tr><td>1</td><td>1</td><td>1</td><td>gn</td><td>Genesis</td><td>1</td><td>1</td><td>1</td><td>1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>";
$patterns[0] = "/<td/";
$names = Array("book", "chapter", "verse", "recordType", "book_title", "book_spoke", "chapter_spoke", "verse_spoke", "something_else", "text_data");

for ($repNames = 0; $repNames < count($names); $repNames++) {
    $replacements[$repNames] = "<td class=\"" . $names[$repNames] . "\"";
}
echo preg_replace($patterns, $replacements, $string)."\n";


One thing I don't understand is why all the classes are "book".

<table>
<tr>
<td class="book">1</td>
<td class="book">1</td>
<td class="book">1</td>
<td class="book">gn</td>
<td class="book">Genesis</td>
<td class="book">1</td>
<td class="book">1</td>
<td class="book">1</td>
<td class="book">1</td>
<td class="book">In the beginning God created the heaven and the earth.</td>
</tr>
</table>

There seems to be something wrong with the for loop. It's only reading the first element of the array.

oesxyl
10-16-2009, 04:20 AM
replace this:


for($repNames=0; $repNames<count($names); $repNames++){
$replacements[$repNames] = "<td class=\"".$names[$repNames]."\"";

}

with this:


foreach($names as $repNames => $name){
$patterns[] = $patterns[0];
$replacements[] = '<td class="'.$name.'"';
}

The $patterns and $replacements arrays must have the same size, because preg_replace pairs each pattern with the replacement at the same position. The other changes (foreach instead of for, single quotes instead of double quotes) are just to make the code easier to read.

best regards

gilgalbiblewhee
10-16-2009, 10:46 AM
$string = "<table><tr><td>1</td><td>1</td><td>1</td><td>gn</td><td>Genesis</td><td>1</td><td>1</td><td>1</td><td>1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>";
$patterns[0] = "/<td/";
$names = Array("book", "chapter", "verse", "recordType", "book_title", "book_spoke", "chapter_spoke", "verse_spoke", "something_else", "text_data");

foreach($names as $repNames => $name){
$patterns[] = $patterns[0];
$replacements[] = '<td class="'.$name.'"';
}
echo preg_replace($patterns, $replacements[$repNames], $string)."\n";
I got this result:

<table>
<tr>
<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">gn</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">Genesis</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">1</td>

<td class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data" class="text_data">In the beginning God created the heaven and the earth.</td></tr>
</table>
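
What seems to be happening here: when the pattern argument is an array, preg_replace runs the patterns one after another, each over the whole string produced by the previous one. With eleven identical `/<td/` patterns and a single replacement string, every `<td` gets rewritten eleven times. A reduced sketch with three patterns shows the effect (the shortened `$string`, the class name `x` and the variable `$out` are made up for illustration):

```php
<?php
$string = '<tr><td>1</td><td>gn</td></tr>';

// Three identical patterns; a string replacement is reused for each of them.
$patterns = array('/<td/', '/<td/', '/<td/');
$out = preg_replace($patterns, '<td class="x"', $string);

echo $out . "\n";
// Each pass matches every "<td" again, so each cell gains one attribute per pattern:
// <tr><td class="x" class="x" class="x">1</td><td class="x" class="x" class="x">gn</td></tr>
```

Matching the full `<td>` (with the closing bracket) and limiting each pattern to one replacement avoids the repeated matches, because a cell that already has a class no longer matches.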

gilgalbiblewhee
10-16-2009, 10:49 PM
I'll answer my own question:

$string = "<table><tr><td>1</td><td>1</td><td>1</td><td>gn</td><td>Genesis</td><td>1</td><td>1</td><td>1</td><td>1</td><td>In the beginning God created the heaven and the earth.</td></tr></table>";
$names = Array("book", "chapter", "verse", "recordType", "book_title", "book_spoke", "chapter_spoke", "verse_spoke", "something_else", "text_data");
$patterns = Array();
$replacements = Array();

// One pattern and one replacement per class name, so both arrays have the same size.
foreach ($names as $name) {
    $patterns[] = "/<td>/";
    $replacements[] = "<td class=\"" . $name . "\">";
}
// With a limit of 1, each pattern only replaces the first <td> that has no class yet.
echo preg_replace($patterns, $replacements, $string, 1) . "\n";

<table>
<tr>
<td class="book">1</td>
<td class="chapter">1</td>
<td class="verse">1</td>
<td class="recordType">gn</td>
<td class="book_title">Genesis</td>
<td class="book_spoke">1</td>
<td class="chapter_spoke">1</td>
<td class="verse_spoke">1</td>
<td class="something_else">1</td>
<td class="text_data">In the beginning God created the heaven and the earth.</td>
</tr>
</table>
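
As an alternative to the limit-of-1 trick, a callback can hand out the class names in order in a single pass. This is only a sketch under a few assumptions: `addCellClasses` is a hypothetical helper name, the cells are bare `<td>` tags with no attributes, and PHP 5.3 or later is available for the closure.

```php
<?php
// Hypothetical helper: give the i-th bare <td> the i-th class name.
function addCellClasses($html, array $names) {
    $i = 0;
    return preg_replace_callback('/<td>/', function ($m) use (&$i, $names) {
        // Cells beyond the end of $names are left untouched.
        if (!isset($names[$i])) {
            return $m[0];
        }
        return '<td class="' . $names[$i++] . '">';
    }, $html);
}

$names = array('book', 'chapter', 'verse');
echo addCellClasses('<tr><td>1</td><td>1</td><td>1</td><td>gn</td></tr>', $names) . "\n";
// <tr><td class="book">1</td><td class="chapter">1</td><td class="verse">1</td><td>gn</td></tr>
```

The counter runs across the whole string, so for a table with several rows the helper would have to be called once per row (or the counter reset at each `<tr>`).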


