...

preg_match() maximum string length reached



bauhsoj
03-29-2007, 07:23 PM
I am using PHP 5.2.0 on Apache 2.0 and trying to get preg_match() to scan a returned result string. However, it won't match the regular expression against the data. I verified the expression is valid in RegexBuddy. I also tested it again in PHP with a shorter string containing approximately the same kind of content and it worked fine.

After some very tedious testing I found that preg_match() simply stops working when the string is 100,055 chars in length. No errors, warnings, or notices. It just won't do the match if there are that many chars or more.

I have never experienced this with earlier versions of PHP and it is really causing problems.

Does anyone know how to resolve this issue?
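
A silent failure like this can often be narrowed down with preg_last_error(), which is available from PHP 5.2.0 onward. The sketch below is only an illustration: the helper name checked_preg_match() and the sample pattern are made up for this example and do not come from the original post.

```php
<?php
// Hypothetical helper (not from the thread): run preg_match() and report
// why PCRE gave up, instead of treating every failure as "no match".
function checked_preg_match($pattern, $subject)
{
    $matches = array();
    $result = preg_match($pattern, $subject, $matches);
    $code = preg_last_error();

    // Map the documented PREG_* error constants to readable names.
    $names = array(
        PREG_NO_ERROR              => 'no error',
        PREG_INTERNAL_ERROR        => 'internal PCRE error',
        PREG_BACKTRACK_LIMIT_ERROR => 'backtrack limit exhausted',
        PREG_RECURSION_LIMIT_ERROR => 'recursion limit exhausted',
        PREG_BAD_UTF8_ERROR        => 'malformed UTF-8 in subject',
    );

    return array(
        'matched' => ($result === 1),
        'matches' => $matches,
        'error'   => isset($names[$code]) ? $names[$code] : "PCRE error code $code",
    );
}

// Illustrative pattern and subject only.
$check = checked_preg_match('/<codes>(.*?)<\/codes>/s', '<codes>a:0:{}</codes>');
echo $check['matched'] ? $check['matches'][1] : $check['error'], "\n";
```

If the 'error' entry reports an exhausted limit on the long string, the regular expression itself is probably fine and PCRE is hitting one of its configured limits.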

iLLin
03-29-2007, 07:58 PM
I dunno, explode your string into an array at the max character position (using strpos), run preg_match() on each piece, and then put it back together LOL.

bauhsoj
03-29-2007, 08:28 PM
That won't work unfortunately. Each result has a series of delimiting tags used to identify where each data segment begins. The contents of each tag have to be unserialized after being captured. Each data segment could in and of itself go over the 100,055 char limit.

iLLin
03-29-2007, 08:43 PM
Count by \n or w/e your delimiter is. Then take a best guess on the max chars and go way under it for a buffer. Then start a new array for every 100 delimiters. :)

But I don't think you should be having that problem, have you checked to see if it's a bug?

Side note, how well does something that large perform? Time to upgrade to the db?
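
The chunking idea in this post could look roughly like the sketch below. The function name match_in_chunks(), its parameters, and the sample data are all hypothetical. As bauhsoj notes above, it does not help when a single segment is itself over the limit, and a match that spans a chunk boundary is lost.

```php
<?php
// Hypothetical helper: split $text on $delimiter, regroup the pieces into
// batches of $per_chunk, and collect the preg_match_all() hits per batch.
function match_in_chunks($pattern, $text, $delimiter = "\n", $per_chunk = 100)
{
    $found = array();
    $batches = array_chunk(explode($delimiter, $text), $per_chunk);

    foreach ($batches as $batch) {
        $chunk = implode($delimiter, $batch);

        if (preg_match_all($pattern, $chunk, $matches)) {
            // $matches[0] holds the full-pattern matches for this chunk.
            $found = array_merge($found, $matches[0]);
        }
    }

    return $found;
}

// Made-up sample data, batched two lines at a time.
$lines = "id=1\nid=22\nnoise\nid=333";
print_r(match_in_chunks('/id=\d+/', $lines, "\n", 2));
```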

dumpfi
03-29-2007, 08:44 PM
Maybe a series of strpos and substr calls and some custom logic solves the issue.

dumpfi

gsnedders
03-29-2007, 09:28 PM
You should probably question whether you should really be using regular expressions on such a long string, as they aren't the most efficient of things.

marek_mar
03-29-2007, 09:31 PM
PCRE has some limitations as to backreference length and match length which can depend on the options it was compiled with and the OS it is running on.
Regex might not be suitable for what you want it to do.
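
One concrete set of limits worth checking: PHP 5.2.0 added the pcre.backtrack_limit and pcre.recursion_limit ini settings, and the backtrack limit defaulted to 100000 in that release. The thread never confirms that this is what the original poster hit, so treat the following as a sketch of how to inspect and raise the limit at runtime; the value 1000000 is an arbitrary illustration, not a recommendation.

```php
<?php
// Read the current PCRE limits; both settings exist since PHP 5.2.0.
$backtrack_limit = ini_get('pcre.backtrack_limit');
$recursion_limit = ini_get('pcre.recursion_limit');
echo "backtrack: $backtrack_limit, recursion: $recursion_limit\n";

// Assumption: 1000000 is an arbitrary illustrative value.
ini_set('pcre.backtrack_limit', '1000000');

echo 'backtrack limit is now ', ini_get('pcre.backtrack_limit'), "\n";
```

Raising the limit trades memory and CPU time for the ability to match longer subjects, so it is worth re-testing the expression against the largest expected input.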

bauhsoj
03-30-2007, 07:17 PM
Originally posted by dumpfi:
Maybe a series of strpos and substr calls and some custom logic solves the issue.

Worked like a charm! :thumbsup:

I got so wrapped up in the simplicity of just writing a regular expression to handle it that it didn't even occur to me to just write a simple algorithm like that.

Here is what I came up with to solve the issue in case anyone might find it useful:
$result_tags = array('codes', 'errors', 'lines');
$returned_result_data = array();

if (!empty($results)) {
    foreach ($result_tags as $tag) {
        $opening_tag = "<$tag>";
        $opening_tag_pos = strpos($results, $opening_tag);

        if ($opening_tag_pos === false) {
            continue;
        }

        // The data starts right after the opening tag.
        $tag_data_begins = $opening_tag_pos + strlen($opening_tag);

        // Look for the closing tag only after the opening tag.
        $closing_tag_pos = strpos($results, "</$tag>", $tag_data_begins);

        if ($closing_tag_pos === false) {
            continue;
        }

        // substr() takes a length as its third argument, not an end position.
        $tmp_extracted_data = substr($results, $tag_data_begins, $closing_tag_pos - $tag_data_begins);

        if ($tmp_extracted_data != '') {
            $returned_result_data[$tag] = $tmp_extracted_data;

            $tmp_unserialized = @unserialize($tmp_extracted_data);

            // unserialize() returns false on failure, so a serialized
            // boolean false ("b:0;") needs its own check.
            if ($tmp_unserialized !== false || $tmp_extracted_data === 'b:0;') {
                $returned_result_data[$tag] = $tmp_unserialized;
            }
        }
    }
}
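
For anyone reusing this, the same strpos/substr approach can be wrapped in a small function and tried against a payload well past the 100,055 character mark. This is a minimal sketch: the function name extract_tag_data() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: return the text between <$tag> and </$tag>,
// or null when either tag is missing.
function extract_tag_data($results, $tag)
{
    $open = "<$tag>";
    $start = strpos($results, $open);
    if ($start === false) {
        return null;
    }
    $start += strlen($open);

    // Search for the closing tag only after the opening tag.
    $end = strpos($results, "</$tag>", $start);
    if ($end === false) {
        return null;
    }

    // The third argument of substr() is a length, not an end position.
    return (string) substr($results, $start, $end - $start);
}

// Made-up sample: a serialized array whose payload is 200,000 chars long.
$payload = serialize(array('blob' => str_repeat('x', 200000)));
$results = "<codes>$payload</codes><errors>" . serialize(array()) . '</errors>';

$codes = unserialize(extract_tag_data($results, 'codes'));
echo strlen($codes['blob']), "\n";
```

Because strpos() and substr() only scan and slice the string, they are not subject to the PCRE limits discussed earlier in the thread.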


