inconsistent results with in_array



cfructose
07-17-2007, 10:47 AM
In essence, I'm hoping someone can suggest what I can try next to isolate the cause of an apparent paradox: in_array sometimes returns TRUE for a value that is in the array and sometimes does not, apparently at random.

Excuse the preamble - the background is necessary, as my problem may be connected to an earlier issue regarding the recognition of Greek characters...

Yesterday on this forum, the senior coder NancyJ was extremely helpful to me regarding an attempt to automatically generate span lang tags around all words in Greek in a largely English-language document.
The thread is:
http://www.codingforums.com/showthread.php?t=118935

We got bogged down, however, in the question of how php recognises Greek characters, and made no progress on my main problem, hence this post.

NancyJ - if you see this - read on for an interesting development in the character recognition part: it's solved...


<?php
//THIS SUCCESSFULLY PUTS A SPAN AROUND EVERY GREEK LETTER!

$greek_lang_open = "<span lang=\"el\">";
$greek_lang_close = "</span>";

$greek_chars = array("α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "ς", "τ", "υ", "φ", "χ", "ψ", "ω", "ά", "έ", "ή", "ί", "ό", "ύ", "ώ", "ϊ", "Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Ά", "Έ", "Ή", "Ί", "Ό", "Ύ", "Ώ", "Ϊ");

foreach ($greek_chars as $current_letter) { // add span lang tags
    $cc_text = str_replace($current_letter,$greek_lang_open.$current_letter.$greek_lang_close,$cc_text);
}

Now, that's great, as it proves that php is having no trouble recognising Greek characters. Moreover - this'll be of particular interest to NancyJ - it seems that Greek capital letters that happen to look exactly like Latin capital letters have their own separate character codes. There will therefore be no confusion between Α and A, as Greek capital alpha is really &#913; whereas Latin capital A is &#65;.
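To make the look-alike point concrete, here is a minimal sketch, assuming the text is UTF-8. The variable names are only illustrative, and the Greek letter is written as byte escapes so the result does not depend on how the file itself is saved.

```php
<?php
// Bytes written as escapes so the example does not depend on the file's encoding.
$greek_alpha = "\xCE\x91"; // UTF-8 bytes of U+0391 GREEK CAPITAL LETTER ALPHA
$latin_a     = "A";        // U+0041 LATIN CAPITAL LETTER A

// Decode the two-byte UTF-8 sequence by hand (bit layout 110xxxxx 10yyyyyy).
$code_point = ((ord($greek_alpha[0]) & 0x1F) << 6) | (ord($greek_alpha[1]) & 0x3F);

echo $code_point, "\n";              // 913
echo ord($latin_a), "\n";            // 65
var_dump($greek_alpha === $latin_a); // bool(false)
```

The two letters render identically in most fonts, but as strings they never compare equal, so a search for one will not hit the other.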

Now for a long bit of code which I've been trying to debug, where "YEAH!" is never printed. Skip over this, please, if it's too much work to read through, and see the paradox below.


// add <span lang> tags around Greek words
$greek_chars = array("α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "ρ", "σ", "ς", "τ", "υ", "φ", "χ", "ψ", "ω", "ά", "έ", "ή", "ί", "ό", "ύ", "ώ", "ϊ", "Α", "Β", "Γ", "Δ", "Ε", "Ζ", "Η", "Θ", "Ι", "Κ", "Λ", "Μ", "Ν", "Ξ", "Ο", "Π", "Ρ", "Σ", "Τ", "Υ", "Φ", "Χ", "Ψ", "Ω", "Ά", "Έ", "Ή", "Ί", "Ό", "Ύ", "Ώ", "Ϊ");
$greek_lang_open = "<span lang=\"el\">";
$greek_lang_close = "</span>";
$span_open_length = strlen($greek_lang_open); //measures how many chars there are so we can jump forward later after inserting
$span_close_length = strlen($greek_lang_close); //measures how many chars there are so we can jump forward later after inserting
$pos=0; //pointer
$cc_text_length = strlen($cc_text);
$gr1 = "no";
$gr2 = "no";

while ($pos<$cc_text_length) { //scan file
    $current_letter = substr($cc_text,$pos,1); //grab next character from pointer
    if ($gr1 == "no" && $gr2 == "no" && in_array($current_letter, $greek_chars)) {
        echo "YEAH!";
        $gr1 = "yes";
        $cc_text = str_replace($current_letter,$greek_lang_open.$current_letter,$cc_text); // add open span lang tag
        $pos = $pos+$span_open_length;
    }
    elseif ($gr1 == "yes" && in_array($current_letter, $greek_chars)) {
        $gr1 = "no";
        $gr2 = "yes";
    }
    elseif ($gr2 == "yes" && !(in_array($current_letter, $greek_chars))) {
        $gr2 = "no";
        $gr1 = "no";
        $cc_text = str_replace($current_letter,$greek_lang_close.$current_letter,$cc_text); // add close span lang tag
        $pos = $pos+$span_close_length;
    }
    $pos++; //increment the pointer
    $cc_text_length = strlen($cc_text); //re-measure length of $cc_text, as it's changed from our adding span tags
}

I've tested the following:
echoing $current_letter inside the while loop results in the entirety of the data in $cc_text being displayed, which is what we expected, so there's no doubt that the line


$current_letter = substr($cc_text,$pos,1);

is correctly grabbing each character. (Note the line


$pos++;

before the end of the while loop).

So, my conclusion is that for some peculiar reason, in_array is 'not working'.

For comparison, note that the following is successful:


//THIS RETURNS "in_array works!", AS WE EXPECTED, GIVEN THAT ζ IS INDEED IN THE ARRAY
if (in_array("ζ", $greek_chars)) {
echo "in_array works!";
} else {
echo "in_array failed";
}
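One possible explanation for the difference between the two tests, assuming both the script and $cc_text are UTF-8: strlen and substr count bytes, not characters, and each Greek letter occupies two bytes. Echoing every byte in turn still reassembles the original text on screen, so the echo test would look fine even though no single $current_letter is ever a whole Greek letter. The sketch below is only a way to check that theory; the variable names are illustrative.

```php
<?php
$zeta = "\xCE\xB6";            // UTF-8 bytes of ζ (U+03B6)
$greek_chars = array($zeta);

// substr with a length of 1 returns one byte, which is only half of the letter.
$first_byte = substr($zeta, 0, 1);

echo strlen($zeta), "\n";                      // 2
var_dump(in_array($zeta, $greek_chars));       // bool(true)
var_dump(in_array($first_byte, $greek_chars)); // bool(false)

// Splitting with the /u flag yields whole characters instead of bytes.
$chars = preg_split('//u', "a" . $zeta . "b", -1, PREG_SPLIT_NO_EMPTY);
echo count($chars), "\n";                      // 3
var_dump(in_array($chars[1], $greek_chars));   // bool(true)
```

If this is what is happening, in_array itself is behaving consistently: it is simply being handed half a letter inside the loop and a whole letter in the stand-alone test.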

Any ideas what I can do to test where the problem is? Whatever I strip away from the code, I end up with in_array working sometimes, and not other times.

Thanks a lot

NancyJ
07-17-2007, 11:31 AM
Probably not much I can do to help, since running your little snippet above does not put spans around every actual Greek character; it places them around nearly every character (including s, which makes all the spans very nested and weird).
Try this:


foreach($greek_chars as $char)
{
    if(in_array($char, $greek_chars))
    {
        echo "$char is in array<br />";
    }
    else
    {
        echo "$char is not in array<br />";
    }
}

For me it says that every character is successfully matching in_array(). If it doesn't for you, it might show you what is working and what isn't.

I can't test your code because with anything more than a couple of words it gives up - largely because my in_array is matching nearly every character, meaning $cc_text is constantly getting longer.
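One likely contributor to that runaway growth, whatever the encoding: str_replace rewrites every occurrence of the search string in the whole text, not just the one at the current pointer, so each match in the loop inserts many tags at once. A small sketch of the difference, using a made-up sample string:

```php
<?php
$text = "banana";

// str_replace touches every "a" in the string.
$all = str_replace("a", "[a", $text);

// substr_replace touches only the 1 byte at offset 1.
$one = substr_replace($text, "[a", 1, 1);

echo $all, "\n"; // b[an[an[a
echo $one, "\n"; // b[anana
```

Inserting at a known offset with substr_replace keeps the pointer arithmetic in step with what was actually added to the string.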

I also think you should rewrite the code to be faster and more efficient.

cfructose
08-02-2007, 07:51 PM
After endless struggling, I've succeeded! I'm posting this in case anyone finds it useful...

(Note that the code, while working perfectly, could do with quite a bit of tightening to make it more efficient, e.g. some regular expressions for the 'skip punctuation' bit.)


<?php
//convert all quotes and Greek characters (every 2- and 3-byte UTF-8 sequence) to numeric HTML entities.
//worked example: ζ is the bytes 0xCE 0xB6, so (206 - 192) * 64 + (182 - 128) = 950, giving &#950;
$cc_text = preg_replace_callback('/[\xc0-\xdf]./s', function ($m) {
    return '&#' . ((ord($m[0][0]) - 192) * 64 + (ord($m[0][1]) - 128)) . ';';
}, $cc_text);
$cc_text = preg_replace_callback('/[\xe0-\xef]../s', function ($m) {
    return '&#' . ((ord($m[0][0]) - 224) * 4096 + (ord($m[0][1]) - 128) * 64 + (ord($m[0][2]) - 128)) . ';';
}, $cc_text);


// create the array of numeric entities for Greek letters (code points 902 to 974)
$greek_chars = array();
$ansi=902;
while($ansi<=974) {
    $greek_chars[] = "&#".$ansi.";";
    $ansi++;
}
// add angled quotation marks to array
$greek_chars[] = "&#171;";
$greek_chars[] = "&#187;";

$greek_lang_open = "<span lang=\"el\">";
$greek_lang_close = "</span>";
$span_open_length = strlen($greek_lang_open); //measures how many chars there are so we can jump forward later after inserting
$span_close_length = strlen($greek_lang_close); //measures how many chars there are so we can jump forward later after inserting
$pos=0; //pointer
$cc_text_length = strlen($cc_text);
$current_letter_length = 6;
$first_greek_letter_found = "no";
$number_of_close_tag_occurences = 0;
$tag_detected = "no";

while ($pos<$cc_text_length) { //scan file
    //grab the next 6 characters in search of a Greek numeric entity
    $current_letter = substr($cc_text,$pos,$current_letter_length);

    if ($tag_detected == "no") {
        //if 1 Greek letter has already been found:
        if ($first_greek_letter_found == "yes") {
            if (in_array($current_letter, $greek_chars)) {
                //if next one is Greek too, move pointer
                $pos = $pos+$current_letter_length-1;
            }
            //as long as the next group of 6 symbols doesn't start with space or punctuation, we'll put in the close span tag
            elseif (!in_array(substr($current_letter, 0, 1), array(" ", ",", "'", ";", "."), true)) {
                // add close span tag
                $cc_text = substr_replace($cc_text, $greek_lang_close.$current_letter, $pos, $current_letter_length);
                //re-measure length of $cc_text, as it's changed from our adding span tags
                $cc_text_length = strlen($cc_text);
                $pos = $pos+$span_close_length;
                $first_greek_letter_found = "no";//reset the var
            }
        }
        //if no Greek letters have been found, but this one IS Greek:
        elseif ($first_greek_letter_found == "no" && in_array($current_letter, $greek_chars)) {
            $first_greek_letter_found = "yes";
            //add open span lang tag
            $cc_text = substr_replace($cc_text, $greek_lang_open.$current_letter, $pos, $current_letter_length);
            //re-measure length of $cc_text, as it's changed from our adding span tags
            $cc_text_length = strlen($cc_text);
            $pos = $pos+$current_letter_length+$span_open_length-1;
        }
        //if no Greek letter found, but 1st symbol is <
        elseif (substr($current_letter, 0, 1) == "<") {
            $tag_detected = "yes";
        }
    }
    //if tag_detected = yes, we want to skip all text until the close of the tag (">") to avoid putting spans within, say, image alt text that includes Greek characters.
    else {
        if (substr($current_letter, 0, 1) == ">") {
            $tag_detected = "no";
        }
    }
    $pos++; //increment the pointer
}
?>
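As a rough idea of what the regular-expression tightening mentioned at the top of this post might look like, here is a sketch only, not a drop-in replacement. It assumes the text is still raw UTF-8 (before any conversion to entities), treats the code point ranges U+0370 to U+03FF and U+1F00 to U+1FFF as "Greek", and uses a made-up function name, wrap_greek_runs. Unlike the loop above, it does not skip over HTML tags, so Greek inside an alt attribute would also get wrapped.

```php
<?php
// Hypothetical helper: wrap each run of Greek words in a single span.
function wrap_greek_runs($text)
{
    // One Greek character (Greek and Coptic block, plus Greek Extended).
    $greek = '[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]';

    // A Greek word, optionally followed by more Greek words separated
    // only by spaces or the punctuation the loop above also skips.
    $pattern = '/' . $greek . '+(?:[ ,.;\']+' . $greek . '+)*/u';

    return preg_replace($pattern, '<span lang="el">$0</span>', $text);
}

// "λόγος" written as UTF-8 byte escapes so the file encoding does not matter.
$logos  = "\xCE\xBB\xCF\x8C\xCE\xB3\xCE\xBF\xCF\x82";
$sample = "The word " . $logos . " means 'word'.";

echo wrap_greek_runs($sample), "\n";
```

Because the /u flag makes the pattern work on whole characters, there is no pointer arithmetic to keep in step with the inserted tags.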


