
Sorting special characters (with accents, etc.)


john85
01-11-2003, 07:44 PM
I have a directory-like website that lists bands in alphabetical order, and I've recently run into a problem:

Bands from around the world often have accents and other special characters in their names, and my alphabetical sort puts those characters last. For example, the band "Chérie" is listed after "Chump." What can I do to fix this? Here's an example of my code:


&open_file("FILE1","",$filename);

$counter = 0;
$maxhits = 20;
$newhits = 0;
$matchcnt = 0;

while (($line = &read_file("FILE1")) && ($counter < $results)) {
    # split the fields at the ¦ character
    @tabledata = split(/\s*\¦\s*/,$line ,$fields);
    &check_record;
    if ($found == 1) {
        $arid[$counter] = $id;
        $arletter[$counter] = $letter;
        $arbandname[$counter] = $bandname;
        $argenre[$counter] = $genre;
        $arstyle[$counter] = $style;
        $aryears[$counter] = $years;
        $arfans[$counter] = $fans;
        $arblank[$counter] = $blank;

        $counter++;
    }
}
close(FILE1);

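# sort the arrays: a selection sort that repeatedly finds the smallest
# remaining band name, records it and its index, then overwrites it with
# a "zzz..." sentinel so it is not picked again.
# "ge" compares character codes, which is why é sorts after u; a name that
# compares above the sentinel (e.g. one starting with É) is never selected.
$k = 0;
$matchnum = 0;

while ($k < $counter) {
    $j = 0;
    $lowname = $arbandname[0];
    while ($j < $counter) {
        if ($lowname ge $arbandname[$j]) {
            $lowname = $arbandname[$j];
            $matchnum = $j;
        }
        $j++;
    }
    $sortbandname[$k] = $lowname;
    $sortnumname[$k] = $matchnum;
    $arbandname[$matchnum] = "zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz";
    $k++;
}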

$l = 0;
while ($l < $counter) {
    $sortitem = $sortnumname[$l];

    $id = $arid[$sortitem];
    $letter = $arletter[$sortitem];
    $bandname = $sortbandname[$l];
    $genre = $argenre[$sortitem];
    $style = $arstyle[$sortitem];
    $years = $aryears[$sortitem];
    $fans = $arfans[$sortitem];
    $blank = $arblank[$sortitem];

    &print_record;

    $l++;
}


A working example can be found by visiting http://www.globaldust.com/ and clicking on a letter. Any help would be greatly appreciated. Thanks!
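For reference: without `use locale`, Perl's `cmp` and `ge` compare strings character by character by character code, and é (code 233 in both Latin-1 and Unicode) is higher than every unaccented ASCII letter, which is why "Chérie" lands after "Chump". The hand-written selection sort can also be replaced by Perl's built-in `sort` applied to a list of indices, which keeps all the parallel arrays in step without a sentinel. A minimal sketch, with two hypothetical sample arrays standing in for the ones filled by the read loop; the comparison is still plain `cmp`, so it reproduces the ordering problem, and the comparison block is where a normalised key would go:

```perl
use strict;
use warnings;
use utf8;                                  # string literals below are UTF-8
binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample data standing in for @arbandname, @argenre, etc.
my @arbandname = ('Chump', 'Chérie', 'Abba');
my @argenre    = ('Rock',  'Pop',    'Pop');

# Sort the indices 0..N-1 by band name instead of moving the names around.
# Plain cmp compares character codes, so "Chérie" still sorts after "Chump".
my @order = sort { $arbandname[$a] cmp $arbandname[$b] } 0 .. $#arbandname;

# Every parallel array is read through the same sorted index list.
for my $i (@order) {
    print "$arbandname[$i] ($argenre[$i])\n";
}
```

Here `@order` plays the role of `@sortnumname`: `$arbandname[$order[0]]`, `$argenre[$order[0]]` and so on describe the first band in sorted order.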

Mouldy_Goat
01-13-2003, 12:34 AM
Hi John,

I found this article on the issue (http://unix.be.eu.org/docs/tpj/issues/vol4_2/tpj0402-0029.html). Its solution should work for you: compare a normalised copy of each name instead of the name itself, something like:


@sorted = sort { normalise($a) cmp normalise($b) } @unsorted;

sub normalise {
    my $in = $_[0];
    $in =~ tr/é/e/;
    return lc($in);
}

Except you'll want to map every accented letter to its unaccented equivalent, not just é, before comparing.
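A sketch of a fuller `normalise`, under a few assumptions: rather than listing every accented letter in a `tr///`, it uses the core `Unicode::Normalize` module (Perl 5.8 and later) to decompose each letter into its base letter plus combining accent marks (NFD), then deletes the marks. It assumes the band names are Perl character strings (for example, read through an `:encoding(...)` layer); the sample names are made up. Letters that have no decomposition, such as ø, æ or ß, pass through unchanged and would still need their own mapping.

```perl
use strict;
use warnings;
use utf8;                                  # string literals below are UTF-8
use Unicode::Normalize qw(NFD);            # core module since Perl 5.8
binmode STDOUT, ':encoding(UTF-8)';

# Comparison key: split accented letters into base letter + combining
# marks (NFD), strip the marks, then lowercase.
sub normalise {
    my ($in) = @_;
    my $key = NFD($in);
    $key =~ s/\p{Mn}//g;                   # remove non-spacing (combining) marks
    return lc $key;
}

# Hypothetical sample names.
my @unsorted = ('Chump', 'Chérie', 'Émile', 'abba', 'Zoë');

# Tie-break on the original string so names differing only by accent
# still come out in a fixed order.
my @sorted = sort { normalise($a) cmp normalise($b) || $a cmp $b } @unsorted;

print "$_\n" for @sorted;
```

The `|| $a cmp $b` tie-breaker is a design choice, not part of the original suggestion. Note that `normalise` runs twice per comparison; for a long list, computing each key once (for example with a Schwartzian transform) avoids the repeated work.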

john85
01-13-2003, 02:40 AM
Thanks! I've altered my code and it should be working!