Hello and welcome to our community! Is this your first visit?
Register
Enjoy an ad free experience by logging in. Not a member yet? Register.
Results 1 to 5 of 5
  1. #1
    New Coder
    Join Date
    Mar 2011
    Posts
    63
    Thanks
    10
    Thanked 0 Times in 0 Posts

    Question How to Get SQL SOUNDEX Value in PHP

    For historical reasons the algorithm used by SQL to calculate the soundex value is different from the PERL algorithm implemented by the PHP soundex function. So for example I cannot do the following:

    Code:
    $sql = "INSERT INTO MyTable (Surname, SoundexOfSurname) VALUES('" . $surname . "', '" . soundex($surname) . "')";
    but rather I have to do:

    Code:
    $sql = "INSERT INTO MyTable (Surname, SoundexOfSurname) VALUES('" . $surname . "', LEFT(SOUNDEX('" . $surname) . "'),4)";
    That is I have to request the SQL server to generate the soundex value. The LEFT function is invoked to handle a quirk of the MySQL server implementation of the SOUNDEX function in that it can return a value that is more than 4 characters long. More specifically I must either always use the SQL implementation of SOUNDEX or always used the PHP/PERL implementation of SOUNDEX in a database application. The database which I inherited used the SQL implementation because it originally ran on Windows using MS Access, and I migrated it to LAMP.

    I have added a logging function to record all changes made to the database in records which contain the before and after image of the modified record, but this creates a problem because after the surname changes the PHP code doesn't know what the value of the SOUNDEX field in the database is because it was filled in by the server. I would rather not take the performance hit to read the value from the database just to fill in the log. Does anyone know of PHP code that will calculate the SQL SOUNDEX value.
    Last edited by jcobban; Aug 4th, 2017 at 04:01 AM.

  2. #2
    New Coder
    Join Date
    Mar 2011
    Posts
    63
    Thanks
    10
    Thanked 0 Times in 0 Posts
    I thought it might be useful to demonstrate the differences between the SQL SOUNDEX algorithm and the PERL/PHP algorithm:

    Code:
    Surname        STORED   SQL SOUNDEX     PHP SOUNDEX
    McKnockitter 	 M252 	M25236 	             M252 
    Honeyman 	 H500 	H500 	             H555
    Munn 	         M000 	M000 	             M500
    Poppett          P300 	P300 	             P130 
    Peachman 	 P250 	P250 	             P255 
    Cochrane         C650 	C650 	             C265 
    Chesley          C400 	C400 	             C240
    Tachenion        T250 	T250 	             T255 
    Wilcox 	         W420 	W420 	             W422
    The essential difference is that the SQL implementation of SOUNDEX mashes all occurrences of similar consonants, for example all nasals or all sibilants, together whereas the PERL/PHP implementation considers sequences of consonants that are separated either by vowels or a different type of consonant, to be distinct. For example Honeyman has only one digit 5, representing nasal consonants, in the SQL implementation whereas the PERL/PHP implementation has three because there are three groups of nasal consonants separated by vowels.
    Last edited by jcobban; Aug 4th, 2017 at 03:59 AM.

  3. #3
    Regular Coder Vege's Avatar
    Join Date
    Jan 2008
    Posts
    982
    Thanks
    22
    Thanked 132 Times in 131 Posts
    I think the function MySQL uses is quite simple to implement as PHP function. I assume it's this one:
    https://en.wikipedia.org/wiki/Soundex
    From MySQL manual about soundex:
    This function implements the original Soundex algorithm, not the more popular enhanced version (also described by D. Knuth). The difference is that original version discards vowels first and duplicates second, whereas the enhanced version discards duplicates first and vowels second.

  4. #4
    New Coder
    Join Date
    Mar 2011
    Posts
    63
    Thanks
    10
    Thanked 0 Times in 0 Posts
    Quote Originally Posted by Vege View Post
    I think the function MySQL uses is quite simple to implement as PHP function. I assume it's this one:
    https://en.wikipedia.org/wiki/Soundex
    From MySQL manual about soundex:
    The Wikipedia entry gives two different algorithms, the first one which every implementation of SQL that I have tested uses, and the second one which is used by every programming language I have tried. I have updated the Wikipedia entry to point this difference out as well as adding a citation to this forum entry to share the examples of what is generated. The use of informal language to describe an algorithm is unfortunate as it can lead to errors in implementation. For example when I try to follow the first algorithm in the Wikipedia article for the surname "Munn" I do not get the result generated by MySQL which merges the second set of nasals into the initial letter M giving the code "M000", Knuth's algorithm has been adopted by all of the programming languages because it was defined formally and therefore could be reliably migrated into any new language. It strikes me that the SQL implementations are kluges to try and generate the same results that AT&T produced.

  5. #5
    Regular Coder Vege's Avatar
    Join Date
    Jan 2008
    Posts
    982
    Thanks
    22
    Thanked 132 Times in 131 Posts
    Here is the C code for mysql server that handles the soundex creation. Obviously you know more about the problem but maybe you can use it to recreate it in PHP:
    https://github.com/mysql/mysql-serve...s/mf_soundex.c


 

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •