
Thread: duplicates check

  • #1
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts

    duplicates check

    Can you check for duplicates on insert?

    If yes, can you please help me out?

    Examples, please.



    Thanks in advance.


    Slayer.

  • #2
    Regular Coder | Join Date: Nov 2012 | Posts: 115 | Thanks: 7 | Thanked 12 Times in 12 Posts
    This is the code I use to create a randomly generated ID: it checks whether the ID already exists and, if so, generates a new one.

    So yes, it is possible =)

    PHP Code:
    function generate_id() {
        // Pick a random 7-digit candidate ID
        $id = mt_rand(1000000, 9999999);
        $sql = "SELECT a_id FROM {$this->table} WHERE a_id={$id};";
        $return = $this->query2($sql);
        if ($return->num_rows == 0) {
            return $id; // This is where you would put your INSERT statement
        } else {
            // ID already in use: recurse and return the fresh attempt
            return $this->generate_id();
        }
    }
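    A hypothetical usage sketch, assuming the method lives on a DB-access class with a public query2() helper and table property, as the snippet implies:

    PHP Code:
    // $db is an instance of the class that defines generate_id()
    $newId = $db->generate_id(); // unused at the moment of the check
    $db->query2("INSERT INTO {$db->table} (a_id) VALUES ({$newId})");
    Note that the check and the insert are two separate steps, so two simultaneous requests could still grab the same ID; a UNIQUE key on a_id closes that gap.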

    Last edited by Thyrosis; 01-07-2013 at 10:15 PM. Reason: clarification

  • #3
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    How would this work for this type of insert?

    PHP Code:
    if (isset($_POST['filename'])) {
        $filename = $_POST['filename'];
        $handle = fopen($filename, "r");
        include('config.php'); // moved outside the loop: it only needs to run once
        while (($data = fgetcsv($handle, 200000, ",")) !== FALSE) {
            $import = "INSERT INTO used (`used_id`,`stock`,`carline`,`listprice`,`make`,`miles`,`serial`,`stockno`,`year`,`desc`,`cylinders`,`transmission`,`trim`,`color`,`warranty`,`condition`) VALUES ('','$data[0]','$data[1]','$data[2]','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]','$data[9]','$data[10]','$data[11]','$data[12]','$data[13]','$data[14]')";
            mysql_query($import) or die(mysql_error());
        }
        fclose($handle);
        print "Import done";
    } else {
        print "<form action='importcsv.php' method='post'>";
        print "Type file name to import:";
        print "<input type='text' name='filename' size='20'>";
        print "<input type='submit' name='submit' value='submit'></form>";
    }
    Sorry... bit of a noob still..

    Thanks in advance...


    Slayer.

  • #4
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    What are you checking duplicates for? And if you find them, how do you react?
    Assuming that the duplicate is on a UK/PK and you don't need to keep duplicates, then you can simply ignore the result of the query and not produce any output for it. If you need more information, you would store the data from the file into an array and then check it for duplicates. But I don't know what you mean by a duplicate (all fields, a combination of fields, a single value over multiple records, etc.), so it's hard to point out the right way.
    You should look at using MySQLi or PDO, though. Prepared statements are a lot faster to work with and more secure than simply building an INSERT string for every entry. Plus, the old mysql library is deprecated and will be removed soon.
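    For illustration, a minimal sketch of the CSV import above rewritten with PDO prepared statements (the DSN and credentials are placeholders, and it assumes used_id auto-increments so it can be omitted):

    PHP Code:
    // Hypothetical connection details; substitute your own
    $pdo = new PDO('mysql:host=localhost;dbname=dealer', 'user', 'pass');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    // Prepare once, execute per row: faster than building a query
    // string per INSERT, and immune to quoting/injection problems
    $stmt = $pdo->prepare(
        "INSERT INTO used
         (`stock`,`carline`,`listprice`,`make`,`miles`,`serial`,`stockno`,
          `year`,`desc`,`cylinders`,`transmission`,`trim`,`color`,
          `warranty`,`condition`)
         VALUES (?,?,?,?,?,?,?,?,?,?,?,?,?,?,?)"
    );

    $handle = fopen($_POST['filename'], "r");
    while (($data = fgetcsv($handle, 200000, ",")) !== FALSE) {
        $stmt->execute($data); // binds the 15 CSV columns in order
    }
    fclose($handle);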

  • #5
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    Basically, it is going to work this way...


    I have 2 databases.

    1) a reusable database that is emptied on a weekly or bi-weekly basis.

    2) a database that keeps all the records that have ever been entered into the reusable database.

    This way, if a car that we sold (and that at some point was removed from the reusable DB) comes back to us for resale or on a lease return, I can just pull the information from the 2nd database based on its VIN#.

    However, I do not want to add duplicates to the database,

    as I am importing the data from a CSV file through the above code and may have 25 to 30 duplicates from the week or two before.

    I hope this is making sense.


    Please fill me in on what you think, as you guys are the pros and I value your input.

    Thanks again, Slayer.

  • #6
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,023
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Again, you seem to be talking about TABLES, not DATABASES. Don't mix the two up.

    A database is a collection of tables, and almost surely you should have more than the two you mentioned. You need to show us the SCHEMA of your proposed tables: that is, all the field names and types in each table.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #7
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    If you have constraints set up to dictate what has to be unique, then you never need to worry about duplicates.
    I'm not sure why you are emptying a table (or db?) regularly, but can you not just use INSERT IGNORE together with INSERT ... SELECT syntax?
    If you are using surrogates, I think your only options are to either select for each entry first and determine whether it's already there, or insert everything anyway and then search for duplicates. Part of the reason why I hate surrogates so much :|
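    For reference, a minimal sketch of that combination (the table names are hypothetical; it assumes the target table has a PRIMARY KEY or UNIQUE key that defines what counts as a duplicate):

    Code:
    -- Copy this week's rows into the global table, silently
    -- skipping any row whose unique key already exists there
    INSERT IGNORE INTO global_vehicles
    SELECT * FROM weekly_vehicles;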

  • #8
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    As much as I do not understand what you have just said, Fou-Lu...

    I have no real restrictions on the table: an integer for the main ID, varchar for the other data, the size of each field... nothing more, really.

    The reason for flushing the one table is that, with the data we download from our contact/vehicle database, it is easier to just reimport everything.

    The reasons are as follows: sold vehicle, traded vehicle, new vehicle added, etc.

    And with there being anywhere from 50 to 90 vehicles in the table at one time, I wanted the import to check against another table to see if there are duplicated entries, and not insert those into the global vehicle table.

    Please explain "using surrogates".


    P.S. If there is a better way of doing this, please explain. I would love to know; this is how we learn.

    Thanks, Slayer

  • #9
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    INSERT IGNORE?

    I see...


    How can I get it to check the data I am importing through PHP against the second table on insert?

    Because right now the first table is blank and the second table has 3,500 vehicles listed, and I want it to check that table's serial field before inserting into it,

    as we do not want to have the same vehicle in the table twice.

    And does the SQL insert need to be in a loop of some sort to go through all the records?

    Thanks, Slayer.

  • #10
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,023
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Okay, so define "duplicate" from your point of view.

    If you *do* have a PRIMARY KEY (or any UNIQUE key, for that matter) that is able to detect a duplicate, then all you have to do is something like
    Code:
    INSERT IGNORE INTO backuptable
    SELECT * FROM maintable
    But another possibility, which I prefer, is to keep a FULL AUDIT TRAIL.

    So my backuptable adds 1 or 2 fields to the maintable. For example, whenAdded DATETIME.

    And then I just do
    Code:
    -- whenAdded must be backuptable's first column, and MySQL needs
    -- the qualified maintable.* when * is mixed with other select items
    INSERT INTO backuptable
    SELECT NOW(), maintable.* FROM maintable
    Yes, you will now have duplicates in the backuptable. But you can always find the most recent one for a given id, when needed. For example:
    Code:
    SELECT * FROM backuptable WHERE id = 77183 ORDER BY whenAdded DESC LIMIT 1
    But now you can *also* get ALL insertions of id 77183 if you need/want them.
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

  • #11
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    I'm still not seeing a need for using a temporary table or db for this. Why move and delete the old stuff in the first place? Why not keep adding to the existing table? If you use non-surrogate keys, or add a unique key, then you never need to worry about actual duplicates being created.

    A surrogate is a key that doesn't directly relate to the record but does represent the entire entity. Typical surrogates are things like auto-incrementing numbers, but anything can be a surrogate (I use GUIDs, for example, which carry no value to the entity as a whole but sure make it easy to write the joins). A surrogate needs to uniquely represent the tuple system-wide in an application (an auto-increment does so by combining its number with the table it belongs to). Surrogates carry no real worth in describing the entity, but can be used to find it. They are especially problematic when looking for duplicates, since two otherwise identical records can exist with differing surrogate keys, which makes them unique at the storage level. Non-surrogate is where you use something that uniquely identifies the record itself (single or composite keys): an email address would likely be a good primary key for a user, a VIN for a vehicle, and a house address or lot/package number for a home.
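    To make the distinction concrete, a hypothetical sketch of the two approaches (names and types are illustrative):

    Code:
    -- Surrogate key: vehicle_id says nothing about the vehicle, so
    -- two identical vehicles can coexist under different ids
    CREATE TABLE vehicles_surrogate (
        vehicle_id INT AUTO_INCREMENT PRIMARY KEY,
        serial     VARCHAR(17),
        make       VARCHAR(50)
    );

    -- Natural key: the VIN itself identifies the record, so the
    -- same vehicle simply cannot be inserted twice
    CREATE TABLE vehicles_natural (
        serial VARCHAR(17) PRIMARY KEY,
        make   VARCHAR(50)
    );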

    Since there have been a couple of replies while I was typing this (and I got caught up in actual work to do :P): with the PHP code, you want to do as little as you can in PHP. It is much, much faster to let SQL move the data with an INSERT ... SELECT than it is to SELECT and then loop over INSERTs in PHP.

    I'm not sure if I'm being dense or not, but I still don't see a need for the two tables. . .

  • #12
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    Well, my secondary table (or global table, as I call it)

    has 2 items that must be UNIQUE:

    vehicle_id Primary Key
    serial UNIQUE

    I do not want to add any of the 18 fields of a record to the global table if the Primary Key or Unique field matches any already in this table.

    But if it finds them, I need it to just not insert them and continue on with the rest of the insert...
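    For concreteness, a minimal sketch of a table definition matching that description (column types are assumptions; the other 16 data fields would follow the same pattern):

    Code:
    CREATE TABLE global_vehicles (
        vehicle_id INT AUTO_INCREMENT PRIMARY KEY,
        serial     VARCHAR(17) NOT NULL,
        -- ... the remaining 16 data fields go here ...
        UNIQUE KEY uq_serial (serial)
    );
    With that UNIQUE key in place, a plain INSERT of a repeated serial is rejected, and INSERT IGNORE skips it and carries on, which is exactly the behavior the next reply describes.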

    Slayer.

  • #13
    God Emperor Fou-Lu's Avatar
    Join Date
    Sep 2002
    Location
    Saskatoon, Saskatchewan
    Posts
    16,987
    Thanks
    4
    Thanked 2,660 Times in 2,629 Posts
    If a field is unique, the database will reject any entry attempting to reuse that value. So if a serial already exists and you attempt an INSERT (or an INSERT SELECT), it will reject the attempt. With INSERT SELECT, or any "batch" type insert, use the INSERT IGNORE syntax so it keeps going when an individual insert fails.
    I assume by this:
    I do not want to add any of the 18 fields of a record to the global table if the Primary Key or Unique field matches any already in this table.
    you mean you don't want to add the record if it conflicts with the PK/UK, right?
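    A small sketch of that behavior (the serial values are hypothetical; it assumes the UNIQUE key on serial shown earlier):

    Code:
    -- If the second serial already exists in the table, that row is
    -- silently skipped; the other two rows still insert
    INSERT IGNORE INTO global_vehicles (serial)
    VALUES ('1FTRX18W1YKA12345'),
           ('2HGCM82633A004352'),
           ('3VWFE21C04M000001');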

  • #14
    Regular Coder | Join Date: Sep 2009 | Location: Calgary, Alberta | Posts: 239 | Thanks: 47 | Thanked 3 Times in 3 Posts
    Fou-Lu,


    I guess I am not explaining it to you properly...


    The reason for the 2 tables is this.


    Step 1

    I have 55 to 90 vehicles' worth of data that I upload through my PHP file to import into my vehicle table.

    Then I upload a zip file of over 5,000 images that belong to these vehicles, matched by serial number, and it unzips itself to a predefined directory associated with each vehicle.

    Step 2

    I would like the data I imported to the first table to also be imported into another table, without duplicated serial numbers, so that I can reference it again later or on a second or third vehicle website.

    Step 3

    If, 1 or 2 weeks later, I need to change the first table because we sold 25 of the 50 cars and gained another 35, I download a fresh CSV file from our main contact/vehicle software and re-upload it to our table using Step 1.

    Is there a way of keeping the data in one table but marking records as old or unused when a new CSV file is imported? I am not sure of a proper way to make this happen with the ease of having multiple tables.

    The Global Table, as I call it, would be for reference later: if the vehicle came back to us on a lease, or we bought it back for some reason, or it got traded in to us again, I could look up the vehicle details and associated information without having to add it all in again.

    Slayer

  • #15
    Supreme Master coder! Old Pedant's Avatar
    Join Date
    Feb 2009
    Posts
    25,023
    Thanks
    75
    Thanked 4,323 Times in 4,289 Posts
    Is there a way of keeping the data in one table but marking records as old or unused when a new CSV file is imported? I am not sure of a proper way to make this happen with the ease of having multiple tables.
    Ummm... add one field to your *ONE* table: isCurrentlyActive BIT NOT NULL DEFAULT 0 (or DEFAULT 1; you can make arguments for either).

    Import the new CSV file to a temp working table. (I do agree with doing the upload to a separate table, just in case there is a glitch in the CSV, and also because the upload can consume measurable time.)

    Then do this:
    Code:
    -- Mark every existing row inactive, then bring in the fresh data:
    -- new rows insert as active, and rows whose key already exists
    -- are simply re-flagged as active
    UPDATE maintable SET isCurrentlyActive = 0;

    INSERT INTO maintable (primaryKeyField, field1, field2, field3, isCurrentlyActive)
    SELECT primaryKeyField, field1, field2, field3, 1 FROM stagingtable
    ON DUPLICATE KEY UPDATE isCurrentlyActive = 1;

    -- Empty the staging table, ready for the next import
    TRUNCATE TABLE stagingtable;
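    As a side note, a sketch of one common way to fill the staging table straight from the CSV (the file path is hypothetical, and MySQL's local_infile option must be enabled):

    Code:
    LOAD DATA LOCAL INFILE '/path/to/vehicles.csv'
    INTO TABLE stagingtable
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';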
    An optimist sees the glass as half full.
    A pessimist sees the glass as half empty.
    A realist drinks it no matter how much there is.

