I know there are quite a few of you who are very knowledgable in things beyond, just making it work So I hope you can advise me with this.
We update our database from external data sources, but they often contain duplicates, the files we work with a fairly large - 100k-800k records per file. To remove the duplicates, what I have been doing is doing a select distinct into another table then taking everything out of that to put into the main table - this is slow and resource hoggy.
Now the idea I had (and have been trialling) is to instead, add a unique index containing all the fields that we output to the website, so the duplicates never get inserted into the database in the first place.
SO the index looks like
Can anyone advise me how this might effect performance - particularly of selects and joins etc.
PRIMARY PRIMARY 547154 fldOfferID
fldClientID_2 UNIQUE 547154 fldClientID
fldDepartureDate INDEX 6754 fldDepartureDate
fldDestinationAPC INDEX 32185 fldDestinationAPC
fldDepartureAPC INDEX 31 fldDepartureAPC
fldDuration INDEX 31 fldDuration
fldBoardBasis INDEX 14 fldBoardBasis
fldClientID INDEX 10 fldClientID