08-24-2006, 09:46 AM
I'm trying to modify a query to avoid using group by or distinct to reduce the amount of time to get results.
I want to get every record from table 1 that has a matching record in table 2
The tables are in a 1 to many relationship
ie. One record in table 1 will have many matching records in table 2.
I've tried right, left and inner and neither seem to give the desired results - any suggestions from the gurus?
08-24-2006, 03:11 PM
You're not going to be able to do this only using joins. Using distinct, I think, is your best bet.
08-24-2006, 04:19 PM
Distinct is basically Group By - I tested and group by was marginally faster. Thgese damn queries are taking too long, just cant get them <0.2s
08-24-2006, 06:26 PM
I know you have other posts, but, have you ensured that your queries are using the indeces on your tables? Is it 0.2s when you run the query? or is it 0.2s from the time your application sends the query to the time it gets it's data back from the query (that is, is some of that time transferring data from the DB to the application)? Are you limiting your resultset to a reasonable number like 50 or 100 or are you selecting hundreds of thousands of rows?
Once you get down to it, with tons of data and fully tweaked queries, you have 2 options. 1) Modify your data model to increase your speed. 2) Distribute load over multiple servers.
If you manage to solve your problem, please post a writeup on the forums so we can all benefit from your pain. :)
08-24-2006, 07:52 PM
I'm making small progress here and there, in many cases the queries werent well written (I didnt write them ;) ) They werent optimised or benchmarked before going live.
The data that is coming back is small < 50 rows in every case but the data that has to be examined can be thousands of rows - basically every row within up to a 10 day time period needs to be examined. On the searches I was testing this morning that was around 20-25k rows, then each has to be examined to meet other criteria - all of which are indexed. Explain is pretty much telling me I'm getting the best out of the queries I can - other than filesorts and temporary tables but interestingly there was no improvement when I rejigged queries to not use temporary or filesort (keeping everything else the same) which I would have expected there to be.
I think we all need to get stoned and approach this from a completely different angle. I just cant see a way to keep all the functionality we have and get the query speed < 0.1s
I'm making improvements but they're small - 5-20% time decreases - not going to make a significant impact on the site. Oh well I get into the really gnarly queries next week. Wont that be a barrel of laughs ;)
08-24-2006, 09:53 PM
And is your data model sound?
08-24-2006, 10:11 PM
with the data model we've tried to hit a sound balance between fast processing of flat files containing in some cases 750k offers and minimalisng redundancy and at the same time offering flexability to clients departures and destinations are all linked on 3 digit iata codes, we could move over to linking on integers but I dont think that would make much of a saving
We store a lot of data but everything we search on is indexed. The rest is just used for display and we only display 10 -20 records at a time. Our mulit-value search fields use integers and bitwise operations which should be fairly fast. Search operations on really use 2 tables - the departure/destination airports (which I guess counts as 2 tables in searches) and the offers table. So our biggest joins should be only 3 tables.
I havent changed the key buffer size so thats something to look at.