nxzmplty
11-21-2011, 09:30 AM
Morning all
I've been thinking of ways to match different database records (let's call them XXX), and give them a "match" score. Whether this is a percentage, or simply "yes" or "no", I haven't decided, but I don't think it'll make much of a difference. One important thing to note is that I would expect a table of approximately 10000 records. Since a lot of these matches will be calculated via tags, we could then potentially have a hell of a lot of queries/joins.
So far I have come up with three solutions:
1) Write a function to do so, and when viewing a XXX the server will calculate the scores live. This will be the most up-to-date method, although potentially very slow if the site does actually grow to levels I'd be happy with.
2) Use a CRON job during quieter times to calculate matches from one XXX to another. We could then store these in a new table, say 'matches', and have a float/integer with the two XXXIDs. Obviously this will mean that the matches are delayed somewhat, and also the table could be 10000*10000*0.5 = huge. Unless of course you only keep matches above a certain threshold.
3) Use a CRON job as above, but store the data in another field in the XXXs table. This field would contain a string of related XXXs (limited to however many , possibly such as "356,89;6456,88;14,78;" in which the XXX with ID 6456 has a match of 88%. Again, this will eliminate low matches.
Or... secret option!
4) Stick an additional field 'lastUpdated' or something in the XXX table. If it was updated in the last 24/48/168 hours, then we run option 1) and generate a new set of matches. If not, we conserve our power and simply read from the existing record.
Thanks in advance for any input. Which would you choose and why? Maybe I'm missing something. I appreciate this is not specific to PHP but it's what I'll be using. What do you call XXX in this case? Entity? Widget? Thing? This is more of a theoretical debate rather than a request for code.
I've been thinking of ways to match different database records (let's call them XXX), and give them a "match" score. Whether this is a percentage, or simply "yes" or "no", I haven't decided, but I don't think it'll make much of a difference. One important thing to note is that I would expect a table of approximately 10000 records. Since a lot of these matches will be calculated via tags, we could then potentially have a hell of a lot of queries/joins.
So far I have come up with three solutions:
1) Write a function to do so, and when viewing a XXX the server will calculate the scores live. This will be the most up-to-date method, although potentially very slow if the site does actually grow to levels I'd be happy with.
2) Use a CRON job during quieter times to calculate matches from one XXX to another. We could then store these in a new table, say 'matches', and have a float/integer with the two XXXIDs. Obviously this will mean that the matches are delayed somewhat, and also the table could be 10000*10000*0.5 = huge. Unless of course you only keep matches above a certain threshold.
3) Use a CRON job as above, but store the data in another field in the XXXs table. This field would contain a string of related XXXs (limited to however many , possibly such as "356,89;6456,88;14,78;" in which the XXX with ID 6456 has a match of 88%. Again, this will eliminate low matches.
Or... secret option!
4) Stick an additional field 'lastUpdated' or something in the XXX table. If it was updated in the last 24/48/168 hours, then we run option 1) and generate a new set of matches. If not, we conserve our power and simply read from the existing record.
Thanks in advance for any input. Which would you choose and why? Maybe I'm missing something. I appreciate this is not specific to PHP but it's what I'll be using. What do you call XXX in this case? Entity? Widget? Thing? This is more of a theoretical debate rather than a request for code.