Go Back   CodingForums.com > :: Server side development > MySQL

Before you post, read our: Rules & Posting Guidelines

Reply
 
Thread Tools Rate Thread
Enjoy an ad free experience by logging in. Not a member yet? Register.
Old 12-14-2010, 01:01 AM   PM User | #1
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
Querying the most frequent numbers in a database

Hi Guys,

I am trying to make a quick little lotto number tool on my website and I'm not sure how to write a couple of the queries, hopefully one of you can help.

I have a database with lotto numbers similar to this.

ID B1 B2 B3 B4 B5 S1 S2
1 20 19 22 34 45 45 26
2 7 19 34 45 53 18 10


Now the columns prefixed with B are the white lotto balls, and columns prefixed with S are the supplemental balls or powerball if you are in the u.s.

Query 1:
I would like to query the white balls to see which numbers show up most frequent.

Query 2:
Using the white balls, i would like to query to see which numbered pairs come up the most, in this example you will see 19 and 34 being a pair.

If someone could shed some light on this would be great. I appreciate any and all help. Thanks in advance.
macleodjb is offline   Reply With Quote
Old 12-14-2010, 02:12 AM   PM User | #2
toddandrae
New Coder

 
Join Date: Oct 2007
Posts: 84
Thanks: 0
Thanked 8 Times in 8 Posts
toddandrae is an unknown quantity at this point
I would probably set the tables up a little differently.

Code:
create table lottery_drawings (
lottery_id int not null auto_increment,
draw_date date,
location text,
primary key (lottery_id)) engine = innodb;

create table lottery_results (
 ball_id int not null auto_increment,
 lottery_id int not null,
 ball_number int not null,
 ball_type varchar(1) not null,
 primary key(ball_id)) engine = innodb;
Then you would be able to run a query like:
Code:
select
  ball_number,
  count(ball_number),
  ball_type
from
  lottery_results
group by
  ball_number, ball_type;
This would give you the count of the number as either a white ball or a 'special' ball.

You could also modify the query to find the percentage of times that ball has been drawn against the total number of draws, as well as how hot that number is in a given month or year.
toddandrae is offline   Reply With Quote
Old 12-15-2010, 01:01 AM   PM User | #3
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
Thanks for the help. How could I also use this to query the most paired numbers as well?
macleodjb is offline   Reply With Quote
Old 12-15-2010, 01:58 AM   PM User | #4
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Quote:
Originally Posted by macleodjb View Post
How could I also use this to query the most paired numbers as well?
You can't. But assuming you modified your DB as he suggested, then you could write a different query to find the pairs.

But it will be a messy and complex query.

Let's see...

First of all, I disagree a little bit with Todd's DB design. Let me correct him:
Code:
create table lottery_results (
 lottery_id int not null,
 ball_number int not null,
 ball_type varchar(1) not null,
 primary key(lottery_id, ball_number) ) engine = innodb;
There is no reason at all to have that auto_increment ball_id in that table. The ball_id in no way helps identify a particular record in a useful way.

So get rid of it and change the primary key to a *composite* key of (lottery_id, ball_number) and you will have a useful key that (a) uniquely identifies each ball and (b) makes looking them up by lottery and ball_number as fast as you want and need.

So then...

Just to FIND all the the pairs in the table:
Code:
CREATE VIEW v_ball_pairs AS
    SELECT L1.lottery_id, L1.ball_number AS ball1, L2.ball_number AS ball2
    FROM lottery_results AS L1, lottery_results AS L2
    WHERE L1.lottery_id = L2.lottery_id
    AND L1.ball_number < L2.ball_number
    AND L1.ball_type = 'W'
    AND L2.ball_type = 'W';
And then to find pairs that match from different lotteries:
Code:
SELECT P1.lottery_id AS lottery1, P2.lottery_id AS lottery2, P1.ball1, P2.ball2
FROM v_ball_pairs AS P1, v_ball_pairs AS P2
WHERE P1.lottery_id < P2.lottery_id
AND P1.ball1 = P2.ball1
AND P1.ball2 = P2.ball2
Wasn't that fun? There will be a quiz on Friday.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 12-15-2010, 02:01 AM   PM User | #5
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Oh, I forgot! You wanted to know the *number* of cross-lottery pairs and find which pairs occurred most often.

Okay.
Code:
SELECT P1.lottery_id AS lottery1, P2.lottery_id AS lottery2, 
       P1.ball1, P2.ball2, COUNT(*) AS pairCount
FROM v_ball_pairs AS P1, v_ball_pairs AS P2
WHERE P1.lottery_id < P2.lottery_id
AND P1.ball1 = P2.ball1
AND P1.ball2 = P2.ball2
GROUP BY P1.lottery_id, P2.lottery_id, P1.ball1, P2.ball1
ORDER BY pairCount DESC LIMIT 20;
The LIMIT there is arbitrary. LIMIT 1 or LIMIT 5 or however many top results you want.

************

EDIT: The use of P2.ball2 in the SELECT and GROUP BY was actually a typo. I meant to use P1.ball2 both places. But since P1.ball2 = P2.ball2, it doesn't matter. Will work the same either way.

RE-EDIT: This query is *WRONG*!!! See the revised query in post #9.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.

Last edited by Old Pedant; 12-15-2010 at 07:24 PM..
Old Pedant is offline   Reply With Quote
Old 12-15-2010, 04:28 PM   PM User | #6
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
wow, I appreciate the help here. i am trying to follow what exactly you are doing in this statement but i am completely lost. Can you break it down in laymens terms what you are doing. Thanks.
macleodjb is offline   Reply With Quote
Old 12-15-2010, 07:07 PM   PM User | #7
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Okay...when you create a VIEW, it's just kind of like a macro. You can then later use the view multiple times in another query without repeating all the code in the VIEW.

SO:
Code:
create a view and give it a name:
CREATE VIEW v_ball_pairs AS
    we want to get all possible pairs of ball numbers for *each* possible lottery
    so of course we need a lottery_id and two ball numbers:
    SELECT L1.lottery_id, L1.ball_number AS ball1, L2.ball_number AS ball2
    we have to join the lottery_results table TO ITSELF, because each record
    in that table only holds one ball_number; by joining to itself, we can get
    two ball_number values
    FROM lottery_results AS L1, lottery_results AS L2
    we join the table to itself, but we insist that the two copies of the table
    use the same lottery_id, because that's what we want: to balls from same lottery
    WHERE L1.lottery_id = L2.lottery_id
    and here is where we specify that the two ball numbers can *NOT* match
    by using < instead of !=, we ensure that we get each pair only once, with
    the first ball being the lower numbered one
    AND L1.ball_number < L2.ball_number
    and then we just make sure that both balls are white balls
    AND L1.ball_type = 'W'
    AND L2.ball_type = 'W';
That's part one. Have any trouble following that?
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 12-15-2010, 07:13 PM   PM User | #8
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Code:
so this is the same principle as the view, but since this
gets the final result we don't make it a view.
Again we want two lottery numbers (ids) and two ball numbers,
where the two ball numbers are both found, as a pair, in each lottery
SELECT P1.lottery_id AS lottery1, P2.lottery_id AS lottery2, P1.ball1, P2.ball2
but this time, instead of joining two tables, we join two identical views!
FROM v_ball_pairs AS P1, v_ball_pairs AS P2
The rest is the same idea:  We want a pair of lotteries, and to make sure
we don't get each pair twice we get the lower numbered lottery first
WHERE P1.lottery_id < P2.lottery_id
and then we insist that the pair of balls from each one be the same
AND P1.ball1 = P2.ball1
AND P1.ball2 = P2.ball2
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 12-15-2010, 07:22 PM   PM User | #9
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Finally, my TOTALS query was *WRONG*!!! WAY WAY too complex. I was after a correlation index, which is far beyond what you needed.

Much simpler to get what you wanted!

Here it is revised:
Code:
SELECT ball1, ball2, COUNT(*) AS pairCount
FROM v_ball_pairs
GROUP BY ball1, ball1
ORDER BY pairCount DESC LIMIT 20;
That will show you all possible pairs of balls *AND* how many times that pair appears in any lotteries. It orders the results by the count descending, meaning it shows those pairs that occur more often first. And it limits the results to (arbitrary...you can change) 20.

Explaining:
Code:
we just need to know what the ball numbers are and how many times
that particular pair of numbers appeared
SELECT ball1, ball2, COUNT(*) AS pairCount
we again use the view, for convenience.  Recall, the view shows us
all possible pairs in all possible lotteries.
FROM v_ball_pairs
to get the count for each pair, we just need to GROUP BY that particular pair
GROUP BY ball1, ball2
and then we want those pairs that occur most often to show up first
we limit the results to the top 20.
ORDER BY pairCount DESC LIMIT 20;
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Old 12-15-2010, 09:21 PM   PM User | #10
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
Ok thanks for that explanation. I can somewhat understand that. However, I am thinking that there is a flaw unless you tell me otherwise. If i use the database structure that you set up, it does not contain the date the number was drawn. So, how will your query distinguish between The number 7 from August 10th versus the number 7 from December 10th. The pairing needs to come from the same drawing date. If i am wrong let me know, then i must not have understood thoroughly. Thanks a lot so far for the help.
macleodjb is offline   Reply With Quote
Old 12-15-2010, 09:43 PM   PM User | #11
toddandrae
New Coder

 
Join Date: Oct 2007
Posts: 84
Thanks: 0
Thanked 8 Times in 8 Posts
toddandrae is an unknown quantity at this point
You have your parent table lottery_drawings that is referenced by your results table's lottery_id column. That parent has the date drawn as well as a field to add a 'location', although you could go a step further and have another table lottery location and have that referenced by a foreign key.
toddandrae is offline   Reply With Quote
Users who have thanked toddandrae for this post:
macleodjb (12-16-2010)
Old 12-15-2010, 10:01 PM   PM User | #12
Old Pedant
Supreme Master coder!

 
Old Pedant's Avatar
 
Join Date: Feb 2009
Posts: 23,168
Thanks: 59
Thanked 3,992 Times in 3,961 Posts
Old Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to allOld Pedant is a name known to all
Translating Todd's comment:

If you use his main table, as given:
Code:
create table lottery_drawings (
lottery_id int not null auto_increment,
draw_date date,
location text,
primary key (lottery_id)) engine = innodb;
Then you will have records in there such as
Code:
lottery_id   draw_date    location 
         1   2010-9-10    NY
         2   2010-9-10    NJ
         3   2010-9-11    NY
etc.
And then in the other table (modified to show the relationship, as you really should do it):
Code:
create table lottery_results (
 lottery_id int not null references lottery_drawings(lottery_id),
 ball_number int not null,
 ball_type varchar(1) not null,
 primary key(lottery_id, ball_number) ) engine = innodb;
you will have records such as:
Code:
lottery_id  ball_number  ball_type
          1           23    W
          1           31    W 
          1           67    W
 ...
         17           2     W
         17          31     W
         17          23     W
...
So the lottery_id identifies WHICH drawing the balls come from.

That VIEW that I showed you would return results such as
Code:
lottery_id  ball1  ball2
         1     23     31
         1     23     67
         1     31     67
       ...
        17      2     23
        17      2     31
        17     23     31
       ...
Okay?
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
Old Pedant is offline   Reply With Quote
Users who have thanked Old Pedant for this post:
macleodjb (12-16-2010)
Old 12-16-2010, 02:42 AM   PM User | #13
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
Excellent, I believe i understand it now. Thanks for taking the time to break it down for an idiot like me. Very much appreciated.
macleodjb is offline   Reply With Quote
Old 01-07-2011, 05:38 PM   PM User | #14
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
I had a couple of errors when i put this into sql so i had a make a couple changes. I'm not sure if this altered any of your code or not but it was the best i could come up with my knowledge.

Code:
SELECT        TOP (20) ball1, ball2, COUNT(*) AS pairCount
FROM            WhiteBallPairs
GROUP BY ball1, ball2
ORDER BY pairCount DESC
I was trying to get triple pairs but I can't get the syntax correct for the view

This is what i had
Code:
SELECT        L1.drawing_id, L1.result_number AS ball1, L2.result_number AS ball2, L3.result_number AS ball3
FROM            dbo.game_results AS L1 INNER JOIN
                         dbo.game_results AS L2 INNER JOIN
                       dbo.game_results AS L3 ON L1.drawing_id = L2.drawing_id AND L2.drawing_id= L3.drawing_id AND L1.result_number < L2.result_number AND L2.result_number < L3.result_number
WHERE        (L1.result_type = 'WBall') AND (L2.result_type = 'WBall') AND (L3.result_type = 'WBall')

Last edited by macleodjb; 01-07-2011 at 05:45 PM..
macleodjb is offline   Reply With Quote
Old 01-07-2011, 05:54 PM   PM User | #15
macleodjb
Regular Coder

 
Join Date: Apr 2007
Posts: 317
Thanks: 24
Thanked 3 Times in 3 Posts
macleodjb is on a distinguished road
Ok I think i got the triple pair down, not sure, let me know if this is correct.

Code:
SELECT        L1.drawing_id, L1.result_number AS ball1, L2.result_number AS ball2, L3.result_number AS ball3
FROM            dbo.game_results AS L1 INNER JOIN
                         dbo.game_results AS L2 ON L1.drawing_id = L2.drawing_id AND L1.result_number < L2.result_number INNER JOIN
                         dbo.game_results AS L3 ON L2.drawing_id = L3.drawing_id AND L2.result_number < L3.result_number
WHERE        (L1.result_type = 'WBall') AND (L2.result_type = 'WBall') AND (L3.result_type = 'WBall')
macleodjb is offline   Reply With Quote
Reply

Bookmarks

Jump To Top of Thread


Thread Tools
Rate This Thread
Rate This Thread:

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump


All times are GMT +1. The time now is 12:38 AM.


Advertisement
Log in to turn off these ads.