PDA

View Full Version : Count only Distincts (SQL)???


christrinder
06-17-2003, 11:46 AM
Hi guys,

I'm using a stored procedure in SQL to create a report. What I need it to do is count some records... sounds simple. However, I want the total by Type, which is either 0 or 2 in this case. The trouble is, whereas the return value of a count for Type=0 is correct, the type=2 count is way off. This is because the collationID is repeated numerous times with a Type=2 value. What I need to do somehow is count the fields with a Type=2 but where there is a DISTINCT CollationID value. Any ideas? I'm well confused?!

Thanks,
Chris

-- lists all
SELECT Type, CollationID
FROM Task

--counts Type=2
SELECT count(*) AS Total
FROM Task
WHERE Type =2
GROUP BY Type

raf
06-17-2003, 12:27 PM
I'm well confused?!

So am i :D

SELECT count(*) AS Total
FROM Task
WHERE Type =2
GROUP BY Type

The goup by wount do a lot here since you have the type in your condition.
If you need the counts for both 0 and 2, then run
SELECT count(*) AS Total, type
FROM Task
GROUP BY Type
This wull return a recordset with two records. (count for 1 and count for 0)

What I need to do somehow is count the fields with a Type=2 but where there is a DISTINCT CollationID value

Hmm. Something like
SELECT count(*) AS TotCollation, CollationID
FROM Task
WHERE Type=2
GROUP BY CollationID
This will return a recordset with just as many CollationID there are for records with type=2. So it agregates all records with the same value for CollationID. If CollationID is the primary key or unique variable... it wount do anything

Or, if you need the count for all combination of CollationID and Type (maximum of number collationID's X 2)
SELECT count(*) AS TotCollationType, CollationID, Type
FROM Task
GROUP BY CollationID, Type

christrinder
06-17-2003, 01:27 PM
Thanks Raf, but I think my terrible explanation has mislead you a little, Sorry. Imagine a table that contains sales leads. It starts with an inquiry, which is given a Type=0 value, and then numerous sales leads follow-ups, which are entered into the same table but with a Type=2 value. I suppose you could think of it as this forum, with a threadstarter given a Type=0 value and all replies to that post are give a Type=2. Of course, all of these records are linked with the CollationID which is the same for each record (not unique). The figure I actually want to produce is the number of sales leads with no follow-ups. The best way I can think to do that is to count the Type=0 records, and minus from that the count for Type=2 records, BUT where the CollationID's in the second count are DISTINCT. Any ideas because I'm properly at a loose end! Thanks in advance. Chris

christrinder
06-17-2003, 01:46 PM
OK, this is about as close I can come up with. It produces the error: "Only one expression can be specified in the list when the subquery is not introduced with exists". If you take out the 'CollationID' from the second subquery, then it actually works, but the count is then all of the type=2 records and not those that are distinct from each other. Any ideas anybody?

SELECT TOP 1
(SELECT COUNT(*) AS total
FROM Task
WHERE Type = 0) AS leads,
(SELECT DISTINCT CollationID, COUNT(*) AS total2
FROM Task
WHERE type = 2) AS followups
FROM Task

raf
06-17-2003, 03:17 PM
The best way I can think to do that is to count the Type=0 records, and minus from that the count for Type=2 records, BUT where the CollationID's in the second count are DISTINCT
I think i get the picture.
--> If a new sales lead is started, a records is inserted with type=0 --> each CollationID can have only one record with type=0
--> for each follow up, a record is inserted with type=2--> each CollationID can have n records with type=2

No?

The the number of sales leads with no follow up is the number of CollationID's that do not appear in the follow-up collection

SELECT count(*) as numvirging, CollationID FROM task WHERE CollationID NOT IN (SELECT DISTINCT CollationID FROM task WHERE type=2 GROUP BY CollationID)

<edit>Also, since the "started" or "followed up" atribute, is on the Collation level, you should have this in the collation-table. It'll speed up things + follows more a relational database logic.

You can probably drop the "GROUP BY CollationID"
</edit>

christrinder
06-17-2003, 03:24 PM
Cheers Raf. You should really started charging for all this help you dish out... could buy yourself a couple of posh motors within a few years!!! Thanks again, really appreciate your help on that. Chris

raf
06-17-2003, 03:40 PM
You're welcome.
You should really started charging for all this help you dish out... could buy yourself a couple of posh motors within a few years!!!
"I measure my wealth by the things i can do without", so charging for this would get me into even bigger problems :D

christrinder
06-19-2003, 02:11 PM
Back once again with the same Renegade Stored Procedure!!!

OK, if you remember, you helped me sort this out yesterday so that it counts the number of records that don't have an associated type2 record. However, I've now been told that the system automatically creates the first type2 record. Therefore, I need the system to count the number of records with less than 2 (i.e. only 1) associated type2 record? Any ideas?

This stored proc has become so complex that I've had to make it so that it runs overnight and creates a whole table of all the relevant data, and then another stored procedure accesses this table to create the reports on the fly!

Cheers,
Chris

raf
06-19-2003, 03:34 PM
Hmm. OK. No Problem. Say we first ran a regular count, like
SELECT DISTINCT Coun(*) as numrec, CollationID FROM task GROUP BY CollationID
to create or first view. In this view ('table') we'd need only the records where count is 1. I don't think we can use the view as a subselect . But we can use it as the source for the first select and use the first count in the condition. So to get the count of the virgin records, you can use something like
SELECT Count(*) as numvirging FROM (SELECT DISTINCT Coun(*) as numrec, CollationID FROM task GROUP BY CollationID) WHERE numrec =1
I've ran selects like these without problems, so this one should also work

This stored proc has become so complex that I've had to make it so that it runs overnight and creates a whole table of all the relevant data, and then another stored procedure accesses this table to create the reports on the fly!

Welcome to the interesting and extremely profitable world of datapreparation! The only thing you need to take care of, is that the underlying data doesn't change after you made the agregated data. So you can only report on 'fixed' historical data + probably best to save these 'datamarts' --> the generated and optimized flatfiles you create as source for your reports and set up some sort of datawarehous architecture (which means you also need to archive the dimensiontabels after each run). But your reports will be generated much faster + you'll be able to reconstruct the reports at a later date.