SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;
It will give me the count of each occurrence of a distinct value of a field. However, I am wondering if there would be a way to modify this to have a wildcard.
The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.
Not sure it is possible but figured I would ask since Google searching didn't really give me anything useful to go off of.
SELECT name,COUNT(*) as count FROM tablename GROUP BY name ORDER BY count DESC;
It will give me the count of each occurrence of a distinct value of a field.
*** NOT TRUE! ***
It will count *ALL* records *PER NAME*.
If you wanted a count of DISTINCT VALUES then you would need to use:
Code:
SELECT name, COUNT( DISTINCT fieldname ) AS count
FROM tablename GROUP BY name ORDER BY count DESC
And if you wanted a count of NON-NULL values of a given field you would do
Code:
SELECT name, COUNT( fieldname ) AS count
FROM tablename GROUP BY name ORDER BY count DESC
But using COUNT(*) simply counts *ALL* records. Period.
***********
Having cleared that up...
Quote:
The background is, I have a field in my table that contains URL's. There are plenty that are in the same domain...ie www.codingforums.com, however, they all contain different points of reference after the .com. More specifically, my question is I want to search for number of occurrences for the domains. They all should begin either http:// or https:// but not all would have the www. and such.
So, more simply stated, you want to group by domain name alone. Yes?
There's no simple way build into MySQL. It's true that MySQL has regular expressions, which would seem the ideal way, but unfortunately they are limited to use with the RLIKE operator.
So you could easily search for all occurrences of codingforums.com for example, but FINDING all the domains is much trickier.
A lot of it will depend on how "regular" the contents of that field are. Does the field *ONLY* contain URLs? Or does it contain URLs embedded within text? e.g.,
*IF* all the fields contain *ONLY* URLs of the form
Code:
http(s)://(www.)xxxxx.yyy/anything
Then I think we can do this. But what about a url such as http://download.microsoft.com? That is, a subdomain?
Are you worried about them? I think it would be tough to create a simple expression to look for those, as well.
Ignoring subdomains, this SEEMS to work:
Code:
SELECT
SUBSTR(
REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''),
1,
LOCATE('/',REPLACE(REPLACE(REPLACE(CONCAT(url,'/'),'http://',''),'https://',''),'www.',''))-1
) AS domain,
COUNT(*) AS howmany
FROM urls
GROUP BY domain
ORDER BY howmany DESC
But *ONLY* if your URL field contains urls alone, no other text.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.
First, we concat a '/' to the end of the URL, just in case there is none there already.
Then we replace 'http://' and 'https://' and 'www.' with nothing.
That gets us down to something like codingforums.com/gobbledygook/
So then we use LOCATE to find the first '/' in that (which MIGHT be the one we added at the first step).
And then we use SUBSTR to get all the characters up to but not including that first '/'.
Make sense?
It does make sense, however I double checked my table and it does contain subdomains as well. Perhaps if it were possible to also ignore the first part of the subdomain as we did with the www., then as long as I could display the entire field at the end, I think I would get good results. I don't see any duplicate domains that have different sudomains....it seems if I have download.microsoft.com, that is all I have, I don't have ie.microsoft.com as far as I can tell.
If it's not obvious, that will collect subdomains separately from their domains.
That is, you might get something like this:
Code:
microsoft.com 119
download.microsoft.com 17
Ok, got notified of this reply as I was replying. If the query you posted returns results like you have above, then that would be exactly what I would want to see.
Ok, got notified of this reply as I was replying. If the query you posted returns results like you have above, then that would be exactly what I would want to see.
?? You are saying it is okay to get those subdomains separately? In that case we are done.
__________________
An optimist sees the glass as half full.
A pessimist sees the glass as half empty.
A realist drinks it no matter how much there is.