01-26-2013, 02:20 PM
I'm setting up basic tagging for a song list system, and thought some experienced input before I go delving into setting up the system would be beneficial.
I'm adding 50s, 60s, 70s, 80s etc, so users can select one to include in the search query.
I'm wondering which would be quicker for queries:
1. assign each decade a number eg 50s = 1, 60s = 2, and store the number in a `decades` column - queries would include eg - "AND `decades` = 3"
2. have a char(0) column for each decade, set default to null, and set the relevant column to an empty string, so a query for an 80s search would include "AND `80s` ='' "
Also, just wondering if the amount of columns in a table affect search speed? I have other tags to add (about 50), and if option 2 above is quicker, I was thinking of adding 50 columns using the char(0) method (of which up to four would be queried at any one time).
Thanks for your input.
01-26-2013, 06:06 PM
The first one is the only one of the two that make sense. It's scalable while the second is not and requires modification to both structure and queries in order to accommodate a new decade added (or more likely, an OLD decade added). I wouldn't suggest using either directly as you have here though; music has been around for a very long time. If I say I want from decade 2 for example, what year would that represent between? Only the 1970's? Or perhaps I want to pull from 1670 as well. Even just dealing with the past 100 years would have a problem (or you need to keep giving it new decade multipliers).
I would suggest using the entire publish year, and if you have an actual date for it, the datetime datatype would work for this. You can do some quick calculations based on the year to get just the "10" from 2013 for example. Or you can simply use BETWEEN syntax to find it.
If you really want to stick with a decade identifier, I'd suggest using a second table with that (a lookup table), or using an ENUM with the data you need. I like the lookup tables especially if you want to add more information. You'd simply use an identifier number (likely autonumber), the decade it represents (1980), and you could add some stuff like the type of music information if you really want.
Tagging systems themselves are simply a many to many relationship between a tag an and record. Decade tags carry no meaningful information on their own, and are equal in value to a genre tag. So I could tag a song with 1980, pop, rock for example, and can find that record by searching for any of the three tags, or refine as needed.
When you are designing, if you are in doubt, than lean in favour of data, not structure. It is far easier to manipulate data into a normalized structure than it is to hammer structure together that was built as afterthought.
01-27-2013, 06:26 AM
Fou-Lu - thank you for the very thorough reply.
I have a couple more questions if you wouldn't mind? Plus I'll further explain my set up.
The system is for displaying karaoke songs to singers on their mobile phones. There will be one drop down menu that allows them to select Genres, or decades or duets. So essentially, decades will become genres, starting at 'pre-50s', '60s' etc, as will duets.
The reason I'm piling them all in together is that there is limited space on mobile phones, and I am happy with adding just one more user input option to the screen to compliment the other essential input options I already have.
Although this limits the search options (eg all 80s songs), I think the simplicity of having less user input options to deal with makes sense, especially as people are usually drinking while operating!
From a karaoke host's point of view, giving your user's the option of choosing '80s songs', or 'duets', or 'love songs' is a huge step above having all those songs mixed up in one hard copy book, and from experience, I've found that people ask for those types of books anyway eg 'do you have a duets book' or 'do you have a love songs book' - they never ask for 'a rock song duets book from the 80s'.
So bearing that in mind, plus the following info below, what would be a better way to set up tagging?
Re the many to many relationship for tagging, in my system book owners can tag their songs independently from other owners. So the main library that contains all the songs has a many to many relationship with the book owners, but I don't want to set up a many to many tagging relationship between the main library and tags, as owners will tag the same song differently. That's why I want to have the tagging data attached to the table that holds the owners-songs relationship.
Other question is - you mentioned that option 2 isn't scalable? That concerns me, as I have that option in place to record which of 5 books (karaoke shows can have 5 books per account) a song belongs to. How is it not scalable?
eg table columns (where the book columns are char(0) recorded as either "" or null to reduce data size:
userID | songID | book_1 | book_2 | etc
Thanks again Fou-Lu
01-28-2013, 04:12 AM
Why do you insist on making the HUMAN interface match the DATABASE schema?
Just put the ACTUAL year of the song into a SMALLINT column (SMALLINT can go up to 32767).
Then, if the user select's "80s", you just do
... WHERE songYear BETWEEN 1980 AND 1989 ...
If you make the field songYear and index, that will be every bit as fast as using an artificial decades numbering system. Plus it gives you full flexibility, later, to show the full year the song was published/first appeared, if you want to. (And you probably will, I would bet.)
And why why why would you be concerned about data size?
Your (pardon me, but it's true) silly scheme of using "" or NULL will save you, at most, *FOUR* bytes per song. If you had a million songs in your database, that's only 4 Megabytes. *NOISE* on a modern disk drive, where we typically measure database size in GIGABYTES. You queries will be *MUCH* slower using your scheme than if you just used a BOOLEAN NOT NULL field (which is the same as a TINYINT NOT NULL field, by the by). Plus, if you needed you could index the field(s) for performance. Something you can't do with your scheme. But the primary reason for doing this is for ease of maintenance. Somebody can come along and do a DB dump and instantly see the purpose and values of all your data.
Personally, with MySQL I don't worry about database size until I start getting up over 10 to 20 GIGAbytes. (And even then, if there is a good reason for the size, I don't worry about it.) It's not like you have to store this data *ON* the mobile devices. It's all stored on the server, so who cares how big it gets, within reason? *PERFORMANCE* is much more important, in almost all cases. And if I can trade off disk space for performance, I'll do it almost every time.
01-28-2013, 04:15 AM
Why do you limit karaoke books to 5 per account? Seems like a horribly artificial limit, to me.
Have you heard of database NORMALIZATION? If you lifted the limit *completely* and normalized the DB, you'd have a system now and for the future. As it is, you are boxing yourself in a corner because of your perception of current devices, without thinking about the future when surely device screens are going to be larger and/or more flexible. Think iPad, and its kin, with adjustable screen resolutions with a pinch of the fingers.
01-28-2013, 06:38 AM
You say they never ask for 'a rock song duets book from the 80s'.
Maybe not, but it would be pretty easy to offer that as an option.
Say they first choose 80s. And then you use JS to change the question to "ALL/Rock/Country/Metal" (or whatever choices you offer). If they don't choose something else within NN seconds (you determine NN through testing), they get "ALL".
And then you use JS to change the question to "ALL/Solo/Duets". Again, if they don't answer withing NN seconds they get "ALL".
And then you start displaying the list.
3 major KINDS of choices all in the same "real estate" on the phone's web page.
Lots more service to your users, hopefully will attract even more users.
And the best part: If you auto-detect the screen size, then if the screen is big enough you show all 3 questions at the same time. You only use the one-at-a-time on truly small screens.
p.s.: I worked for a phone-apps company in the earlier days of not-so-smart phones and this is the kind of tricks we would use. Dynamic "forms", though we never called them forms. Oh...and our system was all based on MySQL, by the by.