View Full Version : Basic (hopefully) Question about MySQL

09-15-2006, 05:11 AM
Hi... I am a newbie at SQL, and I've been doing some tutorials and making some good progress with the technical side. However, I'm approaching MySQL from an XML standpoint, and there's one aspect of XML data structures that I can't figure out how to represent in MySQL. To be clear, let me represent the relationships I want in XML:


Basically there's 2 parts to my question:

1. If each of these XML tags represents a field in MySQL (fname, lname, etc), then how can I represent the data structures WITHIN the "email" field?

2. Let's say that EVERY other "employee" has only 2 email addresses, but this jerk "Bob" has 15. It seems really silly to have to create a ton of extra "email" fields just for this one guy... and besides, what if someone else comes along with 30 email addresses?

This is the one thing I can't find in any MySQL tutorials... any help is appreciated! Thank you!

09-15-2006, 10:04 AM
it's not a real mysql issue...

i don't know your expertise, but i'm assuming you're not an expert on db-normalisation because of the number of emails question.
typically, in a normalised db, you would have a table with the employees data in (1 row for each employee), that has a primary key column (an numeric columntype, with a unique index created on it, and that is the autonumber column of your table)
you would then have another table with all email-details (one row for each email-address, or more precise, for each email-employee combination because it's very well possible you'll have an email address that is used by multiple employees) and that also has a primary key + that contains the PK of the employees table as a foreign key.
this way, it's doesn't maater how many emailaddresses the employee has, and you can simply get the employee and email-details by joining the tables on the PK - FK columns.

09-15-2006, 11:13 AM
Just to expand on what raf has said, you table structure would look something like

Table Employees
|EmployeeID|fname|lname |phone |
| 1 |Bob |Smith |388-2339 |

Table Email
|EmployeeID|Type | Email |
|1 |Home | something@yahoo.com |
|1 |Home | something@hotmail.com |
|1 |Work | somethingelse@yahoo.com |
|1 |Work | somethingelse@hotmail.com |

You could in theory have a table for email types aswell but if thats something that wont often change it may be simpler/better to use an enum field instead

09-15-2006, 02:44 PM
Yes, I am definitely NOT a DB expert anything! This is all coming as news to me...but it's very helpful too!

So let's say I want to obtain all the data pertaining to Mr. Bob Smith and his 10,000 email accounts... I would need to join the tables then right?

Also I'm not so clear on primary and (especially) foreign keys, but from what I've gathered, I don't think I'll be able to use foreign keys as I'm using MyISAM... would that be correct?

Thanks for all the help!

09-15-2006, 02:57 PM
yes you would performa a join, see the link to my site, there is an article on basic joins that should clarify things for you.

also you can use foreign keys, they are just not enforced when using myisam tables.

are you on a hosted server? If so then they may limit you to myisam tables, if you are hosting the database yourself then you can easily change the table type to innodb by altering your table as follows:

alter table tablename engine=innodb

09-15-2006, 04:54 PM
So let's say I want to obtain all the data pertaining to Mr. Bob Smith and his 10,000 email accounts... I would need to join the tables then right?
we'll, that depends.
for instance; consider a situation where you have a screen with all the employees and then a link like 'email this person'. In the querystring of that link, you can then add the PK from the employees-table for that employee. And to get the email(s), you then just run a simple select with a where clause like "select bla from email wher employeeID=$_GET['eid']"
I usually try to optimise my db-design for the application that is running on top of it and by cleverly using some drill-down logic an storing PK values is sessionvariables, i try to avoid making joins and subselects.
if you pass on info about employees or emails or ... then you should be able to pass it by it's PK instead of using the employees name etc. Names and emailadresse are just descriptive data so if you need to get Bob Smith's emailadresses, then in almost every possible situation, you should already know Bob Smiths employeesID at that point.

but joining tables is not realy difficult and, as long as your tables aren't to big and if the columns you join on are indexed, it doesn't really have a big effect on your db-performance. so you should realy look into it and use them whenever they are the best sollution.

09-15-2006, 05:06 PM
To give an idea of what BIG is, (because some people are under the impression that 50,000 rows is big) we are talking in the tens of millions of rows.

properly indexed tables, properly tuned database would give you know problems retrieving information.

Have a look at the Partners list on mysql.com website. Those are some pretty impressive companies, you know they wouldn't be using mysql if it couldn't handle huge applications seemlessly.

Mind you at large sizes you may also consider computer performance on top of everything else.

09-16-2006, 05:19 PM
Thanks everyone!

I actually managed to get my DB up and running!
I see now it was a simple question, but the problem is that there are tons of tutorials out there for the technical side of things, and almost none that tell you about database theory and how (and when) to arrange your data in certain ways.

I'm sure that larger databases (even 50,000 lines) would be more challenging to deal with (mine is only 2 tables of about 1000-2000 lines each), but for now I got done what I wanted!

Thanks a lot!

09-16-2006, 08:12 PM
did you index your table? If you didn't you should do so on the columns you query most or ones you will join on and you will see query performance increase.

remember that indexes speed up searches and sorts but slow down inserts and updates so you don't want to overdo the indexes.

09-17-2006, 07:16 AM
Well, I tried it after I saw this posting, but it started returning my data out of order... maybe I have something wrong in the coding, but this wasn't the functionality I wanted (and I didn't notice any significant performance increase), so I removed the indices. Perhaps my queries aren't complex enough to need them. I'll keep that in mind in the future though.

09-17-2006, 03:16 PM
I'd say your queries are wrong then. indexes have nothing to do with the order your data is returned, they have to do with how fast your table is searched.

Without an index on last name, find my name in the phone book. You will have to look at every name on every page until you find it.

I will turn to 'G' and then look for 'Gu' etc. and find it significantly faster.

Perhaps you can show us one of your queries that you say did not work the same with an index on it.

09-18-2006, 12:03 AM
I understand that, but nonetheless, turning the index on for a given column results in a different order than having it off. Perhaps it is because the data is not English characters...?

But then now is the part when I have to come clean and say that the "employee data" was just an analogy for the sake of simplicity. My database actually indexes Chinese characters.

An example of one of my queries is selecting all characters from the table:

"SELECT kanji FROM kanjitable"

Sure enough, this will produce different results depending on whether or not an index is turned on in the "kanji" field. You can see for yourself at my site, which is now up and running:


click on the "all" button... this is the actual "order" of the characters... the first 10 are the characters for 1-10, etc... these will all become jumbled if the index is turned on. It works great as it is now, though, so no problems!

09-18-2006, 01:44 AM
so then I have to ask you are you using the proper collation for your table? collation type determines how your characters are sorted. are you even using the correct character set? the default character set and collation type are latin 1. since you are not using this character set you should specify not only the chracter set but also the collation type.

you will note in your example above you are not using an ORDER BY clause at all so you are trying to determine the order based on how you have entered your rows in this table.

that is a false assumption since the select statement can return your data in any order at all and not the order you have entered them in the table. you need to specifically have an ORDER BY clause. since you don't, and say you have a different order when the index is on then I will almost guarantee you are not using the proper collation type for your table.

and yes, you should explain your situation along with your original example. had you mentioned that at the beginning of the thread you'd have had the information about character sets and collation types sooner, rather than after the fact.

09-18-2006, 07:16 AM
I think I see what the problem is now. Turning on indexing has the same effect as an ORDER BY clause, as the data has to be in order to allow MySQL to "jump around" (i.e. your example of finding "Gu" by starting at "G").

The problem is that I am using a utf8 character set, and there apparently isn't any standard collation method for higher-order utf8 characters (i.e. non-European languages, or perhaps just the Chinese character subset in particular). This really doesn't come as any suprise to me, as collating these characters would require determining the order of something like 50,000 characters for Chinese characters alone, none of which have anything other than an arbitrarily set order, as they are not alphabetic.

There are only 2 collation types for the utf8 character set which do not apply only to European languages, "utf8_general_ci" and "utf8_unicode_ci". Neither of these apparently collate Chinese characters properly.

You're right that the characters just so happened to be in the order in which I entered them, however the point is that this is the order in which I need them, not the jumbled order that indexing or a GROUP BY command returns.

Also, remember that my original post had nothing to do with character sets or collation. I purposely left that part out to pinpoint the information I needed and avoid confusion (as it has clearly caused).

09-18-2006, 02:00 PM
You're right that the characters just so happened to be in the order in which I entered them, however the point is that this is the order in which I need them, not the jumbled order that indexing or a GROUP BY command returns.

When you query your table without an order by clause you are not guaranteed to receive your results in the order you have entered them into your table. this is especially true if you delete any items from your table.

rows can be inserted anywhere into the table and the database decides where to insert them. You are not working with a flat file system.

You may not be aware of that information that is why I am passing it along as you seem to be relying on your data always being in the same order when you retrieve it from your table.

Also, remember that my original post had nothing to do with character sets or collation. I purposely left that part out to pinpoint the information I needed and avoid confusion (as it has clearly caused).

No worries, but it is important to you that you knew about the information. Now you do.

09-18-2006, 05:57 PM
Yes, in my case I think I am lucky as the data itself is not going to change, and I won't be performing any deletion of rows. It only stands to reason that without an ORDER BY clause the server will return the data as it sees fit. If things start getting wacky with the ordering then I may need to "ORDER BY id".

But I think the important information for people to know is that if you use an ORDER BY command on varchar data using higher-order utf8 characters... it may not work! This could be considered a bug, but I think it's more of an inevitability of languages with complicated writing systems.

Anyway, it took a lot of research as well as trial and error to figure this out, so keep it in mind!