PDA

View Full Version : Designing a Normalized Database


atheistrical
10-13-2009, 07:02 PM
I am in a fix on how to crack this database design nut. I believe that all you coders are gonna love this.

I have a class of students whose grades are to be recorded. There are 52 distinct types of grades that a single student can get. A single student can get plural types of grades. Also a student can get same type of grade multiple number of times or not receive any grades at all. Consider the following sample:

Student A gets Type I Grade 3 times, Type II Grade 1 time
Student B gets No Grade at all
Student C gets Type II Grade twice, Type XII Grade 5 times
etc, and so forth

I have a report based on grades to be displayed by querying this database, example report may be:

Names of students that received no grade at all
Names of students that received Type II Grade once
Names of students that received Type VI Grade more than once
etc and so forth

What are your suggestions??

Old Pedant
10-13-2009, 09:00 PM
Well, my first suggestion would be that you do your own homework.

We don't *DO* homework. See the rules of this forum.

You show that you have made some effort. Show a DB design, show some queries that you have tried and show in what ways the worked and what ways they failed. Then we can help you out.

But we can't do your entire homework for you.

I will tell you that this DB design couldn't be much simpler. At most you would need 3 tables. For homework purposes, I would imagine the instructor will let you get away with only 2.

atheistrical
10-14-2009, 02:02 AM
My apologies! I admit my fault, I forgot to mention what I had already tried before posting this here. The following is what I cooked up:

Table : Grade_Times (Two Columns)

student_id (foreign key references student table)
grades (varchar type default null)

I used a single column to store both the grade code no as well as the factor (i.e. times that grade was received by a student)

Lets assume the following:

Student I (with student_id = 1) received grade code 5, 2 times
Student II (with student_id = 12) received no grade
Student III (with student_id = 43) received grade code 1, 1 time
etc and so forth

I use the following php code to enter the above data:


<php
connect_to_db(); //Function that connects to database
-------------
$grade1[0][0]=5; //Grade code for Student I
$grade1[0][1]=2; //Times received
$grade2[1][0]=NULL;//For Student II
$grade2[1][1]=NULL;
$grade3[2][0]=1;//For Student III
$grade3[2][1]=1;//Times received
$data1=serialize($grade1);
$data2=serialize($grade2);
$data3=serialize($grade3);
$query1="INSERT INTO Grade_Times(student_id,grades) VALUES (1,{$data1})";
$query2="INSERT INTO Grade_Times(student_id,grades) VALUES (12,{$data2})";
$query3="INSERT INTO Grade_Times(student_id,grades) VALUES (43,{$data3})";


Data retrieval was just vice-versa by using the unserialize() function of php and using array operations on the data retrieved after unserializing.

Old Pedant
10-14-2009, 05:33 AM
No, bad design. Terrible design, in fact. *NEVER* store multiple values into a single DB field. Especially delimited values. Most especially not values delimited by some serialization scheme that is not part of the DB in use.

And don't store non-existent data. No reason to record that student 12 has no grade. If you are going to do that, you should also record that student 1 has no grade I and no grade II and no grade III and.... And same for all the students. *SHUDDER*. That would get horribly ugly fast!

Also, I would have just kept each grade in a separate record. Much less trouble than trying to keep both the grade and a count for that grade. (If you use your scheme, any time you need to add a grade for a student, you have to first check if that student already has that grade and, if so, perform an UPDATE instead of an INSERT. Follow the KISS principle.)

And I don't think you should even try to associate an array in PHP (or any other language) with a table in the DB. Oh, there are occasions where that's useful, but clearly not for something this simple.

So I would have done just:
INSERT INTO Grades(student_id,grade) VALUES (1,5);
INSERT INTO Grades(student_id,grade) VALUES (1,5);
INSERT INTO Grades(student_id,grade) VALUES (43,1);

With no records at all for student_id 12, of course.

That would be the normal normalized way of things in a DB. (I could have said "usual normalized" but that's not as much fun.)

From that data, you can answer virtually any question needed.

I'll pick one of yours, the one you are most likely to struggle with (I think):

"Names of students that received no grade at all"
SELECT S.name
FROM Students AS S LEFT JOIN Grades AS G
ON S.student_id = G.student_id
WHERE G.grade IS NULL
ORDER BY S.name;

That could also be done via:

SELECT name
FROM Students
WHERE student_id NOT IN ( SELECT student_id FROM Grades )
ORDER BY name;

And a good DB will convert whichever of those is less efficient into the more efficient form, for you. Most likely, the LEFT JOIN for will be more efficient, but I wouldn't presume to make that a statement of fact.

Old Pedant
10-14-2009, 05:39 AM
Even in PHP, by the by, your code falls flat on its face.

How would you represent student_id 17, who has 3 Grade 17s, 2 Grade 5s, and 1 Grade 3???

Your scheme only allows for a single pair of numbers per array and doesn't show how to associate a given pair with a given student. Much less a *set* of pairs with a given student!

You'd need a THREE DIMENSIONAL array to do that:
$grades[student_id][grade][occurrences]

And even then it would be problematic to use that. How could you quickly find out if student_id 43 has any grade 4's??? You'd have to loop through the array looking for 43 in the first element *AND* 4 in the second element. Ugh.

This is what DBs were invented for and it's why DBs are superior to arrays in so many ways. But when you try to store an array in a DB, you tend to defeat all the relational advantages.

atheistrical
10-18-2009, 06:31 AM
Thank you for your feedback. I figured out from what we have discussed so far that the most likely database design would be:

2 Tables

table 1 : student (2 columns, student_id & name)
table 2 : grades (3 columns, grade_id, factor & student_id)

Samples of the above table would be:

table : student
student_id | name
~~~~~~~~~~~~~
1 | Jack
2 | John
3 | Amy

table : grades
grade_id | factor | student_id
~~~~~~~~~~~~~~~~~~~~~
5 | 1 | 1 (ie Jack)
3 | 3 | 1 (ie Jack)
2 | 2 | 3 (ie Amy)

The above table can be explained row-wise as: Jack received Grade 5, 1 time and Grade 3, 3 times. John received no grades at all. Amy received Grade 2, 2 times

Sample query would be:

Q: Who received no grades at all?
A: SELECT S.name FROM student AS S WHERE S.student_id NOT IN grades

I require your valuable feedback on this please!

Old Pedant
10-19-2009, 04:23 AM
I say, again, I think you are making a minor mistake in having the factor field.

As I noted before, if you do that, then every time you get a new grade report you have to *first* check to see if that same grade for that same student already exists in the table. If so, you do an UPDATE query on the table. If not, you do an INSERT query.

As simple as this table is and as little room/data as it takes to represent each grade, I would *NOT* use the factor field.

Instead, I'd have:

table : grades
grade_id | student_id
~~~~~~~~~~~~~~~~~~~~~
5 | 1 (ie Jack)
3 | 1 (ie Jack)
3 | 1 (ie Jack)
3 | 1 (ie Jack)
2 | 3 (ie Amy)
2 | 3 (ie Amy)

Now...my answer might be different if this was a huge DB and/or if you expected the number of READ requests of the data to outnumber the WRITEs by a large amount.

It's clearly not *wrong* in any sense to have the factor there; I just think it adds more work to what could be a dead simple design.

If you were to adopt your scheme, you'd *probably* want to implement a stored procedure that would accept a new grade (grade_id and student_id) and make the needed "if" test and then do the UPDATE or INSERT.

bazz
10-19-2009, 03:38 PM
I would think that, because you have a many to many relationship between students and grades, that you need a third table.



create table student
( id int not null auto_increment primary key
, name varchar(99)
/* other student data */
) bleh.....

1 | Jack
2 | John
3 | Amy

create table grades
( grade varchar(99) /* 99 may be excessive depending on what data you need there*/
) bleh....


create table students_grades
( grade | student_id
)

5 | 1 (ie Jack)
3 | 1 (ie Jack)
2 | 3 (ie Amy)



I have not done any of the other things that you would need to do with your tables such as clarifying the default charset or engine indexes and keys etc.

hth

bazz

Old Pedant
10-19-2009, 08:10 PM
Yes, except that his grade_id and grade_value are one and the same thing.

His grades table, if it existed, would be just

grade_id : grade_value
1 1
2 2
3 3
etc.

So why bother with the table?

bazz
10-19-2009, 08:16 PM
Ah. I was thinking of when he wants to input the data to the db through his 'control panel'. the grades table would enable a 'select' drop down to be populated reducing the need to continually alter his code if new grades were to be available. Just add them to the grades table.

bazz

atheistrical
10-21-2009, 02:37 AM
I say, again, I think you are making a minor mistake in having the factor field.

As I noted before, if you do that, then every time you get a new grade report you have to *first* check to see if that same grade for that same student already exists in the table. If so, you do an UPDATE query on the table. If not, you do an INSERT query.

As simple as this table is and as little room/data as it takes to represent each grade, I would *NOT* use the factor field.

Instead, I'd have:

table : grades
grade_id | student_id
~~~~~~~~~~~~~~~~~~~~~
5 | 1 (ie Jack)
3 | 1 (ie Jack)
3 | 1 (ie Jack)
3 | 1 (ie Jack)
2 | 3 (ie Amy)
2 | 3 (ie Amy)

Now...my answer might be different if this was a huge DB and/or if you expected the number of READ requests of the data to outnumber the WRITEs by a large amount.

It's clearly not *wrong* in any sense to have the factor there; I just think it adds more work to what could be a dead simple design.

If you were to adopt your scheme, you'd *probably* want to implement a stored procedure that would accept a new grade (grade_id and student_id) and make the needed "if" test and then do the UPDATE or INSERT.

I would ask now, how would you generate a report based on your suggested scheme?
Say, displaying a report that gives a class summary about grades which would display:

Grades Received by Jack are : Grade 5, 1 time and Grade 3, 3 times
Grades Received by Amy are : Grade 2, 2 times

Old Pedant
10-21-2009, 02:56 AM
Well, I showed you how to find those students with no grades at all in my 2nd post:
SELECT S.name
FROM Students AS S LEFT JOIN Grades AS G
ON S.student_id = G.student_id
WHERE G.grade_id IS NULL
ORDER BY S.name;


So a similar query:

SELECT S.name, G.grade_id, COUNT(*) AS howMany
FROM Students AS S LEFT JOIN Grades AS G
ON S.student_id = G.student_id
GROUP BY S.name, G.grade_id
ORDER BY S.name, G.grade_id

The will get you records such as:

name : grade_id : howMany
Amy : 2 : 2
Bob : null : 0
Jack : 5 : 1
Jack : 3 : 3

And then some simple coding in PHP (or ASP or JSP or whatever) will produce:

Grades Received by Amy are : Grade 2, 2 times
Grades Received by Bob are : NONE
Grades Received by Jack are : Grade 5, 1 time and Grade 3, 3 times

atheistrical
10-21-2009, 05:29 AM
Thank you Old Pedant for the excellent guidance. You really rock! I appreciate all of your posts. Thanks also to bazz for your inputs.