PDA

View Full Version : Data Normalization : practical help needed (theory is understood)


mOrloff
03-25-2010, 05:48 PM
I have a huge table (1.7M records) which is just a big bucket, and I want to break it up into a more optimized structure (this one table (products) is to be divided into 9 tables (products, categories, vendors, etc...)).

I can write a PHP script to grab each row and write the data into the new tables, but that seems like the long way about it.

How should this be done ??

~ Mo

Old Pedant
03-25-2010, 07:38 PM
Why not do it all with MySQL queries???

For example:

CREATE TABLE categories
(
catid INT auto_number primary key,
catname VARCHAR(80)
);
INSERT INTO categories( catname )
SELECT DISTINCT categoryName
FROM products;

CREATE TABLE vendors
(
vendid INT auto_number primary key,
vendname VARCHAR(80)
);
INSERT INTO vendors( vendname )
SELECT DISTINCT vendorName
FROM products;


And do similarly for other tables.

Then, as the last step, you can do something like:

CREATE table newproducts
(
prodid INT auto_number primary key,
catid INT references categories(catid),
vendid INT references vendors(vendid),
xxx ,
yyy ,
...
);
INSERT INTO newproducts( catid, vendid, xxx, yyy, ... )
SELECT C.catid, V.vendid, P.xxx, P.yyy
FROM oldproducts AS P, categories AS C, vendors AS V
WHERE P.categoryName = C.catname
AND P.vendorName = V.vendname
...


???

mOrloff
03-25-2010, 08:43 PM
So, the INSERT INTO is self looping ??
Hey, if there's no need to expressly call for looping, that explains my roadblock :D
(For some reason, my brain sometimes envisions each and every little step, even if those are already taken care of for me :rolleyes:)

I'll pull up PHPMyAdmin and try it out (unless you had any other suggestions).

Thanks-a-bunch.
~ Mo

Old Pedant
03-25-2010, 10:00 PM
No, the SELECT is "self looping".

SELECT will select *ALL* the records you specify.

And then, yes, it "feeds" them into the INSERT INTO. But it's the SELECT that does the "looping". (Funny, I never think of it as "looping". I just think of it as creating a set of records--and then inserting the entire set into the other table--but of course you are correct, that internal to the query engine it has to be doing a loop.)

Old Pedant
03-25-2010, 10:02 PM
p.s.: Of course, you should do this all in a backup copy, until you have it working.

I'd just create a temporary "working" table with a couple of thousand records and then use it as your test bed.

Again, you can do that from your main table, thus:

CREATE testtable LIKE products;

INSERT INTO testtable
SELECT * FROM productes LIMIT 2000;

mOrloff
03-26-2010, 03:52 PM
Thanks once again.