...

View Full Version : Complete Reorganization of Column/Row data



sglass
11-22-2011, 05:25 PM
I inherited a really disorganized database and I'd like to clean it up but am not sure what the best way would be. I am using SQL Maestro for MySQL. The data in each column/row is preceded with a prefix (N01, N02, N02, etc), but the data within the rows is not in order. I've created columns for each prefix, and am looking for the most efficient way to reorganize the data into the correct columns now without having to do 64x64 update statements.

Here is what I have to work with:

Original Table: Universal
UPC col1, col2, col3, N01, N02, N03, N04...N64
1234567890 N01, N02, N03
2345678901 N33, N04, N05
3456789012 N01, N03, N04

I'd like to make it look like:

UPC col1, col2, col3, N01, N02, N03, N04, N05...N33...N64
1234567890 N01, N02, N03
2345678901 N04, N05....N33...N64
3456789012 N01, N03, N04

guelphdad
11-22-2011, 06:09 PM
Um, no you wouldn't.
Look up database normalization so you understand the concept before going forward.

you shouldn't have to worry about putting the data in the table in order, you should be doing that in a query.

any time you find out you have a sequence of columns like you do N01... N02... N64 then your data is most likely not normalized.

sglass
11-22-2011, 06:48 PM
I am somewhat familiar with the concepts of normalization, but I don't think that is what I need here. Even if it IS what I wanted, it would not be possible until I get all this data into organized columns.

N01, N02, etc are just the prefixes needed with our end application. Each field will have unique data after the prefix.

Ideally, after getting the data into the correct columns I could eliminate the prefixes from the data itself and have them concatenated onto the data during the export process only.

BTW I'm working with 1 million plus rows here, so re-entering data is not an option. I mean....I'd rather do the 64X64 queries. LOL

Old Pedant
11-22-2011, 08:48 PM
Sorry, but I'm with GuelphDad on this.

But I can't make sense out of your first post.

Your two examples are essentially identical. They both have all those columns labelled N01 through N64.

And neither one seems to have any data in the Col1, Col2, Col3 fields.

Almost surely, the best organization would be something like this:


UPC | tag | data
1234567890 | N01 | whatever data goes with N01
1234567890 | N02 | whatever data goes with N02
...
1234567890 | N37 | whatever data goes with N37
...
1234567890 | N64 | whatever data goes with N64

But unless we see the real table design and data, it's hard to know for sure.

sglass
11-23-2011, 01:47 PM
Sorry for the improper example. Hope this gives a better view of the deplorable state of this data:

UPC | A | B | C
1234567890 | N01 Green Beans | N03 12 oz. | N02 French Style
4564567897 | N01 Blue shirt | N02 XXL | N32 Hanes
5645613232 | N01 Frozen Dinner | N03 16 oz. | N21 Micro 6 minutes
7878454566 | N34 Famous Movie | N57 PG-13 | N55 synopsis

but would like it to be as such:

UPC | N01 | N02 | N03 | N32 | N34
1234567890 | Green Beans | French Style |12 oz. | |
4564567897 | Blue shirt | XXL | | Hanes |
5645613232 | Frozen Dinner | |16 oz. | |
7878454566 | | | | | Famous Movie

guelphdad
11-23-2011, 02:24 PM
And what we are saying is you don't want that, it will be much too difficult to effectively query your table beyond a trivial SELECT * from the table.

sglass
11-23-2011, 02:40 PM
On the contrary, I KNOW this is what I want. N01 is the shortcode for Product, N02 for product description, N03 size, N33 Artist, etc. They really should have been the column headers from the get go, but the person before me underestimated how large the database would get, and the need to actually organize it properly. Now I'm left with a big problem.

Right now it is impossible to find stuff. Some Instructions (N21) are in column F and some in column P, some have product size (N03) in column B others have product description (N02) in column B. Basically, previously they filled in the data across the rows without regard to column organization at all.

Perhaps there is no easy way to clean it up. :(

Old Pedant
11-23-2011, 09:45 PM
Sorry, but we *STILL* have to disagree with you.

There *ARE* reasons that people invented NORMALIZATION. People much more expert than GuelphDad or I. And both GuelphDat and I have surely been "bitten" way too many times with bad DB designs that we had to clean up.

If you organize the data the way you are suggesting, many many queries will be more difficult and/or slower to perform.

Anyway, whether you do it your way or listen to us, what you want is not particularly hard. Just time consuming.

You could either do this by writing a dirt simple PHP/ASP/JSP/whatever program or you could do it using a MySQL stored procedure.

You *would* need to do it by copying the existing data into a new table (or, if you listen to us, pair of tables). But the code is pretty trivial.

I don't code in PHP, though I probably could figure out how to do this one, but I'm not going to try. I'll write it in ASP code and you can translate.

The code assumes that you have already created the new table. It assumes that UPC is in the first column (column 0) and that all the other columns contain your "Nxx" stuff.



<%
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "...connection string to your DB..."

Set RS = conn.Execute("SELECT * FROM oldtable")
Do Until RS.EOF
SQL = "INSERT INTO newtable (UPC) VALUES('" & RS("UPC") & "')"
conn.Execute SQL
For fldnum = 1 To RS.Fields.Count
data = RS(fldnum)
ns = Left(data,3) ' will be N01, etc... but let's make sure:
If Left(ns,1) = "N" AND IsNumeric(Mid(ns,2)) Then
SQL = "UPDATE newtable (" & ns & ") VALUES('" & Trim(Mid(data,4)) & "')"
conn.Execute SQL
End IF
Next
RS.MoveNext
Loop
%>

That's the simplest coding. With a little more work, we could make it much more efficient. But as I doubt you would be using ASP, I won't bother writing it.

It would probably be most efficient to translate this into a MySQL Stored Procedure.

*************

If you follow our advice, you would instead do something like this:


<%
Set conn = Server.CreateObject("ADODB.Connection")
conn.Open "...connection string to your DB..."

Set RS = conn.Execute("SELECT * FROM oldtable")
Do Until RS.EOF
SQL = "INSERT INTO newtable (UPC) VALUES('" & RS("UPC") & "')"
conn.Execute SQL
For fldnum = 1 To RS.Fields.Count
data = RS(fldnum)
ns = Left(data,3) ' will be N01, etc... but let's make sure:
If Left(ns,1) = "N" AND IsNumeric(Mid(ns,2)) Then
SQL = "INSERT INTO nsTable (UPC,NUM,VALUE) " _
& " VALUES('" & RS("UPC") & "'," & Mid(ns,2) & ",'" & Trim(Mid(data,4)) & "')"
conn.Execute SQL
End IF
Next
RS.MoveNext
Loop
%>


That assumes your nsTable looks like this:


CREATE nsTable (
UPC varchar(50) REFERENCES newtable(UPC),
NUM int,
VALUE varchar(500),
PRIMARY KEY (upc,num)
);

Using a composite primary key is a good idea, but if you do a lot of searching by NUM then you might be better off with a separate key on that column (non-unique, of course).

sglass
11-25-2011, 01:37 AM
Thanks for the advice. I'll have to address this with my boss.



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum