View Full Version : Subquery to Update one table based on different table
StupidRalph
02-24-2008, 08:37 PM
Need help writing a query (subquery) to help me normalize data. I have two tables:
catalog_product
catalog_artist
Currently, catalog_product table contains the artist's first and last name of which is rather redundant. So I created a catalog_artist table with each distinct artist and assigned them an artist id. I just need to UPDATE the catalog_product table to have the corresponding artist_id thats in the catalog_artist table.
catalog_product
CREATE TABLE `catalog_product` (
`artist_id` mediumint(8) unsigned NOT NULL,
`proper_first` varchar(150) default NULL,
`proper_last` varchar(150) default NULL,
`all_keywords` text,
`cat_subject` varchar(255) default NULL,
PRIMARY KEY (`publisher`,`number`),
CONSTRAINT `FK_catalog_products` FOREIGN KEY (`PUBLISHER`) REFERENCES `catalog_publisher` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 MAX_ROWS=250000 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
I've created a `catalog_artist` table to
catalog_artist
CREATE TABLE `catalog_artist` (
`id` mediumint(8) unsigned NOT NULL auto_increment,
`proper_last` varchar(150) default NULL,
`proper_first` varchar(150) default NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 CHECKSUM=1 DELAY_KEY_WRITE=1 ROW_FORMAT=DYNAMIC
This is the incorrect syntax for what I'm trying to do.
UPDATE catalog_product
SET catalog_product.artist_id = catalog_artist.id
WHERE catalog_product.proper_first = catalog_artist.proper_first
AND catalog_product.proper_last = catalog_artist.proper_last;
Brandoe85
02-24-2008, 10:06 PM
Use a join statement like any other query, but with a little syntax tweaking your query should work:
UPDATE catalog_product INNER JOIN catalog_artist ON catalog_product.proper_first = catalog_artist.proper_first AND catalog_product.proper_last = catalog_artist.proper_last SET catalog_product.artist_id = catalog_artist.id
Good luck
StupidRalph
02-24-2008, 11:45 PM
Thanks Brandoe! I actually had something similar but aborted it b/c it was running for a few minutes. But I have 14660 different artist being associated with 157,000+ products.
I'm going to let this run until it finishes and check out the results.
StupidRalph
02-25-2008, 07:22 PM
Note: Brandoe I'm just outlining my steps I've taken to solve my problem in case someone else runs into a similar problem and happens to find this thread. :thumbsup:
Okay, so I after letting the query run for more than two hours and then falling asleep only to awaken to see it still running I aborted the query again to hopefully add an index on the proper_first and proper_last columns. So I dropped the autoincrementing `id` index from the catalog_artist and made a composite primary key out of proper_first and proper_last. I then went to add the reference for the catalog_product table. At first I kept getting the (Cannot add or update a child row: a foreign key constraint fails) error message, so I had to run the following query to find out what was in the child table (catalog_product) but was not in the parent table (catalog_artist).
SELECT DISTINCT
cp.proper_first,cp.proper_last
FROM catalog_product cp
LEFT JOIN catalog_artist ca ON
(cp.proper_first = ca.proper_first AND cp.proper_last = ca.proper_last)
WHERE (cp.proper_last AND cp.proper_first) IS NOT NULL
AND
(ca.proper_last AND ca.proper_first)
IS NULL;
This returned all of the failed constraints. I then noticed that it failed on names that contained a comma. During the LOAD LOCAL DATA INFILE command it truncated names after the comma b/c that was what terminated a field EVEN THO they were enclosed in quotations.
#For example if in the .csv file I had:
"Wyatt , Jr","Norman"
#It would become
+-------------+--------------+
| proper_last | proper_first |
+-------------+--------------+
| Wyatt | Norman |
+-------------+--------------+
#When it should have been
+-------------+--------------+
| proper_last | proper_first |
+-------------+--------------+
| Wyatt, Jr | Norman |
+-------------+--------------+
So I corrected all of the mistakes and successfully added the foreign key.
Now I should be able to run the same query a lot faster now since it will be using the indexes.
I'll post back any follow up.
StupidRalph
02-25-2008, 07:44 PM
Success! It updated all of the `artist_id` columns in the `catalog_product` table.
It took (732073 ms) opposed to last night which ran for an estimated 11 hours. Now I can drop the proper_first and proper_last index and once again make the auto-incrementing `id`field the primary key. And also, drop the proper_first and proper_last columns out of the catalog_product table and solely use the artist_id foreign key.
But I can't help but think that it should have been faster since I used this:
EXPLAIN SELECT * FROM catalog_product
INNER JOIN catalog_artist
ON catalog_product.proper_first = catalog_artist.proper_first
AND catalog_product.proper_last = catalog_artist.proper_last;
Which is similar to the UPDATE query except its a SELECT statement and it did use a key for the artist table...any ideas there?
+----+-------------+-------+------+-------------------+-------------------+---------+------------------------------------------------+-------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+------+-------------------+-------------------+---------+------------------------------------------------+-------+-------------+
| 1 | SIMPLE | ca | ALL | PRIMARY | NULL | NULL | NULL | 15215 | |
| 1 | SIMPLE | cp | ref | FK_catalog_artist | FK_catalog_artist | 906 | tf_prod.ca.proper_last,tf_prod.ca.proper_first | 4 | Using where |
+----+-------------+-------+------+-------------------+-------------------+---------+------------------------------------------------+-------+-------------+
Nevermind, it was the * in the SELECT query that wouldn't allow the artist table not to use its primary key since only proper_first and proper_last are a part of the primary key.
EXPLAIN SELECT cp.proper_first, cp.proper_last, ca.proper_last, ca.proper_first FROM catalog_product cp
INNER JOIN catalog_artist ca
ON cp.proper_first = ca.proper_first
AND cp.proper_last = ca.proper_last;
The above query will use both indices.
+----+-------------+-------+-------+-------------------+-------------------+---------+------------------------------------------------+-------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+-------------------+-------------------+---------+------------------------------------------------+-------+--------------------------+
| 1 | SIMPLE | ca | index | PRIMARY | PRIMARY | 904 | NULL | 15215 | Using index |
| 1 | SIMPLE | cp | ref | FK_catalog_artist | FK_catalog_artist | 906 | tf_prod.ca.proper_last,tf_prod.ca.proper_first | 4 | Using where; Using index |
+----+-------------+-------+-------+-------------------+-------------------+---------+------------------------------------------------+-------+--------------------------+
vBulletin® v3.8.2, Copyright ©2000-2012, Jelsoft Enterprises Ltd.