...

View Full Version : Convert TSV to CSV for import- Is it necessary??



treetops
06-18-2011, 09:05 AM
Hello

I paid a guy to fix a data import script my site uses, as we use a different provider for the data now.

The old data we imported (which is very similiar) was in a CSV (comma seperated), whereas the new data is provided to us in a TSV (tab seperated)

He spent a day figuring out how to convert the tsv to a csv, and now i notice that somewhere in that process, data is getting lost, or fields are getting mixed up. Therefore I'm asking, is it necessary to convert the data ? or can the data be imported straight from a TSV?

Here is a sample of the code:
system("cat {$tmp} | tr '\\t' ',' > {$fixtures}caravans.csv"); // convert TSV to CSV

Any help would be appreciated, as although the script is importing, and therefore working to some degree, some of the data is missing and i can only think it must be in the conversion, as when i open the original tsv in excel, the missing data is there.

Cheers for any help,
Tree

Old Pedant
06-18-2011, 09:59 PM
Well, not to ask a silly question, but... How do you then import the CSV file??

You gave zero details of that process.

Old Pedant
06-18-2011, 10:02 PM
Oh, and I'm sure you are right about the conversion process having a bug.

CSV files need "..." around string values, in case there are any commas in the fields themselves.

That is:


"Jones, Bob", "Programmer", "82750"

TSV files do *NOT* normally include those quotes:

Jones, Bob Programmer 82750

So if indeed any of your TSV fields have commas in them, the conversion that you showed us will produce too many fields in the CSV file.

treetops
06-19-2011, 12:39 AM
Oh, and I'm sure you are right about the conversion process having a bug.

CSV files need "..." around string values, in case there are any commas in the fields themselves.

That is:


"Jones, Bob", "Programmer", "82750"

TSV files do *NOT* normally include those quotes:

Jones, Bob Programmer 82750

So if indeed any of your TSV fields have commas in them, the conversion that you showed us will produce too many fields in the CSV file.

Luckily, the tsv file does have quotes around long bits of text, so thats not causing a problem. However, I have resolved it by removing his convert tsv to csv, and doing the conversion myself in excel, which seems to do a better job. Thanks for the help,

kunz
08-18-2011, 12:55 PM
Why do people even bother with TSV?

I always end up having issues with them..

guelphdad
08-18-2011, 04:21 PM
Why do you need to convert them to CSV for the import then? you can specify your separator upon import so can specify a TAB instead of COMMA as your separator for example.

Old Pedant
08-18-2011, 09:45 PM
Why do people even bother with TSV?


Possible reasons:

(A) Because you get TSV files from some outside source that you have no control of.

(B) Because they are easier to create than CSV [you don't have to put the quotes around strings that might have commas in them].

(C) Because they aren't a bother at all if you know what you are doing.

kunz
08-20-2011, 02:50 AM
Possible reasons:

(A) Because you get TSV files from some outside source that you have no control of.

(B) Because they are easier to create than CSV [you don't have to put the quotes around strings that might have commas in them].

(C) Because they aren't a bother at all if you know what you are doing.

Fair enough..your post does make sense.

I just find it easier to tell my clients to always use CSV - makes like so much easier when importing/exporting to other products as CSV is used everywhere.

kunz
08-29-2011, 02:40 PM
Couldn't edit my post above - so my apologies for a new post.

I've come across a clients TSV file (they have close to 100,000 of these files) where we were missing data when being imported.

After some analysis we found that some of these files had been edited over the years by humans who have mistakenly added spaces instead of TABS.

Such a headache :(

We're running it by a script that removes any occurrence of three or more spaces together and replacing it with a TAB instead. Is there a better way to do this programmatically before we import all that data into their new CRM?

Old Pedant
08-29-2011, 05:26 PM
After doing that, you might run it through a simple program (PHP, ASP, whatever) that simply counts the number of TABs in each line and tells you of any lines where the count is wrong. Then you could go hand edit those lines back to what they need to be.

Kevin Richards
08-30-2011, 05:35 AM
I've never had anything but problems using TSV files in the past but sometimes it's not always up to you on what to use I guess.

ekgrad
09-02-2011, 06:18 PM
Why mess with all this, just the CSVED package. Use it to open the csv file , change the delimiter and save it. Works like a breeze for me.

dhape
09-17-2011, 03:25 AM
Does it have to be a regex? Can just Parse the CSV using your fav csv library, and then rejoin using tabs?


require 'csv'

test = '"foo,bar,baz",one,two,three'
CSV.parse_line(test).join("\t")

"foo,bar,baz\tone\ttwo\tthree"

dhape
09-17-2011, 11:33 PM
.............



EZ Archive Ads Plugin for vBulletin Copyright 2006 Computer Help Forum