I have a number of large CSV files and would like them in TSV (tab separated format). The complication is that there are commas in the fields of the CSV file, eg:

 A,,C,"D,E,F","G",I,"K,L,M",Z

Expected output:

 A      C   D,E,F   G   I   K,L,M   Z

(where whitespace in between are 'hard' tabs)

I have Perl, Python, and coreutils installed on this server.

share|improve this question
    
I would do this with node.js or with perl. – peterh yesterday
1  
Replace non quoted commas with tabs... – cricket_007 yesterday
    
Yes, if I had more than 5 minutes to this question. But I will happily support the answerers with my votes. What I tried to say, that the common sed/awk things are probably ineligible for that (at least in their commonly used usage). – peterh yesterday
    
You've got a solution in a half hour... I think you could have solved it in around the same time. :-) Be more patient :-) – peterh yesterday
5  
I'm not sure if your example is representative of the actual data, but if those are going to be actual text strings then don't forget that you may need to handle the case where the string includes a tab... – A C yesterday
up vote 10 down vote accepted

One option might be perl's Text::CSV module e.g.

perl -MText::CSV -lne 'BEGIN { $csv = Text::CSV->new() }
  print join "\t", $csv->fields() if $csv->parse($_)
' somefile

to demonstrate

echo 'A,,C,"D,E,F","G",I,"K,L,M",Z' |
  perl -MText::CSV -lne 'BEGIN { $csv = Text::CSV->new() }
  print join "\t", $csv->fields() if $csv->parse($_)
'
A       C   D,E,F   G   I   K,L,M   Z
share|improve this answer
    
Wouldn't be correct if a field contains a tab – Neil McGuigan 13 hours ago
1  
This is not even correct. unix.stackexchange.com/a/359838/9452 should be accepted instead. – chx 12 hours ago

Python

#!/usr/bin/env python

import sys
import csv
from StringIO import StringIO

for line in sys.stdin:
    print('\t'.join(csv.reader(StringIO(line)).next()))

(Tested in Python 2... I'm probably misusing that StringIO function)

EDIT

Or better, as commented,

import csv, sys
csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))

Test runs

$ echo 'A,,C,"D,E,F","G",I,"K,L,M",Z' | ./csv2tab.sh                         
A       C   D,E,F   G   I   K,L,M   Z

$ ./csv2tab.sh < data.csv > data.tsv && head data.tsv                                                   
1A      C   D,E,F   G   I   K,L,M   Z
2A      C   D,E,F   G   I   K,L,M   Z
3A      C   D,E,F   G   I   K,L,M   Z
share|improve this answer
2  
You don't need the StringIO, just use sys.stdin directly: ideone.com/SAeeya – muru yesterday
1  
A possible bug: this answer doesn't escape internal tabs. – Morgen yesterday
4  
@Morgen csv.writer(sys.stdout, dialect='excel-tab').writerows(csv.reader(sys.stdin))? Eliminates the loop as well. – muru yesterday
1  
@chx try python -c 'import csv,sys; csv.writer(sys.stdout, dialect="excel-tab").writerows(csv.reader(sys.stdin))'. I doubt -m works that way. – muru yesterday

For fun, sed.

sed -E 's/("([^"]*)")?,/\2\t/g' file

If your sed doesn't support -E, try with -r. If your sed doesn't support \t for a literal tab, try putting a literal tab (in many shells, ctrl-v tab) or in Bash, use a $'...' C-style string (in which case the backslash in \2 needs to be doubled). If you want to keep the quotes, use \1 instead of \2 (in which case the inner pair of parentheses is useless, and can be removed).

This makes no attempt to handle escaped double quotes inside double quotes; some CSV dialects support this by doubling the quoted double quote (sic).

share|improve this answer
    
I think that i tried about 100 different sed scripts to achieve this one but all my attempts failed. This is awesome. – George Vasiliou 11 hours ago

Perl

perl -lne '
   my $re = qr/,(?=(?:[^"]*"[^"]*")*(?![^"]*"))/;
   print join "\t", map { s/(?<!\\)"//gr =~ s/\\"/"/gr } split $re;
'

Awk

awk -v Q=\" -v FPAT="([^,]*)|(\"[^\"]+\")" -v OFS="\t" '{
   for (i=1; i<=NF; ++i)
      if ( substr($i, 1, 1) == Q )
         $i = substr($i, 2, length($i) - 2)
   print $1, $2, $3, $4, $5, $6, $7, $8
}'

Result:

A               C       D,E,F   G       I       K,L,M   Z
share|improve this answer

The thermonuclear flyswatter solution must be using libreoffice. While https://ask.libreoffice.org/en/question/19042/is-is-possible-to-convert-comma-separated-value-csv-to-tab-separated-value-tsv-via-headless-mode/ suggests this is not possible but it is wrong (or just outdated?) and the following command works on my 5.3.:

loffice "-env:UserInstallation=file:///tmp/LibO_Conversion" --convert-to csv:"Text - txt - csv (StarCalc)":9,34,UTF8 --headless --outdir some/path --infilter='csv:44,34,UTF8' *.csv

the env argument could be skipped but this way the documents won't appear in your recent document.

share|improve this answer

Using csvkit (Python):

https://github.com/wireservice/csvkit

$ csvformat -T in.csv > out.txt

Does streaming, with correct CSV input and correct TSV output

share|improve this answer

Vim

Just for fun, regex substitutions can be performed in Vim. Here's a potential four line solution, adapted from: http://stackoverflow.com/questions/33332871/remove-all-commas-between-quotes-with-a-vim-regex

  1. Commas between quotes are first changed to underscores (or other absent character),
  2. All other commas are replaced with tabs,
  3. Underscores inside quotes are restored to commas,
  4. Quotation marks are removed.

    :%s/".\{-}"/\=substitute(submatch(0), ',', '_' , 'g')/g
    :%s/,/\t/g
    :%s/_/,/g
    :%s/"//g
    
share|improve this answer

Your Answer

 
discard

By posting your answer, you agree to the privacy policy and terms of service.

Not the answer you're looking for? Browse other questions tagged or ask your own question.