mirror of
https://github.com/flokoe/bash-hackers-wiki.git
synced 2024-11-01 06:53:05 +01:00
====== Using ''awk'' to deal with CSV that uses quoted/unquoted delimiters ======

---- dataentry snipplet ----
snipplet_tags : awk, csv
LastUpdate_dt : 2010-07-31
Contributors : SiegX (IRC)
type : snipplet
----

CSV files are a mess, yes.

Assume you have CSV files that use the comma as the delimiter and quoted data fields that can themselves contain the delimiter.

<code>
"first", "second", "last"
"fir,st", "second", "last"
"firtst one", "sec,ond field", "final,ly"
</code>

Simply using the comma as the field separator for ''awk'' won't work here, because the quoted fields contain commas too.
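
To make the problem concrete, here is a minimal sketch that splits the second sample line on a plain comma. The variable names (''line'', ''naive_nf'', ''naive_first'') are only illustrative and not part of the original snippet.

```shell
# Illustrative only: split one sample line on a bare comma.
line='"fir,st", "second", "last"'

# Number of fields awk sees with FS=","
naive_nf=$(printf '%s\n' "$line" | awk -F, '{ print NF }')

# The first field as awk sees it
naive_first=$(printf '%s\n' "$line" | awk -F, '{ print $1 }')

echo "fields: $naive_nf"       # fields: 4
echo "first:  $naive_first"    # first:  "fir
```

The line holds three data fields, but ''awk'' reports four, because the comma inside the first quoted field is treated as a separator as well.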

Solution:
Use the field separator ''", "|^"|"$'' for ''awk''.

This is an OR-ed list of 3 possible separators:
|''", "''|matches the area between the data fields|
|''^"''|matches the area left of the first data field|
|''"$''|matches the area right of the last data field|
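
The effect of the three alternatives can be made visible with a small sketch that prints every field in brackets. The names ''line'', ''sep'' and ''fields'' are assumptions made for this illustration only.

```shell
# Illustrative only: show which fields the three separators produce.
line='"first", "second", "last"'
sep='", "|^"|"$'    # the field separator described above

# Print each field in brackets so that empty fields stay visible.
fields=$(printf '%s\n' "$line" |
  awk -v FS="$sep" '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }')

echo "$fields"      # [][first][second][last][]
```

The leading ''^"'' and the trailing ''"$'' each leave an empty field at the edge of the record, so the actual data sits in ''$2'' up to ''$(NF-1)''.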

You can tune these separators if you have other needs (for example, if there is no space after the commas).
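
As a sketch of such tuning, the first example below drops the space from the middle alternative. The second one, which allows any number of spaces after the comma, is an assumption added for illustration and not part of the original snippet. The variable names are hypothetical as well.

```shell
# Illustrative only: the same kind of data without a space after each comma.
line='"fir,st","second","last"'

# Middle alternative changed from '", "' to '","'.
tight=$(printf '%s\n' "$line" |
  awk -v FS='","|^"|"$' '{ print $2 "###" $3 "###" $4 }')
echo "$tight"    # fir,st###second###last

# Assumed variant: ' *' accepts zero or more spaces after the comma.
loose=$(printf '%s\n' '"a,b",   "c"' |
  awk -v FS='", *"|^"|"$' '{ print $2 "###" $3 }')
echo "$loose"    # a,b###c
```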

Test:

The ''awk'' command below prints the three data fields of the CSV above, separated by ''###'', to show what's going on (''$1'' is the empty string before the leading quote, so the data starts at ''$2''):
<code>
$ awk -v FS='", "|^"|"$' '{print $2"###"$3"###"$4}' data.csv
first###second###last
fir,st###second###last
firtst one###sec,ond field###final,ly
</code>

**ATTENTION** If the CSV data does not use a consistent format (for example, if it quotes data fields only when needed, not always), this approach will not work.
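
A minimal sketch of that failure, using a made-up line in which only the middle field is quoted (the line and the variable names are assumptions for illustration):

```shell
# Illustrative only: a line that quotes a field only where needed.
line='first, "sec,ond field", last'

# Same separator as above; print the field count and the usual three fields.
mixed=$(printf '%s\n' "$line" |
  awk -v FS='", "|^"|"$' '{ print NF ": " $2 "###" $3 "###" $4 }')

echo "$mixed"    # 1: ######
```

None of the three alternatives matches anywhere in this line, so ''awk'' sees the whole record as a single field and ''$2'' to ''$4'' come out empty.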