bash-hackers-wiki/docs/snipplets/awkcsv.md
2024-10-18 17:57:25 -05:00

47 lines
1.3 KiB
Markdown

---
tags:
- awk
- csv
---
# Using `awk` to deal with CSV that uses quoted/unquoted delimiters
CSV files are a mess, yes.
Assume you have CSV files that use the comma as delimiter and quoted
data fields that can contain the delimiter.
"first", "second", "last"
"fir,st", "second", "last"
"firtst one", "sec,ond field", "final,ly"
Simply using the comma as separator for `awk` won't work here, of
course.
Solution: Use the field separator `", "|^"|"$` for `awk`.
This is an OR-ed list of 3 possible separators:
| | |
|--------|----------------------------------------------|
|`", "` | matches the area between the datafields|
|`^"` | matches the area left of the first datafield|
|`"$` | matches the area right of the last data field|
You can tune these delimiters if you have other needs (for example if
you don't have a space after the commas).
Test:
The `awk` command used for the CSV above just prints the fileds
separated by `###` to see what's going on:
$ awk -v FS='", "|^"|"$' '{print $2"###"$3"###"$4}' data.csv
first###second###last
fir,st###second###last
firtst one###sec,ond field###final,ly
**ATTENTION** If the CSV data changes its format every now and then (for
example it only quotes the data fields if needed, not always), then this
way will not work.