CSV File Handling
The software I’ve developed typically supports exporting data in CSV format, because third-party software can generally process such data with little effort. Processing comma-separated values (CSV) is fairly simple, and is probably used as a teaching example for tokenization in basic software development classes. The closest thing to a standard for CSV is RFC 4180, which provides some guidelines:
- where a field contains a line break, double quotes or a comma, the value should be enclosed in double quotes;
- where a field contains double quotes, each one must be escaped by preceding it with another double quote (to distinguish the quotes surrounding the field from the quotes within it).
Converting a field from its raw value to the form written to the file only requires a search for the characters that make quoting necessary. In C# it could look like this (adapted from a post on Stack Overflow):
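using System;

public static class CSV
{
    // Characters that force a field to be enclosed in double quotes.
    // RFC 4180 line breaks are CRLF, so '\r' is included alongside '\n'.
    private static readonly char[] QuotedCharacters = { ',', '"', '\r', '\n' };
    private const string Quote = "\"";
    private const string EscapedQuote = "\"\"";

    // Converts a raw value into the form written to the file.
    public static string Escape(string value)
    {
        if (value == null) return "";
        // Double every embedded quote: a"b becomes a""b.
        if (value.Contains(Quote)) value = value.Replace(Quote, EscapedQuote);
        // IndexOfAny returns -1 when no character matches, so any index >= 0 means quoting is needed.
        if (value.IndexOfAny(QuotedCharacters) >= 0)
            value = Quote + value + Quote;
        return value;
    }

    // Converts a single field, as read from the file, back into its raw value.
    public static string Unescape(string value)
    {
        if (value == null) return "";
        // Require at least two characters so a lone quote is not mistaken for a quoted empty field.
        if (value.Length >= 2
            && value.StartsWith(Quote, StringComparison.Ordinal)
            && value.EndsWith(Quote, StringComparison.Ordinal))
        {
            value = value.Substring(1, value.Length - 2);
            value = value.Replace(EscapedQuote, Quote);
        }
        return value;
    }
}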
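Unescape works on a single field that has already been separated from its neighbours. Splitting a record into fields is the actual tokenization step, and neither string.Split(',') nor reading the file line by line survives a quoted field that contains a comma or a line break. The following is a minimal sketch of a character-by-character tokenizer for the RFC 4180 rules above; the class name CsvParser and its Parse method are hypothetical, and it is lenient rather than validating (a stray quote inside an unquoted field simply switches to quoted mode instead of raising an error).

```csharp
using System.Collections.Generic;
using System.Text;

public static class CsvParser
{
    // Hypothetical helper: splits CSV text into records, each a list of unescaped field values.
    // Handles quoted fields containing commas, doubled quotes ("") and CR/LF line breaks.
    public static List<List<string>> Parse(string text)
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool atRecordStart = true; // true until a character of the current record is consumed
        int i = 0;

        while (i < text.Length)
        {
            char c = text[i];
            atRecordStart = false;
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < text.Length && text[i + 1] == '"')
                    {
                        field.Append('"'); // "" inside a quoted field is one literal quote
                        i += 2;
                        continue;
                    }
                    inQuotes = false; // closing quote of the field
                }
                else
                {
                    field.Append(c); // commas and line breaks are literal inside quotes
                }
                i++;
            }
            else if (c == '"')
            {
                inQuotes = true; // opening quote
                i++;
            }
            else if (c == ',')
            {
                record.Add(field.ToString()); // end of field
                field.Clear();
                i++;
            }
            else if (c == '\r' || c == '\n')
            {
                // End of record; CRLF is consumed as a single line break.
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
                if (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') i++;
                i++;
                atRecordStart = true;
            }
            else
            {
                field.Append(c);
                i++;
            }
        }

        // Flush the final record when the text does not end with a line break.
        if (!atRecordStart)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

Because the tokenizer strips the surrounding quotes and collapses doubled quotes itself, its output does not need to be passed through Unescape.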
RFC 4180 permits any field to be enclosed in double quotes, whether or not it needs them; the example above adds quotes only when necessary. Writing a sequence of records to disk can be done using the string.Join method (LINQ's Select requires C# 3, and the string.Join overload that accepts an IEnumerable<string> requires .NET 4 or later):
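// Assumes this method sits in the CSV class and the file has
// using System.Collections.Generic; and using System.Linq;
public static void WriteAllLines(string path, IEnumerable<string[]> values)
{
    List<string> lines = new List<string>();
    foreach (string[] line in values)
        lines.Add(string.Join(",", line.Select(CSV.Escape)));
    // File.WriteAllLines terminates each line with Environment.NewLine.
    System.IO.File.WriteAllLines(path, lines);
}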
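File.WriteAllLines ends each line with Environment.NewLine, which is CRLF on Windows but LF on Linux and macOS, whereas RFC 4180 specifies CRLF. The following is a minimal sketch of a writer that emits CRLF explicitly and can optionally quote every field (which, as noted above, the RFC permits). CsvWriter, EscapeField and the quoteAll parameter are hypothetical names; writing to a TextWriter rather than a path is a design choice that lets the same code target a file, a network stream or a StringWriter in tests.

```csharp
using System.Collections.Generic;
using System.IO;

public static class CsvWriter
{
    // Characters that force quoting under RFC 4180.
    private static readonly char[] SpecialCharacters = { ',', '"', '\r', '\n' };

    // Hypothetical helper: escapes one field; quoteAll = true encloses every field in quotes.
    public static string EscapeField(string value, bool quoteAll)
    {
        value = value ?? ""; // treat null as an empty field
        bool needsQuotes = quoteAll || value.IndexOfAny(SpecialCharacters) >= 0;
        return needsQuotes ? "\"" + value.Replace("\"", "\"\"") + "\"" : value;
    }

    // Writes each record followed by CRLF, independent of the platform's Environment.NewLine.
    public static void Write(TextWriter writer, IEnumerable<string[]> records, bool quoteAll = false)
    {
        foreach (string[] record in records)
        {
            for (int i = 0; i < record.Length; i++)
            {
                if (i > 0) writer.Write(',');
                writer.Write(EscapeField(record[i], quoteAll));
            }
            writer.Write("\r\n");
        }
    }
}
```

For example, `using (var w = new StreamWriter(path)) CsvWriter.Write(w, rows);` writes the records to a file, and passing `quoteAll: true` produces output in which every field is quoted.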