Re: Fw: CSV Quote Escaping from Alan Painter on 2014-03-05 (public-csv-wg@w3.org from March 2014)

From: Alan Painter <alan.painter@gmail.com>
Date: Wed, 5 Mar 2014 09:24:34 +0100
To: public-csv-wg@w3.org
Cc: will.moss@airbnb.com
Message-ID: <CAN+GtW0F4YPNK81TQigq-MAsHVfPhjd_PvDLHEkw53bJ5H_0oA@mail.gmail.com>

Just a remark from experience:

It is indeed trivial to parse CSV with double-escaped quotes.  As Will
mentions, flip the escape flag or, put another way, use a two-state
automaton with 1-character lookahead and 1-character pushback.
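
To make that concrete, here is a minimal sketch of such an automaton in
Python. The name parse_csv and its parameters are hypothetical and not taken
from any spec or library. It assumes the whole input is already in memory,
so the lookahead is an index peek rather than a real pushback, and it simply
drops carriage returns outside quotes.

```python
def parse_csv(text, sep=",", quote='"'):
    """Parse CSV text into a list of rows, each a list of strings."""
    rows, row, cell = [], [], []
    in_quotes = False  # the single bit of state: inside or outside quotes
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                # One-character lookahead: a doubled quote is a literal quote.
                if i + 1 < n and text[i + 1] == quote:
                    cell.append(quote)
                    i += 1  # consume the second quote of the pair
                else:
                    in_quotes = False  # a lone quote closes the quoted cell
            else:
                cell.append(ch)  # separators and newlines are literal here
        elif ch == quote:
            in_quotes = True
        elif ch == sep:
            row.append("".join(cell))
            cell = []
        elif ch == "\n":
            row.append("".join(cell))
            rows.append(row)
            row, cell = [], []
        elif ch != "\r":  # simplification: ignore CR outside quotes
            cell.append(ch)
        i += 1
    # Flush the last record if the text does not end with a newline.
    if cell or row:
        row.append("".join(cell))
        rows.append(row)
    return rows
```

For Will's example, parse_csv('"abc""def","hij"\n') yields a single row with
the two cells abc"def and hij.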

Writing out CSV in this fashion is even easier: if the cell to be written
contains any quote, separator or line-ending characters, then replace each
quote character with a doubled quote and surround the whole cell with
quotes.
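
The writing rule just described can be sketched in a few lines of Python.
The names write_cell and write_row are hypothetical, and the CRLF line
ending is an assumption borrowed from RFC 4180 rather than anything required
here.

```python
def write_cell(value, sep=",", quote='"'):
    """Quote a cell only if it contains a quote, separator or line ending."""
    if any(c in value for c in (quote, sep, "\n", "\r")):
        # Double every quote, then wrap the whole cell in quotes.
        return quote + value.replace(quote, quote * 2) + quote
    return value


def write_row(cells, sep=",", line_end="\r\n"):
    """Join already-stringified cells into one CSV record."""
    return sep.join(write_cell(c, sep) for c in cells) + line_end
```

For example, write_cell('abc"def') gives "abc""def" (with the surrounding
quotes), while a plain cell such as hij is written unchanged.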

It's somewhat more difficult to parse such double-escaping with regexes,
although I don't doubt that it is possible, even if less palatable.
See below.

------------------------------------------------------
From: Will Moss <will.moss@airbnb.com>
Date: 28 February 2014 at 17:43:57

I was discussing the CSV format with someone recently and we got
to wondering why quotes are escaped using repeated quotes (e.g.
"abc""def","hij"). The only thing we could come up with is it makes
writing a character-by-character parser somewhat easier. You can flip the
bit that represents whether you are inside or outside quotes every time you
hit one and only add on the odd flips. Anyway, I emailed Yakov
Shafranovich, who wrote the original CSV RFC (
http://tools.ietf.org/html/rfc4180), but he didn't know. He mentioned that
you all were working on a CSV for the web spec, so I figured I'd follow the
rabbit hole a little deeper and see if any of you knew the history.

Thanks,
Will

--
Jeni Tennison
http://www.jenitennison.com/

Received on Wednesday, 5 March 2014 08:25:02 UTC