- From: Alan Painter <alan.painter@gmail.com>
- Date: Wed, 5 Mar 2014 09:24:34 +0100
- To: public-csv-wg@w3.org
- Cc: will.moss@airbnb.com
- Message-ID: <CAN+GtW0F4YPNK81TQigq-MAsHVfPhjd_PvDLHEkw53bJ5H_0oA@mail.gmail.com>
Just a remark from experience:

It is indeed trivial to parse CSV with double-escaped quotes. As Will mentions, flip the escape flag or, put another way, have a two-state automaton with 1-character lookahead and a 1-character pushback.

Writing out CSV in this fashion is even easier: if the cell to be written contains any quote, separator or line-ending characters, then replace each quote character with a double-escaped quote and surround the whole cell with quotes.

It's somewhat more difficult to parse such double-escaping with regexes, although I don't doubt that it is possible, even if less palatable.

See below.

------------------------------------------------------
From: Will Moss will.moss@airbnb.com
Date: 28 February 2014 at 17:43:57

I was discussing the CSV format with someone recently and we got to wondering why quotes are escaped using repeated quotes (i.e. "abc""def","hij"). The only thing we could come up with is that it makes writing a character-by-character parser somewhat easier. You can flip the bit that represents whether you are inside or outside quotes every time you hit one and only add on the odd flips.

Anyway, I emailed Yakov Shafranovich, who wrote the original CSV RFC (http://tools.ietf.org/html/rfc4180), but he didn't know. He mentioned that you all were working on a CSV for the web spec, so I figured I'd follow the rabbit hole a little deeper and see if any of you knew the history.

Thanks,
Will

--
Jeni Tennison
http://www.jenitennison.com/
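The two-state parser and the writing rule described in the reply above can be sketched roughly as follows. This is only an illustration, not code from either message: the names `write_cell`, `write_row` and `parse_csv` are hypothetical, the separator is assumed to be a comma, and the 1-character lookahead is done by peeking at the next index rather than with an explicit pushback.

```python
def write_cell(cell, sep=",", quote='"'):
    """Quote a cell only if it contains a quote, separator or line ending."""
    if any(ch in cell for ch in (quote, sep, "\n", "\r")):
        # Double every quote, then wrap the whole cell in quotes.
        return quote + cell.replace(quote, quote * 2) + quote
    return cell


def write_row(cells, sep=","):
    return sep.join(write_cell(c, sep) for c in cells)


def parse_csv(text, sep=",", quote='"'):
    """Two-state scanner: outside quotes / inside quotes."""
    rows, row, cell = [], [], []
    in_quotes = False
    pending = False  # True once the current row has consumed any character
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    # Lookahead found a doubled quote: emit one literal quote.
                    cell.append(quote)
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                cell.append(ch)
            pending = True
        elif ch == quote:
            in_quotes = True
            pending = True
        elif ch == sep:
            row.append("".join(cell))
            cell = []
            pending = True
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat CRLF as a single line ending
            row.append("".join(cell))
            rows.append(row)
            row, cell = [], []
            pending = False
        else:
            cell.append(ch)
            pending = True
        i += 1
    if pending:
        # Flush a final row that has no trailing line ending.
        row.append("".join(cell))
        rows.append(row)
    return rows
```

For instance, under these assumptions `parse_csv('"abc""def","hij"\n')` yields a single row holding `abc"def` and `hij`, and feeding the output of `write_row` back into `parse_csv` returns the original cells.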
Received on Wednesday, 5 March 2014 08:25:02 UTC