
implicit delimiter escaping, log files

From: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>
Date: Wed, 26 Feb 2014 09:57:36 +0200
To: Jeni Tennison <jeni@theodi.org>
Cc: public-csv-wg@w3.org
Message-ID: <20140226075736.GA27384@iit.demokritos.gr>
Jeni, all,

There is also data where the last column extends to the end of the line
regardless of any unescaped/unquoted delimiters it may contain.

There may be more examples, but the one that immediately springs to
mind is log files such as those written by PostgreSQL:

2014-02-14 05:58:55 EET LOG:  received fast shutdown request
2014-02-14 05:58:55 EET LOG:  aborting any active transactions
2014-02-14 05:58:55 EET LOG:  autovacuum launcher shutting down
2014-02-14 05:58:55 EET LOG:  shutting down
2014-02-14 05:58:55 EET LOG:  database system is shut down
2014-02-14 05:59:28 EET LOG:  database system was shut down at 2014-02-14 05:58:55 EET
2014-02-14 05:59:28 EET LOG:  incomplete startup packet
2014-02-14 05:59:28 EET LOG:  database system is ready to accept connections
2014-02-14 05:59:28 EET LOG:  autovacuum launcher started

This format can be read in different ways [1], so the example may not
be perfect, but it is only meant to illustrate the point. I am sure
there will be more data like this, where everything after the Nth
character or the Mth delimiter is a single text field, no matter what it
contains.
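
As a rough illustration only (a Python sketch, not a proposal; the helper
names split_after_delimiters and split_after_chars are hypothetical), the
two rules above could look like this. It assumes whitespace is the
delimiter and, for the character rule, that the prefix before the message
is exactly 30 characters wide, which only holds while the timezone
abbreviation has three letters (a summer "EEST" would shift it):

```python
def split_after_delimiters(line: str, m: int) -> list[str]:
    """Split on the first m runs of whitespace; the remainder is one field.

    With sep=None, str.split treats each run of whitespace as a single
    delimiter and stops after maxsplit splits, so any spaces inside the
    last column are left untouched.
    """
    return line.split(None, m)


def split_after_chars(line: str, n: int) -> tuple[str, str]:
    """Treat the first n characters as a prefix; the rest is one text field."""
    return line[:n], line[n:]


line = "2014-02-14 05:59:28 EET LOG:  database system was shut down at 2014-02-14 05:58:55 EET"

print(split_after_delimiters(line, 4))
# ['2014-02-14', '05:59:28', 'EET', 'LOG:', 'database system was shut down at 2014-02-14 05:58:55 EET']

print(split_after_chars(line, 30)[1])
# database system was shut down at 2014-02-14 05:58:55 EET
```

The delimiter rule is the more robust of the two here: the message keeps
its own spaces and its embedded timestamp, while the character-offset rule
breaks as soon as anything in the prefix changes width.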

The more general point for the group's consideration is whether log
files in general are in scope, regardless of whether we are discussing
difficult ones or more CSV-like ones such as the Common Logfile
Format [2].

Till later,

[1] Fixed-length fields except the last, or two columns delimited by the
left-most occurrence of the string "LOG:".

[2] http://www.w3.org/Daemon/User/Config/Logging.html
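
The second reading in [1] could be sketched in Python as below
(split_at_marker is a hypothetical helper). str.partition splits at the
left-most occurrence, so a later "LOG:" inside the message stays in the
second column. Rejecting lines without the marker, for example ones
carrying another severity label such as "ERROR:", is only one possible
policy and is an assumption of this sketch:

```python
def split_at_marker(line: str, marker: str = "LOG:") -> tuple[str, str]:
    """Split a line into two columns at the left-most occurrence of marker."""
    head, sep, tail = line.partition(marker)
    if not sep:
        # Assumed policy: refuse lines that do not contain the marker at all.
        raise ValueError(f"no {marker!r} in line: {line!r}")
    # Drop only the padding next to the marker; the message is otherwise kept as-is.
    return head.rstrip(), tail.lstrip()


print(split_at_marker("2014-02-14 05:58:55 EET LOG:  received fast shutdown request"))
# ('2014-02-14 05:58:55 EET', 'received fast shutdown request')
```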
Received on Wednesday, 26 February 2014 07:49:06 UTC