implicit delimiter escaping, log files

Jeni, all,

There is also data where the last column extends to the end of the line,
regardless of any unescaped/unquoted delimiters it might contain.

There might be more examples, but the one that immediately springs to
mind is log files such as those written by PostgreSQL:

2014-02-14 05:58:55 EET LOG:  received fast shutdown request
2014-02-14 05:58:55 EET LOG:  aborting any active transactions
2014-02-14 05:58:55 EET LOG:  autovacuum launcher shutting down
2014-02-14 05:58:55 EET LOG:  shutting down
2014-02-14 05:58:55 EET LOG:  database system is shut down
2014-02-14 05:59:28 EET LOG:  database system was shut down at 2014-02-14 05:58:55 EET
2014-02-14 05:59:28 EET LOG:  incomplete startup packet
2014-02-14 05:59:28 EET LOG:  database system is ready to accept connections
2014-02-14 05:59:28 EET LOG:  autovacuum launcher started

This format can be read in different ways [1], so the example might not
be perfect, but it is only meant to illustrate the point. I am sure
there will be more data like this, where everything left after the Nth
character or the Mth delimiter is a single text field, no matter what it
contains.
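
To make the two readings concrete, here is a rough Python sketch. It is
only an illustration, not a proposal: the function names are made up for
this message, and the values m=4 and n=28 are assumptions that happen to
fit the sample lines above. Stripping the leading spaces of the last
field is likewise my own choice.

```python
def split_after_mth_delimiter(line, m, delimiter=" "):
    """Split on the first m delimiters; whatever is left is one field."""
    fields = line.split(delimiter, m)
    # Assumption: padding between the prefix and the message is not data.
    fields[-1] = fields[-1].lstrip(delimiter)
    return fields


def split_after_nth_character(line, n):
    """Treat the first n characters as a prefix and the rest as one field."""
    return [line[:n].rstrip(), line[n:].lstrip()]


SAMPLE = ("2014-02-14 05:59:28 EET LOG:  database system was shut down "
          "at 2014-02-14 05:58:55 EET")

# Four delimiters give date, time, zone, level, then the free-text message.
by_delimiter = split_after_mth_delimiter(SAMPLE, 4)

# 28 is the length of "2014-02-14 05:59:28 EET LOG:" in the sample.
by_position = split_after_nth_character(SAMPLE, 28)
```

In both cases the spaces (and the second timestamp) inside the message
stay within the last field, which is exactly the behaviour that ordinary
delimiter handling would not give us.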

The more general point for the group's consideration is whether log
files in general are in scope, regardless of whether we are discussing
difficult ones or more CSV-behaved ones, such as the Common Logfile
Format [2].

Till later,
stasinos

[1] fixed-length fields except the last, or two columns delimited by
the left-most occurrence of the string "LOG:"

[2] http://www.w3.org/Daemon/User/Config/Logging.html

Received on Wednesday, 26 February 2014 07:49:06 UTC