
Re: CSV+ file lines with differing number of columns

From: Ivan Herman <ivan@w3.org>
Date: Wed, 19 Feb 2014 07:06:22 +0100
Message-ID: <530449DE.1080505@w3.org>
To: Tim Finin <finin@cs.umbc.edu>
CC: public-csv-wg@w3.org


Tim Finin wrote:
> The current draft of Syntax for Tabular Data on the Web
> stipulates (sec 3.3) that "Each line of a CSV+ file must contain
> the same number of comma-separated values."  While this seems
> reasonable, some existing use cases I'm familiar with allow for
> CSV files with several types of lines that differ in their number
> of columns.  Processing the CSV file requires detecting the line
> type and also the presence of an optional terminal column.
> 
> Might we explore relaxing the constraint that the CSV file have
> the same number of columns for each line?

Actually... we have to be careful how we formulate all this. The working group
is not in a position to impose or relax constraints, because we are not in a
position to define what CSV is. The only thing we can do is describe what is out
there and adapt our output accordingly...

(Just with my administrative hat on...)

Ivan

> 
> In the 2013 NIST Cold Start Knowledge Base Population Task [1],
> researchers submit output from their text information extraction
> systems to NIST for evaluation as tab-separated files.  A line
> consists of a triple (subj pred obj) and, for some predicates,
> provenance information. Provenance includes a document ID and,
> depending on the predicate, one or three pairs of string offsets
> within the document.  Each line can also have an optional float as
> a final column to represent a certainty measure.
> 
> The submission format does not require adding extra separators to
> make all of the lines uniform or explicitly adding a default
> value for the optional last column.
> 
> The following four lines show examples of a triple without any
> annotations, an entity mention with provenance, an entity
> relation with provenance, and a relation with both provenance and
> confidence annotations.
> 
>   :e4 type         PER
>   :e4 mention      "Bart" D00124 283-286
>   :e4 per:siblings :e7    D00124 283-286 173-179 274-281
>   :e4 per:age      "10"   D00124 180-181 173-179 182-191 0.9
> 
> [1]
> http://www.nist.gov/tac/2013/KBP/ColdStart/guidelines/KBP2013_ColdStartTaskDescription_1.1.pdf
> 
> 
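
To make the line-type detection described above concrete, here is a minimal
Python sketch of a reader for such ragged, tab-separated lines. It is only an
illustration, not the NIST tooling or anything the working group has specified.
The names Row and parse_line are hypothetical, and the sketch assumes that
offsets always look like "283-286", that a document ID never parses as a bare
float, and that a trailing float is the certainty measure.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumption: an offset pair is two non-negative integers joined by a hyphen.
OFFSETS = re.compile(r"^\d+-\d+$")


class Row(NamedTuple):
    """One parsed line (hypothetical structure, for illustration only)."""
    subj: str
    pred: str
    obj: str
    doc_id: Optional[str]
    offsets: List[Tuple[int, int]]
    confidence: Optional[float]


def parse_line(line: str) -> Row:
    """Parse one tab-separated line with a variable number of columns."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least subj, pred and obj")
    subj, pred, obj, *rest = fields

    # Detect the optional terminal column: a float that is not an offset pair.
    confidence = None
    if rest and not OFFSETS.match(rest[-1]):
        try:
            confidence = float(rest[-1])
        except ValueError:
            pass  # not a float, so it is treated as provenance below
        else:
            rest = rest[:-1]

    # Whatever remains is provenance: a document ID, then one or three pairs.
    doc_id = rest[0] if rest else None
    offsets = []
    for span in rest[1:]:
        if not OFFSETS.match(span):
            raise ValueError(f"malformed offset pair: {span!r}")
        start, end = span.split("-")
        offsets.append((int(start), int(end)))
    if rest and len(offsets) not in (1, 3):
        raise ValueError("provenance needs one or three offset pairs")

    return Row(subj, pred, obj, doc_id, offsets, confidence)


# Usage: the four example lines from the message, written with real tabs.
examples = [
    ':e4\ttype\tPER',
    ':e4\tmention\t"Bart"\tD00124\t283-286',
    ':e4\tper:siblings\t:e7\tD00124\t283-286\t173-179\t274-281',
    ':e4\tper:age\t"10"\tD00124\t180-181\t173-179\t182-191\t0.9',
]
rows = [parse_line(text) for text in examples]
```

The point of the sketch is that each line is classified by its own length and
by the shape of its last field, so no padding separators or default values are
needed to make the rows uniform.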


Received on Wednesday, 19 February 2014 06:06:52 UTC
