Re: CSV+ file lines with differing number of columns

From: Innovimax W3C <innovimax+w3c@gmail.com>
Date: Wed, 19 Feb 2014 18:14:31 +0100
Message-ID: <CAAK2GfFrJho4RuJ4gd0_wFk+MixH3gtPV0BeX1ZWu5KHu=M0pw@mail.gmail.com>
To: Ivan Herman <ivan@w3.org>
Cc: Tim Finin <finin@cs.umbc.edu>, public-csv-wg@w3.org
Let's call it CSV5 then

Mohamed


On Wed, Feb 19, 2014 at 7:06 AM, Ivan Herman <ivan@w3.org> wrote:

>
>
> Tim Finin wrote:
> > The current draft of Syntax for Tabular Data on the Web
> > stipulates (sec 3.3) that "Each line of a CSV+ file must contain
> > the same number of comma-separated values."  While this seems
> > reasonable, some existing use cases I'm familiar with allow for
> > CSV files with several types of lines that differ in their number
> > of columns.  Processing the CSV file requires detecting the line
> > type and also the presence of an optional terminal column.
> >
> > Might we explore relaxing the constraint that the CSV file have
> > the same number of columns for each line?
>
> Actually... we have to be careful how we formulate all this. The working
> group is not in a position to define constraints or not, because we are not
> in a position to define what CSV is. The only thing we can do is to describe
> what is out there, and adapt our output accordingly...
>
> (Just with my administrative hat on...)
>
> Ivan
>
> >
> > In the 2013 NIST Cold Start Knowledge Base Population Task [1],
> > researchers submit output from their text information extraction
> > systems to NIST for evaluation as tab separated files.  A line
> > consists of a triple (subj pred obj) and, for some predicates,
> > provenance information. Provenance includes a document ID and,
> > depending on the predicate, one or three pairs of string offsets
> > within the document.  Each line can also have an optional float as
> > a final column to represent a certainty measure.
> >
> > The submission format does not require adding extra separators to
> > make all of the lines uniform or explicitly adding a default
> > value for the optional last column.
> >
> > The following four lines show examples of a triple without any
> > annotations, an entity mention with provenance, an entity
> > relation with provenance, and a relation with both provenance and
> > confidence annotations.
> >
> >   :e4 type         PER
> >   :e4 mention      "Bart" D00124 283-286
> >   :e4 per:siblings :e7    D00124 283-286 173-179 274-281
> >   :e4 per:age      "10"   D00124 180-181 173-179 182-191 0.9
> >
> > [1] http://www.nist.gov/tac/2013/KBP/ColdStart/guidelines/KBP2013_ColdStartTaskDescription_1.1.pdf
> >
> >
>
>
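
To make the variable-width layout in Tim's example concrete, here is a minimal Python sketch of how a consumer might read such lines. It is not taken from the NIST guidelines. The names `Row` and `parse_line` are hypothetical, and the sketch assumes that fields are separated by single tabs, that offsets always look like `start-end`, and that a trailing field is the confidence value only when it parses as a float.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

# Assumption: an offset pair is two non-negative integers joined by a hyphen.
OFFSETS = re.compile(r"^(\d+)-(\d+)$")


class Row(NamedTuple):
    """Hypothetical record type for one line of the submission file."""
    subj: str
    pred: str
    obj: str
    doc_id: Optional[str]
    offsets: List[Tuple[int, int]]
    confidence: Optional[float]


def parse_line(line: str) -> Row:
    """Parse one tab-separated line whose column count may vary."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected at least subj, pred, obj")
    subj, pred, obj = fields[:3]
    rest = fields[3:]

    # Optional terminal column: a float that is not an offset pair.
    confidence = None
    if rest and not OFFSETS.match(rest[-1]):
        try:
            confidence = float(rest[-1])
            rest = rest[:-1]
        except ValueError:
            pass  # not a float, so treat it as provenance below

    # Optional provenance: a document ID followed by one or three offset pairs.
    doc_id = None
    offsets: List[Tuple[int, int]] = []
    if rest:
        doc_id = rest[0]
        for field in rest[1:]:
            match = OFFSETS.match(field)
            if match is None:
                raise ValueError(f"bad offset pair: {field!r}")
            offsets.append((int(match.group(1)), int(match.group(2))))
        if len(offsets) not in (1, 3):
            raise ValueError("expected one or three offset pairs")

    return Row(subj, pred, obj, doc_id, offsets, confidence)
```

The point of the sketch is that the reader has to inspect the content of each line to decide what the trailing columns mean, which is exactly what a fixed column count would make unnecessary.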


-- 
Innovimax SARL
Consulting, Training & XML Development
9, impasse des Orteaux
75020 Paris
Tel : +33 9 52 475787
Fax : +33 1 4356 1746
http://www.innovimax.fr
RCS Paris 488.018.631
SARL au capital de 10.000 EURO
Received on Wednesday, 19 February 2014 17:14:59 UTC
