RE: [csvw] Is row by row processing sufficient? (#20) from Tandy, Jeremy on 2014-06-11 (public-csv-wg@w3.org from June 2014)

From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
Date: Wed, 11 Jun 2014 10:25:17 +0000
To: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Message-ID: <2624871D9A05174691BD59F8EFD68AE208846C20@EXXCMPD1DAG3.cmpd1.metoffice.gov.uk>

Hi Stasinos - you make some great points that I had not considered. I'd like to get some feedback on this topic at today's teleconf. I'll add this item to the agenda.

Jeremy

> -----Original Message-----
> From: Stasinos Konstantopoulos [mailto:konstant@iit.demokritos.gr]
> Sent: 11 June 2014 10:45
> To: w3c/csvw
> Cc: Tandy, Jeremy
> Subject: Re: [csvw] Is row by row processing sufficient? (#20)
> 
> Jeremy, all,
> 
> Merged cells in spreadsheets are serialized in CSV as empty cells that
> mean "same as above" or "same as on the left", depending on the range
> that was merged. I think that this is common enough that it deserves to
> be handled without expecting the publisher to massage their data. In
> fact, the "same as on the left" case appears in the Excel files of Use
> Case #8 - Analyzing Scientific Spreadsheets [1].
> 
> Furthermore, this is very similar to:
> 
> R-MissingValueDefinition: Ability to declare a "missing value" token
> and, optionally, a reason for the value to be missing [2]
> 
> My proposal is to extend R-MissingValueDefinition to something along
> the lines of:
> 
> Ability to declare a "missing value" token and, optionally, a reason
> for the value to be missing or an action to be taken to fill in the
> value. Actions to be taken should be selected from a closed vocabulary,
> to be specified by the WG, including "same as above" and "same as on
> the left" (from UC-8).
> 
> Other interesting actions (e.g., "default value = V") might be found in
> use cases if we look at them from this perspective.
> 
> In this case, UC-8 should also require R-MissingValueDefinition.
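A minimal Python sketch of how a converter might apply such a declaration, assuming one missing-value token per table and at most one action per column. The action names (`same-as-above`, `same-as-left`, `default`) and `apply_missing_value_actions` are hypothetical illustrations of the proposed closed vocabulary, not part of any CSVW draft. Note that "same as above" reads the already-filled previous row, so it needs state carried from row to row.

```python
from typing import Optional

# Hypothetical names for the proposed closed vocabulary of fill actions.
SAME_AS_ABOVE = "same-as-above"
SAME_AS_LEFT = "same-as-left"
DEFAULT = "default"

Action = tuple[str, Optional[str]]  # (action name, default value or None)


def apply_missing_value_actions(
    rows: list[list[str]],
    missing_token: str,
    actions: dict[int, Action],
) -> list[list[Optional[str]]]:
    """Replace cells equal to missing_token using the action declared for
    their column. Columns with no action keep missing cells as None."""
    filled: list[list[Optional[str]]] = []
    for r, row in enumerate(rows):
        out: list[Optional[str]] = []
        for c, cell in enumerate(row):
            if cell != missing_token:
                out.append(cell)
                continue
            action, default = actions.get(c, ("", None))
            if action == SAME_AS_ABOVE and r > 0 and c < len(filled[r - 1]):
                # State from the previous (already filled) row is required here.
                out.append(filled[r - 1][c])
            elif action == SAME_AS_LEFT and c > 0:
                # Use the already filled cell to the left in this row.
                out.append(out[c - 1])
            elif action == DEFAULT:
                out.append(default)
            else:
                # Plain "missing value": no fill action applies.
                out.append(None)
        filled.append(out)
    return filled


# Illustrative data only.
rows = [["A", "x", "1"], ["", "y", ""], ["", "", "3"]]
actions = {0: (SAME_AS_ABOVE, None), 1: (SAME_AS_LEFT, None), 2: (DEFAULT, "0")}
print(apply_missing_value_actions(rows, "", actions))
# [['A', 'x', '1'], ['A', 'y', '0'], ['A', 'A', '3']]
```

A blank in a column with no declared action stays None, so "missing" and "ditto" remain distinguishable as long as the publisher declares the action per column.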
> 
> Best,
> Stasinos
> 
> 
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-AnalyzingScientificSpreadsheets
> [2] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-MissingValueDefinition
> 
> On 11 June 2014 12:03, Jeremy Tandy <notifications@github.com> wrote:
> > In the Processing Model section of the "Generating RDF from Tabular
> > Data on the Web" document, there is an issue raised stating:
> >
> > """
> > Independently processed rows - is this always the case?
> > """
> >
> > There are examples (see Use Case #24 - Expressing a hierarchy within
> > occupational listings) where "blank" fields imply "ditto" to the field
> > above (or the last time that field was not blank). At first glance,
> > this seems pretty trivial, yet the example in the use case uses a
> > multi-level hierarchy, and sometimes "blank" means "empty" (null) not
> > "ditto". As such, the arbitrary processing required to "guess the
> > behaviour applied to blank cells" is somewhat challenging.
> >
> > I therefore recommend that we don't try to handle this behaviour
> > during the transformation. If people have CSV data with "blanks that
> > mean ditto", they need to fill in the blanks first.
> >
> > Given that, I suggest that we stick with the model that processes each
> > row independently and does not require us to maintain state from row to
> > row.
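Jeremy's recommendation separates two concerns: filling "ditto" blanks, which needs state from earlier rows, and the conversion itself, which then needs none. A minimal Python sketch of that split, under assumptions: `fill_down`, `row_to_triples`, and the sample data are hypothetical, and plain `(subject, column, value)` tuples with a blank-node-style subject stand in for real RDF output. The publisher chooses which columns' blanks mean ditto, so a blank elsewhere stays empty.

```python
import csv
import io
from typing import Iterable, Iterator


def fill_down(rows: Iterable[list[str]], columns: list[int]) -> Iterator[list[str]]:
    """Publisher-side pre-pass: replace blanks in the listed columns with the
    last non-blank value seen in that column ("ditto"). Blanks in other
    columns are left alone and stay genuinely empty."""
    last: dict[int, str] = {}
    for row in rows:
        row = list(row)  # copy so the caller's row is not modified
        for c in columns:
            if row[c] == "":
                row[c] = last.get(c, "")
            else:
                last[c] = row[c]
        yield row


def row_to_triples(row_num: int, header: list[str], row: list[str]) -> list[tuple[str, str, str]]:
    """Converter-side step: the output depends only on this one row, so no
    state is kept between rows. Empty cells produce no triple."""
    subject = f"_:row{row_num}"
    return [(subject, name, value) for name, value in zip(header, row) if value != ""]


# Hypothetical data: the blank in "group" means ditto, the blank in "note" means empty.
text = "group,code,note\nManagement,A1,\n,A2,acting\n"
reader = csv.reader(io.StringIO(text))
header = next(reader)
filled = list(fill_down(reader, columns=[0]))
triples = [t for i, row in enumerate(filled, start=1) for t in row_to_triples(i, header, row)]
print(filled)
print(triples)
```

Once the pre-pass has run, `row_to_triples` could process the rows in any order, which is the property the row-independent model relies on.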

Received on Wednesday, 11 June 2014 10:25:48 UTC