- From: Tandy, Jeremy <jeremy.tandy@metoffice.gov.uk>
- Date: Wed, 11 Jun 2014 10:25:17 +0000
- To: Stasinos Konstantopoulos <konstant@iit.demokritos.gr>, "public-csv-wg@w3.org" <public-csv-wg@w3.org>
Hi Stasinos - you make some great points that I had not considered.

I'd like to get some feedback on this topic at today's teleconf. I'll add this item to the agenda.

Jeremy

> -----Original Message-----
> From: Stasinos Konstantopoulos [mailto:konstant@iit.demokritos.gr]
> Sent: 11 June 2014 10:45
> To: w3c/csvw
> Cc: Tandy, Jeremy
> Subject: Re: [csvw] Is row by row processing sufficient? (#20)
>
> Jeremy, all,
>
> Merged cells in spreadsheets are serialized in CSV as empty cells that
> mean "same as above" or "same as on the left", depending on the range
> that was merged. I think that this is common enough that it deserves to
> be handled without expecting the publisher to massage their data. In
> fact, the "same as on the left" case appears in the Excel files of Use
> Case #8 - Analyzing Scientific Spreadsheets [1].
>
> Furthermore, this is very similar to:
>
> R-MissingValueDefinition: Ability to declare a "missing value" token
> and, optionally, a reason for the value to be missing [2]
>
> My proposal is to extend R-MissingValueDefinition to something along
> the lines of:
>
> Ability to declare a "missing value" token and, optionally, a reason
> for the value to be missing or an action to be taken to fill in the
> value. Actions to be taken should be selected from a closed vocabulary
> to be specified by the WG, including "same as above" and "same as on
> the left" (from UC-8).
>
> Other interesting actions (e.g., "default value = V") might be found in
> the use cases if we look at them from this perspective.
>
> In this case, UC-8 should also require R-MissingValueDefinition.
>
> Best,
> Stasinos
>
> [1] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#UC-AnalyzingScientificSpreadsheets
> [2] http://w3c.github.io/csvw/use-cases-and-requirements/index.html#R-MissingValueDefinition
>
> On 11 June 2014 12:03, Jeremy Tandy <notifications@github.com> wrote:
> > In the Processing Model of the Generating RDF from Tabular Data on the
> > Web doc, there is an issue raised stating:
> >
> > """
> > Independently processed rows - is this always the case?
> > """
> >
> > There are examples (see Use Case #24 - Expressing a hierarchy within
> > occupational listings) where "blank" fields imply "ditto" to the field
> > above (or to the last time that field was not blank). At first glance,
> > this seems pretty trivial, yet the example in the use case uses a
> > multi-level hierarchy, and sometimes "blank" means "empty" (null), not
> > "ditto". As such, the arbitrary processing required to "guess the
> > behaviour applied to blank cells" is somewhat challenging.
> >
> > As such, I recommend that we don't try to process this mode of
> > behaviour during the transformation. If people have CSV data with
> > "blanks that mean ditto", they need to fill in the blanks first.
> >
> > Given that, I suggest that we stick with the model that processes each
> > row independently and does not require us to maintain state from row
> > to row.
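To make the trade-off in this thread concrete, here is a minimal Python sketch of the proposed extension to R-MissingValueDefinition. It assumes a hypothetical per-column mapping from a "missing value" token to a fill action; the action names (`same-as-above`, `same-as-left`, `default`) and the function `fill_missing` are illustrative only and are not anything the WG has specified. Note that "same as on the left" and "default value = V" can be resolved within a single row, whereas "same as above" needs the previous (already filled) row. That is exactly the cross-row state that the row-by-row processing model avoids.

```python
import csv
import io
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical action vocabulary; no such vocabulary has been agreed by the WG.
SAME_AS_ABOVE = "same-as-above"
SAME_AS_LEFT = "same-as-left"
DEFAULT = "default"

Action = Tuple[str, Optional[str]]  # (action name, optional argument)


def fill_missing(
    rows: Sequence[Sequence[str]],
    actions: Dict[int, Action],
    missing_token: str = "",
) -> List[List[Optional[str]]]:
    """Replace missing cells according to a per-column action.

    Columns with no entry in `actions` keep missing cells as None (null),
    i.e. a blank there means "empty", not "ditto".
    """
    filled: List[List[Optional[str]]] = []
    previous: Optional[List[Optional[str]]] = None  # state carried across rows
    for row in rows:
        out: List[Optional[str]] = []
        for col, cell in enumerate(row):
            if cell != missing_token:
                out.append(cell)
                continue
            action, arg = actions.get(col, ("", None))
            if action == SAME_AS_ABOVE and previous is not None and col < len(previous):
                # Uses the previous *filled* row, so chains of blanks resolve
                # to the last value that was not blank.
                out.append(previous[col])
            elif action == SAME_AS_LEFT and col > 0:
                out.append(out[col - 1])  # row-local: no cross-row state needed
            elif action == DEFAULT:
                out.append(arg)
            else:
                out.append(None)
        filled.append(out)
        previous = out
    return filled


if __name__ == "__main__":
    data = "region,area,count\nNorth,Leeds,10\n,York,\n,,7\n"
    header, *body = list(csv.reader(io.StringIO(data)))
    print(fill_missing(body, {0: (SAME_AS_ABOVE, None), 1: (SAME_AS_ABOVE, None)}))
```

With one action per column, this sketch cannot express the Use Case #24 situation Jeremy describes, where a blank in the *same* column sometimes means "ditto" and sometimes means null. That ambiguity would need more information than a single missing-value token per column provides.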
Received on Wednesday, 11 June 2014 10:25:48 UTC