Re: Cleaning up the requirements in the UCR doc from Ivan Herman on 2016-02-09 (public-csv-wg@w3.org from February 2016)

From: Ivan Herman <ivan@w3.org>
Date: Tue, 9 Feb 2016 18:14:04 +0100
To: Jeremy Tandy <jeremy.tandy@gmail.com>
Cc: Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, Gregg Kellogg <gregg@greggkellogg.net>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <8642A2F6-B892-4687-B9D3-6261C6FCFF08@w3.org>

Jeremy,


Only minor comments on the notes.

<snip>
> 
> R-CsvToJsonTransformation
> 
> Ability to transform a CSV into JSON
> 
> 
> 
> [ACCEPTED]
> 
> [comment that [CSV2JSON] specifies the transformation of an annotated table to JSON; providing both _minimal mode_, where JSON output includes objects derived from the data within the annotated table, and _standard mode_, where JSON output additionally includes objects describing the structure of the annotated table. Built-in datatypes from the annotated table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to JSON primitive types.]
> 
> 
> 
Maybe worth mentioning that the transformation includes a 'prettification' of the output (nested objects) and not only a flat list of relations.

<snip>
> 
> R-CommentLines
> 
> Ability to identify comment lines within a CSV file and skip over them during parsing, format conversion or other processing
> 
> 
> 
> [DEFERRED … non-normative]
> 
> 

Why is this deferred? We provide the dialect description, and it is not in our charter to specify the parsing; i.e., the fact that it is non-normative does not sound like a problem to me...

> [use of _comment prefix_ as specified within a _dialect description_; default is “#” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]
> 
> 
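
To illustrate the 'hint' nature of the comment prefix, here is a minimal Python sketch of how a parser might honour it. The function name `read_rows` and the sample data are hypothetical; the only thing taken from the notes above is the default prefix "#". It is not a specified parsing algorithm, and it would wrongly drop a line inside a quoted multi-line cell that happens to start with the prefix.

```python
import csv
import io


def read_rows(text, comment_prefix="#"):
    """Return parsed rows, skipping lines that start with comment_prefix.

    comment_prefix plays the role of the dialect's comment prefix; this is
    an illustrative sketch, not a normative parsing algorithm.
    """
    data_lines = (
        line for line in io.StringIO(text)
        if not line.startswith(comment_prefix)
    )
    return list(csv.reader(data_lines))


# Hypothetical sample: one leading comment and one comment between rows.
sample = "# source: made-up example\nname,age\n# comment between rows\nAlice,30\n"
rows = read_rows(sample)
```

A parser that ignores the hint would simply treat the comment lines as data rows, which is why the dialect description can only advise, not mandate.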

<snip>
> 
>  3.2.3 Data Model Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
> R-CellMicrosyntax
> 
> Ability to parse internal data structure within a cell value
> 
> 
> 
> [ACCEPTED? … only lists]
> 
> 

We can have a category that says: partially accepted.

Are the relevant use cases covered by what we have? If those use cases are about validation only, then we are fine, and we can mention that, claiming victory… otherwise, well, that is it.
> [comment that support is provided for validating the format of cell values … R-SyntacticTypeDefinition:
> 
> _Parsing Cells_: formats for numeric types (decimalChar, groupChar, pattern), formats for booleans, formats for dates and times, formats for durations
> formats for other types (e.g. html, json, xml and well known text literals ‘WKT’) can be validated using a regular expression for the string values, with syntax and processing defined by [ECMASCRIPT <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>]]
> [comment that only limited support is provided for extracting values from structured data within cells; parsing html, json, xml etc. to extract structured data is not supported; lists of values provided in a single cell are processed into arrays wherein each array item is considered to be of consistent type]
> 
> [comment that list items in a given cell value are separated by the _separator_ character specified in the _dialect description_]
> 
> 
> 
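
As a rough illustration of the 'lists only' microsyntax support described above, the following Python sketch splits a cell value on a separator and optionally checks each item against a regular expression. The function name `parse_list_cell` and its parameters are hypothetical. Note also that Python's `re` syntax is used here as a stand-in; the notes above refer to ECMAScript regular expressions, which differ in some details.

```python
import re


def parse_list_cell(value, separator, pattern=None):
    """Split a cell value into list items and optionally validate each one.

    separator: the list separator character (an input to this sketch).
    pattern:   optional regular expression every item must match in full.
    """
    # An empty cell is treated here as an empty list (an assumption).
    items = value.split(separator) if value else []
    if pattern is not None:
        regex = re.compile(pattern)
        invalid = [item for item in items if regex.fullmatch(item) is None]
        if invalid:
            raise ValueError(f"items do not match {pattern!r}: {invalid}")
    return items


# Hypothetical cell holding three integers separated by ";".
numbers = parse_list_cell("1;2;3", separator=";", pattern=r"\d+")
```

The sketch only validates the string form of each item; it does not extract any structure from html, json or xml content inside a cell.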
> R-NonStandardCellDelimiter
> 
> Ability to parse tabular data with cell delimiters other than comma (,)
> 
> 
> 
> [DEFERRED … non-normative]
> 
> 

See my comment on comments (sic!). The same applies here, I believe.

> [use of _delimiter_ as specified within a _dialect description_; default is “,” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]
> 
> 
> 
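
For illustration, a minimal Python sketch of a parser taking the delimiter as a hint, using the standard `csv` module. The function name `read_with_delimiter` and the sample data are hypothetical; the default "," is the one stated in the notes above.

```python
import csv
import io


def read_with_delimiter(text, delimiter=","):
    """Parse tabular text using the given cell delimiter.

    delimiter plays the role of the dialect's delimiter; the parser is
    free to apply it or not, since the dialect only provides hints.
    """
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical tab-separated sample.
tsv_rows = read_with_delimiter("name\tage\nAlice\t30\n", delimiter="\t")
```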
<snip>
> R-WellFormedCsvCheck
> 
> Ability to determine that a CSV is syntactically well formed
> 
> 
> 
> [DEFERRED]
> 
> 
> 

I guess the term 'deferred' is a bit pejorative here; we were not chartered to define parser behaviour!

----
Ivan Herman, W3C
Digital Publishing Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
ORCID ID: http://orcid.org/0000-0003-0782-2704

Received on Tuesday, 9 February 2016 17:14:17 UTC