Re: Cleaning up the requirements in the UCR doc from Gregg Kellogg on 2016-02-11 (public-csv-wg@w3.org from February 2016)

From: Gregg Kellogg <gregg@greggkellogg.net>
Date: Thu, 11 Feb 2016 09:40:26 -0800
To: Jeremy Tandy <jeremy.tandy@gmail.com>
Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <1293FCCF-0AA8-42BD-A129-F34EE7E6A41E@greggkellogg.net>
> On Feb 9, 2016, at 9:53 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
> 
> Hi Ivan- thanks for feedback.
> 
>> R-CsvToJsonTransformation
>> 
> 
> Good suggestion.
> 
>> R-CommentLines ... and ... R-NonStandardCellDelimiter
>> 
> 
> 
> I wasn't quite sure how to classify these. Obviously parsing is outside the charter scope. I marked them as DEFERRED because although we provide a mechanism, it is non-normative. That said, we've delivered what we can (and enable these requirements to be met) even though it's outside the charter.

I don’t see this as being deferred, but being satisfied. The group will never have a normative way to do this, as it’s outside the current charter and any reasonable follow-on charter. Maybe it should just be marked as NON-NORMATIVE.

R-NonStandardCellDelimiter – same

R-CsvAsSubsetOfLargerDataset – Also seems permanently out of scope; maybe we should just reference Data on the Web Best Practices. Deferred implies a future group may address it, which it won’t/shouldn’t.
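
To illustrate why the non-normative dialect hints already let R-CommentLines and R-NonStandardCellDelimiter be met, here is a minimal Python sketch of a parser honouring them. It assumes the dialect description is available as a plain dict with `commentPrefix` and `delimiter` keys; the function name `read_with_dialect` is hypothetical, and the line-based comment filter is a simplification that ignores quoted cells spanning several lines.

```python
import csv


def read_with_dialect(text, dialect):
    """Parse CSV text using hints from a dialect-description-like dict.

    `dialect` is an assumed plain dict; missing keys fall back to the
    defaults mentioned in the thread ("#" and ",").
    """
    comment_prefix = dialect.get("commentPrefix", "#")
    delimiter = dialect.get("delimiter", ",")

    # Skip comment lines before parsing. This is a simplification: a
    # quoted multi-line cell whose continuation line starts with the
    # prefix would be wrongly dropped.
    lines = [
        line
        for line in text.splitlines()
        if not (comment_prefix and line.startswith(comment_prefix))
    ]
    return list(csv.reader(lines, delimiter=delimiter))


rows = read_with_dialect("# a note\na;b\n1;2\n", {"delimiter": ";"})
```

Nothing here is normative: the dict only carries hints, and a different parser is free to interpret them in its own way.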

>> R-CellMicrosyntax
>> 
> 
> The "partially accepted" category might be appropriate. Looking at the motivating use cases we meet 4 out of 5 (see below). That said, the one we don't meet requires conditional processing (see R-ConditionalProcessingBasedOnCellValues) which is DEFERRED; we're not meeting UC #24. 
> 
> The following use cases are met by our partial implementation of the requirement:
> Use Case #6 - Journal Article Solr Search Results ... date-time format, lists of authors ... the journal title contains html markup (<i> element) - but the use case indicates that it's OK to treat this as pure text
> 
> Use Case #11 - City of Palo Alto Tree Data ... lists of comments delimited with semi-colon ";"
> 
> Use Case #18 - Supporting Semantic-based Recommendations ... the 'semantic paths' are a comma delimited list of URIs; the use case doesn't indicate that different semantics are applied to each item in the list
> 
> Use Case #20 - Integrating components with the TIBCO Spotfire platform using tabular data ... escape sequences for special characters are not supported, but the use case indicates that "These special characters don't affect the parsing [...]"
> 
> 
> The following use case is NOT met:
> Use Case #24 - Expressing a hierarchy within occupational listings ... use of regular expression to extract values from substrings; different parts of the structured occupation code
> 
> 
>> R-WellFormedCsvCheck
>> 
> 
> This requirement was already marked as deferred; I didn't see any point in changing that. However, we assume that the CSV is well formed in order for us to process the tabular data - it's just not in scope of our charter.
> 
> ---
> 
> Please let me know after tomorrow if you collectively think R-CommentLines and R-NonStandardCellDelimiter should be marked as ACCEPTED and that R-CellMicrosyntax be marked as PARTIALLY ACCEPTED.
> 
> BR, Jeremy
> 
> 
> 
> 
> 
> On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:
> Jeremy, 
> 
> 
> only minor comments on the notes
> 
> <snip>
>> 
>> R-CsvToJsonTransformation
>> 
>> Ability to transform a CSV into JSON
>> 
>> 
>> 
>> [ACCEPTED]
>> 
>> [comment that [CSV2JSON] specifies the transformation of an annotated table to JSON; providing both _minimal mode_, where JSON output includes objects derived from the data within the annotated table, and _standard mode_, where JSON output additionally includes objects describing the structure of the annotated table. Built-in datatypes from the annotated table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to JSON primitive types.]
>> 
>> 
>> 
> 
> Maybe worth mentioning that the transformation includes a 'prettification' of the output (nested objects) and not only a flat list of relations.
> 
> <snip>
>> 
>> R-CommentLines
>> 
>> Ability to identify comment lines within a CSV file and skip over them during parsing, format conversion or other processing
>> 
>> 
>> 
>> [DEFERRED … non-normative]
>> 
>> 
> 
> Why is this deferred? We provide the dialect description, and it is not in our charter to specify the parsing, i.e., the fact that it is not normative does not sound like a problem to me...
> 
>> [use of _comment prefix_ as specified within a _dialect description_; default is “#” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]
>> 
>> 
> 
> <snip>
> 
>> 
>>  3.2.3 Data Model Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>> R-CellMicrosyntax
>> 
>> Ability to parse internal data structure within a cell value
>> 
>> 
>> 
>> [ACCEPTED? … only lists]
>> 
>> 
> 
> We can have a category that says: partially accepted.
> 
> Are the relevant use cases covered by what we have?  If those use cases are validating only then we are fine, and we can mention that, claiming victory… otherwise, well, that is it.
> 
>> [comment that support is provided for validating the format of cell values … R-SyntacticTypeDefinition: 
>> 
>> _Parsing Cells_: formats for numeric types (decimalChar, groupChar, pattern), formats for booleans, formats for dates and times, formats for durations
>> formats for other types (e.g. html, json, xml and well known text literals ‘WKT’) can be validated using a regular expression for the string values, with syntax and processing defined by [ECMASCRIPT <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>]]
>> [comment that only limited support is provided for extracting values from structured data within cells; the parsing of html, json and xml etc. to extract structured data is not supported; lists of values provided in a single cell are processed into arrays wherein each array item is considered to be of consistent type]
>> 
>> [comment that list items in a given cell value are separated by the _separator_ character specified in the _dialect description_]
>> 
>> 
>> 
>> R-NonStandardCellDelimiter
>> 
>> Ability to parse tabular data with cell delimiters other than comma (,)
>> 
>> 
>> 
>> [DEFERRED … non-normative]
>> 
>> 
> 
> see my comment on comments (sic!). The same applies here, I believe
> 
>> [use of _delimiter_ as specified within a _dialect description_; default is “,” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]
>> 
>> 
>> 
> 
> <snip>
>> R-WellFormedCsvCheck
>> 
>> Ability to determine that a CSV is syntactically well formed
>> 
>> 
>> 
>> [DEFERRED]
>> 
>> 
>> 
> 
> I guess the term 'deferred' is a bit pejorative here; we were not chartered to define parser behaviour!
> 
> ----
> Ivan Herman, W3C 
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
> 
> 
> 
>
Received on Thursday, 11 February 2016 17:40:58 UTC