Re: Cleaning up the requirements in the UCR doc from Jeremy Tandy on 2016-02-12 (public-csv-wg@w3.org from February 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Fri, 12 Feb 2016 23:21:10 +0000
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <CADtUq_2Hh=0DBYV8vD-pxq7+EZH-XjhmySEM0eu=i=_LxJp_+g@mail.gmail.com>
Hi -

I have 'finished' updating the UCR doc to my satisfaction and closed the
outstanding issues relating to the UCR doc.

There is one outstanding question: how do I reference the Primer? I've
raised ISSUE #809 <https://github.com/w3c/csvw/issues/809> to make sure I
don't forget this!

I've done a proofread. I'm satisfied ... but would appreciate your views
on the content and categorisation.

I've also remembered to add a 'changes' section [1].

My hope is that if you tell me stuff that needs to be changed, I can get it
fixed before Wednesday and we can vote to release.

Jeremy

BTW: I've made no content changes to the 'use case' part of the doc - only
fixed some unclosed html elements and improved internal formatting.

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#changes


On Fri, 12 Feb 2016 at 23:13 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:

> (previously sent only to Gregg- oops)
>
> Hi Gregg.
>
> Thanks for the feedback. Both R-CommentLines and
> R-NonStandardCellDelimiter are now marked as accepted, see UCR doc 3.1.1
> CSV parsing requirements [1]. The descriptive note indicates the charter
> scope and the non-normative nature of the comment line / delimiter
> behaviour.
>
> I have marked R-CsvAsSubsetOfLargerDataset as partially met, see UCR doc
> 3.2.1 Data model requirements [2]. Basically, we only have a simple
> grouping mechanism, but we do support the usage of annotations from other
> vocabs (e.g. RDF Datacube and VoID) to describe richer relationships.
>
> Jeremy
>
> [1]:
> http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req-parsing
> [2]:
> http://w3c.github.io/csvw/use-cases-and-requirements/index.html#p-acc-req-data-model
>
>
> On Thu, 11 Feb 2016 at 17:40 Gregg Kellogg <gregg@greggkellogg.net> wrote:
>
>> On Feb 9, 2016, at 9:53 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> Hi Ivan- thanks for feedback.
>>
>> *R-CsvToJsonTransformation*
>>
>> Good suggestion.
>>
>> *R-CommentLines* ... and ... *R-NonStandardCellDelimiter*
>>
>>
>> I wasn't quite sure how to classify these. Obviously parsing is outside
>> the charter scope. I marked them as DEFERRED because although we provide a
>> mechanism, it is non-normative. That said, we've delivered what we can (and
>> enable these requirements to be met) even though it's outside the charter.
>>
>>
>> I don’t see this as being deferred, but being satisfied. The group will
>> never have a normative way to do this, as it’s outside the current charter
>> and any reasonable follow-on charter. Maybe it should just be marked as
>> NON-NORMATIVE.
>>
>> *R-NonStandardCellDelimiter* – same
>>
>> *R-CsvAsSubsetOfLargerDataset* – Also seems permanently out of scope,
>> maybe we should just reference Data on the Web Best Practices. Deferred
>> implies a future group may address it, which it won’t/shouldn’t.
>>
>> *R-CellMicrosyntax*
>>
>> The "partially accepted" category might be appropriate. Looking at the
>> motivating use cases we meet 4 out of 5 (see below). That said, the one we
>> don't meet requires conditional processing (see
>> *R-ConditionalProcessingBasedOnCellValues*) which is DEFERRED; we're not
>> meeting UC #24.
>>
>> The following use cases are met by our partial implementation of the
>> requirement:
>> Use Case #6 - Journal Article Solr Search Results ... date-time format,
>> lists of authors ... the journal title contains html markup (<i> element) -
>> but the use case indicates that it's OK to treat this as pure text
>> Use Case #11 - City of Palo Alto Tree Data ... lists of comments
>> delimited with semi-colon ";"
>> Use Case #18 - Supporting Semantic-based Recommendations ... the
>> 'semantic paths' are a comma delimited list of URIs; the use case doesn't
>> indicate that different semantics are applied to each item in the list
>> Use Case #20 - Integrating components with the TIBCO Spotfire platform
>> using tabular data ... escape sequences for special characters are not
>> supported, but the use case indicates that "These special characters
>> don't affect the parsing [...]"
>>
>> The following use case is NOT met:
>> Use Case #24 - Expressing a hierarchy within occupational listings ...
>> use of regular expression to extract values from substrings; different
>> parts of the structured occupation code
>>
>> *R-WellFormedCsvCheck*
>>
>> This requirement was already marked as deferred; I didn't see any point
>> to change that. However, we assume that the CSV is well formed in order
>> for us to process the tabular data - it's just not in scope of our charter.
>>
>> ---
>>
>> Please let me know after tomorrow if you collectively think
>> *R-CommentLines* and *R-NonStandardCellDelimiter* should be marked as ACCEPTED and
>> that *R-CellMicrosyntax* be marked as PARTIALLY ACCEPTED.
>>
>> BR, Jeremy
>>
>>
>>
>>
>>
>> On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:
>>
>>> Jeremy,
>>>
>>>
>>> only minor comments on the notes
>>>
>>> <snip>
>>>
>>>
>>> *R-CsvToJsonTransformation*
>>>
>>> *Ability to transform a CSV into JSON*
>>>
>>>
>>> [ACCEPTED]
>>>
>>> [comment that [CSV2JSON] specifies the transformation of an annotated
>>> table to JSON; providing both _minimal mode_, where JSON output includes
>>> objects derived from the data within the annotated table, and _standard
>>> mode_, where JSON output additionally includes objects describing the
>>> structure of the annotated table. Built-in datatypes from the annotated
>>> table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to
>>> JSON primitive types.]
>>>
>>>
>>> Maybe worth mentioning that the transformation includes a
>>> 'prettyfication' of the output (nested objects) and not only a flat list
>>> of relations.
>>>
>>> <snip>
>>>
>>>
>>> *R-CommentLines*
>>>
>>> *Ability to identify comment lines within a CSV file and skip over them
>>> during parsing, format conversion or other processing*
>>>
>>>
>>> [DEFERRED … non-normative]
>>>
>>>
>>> Why is this deferred? We provide the dialect description, and it is not
>>> our charter to specify the parsing, i.e., the fact that it is not normative
>>> does not sound like a problem to me...
>>>
>>> [use of _comment prefix_ as specified within a _dialect description_;
>>> default is “#” … a _dialect description_ provides ‘hints’ to parsers about
>>> how to process the tabular data file]
>>>
>>>
>>> <snip>
>>>
>>>
>>>
>>>    - 3.2.3 Data Model Requirements
>>>    <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>>>
>>> *R-CellMicrosyntax*
>>>
>>> *Ability to parse internal data structure within a cell value*
>>>
>>>
>>> [ACCEPTED? … only lists]
>>>
>>>
>>> We can have a category that says: partially accepted.
>>>
>>> Are the relevant use cases covered by what we have?  If those use cases
>>> are validating only then we are fine, and we can mention that, claiming
>>> victory… otherwise, well, that is it.
>>>
>>> [comment that support is provided for validating the format of cell
>>> values … R-SyntacticTypeDefinition:
>>>
>>>    - _Parsing Cells_: formats for numeric types (decimalChar,
>>>    groupChar, pattern), formats for booleans, formats for dates and times,
>>>    formats for durations
>>>    - formats for other types (e.g. html, json, xml and well-known text
>>>    literals ‘WKT’) can be validated using a regular expression for the string
>>>    values, with syntax and processing defined by [ECMASCRIPT
>>>    <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>
>>>    ]]
>>>
>>> [comment that only limited support is provided for extracting values
>>> from structured data within cells; parsing html, json and xml etc. to
>>> extract structured data is not supported; lists of values provided in a
>>> single cell are processed into arrays wherein each array item is considered
>>> to be of consistent type]
>>>
>>> [comment that list items in a given cell value are separated by the
>>> _separator_ character specified in the _dialect description_]
>>>
>>>
>>> *R-NonStandardCellDelimiter*
>>>
>>> *Ability to parse tabular data with cell delimiters other than comma (,)*
>>>
>>>
>>> [DEFERRED … non-normative]
>>>
>>>
>>> see my comment on comments (sic!). The same applies here, I believe
>>>
>>> [use of _delimiter_ as specified within a _dialect description_; default
>>> is “,” … a _dialect description_ provides ‘hints’ to parsers about how to
>>> process the tabular data file]
>>>
>>>
>>> <snip>
>>>
>>> *R-WellFormedCsvCheck*
>>>
>>> *Ability to determine that a CSV is syntactically well formed*
>>>
>>>
>>> [DEFERRED]
>>>
>>>
>>>
>>> I guess the term 'deferred' is a bit pejorative here; we were not
>>> chartered to define parser behaviour!
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>
>>>
>>>
>>>
>>>
>>
Received on Friday, 12 February 2016 23:21:49 UTC