Re: Cleaning up the requirements in the UCR doc from Jeremy Tandy on 2016-02-09 (public-csv-wg@w3.org from February 2016)

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Tue, 09 Feb 2016 17:53:05 +0000
To: Ivan Herman <ivan@w3.org>
Cc: Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, Gregg Kellogg <gregg@greggkellogg.net>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <CADtUq_0Fx3bjNW-3fGgsedKduSu2Y=-1p87eTScmcFt7iUzEDA@mail.gmail.com>
Hi Ivan - thanks for the feedback.

*R-CsvToJsonTransformation*

Good suggestion.

*R-CommentLines* ... and ... *R-NonStandardCellDelimiter*


I wasn't quite sure how to classify these. Obviously parsing is outside the
charter scope. I marked them as DEFERRED because although we provide a
mechanism, it is non-normative. That said, we've delivered what we can (and
enable these requirements to be met) even though it's outside the charter.
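
To illustrate what "providing a mechanism" means here, a minimal sketch of how a parser might honour the two dialect hints. This is an assumption-laden example, not normative behaviour: `read_rows` is a hypothetical helper, the dialect is passed as a plain dict, and only the `commentPrefix` and `delimiter` properties are considered, with the defaults "#" and ",".

```python
import csv
import io


def read_rows(text, dialect=None):
    """Yield rows of cells, honouring two dialect hints (hypothetical helper)."""
    dialect = dialect or {}
    # Defaults follow the dialect description: "#" for comments, "," for cells.
    comment_prefix = dialect.get("commentPrefix", "#")
    delimiter = dialect.get("delimiter", ",")

    # Drop comment lines before handing the rest to the CSV reader.
    # Simplification: a quoted multi-line cell whose continuation line starts
    # with the prefix would be mishandled by this line-based filter.
    lines = (
        line
        for line in io.StringIO(text)
        if not (comment_prefix and line.startswith(comment_prefix))
    )
    yield from csv.reader(lines, delimiter=delimiter)


sample = "# generated 2016-02-09\nid;name\n1;oak\n"
rows = list(read_rows(sample, {"delimiter": ";"}))
```

The parser remains free to ignore these hints, which is why the mechanism is non-normative.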

*R-CellMicrosyntax*

The "partially accepted" category might be appropriate. Looking at the
motivating use cases we meet 4 out of 5 (see below). That said, the one we
don't meet requires conditional processing (see
*R-ConditionalProcessingBasedOnCellValues*) which is DEFERRED; we're not
meeting UC #24.

The following use cases are met by our partial implementation of the
requirement:
Use Case #6 - Journal Article Solr Search Results ... date-time format,
lists of authors ... the journal title contains html markup (<i> element) -
but the use case indicates that it's OK to treat this as pure text
Use Case #11 - City of Palo Alto Tree Data ... lists of comments delimited
with semi-colon ";"
Use Case #18 - Supporting Semantic-based Recommendations ... the 'semantic
paths' are a comma delimited list of URIs; the use case doesn't indicate
that different semantics are applied to each item in the list
Use Case #20 - Integrating components with the TIBCO Spotfire platform
using tabular data ... escape sequences for special characters are not
supported, but the use case indicates that "These special characters don't
affect the parsing [...]"

The following use case is NOT met:
Use Case #24 - Expressing a hierarchy within occupational listings ... use
of regular expression to extract values from substrings; different parts of
the structured occupation code
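
A rough sketch of the difference between the two cases, under stated assumptions: `split_list_cell` is a hypothetical helper, the sample cell values are invented, and the occupation-code layout in the regular expression is made up for illustration rather than taken from UC #24.

```python
import re


def split_list_cell(value, separator):
    """Split one cell into a list of items that all share the same type."""
    return [item.strip() for item in value.split(separator)]


# Supported case (UC #11 style): a single separator yields a uniform list.
comments = split_list_cell("dead limb; near power line", ";")

# Unsupported case (UC #24 style): different parts of one value carry
# different meanings, so a regular expression is needed to pull them apart.
# The "NN-NNNN" layout below is invented for illustration only.
CODE = re.compile(r"^(?P<major>\d{2})-(?P<detail>\d{4})$")
match = CODE.match("15-1121")
parts = match.groupdict() if match else {}
```

The first case only needs a separator; the second needs value extraction, which is what we have deferred.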

*R-WellFormedCsvCheck*

This requirement was already marked as deferred; I didn't see any reason to
change that. However, we assume that the CSV is well formed in order for us
to process the tabular data - it's just not in scope of our charter.

---

Please let me know after tomorrow if you collectively think
*R-CommentLines* and *R-NonStandardCellDelimiter* should be marked as ACCEPTED and that
*R-CellMicrosyntax* be marked as PARTIALLY ACCEPTED.

BR, Jeremy





On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:

> Jeremy,
>
>
> only minor comments on the notes
>
> <snip>
>
>
> *R-CsvToJsonTransformation*
>
> *Ability to transform a CSV into JSON*
>
>
> [ACCEPTED]
>
> [comment that [CSV2JSON] specifies the transformation of an annotated
> table to JSON; providing both _minimal mode_, where JSON output includes
> objects derived from the data within the annotated table, and _standard
> mode_, where JSON output additionally includes objects describing the
> structure of the annotated table. Built-in datatypes from the annotated
> table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to
> JSON primitive types.]
>
>
> Maybe worth mentioning that the transformation includes a 'prettyfication'
> of the output (nested objects) and not only a flat list of relations.
>
> <snip>
>
>
> *R-CommentLines*
>
> *Ability to identify comment lines within a CSV file and skip over them
> during parsing, format conversion or other processing*
>
>
> [DEFERRED … non-normative]
>
>
> Why is this deferred? We provide the dialect description, and it is not
> our charter to specify the parsing, ie, the fact that it is not normative
> does not sound to be a problem for me...
>
> [use of _comment prefix_ as specified within a _dialect description_;
> default is “#” … a _dialect description_ provides ‘hints’ to parsers about
> how to process the tabular data file]
>
>
> <snip>
>
>
>
>    - 3.2.3 Data Model Requirements
>    <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>
> *R-CellMicrosyntax*
>
> *Ability to parse internal data structure within a cell value*
>
>
> [ACCEPTED? … only lists]
>
>
> We can have a category that says: partially accepted.
>
> Are the relevant use cases covered by what we have?  If those use cases
> are validating only then we are fine, and we can mention that, claiming
> victory… otherwise, well, that is it.
>
> [comment that support is provided for validating the format of cell values
> … R-SyntacticTypeDefinition:
>
>    - _Parsing Cells_: formats for numeric types (decimalChar, groupChar,
>    pattern), formats for booleans, formats for dates and times, formats for
>    durations
>    - formats for other types (e.g. html, json, xml and well known text
>    literals ‘WKT’) can be validated using a regular expression for the string
>    values, with syntax and processing defined by [ECMASCRIPT
>    <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>
>    ]]
>
> [comment that only limited support is provided for extracting values from
> structured data within cells; parsing html, json, xml etc. to
> extract structured data is not supported; lists of values provided in a
> single cell are processed into arrays wherein each array item is considered
> to be of consistent type]
>
> [comment that list items in a given cell value are separated by the
> _separator_ character specified in the _dialect description_]
>
>
> *R-NonStandardCellDelimiter*
>
> *Ability to parse tabular data with cell delimiters other than comma (,)*
>
>
> [DEFERRED … non-normative]
>
>
> see my comment on comments (sic!). The same applies here, I believe.
>
> [use of _delimiter_ as specified within a _dialect description_; default
> is “,” … a _dialect description_ provides ‘hints’ to parsers about how to
> process the tabular data file]
>
>
> <snip>
>
> *R-WellFormedCsvCheck*
>
> *Ability to determine that a CSV is syntactically well formed*
>
>
> [DEFERRED]
>
>
>
> I guess the term 'deferred' is a bit pejorative here; we were not
> chartered to define parser behaviour!
>
> ----
> Ivan Herman, W3C
> Digital Publishing Lead
> Home: http://www.w3.org/People/Ivan/
> mobile: +31-641044153
> ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
>
Received on Tuesday, 9 February 2016 17:53:43 UTC