
Re: Cleaning up the requirements in the UCR doc

From: Jeremy Tandy <jeremy.tandy@gmail.com>
Date: Fri, 12 Feb 2016 23:13:40 +0000
Message-ID: <CADtUq_0q+Y94=9VUx=fUp5YLCvSq53v4ZSTfaBJC_cnXK58fnQ@mail.gmail.com>
To: Gregg Kellogg <gregg@greggkellogg.net>
Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
(previously sent only to Gregg; oops)

Hi Gregg.

Thanks for the feedback. Both R-CommentLines and R-NonStandardCellDelimiter
are now marked as accepted, see UCR doc 3.1.1 CSV parsing requirements [1].
The descriptive note indicates the charter scope and the non-normative
nature of the comment line / delimiter behaviour.

I have marked R-CsvAsSubsetOfLargerDataset as partially met, see UCR doc
3.2.1 Data model requirements [2]. Basically, we only have a simple
grouping mechanism, but we do support the usage of annotations from other
vocabs (e.g. RDF Datacube and VoID) to describe richer relationships.

Jeremy

[1]:
http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req-parsing
[2]:
http://w3c.github.io/csvw/use-cases-and-requirements/index.html#p-acc-req-data-model


On Thu, 11 Feb 2016 at 17:40 Gregg Kellogg <gregg@greggkellogg.net> wrote:

> On Feb 9, 2016, at 9:53 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
> Hi Ivan- thanks for feedback.
>
> *R-CsvToJsonTransformation*
>
> Good suggestion.
>
> *R-CommentLines* ... and ... *R-NonStandardCellDelimiter*
>
>
> I wasn't quite sure how to classify these. Obviously parsing is outside
> the charter scope. I marked them as DEFERRED because although we provide a
> mechanism, it is non-normative. That said, we've delivered what we can (and
> enable these requirements to be met) even though it's outside the charter.
>
>
> I don’t see this as being deferred, but being satisfied. The group will
> never have a normative way to do this, as it’s outside the current charter
> and any reasonable follow-on charter. Maybe it should just be marked as
> NON-NORMATIVE.
>
> *R-NonStandardCellDelimiter* – same
>
> *R-CsvAsSubsetOfLargerDataset* – Also seems permanently out of scope;
> maybe we should just reference Data on the Web Best Practices. Deferred
> implies a future group may address it, which it won’t/shouldn’t.
>
> *R-CellMicrosyntax*
>
> The "partially accepted" category might be appropriate. Looking at the
> motivating use cases we meet 4 out of 5 (see below). That said, the one we
> don't meet requires conditional processing (see
> *R-ConditionalProcessingBasedOnCellValues*) which is DEFERRED; we're not
> meeting UC #24.
>
> The following use cases are met by our partial implementation of the
> requirement:
> - Use Case #6 - Journal Article Solr Search Results ... date-time format,
>   lists of authors ... the journal title contains HTML markup (<i> element),
>   but the use case indicates that it's OK to treat this as pure text
> - Use Case #11 - City of Palo Alto Tree Data ... lists of comments
>   delimited with semicolon ";"
> - Use Case #18 - Supporting Semantic-based Recommendations ... the
>   'semantic paths' are a comma-delimited list of URIs; the use case doesn't
>   indicate that different semantics are applied to each item in the list
> - Use Case #20 - Integrating components with the TIBCO Spotfire platform
>   using tabular data ... escape sequences for special characters are not
>   supported, but the use case indicates that "These special characters
>   don't affect the parsing [...]"
>
> The following use case is NOT met:
> - Use Case #24 - Expressing a hierarchy within occupational listings ...
>   use of regular expression to extract values from substrings; different
>   parts of the structured occupation code
>
> *R-WellFormedCsvCheck*
>
> This requirement was already marked as deferred; I didn't see any point in
> changing that. However, we assume that the CSV is well formed in order for
> us to process the tabular data; it's just not in the scope of our charter.
>
> ---
>
> Please let me know after tomorrow if you collectively think
> *R-CommentLines* and *R-NonStandardCellDelimiter* should be marked as
> ACCEPTED and that *R-CellMicrosyntax* should be marked as PARTIALLY ACCEPTED.
>
> BR, Jeremy
>
>
> On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:
>
>> Jeremy,
>>
>>
>> only minor comments on the notes
>>
>> <snip>
>>
>>
>> *R-CsvToJsonTransformation*
>>
>> *Ability to transform a CSV into JSON*
>>
>>
>> [ACCEPTED]
>>
>> [comment that [CSV2JSON] specifies the transformation of an annotated
>> table to JSON; providing both _minimal mode_, where JSON output includes
>> objects derived from the data within the annotated table, and _standard
>> mode_, where JSON output additionally includes objects describing the
>> structure of the annotated table. Built-in datatypes from the annotated
>> table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to
>> JSON primitive types.]
>>
>>
>> Maybe worth mentioning that the transformation includes a
>> 'prettification' of the output (nested objects) and not only a flat list
>> of relations.
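As a rough illustration of the minimal-mode idea in the draft note above, here is a Python sketch. It assumes a hypothetical `COLUMNS` mapping in place of real CSVW metadata, converts only a handful of datatypes to JSON primitives, omits empty cells, and emits flat objects only; it does not attempt the nesting Ivan mentions, nor standard mode.

```python
import csv
import io
import json

# Hypothetical, simplified column metadata (column name -> datatype name).
# Real CSVW metadata carries much more (titles, aboutUrl, propertyUrl, ...).
COLUMNS = {"name": "string", "age": "integer", "member": "boolean"}


def convert_cell(value, datatype):
    """Map a cell string onto a JSON primitive for a few built-in datatypes."""
    if value == "":
        return None  # treat empty cells as null in this sketch
    if datatype == "integer":
        return int(value)
    if datatype == "number":
        return float(value)
    if datatype == "boolean":
        if value in ("true", "1"):
            return True
        if value in ("false", "0"):
            return False
        raise ValueError(f"not a boolean: {value!r}")
    return value  # everything else stays a JSON string


def to_minimal_json(text, columns):
    """Return one flat JSON object per row, omitting null values."""
    objects = []
    for record in csv.DictReader(io.StringIO(text)):
        obj = {}
        for name, datatype in columns.items():
            value = convert_cell(record[name], datatype)
            if value is not None:
                obj[name] = value
        objects.append(obj)
    return objects


data = "name,age,member\nAlice,34,true\nBob,,false\n"
print(json.dumps(to_minimal_json(data, COLUMNS)))
# [{"name": "Alice", "age": 34, "member": true}, {"name": "Bob", "member": false}]
```

Dropping empty cells rather than writing explicit nulls keeps the output compact; a real converter would take property names and nesting from the metadata instead of the raw column names.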
>>
>> <snip>
>>
>>
>> *R-CommentLines*
>>
>> *Ability to identify comment lines within a CSV file and skip over them
>> during parsing, format conversion or other processing*
>>
>>
>> [DEFERRED … non-normative]
>>
>>
>> Why is this deferred? We provide the dialect description, and it is not
>> our charter to specify the parsing, i.e., the fact that it is not normative
>> does not seem to be a problem to me...
>>
>> [use of _comment prefix_ as specified within a _dialect description_;
>> default is “#” … a _dialect description_ provides ‘hints’ to parsers about
>> how to process the tabular data file]
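To show how a parser might act on a comment-prefix hint, here is a minimal Python sketch. The function name and sample data are hypothetical. Python's standard `csv` module has no comment support of its own, so comment lines are filtered out before parsing, and nothing the CSVW model does with the comment text (such as keeping it as an annotation) is reproduced here.

```python
import csv
import io


def read_skipping_comments(text, comment_prefix="#"):
    """Parse CSV text, dropping lines that begin with comment_prefix.

    Simplification: detection is line-based, so a quoted multi-line field
    whose continuation line starts with the prefix would be mishandled.
    """
    lines = (line for line in io.StringIO(text) if not line.startswith(comment_prefix))
    return list(csv.reader(lines))


sample = "# exported 2016-02-12\nid,label\n1,alpha\n# interim note\n2,beta\n"
print(read_skipping_comments(sample))
# [['id', 'label'], ['1', 'alpha'], ['2', 'beta']]
```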
>>
>>
>> <snip>
>>
>>
>>
>>    - 3.2.3 Data Model Requirements
>>    <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>>
>> *R-CellMicrosyntax*
>>
>> *Ability to parse internal data structure within a cell value*
>>
>>
>> [ACCEPTED? … only lists]
>>
>>
>> We can have a category that says: partially accepted.
>>
>> Are the relevant use cases covered by what we have? If those use cases
>> only require validation, then we are fine; we can mention that, claiming
>> victory… otherwise, well, that is it.
>>
>> [comment that support is provided for validating the format of cell
>> values … R-SyntacticTypeDefinition:
>>
>>    - _Parsing Cells_: formats for numeric types (decimalChar, groupChar,
>>    pattern), formats for booleans, formats for dates and times, formats for
>>    durations
>>    - formats for other types (e.g. html, json, xml and well-known text
>>    (WKT) literals) can be validated using a regular expression for the
>>    string values, with syntax and processing defined by [ECMASCRIPT]
>>    <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>]
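To make the "Parsing Cells" point concrete, here is a Python sketch of format-driven validation. The helper names are hypothetical: `decimal_char`/`group_char` stand in for the `decimalChar`/`groupChar` format properties, the WKT pattern is a toy example, requiring a full match is an assumption of the sketch, and Python's `re` only approximates the ECMAScript regular-expression syntax the spec refers to.

```python
import re


def parse_number(value, decimal_char=".", group_char=None):
    """Parse a numeric cell using decimalChar/groupChar-style hints.

    Group characters are removed first; the remainder must look like
    digits with an optional fractional part introduced by decimal_char.
    Group sizes are not checked in this sketch.
    """
    if group_char:
        value = value.replace(group_char, "")
    pattern = r"[+-]?\d+(?:" + re.escape(decimal_char) + r"\d+)?"
    if not re.fullmatch(pattern, value):
        raise ValueError(f"not a number in this format: {value!r}")
    return float(value.replace(decimal_char, "."))


def matches_format(value, pattern):
    """Check a string cell against a regular-expression format (full match)."""
    return re.fullmatch(pattern, value) is not None


print(parse_number("1.234,5", decimal_char=",", group_char="."))  # 1234.5
wkt_point = r"POINT\s*\(\s*-?\d+(\.\d+)?\s+-?\d+(\.\d+)?\s*\)"
print(matches_format("POINT (30 10)", wkt_point))  # True
```

Validation here only says whether a value has the expected shape; it does not extract any internal structure, which matches the "validating only" distinction raised above.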
>>
>> [comment that only limited support is provided for extracting values from
>> structured data within cells; parsing html, json, xml etc. to extract
>> structured data is not supported; lists of values provided in a
>> single cell are processed into arrays wherein each array item is considered
>> to be of consistent type]
>>
>> [comment that list items in a given cell value are separated by the
>> _separator_ character specified as an inherited property in the metadata
>> (e.g. on the column description)]
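List handling of this kind can be sketched in a few lines of Python. `split_list` and its `convert` parameter are hypothetical, and treating an empty cell as an empty list and stripping whitespace around items are choices made for this sketch rather than a restatement of the spec. The first call mirrors the semicolon-delimited comments of Use Case #11.

```python
def split_list(value, separator, convert=str):
    """Split one cell into a list whose items all share one type.

    Every item goes through the same converter, mirroring the
    "consistent type" constraint; surrounding whitespace is stripped.
    """
    if value == "":
        return []  # treat an empty cell as an empty list in this sketch
    return [convert(item.strip()) for item in value.split(separator)]


print(split_list("oak; elm ;ash", ";"))      # ['oak', 'elm', 'ash']
print(split_list("3,1,2", ",", convert=int))  # [3, 1, 2]
```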
>>
>>
>> *R-NonStandardCellDelimiter*
>>
>> *Ability to parse tabular data with cell delimiters other than comma (,)*
>>
>>
>> [DEFERRED … non-normative]
>>
>>
>> see my comment on comments (sic!). The same applies here, I believe.
>>
>> [use of _delimiter_ as specified within a _dialect description_; default
>> is “,” … a _dialect description_ provides ‘hints’ to parsers about how to
>> process the tabular data file]
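A delimiter hint maps directly onto the `delimiter` argument of Python's standard `csv.reader`. The wrapper below is a hypothetical minimal sketch of a parser honouring such a hint, with quoting left at the module's defaults.

```python
import csv
import io


def read_with_delimiter(text, delimiter=","):
    """Parse tabular text whose cells are separated by `delimiter`."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


print(read_with_delimiter("id\tname\n1\tAlice\n", delimiter="\t"))
# [['id', 'name'], ['1', 'Alice']]
print(read_with_delimiter('id;note\n2;"a;b"\n', delimiter=";"))
# [['id', 'note'], ['2', 'a;b']]
```

The second call shows why a delimiter is only a hint: a quoted cell may still contain the delimiter character.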
>>
>>
>> <snip>
>>
>> *R-WellFormedCsvCheck*
>>
>> *Ability to determine that a CSV is syntactically well formed*
>>
>>
>> [DEFERRED]
>>
>>
>>
>> I guess the term 'deferred' is a bit pejorative here; we were not
>> chartered to define parser behaviour!
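For readers unfamiliar with what such a check involves, here is a rough Python sketch. `looks_well_formed` is hypothetical. It relies on the standard `csv` module's `strict` mode to reject malformed quoting and treats rows of differing widths as an error, which is only one possible definition of "well formed" and not a complete RFC 4180 validator.

```python
import csv
import io


def looks_well_formed(text, delimiter=","):
    """Rough check: quoting parses cleanly and all non-blank rows are equally wide."""
    try:
        rows = list(csv.reader(io.StringIO(text), delimiter=delimiter, strict=True))
    except csv.Error:
        return False  # e.g. stray characters after a closing quote
    widths = {len(row) for row in rows if row}  # blank lines yield [] and are ignored
    return len(widths) <= 1


print(looks_well_formed("a,b\n1,2\n"))    # True
print(looks_well_formed("a,b\n1,2,3\n"))  # False
print(looks_well_formed('a,"b"c\n'))      # False
```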
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>
>
Received on Friday, 12 February 2016 23:14:22 UTC
