- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Fri, 12 Feb 2016 23:21:10 +0000
- To: Gregg Kellogg <gregg@greggkellogg.net>
- Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-ID: <CADtUq_2Hh=0DBYV8vD-pxq7+EZH-XjhmySEM0eu=i=_LxJp_+g@mail.gmail.com>
Hi - I have 'finished' updating the UCR doc to my satisfaction and closed the outstanding issues relating to the UCR doc.

There is one outstanding question: how do I reference the Primer? I've raised ISSUE #809 <https://github.com/w3c/csvw/issues/809> to make sure I don't forget this!

I've done a proof read. I'm satisfied ... but would appreciate your views on the content and categorisation. I've also remembered to add a 'changes' section [1].

My hope is that if you tell me stuff that needs to be changed, I can get it fixed before Wednesday and we can vote to release.

Jeremy

BTW: I've made no content changes to the 'use case' part of the doc - only fixed some unclosed html elements and improved internal formatting.

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#changes

On Fri, 12 Feb 2016 at 23:13 Jeremy Tandy <jeremy.tandy@gmail.com> wrote:

> (previously sent only to Gregg- oops)
>
> Hi Gregg.
>
> Thanks for the feedback. Both R-CommentLines and
> R-NonStandardCellDelimiter are now marked as accepted, see UCR doc 3.1.1
> CSV parsing requirements [1]. The descriptive note indicates the charter
> scope and the non-normative nature of the comment line / delimiter
> behaviour.
>
> I have marked R-CsvAsSubsetOfLargerDataset as partially met, see UCR doc
> 3.2.1 Data model requirements [2]. Basically, we only have a simple
> grouping mechanism, but we do support the usage of annotations from other
> vocabs (e.g. RDF Datacube and VoID) to describe richer relationships.
>
> Jeremy
>
> [1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req-parsing
> [2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#p-acc-req-data-model
>
> On Thu, 11 Feb 2016 at 17:40 Gregg Kellogg <gregg@greggkellogg.net> wrote:
>
>> On Feb 9, 2016, at 9:53 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>>
>> Hi Ivan- thanks for feedback.
>>
>> *R-CsvToJsonTransformation*
>>
>> Good suggestion.
>>
>> *R-CommentLines* ... and ...
*R-NonStandardCellDelimiter*
>>
>> I wasn't quite sure how to classify these. Obviously parsing is outside
>> the charter scope. I marked them as DEFERRED because although we provide a
>> mechanism, it is non-normative. That said, we've delivered what we can (and
>> enable these requirements to be met) even though it's outside the charter.
>>
>> I don’t see this as being deferred, but being satisfied. The group will
>> never have a normative way to do this, as it’s outside the current charter
>> and any reasonable follow-on charter. Maybe it should just be marked as
>> NON-NORMATIVE.
>>
>> *R-NonStandardCellDelimiter* – same
>>
>> *R-CsvAsSubsetOfLargerDataset* – Also seems permanently out of scope,
>> maybe we should just reference Data on the Web Best Practices. Deferred
>> implies a future group may address it, which it won’t/shouldn’t.
>>
>> *R-CellMicrosyntax*
>>
>> The "partially accepted" category might be appropriate. Looking at the
>> motivating use cases we meet 4 out of 5 (see below). That said, the one we
>> don't meet requires conditional processing (see
>> *R-ConditionalProcessingBasedOnCellValues*) which is DEFERRED; we're not
>> meeting UC #24.
>>
>> The following use cases are met by our partial implementation of the
>> requirement:
>> Use Case #6 - Journal Article Solr Search Results ... date-time format,
>> lists of authors ... the journal title contains html markup (<i> element) -
>> but the use case indicates that it's OK to treat this as pure text
>> Use Case #11 - City of Palo Alto Tree Data ... lists of comments
>> delimited with semi-colon ";"
>> Use Case #18 - Supporting Semantic-based Recommendations ... the
>> 'semantic paths' are a comma delimited list of URIs; the use case doesn't
>> indicate that different semantics are applied to each item in the list
>> Use Case #20 - Integrating components with the TIBCO Spotfire platform
>> using tabular data ...
escape sequences for special characters are not
>> supported, but the use case indicates that "These special characters
>> don't affect the parsing [...]"
>>
>> The following use case is NOT met:
>> Use Case #24 - Expressing a hierarchy within occupational listings ...
>> use of regular expression to extract values from substrings; different
>> parts of the structured occupation code
>>
>> *R-WellFormedCsvCheck*
>>
>> This requirement was already marked as deferred; I didn't see any point
>> to change that. However, we assume that the CSV is well formed in order
>> for us to process the tabular data - it's just not in scope of our charter.
>>
>> ---
>>
>> Please let me know after tomorrow if you collectively think
>> *R-CommentLines* and *R-NonStandardCellDelimiter* should be marked as
>> ACCEPTED and that *R-CellMicrosyntax* be marked as PARTIALLY ACCEPTED.
>>
>> BR, Jeremy
>>
>> On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:
>>
>>> Jeremy,
>>>
>>> only minor comments on the notes
>>>
>>> <snip>
>>>
>>> *R-CsvToJsonTransformation*
>>>
>>> *Ability to transform a CSV into JSON*
>>>
>>> [ACCEPTED]
>>>
>>> [comment that [CSV2JSON] specifies the transformation of an annotated
>>> table to JSON; providing both _minimal mode_, where JSON output includes
>>> objects derived from the data within the annotated table, and _standard
>>> mode_, where JSON output additionally includes objects describing the
>>> structure of the annotated table. Built-in datatypes from the annotated
>>> table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to
>>> JSON primitive types.]
>>>
>>> Maybe worth mentioning that the transformation includes a
>>> 'prettyfication' of the output (nested objects) and not only a flat list
>>> of relations.
>>>
>>> <snip>
>>>
>>> *R-CommentLines*
>>>
>>> *Ability to identify comment lines within a CSV file and skip over them
>>> during parsing, format conversion or other processing*
>>>
>>> [DEFERRED … non-normative]
>>>
>>> Why is this deferred? We provide the dialect description, and it is not
>>> our charter to specify the parsing, i.e., the fact that it is not normative
>>> does not sound to be a problem for me...
>>>
>>> [use of _comment prefix_ as specified within a _dialect description_;
>>> default is “#” … a _dialect description_ provides ‘hints’ to parsers about
>>> how to process the tabular data file]
>>>
>>> <snip>
>>>
>>> - 3.2.3 Data Model Requirements
>>> <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>>>
>>> *R-CellMicrosyntax*
>>>
>>> *Ability to parse internal data structure within a cell value*
>>>
>>> [ACCEPTED? … only lists]
>>>
>>> We can have a category that says: partially accepted.
>>>
>>> Are the relevant use cases covered by what we have? If those use cases
>>> are validating only then we are fine, and we can mention that, claiming
>>> victory… otherwise, well, that is it.
>>>
>>> [comment that support is provided for validating the format of cell
>>> values … R-SyntacticTypeDefinition:
>>>
>>> - _Parsing Cells_: formats for numeric types (decimalChar,
>>> groupChar, pattern), formats for booleans, formats for dates and times,
>>> formats for durations
>>> - formats for other types (e.g. html, json, xml and well known text
>>> literals ‘WKT’) can be validated using a regular expression for the string
>>> values, with syntax and processing defined by [ECMASCRIPT
>>> <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>
>>> ]]
>>>
>>> [comment that only limited support is provided for extracting values
>>> from structured data within cells; the parsing of html, json and xml etc.
to
>>> extract structured data is not supported; lists of values provided in a
>>> single cell are processed into arrays wherein each array item is considered
>>> to be of consistent type]
>>>
>>> [comment that list items in a given cell value are separated by the
>>> _separator_ character specified in the _dialect description_]
>>>
>>> *R-NonStandardCellDelimiter*
>>>
>>> *Ability to parse tabular data with cell delimiters other than comma (,)*
>>>
>>> [DEFERRED … non-normative]
>>>
>>> see my comment on comments (sic!). The same applies here I believe
>>>
>>> [use of _delimiter_ as specified within a _dialect description_; default
>>> is “,” … a _dialect description_ provides ‘hints’ to parsers about how to
>>> process the tabular data file]
>>>
>>> <snip>
>>>
>>> *R-WellFormedCsvCheck*
>>>
>>> *Ability to determine that a CSV is syntactically well formed*
>>>
>>> [DEFERRED]
>>>
>>> I guess the term 'deferred' is a bit pejorative here; we were not
>>> chartered to define parser behaviour!
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Friday, 12 February 2016 23:21:49 UTC
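
Illustration of the dialect description 'hints' discussed in the thread (comment prefix and delimiter), together with splitting a list-valued cell on a separator. This is only a minimal Python sketch, not the behaviour of any particular CSVW implementation. The function names `read_rows` and `split_list_cell`, the sample data, and the dict layout are hypothetical; only the property names `commentPrefix` and `delimiter` are taken from the CSVW metadata vocabulary. The list separator is passed in explicitly, comment lines are filtered naively (so quoted cells spanning several lines are assumed not to occur), and stripping whitespace from list items is an assumption.

```python
import csv

# Hypothetical dialect description; the keys mirror the CSVW dialect
# properties mentioned in the thread (defaults "#" and ",").
DIALECT = {"commentPrefix": "#", "delimiter": ";"}

# Made-up sample data: one comment line, a header row, one data row.
SAMPLE = "# tree survey export\nid;comments\n1;low branches, needs pruning\n"


def read_rows(text, dialect):
    """Return parsed rows, skipping comment lines and using the delimiter hint."""
    prefix = dialect.get("commentPrefix", "#")
    delimiter = dialect.get("delimiter", ",")
    # Naive line-based filter (assumption: no multi-line quoted cells).
    lines = [
        line for line in text.splitlines()
        if not (prefix and line.startswith(prefix))
    ]
    return list(csv.reader(lines, delimiter=delimiter))


def split_list_cell(value, separator):
    """Split a cell value into a list of items on the given separator."""
    if value == "":
        return []
    # Stripping surrounding whitespace is an assumption of this sketch.
    return [item.strip() for item in value.split(separator)]


rows = read_rows(SAMPLE, DIALECT)
header, data = rows[0], rows[1:]
for row in data:
    record = dict(zip(header, row))
    # Treat the "comments" column as a comma-separated list.
    record["comments"] = split_list_cell(record["comments"], ",")
    print(record)
```

Because the dialect values are only hints, a parser that ignores them would still be conformant; the sketch simply shows one way a tool could honour them.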