- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Fri, 12 Feb 2016 23:13:40 +0000
- To: Gregg Kellogg <gregg@greggkellogg.net>
- Cc: Ivan Herman <ivan@w3.org>, Jeni Tennison <jeni@theodi.org>, Dan Brickley <danbri@google.com>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-ID: <CADtUq_0q+Y94=9VUx=fUp5YLCvSq53v4ZSTfaBJC_cnXK58fnQ@mail.gmail.com>
(previously sent only to Gregg- oops)

Hi Gregg. Thanks for the feedback.

Both R-CommentLines and R-NonStandardCellDelimiter are now marked as accepted; see UCR doc 3.1.1 CSV parsing requirements [1]. The descriptive note indicates the charter scope and the non-normative nature of the comment line / delimiter behaviour.

I have marked R-CsvAsSubsetOfLargerDataset as partially met; see UCR doc 3.2.1 Data model requirements [2]. Basically, we only have a simple grouping mechanism, but we do support the use of annotations from other vocabularies (e.g. RDF Data Cube and VoID) to describe richer relationships.

Jeremy

[1]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req-parsing
[2]: http://w3c.github.io/csvw/use-cases-and-requirements/index.html#p-acc-req-data-model

On Thu, 11 Feb 2016 at 17:40 Gregg Kellogg <gregg@greggkellogg.net> wrote:

> On Feb 9, 2016, at 9:53 AM, Jeremy Tandy <jeremy.tandy@gmail.com> wrote:
>
> Hi Ivan- thanks for feedback.
>
> *R-CsvToJsonTransformation*
>
> Good suggestion.
>
> *R-CommentLines* ... and ... *R-NonStandardCellDelimiter*
>
> I wasn't quite sure how to classify these. Obviously parsing is outside
> the charter scope. I marked them as DEFERRED because although we provide a
> mechanism, it is non-normative. That said, we've delivered what we can (and
> enable these requirements to be met) even though it's outside the charter.
>
> I don’t see this as being deferred, but being satisfied. The group will
> never have a normative way to do this, as it’s outside the current charter
> and any reasonable follow-on charter. Maybe it should just be marked as
> NON-NORMATIVE.
>
> *R-NonStandardCellDelimiter* – same
>
> *R-CsvAsSubsetOfLargerDataset* – Also seems permanently out of scope,
> maybe we should just reference Data on the Web Best Practices. Deferred
> implies a future group may address it, which it won’t/shouldn’t.
>
> *R-CellMicrosyntax*
>
> The "partially accepted" category might be appropriate.
> Looking at the motivating use cases we meet 4 out of 5 (see below). That
> said, the one we don't meet requires conditional processing (see
> *R-ConditionalProcessingBasedOnCellValues*) which is DEFERRED; we're not
> meeting UC #24.
>
> The following use cases are met by our partial implementation of the
> requirement:
>
>   Use Case #6 - Journal Article Solr Search Results ... date-time format,
>   lists of authors ... the journal title contains html markup (<i> element) -
>   but the use case indicates that it's OK to treat this as pure text
>
>   Use Case #11 - City of Palo Alto Tree Data ... lists of comments
>   delimited with semi-colon ";"
>
>   Use Case #18 - Supporting Semantic-based Recommendations ... the
>   'semantic paths' are a comma delimited list of URIs; the use case doesn't
>   indicate that different semantics are applied to each item in the list
>
>   Use Case #20 - Integrating components with the TIBCO Spotfire platform
>   using tabular data ... escape sequences for special characters are not
>   supported, but the use case indicates that "These special characters
>   don't affect the parsing [...]"
>
> The following use case is NOT met:
>
>   Use Case #24 - Expressing a hierarchy within occupational listings ...
>   use of regular expression to extract values from substrings; different
>   parts of the structured occupation code
>
> *R-WellFormedCsvCheck*
>
> This requirement was already marked as deferred; I didn't see any point in
> changing that. However, we assume that the CSV is well formed in order for
> us to process the tabular data - it's just not in scope of our charter.
>
> ---
>
> Please let me know after tomorrow if you collectively think
> *R-CommentLines* and *R-NonStandardCellDelimiter* should be marked as
> ACCEPTED and that *R-CellMicrosyntax* be marked as PARTIALLY ACCEPTED.
>
> BR, Jeremy
>
> On Tue, 9 Feb 2016 at 17:14 Ivan Herman <ivan@w3.org> wrote:
>
>> Jeremy,
>>
>> only minor comments on the notes
>>
>> <snip>
>>
>> *R-CsvToJsonTransformation*
>>
>> *Ability to transform a CSV into JSON*
>>
>> [ACCEPTED]
>>
>> [comment that [CSV2JSON] specifies the transformation of an annotated
>> table to JSON; providing both _minimal mode_, where JSON output includes
>> objects derived from the data within the annotated table, and _standard
>> mode_, where JSON output additionally includes objects describing the
>> structure of the annotated table. Built-in datatypes from the annotated
>> table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to
>> JSON primitive types.]
>>
>> Maybe worth mentioning that the transformation includes a
>> 'prettification' of the output (nested objects) and not only a flat list
>> of relations.
>>
>> <snip>
>>
>> *R-CommentLines*
>>
>> *Ability to identify comment lines within a CSV file and skip over them
>> during parsing, format conversion or other processing*
>>
>> [DEFERRED … non-normative]
>>
>> Why is this deferred? We provide the dialect description, and it is not
>> in our charter to specify the parsing, i.e., the fact that it is not
>> normative does not seem to be a problem to me...
>>
>> [use of _comment prefix_ as specified within a _dialect description_;
>> default is “#” … a _dialect description_ provides ‘hints’ to parsers about
>> how to process the tabular data file]
>>
>> <snip>
>>
>> - 3.2.3 Data Model Requirements
>>   <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>
>>
>> *R-CellMicrosyntax*
>>
>> *Ability to parse internal data structure within a cell value*
>>
>> [ACCEPTED? … only lists]
>>
>> We can have a category that says: partially accepted.
>>
>> Are the relevant use cases covered by what we have?
>> If those use cases are validating only then we are fine, and we can
>> mention that, claiming victory… otherwise, well, that is it.
>>
>> [comment that support is provided for validating the format of cell
>> values … R-SyntacticTypeDefinition:
>>
>> - _Parsing Cells_: formats for numeric types (decimalChar, groupChar,
>>   pattern), formats for booleans, formats for dates and times, formats for
>>   durations
>> - formats for other types (e.g. html, json, xml and well known text
>>   literals ‘WKT’) can be validated using a regular expression for the string
>>   values, with syntax and processing defined by [ECMASCRIPT
>>   <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>]]
>>
>> [comment that only limited support is provided for extracting values from
>> structured data within cells; parsing of html, json, xml etc. to
>> extract structured data is not supported; lists of values provided in a
>> single cell are processed into arrays wherein each array item is considered
>> to be of consistent type]
>>
>> [comment that list items in a given cell value are separated by the
>> _separator_ character specified in the _dialect description_]
>>
>> *R-NonStandardCellDelimiter*
>>
>> *Ability to parse tabular data with cell delimiters other than comma (,)*
>>
>> [DEFERRED … non-normative]
>>
>> see my comment on comments (sic!). The same applies here, I believe
>>
>> [use of _delimiter_ as specified within a _dialect description_; default
>> is “,” … a _dialect description_ provides ‘hints’ to parsers about how to
>> process the tabular data file]
>>
>> <snip>
>>
>> *R-WellFormedCsvCheck*
>>
>> *Ability to determine that a CSV is syntactically well formed*
>>
>> [DEFERRED]
>>
>> I guess the term 'deferred' is a bit pejorative here; we were not
>> chartered to define parser behaviour!
>>
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
Received on Friday, 12 February 2016 23:14:22 UTC