- From: Jeremy Tandy <jeremy.tandy@gmail.com>
- Date: Sat, 06 Feb 2016 16:32:03 +0000
- To: "public-csv-wg@w3.org" <public-csv-wg@w3.org>
- Message-ID: <CADtUq_2xxHvvt3D30wDifRnMVNr0Smf-5eOxm9_nFJMAuQ5Yxg@mail.gmail.com>
Hi.

You'll see that I've updated ISSUE #539 <https://github.com/w3c/csvw/issues/539>, identifying which candidate requirements should be accepted, and those which should be marked as deferred. Please shout if you think my classification should be changed :-)

I intend to update the UCR document over the next few days in time for the call on Wednesday (hoping I can find time around the edges of another WG meeting I'm attending next week!)

I will not be able to make the call myself ...

FWIW, I've included my notes relating to each requirement so that you will get an idea of the amendments I'm planning to make.

Jeremy

----

*CSVW Requirements cross reference*

3. Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#req>

- 3.1 Accepted Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req>

  - 3.1.1 Requirements relating to parsing of CSV <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#acc-req-parsing>

*R-RightToLeftCsvDeclaration*
*Ability to determine that a CSV should be rendered using RTL column ordering and RTL text direction in cells.*
[ACCEPTED]
[It is possible to set the column direction using the tableDirection <http://w3c.github.io/csvw/metadata/#tableDirection> property and the text direction on columns using the textDirection <http://w3c.github.io/csvw/metadata/#cell-textDirection> property, as defined in [tabular-metadata <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#bib-tabular-metadata>]]

- 3.2 Candidate Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#can-req>

  - 3.2.1 Requirements relating to applications <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#can-req-applications>

*R-CsvValidation*
*Ability to validate a CSV for conformance with a specified metadata definition*
[ACCEPTED]
[comment on validating tables; table compatibility (correct number of non-virtual columns, matching names/titles for those columns where
specified in header row), primary key uniqueness, missing foreign key references, cell validation]
[comment on validating cells; parsing cells (_datatype_ parsing), length constraints and value constraints]

*R-CsvToRdfTransformation*
*Ability to transform a CSV into RDF*
[ACCEPTED]
[comment that [CSV2RDF] specifies the transformation of an annotated table to RDF; providing both _minimal mode_, where RDF output includes triples derived from the data within the annotated table, and _standard mode_, where RDF output additionally includes triples describing the structure of the annotated table.]
[comment that built-in types are limited to those defined in [tabular-data-model] 4.6 Datatypes; geo:wktLiteral and other types from [geosparql] are not supported natively.]

*R-CsvToJsonTransformation*
*Ability to transform a CSV into JSON*
[ACCEPTED]
[comment that [CSV2JSON] specifies the transformation of an annotated table to JSON; providing both _minimal mode_, where JSON output includes objects derived from the data within the annotated table, and _standard mode_, where JSON output additionally includes objects describing the structure of the annotated table. Built-in datatypes from the annotated table, as specified in [tabular-data-model] 4.6 Datatypes, are converted to JSON primitive types.]

*R-CsvToXmlTransformation*
*Ability to transform a CSV into XML*
[DEFERRED]
[The charter of the Working Group (http://www.w3.org/2013/05/lcsv-charter.html) includes a work item for CSV to XML conversion. Given that there is only a single use case providing motivation for this requirement, and that the Working Group was unable to find XML experts to assist in delivery of this work item, the Working Group was forced to abandon this deliverable.]
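
For illustration only, one of the table-level checks noted under R-CsvValidation (primary key uniqueness, where the key may be compiled from several cells in a row) could be sketched roughly as below. The function name `find_duplicate_keys`, the column names and the sample rows are all made up for this sketch; none of them come from the CSVW specifications or from any existing validator.

```python
from collections import defaultdict


def find_duplicate_keys(rows, key_columns):
    """Return {key tuple: [row numbers]} for every key shared by more than one row.

    Hypothetical helper: `rows` is a list of dicts (one per data row) and
    `key_columns` names the columns that together form the primary key.
    """
    seen = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        # A primary key may be compiled from multiple cell values in a row.
        key = tuple(row[name] for name in key_columns)
        seen[key].append(row_number)
    return {key: numbers for key, numbers in seen.items() if len(numbers) > 1}


# Invented sample data: rows 1 and 3 share the same (country, year) key.
rows = [
    {"country": "AD", "year": "2014", "population": "72786"},
    {"country": "AD", "year": "2015", "population": "70473"},
    {"country": "AD", "year": "2014", "population": "99999"},
]

duplicates = find_duplicate_keys(rows, ["country", "year"])
print(duplicates)
```

A real validator would also need the other checks listed above (table compatibility, foreign key references, cell validation); this only shows the shape of the uniqueness test.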

*R-CanonicalMappingInLieuOfAnnotation*
*Ability to transform CSV conforming to the core tabular data model yet lacking further annotation into an object / object graph serialisation*
[ACCEPTED]
[comment that an annotated table is always generated by applications implementing this specification when processing tabular data; albeit that those annotations are limited. The _titles_ annotation may be populated from column headings provided within the tabular data file. Transformations to both RDF and JSON operate on the annotated table, and are, therefore, unaffected by the use of a tabular metadata file to provide additional annotations.]

*R-IndependentMetadataPublication*
*Ability to publish metadata independently from the tabular data resource it describes*
[ACCEPTED]
[comment that [tabular-metadata] specifies the format and structure of a metadata file that may be used to provide annotations on an annotated table or group of tables.]

*R-SpecificationOfPropertyValuePairForEachRow*
*Ability to define a property-value pair for inclusion in each row.*
[ACCEPTED]
[comment that to meet this requirement a _virtual column_ must be specified for the additional property-value pair that is to be included in each row. The _default_ annotation may be used to provide the value for every row, or the _value URL_ annotation may be used to specify a URI Template, as defined in [RFC6570], that is evaluated for each row]

*R-ConditionalProcessingBasedOnCellValues*
*Ability to apply conditional processing based on the value of a specific cell*
[DEFERRED]
[comment on use of _transformation definitions_ that define how a script or template may be used to provide such conditional processing; also that the output from JSON or RDF transformation may be subjected to post-processing to achieve the desired outcome.
Details of these transformation scripts / templates and post processing are outside the scope of this specification]

*R-CommentLines*
*Ability to identify comment lines within a CSV file and skip over them during parsing, format conversion or other processing*
[DEFERRED … non-normative]
[use of _comment prefix_ as specified within a _dialect description_; default is “#” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]

  - 3.2.2 Non-functional requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#can-req-non-func>

*R-ZeroEditAdditionOfSupplementaryMetadata*
*Ability to add supplementary metadata to an existing CSV file without requiring modification of that file*
[ACCEPTED]
[comment on use of complementary metadata document containing annotations for tabular data; as specified in REC-metadata … Applications MAY provide alternative mechanisms to gather the annotations on an _annotated table_ or _group of tables_]

  - 3.2.3 Data Model Requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#data-model-req>

*R-CellMicrosyntax*
*Ability to parse internal data structure within a cell value*
[ACCEPTED? … only lists]
[comment that support is provided for validating the format of cell values … R-SyntacticTypeDefinition:
- _Parsing Cells_: formats for numeric types (decimalChar, groupChar, pattern), formats for booleans, formats for dates and times, formats for durations
- formats for other types (e.g. html, json, xml and well known text literals ‘WKT’) can be validated using a regular expression for the string values, with syntax and processing defined by [ECMASCRIPT <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-ECMASCRIPT>]]
[comment that only limited support is provided for extracting values from structured data within cells; the parsing of html, json and xml etc.
to extract structured data is not supported; lists of values provided in a single cell are processed into arrays wherein each array item is considered to be of consistent type]
[comment that list items in a given cell value are separated by the _separator_ character specified in the _dialect description_]

*R-NonStandardCellDelimiter*
*Ability to parse tabular data with cell delimiters other than comma (,)*
[DEFERRED … non-normative]
[use of _delimiter_ as specified within a _dialect description_; default is “,” … a _dialect description_ provides ‘hints’ to parsers about how to process the tabular data file]

*R-PrimaryKey*
*Ability to determine the primary key for rows within a tabular data file*
[ACCEPTED]
[comment on use of the _primaryKey_ annotation; a primary key may be compiled from multiple cell values in a given row]

*R-ForeignKeyReferences*
*Ability to cross reference between CSV files*
[ACCEPTED]
[comment on use of the _foreign key_ annotation on an annotated table for validation purposes; any cell value in a column referenced by the foreign key statement must have a unique value in the column of the referenced table]
[comment that references between resources, irrespective of whether the resource is listed elsewhere in another table, may be created by converting local identifiers into URIs using URI templates; the _value URL_ annotation can be used to refer to a resource and the _about URL_ annotation used to identify a resource.
Referenced resources do not need to be specified in an annotated table at all]

*R-AnnotationAndSupplementaryInfo*
*Ability to add annotation and supplementary information to CSV file*
[ACCEPTED]
[comment that any annotation may be used in addition to _core annotations_ specified in this specification, such as title, author, license etc.; these are referred to as _common properties_; see 5.8 Common Properties for more details]
[comment on use of _notes_ annotation for tables and groups of tables; these may be used to provide any number of additional annotations for a table or group of tables; such annotations are interpreted in the same way as _common properties_]
[comment that the Web Annotation Working Group <http://www.w3.org/annotation/> is developing a vocabulary for expressing annotations; for example, see CSV2RDF 7.2 Example with single table and rich annotations]

*R-AssociationOfCodeValuesWithExternalDefinitions*
*Ability to associate a code value with externally managed definition*
[ACCEPTED]
[comment that an identifier referenced in a cell value may either be mapped to a URL that can be resolved to provide a definition for the identified resource, or a foreign key reference can be asserted to another table published in the same group of tables where the definition associated with the identifier could be provided]

*R-CsvAsSubsetOfLargerDataset*
*Ability to assert how a single CSV file is a facet or subset of a larger dataset*
[DEFERRED]
[comment that this specification does not provide any description of the relationship between tables beyond their membership in a given _group of tables_; other specifications such as [RDF Data Cube] and [VoID] provide mechanisms to describe subsets of data that may be of use in meeting this requirement.
Such descriptions can be included as metadata annotations in the form of _notes_ or _common properties_]

*R-SyntacticTypeDefinition*
*Ability to declare syntactic type for cells within a specified column.*
[ACCEPTED]
[comment that syntactic type for a cell value is defined using the _datatype_ annotation; built-in datatypes include those defined in [xmlschema11-2 <https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/#bib-xmlschema11-2>], plus number, binary, datetime, any, xml, html and json. Datatypes can be derived from the built-in datatypes using further annotations; refer to 5.11.2 Derived datatypes for further details]

*R-SemanticTypeDefinition*
*Ability to declare semantic type for cells within a specified column.*
[ACCEPTED]
[comment that the identifier for the semantic type associated with a given cell value can be specified using the _property URL_ annotation (a URI template property); this is normally specified for the column and inherited by all the cells within that column]

*R-MissingValueDefinition*
*Ability to declare a "missing value" token and, optionally, a reason for the value to be missing*
[ACCEPTED]
[comment that the string (or strings) representing missing values in an annotated table is defined using the _null_ annotation]

*R-URIMapping*
*Ability to map cell values within a given column into corresponding URI*
[ACCEPTED]
[comment that a URI Template, as defined in [RFC 6570], can be specified to map the value of a cell to a URI using the _value URL_ annotation]

*R-UnitMeasureDefinition*
*Ability to identify/express the unit of measure for the values reported in a given column.*
[<< requirement needs additional description >>]
[ACCEPTED]
[comment that this specification provides no native mechanism for expressing the unit of measurement associated with values of cells in a column; for example, stating that the floating-point numbers in a column with name “distance” are provided in kilometers.
However, annotations may be used to provide this additional information. The [CSVW Primer] provides examples of how this may be achieved (http://w3c.github.io/csvw/primer/#how-do-you-support-units-of-measure); from providing descriptive metadata to enabling transformation of cell values to structured data with unit of measurement statements. The [RDF Data Cube vocabulary] provides another alternative for annotations; structural metadata is used to provide metadata to interpret data values - such as the unit of measurement.]

*R-GroupingOfMultipleTables*
*Ability to group multiple data tables into a single package for publication*
[ACCEPTED]
[comment that _group of tables_ (https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#dfn-group-of-tables) is a first class entity within the tabular data model; comprising a set of annotated tables and a set of annotations that relate to that group of tables.]

*R-LinkFromMetadataToData*
*Ability for a metadata description to explicitly cite the tabular dataset it describes*
[ACCEPTED]
[comment that in addition to providing mechanisms to locate metadata relating to a tabular data file, see [tabular-data-model] (#locating-metadata), the table annotation _url_ allows the URL of the source of the data in the annotated table to be defined; for example, referring to a specific CSV file]

*R-MultilingualContent*
*Ability to declare a locale / language for content in a specified column*
[ACCEPTED]
[comment that the annotation _lang_ may be used to express the code for the expected language for values of cells in a particular column, expressed in a format defined by [BCP47 <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-BCP47>]. Furthermore, the annotation _titles_ allows for any number of human-readable titles to be given for a column, each of which may have an associated language code as defined by [BCP47 <https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#bib-BCP47>].]
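
To make the _url_, _lang_ and _titles_ annotations a little more concrete, here is a rough sketch of a table description built as a Python dict and serialised to JSON. The file name `countries.csv`, the column name and the titles are invented for the example; the property names are intended to follow [tabular-metadata] and should be checked against it rather than taken from this sketch.

```python
import json

# Hypothetical metadata for a single CSV file; all values are placeholders.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    # R-LinkFromMetadataToData: the table's url cites the CSV file it describes.
    "url": "countries.csv",
    "tableSchema": {
        "columns": [
            {
                "name": "country_name",
                # R-MultilingualContent: one title per BCP47 language code.
                "titles": {"en": "Country", "fr": "Pays"},
                # Expected language of the cell values in this column.
                "lang": "en",
            }
        ]
    },
}

# ensure_ascii=False keeps any non-ASCII titles readable in the output.
print(json.dumps(metadata, indent=2, ensure_ascii=False))
```

Because the description lives in its own document, the CSV file itself is left untouched, which is also the point of R-ZeroEditAdditionOfSupplementaryMetadata.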

*R-RepeatedProperties*
*Ability to provide multiple values of a given property for a single entity described within a tabular data file*
[ACCEPTED]
[comment that within an annotated table, the values of cells can be considered as RDF subject-predicate-object triples [rdf11-concepts <http://w3c.github.io/csvw/publishing-snapshots/REC-csv2rdf/Overview.html#bib-rdf11-concepts>]. The annotation _about URL_ may be used to define the subject of the triple derived from a cell, and, where the same _about URL_ annotation is used for every cell within a row, the resource identified by the _about URL_ annotation can be considered to be the subject of the row. The same _about URL_ annotation may be used to describe cells in more than one row. Similarly, the _property URL_ annotation may be used to define the predicate of the triple. The same _property URL_ annotation may be used to describe multiple columns, meaning that multiple values of a property may be provided from a series of columns.]
[comment that arrays of values may be supplied within a cell value; values in the array are delimited using the character specified using the _separator_ annotation within a _dialect description_]

- 3.3 Deferred requirements <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#deferred-req>

  - 3.3.1 Requirements relating to parsing of CSV <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#deferred-req-parsing>

*R-WellFormedCsvCheck*
*Ability to determine that a CSV is syntactically well formed*
[DEFERRED]

*R-MultipleHeadingRows*
*Ability to handle headings spread across multiple initial rows, as well as to distinguish between single column headings and file headings.*
[DEFERRED]

*R-TableNormalization*
*Ability to transform data that is published in a normalized form into tabular data.*
[DEFERRED]

  - 3.3.2 Requirements relating to applications <http://w3c.github.io/csvw/use-cases-and-requirements/index.html#deferred-req-applications>

*R-RandomAccess*
*Ability to access and/or extract part of a CSV file in a non-sequential manner.*
[DEFERRED]
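
Several of the notes above (R-SpecificationOfPropertyValuePairForEachRow, R-ForeignKeyReferences, R-URIMapping) rely on a URI Template being evaluated for each row. A minimal sketch of that idea follows, covering only the simplest `{name}` form of [RFC 6570] (Level 1). The helper name `expand_simple_template`, the example.org template and the sample rows are assumptions made for illustration, and a conforming processor would need the fuller template syntax.

```python
import re
from urllib.parse import quote


def expand_simple_template(template, row):
    """Expand {name} expressions using the cell values of one row.

    Hypothetical helper covering RFC 6570 Level 1 only: each variable is
    replaced by its value with everything except unreserved characters
    percent-encoded; an undefined variable expands to an empty string.
    """
    def substitute(match):
        value = row.get(match.group(1), "")
        return quote(str(value), safe="")

    return re.sub(r"\{([A-Za-z0-9_]+)\}", substitute, template)


# Invented rows and template, standing in for a _value URL_ annotation.
template = "http://example.org/country/{code}"
rows = [{"code": "AD"}, {"code": "São Tomé"}]

for row in rows:
    print(expand_simple_template(template, row))
```

The same per-row expansion would apply whether the template supplies an _about URL_, a _property URL_ or a _value URL_.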
Received on Saturday, 6 February 2016 16:32:46 UTC