Re: Request for comments about requirements from Ivan Herman on 2014-05-21 (public-csv-wg@w3.org from May 2014)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 21 May 2014 11:21:37 +0200
To: "Ceolin, D." <d.ceolin@vu.nl>
Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-Id: <FF8139B1-D7C1-4B4A-B347-DC90DDFD4B07@w3.org>

Thanks, Davide,

My comments on acceptance (or not) are below. Where I do not comment, I am fine with 'accepting' the requirement.

Note that, for me, 'acceptance' means that this WG should somehow act upon the requirement. I.e., there may be perfectly valid requirements that we decide not to work on...


On 20 May 2014, at 23:22 , Ceolin, D. <d.ceolin@vu.nl> wrote:

> Dear all,
> 
> first of all, it's been a hectic period, so my apologies for coming up with this only now.
> Anyway, following the last telco, here is the current list of requirements, which I've tried to reorganize based on recent discussions and email exchanges. 
> To keep the email manageable, I've reported only the short description and the categorization of each. I'll go into more detail about each of them at a later stage.
> I moved R-MultipleHeadingRows and R-RandomAccess to "Deferred Requirements".
> The rest of the requirements are still "Candidate" and are organized in categories that have changed over time (I've kept the old ones as well, as a reminder and to make sure that we agree on the new categorization).
> So, I have a few questions for you:
> - The first regards R-AnnotationAndSupplementaryInfo. I think we agree that this is a "super-requirement" wrt a few other requirements (e.g., R-LinksToExternallyManagedDefinitions). I would be inclined to keep both the super- and the sub-requirements: the sub-requirements allow us to address specific issues, but we should also provide a generic mechanism to annotate CSVs with classes of information that we have not explicitly considered but that might be relevant (e.g., license of use). 
> - I think that a categorization is useful, given the considerable number of requirements we have. Do you agree? If so, do you suggest any changes to it?
> - Is there any candidate requirement you suggest deferring or deleting, or do you agree to accept all of them?
> Thanks,
> 
> Davide
> 4.2 Candidate Requirements
> 
> 4.2.1 Requirements relating to parsing of CSV
> 
> R-WellFormedCsvCheck
> Ability to determine that a CSV is syntactically well formed
> R-TableNormalization
> Ability to normalize data that is not in normal form and possibly vice-versa.


This is a common comment to both requirements.

At the moment, the parsing section of the document is non-normative. What we define, and base our work on, is the abstract tabular data model. Strictly speaking, this means that we do _not_ define a normative parsing of CSV, i.e., we do not normatively define anything that would address these two. I.e., these are neither 'Accepted' nor, I believe, 'Deferred', but rather 'Out of Scope'.


> R-RightToLeftCsvCheck
> Ability to determine that a CSV is using RTL
> 4.2.2 Requirements relating to annotation of CSV
> 
> 4.2.3 Requirements relating to metadata discovery
> 
> 4.2.4 Requirements relating to applications
> 
> R-CsvValidation
> Ability to validate a CSV for conformance with a specified Data Definition Resource (DDR)

I guess the same issue as above applies to this, too.

> R-CsvToRdfTransformation
> Ability to automatically transform a CSV into RDF
> R-CsvToJsonTransformation
> Ability to automatically transform a CSV into JSON
> R-CanonicalMappingInLieuOfAnnotation
> Ability to transform CSV conforming to the core tabular data model, yet lacking further annotation, into an object / object graph serialisation
> R-IndependentMetadataPublication
> Ability to publish metadata independently from the tabular data resource it describes
> 4.2.5 Non-functional requirements
> 
> R-ZeroEditCompatibility
> Compatibility of data analysis tools in common usage with CSV+

I am not sure of that one. At first glance, it looks like 'Deferred' to me.

> R-ZeroEditAdditionOfSupplementaryMetadata
> Ability to add supplementary metadata to an existing CSV file without requiring modification of that file
> 4.2.6 Data Model Requirements
> 
> R-HeadingColumns
> Ability to handle columns as row headers.

We have not discussed that yet, i.e., I am not sure it is an 'accepted' one.

> R-CellValueMicroSyntax
> Ability to parse internal data structure within a cell value

I guess the template discussion is relevant here, so it is accepted...

> R-NonStandardFieldDelimiter
> Ability to parse tabular data with field delimiters other than comma (,)

Same comment as before: we do not define parsing.

> R-PrimaryKey
> Ability to determine the primary key for entities described within a CSV file
> R-ForeignKeyReferences
> Ability to cross reference between CSV files
> R-ExternalDataDefinitionResource
> Ability to reference a Data Definition Resource defining supplementary metadata external to the CSV file

I guess the same 'out of scope' issue might be relevant here, too.

> R-AnnotationAndSupplementaryInfo
> Ability to add annotation and supplementary information to a CSV file

Again a matter of the textual CSV file...

> R-AssociationOfCodeValuesWithExternalDefinitions
> Ability to associate a code value with an externally managed definition

I would expect that to be covered by the template and metadata discussions, but I am not sure.

> R-CsvAsSubsetOfLargerDataset
> Ability to assert how a single CSV file is a facet or subset of a larger dataset

Hm. Do we define CSV packaging in this Working Group? Is this in scope?

We could decide it is, though we should check this with the charter.

> R-LinksToExternallyManagedDefinitions
> Ability to provide (hyper)links to externally managed definitions from within a CSV file
> R-SyntacticTypeDefinition
> Ability to declare syntactic type for data values
> R-SemanticTypeDefinition
> Ability to declare semantic type for data values
> R-MissingValueDefinition
> Ability to declare a "missing value" token and, optionally, a reason for the value to be missing
> R-URIMapping
> Ability to map the values of a CSV row/column into corresponding URIs (e.g., by concatenating those values with a prefix).
> R-UnitMeasureDefinition
> Ability to identify/express the unit of measure for the values reported in a given column.
> R-GroupingOfMultipleTables
> Ability to group multiple data tables into a single package for publication

This is the packaging issue again. Do we define that in this WG (normatively)?

> R-LinkFromMetadataToData
> Ability for a metadata description to explicitly cite the tabular dataset it describes
> 4.3 Deferred requirements
> 
> R-MultipleHeadingRows
> Ability to handle headings spread across multiple initial rows, as well as to distinguish between single column headings and file headings.
> R-RandomAccess
> Ability to access and/or extract part of a CSV file in a non-sequential manner.
> 
> 

Cheers

Ivan



----
Ivan Herman, W3C 
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Wednesday, 21 May 2014 09:22:07 UTC