[ACTION-15] General text on conversion from Ivan Herman on 2014-05-08 (public-csv-wg@w3.org from May 2014)

From: Ivan Herman <ivan@w3.org>
Date: Thu, 8 May 2014 13:26:03 +0200
To: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Cc: Andy Seaborne <andy@apache.org>, Jeni Tennison <jeni@jenitennison.com>
Message-Id: <2B1A037D-BF5B-4ACD-B9E9-21C2A07346D3@w3.org>

Guys

my action from yesterday[1] refers to text that should be added to the RDF conversion document. I have come up with something, based on the email discussion but also on some additional issues; however, I am not sure whether the RDF conversion document is the right place for it. I wonder whether adding it as a separate section in the syntax document would be a better choice.

After discussion and probably some word-smithing I am happy to put the text into either of the documents themselves.

So here we go...

[[[

This specification defines some general principles for the conversion of CSV to other formats. These are:

* The conversions are defined on <em>tabular data</em>, as defined in the "Model for Tabular Data and Metadata on the Web" specification [[!Tabular-Data-Model]]. This means that some of the specificities (like Right-to-Left writing modes, or empty rows in the source file) of CSV files are to be handled by the parsing step yielding the tabular data.

* A conversion specification MUST define a "default" mapping; i.e., a mapping from core tabular data (as opposed to annotated tabular data).

* For the conversion of annotated tabular data:

** A conversion specification MUST specify how certain property-value pairs, provided by the "Metadata Vocabulary for Tabular Data" [[!Tabular-Metadata]], are mapped to the output. These are:
*** @id
*** @type
*** field types
*** Primary Key
*** Foreign Key

** A conversion specification MAY specify how other property-value pairs, like column names, are used in the output (e.g., as additional metadata in the output).

* The conversion specification MAY specify additional metadata for the output, regardless of whether that particular information is present in the annotations of the tabular data.

* The conversion specification MAY specify a number of format-specific property-value annotation pairs. These pairs are part of the tabular data annotations, i.e., the metadata field descriptors (@@@REF@@@), but are only relevant for the specific output format. Examples may be flags to specify whether a specific field should be output as an XML element or an XML attribute, or a pattern generating a URI for the RDF object (rather than using a literal).

* The conversion specification MAY also specify a global, format-specific property (as part of the CSV annotation) specifying an external processing step that should occur on the generated output. Examples may be a reference to an XSLT file, a literal defining a SPARQL CONSTRUCT pattern, or a reference to a JavaScript file. The specification of those processing steps is not provided by this Working Group.
]]]
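
To make the difference between the "default" mapping and the mapping of annotated tabular data more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposal for any of the documents: the function names, the in-memory table shape (a list of column names plus rows of strings), and the annotation keys "field_types" and "primary_key" are all hypothetical and are not taken from the Metadata Vocabulary. Only @id and @type are spelled as in the list above. A foreign key is left out for brevity.

```python
from typing import Any

# Hypothetical casts for "field types"; a real conversion specification
# would define its own set of types and their treatment.
CASTS = {"integer": int, "number": float, "string": str}


def default_mapping(columns: list[str], rows: list[list[str]]) -> list[dict[str, Any]]:
    """Default mapping: uses only the core tabular data, no annotations."""
    return [dict(zip(columns, row)) for row in rows]


def annotated_mapping(
    columns: list[str],
    rows: list[list[str]],
    annotations: dict[str, Any],
) -> list[dict[str, Any]]:
    """Mapping of annotated tabular data (assumed annotation layout).

    Shows one possible treatment of @id, @type, field types and the
    primary key; other treatments (including ignoring a type) are possible.
    """
    field_types = annotations.get("field_types", {})  # hypothetical key
    primary_key = annotations.get("primary_key")      # hypothetical key
    base_id = annotations.get("@id", "")
    records = []
    for row in rows:
        record: dict[str, Any] = {}
        for name, value in zip(columns, row):
            # Unknown or missing types fall back to plain strings.
            cast = CASTS.get(field_types.get(name, "string"), str)
            record[name] = cast(value)
        if "@type" in annotations:
            record["@type"] = annotations["@type"]
        if primary_key is not None:
            # Assumption: the row identifier is the table @id plus the key value.
            key_value = row[columns.index(primary_key)]
            record["@id"] = base_id + "#" + key_value
        records.append(record)
    return records
```

With columns `["id", "name", "population"]` and a single row, the default mapping keeps every cell as a string, while the annotated mapping can type the population column and attach an identifier derived from the primary key.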

A specific issue: I was wondering whether the usage of, e.g., field types or primary keys should be a MUST or a MAY. At the moment I have set it as a MUST; a conversion specification may still say that a particular type is simply ignored as a type, but at least this has to be specified. The alternative is to set it as a MAY.

I realize that this formulation means that the RDF conversion may need some serious editing (not conceptually, just the way things are presented). Sorry...

Thoughts?

Ivan

[1] http://www.w3.org/2013/csvw/track/actions/15

----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me

Received on Thursday, 8 May 2014 11:26:35 UTC