Re: A draft outline for the CSV2RDF document from Andy Seaborne on 2014-05-20 (public-csv-wg@w3.org from May 2014)

From: Andy Seaborne <andy@apache.org>
Date: Tue, 20 May 2014 22:00:06 +0100
To: Ivan Herman <ivan@w3.org>
CC: CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <537BC256.7090404@apache.org>

On 20/05/14 11:59, Ivan Herman wrote:
>
> On 20 May 2014, at 12:16 , Andy Seaborne <andy@apache.org> wrote:
>
>> On 20/05/14 05:52, Ivan Herman wrote:
>>> But also... If my application needs (forgive me:-) RDF/XML, but
>>> the author of the metadata has put in the row-level template
>>> using JSON-LD as a base syntax, then I need a JSON-LD parser to
>>> make any sense of it, right? In other words, the field-level
>>> template approach is RDF syntax independent. That seems to be
>>> another major difference, too...
>>>
>>
>> We're defining the correct output of a conversion process when the
>> input is the metadata (without any user templates).  We aren't
>> requiring the processor does exactly and only those steps.  It
>> outputs whatever format(s) it supports.
>>
>> Adding user templates is 'advanced' and if we want to allow
>> control of the shape of the RDF emitted (c.f. Jeremy's example) we
>> do need to have a language for describing shape. However, that's
>> not the required mechanism for implementation of metadata\templates
>> to RDF.
>>
>
> I am still trying to turn my head around it; sorry if I am slow...
> Is this so that (at least conceptually for the user):
>
> - The 'field level templates', essentially as I described and used
> in [1] can be used essentially as described there (what templates
> exactly do is something that we still have to define, but I guess we
> have an idea about a simple mechanism, like the one in R2RML)
> - There is, _additionally_, the possibility to define a 'shape', ie, a
> row level template; if present, that replaces the mechanism described
> in [1]

Yes.

'Field level templates' have another, different dimension: {col} alone
simply isn't enough to generate output (URI construction, transformation
of values, e.g. upper/lower case, trim, extracting part of a field, ...
and all the ETL-like themes).
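
To make the point concrete, here is a minimal sketch (Python) of a field
level template that goes beyond plain {col}. The {col|transform} syntax,
the transform names and the example.org URI are assumptions made up for
illustration; nothing here is agreed WG syntax.

```python
import re
from urllib.parse import quote

# Hypothetical transform names; not defined by any WG draft.
TRANSFORMS = {
    "upper": str.upper,
    "lower": str.lower,
    "trim": str.strip,
}

# Matches {col} or {col|transform|transform...}
PLACEHOLDER = re.compile(r"\{([^{}|]+)((?:\|[^{}|]+)*)\}")

def expand_field_template(template, row):
    """Replace each placeholder with the (transformed) field value."""
    def substitute(match):
        value = row[match.group(1)]
        # Apply the named transforms left to right.
        for name in filter(None, match.group(2).split("|")):
            value = TRANSFORMS[name](value)
        # Percent-encode so the value is safe inside a URI path segment.
        return quote(value, safe="")
    return PLACEHOLDER.sub(substitute, template)

row = {"id": " Ab 12 ", "name": "Andy"}
print(expand_field_template("http://example.org/item/{id|trim|lower}", row))
```

Extracting part of a field would need transforms that take arguments,
which this sketch deliberately leaves out.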

Templating for shape only uses field values (that needs to be
tested - it might be insufficient).
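
As a rough illustration of a shape that uses only field values, the
sketch below fills a Turtle fragment once per CSV row. The ${col}
placeholder syntax (Python's string.Template), the example.org vocabulary
and the nested blank node are all assumptions for the example, and it does
no escaping of Turtle string literals.

```python
import csv
import io
from string import Template

# Hypothetical row-level "shape": a Turtle fragment with ${col} placeholders.
# The nested blank node is structure that a flat per-column mapping
# would not produce.
SHAPE = Template(
    '<http://example.org/person/${id}> <http://example.org/ns#name> "${name}" ;\n'
    '    <http://example.org/ns#address> [ <http://example.org/ns#city> "${city}" ] .\n'
)

def rows_to_turtle(csv_text):
    """Apply the shape to every data row; the header row names the columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "".join(SHAPE.substitute(row) for row in reader)

print(rows_to_turtle("id,name,city\n1,Ann,Bristol\n2,Bob,Leeds\n"))
```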

> (Specification-wise, one can of course turn things upside down,
> describe the 'shape' template mechanism and, if, for a specific
> data, no shape is defined, one could virtually generate such a shape
> from the metadata. But that is for specification writers and,
> possibly, for implementers.)

That is what I am suggesting.

It means there is smooth progression from simple to shape-based conversion.
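
A minimal sketch of that progression, under made-up assumptions (column
names that are valid identifiers, an example.org namespace, a ${_row}
placeholder for the row number): when no shape is supplied, a flat one is
generated from the column names, and a user-supplied shape simply replaces
it.

```python
from string import Template

def default_shape(columns,
                  subject="http://example.org/row/${_row}",
                  ns="http://example.org/ns#"):
    """Build a flat shape: one subject per row, one predicate per column."""
    predicates = [f'    <{ns}{col}> "${{{col}}}"' for col in columns]
    return Template(f"<{subject}>\n" + " ;\n".join(predicates) + " .\n")

def convert_row(row, row_number, shape=None):
    """Use the user's shape if given, otherwise the generated default."""
    shape = shape or default_shape(list(row))
    return shape.substitute(row, _row=row_number)

# Simple case: no shape given, the default is generated from the columns.
print(convert_row({"name": "Ann", "city": "Bristol"}, 1))

# Advanced case: the user supplies a shape; the code path is the same.
custom = Template('[] <http://example.org/ns#label> "${name} of ${city}" .\n')
print(convert_row({"name": "Ann", "city": "Bristol"}, 1, custom))
```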

>
> I think that this, technically, works indeed. But I am not sold on
> it...
>
> - I have the impression that the generic shape mechanism is more
> complicated to understand for a user and more complex to implement

?? The user does not see it unless they want an advanced translation that
goes beyond what can be expressed in the basic field level conversion.

> - Although I forgot to add this to [1] (and we were not sure whether
> that should go into the metadata spec in the first place) we did say
> that we can assign, say, an XSLT script for XML, or a SPARQL
> CONSTRUCT pattern for RDF that would be executed on the result of the
> RDF generation; such an extra step could take care of Jeremy's
> example, right?

It is something that has been suggested but no one has worked through
the details.

Certainly possible in XSLT, but SPARQL CONSTRUCT isn't as powerful as
XSLT.  Gregg has made a suggestion for CSV-LD.  The XML publishing world
commonly has XSLT.  Other communities don't necessarily have the same
degree of conversion pipelines.

See Jeni's
http://lists.w3.org/Archives/Public/public-csv-wg/2014May/0063.html
want for conditionality and field level processing.

(where do you stand on that msg?)

If the output required is JSON-LD, I'd expect the CSV->JSON conversion 
would be a better starting point because it has control over the JSON.

> It is, of course, a bit more complex to do this than
> with shapes, but how frequently do I have to do this?

Having looked at all the conversions we (Epimorphics) have been involved
in, the basic level of CSV -> simple RDF is not sufficient.  One
conversion (LandRegistry, 400e6 triples) is actually SPARQL Update, not
Turtle.

Do we have a real example where simple is the required output?
Jeremy's example needs reshaping.  Reshaping is putting
knowledge/semantics/information into the data that wasn't completely
there in the input.  A typical knowledge capture exercise.

A question I have is whether complete tables are the common case or
whether there is commonly multi-row structure in tables, e.g. repeated
fields or empty fields to represent a tree.
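
For the "empty fields" case, one possible reading is that an empty cell
means "same as the row above". A minimal sketch of that fill-down step,
assuming this interpretation and made-up column names (a real table could
equally mean "no value" by an empty cell):

```python
import csv
import io

def fill_down(rows, columns):
    """Copy the last non-empty value into empty cells of the given columns."""
    last = {}
    for row in rows:
        filled = dict(row)
        for col in columns:
            if filled[col] == "":
                # Empty cell: inherit the value from the nearest row above.
                filled[col] = last.get(col, "")
            else:
                last[col] = filled[col]
        yield filled

text = "region,town\nWest,Bristol\n,Bath\nEast,Norwich\n"
for row in fill_down(csv.DictReader(io.StringIO(text)), ["region"]):
    print(row["region"], row["town"])
```

With more than one level of nesting, values carried for a lower level
would also need resetting when a higher level changes; the sketch does not
do that.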

We need to ground out the requirements.

> - I still do not see how you can get around the fact that the shape
> is very language specific, ie, I am not sure how you would define
> metadata that is RDF serialization syntax independent and, even more,
> independent of whether the target is RDF, JSON, or XML (which works
> much more easily with the scheme in [1])

RDF serialization syntax independence is your issue, not mine.

As far as I'm concerned, the metadata can provide a Turtle template for
Turtle.

If the output required is JSON-LD, I'd expect the CSV->JSON conversion 
would be a better starting point because it has control over JSON.

If RDF/XML is required, converting RDF formats isn't hard, at least not
in that direction.  Managing the XML namespaces might mean that CSV to
XML is a better route.

The weakness of the post-process argument is that if the conversion is
so simple that it becomes a common need to reshape, then you are asking
the end user to get involved with skills they may not have.  It's only
half a standard from the consumer's POV.

	Andy

>
> Cheers
>
> Ivan
>
> [1]
> http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
>
>
>
>
>> Andy
>>
>>> Ivan
>
>
> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
>  WebID: http://www.ivan-herman.net/foaf#me
>
>
>
>
>
Received on Tuesday, 20 May 2014 21:00:36 UTC