Re: Graph Fragments Templates from Andy Seaborne on 2014-05-27 (public-csv-wg@w3.org from May 2014)

From: Andy Seaborne <andy@apache.org>
Date: Tue, 27 May 2014 22:29:51 +0100
To: Ivan Herman <ivan@w3.org>
CC: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
Message-ID: <538503CF.1030603@apache.org>
Ivan,

I am not saying the metadata is disjoint - it is a valuable input to 
conversion and I fully expect it to be used, e.g., to get datatypes right 
if that is what the data consumer wants.

The metadata as it stands at the moment is insufficient to capture the 
conceptual meaning of denormalized data.

There is no guarantee that there is metadata at all.

The roles and intentions of data publisher and data consumer are not the 
same.

The duplication I see is that we would have to define flat conversion 
and, separately, reshaping conversion.
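A minimal Python sketch of the two conversions being contrasted, assuming a hypothetical base IRI `http://example.org/`, a key column named `id`, and the `{colname}` placeholder syntax used elsewhere in this thread. The flat conversion mints one subject per row and one triple per remaining cell. The template (reshaping) conversion expands a consumer-written template per row and can introduce structure, here an intermediate result node, that the CSV itself does not express. Function names and sample data are illustrative only, and cell values are not escaped.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, for illustration only


def flat_conversion(text, key):
    """Flat conversion: one subject per row (minted from the key column),
    one triple per remaining cell, predicate taken from the column name."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}row/{row[key]}>"
        for col, value in row.items():
            if col != key:
                # Values are emitted as plain literals, without escaping.
                triples.append(f'{subject} <{BASE}{col}> "{value}" .')
    return triples


def template_conversion(text, template):
    """Reshaping conversion: expand a consumer-written template once per row.
    Each {colname} placeholder is replaced by that row's cell value."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        triples.extend(template.format(**row).splitlines())
    return triples


# Hypothetical template that adds an intermediate result node per observation.
TEMPLATE = (
    "<http://example.org/obs/{id}> <http://example.org/result> <http://example.org/result/{id}> .\n"
    '<http://example.org/result/{id}> <http://example.org/value> "{temp}" .'
)

if __name__ == "__main__":
    data = "id,date,temp\n1,2014-05-27,12.5\n"
    print("\n".join(flat_conversion(data, key="id")))
    print("\n".join(template_conversion(data, TEMPLATE)))
```

The flat output only restates the table; the `result` node appears only because the template's author put it there.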

One of the examples we have is

https://github.com/w3c/csvw/blob/gh-pages/examples/simple-weather-observation.md

How would you approach that?

	Andy

On 27/05/14 20:32, Ivan Herman wrote:
> Well... I think we disagree here.
>
> Of course, the metadata is *also* used for, say, displaying the CSV file. But making the metadata disjoint from the RDF/JSON/XML conversion means an unnecessary duplication of terms, and I am pretty much against that at this point. Obviously, some of the terms may be meaningful only for, say, an XML conversion, and others have no meaning for any of the syntaxes, but I find it counterintuitive if things have to be repeated. The obvious examples, just off the top of my head, are language tags or the datatype for a field; the choice of the primary key columns that would determine the subject for JSON or RDF may be another. (I have just arrived at my hotel in NYC and I have to go out, so I could not check all the details.)
>
> What this means, in my view, is that
>
> - if we go for the 'mechanical' approach that I wrote down, then for *some* of the metadata entries we provide a natural mapping to the RDF concepts (which may always be overridden somehow with RDF-specific values). This is more or less what I wrote down, though not all keys have a systematic RDF equivalent.
>
> - if we go for the graph templating approach, then the graph templates should be defined in a way that, for some of the values (like the ones I cited above), there is a syntax for extracting them. The syntax may be very simple (something like {{key}}, meaning that the value of a key valid for that field is used), but I am not sure we would avoid some 'if-then-else' issues ('if the language is set, generate a language-tagged literal, otherwise a plain literal').
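A minimal Python sketch of the 'if-then-else' step in the second point, assuming per-field metadata is a dict with hypothetical keys `lang` and `datatype` (not necessarily the names used in the metadata draft) and that `{colname}` placeholders appear only in object position. Here the choice between a language-tagged, typed, or plain literal is made by the template processor rather than written into the template syntax.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"


def render_literal(value, field_meta):
    """Pick the literal form from the field's metadata: language-tagged if a
    language is set, typed if a datatype is set, otherwise plain."""
    if "lang" in field_meta:
        return f'"{value}"@{field_meta["lang"]}'
    if "datatype" in field_meta:
        return f'"{value}"^^<{XSD}{field_meta["datatype"]}>'
    return f'"{value}"'


def expand(template, row, metadata):
    """Replace each {colname} with the cell value, rendered according to the
    metadata for that column (an empty dict if the column has none)."""
    return re.sub(
        r"\{(\w+)\}",
        lambda m: render_literal(row[m.group(1)], metadata.get(m.group(1), {})),
        template,
    )


if __name__ == "__main__":
    metadata = {"name": {"lang": "en"}, "temp": {"datatype": "double"}}
    row = {"name": "Oxford", "temp": "12.5", "note": "ok"}
    print(expand("<http://example.org/s> <http://example.org/label> {name} .", row, metadata))
```

Whether this branching belongs in the template language or in the processor's specification is exactly the open question raised above.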
>
> I am *not* saying the template mechanism cannot be defined and, if so, it may well be superior. But I do not believe the specification would be as simple as in the examples you had. I believe we should have a more detailed sketch of a specification, a bit like what I did for the mechanical approach, before making an informed decision.
>
> Cheers
>
> Ivan
>
>
> On 27 May 2014, at 06:56 , Andy Seaborne <andy@apache.org> wrote:
>
>> Ivan,
>>
>> What I gave was a description of the graph templating approach. It is not a complete spec. As I see it, we are trying to establish the scope of part of the technical work of the working group, and Jeremy's example is a (the) example we have of CSV to RDF conversion.
>>
>> Part of the scoping is the relationship of metadata to conversion.
>>
>> The metadata is about what the CSV file "is", and details about its publication. So it does not capture everything about the conceptual information that the CSV file is about.
>>
>> An explicitly provided template for RDF conversion expresses what the user wants and puts in structure that is neither obvious from the CSV file alone nor declared in the metadata.
>>
>> They may be different; it may be intended.  The authorship roles are different.
>>
>> Metadata is going to help display CSV files in HTML, and it's a great help in finding CSV files on the web and validating them. Metadata comes primarily from the data publisher. An advanced template comes from the data consumer and is the format used by the conversion tool.
>>
>> Only deriving conversion from metadata makes assumptions about the emergence of provided metadata - I doubt that metadata for existing CSV publications is going to emerge quickly, and there is a lot of existing CSV data. It seems dubious to me to assume the data consumer is going to write missing metadata to drive flat conversion when they still have further steps to perform to get what they want. Instead, they'll write code to go from CSV to what they want.
>>
>> When there is a CSV publisher and a data consumer wanting RDF (or JSON, or XML), so they aren't reading the CSV file directly as CSV, all you need is a template, written by the data consumer, and a tool that processes templates.
>>
>> 	Andy
>>
>> On 22/05/14 15:18, Ivan Herman wrote:
>>> Hi Andy,
>>>
>>> thanks.
>>>
>>> My problem is not with these simple cases. My problem is to understand how templates will be combined with the metadata definition in general; at the moment these are fairly disconnected.
>>>
>>> Looking at Jeni's latest draft, each field may have its own particular set of properties (although some of them can be set for the column as a whole, they can be specialized for a specific field). This means that a pattern of the sort
>>>
>>> 	<something> <something> {colname} .
>>>
>>> may become slightly underspecified. For instance, you translated the metadata, including a datatype definition, into something like
>>>
>>> 	<something> <something> {colname}^^xsd:double
>>>
>>> but that may not be o.k.; it should be something like
>>>
>>> 	<something> <something> {colname}^^xsd:{{datatype}}
>>>
>>> where '{{datatype}}' is my ad-hoc syntax to denote the _value_ of the property "datatype". Actually, it may become more complicated insofar as the datatype value should probably not be taken verbatim, i.e., if it says 'number', then it should be translated to its XML Schema counterpart (either we include an if-then-else in the template language, or we have to write down a specification of exactly how the template processor works for each field and its properties). Another example is the 'separator' property; if a field includes a 'separator' property, then the result of the template expansion may become something like
>>>
>>> 	<something> <something> (l1 l2 l3 l4) .
>>>
>>> It can all be done, of course. But unless we keep the templates completely disjoint from the metadata (which I think would be a mistake), we have quite some work to do reconciling the templates with the metadata definition :-( Did you have any thoughts on that already?
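A minimal Python sketch of the two complications above, assuming a hypothetical lookup table that translates metadata datatype names into XML Schema names (only the 'number' to xsd:double case from this thread is filled in; any other name is passed through unchanged) and a field-level 'separator' that turns one cell into a Turtle collection. The names `DATATYPE_MAP`, `literal`, and `object_term` are illustrative, not from any draft.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical translation table: metadata datatype name -> XML Schema local name.
# Only 'number' is mapped here; a real processor would need a complete list.
DATATYPE_MAP = {"number": "double"}


def literal(value, datatype):
    """Typed literal if a datatype is given (translated via DATATYPE_MAP), else plain."""
    if datatype is None:
        return f'"{value}"'
    xsd_name = DATATYPE_MAP.get(datatype, datatype)
    return f'"{value}"^^<{XSD}{xsd_name}>'


def object_term(value, field_meta):
    """Render one cell as a Turtle object. With a 'separator', the cell is split,
    each part is stripped of surrounding whitespace, and the parts are emitted
    as a collection, e.g. ("l1" "l2" "l3" "l4")."""
    datatype = field_meta.get("datatype")
    separator = field_meta.get("separator")
    if separator:
        items = [literal(part.strip(), datatype) for part in value.split(separator)]
        return "(" + " ".join(items) + ")"
    return literal(value, datatype)


if __name__ == "__main__":
    subject, predicate = "<http://example.org/s>", "<http://example.org/p>"
    for value, meta in [("12.5", {"datatype": "number"}), ("l1;l2;l3;l4", {"separator": ";"})]:
        print(subject, predicate, object_term(value, meta), ".")
```

In Turtle, a parenthesised list like this is shorthand for an rdf:first/rdf:rest chain, so a single cell expands into several triples.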
>>>
>>> Ivan
>>>
>>> P.S. Sorry, I am offline at the moment due to a power outage, so I cannot check Gregg's older document; maybe he did deal with these.
>>>
>>>
>>>
>>> On 21 May 2014, at 19:46 , Andy Seaborne <andy@apache.org> wrote:
>>>
>>>> I have written up more on graph templates:
>>>>
>>>> https://github.com/w3c/csvw/blob/gh-pages/examples/graph-templating.md
>>>>
>>>> 	Andy
>>>>
>>>
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
>
Received on Tuesday, 27 May 2014 21:30:27 UTC