- From: Ivan Herman <ivan@w3.org>
- Date: Wed, 21 May 2014 12:35:37 +0200
- To: Andy Seaborne <andy@apache.org>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <A291B85E-2ABD-478E-8E24-19FFD8028945@w3.org>
On 20 May 2014, at 23:00 , Andy Seaborne <andy@apache.org> wrote:
> On 20/05/14 11:59, Ivan Herman wrote:
>>
>> On 20 May 2014, at 12:16 , Andy Seaborne <andy@apache.org> wrote:
>>
>>> On 20/05/14 05:52, Ivan Herman wrote:
>>>> But also... If my application needs (forgive me:-) RDF/XML, but
>>>> the author of the metadata has put in the row-level template
>>>> using JSON-LD as a base syntax, then I need a JSON-LD parser to
>>>> make any sense of it, right? In other words, the field-level
>>>> template approach is RDF syntax independent. That seems to be
>>>> another major difference, too...
>>>>
>>>
>>> We're defining the correct output of a conversion process when the
>>> input is the metadata (without any user templates). We aren't
>>> requiring the processor does exactly and only those steps. It
>>> outputs whatever format(s) it supports.
>>>
>>> Adding user templates is 'advanced' and if we want to allow
>>> control of the shape of the RDF emitted (c.f. Jeremy's example) we
>>> do need to have a language for describing shape. However, that's
>>> not the required mechanism for implementation of metadata\templates
>>> to RDF.
>>>
>>
>> I am still trying to turn my head around it; sorry if I am slow...
>> Is this so that (at least conceptually for the user):
>>
>> - The 'field level templates', essentially as I described and used
>> in [1] can be used essentially as described there (what templates
>> exactly do is something that we still have to define, but I guess we
>> have an idea about a simple mechanism, like the one in R2RML)
>> - There is, _additionally_, the possibility to define a 'shape', ie, a
>> row level template; if present, that replaces the mechanism described
>> in [1]
>
> Yes.
>
Great! At least we have a common understanding:-)
> 'field level templates' has another, different dimension: {col} simply isn't enough to generate output (URI construction, transformation of values e.g. upper/lower case, trim, extracting part of a field, ... and all the ETL-like themes).
Yes, and I think I used the term 'template' in a kind of generic (and to-be-defined) way. Perhaps 'transformation' is a better term, and it may include some common features that are widely used and implemented:
- simple text replacement, like {...} for field names
- regular expression based replacement
- upper/lower case
In the metadata scheme one would probably have something like:
"transformation" : [
    {
        "type" : "template",
        "value" : "..."
    },
    {
        "type" : "regex",
        "value" : ...
    }
]
and the transformations would be executed serially on the field.
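
As a rough illustration of that serial execution, here is a minimal Python sketch. Nothing in it is defined by the metadata draft: the function name `apply_transformations`, the extra step types ("upper", "lower", "trim"), the `{value}` placeholder for the current field value, and the idea that a "regex" step carries a [pattern, replacement] pair are all assumptions made for the example.

```python
import re

def apply_transformations(value, row, steps):
    """Run each step in order; every step receives the previous step's output."""
    for step in steps:
        kind = step["type"]
        if kind == "template":
            # Assumed convention: {value} is the current value, {col} is a column of the row.
            value = step["value"].format_map({**row, "value": value})
        elif kind == "regex":
            # Assumed convention: "value" holds a [pattern, replacement] pair.
            pattern, replacement = step["value"]
            value = re.sub(pattern, replacement, value)
        elif kind == "upper":
            value = value.upper()
        elif kind == "lower":
            value = value.lower()
        elif kind == "trim":
            value = value.strip()
        else:
            raise ValueError("unknown transformation type: " + kind)
    return value

# Hypothetical row and step list: trim, collapse whitespace, lower-case, build a URI.
row = {"id": "42", "name": "  Ada Lovelace "}
steps = [
    {"type": "trim"},
    {"type": "regex", "value": [r"\s+", "_"]},
    {"type": "lower"},
    {"type": "template", "value": "http://example.org/person/{id}/{value}"},
]
uri = apply_transformations(row["name"], row, steps)
print(uri)
```

The order of the list matters: moving the "template" step to the front would lower-case the whole URI rather than only the name part.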
>
> Templating for shape only uses field values (that needs to be tested - it might be insufficient).
>
>> (Specification-wise, one can of course turn things upside down,
>> describe the 'shape' template mechanism and, if, for a specific
>> data, no shape is defined, one could virtually generate such a shape
>> from the metadata. But that is for specification writers and,
>> possibly, for implementers.)
>
> That is what I am suggesting.
>
> It means there is smooth progression from simple to shape-based conversion.
Again, good we understand one another:-)
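
The "virtually generated shape" idea can be sketched in Python as follows. This is only an illustration under assumptions: the metadata keys "columns" and "shape", the `{_row}` placeholder for the row number, the base URI, and the one-triple-per-column default are invented for the example and are not taken from [1] or from any draft.

```python
def default_row_template(columns, base):
    """Generate a Turtle row template with one triple per column (the 'virtual' shape)."""
    subject = f"<{base}row/{{_row}}>"
    triples = [f'    <{base}{col}> "{{{col}}}"' for col in columns]
    return subject + "\n" + " ;\n".join(triples) + " .\n"

def row_template(metadata, base):
    """Use the author's shape if the metadata has one, else fall back to the generated default."""
    return metadata.get("shape") or default_row_template(metadata["columns"], base)

def convert_row(template, row, row_number):
    # No escaping of literal values here; a real converter would need it.
    return template.format_map({**row, "_row": row_number})

base = "http://example.org/"
row = {"name": "Ada", "age": "36"}

# Simple case: no shape in the metadata, so the default is generated.
simple = {"columns": ["name", "age"]}
# Advanced case: the metadata author supplies a row-level template.
shaped = {"columns": ["name", "age"],
          "shape": '<http://example.org/person/{name}> <http://example.org/age> "{age}" .\n'}

simple_out = convert_row(row_template(simple, base), row, 1)
shaped_out = convert_row(row_template(shaped, base), row, 1)
print(simple_out)
print(shaped_out)
```

Both cases go through the same `convert_row` step, which is the sense in which the progression from simple to shape-based conversion is smooth.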
>
>>
>> I think that this, technically, works indeed. But I am not sold on
>> it...
>>
>> - I have the impression that the generic shape mechanism is more
>> complicated to understand for a user and more complex to implement
>
> ?? The user does not see it unless they want advanced translation that goes beyond what can be expressed in the basic field level conversion.
True.
>
>> - Although I forgot to add this to [1] (and we were not sure whether
>> that should go into the metadata spec in the first place) we did say
>> that we can assign, say, an XSLT script for XML, or a SPARQL
>> CONSTRUCT pattern for RDF that would be executed on the result of the
>> RDF generation; such an extra step could take care of Jeremy's
>> example, right?
>
> It is something that has been suggested but no one has worked through
> the details.
>
> Certainly possible in XSLT, but SPARQL CONSTRUCT isn't as powerful as XSLT. Gregg has made a suggestion for CSV-LD. The XML publishing world commonly has XSLT. Other communities don't necessarily have the same degree of conversion pipelines.
But all communities have something; at the minimum, one can refer back to a JavaScript or Python or whatever processing...
>
> See Jeni's
> http://lists.w3.org/Archives/Public/public-csv-wg/2014May/0063.html
> want for conditionality and field level processing.
>
> (where do you stand on that msg?)
It makes me scared. "In all the real-life conversions I’ve ever done I’ve always ended up needing conditional statements of some sort". Do we really want to go there?
For the RDF world, I do not see why plugging in either an http URI for a specific SPARQL engine call using CONSTRUCT, or a textual literal with SPARQL CONSTRUCT would not work to massage the output. After all, the SPIN people have already done things like that...
I am wary of going down the line of defining a complex pattern language. That is my problem. And Jeni's mail indicates that a simple replacement of {...} may not be enough. (Put another way, even if we do use a template language, users will end up using SPARQL...)
>
> If the output required is JSON-LD, I'd expect the CSV->JSON conversion would be a better starting point because it has control over the JSON.
This is a different issue, but I would hope that the RDF conversion and the JSON conversion would be in synchrony such that the difference between the two, when using JSON, is the presence or not of a @context. But Gregg should be the one telling us whether this is possible.
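
If that hope holds, the relation between the two outputs could look like the following Python sketch. The function names, the sample row, and the context mapping are assumptions for illustration only; whether the two conversions can really be kept in step this way is exactly the open question above.

```python
import json

def row_to_json(row):
    """Plain JSON output for one CSV row."""
    return dict(row)

def row_to_jsonld(row, context):
    """Same object, with a JSON-LD @context added in front."""
    return {"@context": context, **row}

row = {"name": "Ada", "age": "36"}
# Hypothetical context mapping each column name to a property URI.
context = {"name": "http://example.org/name", "age": "http://example.org/age"}

plain = row_to_json(row)
linked = row_to_jsonld(row, context)

# Dropping @context from the JSON-LD form gives back the plain JSON form.
without_context = {k: v for k, v in linked.items() if k != "@context"}
print(json.dumps(linked, indent=2))
```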
>
>> It is, of course, a bit more complex to do this than
>> with shapes, but how frequently do I have to do this?
>
> Having looked at all the conversions we (Epimorphics) have been involved in, the basic level of CSV -> simple RDF is not sufficient. One conversion (LandRegistry, 400e6 triples) is actually SPARQL Update not Turtle.
>
Showing that SPARQL works:-)
> Do we have a real example where simple is the required output? Jeremy's example needs reshaping. Reshaping is putting knowledge/semantics/information into the data that wasn't completely there in the input. A typical knowledge capture exercise.
>
> A question I have is whether complete tables are the common case or whether there is commonly multi-row structure in tables. e.g. repeated fields or empty to present tree.
>
> We need to ground out the requirements.
+1
>
>> - I still do not see how you can get around the fact that the shape
>> is very language specific, ie, I am not sure how you would define
>> metadata that is RDF serialization syntax independent and, even more,
>> independent of whether the target is RDF, JSON, or XML (which works
>> much more easily with the scheme in [1])
>
> RDF serialization syntax independence is your issue not mine.
>
> As far as I'm concerned, the metadata can provide a turtle template for Turtle.
>
> If the output required is JSON-LD, I'd expect the CSV->JSON conversion would be a better starting point because it has control over JSON.
>
> If RDF/XML is required, converting RDF formats isn't hard, at least not in that direction. Managing the XML namespaces might mean the CSV to XML is a better route.
>
> The weakness of the post-process argument is if the conversion is so simple that it becomes a common need to reshape then you are asking the end user to get involved with skills they may not have. It's only half a standard from a consumer's POV.
>
I do see that point. The question is whether the simple 'transformation' would be enough or not.
Ivan
> Andy
>
>>
>> Cheers
>>
>> Ivan
>>
>> [1]
>> http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
>>
>>
>>
>>
>>> Andy
>>>
>>>> Ivan
>>
>>
>> ---- Ivan Herman, W3C Digital Publishing Activity Lead Home:
>> http://www.w3.org/People/Ivan/ mobile: +31-641044153 GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me
----
Ivan Herman, W3C
Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Wednesday, 21 May 2014 10:36:11 UTC