- From: Ivan Herman <ivan@w3.org>
- Date: Wed, 21 May 2014 12:35:37 +0200
- To: Andy Seaborne <andy@apache.org>
- Cc: W3C CSV on the Web Working Group <public-csv-wg@w3.org>
- Message-Id: <A291B85E-2ABD-478E-8E24-19FFD8028945@w3.org>
On 20 May 2014, at 23:00 , Andy Seaborne <andy@apache.org> wrote:

> On 20/05/14 11:59, Ivan Herman wrote:
>>
>> On 20 May 2014, at 12:16 , Andy Seaborne <andy@apache.org> wrote:
>>
>>> On 20/05/14 05:52, Ivan Herman wrote:
>>>> But also... If my application needs (forgive me:-) RDF/XML, but
>>>> the author of the metadata has put in the row-level template
>>>> using JSON-LD as a base syntax, then I need a JSON-LD parser to
>>>> make any sense of it, right? In other words, the field-level
>>>> template approach is RDF syntax independent. That seems to be
>>>> another major difference, too...
>>>>
>>>
>>> We're defining the correct output of a conversion process when the
>>> input is the metadata (without any user templates). We aren't
>>> requiring the processor does exactly and only those steps. It
>>> outputs whatever format(s) it supports.
>>>
>>> Adding user templates is 'advanced' and if we want to allow
>>> control of the shape of the RDF emitted (c.f. Jeremy's example) we
>>> do need to have a language for describing shape. However, that's
>>> not the required mechanism for implementation of metadata\templates
>>> to RDF.
>>>
>>
>> I am still trying to turn my head around it; sorry if I am slow...
>> Is this so that (at least conceptually for the user):
>>
>> - The 'field level templates', essentially as I described and used
>> in [1] can be used essentially as described there (what templates
>> exactly do is something that we still have to define, but I guess we
>> have an idea about a simple mechanism, like the one in R2RML)
>>
>> - There is, _additionally_, the possibility to define a 'shape', ie, a
>> row level template; if present, that replaces the mechanism described
>> in [1]
>
> Yes.

Great! At least we have a common understanding:-)

> 'field level templates' has another, different dimension that {col}
> simply isn't enough to generate output (URI construction,
> transformation of values e.g. upper/lower case, trim, extracting part
> of a field, ...
> and all the ETL-like themes).

Yes, and I think I used the term 'template' in a kind of generic
(and-to-be-defined) way. Perhaps 'transformation' would be a better
term, and it may include some common features that are widely used and
implemented:

- simple text replacement, like {...} for field names
- regular expression based replacement
- upper/lower case

In the metadata scheme one would probably have something like

"transformation" : [
    { "type" : "template", "value" : "..." },
    { "type" : "regex", "value" :... }
]

and the execution would be serially done on the field.

>
> Templating for shape only uses field values (that needs to be tested -
> it might be insufficient).
>
>> (Specification-wise, one can of course turn things upside down,
>> describe the 'shape' template mechanism and, if, for a specific
>> data, no shape is defined, one could virtually generate such a shape
>> from the metadata. But that is for specification writers and,
>> possibly, for implementers.)
>
> That is what I am suggesting.
>
> It means there is smooth progression from simple to shape-based
> conversion.

Again, good we understand one another:-)

>
>>
>> I think that this, technically, works indeed. But I am not sold on
>> it...
>>
>> - I have the impression that the generic shape mechanism is more
>> complicated to understand for a user and more complex to implement
>
> ?? The user does not see it unless they want advanced translation that
> goes beyond what can be expressed in the basic field level conversion.

True.

>
>> - Although I forgot to add this to [1] (and we were not sure whether
>> that should go into the metadata spec in the first place) we did say
>> that we can assign, say, an XSLT script for XML, or a SPARQL
>> CONSTRUCT pattern for RDF that would be executed on the result of the
>> RDF generation; such an extra step could take care of Jeremy's
>> example, right?
>
> It is something that has been suggested but no one has worked through
> the details.
>
> Certainly possible in XSLT, but SPARQL CONSTRUCT isn't as powerful as
> XSLT. Gregg has made a suggestion for CSV-LD. The XML publishing world
> commonly has XSLT. Other communities don't necessarily have the same
> degree of conversion pipelines.

But all communities have something; at the minimum, one can refer back
to a JavaScript or Python or whatever processing...

>
> See Jeni's
> http://lists.w3.org/Archives/Public/public-csv-wg/2014May/0063.html
> want for conditionality and field level processing.
>
> (where do you stand on that msg?)

It makes me scared. "In all the real-life conversions I’ve ever done
I’ve always ended up needing conditional statements of some sort". Do we
really want to go there?

For the RDF world, I do not see why plugging in either an http URI for a
specific SPARQL engine call using CONSTRUCT, or a textual literal with
SPARQL CONSTRUCT would not work to massage the output. After all, the
SPIN people have already done things like that...

I am wary of going down the line of defining a complex pattern language.
That is my problem. And Jeni's mail indicates that a simple replacement
of {...} may not be enough. (Put it another way, even if we do use a
template language, users will end up using SPARQL...)

>
> If the output required is JSON-LD, I'd expect the CSV->JSON conversion
> would be a better starting point because it has control over the JSON.

This is a different issue, but I would hope that the RDF conversion and
the JSON conversion would be in synchrony such that the difference
between the two, when using JSON, is the presence or not of a @context.
But Gregg should be the one telling us whether this is possible.

>
>> It is, of course, a bit more complex to do this than
>> with shapes, but how frequently do I have to do this?
>
> Having looked at all the conversions we (Epimorphics) have been
> involved in, the basic level of CSV -> simple RDF is not sufficient.
> One conversion (LandRegistry, 400e6 triples) is actually SPARQL Update
> not Turtle.
>

Showing the SPARQL works:-)

> Do we have a real example where simple is the required output?
> Jeremy's example needs reshaping. Reshaping is putting
> knowledge/semantics/information into the data that wasn't completely
> there in the input. A typical knowledge capture exercise.
>
> A question I have is whether complete tables are the common case or
> whether there is commonly multi-row structure in tables. e.g. repeated
> fields or empty to present tree.
>
> We need to ground out the requirements.

+1

>
>> - I still do not see how you can get around the fact that the shape
>> is very language specific, ie, I am not sure how you would define
>> metadata that is RDF serialization syntax independent and, even more,
>> independent of whether the target is RDF, JSON, or XML (which works
>> much more easily with the scheme in [1])
>
> RDF serialization syntax independence is your issue not mine.
>
> As far as I'm concerned, the metadata can provide a turtle template
> for Turtle.
>
> If the output required is JSON-LD, I'd expect the CSV->JSON conversion
> would be a better starting point because it has control over JSON.
>
> If RDF/XML is required, converting RDF formats isn't hard, at least
> not in that direction. Managing the XML namespaces might mean the CSV
> to XML is a better route.
>
> The weakness of the post-process argument is if the conversion is so
> simple that it becomes a common need to reshape then you are asking
> the end user to get involved with skills they may not have. It's only
> half a standard from consumers POV.
>

I do see that point. The question is whether the simple 'transformation'
would be enough or not.
Ivan

> Andy
>
>>
>> Cheers
>>
>> Ivan
>>
>> [1]
>> http://htmlpreview.github.io/?https://github.com/w3c/csvw/blob/rdfconversion-ivan/csv2rdf/index.html
>>
>>
>>> Andy
>>>
>>>> Ivan
>>
>>
>> ----
>> Ivan Herman, W3C Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> GPG: 0x343F1A3D
>> WebID: http://www.ivan-herman.net/foaf#me

----
Ivan Herman, W3C Digital Publishing Activity Lead
Home: http://www.w3.org/People/Ivan/
mobile: +31-641044153
GPG: 0x343F1A3D
WebID: http://www.ivan-herman.net/foaf#me
Received on Wednesday, 21 May 2014 10:36:11 UTC
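
Illustrative note: the serially executed "transformation" list proposed in the message might look roughly like the following minimal Python sketch. It is not taken from any CSVW draft. The function names, the "case" step type, the "replace" key on regex steps, and the "{_}" placeholder for the current field value are all assumptions made for the example; the message itself only names the "template" and "regex" types and leaves their values undefined.

```python
import re


def _template(current, row, step):
    # Substitute {column} placeholders with values from the same row.
    # "{_}" standing for the current field value is an assumption of this sketch.
    fields = dict(row, _=current)
    return re.sub(r"\{(\w+)\}", lambda m: fields[m.group(1)], step["value"])


def _regex(current, row, step):
    # "value" holds the pattern; "replace" is a hypothetical extra key.
    return re.sub(step["value"], step.get("replace", ""), current)


def _case(current, row, step):
    # Hypothetical step type covering the upper/lower case feature.
    return current.upper() if step["value"] == "upper" else current.lower()


STEP_TYPES = {"template": _template, "regex": _regex, "case": _case}


def transform_field(column, row, steps):
    """Run the steps in order; each step receives the previous step's output."""
    current = row[column]
    for step in steps:
        current = STEP_TYPES[step["type"]](current, row, step)
    return current


row = {"id": " AB-12 ", "name": "Widget"}
steps = [
    {"type": "regex", "value": r"^\s+|\s+$"},  # trim surrounding whitespace
    {"type": "case", "value": "lower"},
    {"type": "template", "value": "http://example.org/item/{_}"},
]
uri = transform_field("id", row, steps)
print(uri)  # http://example.org/item/ab-12
```

Running the steps in sequence keeps each one trivial. The sketch has no conditional step, which is the gap the thread raises when it asks whether simple replacement would be enough.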