- From: Gregg Kellogg <gregg@greggkellogg.net>
- Date: Wed, 21 May 2014 16:48:17 -0700
- To: Ivan Herman <ivan@w3.org>
- Cc: Andy Seaborne <andy@apache.org>, W3C CSV on the Web Working Group <public-csv-wg@w3.org>
On May 19, 2014, at 10:09 AM, Ivan Herman <ivan@w3.org> wrote:
> Ok, now I understand the difference, thanks. Indeed, I use templates for one term; again, just as R2RML does.
>
> I am a little bit afraid of the potential complexity of that approach. The one-term-template is pretty straightforward both for the implementation and the user, is syntax independent and can be easily re-used for XML or JSON, too. The per-row-template seems to be syntax dependent and more complex though, clearly, much more powerful. I have to think about it...
I think it's really pretty simple; I implemented something similar for another project I'm doing. In Ruby, it takes advantage of the ability to use "gsub" and pass it a block:
csv.each do |line|
  result = csvm.gsub(/"[^"]*\{[^"]*"/) { |match|
    match.gsub(/\{[^\}]*\}/) { |field_ref|
      ...
    }
  }
end
In this case, because JSON uses braces in its basic syntax, I look for braces contained within double-quotes; the examples Andy and I use for Turtle are consistent with this approach.
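To make the quoting rule concrete, here is a small sketch that runs the outer pattern on its own. The template string and the column names (name, temp) are made up for illustration; only the regular expression comes from the snippet above.

```ruby
# Hypothetical JSON template; "name" and "temp" are made-up column names.
template = '{"@id": "obs/{name}", "value": "{temp}", "note": "no refs here"}'

# Same outer pattern as above: a double-quoted string that contains "{".
quoted_with_brace = /"[^"]*\{[^"]*"/

# Collect every substring the outer gsub would hand to its block.
matches = template.scan(quoted_with_brace)

p matches
# prints ["\"obs/{name}\"", "\"{temp}\""]
```

The braces that delimit the JSON object itself, and the quoted strings with no brace in them, are left alone in this flat example.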
For the non-Ruby literate, it basically says match anything including an opening curly brace ("{") surrounded by double quotes and replace it with the result of the block/callback. Within each such match, the inner gsub looks for field references such as {...} and replaces them in the same way. Note that the field reference may contain some RFC 6570 processing elements in addition to the variable/column name, but these should only be performed if we've determined that the column type is IRI.
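A filled-in version of the same idea might look roughly like the sketch below. It is only an illustration under assumptions: expand_row, the iri_columns argument, the sample template and the sample row are all hypothetical, rows are plain hashes standing in for parsed CSV lines, and simple percent-encoding via ERB::Util.url_encode stands in for real RFC 6570 processing.

```ruby
require 'erb'

# Hypothetical helper: expand one row template against one row of data.
# `iri_columns` names the columns assumed to have been typed as IRI.
def expand_row(template, row, iri_columns = [])
  # Outer pass: only double-quoted strings that contain "{".
  template.gsub(/"[^"]*\{[^"]*"/) do |match|
    # Inner pass: each {field} reference inside that string.
    match.gsub(/\{[^\}]*\}/) do |field_ref|
      name  = field_ref[1..-2]          # strip the surrounding braces
      value = row[name].to_s
      # Percent-encode only for IRI-typed columns (stand-in for RFC 6570).
      iri_columns.include?(name) ? ERB::Util.url_encode(value) : value
    end
  end
end

template = '{"@id": "site/{name}", "temp": "{temp}"}'
row      = { 'name' => 'Heathrow Airport', 'temp' => '11.2' }

puts expand_row(template, row, ['name'])
# prints {"@id": "site/Heathrow%20Airport", "temp": "11.2"}
```

Passing the IRI column names separately keeps the substitution itself ignorant of types, which is one way to apply the extra processing only where the column type calls for it.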
Gregg
> Ivan
>
>
>
> On 19 May 2014, at 18:16 , Andy Seaborne <andy@apache.org> wrote:
>
>> On 19/05/14 15:23, Ivan Herman wrote:
>>> Let me try to see if I understand what you mean...
>>>
>>> If there is no metadata assigned to the data then (at least conceptually) we say that we generate a metadata of, roughly, the form:
>>>
>>> {
>>>   "@id" : "URI OF THE DATA",
>>>   "columns" : [{
>>>     "name" : "col1",
>>>     "template" : "{col1}"
>>>   },{
>>>     "name" : "col2",
>>>     "template" : "{col2}"
>>>   }]
>>> }
>>
>> Where we seem to differ is "template" - that's a template for one term (the object of a triple).
>>
>> The template I have in mind is a complete row:
>>
>> Taking from:
>>
>> https://github.com/w3c/csvw/blob/gh-pages/examples/simple-weather-observation.md
>>
>> Date-time, Air temperature (Cel), Dew-point temperature (Cel)
>> 2013-12-13T08:00:00Z, 11.2, 10.2
>>
>>
>> <site/22580943/date-time/20131213T0800Z>
>>     a ssn:Observation ;
>>     ssn:observationSamplingTime
>>         [ time:inXSDDateTime "2013-12-13T08:00:00Z"^^xsd:dateTime ] ;
>>     ssn:observationResult [
>>         a ssn:SensorOutput ;
>>         def-op:airTemperature_C
>>             [ qudt:numericValue "11.2"^^xsd:double ] ;
>>         def-op:dewPointTemperature_C
>>             [ qudt:numericValue "10.2"^^xsd:double ] ] .
>>
>> That could be created with a template like:
>>
>> ----------------------------------------------
>> Columns:
>>
>> "columns" : [{
>>   "name" : "date-time"
>> },{
>>   "name" : "air-temperature"
>> },{
>>   "name" : "dew-point"
>> }]
>>
>>
>> ----------------------------------------------
>> <site/22580943/date-time/{date-time}>
>>     a ssn:Observation ;
>>     ssn:observationSamplingTime
>>         [ time:inXSDDateTime "{date-time}"^^xsd:dateTime ] ;
>>     ssn:observationResult [
>>         a ssn:SensorOutput ;
>>         def-op:airTemperature_C
>>             [ qudt:numericValue "{air-temperature}"^^xsd:double ] ;
>>         def-op:dewPointTemperature_C
>>             [ qudt:numericValue "{dew-point}"^^xsd:double ] ] .
>> ----------------------------------------------
>>
>> skipping over the conversion of 2013-12-13T08:00:00Z to 20131213T0800Z
>>
>> Andy
>>
>>>
>>> And, by doing that, we have only one generation algorithm instead of two branches like in my document now.
>>>
>>> Yes, this works, I guess. It certainly makes the specification simpler and avoids getting out of sync. I am slightly worried that the end-user would be a bit screwed up, but that may have to go into a separate, tutorial-like text. So it may be worth doing it indeed...
>>>
>>> (Would need a rewrite of the text I produced, but that is probably relatively easy; just that I would not do it today or tomorrow...)
>>>
>>> Ivan
>>>
>>>
>>>
>>> On 19 May 2014, at 16:14 , Andy Seaborne <andy@apache.org> wrote:
>>>
>>>> On 19/05/14 15:00, Ivan Herman wrote:
>>>>>>> Generating a template, if none provided, would keep the user-template driven mechanism and the metadata-generated template mechanism in-step. It would be clear that they aren't alternatives with (potentially) capabilities in the direct route not in the template route. You could get the generated template and tweak it, for example.
>>>>>>>
>>>>> I would need an example to understand what you mean...
>>>>>
>>>>
>>>> If the columns are "foo" and "bar" and no template is in the metadata then we define the process to be to create and use:
>>>>
>>>> -------------------------
>>>> [
>>>>   :foo "{foo}" ;
>>>>   :bar "{bar}"
>>>> ] .
>>>> -------------------------
>>>>
>>>> Andy
>>>>
>>>
>>>
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> GPG: 0x343F1A3D
>>> WebID: http://www.ivan-herman.net/foaf#me
Received on Wednesday, 21 May 2014 23:48:53 UTC