Re: Lang and dt in the graph. Was: Dumb SPARQL query problem from Hugh Glaser on 2013-12-01 (public-lod@w3.org from December 2013)

From: Hugh Glaser <hugh@glasers.org>
Date: Sun, 1 Dec 2013 23:02:32 +0000
To: Andy Seaborne <andy.seaborne@epimorphics.com>
Cc: public-lod community <public-lod@w3.org>
Message-Id: <AC7EAA9F-91B0-4DD9-9771-8D844EAABBAE@glasers.org>
Hi.
Thanks.
A bit of help please :-)
On 1 Dec 2013, at 17:36, Andy Seaborne <andy.seaborne@epimorphics.com> wrote:

> 
> 
> On 01/12/13 12:25, Tim Berners-Lee wrote:
>> 
>> On 2013-11-23, at 12:21, Andy Seaborne wrote:
>> 
>>> 
>>> 
>>> On 23/11/13 17:01, David Booth wrote:
>>>> [...]
>>>> This would have been fixed if the RDF model had been changed to
>>>> represent the language tag as an additional triple, but whether this
>>>> would have been a net benefit to the community is still an open
>>>> question, as it would add the complexity of additional triples.
>>> 
>>> Different.  Maybe better, maybe worse.
>>> 
>>> 
>>> Do you want all your "abc" to be the same language?
>>> 
>>>   "abc" rdf:lang "en"
>>> 
>>> or multiple languages:
>>> 
>>>   "abc" rdf:lang "cy" .
>>>   "abc" rdf:lang "en" .
>>> 
>>> 
>>> ?
>>> 
>>> Unlikely - so it's bnode time ...
>>> 
>>> :x :p [ rdf:value "abc" ; rdf:lang "en" ] .
>> 
>> The nice thing about this in an n3rules-like system (where FILTER and WHERE clauses are not distinct and some properties are just builtins) is that rdf:value and rdf:lang can be made builtins, so a datatyped literal can behave just like a bnode with two properties if you want it to.
>> 
>> But I have always preferred it with not 2 extra triples, just one:
>> 
>> 	:x  :p [ lang:en "cat" ]
>> 
>> which allows you also to write things like
>> 
>> 	:x :p  [ lang:en "cat"] , [ lang:fr "chat" ].
>> 
>> or if you use the ^ back-path syntax of N3 (which was not taken up in Turtle),
>> 
>> 	:x :p "cat"^lang:en,  "chat"^lang:fr .
>> 
>> You can do the same with datatypes:
>> 
>> 	:x :q   "2013-11-25"^xsd:date .
>> 
>> instead of
>> 
>> 	:x :q   "2013-11-25"^^xsd:date .
> 
> This seems to bring its own issues.  These bnodes seem to be like untidy literals as considered in the RDF-2004 WG.
> 
> :x  :p [ lang:en "cat" ]
> :x  :p [ lang:en "cat" ]
> :x  :p [ lang:en "cat" ]
> 
> is 6 triples.
> 
> :x :p :q .
> :x :p :q .
> :x :p :q .
> 
> is 1 triple.  Repeated read in same file - this already causes confusion.
> 
> :x :p "cat" .
> :x :p "cat" .
> :x :p "cat" .
> 
> is 1 triple or is it 3 triples because it's really
Is it not 1 triple if you take the first view or 6 triples if you take the second?
Or probably I don’t understand bnodes properly!?
> 
> :x :p [ xsd:string "cat" ].
> 
> :x :p 123 .
> :x :p 123 .
> :x :p 123 .
> 
> It makes it hard to ask "do X and Y have the same value for :p?" - it gets messy to consider all the cases of triple patterns that arise and I would not want to push that burden back onto the application writer. Why can't the app writer say "find me all things with a property value less than 45"?
I see it makes it hard, but I don’t see it as any harder than what we have now, with multiple patterns that do and don’t have ^^xsd:string.
As I said before, with the ^^xsd you need to consider a bunch of patterns to do the query - again, it is messy, but is it messier?
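
On the triple counts quoted above, here is a minimal Python sketch of how I read them. It assumes a graph is simply a set of (subject, predicate, object) tuples and that every "[ ... ]" in the file gets a fresh blank node when it is read. The load() helper and the _:bN labels are made up for illustration and are not any real RDF library's API.

```python
from itertools import count


def load(statements):
    """Build a graph as a set of (s, p, o) tuples.

    Each statement is (subject, predicate, obj). A dict obj stands for a
    Turtle '[ prop value ]' form and is given a fresh blank node on every
    occurrence, which is the assumption this sketch rests on.
    """
    fresh = count()
    graph = set()
    for s, p, o in statements:
        if isinstance(o, dict):
            bnode = f"_:b{next(fresh)}"  # hypothetical bnode label
            graph.add((s, p, bnode))
            for prop, value in o.items():
                graph.add((bnode, prop, value))
        else:
            # Identical triples collapse because the graph is a set.
            graph.add((s, p, o))
    return graph


# Each statement repeated three times, as in the quoted examples.
iri_graph = load([(":x", ":p", ":q")] * 3)
literal_graph = load([(":x", ":p", '"cat"')] * 3)
bnode_graph = load([(":x", ":p", {"lang:en": '"cat"'})] * 3)

print(len(iri_graph), len(literal_graph), len(bnode_graph))
```

Under those assumptions the repeated IRI and plain-literal statements each collapse to a single triple, while the three bracketed forms give three distinct blank nodes and so six triples.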

Actually I find
 { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . }
with possibly also
 { ?s1 ?p ?str . ?s2 ?p ?str . }
much easier to work with than something that has this stuff optionally tacked on the end of literals, that isn’t really part of the string but isn’t part of RDF either.
Or maybe it is part of the literal but not the string? Surely that should be clear to me?
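
To make the comparison of those two patterns concrete, here is a rough Python sketch, again treating a graph as a plain set of tuples. The helper names (encode_plain, encode_bnode, shared_plain, shared_bnode) and the sample data are hypothetical, and this is in-memory matching for illustration only, not how a SPARQL engine evaluates anything.

```python
from itertools import count


def encode_plain(facts):
    """Store each value directly as the object: :a :p "cat" ."""
    return {(s, p, v) for s, p, v in facts}


def encode_bnode(facts, wrapper="xsd:string"):
    """Store each value behind a fresh bnode: :a :p [ xsd:string "cat" ] ."""
    fresh = count()
    graph = set()
    for s, p, v in facts:
        b = f"_:b{next(fresh)}"  # hypothetical bnode label
        graph.add((s, p, b))
        graph.add((b, wrapper, v))
    return graph


def shared_plain(graph):
    """Roughly { ?s1 ?p ?str . ?s2 ?p ?str . }, keeping only s1 < s2."""
    return {(s1, s2, p1, o1)
            for (s1, p1, o1) in graph
            for (s2, p2, o2) in graph
            if s1 < s2 and p1 == p2 and o1 == o2}


def shared_bnode(graph, wrapper="xsd:string"):
    """Roughly { ?s1 ?p [ xsd:string ?str ] . ?s2 ?p [ xsd:string ?str ] . }"""
    # Map each wrapper bnode to its value (one value per bnode in this sketch).
    values = {b: v for (b, w, v) in graph if w == wrapper}
    return {(s1, s2, p1, values[o1])
            for (s1, p1, o1) in graph if o1 in values
            for (s2, p2, o2) in graph if o2 in values
            if s1 < s2 and p1 == p2 and values[o1] == values[o2]}


facts = [(":a", ":p", '"cat"'), (":b", ":p", '"cat"'), (":c", ":p", '"dog"')]
plain_graph = encode_plain(facts)
bnode_graph = encode_bnode(facts)

print(shared_plain(plain_graph))
print(shared_bnode(bnode_graph))
```

With this sample data both queries report :a and :b as sharing "cat" for :p, each against its own encoding. The plain pattern run against the bnode encoding does not find that pair, because the two blank nodes are distinct objects, so a consumer facing both encodings needs both patterns.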

I just don’t see there is a difference in complexity for querying - it is just that the current situation is genuinely messier for consumers because there are two notations in play, whereas if RDF is so good we should have everything in RDF.
Not that I would say anything should change :-) it ain’t actually broken, but it could get fixed.

(Oh dear, Hugh showing his ignorance of the fancy stuff again)

Best
Hugh
> 
> To give that, if we add interpretation of bNodes used in this value form (datatype properties vs object properties ?), so you can ask about shared values, we have made them tidy again.  But then it is little different from structured literals with @lang and ^^datatype.
> 
> Having the data model and the access model different does not gain anything.  The data model should reflect the way the data is accessed.
> 
> Like RDF lists, or seq/alt/bag, encoding values in triples is attractive in its uniformity but the "triples" nature always shows through somewhere, making something else complicated.
> 
> 	Andy
> 
> PS Graph leaning does not help because you can't add data incrementally if leaning is applied at each addition.
> 
>> I suggested way back these properties as a way of putting the info into the graph
>> but my suggestion was not adopted.  I think it would have made the model
>> more complete which would have been a good thing, though
>> SPARQL would need to have language-independent query matching as a  special case -- but
>> it does now too really.
>> 
>> (These are interpretation properties.  I must really update
>> http://www.w3.org/DesignIssues/InterpretationProperties.html)
>> 
>> Units are fun as properties too. http://www.w3.org/2007/ont/unit
>> 
>> Tim
>> 
>>> 
>>> 	Andy

-- 
Hugh Glaser
   20 Portchester Rise
   Eastleigh
   SO50 4QS
Mobile: +44 75 9533 4155, Home: +44 23 8061 5652
Received on Sunday, 1 December 2013 23:03:00 UTC