Re: handling bare literals and PS a Q. about lang tags from Brian McBride on 2002-09-06 (w3c-rdfcore-wg@w3.org from September 2002)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Fri, 06 Sep 2002 10:54:08 +0100
To: pat hayes <phayes@ai.uwf.edu>, w3c-rdfcore-wg@w3.org, Patrick Stickler <patrick.stickler@nokia.com>
Message-Id: <5.1.0.14.0.20020906105222.0cbd83a0@0-mail-1.hpl.hp.com>
Cluck. Cluck. Cluck.

Pat, please hold that thought.  Right now I'm trying to encourage the WG to 
get the datatyped literals sorted out and then we can come back to looking 
at the other sort.

Mother Hen

At 01:01 06/09/2002 -0700, pat hayes wrote:

>Patrick's document up to section 6 describes how to handle datatyped 
>literals (and does it fine), but I think that its rather strict rejection 
>of 'bare' (ie un-datatyped) literals is rather extreme. I have a 
>suggestion for how to handle bare literals which is entirely compatible 
>(in fact, I think more compatible) with the treatment of datatyped 
>literals, and has a few other advantages as well.
>
>In brief, the idea is to treat a bare literal as being a datatyped literal 
>with a bnode as its datatype. That is, it means 'some datatyping of this 
>literal but we don't know which one right now'. This allows one to use 
>bare literals coherently in any context where the datatype can be 
>specified in some other way: in particular, it allows classical long-range 
>datatyping for bare literals. It also can be used to make various 
>different 'datatyping assumptions' to be made explicit by the use of 
>common or distinct bnodes.
>
>It also can be used to simplify the syntax. Right now, with the 
>Guha/Stickler datatyping scheme, we have basically four kinds of node in 
>an RDF graph: urirefs, bnodes, bare literals and datatyped literals. I 
>suggest factoring this four into two cases plus an option, as follows. A 
>basic node is either a uriref or an bnode. Urirefs refer to things and 
>bnodes are existentially quantified, as usual.  A node in an RDF graph is 
>either a basic node, or a basic node plus a literal, called a literal 
>node. All these node types are tidy (where identity of a literal node 
>depends on identity of both the basic node and the literal.) There is one 
>special semantic rule for literal nodes: the simple node part of it always 
>denotes a datatype, and the literal node denotes the value of the literal 
>string under the datatype mapping.
>
>That is pretty much all there is to it. When the datatype is referred to 
>by a uriref, this is the Guha/Stickler idea.  When its a bnode, we get 
>what was once referred to as 'semantically untidy' literals (but they 
>aren't untidy any more :-). If we combine the semantic conditions for 
>bnodes and for datatype literals, then (_:x, "10") is basically a variable 
>ranging over possible things that "10" could be mapped to by a datatype. 
>This means that it is OK to impose some particular datatype by some other 
>means, eg by a range:
>
>ex:age rdfs:range xsd:integer .
>Jenny ex:age (_:x, "10") .
>
>says Jenny is ten: in fact, we can fix the MT so that this entails
>
>Jenny ex:age (xsd:integer, "10") .
>
>Having the bnode made explicit allows some other useful ways of attaching 
>datatypes, however. Someone might for example have some rules which can 
>assign a datatype based on the form of the literal (if its a numeral, 
>assume its an integer, otherwise a string....) and these could be 
>expressed as conditions on the bnode itself.
>
>These inner bnodes inside literal nodes act just like regular bnodes, by 
>the way: the same entailments apply to them and for the same reasons, eg
>
>Jenny ex:age (xsd:integer, "10") .
>entails
>Jenny ex:age (_:y, "10") .
>
>and they can occur in other triples elsewhere in the graph, so that we are 
>able to say things like
>
>Jenny ex:age (_:y, "10") .
>_:y rdf:type rdf:Datatype .
>
>Dan wanted to be able to check literal identity by string-matching. Well, 
>he can, provided he is careful about his bnodes. On this scheme, a bare 
>literal is simply illegal, so any bare literal has to be parsed into a 
>literal node, presumably by inserting a bnode (assuming, that is, that we 
>don't have datatyping information available). Now, there are several 
>strategies for how to do this. The 'safest' strategy would be to insert a 
>unique bnode into each bare literal node. This basically allows any future 
>information about datatyping to be compatible, but on the other hand it 
>rather weakens the RDF graph. Notice that this might require a tidy 
>bare-literal graph to have some of its nodes 'split' into distinct 
>bnode-datatyped-literal nodes; an alternative strategy would be to not 
>allow this splitting; that effectively would preserve the graph structure 
>and treat the bare literals as semantically tidy.) A stronger strategy 
>would be to insert the same bnode into every literal in a document: that 
>would effectively assert that they all have to be datatyped the same way. 
>Finally, what might be called a Connolly strategy would be to use the same 
>bnode for every literal in the entire world; that would insist that ALL 
>bare literals have the same datatype, and then literal node matching would 
>amount to string matching on the literal strings. The nice thing about 
>this is that the resulting graphs would make explicit in their node 
>structure whatever assumptions were being made, so that once all literals 
>were b-node-ified, the graphs could be merged safely and the usual RDF 
>rules would keep the different assumptions intact and detect any 
>incompatibilities by the usual inference processes.
>
>Finally, one alternative is to do all the above but ALSO allow bare 
>literals as legal nodes, and require them to follow Dan's preferences and 
>denote themselves. Then *bare* literals provide a way to refer to lexical 
>forms, and datatyped literals (including those with bnodes in them) allow 
>us to use literals to refer to values. In this case all nodes would be 
>tidy also, but a bare literal would never be the same as a literal 
>node.  In many ways this conforms to everyone's wishes, I think: literals 
>always refer to themselves, datatyped literal nodes always refer to 
>values, all the usual semantic rules apply uniformly, and we can say 
>anything. The only real cost will be to legacy systems which use bare 
>literals to refer to values, but they will need to be changed, probably, 
>whatever we do. And we can discuss the translation strategies in the 
>previous paragraph, with their pros and cons, for use by conversion 
>implementers.
>
>If the WG feels this is worth bothering with, I could wrote this up as a 
>draft in a few days.
>
>Pat
>
>PS. Catching up on last weeks emails....re. the xml:lang tag, might there 
>be some interaction between this and the lexical forms for datatyping? Eg 
>I gather than in Germany, commas are used for a decimal point, so we might 
>for example have
>(xsd:real, "10,5") -en .
>being invalid but
>(xsd:real, "10,5") -gr .
>refers to ten and a half.
>
>
>
>
>--
>---------------------------------------------------------------------
>IHMC                                    (850)434 8903   home
>40 South Alcaniz St.                    (850)202 4416   office
>Pensacola,  FL 32501                    (850)202 4440   fax
>phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Friday, 6 September 2002 07:08:06 UTC