Re: xml:lang [was Re: Outstanding Issues ] from Patrick Stickler on 2002-03-01 (w3c-rdfcore-wg@w3.org from March 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Fri, 01 Mar 2002 08:49:48 +0100
To: Dan Connolly <connolly@w3.org>, Pat Hayes <phayes@ai.uwf.edu>
CC: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B8A4F12F.FC09%patrick.stickler@nokia.com>

On 2002-02-27 17:41, "ext Dan Connolly" <connolly@w3.org> wrote:

> ("abc", 'en') ->    "abc"-en
> ("abc",  none) ->    "abc"
> ("abc", 'fr') ->    "abc"-fr
> 
> Also, for XML literals, we'll have xml("canonical-form...", "en").
> 
> The point is: the literal is syntactically evident in the RDF document.

This seems reasonable.

>> when the 
>> datatyping discussion was in full progress, all predicated on the
>> assumption (and indeed the frequent explicit assertion, to which
>> nobody raised the slightest objection) that literals were strings. If
>> literals are not strings, then we have to go and do all that again,
>> because NONE of it makes any sense at all. What is the result of
>> applying the lexical-to-value mapping of xsd:number to the pair
>> ("34", "french") ?
> 
> same as the result of applying it to "blarf": you lose.

Hmmm... this may be the example I have been looking for to
explain the difference between literal and lexical form ;-)

Literal:       <0,"35",en>
Lexical Form:  "35"

The lexical form is that portion of the literal which is significant
to datatyping. Thus, if we paired the above literal with the datatype
xsd:integer, the pairing that is passed to the extra-RDF application is
(xsd:integer, "35") and not (xsd:integer, <0,"35",en>) because parseType
and language tags are irrelevant to datatyping and are superfluous insofar
as the mapping is concerned.

Consider the following two entailment examples, the first which takes
the earlier (apparently incorrect) view that literals are just strings,
and the second which takes literals as triples. The complete data for
each is also attached.

Example 1: Unstructured Literals (from rdfdt.n3)

{  
   :Bob :age "35" .
   :age rdfs:range xsd:integer .
   xsd:integer rdf:type rdfs:Datatype .
}  
log:implies
{  
   :Bob :age _:1 .
   _:1 xsd:integer "35" .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type rdfs:DatatypeValue .
   _:1 rdfs:isMemberOfValueSpaceOf xsd:integer .
   _:1 rdfs:hasLexicalRepresentationAs "35" .
   "35" rdf:type xsd:integer .
   "35" rdf:type rdfs:DatatypeLexicalForm .
   "35" rdfs:isLexicalRepresentationOf _:1 .
   "35" rdfs:isMemberOfLexicalSpaceOf xsd:integer .
   xsd:integer rdfs:hasValueSpace :V .
   xsd:integer rdfs:hasLexicalSpace :L .
   :V rdf:type rdfs:DatatypeValueSpace .
   :V rdfs:subClassOf xsd:integer .
   :L rdf:type rdfs:DatatypeLexicalSpace .
   :L rdfs:subClassOf xsd:integer .
   _:1 rdf:type :V .
   "35" rdf:type :L .
} .
# ultimate, unambiguous (extra-RDF) interpretation:
# (xsd:integer,"35") = 35   i.e. Bob is 35 years old.

--

Example 2: Structured Literals (from rdfdt-SL.n3)

(I'm using the above suggested notation for language qualified
literals and presuming the presence of xml:lang="fi")

{
   :Bob :age "35"-fi .
   :age rdfs:range xsd:integer .
   xsd:integer rdf:type rdfs:Datatype .
}
log:implies
{
   :Bob :age _:1 .
   _:1 xsd:integer "35"-fi .
   _:1 rdf:type xsd:integer .
   _:1 rdf:type rdfs:DatatypeValue .
   _:1 rdfs:isMemberOfValueSpaceOf xsd:integer .
   _:1 rdfs:hasLexicalRepresentationIn "35"-fi .
   "35"-fi rdf:type xsd:integer .
   "35"-fi rdf:type rdfs:DatatypeLiteral .
   "35"-fi rdfs:containsLexicalRepresentationOf _:1 .
   "35"-fi rdfs:containsMemberOfLexicalSpaceOf xsd:integer .
   xsd:integer rdfs:hasValueSpace :V .
   xsd:integer rdfs:hasLexicalSpace :L .
   :V rdf:type rdfs:DatatypeValueSpace .
   :V rdfs:subClassOf xsd:integer .
   :L rdf:type rdfs:DatatypeLexicalSpace .
   :L rdfs:subClassOf xsd:integer .
   _:1 rdf:type :V .
#  "35"-fi rdf:type :L .  # Can't say this!
} .
# ultimate, unambiguous (extra-RDF) interpretation:
# (xsd:integer,"35") = 35   i.e. Bob is 35 years old.

--

The second example illustrates the one drawback of structured
literals (though I think we can still live with it) that insofar
as the graph syntax is concerned, it hides the actual lexical
form. It is true that one can parse the literal to extract it,
but that is IMO the same as parsing a URI to get at some
component of the URI. Literals should be equally opaque to
RDFS logic as are URIs.

If there were any way (and I'm just musing, not calling for
any change to the status quo ;-) that parseType and language
knowledge could be maintained for literals in some other
way than a structured literal, we would be able to express a
more useful and complete body of knowledge about lexical
forms. Of course, since literals aren't subjects anyway, and
we can't actually express the statement "35" rdf:type :L
perhaps it doesn't really matter.

Something to think about, though...

Patrick

--

Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Attachments

application/octet-stream attachment: rdfdt.n3
application/octet-stream attachment: rdfdt-SL.n3

Received on Friday, 1 March 2002 02:48:21 UTC