- From: Graham Klyne <gk@ninebynine.org>
- Date: Thu, 18 Sep 2003 12:21:34 +0100
- To: pat hayes <phayes@ihmc.us>, w3c-rdfcore-wg@w3.org
Continuing in the spirit of airing alternative designs, not proposing them...
I think Pat's approach is elegant and quite effective, and is in
substantial concurrence with earlier thoughts expressed by DanC [1] and
myself [2]. The main difference that I see is the proposal to represent
language tags in the graph rather than as part of a literal.
[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html
I'm wondering if the suggestion to translate
>aaa ppp "sss"@ttt .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "ttt" .
might be problematic in its use of xsd:string, in that this would mean that:
aaa ppp "sss"@ttt .
entails
aaa ppp "sss" .
for which there is no corresponding entailment in the current
design. Maybe a simple way to avoid this is to apply the "i-default" tag
(per RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that
aaa ppp "sss" .
-->>
aaa ppp _:x .
_:x xsd:string "sss" .
_:x rdf:langTag "i-default" .
Thus blocking the above entailment. Hmmm, i-default is not a good choice
because it suggests a human readable language, but I think a variation on
this could work.
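
A minimal sketch of the entailment concern above, assuming triples are modelled as plain Python tuples and that "entails" means simple entailment (some mapping of the conclusion's bnodes makes it a subgraph of the premise). The helper names `translate` and `simply_entails` are illustrative only, and `rdf:langTag` is the hypothetical property from the design under discussion, not an existing RDF term.

```python
from itertools import product

XSD_STRING = "xsd:string"
LANG_TAG = "rdf:langTag"  # hypothetical property name from the proposal


def is_bnode(term):
    return term.startswith("_:")


def translate(s, p, lexical, tag=None, bnode="_:x"):
    """Translate  s p "lexical"@tag .  into the bnode form of the proposal."""
    triples = {(s, p, bnode), (bnode, XSD_STRING, '"%s"' % lexical)}
    if tag is not None:
        triples.add((bnode, LANG_TAG, '"%s"' % tag))
    return triples


def simply_entails(g, e):
    """True if some mapping of e's bnodes to terms of g makes e a subgraph of g."""
    bnodes = sorted({t for triple in e for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g for t in triple})
    for choice in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, choice))
        if all(tuple(mapping.get(t, t) for t in triple) in g for triple in e):
            return True
    return False


tagged = translate("ex:a", "ex:p", "sss", tag="en")
# Untagged literal translated with no langTag triple at all:
untagged_bare = translate("ex:a", "ex:p", "sss", bnode="_:y")
# Untagged literal translated with a reserved default tag:
untagged_default = translate("ex:a", "ex:p", "sss", tag="i-default", bnode="_:y")

bare_is_entailed = simply_entails(tagged, untagged_bare)
default_is_entailed = simply_entails(tagged, untagged_default)
```

Under these assumptions the bare translation is a subgraph instance of the tagged one, so it is entailed, while the extra default-tag triple has no match and so blocks the entailment.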
..
I'm not sure that I fully concur with Pat's proposed handling of
parseType=Literal, in that I don't see that, in terms of graph formation,
there needs to be any different treatment from ordinary plain literals ...
that is, parseType=Literal makes sense as a purely syntactic directive for
processing of RDF/XML content to plain literal form. I don't think this is
inconsistent with Pat's proposal, I just don't see why the
parseType=Literal case needs to be drawn out specially in this way. One of
the things I least like about the current design is the way that syntactic
processing is not kept distinct from datatype semantics. Pat's proposal
discusses treatment of rdf:XMLLiteral as a pure datatype, which seems
sensible to me.
Concerning:
>_:x rdfs:Literal "10" .
>
>would say that _:x was some value which has "10" as a lexical form, but we
>don't (yet) know which one. Or, we could not do this.
Would this be a reasonable interpretation for rdf:value, consistent with
existing usage?
#g
--
At 20:16 17/09/03 -0500, pat hayes wrote:
>Greetings.
>
>Y'all are going to just LOVE me for this, but thinking about the i18n
>desirables for XML has led me to the observation that one of our old and
>abandoned designs for handling datatypes would handle this stuff quite
>smoothly. The key point is that terms denoting datatype values are allowed
>in the subject position, so attributes like language tags and lexical
>'type' can be described as RDF properties. We gave up on this on the
>grounds largely of triple-bloat, a concern which now seems curiously
>irrelevant when one contemplates what OWL will look like. Anyway, in the
>spirit of Brian's comment,
>
>>I've tried to be careful not to describe it as a proposal. This is an
>>alternative design. I'm not proposing it, just describing it.
>
>here's the design.
>
>Plain literals are just strings, and they denote themselves. There are no
>typed literals. Datatypes are indicated by class/property names. Datatype
>values are typically indicated by bnodes, so instead of
>
>aaa ppp "sss"^^ddd .
>
>we write
>
>aaa ppp _:x .
>_:x ddd "sss" .
>
>where the _:x denotes the datatype value. You could use URIs in some
>cases, eg
>
>ex:PIto5places xsd:number "3.14159" .
>
>There is a general D-entailment
>
>aaa ddd "sss" .
>|=
>aaa rdf:type ddd .
>
>when sss is a legal lexical form for the datatype ddd; the version of this
>for XML is an RDF entailment (though see later).
>
>This design, unlike our present one, has subject terms denoting datatype
>values, so lang tags can be considered to be *properties of datatype
>values*, and the tags themselves can be encoded as simple literals, so we
>just write an assertion:
>
>_:x rdf:langTag "en" .
>
>and our current design translates thus:
>
>aaa ppp "sss"@ttt .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "ttt" .
>
>Note that xsd:string is the appropriate datatype for simple literals,
>providing a way to in effect put a simple literal string in the subject
>position (encoded as a bnode). In fact, in this design, xsd:string is in
>effect owl:sameAs applied to literals.
>
>----
>
>This way of handling lang tags allows us to associate lang tags with XML
>literals without putting the tag into the lexical space of the literal, so
>allows XML literal to be a normal datatype, just as it is right now
>(though read on) while also handling one of Martin's requirements. The
>parsing of parseType="Literal" needs to include the asserting of an
>appropriate rdf:langTag assertion in the graph, according to the XML
>rules, but that seems straightforward. This design also allows sub-XML
>datatypes to automatically inherit language tagging, since they will be
>members of subClasses of rdf:XMLLiteral and hence of rdf:XMLLiteral
>itself, and hence the members of these classes will still have any
>properties they had previously. Notice that the property is of the literal
>*value*, rather than syntactically attached to the literal, so rdf:langTag
>only makes intuitive sense for self-denoting literals, or at any rate
>those which denote textual kinds of thing rather than mathematical kinds
>of thing. However, there is no need to have special rules to 'ignore' lang
>tags on non-textual datatypes such as numbers: an assertion like
>
>_:x xsd:integer "25" .
>_:x rdf:langTag "en" .
>
>is semantically vacuous but harmless, or can be considered harmless as far
>as RDF is concerned. (A lang-tag-savvy app might complain about things
>like this.) Also we don't need lang tags as a syntactic attachment to
>plain literals; the same trick works for plain literals.
>
>There isn't any general semantics for rdf:langTag, but for particular
>cases it can be defined, eg we can define it for simple literals - simple
>literal *values* can be pairs just as they are right now, and so
>IEXT(I(rdf:langTag)) is all pairs of the form <<sss, tag>, tag> , and
>IEXT(I(xsd:string)) is all pairs <<sss, tag>, sss> - and for XML literals.
>
>Here's the MT for the datatyping, re-done in a more up-to-date style: D is
>a datatype map, as usual.
>If <uri, ddd> is in D then:
>I(uri)=ddd;
>ddd is in ICEXT(I(rdf:Datatype));
>for any string sss, sss is in the lexical space of ddd iff
><L2V(ddd)(sss),sss> is in IEXT(ddd);
>If sss is in the lexical space of ddd then
>L2V(ddd)(sss) is in ICEXT(ddd)
>
>Note that being in the class is necessary but not sufficient for the
>datatyping rule to apply; this avoids some of the snags we had with this
>design previously involving subtypes. For example, we can have
>ex:octal rdfs:subClassOf xsd:integer .
>_:x ex:octal "10" .
>
>and _:x unambiguously denotes eight; in fact
>
>_:x owl:sameAs _:y .
>_:y xsd:integer "8" .
>
>The lexical typing only gets invoked by the datatype property; the class
>membership has to do with the values. Alternative lexical forms give no
>problem either:
>
>_:x xsd:integer "2" .
>_:x xsd:integer "0002" .
>
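
A minimal sketch of the lexical-to-value reading in the quoted passage, assuming a datatype map can be modelled as a Python dict from a datatype name to a pair (lexical-space test, L2V function). The `DATATYPES` table and `value_of` helper are illustrative names, and `ex:octal` is the example datatype from the text, not a real one.

```python
import re

# Hypothetical datatype map D: name -> (lexical-space test, L2V function).
DATATYPES = {
    "xsd:integer": (re.compile(r"[+-]?[0-9]+").fullmatch, lambda s: int(s, 10)),
    "ex:octal": (re.compile(r"[0-7]+").fullmatch, lambda s: int(s, 8)),
}


def value_of(datatype, lexical):
    """Value denoted by _:x in a triple  _:x datatype "lexical" ."""
    in_lexical_space, l2v = DATATYPES[datatype]
    if not in_lexical_space(lexical):
        raise ValueError("%r is not a lexical form of %s" % (lexical, datatype))
    return l2v(lexical)


# _:x ex:octal "10" .  and  _:y xsd:integer "8" .  denote the same value.
same_value = value_of("ex:octal", "10") == value_of("xsd:integer", "8")

# Alternative lexical forms of one value cause no conflict.
same_forms = value_of("xsd:integer", "2") == value_of("xsd:integer", "0002")
```

The point of the sketch is that the lexical form is interpreted only through the datatype property on the triple, so class membership of the value (for example an ex:octal value also being an xsd:integer) never changes how a given string is read.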
>BTW, we could now use rdfs:Literal as a generic superproperty of all
>datatype properties, as well as a superclass of all datatype values, so that
>
>_:x rdfs:Literal "10" .
>
>would say that _:x was some value which has "10" as a lexical form, but we
>don't (yet) know which one. Or, we could not do this.
>
>-----
>
>This would be a major change and would probably affect several
>implementations.
>
>In order to change our current design to this we would need to:
>1. remove typed literals (or, treat them as abbreviations for the
>two-triple form, maybe?)
>2. remove lang tags from plain literals (or treat these as an
>abbreviation, similarly)
>3. introduce rdf:langTag (or whatever) and add prose discussing the use of
>lang tags as properties
>4. modify the datatype semantics, as above
>5. redefine the XML parsing rules for parseType="Literal"
>6. rewrite the Lbase translation appropriately
>
>I think this would mean changes to every document; it would be a fairly
>horrendous editing task at this stage.
>
>On the other hand, it does have a certain elegance. There is only one kind
>of literal, and literals are genuinely simple, both syntactically and
>semantically, and always denote themselves in all contexts (remember
>non-tidy graphs?); and it uses RDF as a descriptive language rather than
>extending the syntax in an XML-idiosyncratic way.
>
>We abandoned this design, as I recall, for three reasons. First, it seemed
>too 'indirect' and like triple-bloat. However, in our current design we
>have to specify the same information, and we can infer the bnode:
>
>aaa ppp "10"^^xsd:integer .
>|=
>aaa ppp _:x .
>
>compare
>
>aaa ppp _:x .
>_:x xsd:integer "10" .
>
>and in any case in this post-OWL era, triple-bloat seems to be rampant. I
>note that it would be harmless to allow the current typed-literal form as
>an abbreviation for the two-triple form, by the way; or even as an
>alternative, with inference rules to convert them back and forth. The
>feeling of being 'indirect' came, as I recall, from a feeling that we
>*ought* to be able, dammit, to write things like
>ex:Jill ex:age "10" .
>rather than have to go through a bnode:
>ex:Jill ex:age _:x .
>_:x xsd:integer "10" .
>This feeling now seems to me to have been overly naive, however, with the
>benefit of hindsight.
>
>Second, it seemed unintuitive to some folk to have a property and a class
>with the same name. I never had this trouble myself, and it seems to me to
>be a good illustration of the usefulness of the intensional semantics that
>RDF provides: if you've got it, flaunt it. [*see PS] However, the design
>could be modified by allowing systematic variants for the property or
>class names, eg using xsd:integer for the property and xsd:Integer for the
>class. Or we could do without the datatype classes altogether, since
>
>aaa rdf:type xsd:integer .
> (read: aaa is an integer)
>
>and
>
>aaa xsd:integer _:x .
>(read: aaa is something denoted by a numeral)
>
>convey the exact same information in {xsd:integer}-interpretations.
>
>Third, as I recall, there were some issues arising from the long-range
>datatyping getting too complicated. OK, I'm not suggesting re-opening that
>particular can of worms. (Though I would note that when it does get
>re-opened in the future, I bet this design will be a lot more tractable
>than our current design, which will have to be simply shelved.)
>
>----
>
>The other i18n issue involved treating XML literals without markup as
>being plain text. Assuming that 'plain text' means a character string, I
>now think we can do that by a bit of semantic sleight of hand as follows.
>First, observe that any piece of XML can be encoded as a character string,
>but XML imposes extra equivalence (identity) conditions, such as
>identifying "<br />" with "<br></br>". So, consider the set of legal XML
>texts, considered as Unicode strings, and define an equivalence relation
>on this set by saying that strings with the same XML normal form are
>equivalent; then say that any such string denotes its equivalence class,
>and then in a familiar abuse of notation say that singleton classes are
>identical to their members. Now, any piece of XML text without any markup
>in it denotes itself, just as a plain literal does. (There may be some
>whitespace issues which make "  " (two spaces) equivalent to " " (one
>space); if so, this will need to be stated more carefully, eg by applying
>the normalization only to stuff inside <->.) If we say that this is the
>value space of rdf:XMLLiteral, rather than the non-text 'structural' sets
>we have at present, then Martin might be happier.
>
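
A minimal sketch of the equivalence-class idea in the quoted passage, assuming that Canonical XML 2.0, as produced by `xml.etree.ElementTree.canonicalize` in Python 3.8 and later, can stand in for the "XML normal form" mentioned there. The `xml_equivalent` helper and the `wrapper` element are illustrative devices only, needed because the canonicalizer expects a well-formed document rather than a fragment.

```python
from xml.etree.ElementTree import canonicalize


def xml_equivalent(a, b):
    """True if two XML fragments share a canonical form (C14N 2.0 as a stand-in)."""
    wrap = "<wrapper>%s</wrapper>"
    return canonicalize(wrap % a) == canonicalize(wrap % b)


# The two spellings of an empty element fall into the same equivalence class.
empty_elements_match = xml_equivalent("<br />", "<br></br>")

# Text without markup is equivalent only to itself, like a plain literal.
plain_text_matches = xml_equivalent("hello", "hello")
plain_text_differs = xml_equivalent("hello", "goodbye")
```

With this stand-in, markup-free text sits in a singleton class, which is what lets it be identified with the string itself in the passage above.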
>On the other hand, this supports a number of hard-to-state RDF
>entailments, such as intersubstituting "sss"^^xsd:string and
>"sss"^^rdf:XMLLiteral under circumstances which can only be recognized by
>an XML parser, which seems *very* ugly to include in basic RDF, so I would
>argue that if we do something like this then we treat rdf:XMLLiteral as a
>genuine datatype so that these entailments are restricted to
>D-interpretations and are not valid in simple RDF; and it also means that
>XML *with* markup denotes something very like a character string; in
>particular,
>"<"^^rdf:XMLLiteral
>on this proposal, has got absolutely nothing in common with
>"<"^^xsd:string. So maybe Martin might not be so happy after all.
>
>Anyway, thought I'd just mention it in passing.
>
>Pat
>
>PS. I thought of an interesting analogy. Literals are a kind of name, and
>in a simple extensional logic they would have a fixed denotation, eg
>numerals denote numbers, I("10")=10 (ie, ten) and so on, end of
>story. But RDF is intensional, and datatypes treat literals like
>intensional names. Seen in this way, the literal always denotes itself, ie
>I(literal)=literal; but it has a variable extension, *determined by the
>datatype context*. In other words, the datatype lexical-to-value map is a
>kind of extension mapping, like IEXT for properties and ICEXT for
>classes. Call it ILEXT-d where d is the datatype; then the 'meaning' of a
>literal string sss in a datatype context defined by d would be
>ILEXT-d(I(sss)) - compare IEXT(I(p)) or ICEXT(I(a)) where p is a property
>uri and a is a uri or bnode - which since I(sss) = sss is just
>ILEXT-d(sss), i.e. L2V(d)(sss). This is exactly what the subject bnode
>denotes in a datatype triple; in other words, we are using the datatype
>property name as a kind of explicit extension mapping on literal strings.
>On this view, then, what a datatype does is to fix the extension mapping
>for literals, considered as intensional names. The universal
>superproperty rdfs:Literal works the same way but refuses to supply a
>context, so letting the extension mapping be anything.
>
>
>--
>---------------------------------------------------------------------
>IHMC (850)434 8903 or (650)494 3973 home
>40 South Alcaniz St. (850)202 4416 office
>Pensacola (850)202 4440 fax
>FL 32501 (850)291 0667 cell
>phayes@ihmc.us http://www.ihmc.us/users/phayes
------------
Graham Klyne
GK@NineByNine.org
Received on Thursday, 18 September 2003 08:06:27 UTC