- From: pat hayes <phayes@ihmc.us>
- Date: Thu, 18 Sep 2003 12:37:02 -0500
- To: Graham Klyne <gk@ninebynine.org>
- Cc: w3c-rdfcore-wg@w3.org
>Continuing in the spirit of airing alternative designs, not proposing them...
>
>I think Pat's approach is elegant and quite effective, and is in
>substantial concurrence with earlier thoughts expressed by DanC [1]
>and myself [2]. The main difference that I see is the proposal to
>represent language tags in the graph rather than as part of a
>literal.
>
>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
>
>[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html
>
>I'm wondering if the suggestion to translate
>
>>aaa ppp "sss"@ttt .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "ttt" .
>
>might be problematic in its use of xsd:string, in that this would mean that:
>
> aaa ppp "sss"@ttt .
>entails
> aaa ppp "sss" .
>
>for which there is no corresponding entailment in the current design.
Ah, indeed I had not noticed that. I think this will happen with or
without xsd:string, actually: it has to do with the fact that the tag
is now a property, so can be omitted in the description, so a
description of a simple literal without a tag is indistinguishable
from an incomplete description of a simple literal with an unknown
tag. That is ugly and may be fatal.
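
The entailment problem can be checked mechanically. What follows is a minimal sketch, not anything from the specs: it assumes triples are plain Python tuples, that blank nodes are strings beginning with `_:`, and that simple entailment between such small graphs can be approximated by searching for a blank-node mapping that embeds one graph in the other. The names `translate` and `simply_entails` are hypothetical helpers invented for this illustration.

```python
from itertools import product

def translate(subject, prop, text, tag=None, bnode="_:x"):
    """Translate a plain literal (optionally language-tagged) into the bnode form."""
    triples = {(subject, prop, bnode), (bnode, "xsd:string", text)}
    if tag is not None:
        triples.add((bnode, "rdf:langTag", tag))
    return triples

def is_bnode(term):
    return term.startswith("_:")

def simply_entails(g1, g2):
    """True if some mapping of g2's blank nodes onto terms of g1 makes g2 a subgraph of g1."""
    bnodes = sorted({t for triple in g2 for t in triple if is_bnode(t)})
    terms = sorted({t for triple in g1 for t in triple})
    for values in product(terms, repeat=len(bnodes)):
        mapping = dict(zip(bnodes, values))
        image = {tuple(mapping.get(t, t) for t in triple) for triple in g2}
        if image <= g1:
            return True
    return False

tagged = translate("aaa", "ppp", "sss", tag="ttt")
untagged = translate("aaa", "ppp", "sss", bnode="_:y")
with_default = translate("aaa", "ppp", "sss", tag="i-default", bnode="_:y")
```

Under these assumptions `simply_entails(tagged, untagged)` holds, because the untagged description is just the tagged one with a triple left out, while `simply_entails(tagged, with_default)` does not, which is the blocking effect of a compulsory tag.
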
> Maybe a simple way to avoid this is to apply the "i-default" tag
>(per RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that
>
>aaa ppp "sss" .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "i-default" .
>
>Thus blocking the above entailment. Hmmm, i-default is not a good
>choice because it suggests a human readable language, but I think a
>variation on this could work.
Technically yes, but it makes the whole thing unworkable, I think. If the
tag assertion is compulsory then the tags will break conventional
datatyping and we would be better off with the current design.
Pat
>..
>
>I'm not sure that I fully concur with Pat's proposed handling of
>parseType=Literal, in that I don't see that, in terms of graph
>formation, there needs to be any different treatment from ordinary
>plain literals ... that is, parseType=Literal makes sense as a
>purely syntactic directive for processing of RDF/XML content to
>plain literal form. I don't think this is inconsistent with Pat's
>proposal, I just don't see why the parseType=Literal case needs to
>be drawn out specially in this way. One of the things I least like
>about the current design is the way that syntactic processing is not
>kept distinct from datatype semantics. Pat's proposal discusses
>treatment of rdf:XMLLiteral as a pure datatype, which seems sensible
>to me.
>
>Concerning:
>>_:x rdfs:Literal "10" .
>>
>>would say that _:x was some value which has "10" as a lexical form,
>>but we don't (yet) know which one. Or, we could not do this.
>
>Would this be a reasonable interpretation for rdf:value, consistent
>with existing usage?
>
>#g
>--
>
>At 20:16 17/09/03 -0500, pat hayes wrote:
>
>>Greetings.
>>
>>Y'all are going to just LOVE me for this, but thinking about the
>>i18n desirables for XML has led me to the observation that one of
>>our old and abandoned designs for handling datatypes would handle
>>this stuff quite smoothly. The key point is that terms denoting
>>datatype values are allowed in the subject position, so attributes
>>like language tags and lexical 'type' can be described as RDF
>>properties. We gave up on this on the grounds largely of
>>triple-bloat, a concern which now seems curiously irrelevant when
>>one contemplates what OWL will look like. Anyway, in the spirit of
>>Brian's comment,
>>
>>>I've tried to be careful not to describe it as a proposal. This is an
>>>alternative design. I'm not proposing it, just describing it.
>>
>>here's the design.
>>
>>Plain literals are just strings, and they denote themselves. There
>>are no typed literals. Datatypes are indicated by class/property
>>names. Datatype values are typically indicated by bnodes, so
>>instead of
>>
>>aaa ppp "sss"^^ddd .
>>
>>we write
>>
>>aaa ppp _:x .
>>_:x ddd "sss" .
>>
>>where the _:x denotes the datatype value. You could use URIs in
>>some cases, eg
>>
>>ex:PIto5places xsd:number "3.14159" .
>>
>>There is a general D-entailment
>>
>>aaa ddd "sss" .
>>|=
>>aaa rdf:type ddd .
>>
>>when sss is a legal lexical form for the datatype ddd; the version
>>of this for XML is an RDF entailment (though see later).
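
As an illustration of the D-entailment rule just stated, here is a minimal sketch. It assumes triples are Python tuples with the lexical form held as a bare string in the object position, and it uses toy regular expressions in place of the real XML Schema lexical spaces; `LEXICAL_SPACE` and `d_entailed_type_triples` are names made up for this example.

```python
import re

# Toy lexical spaces (an assumption for illustration); a real datatype map
# would take these from XML Schema Part 2.
LEXICAL_SPACE = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
}

def d_entailed_type_triples(graph):
    """Apply: aaa ddd "sss" |= aaa rdf:type ddd, when sss is a legal lexical form of ddd."""
    inferred = set()
    for subject, prop, obj in graph:
        pattern = LEXICAL_SPACE.get(prop)
        # Only datatype properties with a well-formed lexical form license the rule.
        if pattern is not None and pattern.fullmatch(obj):
            inferred.add((subject, "rdf:type", prop))
    return inferred

graph = {
    ("_:x", "xsd:integer", "10"),   # legal lexical form
    ("_:y", "xsd:integer", "ten"),  # not in the lexical space, so nothing follows
    ("_:z", "ex:age", "10"),        # not a datatype property
}
inferred = d_entailed_type_triples(graph)
```
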
>>
>>This design, unlike our present one, has subject terms denoting
>>datatype values, so lang tags can be considered to be *properties
>>of datatype values*, and the tags themselves can be encoded as
>>simple literals, so we just write an assertion:
>>
>>_:x rdf:langTag "en" .
>>
>>and our current design translates thus:
>>
>>aaa ppp "sss"@ttt .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "ttt" .
>>
>>Note that xsd:string is the appropriate datatype for simple
>>literals, providing a way to in effect put a simple literal string
>>in the subject position (encoded as a bnode). In fact, in this
>>design, xsd:string is in effect owl:sameAs applied to literals.
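
The two translations (typed literals and language-tagged literals) could be sketched as a purely syntactic rewrite. This is only an illustration under stated assumptions: object terms arrive as N-Triples-style strings, the regular expression is a simplification that ignores escapes, and `to_bnode_form` and the `_:bN` blank-node labels are hypothetical names, not part of any parser.

```python
import re
from itertools import count

# Simplified literal syntax: "lex", "lex"^^datatype or "lex"@tag (no escape handling).
LITERAL = re.compile(r'^"(?P<lex>.*)"(?:\^\^(?P<dt>\S+)|@(?P<tag>[A-Za-z0-9-]+))?$')

def to_bnode_form(subject, prop, obj, fresh):
    """Rewrite one triple of the current design into the bnode-based design."""
    m = LITERAL.match(obj)
    if m is None or (m["dt"] is None and m["tag"] is None):
        # URIs, bnodes and untagged plain literals are left unchanged.
        return [(subject, prop, obj)]
    node = "_:b%d" % next(fresh)
    lexical = '"%s"' % m["lex"]
    if m["dt"] is not None:
        # aaa ppp "sss"^^ddd .  -->  aaa ppp _:x .  _:x ddd "sss" .
        return [(subject, prop, node), (node, m["dt"], lexical)]
    # aaa ppp "sss"@ttt .  -->  aaa ppp _:x .  _:x xsd:string "sss" .  _:x rdf:langTag "ttt" .
    return [
        (subject, prop, node),
        (node, "xsd:string", lexical),
        (node, "rdf:langTag", '"%s"' % m["tag"]),
    ]

typed = to_bnode_form("aaa", "ppp", '"10"^^xsd:integer', count(1))
tagged = to_bnode_form("aaa", "ppp", '"sss"@ttt', count(1))
plain = to_bnode_form("aaa", "ppp", '"sss"', count(1))
```
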
>>
>>----
>>
>>This way of handling lang tags allows us to associate lang tags
>>with XML literals without putting the tag into the lexical space of
>>the literal, so allows XML literal to be a normal datatype, just as
>>it is right now (though read on) while also handling one of
>>Martin's requirements. The parsing of parseType="Literal" needs to
>>include the asserting of an appropriate rdf:langTag assertion in
>>the graph, according to the XML rules, but that seems
>>straightforward. This design also allows sub-XML datatypes to
>>automatically inherit language tagging, since they will be members
>>of subClasses of rdf:XMLLiteral and hence of rdf:XMLLiteral itself,
>>and hence the members of these classes will still have any
>>properties they had previously. Notice that the property is of the
>>literal *value*, rather than syntactically attached to the literal,
>>so rdf:langTag only makes intuitive sense for self-denoting
>>literals, or at any rate those which denote textual kinds of thing
>>rather than mathematical kinds of thing. However, there is no need
>>to have special rules to 'ignore' lang tags on non-textual
>>datatypes such as numbers: an assertion like
>>
>>_:x xsd:integer "25" .
>>_:x rdf:langTag "en" .
>>
>>is semantically vacuous but harmless, or can be considered harmless
>>as far as RDF is concerned. (A lang-tag-savvy app might complain
>>about things like this.) Also we don't need lang tags as a
>>syntactic attachment to plain literals; the same trick works for
>>plain literals.
>>
>>There isn't any general semantics for rdf:langTag, but for
>>particular cases it can be defined, eg we can define it for simple
>>literals - simple literal *values* can be pairs just as they are
>>right now, and so IEXT(I(rdf:langTag)) is all pairs of the form
>><<sss, tag>, tag> , and IEXT(I(xsd:string)) is all pairs <<sss,
>>tag>, sss> - and for XML literals.
>>
>>Here's the MT for the datatyping, re-done in a more up-to-date
>>style: D is a datatype map, as usual.
>>If <uri, ddd> is in D then:
>>I(uri)=ddd;
>>ddd is in ICEXT(I(rdf:Datatype));
>>for any string sss, sss is in the lexical space of ddd iff
>><L2V(ddd)(sss),sss> is in IEXT(ddd);
>>If sss is in the lexical space of ddd then
>>L2V(ddd)(sss) is in ICEXT(ddd)
>>
>>Note that being in the class is necessary but not sufficient for
>>the datatyping rule to apply; this avoids some of the snags we had
>>with this design previously involving subtypes. For example, we can
>>have
>>ex:octal rdfs:subClassOf xsd:integer .
>>_:x ex:octal "10" .
>>
>>and _:x unambiguously denotes eight; in fact
>>
>>_:x owl:sameAs _:y .
>>_:y xsd:integer "8" .
>>
>>The lexical typing only gets invoked by the datatype property; the
>>class membership has to do with the values. Alternative lexical
>>forms give no problem either:
>>
>>_:x xsd:integer "2" .
>>_:x xsd:integer "0002" .
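
The octal and alternative-lexical-form examples can be traced with a toy datatype map. This is a minimal sketch under assumptions: `DATATYPE_MAP`, `l2v` and `denotation` are hypothetical names, the lexical-to-value maps are built on Python's `int`, which accepts a looser lexical space than XML Schema does, and `ex:octal` is the example datatype from the text above.

```python
# Each datatype name is paired with its lexical-to-value map L2V(ddd).
DATATYPE_MAP = {
    "xsd:integer": lambda s: int(s, 10),
    "ex:octal": lambda s: int(s, 8),
}

def l2v(datatype, lexical_form):
    """L2V(ddd)(sss): the value that sss denotes under datatype ddd."""
    return DATATYPE_MAP[datatype](lexical_form)

def denotation(node, graph):
    """Values the node must denote according to its datatype triples."""
    return {l2v(p, o) for s, p, o in graph if s == node and p in DATATYPE_MAP}

octal_graph = {
    ("ex:octal", "rdfs:subClassOf", "xsd:integer"),
    ("_:x", "ex:octal", "10"),
    ("_:y", "xsd:integer", "8"),
}
alt_forms = {
    ("_:x", "xsd:integer", "2"),
    ("_:x", "xsd:integer", "0002"),
}
```

Here `denotation("_:x", octal_graph)` and `denotation("_:y", octal_graph)` are both `{8}`, and the two lexical forms in `alt_forms` collapse to the single value 2. The subclass triple plays no part in the lexical reading, which matches the point that only the datatype property invokes the lexical typing.
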
>>
>>BTW, we could now use rdfs:Literal as a generic superproperty of
>>all datatype properties, as well as a superclass of all datatype
>>values, so that
>>
>>_:x rdfs:Literal "10" .
>>
>>would say that _:x was some value which has "10" as a lexical form,
>>but we don't (yet) know which one. Or, we could not do this.
>>
>>-----
>>
>>This would be a major change and would probably affect several
>>implementations.
>>
>>In order to change our current design to this we would need to:
>>1. remove typed literals (or, treat them as abbreviations for
>>the two-triple form, maybe?)
>>2. remove lang tags from plain literals (or treat these as an
>>abbreviation, similarly)
>>3. introduce rdf:langTag (or whatever) and add prose discussing the
>>use of lang tags as properties
>>4. modify the datatype semantics, as above
>>5. redefine the XML parsing rules for parseType="Literal"
>>6. rewrite the Lbase translation appropriately
>>
>>I think this would mean changes to every document; it would be a
>>fairly horrendous editing task at this stage.
>>
>>On the other hand, it does have a certain elegance. There is only
>>one kind of literal, and literals are genuinely simple, both
>>syntactically and semantically, and always denote themselves in all
>>contexts (remember non-tidy graphs?); and it uses RDF as a
>>descriptive language rather than extending the syntax in an
>>XML-idiosyncratic way.
>>
>>We abandoned this design, as I recall, for three reasons. First, it
>>seemed too 'indirect' and like triple-bloat. However, in our
>>current design we have to specify the same information, and we can
>>infer the bnode:
>>
>>aaa ppp "10"^^xsd:integer .
>>|=
>>aaa ppp _:x .
>>
>>compare
>>
>>aaa ppp _:x .
>>_:x xsd:integer "10" .
>>
>>and in any case in this post-OWL era, triple-bloat seems to be
>>rampant. I note that it would be harmless to allow the current
>>typed-literal form as an abbreviation for the two-triple form, by
>>the way; or even as an alternative, with inference rules to convert
>>them back and forth. The feeling of being 'indirect' came, as I
>>recall, from a feeling that we *ought* to be able, dammit, to write
>>things like
>>ex:Jill ex:age "10"
>>rather than have to go through a bnode:
>>ex:Jill ex:age _:x .
>>_:x xsd:integer "10" .
>>This feeling now seems to me to have been overly naive, however,
>>with the benefit of hindsight.
>>
>>Second, it seemed unintuitive to some folk to have a property and a
>>class with the same name. I never had this trouble myself, and it
>>seems to me to be a good illustration of the usefulness of the
>>intensional semantics that RDF provides: if you've got it, flaunt
>>it. [*see PS] However, the design could be modified by allowing
>>systematic variants for the property or class names, eg using
>>xsd:integer for the property and xsd:Integer for the class. Or we
>>could do without the datatype classes altogether, since
>>
>>aaa rdf:type xsd:integer .
>> (read: aaa is an integer)
>>
>>and
>>
>>aaa xsd:integer _:x .
>>(read: aaa is something denoted by a numeral)
>>
>>convey the exact same information in {xsd:integer}-interpretations.
>>
>>Third, as I recall, there were some issues arising from the
>>long-range datatyping getting too complicated. OK, I'm not
>>suggesting re-opening that particular can of worms. (Though I would
>>note that when it does get re-opened in the future, I bet this
>>design will be a lot more tractable than our current design, which
>>will have to be simply shelved.)
>>
>>----
>>
>>The other i18n issue involved treating XML literals without markup
>>as being plain text. Assuming that 'plain text' means a character
>>string, I now think we can do that by a bit of semantic sleight of
>>hand as follows. First, observe that any piece of XML can be
>>encoded as a character string, but XML imposes extra equivalence
>>(identity) conditions, such as identifying "<br />" with
>>"<br></br>". So, consider the set of legal XML texts, considered as
>>Unicode strings, and define an equivalence relation on this set by
>>saying that strings with the same XML normal form are equivalent;
>>then say that any such string denotes its equivalence class, and
>>then in a familiar abuse of notation say that singleton classes are
>>identical to their members. Now, any piece of XML text without any
>>markup in it denotes itself, just as a plain literal does. (There
>>may be some whitespace issues which make "  " (two spaces)
>>equivalent to " " (one space); if so, this will need to be stated
>>more carefully, eg by applying the normalization only to stuff
>>inside <->.) If we say that this is the value space of
>>rdf:XMLLiteral, rather than the non-text 'structural' sets we have
>>at present, then Martin might be happier.
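
The equivalence-class idea can be tried out with a canonicalizer standing in for "XML normal form". This is a sketch under assumptions, not the definition any spec uses: it relies on `xml.etree.ElementTree.canonicalize` (Python 3.8 or later, C14N 2.0), wraps each fragment in a hypothetical `<w>` element so that markup-free text is acceptable input, and only handles fragments that are well-formed once wrapped. The names `xml_normal_form` and `xml_equivalent` are invented for the example.

```python
import xml.etree.ElementTree as ET

def xml_normal_form(fragment):
    """Canonical form of an XML fragment, computed inside a throwaway wrapper element."""
    wrapped = ET.canonicalize("<w>%s</w>" % fragment)
    # Drop the wrapper's start and end tags again.
    return wrapped[len("<w>"):-len("</w>")]

def xml_equivalent(a, b):
    """Two strings denote the same value if they share a normal form."""
    return xml_normal_form(a) == xml_normal_form(b)

same_element = xml_equivalent("<br />", "<br></br>")
plain = xml_normal_form("just some text")
```

With this particular canonicalizer, text without markup comes back unchanged, so it behaves like a self-denoting plain literal, and whitespace in character data is preserved rather than collapsed.
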
>>
>>On the other hand, this supports a number of hard-to-state RDF
>>entailments, such as intersubstituting "sss"^^xsd:string and
>>"sss"^^rdf:XMLLiteral under circumstances which can only be
>>recognized by an XML parser, which seems *very* ugly to include in
>>basic RDF, so I would argue that if we do something like this then
>>we treat rdf:XMLLiteral as a genuine datatype so that these
>>entailments are restricted to D-interpretations and are not valid
>>in simple RDF; and it also means that XML *with* markup denotes
>>something very like a character string; in particular,
>>"<"^^rdf:XMLLiteral
>>on this proposal, has got absolutely nothing in common with
>>"<"^^xsd:string. So maybe Martin might not be so happy after all.
>>
>>Anyway, thought I'd just mention it in passing.
>>
>>Pat
>>
>>PS. I thought of an interesting analogy. Literals are a kind of
>>name, and in a simple extensional logic they would have a fixed
>>denotation, eg numerals denote numbers, I("10")=10 (ie, ten) and so
>>on, end of story. But RDF is intensional, and datatypes treat
>>literals like intensional names. Seen in this way, the literal
>>always denotes itself, ie I(literal)=literal; but it has a variable
>>extension, *determined by the datatype context*. In other words,
>>the datatype lexical-to-value map is a kind of extension mapping,
>>like IEXT for properties and ICEXT for classes. Call it ILEXT-d
>>where d is the datatype; then the 'meaning' of a literal string sss
>>in a datatype context defined by d would be ILEXT-d(I(sss)) -
>>compare IEXT(I(p)) or ICEXT(I(a)) where p is a property uri and a
>>is a uri or bnode - which since I(sss) = sss is just ILEXT-d(sss),
>>i.e. L2V(d)(sss). This is exactly what the subject bnode denotes
>>in a datatype triple; in other words, we are using the datatype
>>property name as a kind of explicit extension mapping on literal
>>strings. On this view, then, what a datatype does is to fix the
>>extension mapping for literals, considered as intensional names.
>>The universal superproperty rdfs:Literal works the same way but
>>refuses to supply a context, so letting the extension mapping be
>>anything.
>>
>>
>>--
>>---------------------------------------------------------------------
>>IHMC (850)434 8903 or (650)494 3973 home
>>40 South Alcaniz St. (850)202 4416 office
>>Pensacola (850)202 4440 fax
>>FL 32501 (850)291 0667 cell
>>phayes@ihmc.us http://www.ihmc.us/users/phayes
>
>------------
>Graham Klyne
>GK@NineByNine.org
Received on Thursday, 18 September 2003 13:37:05 UTC