- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 19 Sep 2003 19:57:30 +0100
- To: pat hayes <phayes@ihmc.us>
- Cc: w3c-rdfcore-wg@w3.org
At 12:37 18/09/03 -0500, pat hayes wrote:
>Technically, but it makes the whole thing unworkable, I think. If the tag
>assertion is compulsory then the tags will break conventional datatyping
>and we would be better off with the current design.
OK, I was having doubts about having language as an additional
property. But I think other aspects of your design may stand up to
examination, even if the language tag reverts to being part of the abstract
syntax for a literal.
#g
--
At 12:37 18/09/03 -0500, pat hayes wrote:
>>Continuing in the spirit of airing alternative designs, not proposing them...
>>
>>I think Pat's approach is elegant and quite effective, and is in
>>substantial concurrence with earlier thoughts expressed by DanC [1] and
>>myself [2]. The main difference that I see is the proposal to represent
>>language tags in the graph rather than as part of a literal.
>>
>>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
>>
>>[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html
>>
>>I'm wondering if the suggestion to translate
>>
>>>aaa ppp "sss"@ttt .
>>>-->>
>>>aaa ppp _:x .
>>>_:x xsd:string "sss" .
>>>_:x rdf:langTag "ttt" .
>>
>>might be problematic in its use of xsd:string, in that this would mean that:
>>
>> aaa ppp "sss"@ttt .
>>entails
>> aaa ppp "sss" .
>>
>>for which there is no corresponding entailment in the current design.
>
>Ah, indeed I had not noticed that. I think this will happen with or
>without xsd:string, actually: it has to do with the fact that the tag is
>now a property, so can be omitted in the description, so a description of
>a simple literal without a tag is indistinguishable from an incomplete
>description of a simple literal with an unknown tag. That is ugly and may
>be fatal.
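
A minimal sketch of why the entailment arises, assuming graphs are modelled as Python sets of (subject, property, object) tuples and that both translations reuse the same bnode label. The `translate` helper and the tuple encoding are illustrative assumptions, not part of any RDF toolkit. Once the tag is an optional extra triple, the untagged translation is a subgraph of the tagged one, and a graph entails each of its subgraphs.

```python
def translate(subject, prop, text, lang=None, bnode="_:x"):
    """Translate a plain literal triple into the bnode form discussed above."""
    triples = {
        (subject, prop, bnode),        # aaa ppp _:x .
        (bnode, "xsd:string", text),   # _:x xsd:string "sss" .
    }
    if lang is not None:
        # The language tag is just one more (omittable) triple.
        triples.add((bnode, "rdf:langTag", lang))
    return triples


tagged = translate("aaa", "ppp", "sss", lang="ttt")
untagged = translate("aaa", "ppp", "sss")

# Subgraph check: every triple of the untagged form is in the tagged form.
untagged_is_subgraph = untagged <= tagged
print(untagged_is_subgraph)  # prints True
```
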
>
>> Maybe a simple way to avoid this is to apply the "i-default" tag (per
>> RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that
>>
>>aaa ppp "sss" .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "i-default" .
>>
>>Thus blocking the above entailment. Hmmm, i-default is not a good choice
>>because it suggests a human readable language, but I think a variation on
>>this could work.
>
>Technically, but it makes the whole thing unworkable, I think. If the tag
>assertion is compulsory then the tags will break conventional datatyping
>and we would be better off with the current design.
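
A rough sketch of the compulsory-tag variation, under the same assumptions as before (graphs as Python sets of tuples, a shared bnode label, and a hypothetical helper name). It uses "i-default" only because that is the tag named in the message. With a tag always asserted, the untagged translation is no longer a subgraph of the tagged one.

```python
DEFAULT_TAG = "i-default"  # placeholder tag taken from the message


def translate_with_compulsory_tag(subject, prop, text, lang=None, bnode="_:x"):
    """Like the optional-tag translation, but always asserts rdf:langTag."""
    tag = lang if lang is not None else DEFAULT_TAG
    return {
        (subject, prop, bnode),
        (bnode, "xsd:string", text),
        (bnode, "rdf:langTag", tag),
    }


tagged = translate_with_compulsory_tag("aaa", "ppp", "sss", lang="ttt")
untagged = translate_with_compulsory_tag("aaa", "ppp", "sss")

# The triple that the tagged graph lacks, so the subgraph argument fails.
missing = untagged - tagged
print(untagged <= tagged)  # prints False
```

The cost is the one raised in the reply: every literal description now has to carry a tag triple.
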
>
>Pat
>
>>..
>>
>>I'm not sure that I fully concur with Pat's proposed handling of
>>parseType=Literal, in that I don't see that, in terms of graph formation,
>>there needs to be any different treatment from ordinary plain literals
>>... that is, parseType=Literal makes sense as a purely syntactic
>>directive for processing of RDF/XML content to plain literal form. I
>>don't think this is inconsistent with Pat's proposal, I just don't see
>>why the parseType=Literal case needs to be drawn out specially in this
>>way. One of the things I least like about the current design is the way
>>that syntactic processing is not kept distinct from datatype
>>semantics. Pat's proposal discusses treatment of rdf:XMLLiteral as a pure
>>datatype, which seems sensible to me.
>>
>>Concerning:
>>>_:x rdfs:Literal "10" .
>>>
>>>would say that _:x was some value which has "10" as a lexical form, but
>>>we don't (yet) know which one. Or, we could not do this.
>>
>>Would this be a reasonable interpretation for rdf:value, consistent with
>>existing usage?
>>
>>#g
>>--
>>
>>At 20:16 17/09/03 -0500, pat hayes wrote:
>>
>>>Greetings.
>>>
>>>Y'all are going to just LOVE me for this, but thinking about the i18n
>>>desirables for XML has led me to the observation that one of our old
>>>and abandoned designs for handling datatypes would handle this stuff
>>>quite smoothly. The key point is that terms denoting datatype values are
>>>allowed in the subject position, so attributes like language tags and
>>>lexical 'type' can be described as RDF properties. We gave up on this on
>>>the grounds largely of triple-bloat, a concern which now seems curiously
>>>irrelevant when one contemplates what OWL will look like. Anyway, in
>>>the spirit of Brian's comment,
>>>
>>>>I've tried to be careful not to describe it as a proposal. This is an
>>>>alternative design. I'm not proposing it, just describing it.
>>>
>>>here's the design.
>>>
>>>Plain literals are just strings, and they denote themselves. There are
>>>no typed literals. Datatypes are indicated by class/property names.
>>>Datatype values are typically indicated by bnodes, so instead of
>>>
>>>aaa ppp "sss"^^ddd .
>>>
>>>we write
>>>
>>>aaa ppp _:x .
>>>_:x ddd "sss" .
>>>
>>>where the _:x denotes the datatype value. You could use URIs in some
>>>cases, eg
>>>
>>>ex:PIto5places xsd:decimal "3.14159" .
>>>
>>>There is a general D-entailment
>>>
>>>aaa ddd "sss" .
>>>|=
>>>aaa rdf:type ddd .
>>>
>>>when sss is a legal lexical form for the datatype ddd; the version of
>>>this for XML is an RDF entailment (though see later).
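
The D-entailment rule above can be sketched as follows. This is an illustration under stated assumptions: triples are Python tuples of strings, and the `LEXICAL_SPACE` table is a hypothetical stand-in for a datatype map covering only two datatypes.

```python
import re

# Hypothetical lexical-space checkers, keyed by datatype property name.
LEXICAL_SPACE = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:string": re.compile(r".*", re.DOTALL),
}


def d_entailed_types(graph):
    """Apply: aaa ddd "sss" |= aaa rdf:type ddd, when sss is legal for ddd."""
    inferred = set()
    for subject, prop, obj in graph:
        pattern = LEXICAL_SPACE.get(prop)
        # Only datatype properties with a legal lexical form trigger the rule.
        if pattern is not None and pattern.fullmatch(obj):
            inferred.add((subject, "rdf:type", prop))
    return inferred


graph = {
    ("ex:Jill", "ex:age", "_:x"),
    ("_:x", "xsd:integer", "10"),   # legal lexical form
    ("_:y", "xsd:integer", "ten"),  # not a legal lexical form, no inference
}
inferred = d_entailed_types(graph)
```
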
>>>
>>>This design, unlike our present one, has subject terms denoting datatype
>>>values, so lang tags can be considered to be *properties of datatype
>>>values*, and the tags themselves can be encoded as simple literals, so
>>>we just write an assertion:
>>>
>>>_:x rdf:langTag "en" .
>>>
>>>and our current design translates thus:
>>>
>>>aaa ppp "sss"@ttt .
>>>-->>
>>>aaa ppp _:x .
>>>_:x xsd:string "sss" .
>>>_:x rdf:langTag "ttt" .
>>>
>>>Note that xsd:string is the appropriate datatype for simple literals,
>>>providing a way to in effect put a simple literal string in the subject
>>>position (encoded as a bnode). In fact, in this design, xsd:string is in
>>>effect owl:sameAs applied to literals.
>>>
>>>----
>>>
>>>This way of handling lang tags allows us to associate lang tags with XML
>>>literals without putting the tag into the lexical space of the literal,
>>>so allows XML literal to be a normal datatype, just as it is right now
>>>(though read on) while also handling one of Martin's requirements. The
>>>parsing of parseType="Literal" needs to include the asserting of an
>>>appropriate rdf:langTag assertion in the graph, according to the XML
>>>rules, but that seems straightforward. This design also allows sub-XML
>>>datatypes to automatically inherit language tagging, since they will be
>>>members of subClasses of rdf:XMLLiteral and hence of rdf:XMLLiteral
>>>itself, and hence the members of these classes will still have any
>>>properties they had previously. Notice that the property is of the
>>>literal *value*, rather than syntactically attached to the literal, so
>>>rdf:langTag only makes intuitive sense for self-denoting literals, or at
>>>any rate those which denote textual kinds of thing rather than
>>>mathematical kinds of thing. However, there is no need to have special
>>>rules to 'ignore' lang tags on non-textual datatypes such as numbers: an
>>>assertion like
>>>
>>>_:x xsd:integer "25" .
>>>_:x rdf:langTag "en" .
>>>
>>>is semantically vacuous but harmless, or can be considered harmless as
>>>far as RDF is concerned. (A lang-tag-savvy app might complain about
>>>things like this.) Also we don't need lang tags as a syntactic
>>>attachment to plain literals; the same trick works for plain literals.
>>>
>>>There isn't any general semantics for rdf:langTag, but for particular
>>>cases it can be defined, eg we can define it for simple literals -
>>>simple literal *values* can be pairs just as they are right now, and so
>>>IEXT(I(rdf:langTag)) is all pairs of the form <<sss, tag>, tag> , and
>>>IEXT(I(xsd:string)) is all pairs <<sss, tag>, sss> - and for XML literals.
>>>
>>>Here's the MT for the datatyping, re-done in a more up-to-date style: D
>>>is a datatype map, as usual.
>>>If <uri, ddd> is in D then:
>>>I(uri)=ddd;
>>>ddd is in ICEXT(I(rdf:Datatype));
>>>for any string sss, sss is in the lexical space of ddd iff
>>><L2V(ddd)(sss),sss> is in IEXT(ddd);
>>>If sss is in the lexical space of ddd then
>>>L2V(ddd)(sss) is in ICEXT(ddd)
>>>
>>>Note that being in the class is necessary but not sufficient for the
>>>datatyping rule to apply; this avoids some of the snags we had with this
>>>design previously involving subtypes. For example, we can have
>>>ex:octal rdfs:subClassOf xsd:integer .
>>>_:x ex:octal "10" .
>>>
>>>and _:x unambiguously denotes eight; in fact
>>>
>>>_:x owl:sameAs _:y .
>>>_:y xsd:integer "8" .
>>>
>>>The lexical typing only gets invoked by the datatype property; the class
>>>membership has to do with the values. Alternative lexical forms give no
>>>problem either:
>>>
>>>_:x xsd:integer "2" .
>>>_:x xsd:integer "0002" .
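
The octal and alternative-lexical-form examples can be checked with a small sketch of lexical-to-value maps. The `L2V` table below is an assumption made for illustration, and Python's `int` is somewhat more lenient than the XSD lexical rules, so this is not a validator.

```python
# Hypothetical lexical-to-value maps, keyed by datatype property name.
L2V = {
    "xsd:integer": lambda lexical: int(lexical, 10),
    "ex:octal": lambda lexical: int(lexical, 8),
}


def denoted_value(datatype_property, lexical):
    """Value denoted by the subject of: _:x datatype_property "lexical" ."""
    return L2V[datatype_property](lexical)


# _:x ex:octal "10" . denotes the same value as _:y xsd:integer "8" .
octal_ten = denoted_value("ex:octal", "10")
decimal_eight = denoted_value("xsd:integer", "8")

# Alternative lexical forms of one datatype map to a single value.
two_a = denoted_value("xsd:integer", "2")
two_b = denoted_value("xsd:integer", "0002")
```

The datatype property selects the lexical-to-value map; class membership (ex:octal being a subclass of xsd:integer) only constrains the resulting value.
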
>>>
>>>BTW, we could now use rdfs:Literal as a generic superproperty of all
>>>datatype properties, as well as a superclass of all datatype values, so that
>>>
>>>_:x rdfs:Literal "10" .
>>>
>>>would say that _:x was some value which has "10" as a lexical form, but
>>>we don't (yet) know which one. Or, we could not do this.
>>>
>>>-----
>>>
>>>This would be a major change and would probably affect several
>>>implementations.
>>>
>>>In order to change our current design to this we would need to:
>>>1. remove typed literals (or, treat them as abbreviations for the
>>>two-triple form, maybe?)
>>>2. remove lang tags from plain literals (or treat these as an
>>>abbreviation, similarly)
>>>3. introduce rdf:langTag (or whatever) and add prose discussing the use
>>>of lang tags as properties
>>>4. modify the datatype semantics, as above
>>>5. redefine the XML parsing rules for parseType="Literal"
>>>6. rewrite the Lbase translation appropriately
>>>
>>>I think this would mean changes to every document; it would be a fairly
>>>horrendous editing task at this stage.
>>>
>>>On the other hand, it does have a certain elegance. There is only one
>>>kind of literal, and literals are genuinely simple, both syntactically
>>>and semantically, and always denote themselves in all contexts (remember
>>>non-tidy graphs?); and it uses RDF as a descriptive language rather than
>>>extending the syntax in an XML-idiosyncratic way.
>>>
>>>We abandoned this design, as I recall, for three reasons. First, it
>>>seemed too 'indirect' and like triple-bloat. However, in our current
>>>design we have to specify the same information, and we can infer the bnode:
>>>
>>>aaa ppp "10"^^xsd:integer .
>>>|=
>>>aaa ppp _:x .
>>>
>>>compare
>>>
>>>aaa ppp _:x .
>>>_:x xsd:integer "10" .
>>>
>>>and in any case in this post-OWL era, triple-bloat seems to be rampant. I
>>>note that it would be harmless to allow the current typed-literal form
>>>as an abbreviation for the two-triple form, by the way; or even as an
>>>alternative, with inference rules to convert them back and forth. The
>>>feeling of being 'indirect' came, as I recall, from a feeling that we
>>>*ought* to be able, dammit, to write things like
>>>ex:Jill ex:age "10"
>>>rather than have to go through a bnode:
>>>ex:Jill ex:age _:x .
>>>_:x xsd:integer "10" .
>>>This feeling now seems to me to have been overly naive, however, with
>>>the benefit of hindsight.
>>>
>>>Second, it seemed unintuitive to some folk to have a property and a
>>>class with the same name. I never had this trouble myself, and it seems
>>>to me to be a good illustration of the usefulness of the intensional
>>>semantics that RDF provides: if you've got it, flaunt it. [*see PS]
>>>However, the design could be modified by allowing systematic variants
>>>for the property or class names, eg using xsd:integer for the property
>>>and xsd:Integer for the class. Or we could do without the datatype
>>>classes altogether, since
>>>
>>>aaa rdf:type xsd:integer .
>>> (read: aaa is an integer)
>>>
>>>and
>>>
>>>aaa xsd:integer _:x .
>>>(read: aaa is something denoted by a numeral)
>>>
>>>convey the exact same information in {xsd:integer}-interpretations.
>>>
>>>Third, as I recall, there were some issues arising from the long-range
>>>datatyping getting too complicated. OK, I'm not suggesting re-opening
>>>that particular can of worms. (Though I would note that when it does get
>>>re-opened in the future, I bet this design will be a lot more tractable
>>>than our current design, which will have to be simply shelved.)
>>>
>>>----
>>>
>>>The other i18n issue involved treating XML literals without markup as
>>>being plain text. Assuming that 'plain text' means a character string,
>>>I now think we can do that by a bit of semantic sleight of hand as
>>>follows. First, observe that any piece of XML can be encoded as a
>>>character string, but XML imposes extra equivalence (identity)
>>>conditions, such as identifying "<br />" with "<br></br>". So, consider
>>>the set of legal XML texts, considered as Unicode strings, and define an
>>>equivalence relation on this set by saying that strings with the same
>>>XML normal form are equivalent; then say that any such string denotes
>>>its equivalence class, and then in a familiar abuse of notation say that
>>>singleton classes are identical to their members. Now, any piece of XML
>>>text without any markup in it denotes itself, just as a plain literal
>>>does. (There may be some whitespace issues which make "  " (two spaces)
>>>equivalent to " " (one space); if so, this will need to be stated more
>>>carefully, eg by applying the normalization only to stuff inside <->.)
>>>If we say that this is the value space of rdf:XMLLiteral, rather than
>>>the non-text 'structural' sets we have at present, then Martin might be
>>>happier.
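
A hedged sketch of the equivalence-class idea, assuming Python 3.8+ and using the standard library's C14N 2.0 canonicalizer as a stand-in for "XML normal form" (the message does not say which normal form is meant). The wrapper element and both function names are hypothetical, introduced only so that fragments and markup-free text can be parsed.

```python
from xml.etree.ElementTree import canonicalize


def xml_class_key(fragment):
    """Canonical form used as the representative of an equivalence class."""
    # Wrap the fragment so that bare text and multiple siblings still parse.
    return canonicalize("<wrapper>" + fragment + "</wrapper>")


def same_xml_value(a, b):
    """True when two legal XML texts fall in the same equivalence class."""
    return xml_class_key(a) == xml_class_key(b)


# The example from the message: two spellings of an empty element.
br_equivalent = same_xml_value("<br />", "<br></br>")

# Markup-free text is equivalent to itself and distinct from other text.
plain_self = same_xml_value("hello", "hello")
plain_distinct = same_xml_value("hello", "goodbye")
```
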
>>>
>>>On the other hand, this supports a number of hard-to-state RDF
>>>entailments, such as intersubstituting "sss"^^xsd:string and
>>>"sss"^^rdf:XMLLiteral under circumstances which can only be recognized
>>>by an XML parser, which seems *very* ugly to include in basic RDF, so I
>>>would argue that if we do something like this then we treat
>>>rdf:XMLLiteral as a genuine datatype so that these entailments are
>>>restricted to D-interpretations and are not valid in simple RDF; and it
>>>also means that XML *with* markup denotes something very like a
>>>character string; in particular,
>>>"<"^^rdf:XMLLiteral
>>>on this proposal, has got absolutely nothing in common with
>>>"<"^^xsd:string. So maybe Martin might not be so happy after all.
>>>
>>>Anyway, thought I'd just mention it in passing.
>>>
>>>Pat
>>>
>>>PS. I thought of an interesting analogy. Literals are a kind of name,
>>>and in a simple extensional logic they would have a fixed denotation, eg
>>>numerals denote numbers, I("10")=10 (ie, ten) and so on, end of
>>>story. But RDF is intensional, and datatypes treat literals like
>>>intensional names. Seen in this way, the literal always denotes itself,
>>>ie I(literal)=literal; but it has a variable extension, *determined by
>>>the datatype context*. In other words, the datatype lexical-to-value map
>>>is a kind of extension mapping, like IEXT for properties and ICEXT for
>>>classes. Call it ILEXT-d where d is the datatype; then the 'meaning' of
>>>a literal string sss in a datatype context defined by d would be
>>>ILEXT-d(I(sss)) - compare IEXT(I(p)) or ICEXT(I(a)) where p is a
>>>property uri and a is a uri or bnode - which since I(sss) = sss is just
>>>ILEXT-d(sss), i.e. L2V(d)(sss). This is exactly what the subject bnode
>>>denotes in a datatype triple; in other words, we are using the datatype
>>>property name as a kind of explicit extension mapping on literal
>>>strings. On this view, then, what a datatype does is to fix the
>>>extension mapping for literals, considered as intensional names. The
>>>universal superproperty rdfs:Literal works the same way but refuses to
>>>supply a context, so letting the extension mapping be anything.
>>>
>>>
>>>--
>>>---------------------------------------------------------------------
>>>IHMC (850)434 8903 or (650)494 3973 home
>>>40 South Alcaniz St. (850)202 4416 office
>>>Pensacola (850)202 4440 fax
>>>FL 32501 (850)291 0667 cell
>>>phayes@ihmc.us http://www.ihmc.us/users/phayes
>>
>>------------
>>Graham Klyne
>>GK@NineByNine.org
>
>
------------
Graham Klyne
GK@NineByNine.org
Received on Saturday, 20 September 2003 05:49:04 UTC