- From: pat hayes <phayes@ihmc.us>
- Date: Thu, 18 Sep 2003 12:37:02 -0500
- To: Graham Klyne <gk@ninebynine.org>
- Cc: w3c-rdfcore-wg@w3.org

>Continuing in the spirit of airing alternative designs, not proposing them...
>
>I think Pat's approach is elegant and quite effective, and is in
>substantial concurrence with earlier thoughts expressed by DanC [1]
>and myself [2]. The main difference that I see is the proposal to
>represent language tags in the graph rather than as part of a
>literal.
>
>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
>
>[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html
>
>I'm wondering if the suggestion to translate
>
>>aaa ppp "sss"@ttt .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "ttt" .
>
>might be problematic in its use of xsd:string, in that this would mean that:
>
>   aaa ppp "sss"@ttt .
>entails
>   aaa ppp "sss" .
>
>for which there is no corresponding entailment in the current design.

Ah, indeed I had not noticed that. I think this will happen with or
without xsd:string, actually: it has to do with the fact that the tag
is now a property, so can be omitted in the description, so a
description of a simple literal without a tag is indistinguishable
from an incomplete description of a simple literal with an unknown
tag. That is ugly and may be fatal.

>Maybe a simple way to avoid this is to apply the "i-default" tag
>(per RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that
>
>aaa ppp "sss" .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "i-default" .
>
>Thus blocking the above entailment. Hmmm, i-default is not a good
>choice because it suggests a human readable language, but I think a
>variation on this could work.

Technically, but it makes the whole thing unworkable, I think. If the
tag assertion is compulsory then the tags will break conventional
datatyping and we would be better off with the current design.

Pat

>..
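
The entailment at issue can be checked mechanically. What follows is only a minimal sketch in Python, assuming a toy encoding that is not any real RDF library: a graph is a set of tuples, the blank node keeps the fixed label _:x, and the helper names lit and translate are hypothetical. Because the graphs share the same blank node label, a plain subset test stands in for simple entailment here.

```python
# Toy encoding, for illustration only (not an RDF library API):
# a graph is a set of (subject, property, object) tuples,
# a plain literal is ("lit", text), and "_:x" is a blank node label.

def lit(text):
    """Wrap a string so literals are distinguishable from names."""
    return ("lit", text)

def translate(subj, prop, text, tag=None, bnode="_:x"):
    """Rewrite  subj prop "text"@tag .  into the bnode form.

    With tag=None the literal is untagged and no rdf:langTag
    triple is emitted.
    """
    graph = {
        (subj, prop, bnode),
        (bnode, "xsd:string", lit(text)),
    }
    if tag is not None:
        graph.add((bnode, "rdf:langTag", lit(tag)))
    return graph

tagged = translate("aaa", "ppp", "sss", tag="ttt")
untagged = translate("aaa", "ppp", "sss")
defaulted = translate("aaa", "ppp", "sss", tag="i-default")

# The untagged graph is a subgraph of the tagged one, so it is entailed.
print(untagged <= tagged)   # True
# With a compulsory default tag the subgraph relation no longer holds.
print(defaulted <= tagged)  # False
```

The first result is the unwanted entailment; the second shows how a compulsory default tag would block it, at the cost to conventional datatyping noted above.
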
>
>I'm not sure that I fully concur with Pat's proposed handling of
>parseType=Literal, in that I don't see that, in terms of graph
>formation, there needs to be any different treatment from ordinary
>plain literals ... that is, parseType=Literal makes sense as a
>purely syntactic directive for processing of RDF/XML content to
>plain literal form. I don't think this is inconsistent with Pat's
>proposal, I just don't see why the parseType=Literal case needs to
>be drawn out specially in this way. One of the things I least like
>about the current design is the way that syntactic processing is not
>kept distinct from datatype semantics. Pat's proposal discusses
>treatment of rdf:XMLLiteral as a pure datatype, which seems sensible
>to me.
>
>Concerning:
>>_:x rdfs:Literal "10" .
>>
>>would say that _:x was some value which has "10" as a lexical form,
>>but we don't (yet) know which one. Or, we could not do this.
>
>Would this be a reasonable interpretation for rdf:value, consistent
>with existing usage?
>
>#g
>--
>
>At 20:16 17/09/03 -0500, pat hayes wrote:
>
>>Greetings.
>>
>>Y'all are going to just LOVE me for this, but thinking about the
>>i18n desirables for XML has led me to the observation that one of
>>our old and abandoned designs for handling datatypes would handle
>>this stuff quite smoothly. The key point is that terms denoting
>>datatype values are allowed in the subject position, so attributes
>>like language tags and lexical 'type' can be described as RDF
>>properties. We gave up on this on the grounds largely of
>>triple-bloat, a concern which now seems curiously irrelevant when
>>one contemplates what OWL will look like. Anyway, in the spirit of
>>Brian's comment,
>>
>>>I've tried to be careful not to describe it as a proposal. This is an
>>>alternative design. I'm not proposing it, just describing it.
>>
>>here's the design.
>>
>>Plain literals are just strings, and they denote themselves. There
>>are no typed literals.
>>Datatypes are indicated by class/property
>>names. Datatype values are typically indicated by bnodes, so
>>instead of
>>
>>aaa ppp "sss"^^ddd .
>>
>>we write
>>
>>aaa ppp _:x .
>>_:x ddd "sss" .
>>
>>where the _:x denotes the datatype value. You could use URIs in
>>some cases, eg
>>
>>ex:PIto5places xsd:number "3.14159" .
>>
>>There is a general D-entailment
>>
>>aaa ddd "sss" .
>>|=
>>aaa rdf:type ddd .
>>
>>when sss is a legal lexical form for the datatype ddd; the version
>>of this for XML is an RDF entailment (though see later).
>>
>>This design, unlike our present one, has subject terms denoting
>>datatype values, so lang tags can be considered to be *properties
>>of datatype values*, and the tags themselves can be encoded as
>>simple literals, so we just write an assertion:
>>
>>_:x rdf:langTag "en" .
>>
>>and our current design translates thus:
>>
>>aaa ppp "sss"@ttt .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "ttt" .
>>
>>Note that xsd:string is the appropriate datatype for simple
>>literals, providing a way to in effect put a simple literal string
>>in the subject position (encoded as a bnode). In fact, in this
>>design, xsd:string is in effect owl:sameAs applied to literals.
>>
>>----
>>
>>This way of handling lang tags allows us to associate lang tags
>>with XML literals without putting the tag into the lexical space of
>>the literal, so allows XML literal to be a normal datatype, just as
>>it is right now (though read on) while also handling one of
>>Martin's requirements. The parsing of parseType="Literal" needs to
>>include the asserting of an appropriate rdf:langTag assertion in
>>the graph, according to the XML rules, but that seems
>>straightforward. This design also allows sub-XML datatypes to
>>automatically inherit language tagging, since they will be members
>>of subClasses of rdf:XMLLiteral and hence of rdf:XMLLiteral itself,
>>and hence the members of these classes will still have any
>>properties they had previously.
>>Notice that the property is of the
>>literal *value*, rather than syntactically attached to the literal,
>>so rdf:langTag only makes intuitive sense for self-denoting
>>literals, or at any rate those which denote textual kinds of thing
>>rather than mathematical kinds of thing. However, there is no need
>>to have special rules to 'ignore' lang tags on non-textual
>>datatypes such as numbers: an assertion like
>>
>>_:x xsd:integer "25" .
>>_:x rdf:langTag "en" .
>>
>>is semantically vacuous but harmless, or can be considered harmless
>>as far as RDF is concerned. (A lang-tag-savvy app might complain
>>about things like this.) Also we don't need lang tags as a
>>syntactic attachment to plain literals; the same trick works for
>>plain literals.
>>
>>There isn't any general semantics for rdf:langTag, but for
>>particular cases it can be defined, eg we can define it for simple
>>literals - simple literal *values* can be pairs just as they are
>>right now, and so IEXT(I(rdf:langTag)) is all pairs of the form
>><<sss, tag>, tag>, and IEXT(I(xsd:string)) is all pairs <<sss,
>>tag>, sss> - and for XML literals.
>>
>>Here's the MT for the datatyping, re-done in a more up-to-date
>>style: D is a datatype map, as usual.
>>If <uri, ddd> is in D then:
>>I(uri)=ddd;
>>ddd is in ICEXT(I(rdf:Datatype));
>>for any string sss, sss is in the lexical space of ddd iff
>><L2V(ddd)(sss),sss> is in IEXT(ddd);
>>If sss is in the lexical space of ddd then
>>L2V(ddd)(sss) is in ICEXT(ddd)
>>
>>Note that being in the class is necessary but not sufficient for
>>the datatyping rule to apply; this avoids some of the snags we had
>>with this design previously involving subtypes. For example, we can
>>have
>>ex:octal rdfs:subClassOf xsd:integer .
>>_:x ex:octal "10" .
>>
>>and _:x unambiguously denotes eight; in fact
>>
>>_:x owl:sameAs _:y .
>>_:y xsd:integer "8" .
>>
>>The lexical typing only gets invoked by the datatype property; the
>>class membership has to do with the values.
>>Alternative lexical
>>forms give no problem either:
>>
>>_:x xsd:integer "2" .
>>_:x xsd:integer "0002" .
>>
>>BTW, we could now use rdfs:Literal as a generic superproperty of
>>all datatype properties, as well as a superclass of all datatype
>>values, so that
>>
>>_:x rdfs:Literal "10" .
>>
>>would say that _:x was some value which has "10" as a lexical form,
>>but we don't (yet) know which one. Or, we could not do this.
>>
>>-----
>>
>>This would be a major change and would probably affect several
>>implementations.
>>
>>In order to change our current design to this we would need to:
>>1. remove typed literals (or, treat them as an abbreviation for
>>the two-triple form, maybe?)
>>2. remove lang tags from plain literals (or treat these as an
>>abbreviation, similarly)
>>3. introduce rdf:langTag (or whatever) and add prose discussing the
>>use of lang tags as properties
>>4. modify the datatype semantics, as above
>>5. redefine the XML parsing rules for parseType="Literal"
>>6. rewrite the Lbase translation appropriately
>>
>>I think this would mean changes to every document; it would be a
>>fairly horrendous editing task at this stage.
>>
>>On the other hand, it does have a certain elegance. There is only
>>one kind of literal, and literals are genuinely simple, both
>>syntactically and semantically, and always denote themselves in all
>>contexts (remember non-tidy graphs?); and it uses RDF as a
>>descriptive language rather than extending the syntax in an
>>XML-idiosyncratic way.
>>
>>We abandoned this design, as I recall, for three reasons. First, it
>>seemed too 'indirect' and like triple-bloat. However, in our
>>current design we have to specify the same information, and we can
>>infer the bnode:
>>
>>aaa ppp "10"^^xsd:integer .
>>|=
>>aaa ppp _:x .
>>
>>compare
>>
>>aaa ppp _:x .
>>_:x xsd:integer "10" .
>>
>>and in any case in this post-OWL era, triple-bloat seems to be
>>rampant.
>>I note that it would be harmless to allow the current
>>typed-literal form as an abbreviation for the two-triple form, by
>>the way; or even as an alternative, with inference rules to convert
>>them back and forth. The feeling of being 'indirect' came, as I
>>recall, from a feeling that we *ought* to be able, dammit, to write
>>things like
>>ex:Jill ex:age "10"
>>rather than have to go through a bnode:
>>ex:Jill ex:age _:x .
>>_:x xsd:integer "10" .
>>This feeling now seems to me to have been overly naive, however,
>>with the benefit of hindsight.
>>
>>Second, it seemed unintuitive to some folk to have a property and a
>>class with the same name. I never had this trouble myself, and it
>>seems to me to be a good illustration of the usefulness of the
>>intensional semantics that RDF provides: if you've got it, flaunt
>>it. [*see PS] However, the design could be modified by allowing
>>systematic variants for the property or class names, eg using
>>xsd:integer for the property and xsd:Integer for the class. Or we
>>could do without the datatype classes altogether, since
>>
>>aaa rdf:type xsd:integer .
>> (read: aaa is an integer)
>>
>>and
>>
>>aaa xsd:integer _:x .
>>(read: aaa is something denoted by a numeral)
>>
>>convey the exact same information in {xsd:integer}-interpretations.
>>
>>Third, as I recall, there were some issues arising from the
>>long-range datatyping getting too complicated. OK, I'm not
>>suggesting re-opening that particular can of worms. (Though I would
>>note that when it does get re-opened in the future, I bet this
>>design will be a lot more tractable than our current design, which
>>will have to be simply shelved.)
>>
>>----
>>
>>The other i18n issue involved treating XML literals without markup
>>as being plain text. Assuming that 'plain text' means a character
>>string, I now think we can do that by a bit of semantic sleight of
>>hand as follows.
>>First, observe that any piece of XML can be
>>encoded as a character string, but XML imposes extra equivalence
>>(identity) conditions, such as identifying "<br />" with
>>"<br></br>". So, consider the set of legal XML texts, considered as
>>Unicode strings, and define an equivalence relation on this set by
>>saying that strings with the same XML normal form are equivalent;
>>then say that any such string denotes its equivalence class, and
>>then in a familiar abuse of notation say that singleton classes are
>>identical to their members. Now, any piece of XML text without any
>>markup in it denotes itself, just as a plain literal does. (There
>>may be some whitespace issues which make "  " (two spaces)
>>equivalent to " " (one space); if so, this will need to be stated
>>more carefully, eg by applying the normalization only to stuff
>>inside <->.) If we say that this is the value space of
>>rdf:XMLLiteral, rather than the non-text 'structural' sets we have
>>at present, then Martin might be happier.
>>
>>On the other hand, this supports a number of hard-to-state RDF
>>entailments, such as intersubstituting "sss"^^xsd:string and
>>"sss"^^rdf:XMLLiteral under circumstances which can only be
>>recognized by an XML parser, which seems *very* ugly to include in
>>basic RDF, so I would argue that if we do something like this then
>>we treat rdf:XMLLiteral as a genuine datatype so that these
>>entailments are restricted to D-interpretations and are not valid
>>in simple RDF; and it also means that XML *with* markup denotes
>>something very like a character string; in particular,
>>"<"^^rdf:XMLLiteral
>>on this proposal, has got absolutely nothing in common with
>>"<"^^xsd:string. So maybe Martin might not be so happy after all.
>>
>>Anyway, thought I'd just mention it in passing.
>>
>>Pat
>>
>>PS. I thought of an interesting analogy.
>>Literals are a kind of
>>name, and in a simple extensional logic they would have a fixed
>>denotation, eg numerals denote numbers, I("10")=10 (ie, ten) and so
>>on, end of story. But RDF is intensional, and datatypes treat
>>literals like intensional names. Seen in this way, the literal
>>always denotes itself, ie I(literal)=literal; but it has a variable
>>extension, *determined by the datatype context*. In other words,
>>the datatype lexical-to-value map is a kind of extension mapping,
>>like IEXT for properties and ICEXT for classes. Call it ILEXT-d
>>where d is the datatype; then the 'meaning' of a literal string sss
>>in a datatype context defined by d would be ILEXT-d(I(sss)) -
>>compare IEXT(I(p)) or ICEXT(I(a)) where p is a property uri and a
>>is a uri or bnode - which since I(sss) = sss is just ILEXT-d(sss),
>>i.e. L2V(d)(sss). This is exactly what the subject bnode denotes
>>in a datatype triple; in other words, we are using the datatype
>>property name as a kind of explicit extension mapping on literal
>>strings. On this view, then, what a datatype does is to fix the
>>extension mapping for literals, considered as intensional names.
>>The universal superproperty rdfs:Literal works the same way but
>>refuses to supply a context, so letting the extension mapping be
>>anything.
>>
>>
>>--
>>---------------------------------------------------------------------
>>IHMC (850)434 8903 or (650)494 3973 home
>>40 South Alcaniz St. (850)202 4416 office
>>Pensacola (850)202 4440 fax
>>FL 32501 (850)291 0667 cell
>>phayes@ihmc.us http://www.ihmc.us/users/phayes
>
>------------
>Graham Klyne
>GK@NineByNine.org

--
---------------------------------------------------------------------
IHMC (850)434 8903 or (650)494 3973 home
40 South Alcaniz St. (850)202 4416 office
Pensacola (850)202 4440 fax
FL 32501 (850)291 0667 cell
phayes@ihmc.us http://www.ihmc.us/users/phayes
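
The reading of a datatype property as an extension mapping on literal strings, ILEXT-d(sss) = L2V(d)(sss), can be sketched in Python. This is only an illustration under stated assumptions: the datatype map D, the helpers make_l2v, ilext and in_lexical_space, and the lexical space chosen for ex:octal (strings of digits 0 to 7) are all hypothetical, and the xsd:integer pattern is a simplification.

```python
import re

def make_l2v(pattern, base):
    """Build a lexical-to-value map (L2V) for an integer-valued datatype."""
    def l2v(sss):
        # Strings that do not match the pattern are outside the lexical space.
        if re.fullmatch(pattern, sss) is None:
            raise ValueError(f"{sss!r} is outside the lexical space")
        return int(sss, base)
    return l2v

# Hypothetical datatype map D: datatype name -> L2V function.
D = {
    "xsd:integer": make_l2v(r"[+-]?[0-9]+", 10),
    "ex:octal": make_l2v(r"[0-7]+", 8),  # assumed lexical space
    "xsd:string": lambda sss: sss,       # every string denotes itself
}

def ilext(d, sss):
    """ILEXT-d(sss) = L2V(d)(sss): the value of literal sss in context d."""
    return D[d](sss)

def in_lexical_space(d, sss):
    """True when sss is a legal lexical form for datatype d."""
    try:
        ilext(d, sss)
    except ValueError:
        return False
    return True

# _:x ex:octal "10" .  and  _:y xsd:integer "8" .  pick out the same value.
print(ilext("ex:octal", "10") == ilext("xsd:integer", "8"))       # True
# Alternative lexical forms map to one value.
print(ilext("xsd:integer", "2") == ilext("xsd:integer", "0002"))  # True
# In the xsd:string context a literal denotes itself.
print(ilext("xsd:string", "10"))                                  # 10
print(in_lexical_space("ex:octal", "8"))                          # False
```

The same string "10" takes a different value in each datatype context, which is the sense in which the datatype fixes the extension mapping.
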
Received on Thursday, 18 September 2003 13:37:05 UTC