- From: Graham Klyne <gk@ninebynine.org>
- Date: Thu, 18 Sep 2003 12:21:34 +0100
- To: pat hayes <phayes@ihmc.us>, w3c-rdfcore-wg@w3.org
Continuing in the spirit of airing alternative designs, not proposing them...

I think Pat's approach is elegant and quite effective, and is in substantial concurrence with earlier thoughts expressed by DanC [1] and myself [2]. The main difference that I see is the proposal to represent language tags in the graph rather than as part of a literal.

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html

I'm wondering if the suggestion to translate

>aaa ppp "sss"@ttt .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "ttt" .

might be problematic in its use of xsd:string, in that this would mean that:

aaa ppp "sss"@ttt .

entails

aaa ppp "sss" .

for which there is no corresponding entailment in the current design.

Maybe a simple way to avoid this is to apply the "i-default" tag (per RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that

aaa ppp "sss" .
-->>
aaa ppp _:x .
_:x xsd:string "sss" .
_:x rdf:langTag "i-default" .

thus blocking the above entailment. Hmmm, i-default is not a good choice because it suggests a human readable language, but I think a variation on this could work.

..

I'm not sure that I fully concur with Pat's proposed handling of parseType=Literal, in that I don't see that, in terms of graph formation, there needs to be any different treatment from ordinary plain literals ... that is, parseType=Literal makes sense as a purely syntactic directive for processing of RDF/XML content to plain literal form. I don't think this is inconsistent with Pat's proposal, I just don't see why the parseType=Literal case needs to be drawn out specially in this way.

One of the things I least like about the current design is the way that syntactic processing is not kept distinct from datatype semantics. Pat's proposal discusses treatment of rdf:XMLLiteral as a pure datatype, which seems sensible to me.

Concerning:

>_:x rdfs:Literal "10" .
>
>would say that _:x was some value which has "10" as a lexical form, but we
>don't (yet) know which one. Or, we could not do this.

Would this be a reasonable interpretation for rdf:value, consistent with existing usage?

#g
--

At 20:16 17/09/03 -0500, pat hayes wrote:
>Greetings.
>
>Y'all are going to just LOVE me for this, but thinking about the i18n
>desirables for XML has led me to the observation that one of our old and
>abandoned designs for handling datatypes would handle this stuff quite
>smoothly. The key point is that terms denoting datatype values are allowed
>in the subject position, so attributes like language tags and lexical
>'type' can be described as RDF properties. We gave up on this on the
>grounds largely of triple-bloat, a concern which now seems curiously
>irrelevant when one contemplates what OWL will look like. Anyway, in the
>spirit of Brian's comment,
>
>>I've tried to be careful not to describe it as a proposal. This is an
>>alternative design. I'm not proposing it, just describing it.
>
>here's the design.
>
>Plain literals are just strings, and they denote themselves. There are no
>typed literals. Datatypes are indicated by class/property names. Datatype
>values are typically indicated by bnodes, so instead of
>
>aaa ppp "sss"^^ddd .
>
>we write
>
>aaa ppp _:x .
>_:x ddd "sss" .
>
>where the _:x denotes the datatype value. You could use URIs in some
>cases, eg
>
>ex:PIto5places xsd:number "3.14159" .
>
>There is a general D-entailment
>
>aaa ddd "sss" .
>|=
>aaa rdf:type ddd .
>
>when sss is a legal lexical form for the datatype ddd; the version of this
>for XML is an RDF entailment (though see later).
>
>This design, unlike our present one, has subject terms denoting datatype
>values, so lang tags can be considered to be *properties of datatype
>values*, and the tags themselves can be encoded as simple literals, so we
>just write an assertion:
>
>_:x rdf:langTag "en" .
>
>and our current design translates thus:
>
>aaa ppp "sss"@ttt .
>-->>
>aaa ppp _:x .
>_:x xsd:string "sss" .
>_:x rdf:langTag "ttt" .
>
>Note that xsd:string is the appropriate datatype for simple literals,
>providing a way to in effect put a simple literal string in the subject
>position (encoded as a bnode). In fact, in this design, xsd:string is in
>effect owl:sameAs applied to literals.
>
>----
>
>This way of handling lang tags allows us to associate lang tags with XML
>literals without putting the tag into the lexical space of the literal, so
>allows XML literal to be a normal datatype, just as it is right now
>(though read on) while also handling one of Martin's requirements. The
>parsing of parseType="Literal" needs to include the asserting of an
>appropriate rdf:langTag assertion in the graph, according to the XML
>rules, but that seems straightforward. This design also allows sub-XML
>datatypes to automatically inherit language tagging, since they will be
>members of subClasses of rdf:XMLLiteral and hence of rdf:XMLLiteral
>itself, and hence the members of these classes will still have any
>properties they had previously. Notice that the property is of the literal
>*value*, rather than syntactically attached to the literal, so rdf:langTag
>only makes intuitive sense for self-denoting literals, or at any rate
>those which denote textual kinds of thing rather than mathematical kinds
>of thing. However, there is no need to have special rules to 'ignore' lang
>tags on non-textual datatypes such as numbers: an assertion like
>
>_:x xsd:integer "25" .
>_:x rdf:langTag "en" .
>
>is semantically vacuous but harmless, or can be considered harmless as far
>as RDF is concerned. (A lang-tag-savvy app might complain about things
>like this.) Also we don't need lang tags as a syntactic attachment to
>plain literals; the same trick works for plain literals.
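
(Illustrative aside, not part of either message.) The translation described in the quoted text above can be sketched in Python. This is a minimal sketch under stated assumptions: the `Literal` class, the `expand` function, the tuple representation of triples, and the `_:x1`, `_:x2` bnode naming are all hypothetical choices made for this example, not anything defined by the working group.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Lexical form plus the optional datatype / language tag of the current design."""
    lexical: str
    datatype: Optional[str] = None
    lang: Optional[str] = None


def expand(triples):
    """Rewrite typed or language-tagged literals into the bnode form."""
    ids = count(1)
    out = []
    for s, p, o in triples:
        if isinstance(o, Literal) and (o.datatype or o.lang):
            node = f"_:x{next(ids)}"  # fresh bnode standing for the datatype value
            out.append((s, p, node))
            # xsd:string is the datatype property used for simple literals
            out.append((node, o.datatype or "xsd:string", Literal(o.lexical)))
            if o.lang:
                # the tag itself is encoded as a simple literal
                out.append((node, "rdf:langTag", Literal(o.lang)))
        else:
            out.append((s, p, o))  # untagged plain literals and URIs pass through
    return out


for triple in expand([("aaa", "ppp", Literal("sss", lang="ttt"))]):
    print(triple)
```

In this sketch a literal carrying both a datatype and a language tag (the XML literal case) yields the datatype triple plus an rdf:langTag triple on the same bnode, while an untagged plain literal is left alone because it denotes itself.
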
>
>There isn't any general semantics for rdf:langTag, but for particular
>cases it can be defined, eg we can define it for simple literals - simple
>literal *values* can be pairs just as they are right now, and so
>IEXT(I(rdf:langTag)) is all pairs of the form <<sss, tag>, tag> , and
>IEXT(I(xsd:string)) is all pairs <<sss, tag>, sss> - and for XML literals.
>
>Here's the MT for the datatyping, re-done in a more up-to-date style: D is
>a datatype map, as usual.
>If <uri, ddd> is in D then:
>I(uri)=ddd;
>ddd is in ICEXT(I(rdf:Datatype));
>for any string sss, sss is in the lexical space of ddd iff
><L2V(ddd)(sss),sss> is in IEXT(ddd);
>If sss is in the lexical space of ddd then
>L2V(ddd)(sss) is in ICEXT(ddd)
>
>Note that being in the class is necessary but not sufficient for the
>datatyping rule to apply; this avoids some of the snags we had with this
>design previously involving subtypes. For example, we can have
>ex:octal rdfs:subClassOf xsd:integer .
>_:x ex:octal "10" .
>
>and _:x unambiguously denotes eight; in fact
>
>_:x owl:sameAs _:y .
>_:y xsd:integer "8" .
>
>The lexical typing only gets invoked by the datatype property; the class
>membership has to do with the values. Alternative lexical forms give no
>problem either:
>
>_:x xsd:integer "2" .
>_:x xsd:integer "0002" .
>
>BTW, we could now use rdfs:Literal as a generic superproperty of all
>datatype properties, as well as a superclass of all datatype values, so that
>
>_:x rdfs:Literal "10" .
>
>would say that _:x was some value which has "10" as a lexical form, but we
>don't (yet) know which one. Or, we could not do this.
>
>-----
>
>This would be a major change and would probably affect several
>implementations.
>
>In order to change our current design to this we would need to:
>1. remove typed literals (or, treat them as abbreviations for the
>two-triple form, maybe?)
>2. remove lang tags from plain literals (or treat these as an
>abbreviation, similarly)
>3. introduce rdf:langTag (or whatever) and add prose discussing the use of
>lang tags as properties
>4. modify the datatype semantics, as above
>5. redefine the XML parsing rules for parseType="Literal"
>6. rewrite the Lbase translation appropriately
>
>I think this would mean changes to every document; it would be a fairly
>horrendous editing task at this stage.
>
>On the other hand, it does have a certain elegance. There is only one kind
>of literal, and literals are genuinely simple, both syntactically and
>semantically, and always denote themselves in all contexts (remember
>non-tidy graphs?); and it uses RDF as a descriptive language rather than
>extending the syntax in an XML-idiosyncratic way.
>
>We abandoned this design, as I recall, for three reasons. First, it seemed
>too 'indirect' and like triple-bloat. However, in our current design we
>have to specify the same information, and we can infer the bnode:
>
>aaa ppp "10"^^xsd:integer .
>|=
>aaa ppp _:x .
>
>compare
>
>aaa ppp _:x .
>_:x xsd:integer "10" .
>
>and in any case in this post-OWL era, triple-bloat seems to be rampant. I
>note that it would be harmless to allow the current typed-literal form as
>an abbreviation for the two-triple form, by the way; or even as an
>alternative, with inference rules to convert them back and forth. The
>feeling of being 'indirect' came, as I recall, from a feeling that we
>*ought* to be able, dammit, to write things like
>ex:Jill ex:age "10"
>rather than have to go through a bnode:
>ex:Jill ex:age _:x .
>_:x xsd:integer "10" .
>This feeling now seems to me to have been overly naive, however, with the
>benefit of hindsight.
>
>Second, it seemed unintuitive to some folk to have a property and a class
>with the same name. I never had this trouble myself, and it seems to me to
>be a good illustration of the usefulness of the intensional semantics that
>RDF provides: if you've got it, flaunt it.
>[*see PS] However, the design
>could be modified by allowing systematic variants for the property or
>class names, eg using xsd:integer for the property and xsd:Integer for the
>class. Or we could do without the datatype classes altogether, since
>
>aaa rdf:type xsd:integer .
>(read: aaa is an integer)
>
>and
>
>aaa xsd:integer _:x .
>(read: aaa is something denoted by a numeral)
>
>convey the exact same information in {xsd:integer}-interpretations.
>
>Third, as I recall, there were some issues arising from the long-range
>datatyping getting too complicated. OK, I'm not suggesting re-opening that
>particular can of worms. (Though I would note that when it does get
>re-opened in the future, I bet this design will be a lot more tractable
>than our current design, which will have to be simply shelved.)
>
>----
>
>The other i18n issue involved treating XML literals without markup as
>being plain text. Assuming that 'plain text' means a character string, I
>now think we can do that by a bit of semantic sleight of hand as follows.
>First, observe that any piece of XML can be encoded as a character string,
>but XML imposes extra equivalence (identity) conditions, such as
>identifying "<br />" with "<br></br>". So, consider the set of legal XML
>texts, considered as Unicode strings, and define an equivalence relation
>on this set by saying that strings with the same XML normal form are
>equivalent; then say that any such string denotes its equivalence class,
>and then in a familiar abuse of notation say that singleton classes are
>identical to their members. Now, any piece of XML text without any markup
>in it denotes itself, just as a plain literal does. (There may be some
>whitespace issues which make "  " (two spaces) equivalent to " " (one
>space); if so, this will need to be stated more carefully, eg by applying
>the normalization only to stuff inside <->.)
>If we say that this is the
>value space of rdf:XMLLiteral, rather than the non-text 'structural' sets
>we have at present, then Martin might be happier.
>
>On the other hand, this supports a number of hard-to-state RDF
>entailments, such as intersubstituting "sss"^^xsd:string and
>"sss"^^rdf:XMLLiteral under circumstances which can only be recognized by
>an XML parser, which seems *very* ugly to include in basic RDF, so I would
>argue that if we do something like this then we treat rdf:XMLLiteral as a
>genuine datatype so that these entailments are restricted to
>D-interpretations and are not valid in simple RDF; and it also means that
>XML *with* markup denotes something very like a character string; in
>particular,
>"<"^^rdf:XMLLiteral
>on this proposal, has got absolutely nothing in common with
>"<"^^xsd:string. So maybe Martin might not be so happy after all.
>
>Anyway, thought I'd just mention it in passing.
>
>Pat
>
>PS. I thought of an interesting analogy. Literals are a kind of name, and
>in a simple extensional logic they would have a fixed denotation, eg
>numerals denote numbers, I("10")=10 (ie, ten) and so on, end of
>story. But RDF is intensional, and datatypes treat literals like
>intensional names. Seen in this way, the literal always denotes itself, ie
>I(literal)=literal; but it has a variable extension, *determined by the
>datatype context*. In other words, the datatype lexical-to-value map is a
>kind of extension mapping, like IEXT for properties and ICEXT for
>classes. Call it ILEXT-d where d is the datatype; then the 'meaning' of a
>literal string sss in a datatype context defined by d would be
>ILEXT-d(I(sss)) - compare IEXT(I(p)) or ICEXT(I(a)) where p is a property
>uri and a is a uri or bnode - which since I(sss) = sss is just
>ILEXT-d(sss), i.e. L2V(d)(sss).
>This is exactly what the subject bnode
>denotes in a datatype triple; in other words, we are using the datatype
>property name as a kind of explicit extension mapping on literal strings.
>On this view, then, what a datatype does is to fix the extension mapping
>for literals, considered as intensional names. The universal
>superproperty rdfs:Literal works the same way but refuses to supply a
>context, so letting the extension mapping be anything.
>
>
>--
>---------------------------------------------------------------------
>IHMC (850)434 8903 or (650)494 3973 home
>40 South Alcaniz St. (850)202 4416 office
>Pensacola (850)202 4440 fax
>FL 32501 (850)291 0667 cell
>phayes@ihmc.us http://www.ihmc.us/users/phayes

------------
Graham Klyne
GK@NineByNine.org
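
(Illustrative aside, not part of either message.) The datatype semantics in the quoted message, where the datatype property applies a lexical-to-value map L2V to a self-denoting string, can be sketched in Python. This is only a sketch under assumptions: the `DATATYPES` table, the regular expressions used as lexical spaces, and the function names `l2v` and `type_entailed` are hypothetical, and ex:octal is the made-up datatype from the message's own example.

```python
import re

# Hypothetical datatype map: name -> (lexical-space pattern, lexical-to-value map).
DATATYPES = {
    "xsd:integer": (re.compile(r"[+-]?[0-9]+"), lambda s: int(s, 10)),
    "ex:octal": (re.compile(r"[0-7]+"), lambda s: int(s, 8)),
}


def l2v(datatype, lexical):
    """Return L2V(datatype)(lexical), or None if lexical is outside the lexical space."""
    pattern, to_value = DATATYPES[datatype]
    if pattern.fullmatch(lexical) is None:
        return None
    return to_value(lexical)


def type_entailed(datatype, lexical):
    """'aaa ddd "sss"' supports 'aaa rdf:type ddd' only for a legal lexical form."""
    return l2v(datatype, lexical) is not None


# '_:x ex:octal "10"' and '_:y xsd:integer "8"' pick out the same value.
print(l2v("ex:octal", "10") == l2v("xsd:integer", "8"))
# Alternative lexical forms of one datatype map to the same value.
print(l2v("xsd:integer", "2") == l2v("xsd:integer", "0002"))
```

The lexical check is attached to the datatype property rather than to class membership, which mirrors the point in the message that ex:octal values can be integers while "10" is still read as octal.
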
Received on Thursday, 18 September 2003 08:06:27 UTC