- From: Graham Klyne <gk@ninebynine.org>
- Date: Fri, 19 Sep 2003 19:57:30 +0100
- To: pat hayes <phayes@ihmc.us>
- Cc: w3c-rdfcore-wg@w3.org
At 12:37 18/09/03 -0500, pat hayes wrote: >Technically, but it makes the whole thing unworkable, I think. If the tag >assertion is compulsory then the tags will break conventional datatyping >and we would be better off with the current design. OK, I was having doubts about having language as an additional property. But I think other aspects of your design may stand up to examination, even if the language tag reverts to being part of the abstract syntax for a literal. #g -- At 12:37 18/09/03 -0500, pat hayes wrote: >>Continuing in the spirit of airing alternative designs, not proposing them... >> >>I think Pat's approach is elegant and quite effective, and is in >>substantial concurrence with earlier thoughts expressed by DanC [1] and >>myself [2]. The main difference that I see is the proposal to represent >>language tags in the graph rather than as part of a literal. >> >>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html >> >>[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html >> >>I'm wondering if the suggestion to translate >> >>>aaa ppp "sss"@ttt . >>>-->> >>>aaa ppp _:x . >>>_:x xsd:string "sss" . >>>_:x rdf:langTag "ttt" . >> >>might be problematic in its use of xsd:string, in that this would mean that: >> >> aaa ppp "sss"@ttt . >>entails >> aaa ppp "sss" . >> >>for which there is no corresponding entailment in the current design. > >Ah, indeed I had not noticed that. I think this will happen with or >without xsd:string, actually: it has to do with the fact that the tag is >now a property, so can be omitted in the description, so a description of >a simple literal without a tag is indistinguishable from an incomplete >description of a simple literal with an unknown tag. That is ugly and may >be fatal. > >> Maybe a simple way to avoid this is to apply the "i-default" tag (per >> RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that >> >>aaa ppp "sss" . >>-->> >>aaa ppp _:x . >>_:x xsd:string "sss" . >>_:x rdf:langTag "i-default" . >> >>Thus blocking the above entailment. Hmmm, i-default is not a good choice >>because it suggests a human readable language, but I think a variation on >>this could work. > >Technically, but it makes the whole thing unworkable, I think. If the tag >assertion is compulsory then the tags will break conventional datatyping >and we would be better off with the current design. > >Pat > >>.. >> >>I'm not sure that I fully concur with Pat's proposed handling of >>parseType=Literal, in that I don't see that, in terms of graph formation, >>there needs to be any different treatment from ordinary plain literals >>... that is, parseType=Literal makes sense as a purely syntactic >>directive for processing of RDF/XML content to plain literal form. I >>don't think this is inconsistent with Pat's proposal, I just don't see >>why the parseType=Literal case needs to be drawn out specially in this >>way. One of the things I least like about the current design is the way >>that syntactic processing is not kept distinct from datatype >>semantics. Pat's proposal discuses treatment of rdf:XMLLiteral as a pure >>datatype, which seems sensible to me. >> >>Concerning: >>>_:x rdfs:Literal "10" . >>> >>>would say that _:x was some value which has "10" as a lexical form, but >>>we don't (yet) know which one. Or, we could not do this. >> >>Would this be a reasonable interpretation for rdf:value, consistent with >>existing usage? >> >>#g >>-- >> >>At 20:16 17/09/03 -0500, pat hayes wrote: >> >>>Greetings. >>> >>>Y'all are going to just LOVE me for this, but thinking about the i18n >>>desireables for XML has led me to the observation that one of our old >>>and abandoned designs for handling datatypes would handle this stuff >>>quite smoothly. The key point is that terms denoting datatype values are >>>allowed in the subject position, so attributes like language tags and >>>lexical 'type' can be described as RDF properties. We gave up on this on >>>the grounds largely of triple-bloat, a concern which now seems curiously >>>irrelevant when one contemplates what OWL will look like. Anyway, in >>>the spirit of Brian's comment, >>> >>>>I've tried to be careful not to describe it as a proposal. This is an >>>>alternative design. I'm not proposing it, just describing it. >>> >>>here's the design. >>> >>>Plain literals are just strings, and they denote themselves. There are >>>no typed literals. Datatypes are indicated by class/property names. >>>Datatype values are typically indicated by bnodes, so instead of >>> >>>aaa ppp "sss"^^ddd . >>> >>>we write >>> >>>aaa ppp _:x . >>>_:x ddd "sss" . >>> >>>where the _:x denotes the datatype value. You could use URIs in some >>>cases, eg >>> >>>ex:PIto5places xsd:number "3.14162" . >>> >>>There is a general D-entailment >>> >>>aaa ddd "sss" . >>>|= >>>aaa rdf:type ddd . >>> >>>when sss is a legal lexical form for the datatype ddd; the version of >>>this for XML is an RDF entailment (though see later). >>> >>>This design, unlike our present one, has subject terms denoting datatype >>>values, so lang tags can be considered to be *properties of datatype >>>values*, and the tags themselves can be encoded as simple literals, so >>>we just write an assertion: >>> >>>_:x rdf:langTag "en" . >>> >>>and our current design translates thus: >>> >>>aaa ppp "sss"@ttt . >>>-->> >>>aaa ppp _:x . >>>_:x xsd:string "sss" . >>>_:x rdf:langTag "ttt" . >>> >>>Note that xsd:string is the appropriate datatype for simple literals, >>>providing a way to in effect put a simple literal string in the subject >>>position (encoded as a bnode). In fact, in this design, xsd:string is in >>>effect owl:sameAs applied to literals. >>> >>>---- >>> >>>This way of handling lang tags allows us to associate lang tags with XML >>>literals without putting the tag into the lexical space of the literal, >>>so allows XML literal to be a normal datatype, just as it is right now >>>(though read on) while also handling one of Martin's requirements. The >>>parsing of parseType="Literal" needs to include the asserting of an >>>appropriate rdf:langTag assertion in the graph, according to the XML >>>rules, but that seems straightforward. This design also allows sub-XML >>>datatypes to automatically inherit language tagging, since they will be >>>members of subClasses of rdf:XMLLiteral and hence of rdf:XMLliteral >>>itself, and hence the members of these classes will still have any >>>properties they had previously. Notice that the property is of the >>>literal *value*, rather than syntactically attached to the literal, so >>>rdf:langTag only makes intuitive sense for self-denoting literals, or at >>>any rate those which denote textual kinds of thing rather than >>>mathematical kinds of thing. However, there is no need to have special >>>rules to 'ignore' lang tags on non-textual datatypes such as numbers: an >>>assertion like >>> >>>_:x xsd:integer "25" . >>>_:x rdf:langTag "en" . >>> >>>is semantically vacuous but harmless, or can be considered harmless as >>>far as RDF is concerned. (A lang-tag-savvy app might complain about >>>things like this.) Also we don't need lang tags as a syntactic >>>attachment to plain literals; the same trick works for plain literals. >>> >>>There isn't any general semantics for rdf:langTag, but for particular >>>cases it can be defined, eg we can define it for simple literals - >>>simple literal *values* can be pairs just as they are right now, and so >>>IEXT(I(rdf:langTag)) is all pairs of the form <<sss, tag>, tag> , and >>>IEXT(I(xsd:string)) is all pairs <<sss, tag>, sss> - and for XML literals. >>> >>>Here's the MT for the datatyping, re-done in a more up-todate style: D >>>is a datatype map, as usual. >>>If <uri, ddd> is in D then: >>>I(uri)=ddd; >>>ddd is in ICEXT(I(rdf:Datatype)); >>>for any string sss, sss is in the lexical space of ddd iff >>><L2V(ddd)(sss),sss> is in IEXT(ddd); >>>If sss is in the lexical space of ddd then >>>L2V(ddd)(sss) is in ICEXT(ddd) >>> >>>Note that being in the class is necessary but not sufficient for the >>>datatyping rule to apply; this avoids some of the snags we had with this >>>design previously involving subtypes. For example, we can have >>>ex:octal rdfs:subClassOf xsd:integer . >>>_:x ex:octal "10" . >>> >>>and _:x unambiguously denotes eight; in fact >>> >>>_:x owl:sameAs _:y . >>>_:y xsd:integer "8" . >>> >>>The lexical typing only gets invoked by the datatype property; the class >>>membership has to do with the values. Alternative lexical forms give no >>>problem either: >>> >>>_:x xsd:integer "2" . >>>_:x xsd:integer "0002" . >>> >>>BTW, we could now use rdfs:Literal as a generic superproperty of all >>>datatype properties, as well as a superclass of all datatype values, so that >>> >>>_:x rdfs:Literal "10" . >>> >>>would say that _:x was some value which has "10" as a lexical form, but >>>we don't (yet) know which one. Or, we could not do this. >>> >>>----- >>> >>>This would be a major change and would probably effect several >>>implementations. >>> >>>In order to change our current design to this we would need to: >>>1. remove typed literals (or, treat them as an abbreviations for the >>>two-triple form, maybe?) >>>2. remove lang tags from plain literals (or treat these as an >>>abbreviation, similarly) >>>3. introduce rdf:langTag (or whatever) and add prose discussing the use >>>of lang tags as properties >>>4. modify the datatype semantics, as above >>>5. redefine the XML parsing rules for parseType="Literal" >>>6. rewrite the Lbase translation appropriately >>> >>>I think this would mean changes to every document; it would be a fairly >>>horrendous editing task at this stage. >>> >>>On the other hand, it does have a certain elegance. There is only one >>>kind of literal, and literals are genuinely simple, both syntactically >>>and semantically, and always denote themselves in all contexts (remember >>>non-tidy graphs?); and it uses RDF as a descriptive language rather than >>>extending the syntax in an XML-idiosyncratic way. >>> >>>We abandoned this design, as I recall, for three reasons. First, it >>>seemed too 'indirect' and like triple-bloat. However, in our current >>>design we have to specify the same information, and we can infer the bnode: >>> >>>aaa ppp "10"^^xsd:integer . >>>|= >>>aaa ppp _:x . >>> >>>compare >>> >>>aaa ppp _:x . >>>_:x xsd:integer "10" . >>> >>>an in any case in this post-OWL era, triple-bloat seems to be rampant. I >>>note that it would be harmless to allow the current typed-literal form >>>as an abbreviation for the two-triple form, by the way; or even as an >>>alternative, with inference rules to convert them back and forth. The >>>feeling of being 'indirect' came, as I recall, from a feeling that we >>>*ought* to be able, dammit, to write things like >>>ex:Jill ex:age "10" >>>rather have to go through a bnode: >>>ex:Jill ex:age _:x . >>>_:x xsd:integer "10" . >>>This feeling now seems to me to have been overly naive, however, with >>>the benefit of hindsight. >>> >>>Second, it seemed unintuitive to some folk to have a property and a >>>class with the same name. I never had this trouble myself, and it seems >>>to me to be a good illustration of the usefulness of the intensional >>>semantics that RDF provides: if you've got it, flaunt it. [*see PS] >>>However, the design could be modified by allowing systematic variants >>>for the property or class names, eg using xsd:integer for the property >>>and xsd:Integer for the class. Or we could do without the datatype >>>classes altogether, since >>> >>>aaa rdf:type xsd:integer . >>> (read: aaa is an integer) >>> >>>and >>> >>>aaa xsd:integer _:x . >>>(read: aaa is something denoted by a numeral) >>> >>>convey the exact same information in {xsd:integer}-interpretations. >>> >>>Third, as I recall, there were some issues arising from the long-range >>>datatyping getting too complicated. OK, Im not suggesting re-opening >>>that particular can of worms. (Though I would note that when it does get >>>re-opened in the future, I bet this design will be a lot more tractable >>>than our current design, which will have to be simply shelved.) >>> >>>---- >>> >>>The other i18n issue involved treating XML literals without markup as >>>being plain text. Assuming that 'plain text' means a character string, >>>I now think we can do that by a bit of semantic sleight of hand as >>>follows. First, observe that any piece of XML can be encoded as a >>>character string, but XML imposes extra equivalence (identity) >>>conditions, such as identifying "<br />" with "<br></br>". So, consider >>>the set of legal XML texts, considered as Unicode strings, and define an >>>equivalence relation on this set by saying that strings with the same >>>XML normal form are equivalent; then say that any such string denotes >>>its equivalence class, and then in a familiar abuse of notation say that >>>singleton classes are identical to their members. Now, any piece of XML >>>text without any markup in it denotes itself, just as a plain literal >>>does. (There may be some whitespace issues which make " " (two spaces) >>>equivalent to " " (one space); if so, this will need to be stated more >>>carefully, eg by applying the normalization only to stuff inside <->.) >>>If we say that this is the value space of rdf:XMLLiteral, rather than >>>the non-text 'structural' sets we have at present, then Martin might be >>>happier. >>> >>>On the other hand, this supports a number of hard-to-state RDF >>>entailments, such as intersubstituting "sss"^^xsd:string and >>>"sss"^^rdf:XMLLiteral under circumstances which can only be recognized >>>by an XML parser, which seems *very* ugly to include in basic RDF, so I >>>would argue that if we do something like this then we treat >>>rdf:XMLLiteral as a genuine datatype so that these entailments are >>>restricted to D-interpretations and are not valid in simple RDF; and it >>>also means that XML *with* markup denotes something very like a >>>character string; in particular, >>>"<"^^rdf:XMLLiteral >>>on this proposal, has got absolutely nothing in common with >>>"<"^^xsd:string. So maybe Martin might not be so happy after all. >>> >>>Anyway, thought I'd just mention it in passing. >>> >>>Pat >>> >>>PS. I thought of an interesting analogy. Literals are a kind of name, >>>and in a simple extensional logic they would have a fixed denotation, eg >>>numerals denote numbers, I("10")=10 (ie, ten) and so on, end of >>>story. But RDF is intensional, and datatypes treat literals like >>>intensional names. Seen in this way, the literal always denotes itself, >>>ie I(literal)=literal; but it has a variable extension, *determined by >>>the datatype context*. In other words, the datatype lexical-to-value map >>>is a kind of extension mapping, like IEXT for properties and ICEXT for >>>classes. Call it ILEXT-d where d is the datatype; then the 'meaning' of >>>a literal string sss in a datatype context defined by d would be >>>ILEXT-d(I(sss)) - compare IEXT(I(p)) or ICEXT(I(a)) where p is a >>>property uri and a is a uri or bnode - which since I(sss) = sss is just >>>ILEXT-d(sss), i.e. L2V(d)(sss). This is exactly what the subject bnode >>>denotes in a datatype triple; in other words, we are using the datatype >>>property name as a kind of explicit extension mapping on literal >>>strings. On this view, then, what a datatype does is to fix the >>>extension mapping for literals, considered as intensional names. The >>>universal superproperty rdfs:Literal works the same way but refuses to >>>supply a context, so letting the extension mapping be anything. >>> >>> >>>-- >>>--------------------------------------------------------------------- >>>IHMC (850)434 8903 or (650)494 3973 home >>>40 South Alcaniz St. (850)202 4416 office >>>Pensacola (850)202 4440 fax >>>FL 32501 (850)291 0667 cell >>>phayes@ihmc.us http://www.ihmc.us/users/phayes >> >>------------ >>Graham Klyne >>GK@NineByNine.org > > >-- >--------------------------------------------------------------------- >IHMC (850)434 8903 or (650)494 3973 home >40 South Alcaniz St. (850)202 4416 office >Pensacola (850)202 4440 fax >FL 32501 (850)291 0667 cell >phayes@ihmc.us http://www.ihmc.us/users/phayes ------------ Graham Klyne GK@NineByNine.org
Received on Saturday, 20 September 2003 05:49:04 UTC