Re: I18N Issue alternative: a passing thought. from Graham Klyne on 2003-09-19 (w3c-rdfcore-wg@w3.org from September 2003)

From: Graham Klyne <gk@ninebynine.org>
Date: Fri, 19 Sep 2003 19:57:30 +0100
To: pat hayes <phayes@ihmc.us>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <5.1.0.14.2.20030919195519.021b3870@127.0.0.1>
At 12:37 18/09/03 -0500, pat hayes wrote:
>Technically, but it makes the whole thing unworkable, I think. If the tag 
>assertion is compulsory then the tags will break conventional datatyping 
>and we would be better off with the current design.

OK, I was having doubts about having language as an additional 
property.  But I think other aspects of your design may stand up to 
examination, even if the language tag reverts to being part of the abstract 
syntax for a literal.

#g
--


At 12:37 18/09/03 -0500, pat hayes wrote:
>>Continuing in the spirit of airing alternative designs, not proposing them...
>>
>>I think Pat's approach is elegant and quite effective, and is in 
>>substantial concurrence with earlier thoughts expressed by DanC [1] and 
>>myself [2].  The main difference that I see is the proposal to represent 
>>language tags in the graph rather than as part of a literal.
>>
>>[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Oct/0031.html
>>
>>[2] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Nov/0635.html
>>
>>I'm wondering if the suggestion to translate
>>
>>>aaa ppp "sss"@ttt .
>>>-->>
>>>aaa ppp _:x .
>>>_:x xsd:string "sss" .
>>>_:x rdf:langTag "ttt" .
>>
>>might be problematic in its use of xsd:string, in that this would mean that:
>>
>>   aaa ppp "sss"@ttt .
>>entails
>>   aaa ppp "sss" .
>>
>>for which there is no corresponding entailment in the current design.
>
>Ah, indeed I had not noticed that.  I think this will happen with or 
>without xsd:string, actually: it has to do with the fact that the tag is 
>now a property, so can be omitted in the description, so a description of 
>a simple literal without a tag is indistinguishable from an incomplete 
>description of a simple literal with an unknown tag. That is ugly and may 
>be fatal.
>
>>  Maybe a simple way to avoid this is to apply the "i-default" tag (per 
>> RFC2277 - http://www.ietf.org/rfc/rfc2277.txt); e.g. so that
>>
>>aaa ppp "sss" .
>>-->>
>>aaa ppp _:x .
>>_:x xsd:string "sss" .
>>_:x rdf:langTag "i-default" .
>>
>>Thus blocking the above entailment.  Hmmm, i-default is not a good choice 
>>because it suggests a human readable language, but I think a variation on 
>>this could work.
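A minimal sketch of the entailment worry above. It assumes simple entailment is checked as "some mapping of the second graph's blank nodes onto terms of the first makes it a subgraph". The encoding is also an assumption for illustration: triples are Python tuples, blank nodes are strings starting with `_:`, and literals keep their quotes.

```python
from itertools import product

def is_bnode(term):
    return term.startswith("_:")

def entails(g1, g2):
    """Simple entailment by brute force: g1 |= g2 iff some mapping of
    g2's blank nodes onto terms of g1 turns g2 into a subset of g1."""
    bnodes = sorted({t for tr in g2 for t in tr if is_bnode(t)})
    terms = sorted({t for tr in g1 for t in tr})
    for image in product(terms, repeat=len(bnodes)):
        m = dict(zip(bnodes, image))
        if {tuple(m.get(t, t) for t in tr) for tr in g2} <= g1:
            return True
    return False

# aaa ppp "sss"@ttt, translated with the (optional) tag triple
tagged = {("aaa", "ppp", "_:x"), ("_:x", "xsd:string", '"sss"'),
          ("_:x", "rdf:langTag", '"ttt"')}
# aaa ppp "sss", translated with no tag triple at all
untagged = {("aaa", "ppp", "_:y"), ("_:y", "xsd:string", '"sss"')}
# aaa ppp "sss", translated with a compulsory default tag
defaulted = untagged | {("_:y", "rdf:langTag", '"i-default"')}

print(entails(tagged, untagged))   # True: the unwanted entailment
print(entails(tagged, defaulted))  # False: the default tag blocks it
```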
>
>Technically, but it makes the whole thing unworkable, I think. If the tag 
>assertion is compulsory then the tags will break conventional datatyping 
>and we would be better off with the current design.
>
>Pat
>
>>..
>>
>>I'm not sure that I fully concur with Pat's proposed handling of 
>>parseType=Literal, in that I don't see that, in terms of graph formation, 
>>there needs to be any different treatment from ordinary plain literals 
>>... that is, parseType=Literal makes sense as a purely syntactic 
>>directive for processing of RDF/XML content to plain literal form.  I 
>>don't think this is inconsistent with Pat's proposal, I just don't see 
>>why the parseType=Literal case needs to be drawn out specially in this 
>>way.  One of the things I least like about the current design is the way 
>>that syntactic processing is not kept distinct from datatype 
>>semantics.  Pat's proposal discusses treatment of rdf:XMLLiteral as a pure
>>datatype, which seems sensible to me.
>>
>>Concerning:
>>>_:x rdfs:Literal "10" .
>>>
>>>would say that _:x was some value which has "10" as a lexical form, but 
>>>we don't (yet) know which one. Or, we could not do this.
>>
>>Would this be a reasonable interpretation for rdf:value, consistent with 
>>existing usage?
>>
>>#g
>>--
>>
>>At 20:16 17/09/03 -0500, pat hayes wrote:
>>
>>>Greetings.
>>>
>>>Y'all are going to just LOVE me for this, but thinking about the i18n 
>>>desirables for XML has led me to the observation that one of our old
>>>and abandoned designs for handling datatypes would handle this stuff 
>>>quite smoothly. The key point is that terms denoting datatype values are 
>>>allowed in the subject position, so attributes like language tags and 
>>>lexical 'type' can be described as RDF properties. We gave up on this on 
>>>the grounds largely of triple-bloat, a concern which now seems curiously 
>>>irrelevant when one contemplates what OWL will look like.  Anyway, in 
>>>the spirit of Brian's comment,
>>>
>>>>I've tried to be careful not to describe it as a proposal.  This is an
>>>>alternative design.  I'm not proposing it, just describing it.
>>>
>>>here's the design.
>>>
>>>Plain literals are just strings, and they denote themselves. There are 
>>>no typed literals. Datatypes are indicated by class/property names. 
>>>Datatype values are typically indicated by bnodes, so instead of
>>>
>>>aaa ppp "sss"^^ddd .
>>>
>>>we write
>>>
>>>aaa ppp _:x .
>>>_:x ddd "sss" .
>>>
>>>where the _:x denotes the datatype value.  You could use URIs in some 
>>>cases, eg
>>>
>>>ex:PIto5places xsd:decimal "3.14159" .
>>>
>>>There is a general D-entailment
>>>
>>>aaa ddd "sss" .
>>>|=
>>>aaa rdf:type ddd .
>>>
>>>when sss is a legal lexical form for the datatype ddd; the version of 
>>>this for XML is an RDF entailment (though see later).
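As a rough illustration of this D-entailment, the sketch below models a datatype map as a dictionary from datatype names to lexical-space tests. It adds `aaa rdf:type ddd` whenever `aaa ddd "sss"` appears with a legal lexical form. The regular expressions are simplified assumptions, not the full XML Schema lexical grammars.

```python
import re

# Hypothetical, simplified datatype map: name -> lexical-space test.
DATATYPE_MAP = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+").fullmatch,
    "xsd:boolean": re.compile(r"true|false|1|0").fullmatch,
}

def unquote(term):
    """Return the string inside a quoted literal term, or None."""
    if len(term) >= 2 and term.startswith('"') and term.endswith('"'):
        return term[1:-1]
    return None

def d_entail_types(graph):
    """Add  aaa rdf:type ddd  for each  aaa ddd "sss"  with sss legal for ddd."""
    inferred = set(graph)
    for s, p, o in graph:
        lex = unquote(o)
        in_lexical_space = DATATYPE_MAP.get(p)
        if lex is not None and in_lexical_space and in_lexical_space(lex):
            inferred.add((s, "rdf:type", p))
    return inferred

g = {("_:x", "xsd:integer", '"25"'), ("_:y", "xsd:integer", '"twenty"')}
print(sorted(d_entail_types(g) - g))  # [('_:x', 'rdf:type', 'xsd:integer')]
```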
>>>
>>>This design, unlike our present one, has subject terms denoting datatype 
>>>values, so lang tags can be considered to be *properties of datatype 
>>>values*, and the tags themselves can be encoded as simple literals, so 
>>>we just write an assertion:
>>>
>>>_:x rdf:langTag "en" .
>>>
>>>and our current design translates thus:
>>>
>>>aaa ppp "sss"@ttt .
>>>-->>
>>>aaa ppp _:x .
>>>_:x xsd:string "sss" .
>>>_:x rdf:langTag "ttt" .
>>>
>>>Note that xsd:string is the appropriate datatype for simple literals, 
>>>providing a way to in effect put a simple literal string in the subject 
>>>position (encoded as a bnode). In fact, in this design, xsd:string is in 
>>>effect owl:sameAs applied to literals.
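A small sketch of the translation just described. It rewrites `aaa ppp "sss"@ttt` into the bnode form, emitting the rdf:langTag triple only when a tag is present. The function names and the `_:bN` blank-node naming scheme are hypothetical.

```python
from itertools import count

_fresh = count()

def fresh_bnode():
    """Hypothetical fresh blank-node generator: _:b0, _:b1, ..."""
    return f"_:b{next(_fresh)}"

def translate_plain(subject, prop, lexical, lang=None):
    """Rewrite  subject prop "lexical"@lang  into the bnode form."""
    b = fresh_bnode()
    triples = [(subject, prop, b), (b, "xsd:string", f'"{lexical}"')]
    if lang is not None:
        triples.append((b, "rdf:langTag", f'"{lang}"'))
    return triples

for t in translate_plain("aaa", "ppp", "sss", "ttt"):
    print(*t, ".")
# aaa ppp _:b0 .
# _:b0 xsd:string "sss" .
# _:b0 rdf:langTag "ttt" .
```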
>>>
>>>----
>>>
>>>This way of handling lang tags allows us to associate lang tags with XML 
>>>literals without putting the tag into the lexical space of the literal, 
>>>so allows XML literal to be a normal datatype, just as it is right now 
>>>(though read on) while also handling one of Martin's requirements. The 
>>>parsing of parseType="Literal" needs to include the asserting of an 
>>>appropriate rdf:langTag assertion in the graph, according to the XML 
>>>rules, but that seems straightforward. This design also allows sub-XML 
>>>datatypes to automatically inherit language tagging, since they will be 
>>>members of subclasses of rdf:XMLLiteral and hence of rdf:XMLLiteral
>>>itself, and hence the members of these classes will still have any 
>>>properties they had previously. Notice that the property is of the 
>>>literal *value*, rather than syntactically attached to the literal, so 
>>>rdf:langTag only makes intuitive sense for self-denoting literals, or at 
>>>any rate those which denote textual kinds of thing rather than 
>>>mathematical kinds of thing. However, there is no need to have special 
>>>rules to 'ignore' lang tags on non-textual datatypes such as numbers: an 
>>>assertion like
>>>
>>>_:x xsd:integer "25" .
>>>_:x rdf:langTag "en" .
>>>
>>>is semantically vacuous but harmless, or can be considered harmless as 
>>>far as RDF is concerned. (A lang-tag-savvy app might complain about 
>>>things like this.)  Also we don't need lang tags as a syntactic 
>>>attachment to plain literals; the same trick works for plain literals.
>>>
>>>There isn't any general semantics for rdf:langTag, but it can be defined
>>>for particular cases, eg for simple literals and for XML literals. For
>>>simple literals, the literal *values* can be pairs just as they are right
>>>now, so IEXT(I(rdf:langTag)) is all pairs of the form <<sss, tag>, tag>,
>>>and IEXT(I(xsd:string)) is all pairs <<sss, tag>, sss>.
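Read concretely, a tagged simple-literal value is just the pair <sss, tag>, and the two properties project out its components. Below is a toy sketch over a small made-up set of values. A real interpretation would range over all such pairs.

```python
# Hypothetical finite universe of simple-literal values: (string, tag) pairs.
values = {("chat", "fr"), ("chat", "en"), ("colour", "en-GB")}

# IEXT(I(rdf:langTag)) = {<<sss, tag>, tag>} and
# IEXT(I(xsd:string))  = {<<sss, tag>, sss>}, restricted to this universe.
iext_langtag = {(v, v[1]) for v in values}
iext_string = {(v, v[0]) for v in values}

# The same string with different tags gives distinct values.
print(sorted(v for v, s in iext_string if s == "chat"))
# [('chat', 'en'), ('chat', 'fr')]
```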
>>>
>>>Here's the MT for the datatyping, re-done in a more up-to-date style: D
>>>is a datatype map, as usual.
>>>If <uri, ddd> is in D then:
>>>I(uri)=ddd;
>>>ddd is in ICEXT(I(rdf:Datatype));
>>>for any string sss,  sss is in the lexical space of ddd iff
>>><L2V(ddd)(sss),sss> is in IEXT(ddd);
>>>If sss is in the lexical space of ddd then
>>>L2V(ddd)(sss) is in ICEXT(ddd)
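A sketch of what these conditions force into IEXT(ddd) and ICEXT(ddd), assuming a finite sample of candidate strings stands in for "any string sss". The `Datatype` record and the simplified integer lexical space are illustrative assumptions. The I(uri) and rdf:Datatype conditions are not modelled.

```python
import re
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Datatype:
    """A datatype ddd with its lexical space and L2V(ddd) mapping."""
    name: str
    in_lexical_space: Callable[[str], bool]
    l2v: Callable[[str], object]

def iext_pairs(ddd, candidates):
    """Pairs <L2V(ddd)(sss), sss> that the MT puts in IEXT(ddd), for
    each sampled string sss in the lexical space of ddd."""
    return {(ddd.l2v(s), s) for s in candidates if ddd.in_lexical_space(s)}

def icext(ddd, candidates):
    """Values the MT forces into ICEXT(ddd) for the sampled strings."""
    return {v for v, _ in iext_pairs(ddd, candidates)}

integer = Datatype(
    "xsd:integer",
    lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
    lambda s: int(s, 10),
)
sample = ["10", "0010", "-3", "ten"]
print(sorted(iext_pairs(integer, sample)))
# [(-3, '-3'), (10, '0010'), (10, '10')]
print(sorted(icext(integer, sample)))
# [-3, 10]
```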
>>>
>>>Note that being in the class is necessary but not sufficient for the 
>>>datatyping rule to apply; this avoids some of the snags we had with this 
>>>design previously involving subtypes. For example, we can have
>>>ex:octal rdfs:subClassOf xsd:integer .
>>>_:x ex:octal "10" .
>>>
>>>and _:x unambiguously denotes eight; in fact
>>>
>>>_:x owl:sameAs _:y .
>>>_:y  xsd:integer "8" .
>>>
>>>The lexical typing only gets invoked by the datatype property; the class 
>>>membership has to do with the values. Alternative lexical forms give no 
>>>problem either:
>>>
>>>_:x xsd:integer "2" .
>>>_:x xsd:integer "0002" .
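The octal example can be checked with two lexical-to-value maps. Here `ex:octal` is the hypothetical datatype from the text, and both lexical grammars are simplified assumptions. The point is that the datatype property picks the L2V map, while class membership only constrains the values.

```python
import re

def integer_l2v(s):
    """L2V for xsd:integer (simplified: optional sign, decimal digits)."""
    if not re.fullmatch(r"[+-]?[0-9]+", s):
        raise ValueError(f"{s!r} is not an xsd:integer lexical form")
    return int(s, 10)

def octal_l2v(s):
    """L2V for the hypothetical ex:octal datatype: base-8 numerals."""
    if not re.fullmatch(r"[0-7]+", s):
        raise ValueError(f"{s!r} is not an ex:octal lexical form")
    return int(s, 8)

# _:x ex:octal "10"  and  _:y xsd:integer "8"  denote the same value,
# so  _:x owl:sameAs _:y  holds.
assert octal_l2v("10") == integer_l2v("8") == 8

# Alternative lexical forms for one value pose no problem either.
assert integer_l2v("2") == integer_l2v("0002") == 2
```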
>>>
>>>BTW, we could now use rdfs:Literal as a generic superproperty of all 
>>>datatype properties, as well as a superclass of all datatype values, so that
>>>
>>>_:x rdfs:Literal "10" .
>>>
>>>would say that _:x was some value which has "10" as a lexical form, but 
>>>we don't (yet) know which one. Or, we could not do this.
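To make the rdfs:Literal reading concrete, the sketch below lists the candidate values for `_:x rdfs:Literal "10"` across a small, hypothetical datatype map. On Pat's reading, _:x is one of these candidates (or a value from some datatype not listed), and we don't yet know which.

```python
import re

# Hypothetical datatype map: name -> (lexical-space test, L2V).
DATATYPES = {
    "xsd:integer": (lambda s: re.fullmatch(r"[+-]?[0-9]+", s) is not None,
                    lambda s: int(s, 10)),
    "ex:octal":    (lambda s: re.fullmatch(r"[0-7]+", s) is not None,
                    lambda s: int(s, 8)),
    "xsd:string":  (lambda s: True, lambda s: s),
}

def literal_candidates(lexical):
    """Possible values of _:x given only  _:x rdfs:Literal "lexical":
    one candidate per known datatype whose lexical space contains it."""
    return {name: l2v(lexical)
            for name, (ok, l2v) in DATATYPES.items() if ok(lexical)}

print(literal_candidates("10"))
# {'xsd:integer': 10, 'ex:octal': 8, 'xsd:string': '10'}
```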
>>>
>>>-----
>>>
>>>This would be a major change and would probably affect several
>>>implementations.
>>>
>>>In order to change our current design to this we would need to:
>>>1. remove typed literals (or treat them as abbreviations for the
>>>two-triple form, maybe?)
>>>2. remove lang tags from plain literals (or treat these as an 
>>>abbreviation, similarly)
>>>3. introduce rdf:langTag (or whatever) and add prose discussing the use 
>>>of lang tags as properties
>>>4. modify the datatype semantics, as above
>>>5. redefine the XML parsing rules for parseType="Literal"
>>>6. rewrite the Lbase translation appropriately
>>>
>>>I think this would mean changes to every document; it would be a fairly 
>>>horrendous editing task at this stage.
>>>
>>>On the other hand, it does have a certain elegance. There is only one 
>>>kind of literal, and literals are genuinely simple, both syntactically 
>>>and semantically, and always denote themselves in all contexts (remember 
>>>non-tidy graphs?); and it uses RDF as a descriptive language rather than 
>>>extending the syntax in an XML-idiosyncratic way.
>>>
>>>We abandoned this design, as I recall, for three reasons. First, it 
>>>seemed too 'indirect' and like triple-bloat. However, in our current 
>>>design we have to specify the same information, and we can infer the bnode:
>>>
>>>aaa ppp "10"^^xsd:integer .
>>>|=
>>>aaa ppp _:x .
>>>
>>>compare
>>>
>>>aaa ppp _:x .
>>>_:x xsd:integer "10" .
>>>
>>>and in any case, in this post-OWL era, triple-bloat seems to be rampant. I
>>>note that it would be harmless to allow the current typed-literal form 
>>>as an abbreviation for the two-triple form, by the way; or even as an 
>>>alternative, with inference rules to convert them back and forth. The 
>>>feeling of being 'indirect' came, as I recall, from a feeling that we 
>>>*ought* to be able, dammit, to write things like
>>>ex:Jill ex:age "10" .
>>>rather than having to go through a bnode:
>>>ex:Jill ex:age _:x .
>>>_:x xsd:integer "10" .
>>>This feeling now seems to me to have been overly naive, however, with 
>>>the benefit of hindsight.
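Treating the typed-literal form as an abbreviation amounts to a pair of rewrite rules. Below is a minimal sketch of both directions. The encoding of `"sss"^^ddd` as a Python tuple `(sss, ddd)` and the set of known datatypes are assumptions, and the generated bnode names are assumed not to clash with existing ones.

```python
from itertools import count

# Assumed encoding: "sss"^^ddd is the tuple (sss, ddd); other terms are
# strings, blank nodes start with "_:", plain literals keep their quotes.
DATATYPES = {"xsd:integer", "xsd:string"}  # hypothetical known datatypes

def expand(graph):
    """Rewrite each  s p "sss"^^ddd  as  s p _:bN . _:bN ddd "sss" ."""
    fresh = count()
    out = set()
    for s, p, o in graph:
        if isinstance(o, tuple):
            lexical, ddd = o
            b = f"_:b{next(fresh)}"  # assumed not to clash with graph bnodes
            out |= {(s, p, b), (b, ddd, f'"{lexical}"')}
        else:
            out.add((s, p, o))
    return out

def contract(graph):
    """Inverse rule: fold  s p _:b . _:b ddd "sss"  back into a typed
    literal when _:b occurs in exactly those two triples."""
    out = set(graph)
    for b, ddd, lit in graph:
        if not (b.startswith("_:") and ddd in DATATYPES
                and isinstance(lit, str) and lit.startswith('"')):
            continue
        uses = [t for t in graph if b in t]
        links = [t for t in uses if t[2] == b]
        if len(uses) == 2 and len(links) == 1:
            s, p, _ = links[0]
            out -= {links[0], (b, ddd, lit)}
            out.add((s, p, (lit[1:-1], ddd)))
    return out

g = {("ex:Jill", "ex:age", ("10", "xsd:integer"))}
print(sorted(expand(g)))
# [('_:b0', 'xsd:integer', '"10"'), ('ex:Jill', 'ex:age', '_:b0')]
assert contract(expand(g)) == g
```

A bnode carrying two datatype triples (such as the `"2"`/`"0002"` case) is deliberately left expanded by `contract`, since it has no single typed-literal abbreviation.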
>>>
>>>Second, it seemed unintuitive to some folk to have a property and a 
>>>class with the same name. I never had this trouble myself, and it seems 
>>>to me to be a good illustration of the usefulness of the intensional 
>>>semantics that RDF provides: if you've got it, flaunt it. [*see PS] 
>>>However, the design could be modified by allowing systematic variants 
>>>for the property or class names, eg using xsd:integer for the property 
>>>and xsd:Integer for the class.  Or we could do without the datatype 
>>>classes altogether, since
>>>
>>>aaa rdf:type xsd:integer .
>>>  (read: aaa is an integer)
>>>
>>>and
>>>
>>>aaa xsd:integer _:x .
>>>(read: aaa is something denoted by a numeral)
>>>
>>>convey the exact same information in {xsd:integer}-interpretations.
>>>
>>>Third, as I recall, there were some issues arising from the long-range 
>>>datatyping getting too complicated. OK, I'm not suggesting re-opening
>>>that particular can of worms. (Though I would note that when it does get 
>>>re-opened in the future, I bet this design will be a lot more tractable 
>>>than our current design, which will have to be simply shelved.)
>>>
>>>----
>>>
>>>The other i18n issue involved treating XML literals without markup as 
>>>being plain text. Assuming that 'plain text' means a character string,
>>>I now think we can do that by a bit of semantic sleight of hand as 
>>>follows. First, observe that any piece of XML can be encoded as a 
>>>character string, but XML imposes extra equivalence (identity) 
>>>conditions, such as identifying "<br />" with "<br></br>". So, consider 
>>>the set of legal XML texts, considered as Unicode strings, and define an 
>>>equivalence relation on this set by saying that strings with the same 
>>>XML normal form are equivalent; then say that any such string denotes 
>>>its equivalence class, and then in a familiar abuse of notation say that 
>>>singleton classes are identical to their members. Now, any piece of XML 
>>>text without any markup in it denotes itself, just as a plain literal 
>>>does. (There may be some whitespace issues which make "  " (two spaces) 
>>>equivalent to " " (one space); if so, this will need to be stated more 
>>>carefully, eg by applying the normalization only to stuff inside <->.) 
>>>If we say that this is the value space of rdf:XMLLiteral, rather than 
>>>the non-text 'structural' sets we have at present, then Martin might be 
>>>happier.
>>>
>>>On the other hand, this supports a number of hard-to-state RDF 
>>>entailments, such as intersubstituting "sss"^^xsd:string and 
>>>"sss"^^rdf:XMLLiteral  under circumstances which can only be recognized 
>>>by an XML parser, which seems *very* ugly to include in basic RDF, so I 
>>>would argue that if we do something like this then we treat 
>>>rdf:XMLLiteral as a genuine datatype so that these entailments are 
>>>restricted to D-interpretations and are not valid in simple RDF; and it 
>>>also means that XML *with* markup denotes something very like a 
>>>character string; in particular,
>>>"&lt;"^^rdf:XMLLiteral
>>>on this proposal, has got absolutely nothing in common with
>>>"<"^^xsd:string.  So maybe Martin might not be so happy after all.
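One way to sketch this "equivalence class" value space is to use a canonical XML form as the class representative. Here Canonical XML 2.0, via Python's `xml.etree.ElementTree.canonicalize` (Python 3.8+), stands in for "XML normal form". That choice is an assumption, and C14N keeps whitespace, so the one-space/two-space question stays open. Markup-free text comes out unchanged and so "denotes itself", while `&lt;` stays the four-character string `&lt;` rather than becoming `<`.

```python
import xml.etree.ElementTree as ET  # canonicalize() needs Python 3.8+

def xml_normal_form(fragment):
    """Canonical (C14N 2.0) form of an XML fragment, used here as the
    representative of its equivalence class.  The fragment is wrapped in
    a dummy <w> element because canonicalize() expects a document."""
    wrapped = ET.canonicalize(xml_data=f"<w>{fragment}</w>")
    return wrapped[len("<w>"):-len("</w>")]

def same_xml_value(a, b):
    return xml_normal_form(a) == xml_normal_form(b)

print(same_xml_value("<br />", "<br></br>"))  # True
print(xml_normal_form("plain text"))          # plain text
print(xml_normal_form("&lt;") == "<")         # False
```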
>>>
>>>Anyway, thought I'd just mention it in passing.
>>>
>>>Pat
>>>
>>>PS.  I thought of an interesting analogy. Literals are a kind of name, 
>>>and in a simple extensional logic they would have a fixed denotation, eg 
>>>numerals denote numbers, I("10")=10 (ie, ten) and so on, end of 
>>>story.  But RDF is intensional, and datatypes treat literals like 
>>>intensional names. Seen in this way, the literal always denotes itself, 
>>>ie I(literal)=literal; but it has a variable extension, *determined by 
>>>the datatype context*. In other words, the datatype lexical-to-value map 
>>>is a kind of extension mapping, like IEXT for properties and ICEXT for 
>>>classes.  Call it ILEXT-d where d is the datatype; then the 'meaning' of 
>>>a literal string sss in a datatype context defined by d would be 
>>>ILEXT-d(I(sss)) - compare IEXT(I(p)) or ICEXT(I(a)) where p is a 
>>>property uri and a is a uri or bnode - which since I(sss) = sss is just 
>>>ILEXT-d(sss), i.e. L2V(d)(sss).  This is exactly what the subject bnode 
>>>denotes in a datatype triple; in other words, we are using the datatype 
>>>property name as a kind of explicit extension mapping on literal 
>>>strings. On this view, then, what a datatype does is to fix the 
>>>extension mapping for literals, considered as intensional names. The 
>>>universal superproperty rdfs:Literal works the same way but refuses to 
>>>supply a context, so letting the extension mapping be anything.
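The intensional reading in the PS can be sketched in a few lines. The literal's denotation I(sss) is fixed, and only the extension mapping ILEXT-d, which is L2V(d), varies with the datatype context. The two datatypes modelled, including the hypothetical ex:octal, are illustrative assumptions.

```python
def I(literal):
    """Literals are self-denoting: I(sss) = sss in every context."""
    return literal

def ilext(d):
    """ILEXT-d, i.e. L2V(d): the extension mapping that datatype context d
    imposes on literal strings (only two toy datatypes are modelled)."""
    maps = {
        "xsd:integer": lambda s: int(s, 10),
        "ex:octal": lambda s: int(s, 8),
    }
    return maps[d]

sss = "10"
assert I(sss) == sss                 # the intension is fixed
print(ilext("xsd:integer")(I(sss)))  # 10
print(ilext("ex:octal")(I(sss)))     # 8: the context changes the extension
```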
>>>
>>>
>>>--
>>>---------------------------------------------------------------------
>>>IHMC    (850)434 8903 or (650)494 3973   home
>>>40 South Alcaniz St.    (850)202 4416   office
>>>Pensacola                       (850)202 4440   fax
>>>FL 32501                        (850)291 0667    cell
>>>phayes@ihmc.us       http://www.ihmc.us/users/phayes
>>
>>------------
>>Graham Klyne
>>GK@NineByNine.org
>
>

------------
Graham Klyne
GK@NineByNine.org
Received on Saturday, 20 September 2003 05:49:04 UTC