- From: pat hayes <phayes@ihmc.us>
- Date: Mon, 8 Sep 2003 12:45:22 -0700
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- Cc: Martin Duerst <duerst@w3.org>, "Richard Ishida" <ishida@w3.org>, i18n <w3c-i18n-ig@w3.org>, w3c-rdfcore-wg@w3.org
(sorry, sent twice: forgot some of the CCs the first time).

>After discussing this informally over lunch, Danbri asked me to send
>it to the list to make our consideration of it explicit.
>
>This is an alternative design for literals.  The idea is to drop
>the rdf:XMLLiteral datatype and allow plain literals to contain
>markup.

Allow, or require? That is, if they happen to contain the symbol '<', is that *required* to be considered to be XML markup?

>  Two test cases illustrate:
>
><rdf:Description>
>  <eg:prop>foo <br /> bar</eg:prop>
></rdf:Description>
>
>parses to:
>
>_:a eg:prop "foo <br /> bar" .

I take it that this is intended to illustrate that without the parseType, the literal string is rendered exactly as it is (?) How about

<rdf:Description>
  <eg:prop><<</eg:prop>
</rdf:Description>

does that parse to

_:a eg:prop "<<" .

?

>
><rdf:Description>
>  <eg:prop rdf:parseType="Literal"><br /></eg:prop>
></rdf:Description>
>
>parses to:
>
>_:a eg:prop "foo <br></br> bar" .

?? Eh? Where do the foo and bar come from?

>
>The definition of a plain literal changes.  The lexical space of
>plain literal becomes the lexical space of rdf:XMLLiteral, i.e. is
>restricted to (the unicode representation of) canonicalised well
>formed balanced xml markup.

That is unacceptable right there. Applications may want to have plain literals that are not XML, e.g. the Reuters applications where literals are used to capture free-text paragraphs.

>  The denotation of a plain literal remains - it is a sequence of
>unicode characters - permitting string comparison for equality
>testing.

?? So this amounts to a proposal to get rid of plain literals, in effect, and to just not mention the 'XMLLiteral' type explicitly?

>
>Advantages:
>
>I think this provides everything that Martin has been asking for:
>
>   - no discontinuity between plain and xml literals

Indeed, but I do not want us to concede this point to Martin.
He is WRONG about this, and we should refuse to let him (or i18n) bully us into conceding this issue. Plain text is not the same as XML without markup; that view only makes sense in a completely XML-centric view of the entire world of lexicography and notation. Most of the world's languages and notations are not dialects of XML. SCL, for one, is not XML without markup. Virtually every piece of program code ever written is not XML without markup. The mathematical statement (I quote) "2<3" is not XML without markup, and it certainly isn't XML with markup. And "2&lt;3" is just gibberish.

>   - lang on mixed content
>   - no use of datatypes
>
>Disadvantages:
>
>- a bigger change than alternatives
>- builds XML into the core of the RDF model

Which is a fatal objection. If RDF does not outlive XML then it will have been a failure.

>- breaks current implementations (but see below)
>
>Ameliorating the Disadvantages - implementation strategy
>
>The above design says that e.g. "<" is not in the lexical space of
>plain literals

Wait a minute. If plain literals do not have a datatype, then this 'lexical space' terminology is meaningless. Are they typed or not?

>, and many (all?) current implementations will store
>"<" in their representation of a graph.  The point to note is that
>implementations are free to represent literals any way they please.
>Thus "<" is just the way this implementation represents the literal
>"&lt;".

But that is INSANE. Here I am wanting to write an RDF ontology about mathematical expressions, say, and I want to refer to pieces of mathematical text like "2<3" (two is less than three). But now I can't do that because this is short for "2&lt;3", which is completely meaningless.

The basic point is that the RDF machinery is not intended to be restricted to only REFER to XML text. It is required to be encodable in XML, but it is that already. This proposal makes it impossible to refer to non-XML text.
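
The difference between the text a literal refers to and the XML that encodes it can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any RDF specification or implementation: the helper names `encode_for_xml`, `decode_from_xml` and `is_well_formed` are hypothetical, and the wrapper element `v` is an arbitrary choice. Only the standard library is used.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def encode_for_xml(text: str) -> str:
    """Escape a character string so it can appear as XML element content.

    Hypothetical helper: this is the *encoding* step, not the literal itself.
    """
    return escape(text)


def decode_from_xml(content: str) -> str:
    """Parse XML element content and return the character string it encodes."""
    element = ET.fromstring(f"<v>{content}</v>")  # 'v' is an arbitrary wrapper
    return element.text or ""


def is_well_formed(content: str) -> bool:
    """Report whether the string is well-formed as XML element content."""
    try:
        ET.fromstring(f"<v>{content}</v>")
        return True
    except ET.ParseError:
        return False


math_text = "2<3"                      # the text we want to refer to
encoded = encode_for_xml(math_text)    # how it travels inside an XML document
decoded = decode_from_xml(encoded)     # what the XML document refers to

print(is_well_formed(math_text))  # the raw text is not XML content
print(encoded)
print(decoded == math_text)
```

In this sketch the string "2<3" survives a round trip through XML unchanged, even though it is not itself well-formed XML content. The escaped form exists only in the serialization; the string referred to is still "2<3".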
>The implementation does need to distinguish between markup and plain text.

No, the SEMANTICS needs to distinguish them.

>To do this, it adds a single bit to literals to indicate whether
>they are stored in escaped or unescaped form.  The above example was
>in unescaped form, which cannot represent markup.  To represent
>markup, the literal must be stored in escaped form.
>
>Literal comparison becomes more complex - literals stored in
>unescaped form should first be escaped and then canonicalized.
>Various optimization strategies can be employed here.
>
>By this strategy, it may be possible to argue that this approach
>does not break current implementations of plain literals.  It simply
>makes clearer what xml literals are.

It completely destroys the idea of a plain literal.

I think that my 'wet fish' proposal is better than this, if we have to accede to Martin. I would prefer not to accede, since Martin has not responded to the technical objections adequately, and has not given actual technical arguments for his requests. They amount to statements of opinion about the proper role of XML in semiotics, opinions with which one may have legitimate disagreement.

Pat
-- 
---------------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973   home
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32501                (850)291 0667   cell
phayes@ihmc.us          http://www.ihmc.us/users/phayes
Received on Monday, 8 September 2003 15:45:17 UTC