- From: pat hayes <phayes@ai.uwf.edu>
- Date: Fri, 28 Jun 2002 11:28:40 -0500
- To: Dan Connolly <connolly@w3.org>
- Cc: w3c-rdfcore-wg@w3.org
> > >> 1. Literals are not strings. > >You keep saying that, but what if you say: Literals are, >sometimes, strings. Sometimes they're not. Yuk. Having alternatives in a semantics is even worse. Couldnt we say that having no lang tag is actually having a lang tag of "none" ? >Think of Literal as a union of >String, String X Lang, XML-thing, XML-thing X Lang. >(XML-things happen to have string serializations) And can we say that Lang is a string? Then literals are pairs <x,y> where x is either a string or an XML structure and y is a string from a finite list of lang strings. Yuck but doable. > >Or think of String as a subsort of Literal, >if sorted logics is an appealing analogy... > >> 2. Literals are 3-tuples, consisting of a bit and two strings, the >> second of which is an XML lang tag. >> >> There are some constraints on these three parts of a literal. >> >> 3. If the bit is set (Im not sure if that is a one or a zero) then >> the second string is required to be well-formed XML. >> 4. If the bit is set, then the lang tag is understood to be the lang >> tag of the second string. >> 5. If the bit is not set, then the lang tag is understood to indicate >> that the second string is an expression of that language. > >The lang tag works the same way whether the bit is set or not: >it's just an optional language tag. OK, I was basing the comments on what was told me at the F2F. I guess it all depends on what 'language tag' is supposed to mean when attached to a string. Strict MT answer: nothing, you are saying. > > This has some curious consequences. >> >> First, (3) has the consequence that any RDF engine must include an >> XML parser, even if it is not expected to parse RDF/XML. > >Yes, this is very unaesthetic, to me. > >> The graph >> syntax has all of XML syntax incorporated into it. This seems to me >> to be the most serious problem. Under these circumstances it hardly >> seems worth having the graph syntax: it would be better to just >> identify RDF with RDF/XML and stop pretending. > >No, for the purposes that the graph serves, the fact that >literals can be XML-things is sorta like the fact that >strings are over the unicode alphabet; it's pretty opaque; >you don't need to consider the model-theoretic impact >of each part of the XML grammar any more than you need >to consider each unicode character separately. The model theory doesn't need to go into the details, but an actual RDF engine cannot recognize well-formed RDF graphs without using a full XML parser. Which means that it is now disingenuous to say that RDF/XML is a lexicalization syntax for RDF graphs; the XML encodes the graph which includes the XML, so its just a roundabout way of encoding XML. (This was brought home to me by Patrick's idea, expressed in a discussion at the F2F, to encode 'inner contexts' as XML/RDF literals. That seemed insane to me at the time, since I was thinking of literals as semantically opaque; but with this view of the RDF graph it makes perfect sense. By the way, it also provides a neat way to do reification better than by using reification. :-) > > Second, (4) and (5) interact in odd ways. Notice that the meaning of >> the lang tag changes according to the bit setting. Suppose that the >> second string is, in fact, well-formed XML and the lang tag is "FR". >> If the bit is set, then tout est bien; but if the bit is not set, >> then this combination is a disaster, since that well-formed XML is a >> mere string, its resemblance to XML a mere accident, and then the > > lang tag is claiming that it is a French text, which of course (being >> well-formed XML) it is not, in fact. I am not sure what an RDF engine >> is supposed to do at this point: would it be expected to reject this >> as an incoherent literal? > >No. Don't think of it as "french text"; think of it as >"unicode text with a french-language label hanging off the side". OK, got that now. <snip> > > ------ >> >> I gather that the motivation for this decision was twofold: the bit >> is there to record the parsetype being XML, to allow accurate XML > > round-tripping; and the lang tag is needed by some RDF users. (I find >> myself sceptical that the bit is actually needed: would it really be >> a disaster if a string which just happened to be legal XML, but >> hadn't been parsed from XML input, was accidentally misreported as >> being true XML? > >yes. Absolutely. Just out of interest, now, can you tell me why? Like, what would be an example of a Bad Thing that could happen here? (What are the odds of a piece of well-formed XML just arising accidentally ??) Pat -- --------------------------------------------------------------------- IHMC (850)434 8903 home 40 South Alcaniz St. (850)202 4416 office Pensacola, FL 32501 (850)202 4440 fax phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Friday, 28 June 2002 12:31:24 UTC