W3C home > Mailing lists > Public > w3c-rdfcore-wg@w3.org > June 2002

Re: literals, again.

From: pat hayes <phayes@ai.uwf.edu>
Date: Fri, 28 Jun 2002 11:28:40 -0500
Message-Id: <p05111b30b94237df2a6b@[65.217.30.113]>
To: Dan Connolly <connolly@w3.org>
Cc: w3c-rdfcore-wg@w3.org

>  >
>>  1. Literals are not strings.
>
>You keep saying that, but what if you say: Literals are,
>sometimes, strings. Sometimes they're not.

Yuk. Having alternatives in a semantics is even worse. Couldnt we say 
that having no lang tag is actually having a lang tag of "none" ?

>Think of Literal as a union of
>String, String X Lang, XML-thing, XML-thing X Lang.
>(XML-things happen to have string serializations)

And can we say that Lang is a string? Then literals are pairs <x,y> 
where x is either a string or an XML structure and y is a string from 
a finite list of lang strings. Yuck but doable.

>
>Or think of String as a subsort of Literal,
>if sorted logics is an appealing analogy...
>
>>  2. Literals are 3-tuples, consisting of a bit and two strings, the
>>  second of which is an XML lang tag.
>>
>>  There are some constraints on these three parts of a literal.
>>
>>  3. If the bit is set (Im not sure if that is a one or a zero) then
>>  the second string is required to be well-formed XML.
>>  4. If the bit is set, then the lang tag is understood to be the lang
>>  tag of the second string.
>>  5. If the bit is not set, then the lang tag is understood to indicate
>>  that the second string is an expression of that language.
>
>The lang tag works the same way whether the bit is set or not:
>it's just an optional language tag.

OK, I was basing the comments on what was told me at the F2F. I guess 
it all depends on what 'language tag' is supposed to mean when 
attached to a string. Strict MT answer: nothing, you are saying.

>  > This has some curious consequences.
>>
>>  First, (3) has the consequence that any RDF engine must include an
>>  XML parser, even if it is not expected to parse RDF/XML.
>
>Yes, this is very unaesthetic, to me.
>
>>  The graph
>>  syntax has all of XML syntax incorporated into it. This seems to me
>>  to be the most serious problem. Under these circumstances it hardly
>>  seems worth having the graph syntax: it would be better to just
>>  identify RDF with RDF/XML and stop pretending.
>
>No, for the purposes that the graph serves, the fact that
>literals can be XML-things is sorta like the fact that
>strings are over the unicode alphabet; it's pretty opaque;
>you don't need to consider the model-theoretic impact
>of each part of the XML grammar any more than you need
>to consider each unicode character separately.

The model theory doesn't need to go into the details, but an actual 
RDF engine cannot recognize well-formed RDF graphs without using a 
full XML parser. Which means that it is now disingenuous to say that 
RDF/XML is a lexicalization syntax for RDF graphs; the XML encodes 
the graph which includes the XML, so its just a roundabout way of 
encoding XML. (This was brought home to me by Patrick's idea, 
expressed in a discussion at the F2F, to encode 'inner contexts' as 
XML/RDF literals. That seemed insane to me at the time, since I was 
thinking of literals as semantically opaque; but with this view of 
the RDF graph it makes perfect sense. By the way, it also provides a 
neat way to do reification better than by using reification. :-)

>  > Second, (4) and (5) interact in odd ways. Notice that the meaning of
>>  the lang tag changes according to the bit setting. Suppose that the
>>  second string is, in fact, well-formed XML and the lang tag is "FR".
>>  If the bit is set, then tout est bien; but if the bit is not set,
>>  then this combination is a disaster, since that well-formed XML is a
>>  mere string, its resemblance to XML a mere accident, and then the
>  > lang tag is claiming that it is a French text, which of course (being
>>  well-formed XML) it is not, in fact. I am not sure what an RDF engine
>>  is supposed to do at this point: would it be expected to reject this
>>  as an incoherent literal?
>
>No. Don't think of it as "french text"; think of it as
>"unicode text with a french-language label hanging off the side".

OK, got that now.

<snip>
>  > ------
>>
>>  I gather that the motivation for this decision was twofold: the bit
>>  is there to record the parsetype being XML, to allow accurate XML
>  > round-tripping; and the lang tag is needed by some RDF users. (I find
>>  myself sceptical that the bit is actually needed: would it really be
>>  a disaster if a string which just happened to be legal XML, but
>>  hadn't been parsed from XML input, was accidentally misreported as
>>  being true XML?
>
>yes. Absolutely.

Just out of interest, now, can you tell me why? Like, what would be 
an example of a Bad Thing that could happen here? (What are the odds 
of a piece of well-formed XML just arising accidentally ??)

Pat
-- 
---------------------------------------------------------------------
IHMC					(850)434 8903   home
40 South Alcaniz St.			(850)202 4416   office
Pensacola,  FL 32501			(850)202 4440   fax
phayes@ai.uwf.edu 
http://www.coginst.uwf.edu/~phayes
Received on Friday, 28 June 2002 12:31:24 EDT

This archive was generated by hypermail pre-2.1.9 : Wednesday, 3 September 2003 09:49:28 EDT