Re: literals, again.

From: Dan Connolly <connolly@w3.org>
Date: 28 Jun 2002 01:53:12 -0500
To: pat hayes <phayes@ai.uwf.edu>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <1025247193.22162.212.camel@dirk>

In sum: I don't see any new information that motivates
re-opening the issue, though I do agree with pretty
much all of this. I do have a suggestion for how
to handle some specifics of literals in the
model theory.

With apologies for piping up in this thread but
not in

  Bad job on literals? Sergey Melnik (Tue, May 21 2002)
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002May/thread.html#71

on to the details:

On Thu, 2002-06-27 at 13:23, pat hayes wrote:
> Apologies to all for reopening this business, but Ive been descending 
> into a black hole on this issue since the F2F. I really, really do 
> not like the situation we seem to be in here regarding literals, for 
> a whole lot of reasons. This message tries to enumerate them and to 
> suggest a possible solution. I know this has been done before and 
> apologize in advance for missing the obvious objections that have 
> probably already been raised.
> 
> Summary: literals ought to be strings, not 3-tuples. To achieve this, 
> literals should be allowed in subject position. If we do this, a 
> number of issues are clarified and the overall design of the language 
> is rationalized and simplified, at almost no cost. Also, 
> datastructuring suddenly gets easier.


That's my preference as well... I elaborated in:

  parseType="Literal" as syntactic sugar for infoset description
  (#rdfms-literal-is-xml-structure)
  From: Dan Connolly (connolly@w3.org)
  Date: Wed, Oct 10 2001
  http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0153.html

But I have come to grudgingly accept the status quo; let
me explain how I look at it; maybe you'll find it
acceptable too, then...

> --------
> 
> First let me summarize my understanding of the current state of play.
> 
> 1. Literals are not strings.

You keep saying that, but what if you say: Literals are,
sometimes, strings. Sometimes they're not.

Think of Literal as a union of
String, String X Lang, XML-thing, XML-thing X Lang.
(XML-things happen to have string serializations)

Or think of String as a subsort of Literal,
if sorted logic is an appealing analogy...
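
A minimal sketch of that union view, in Python. The class names
(XmlThing, Tagged) and the helper is_plain_string are illustrative
assumptions, not names from any RDF draft; an XML-thing is modelled
here only by its string serialization.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class XmlThing:
    """An XML-thing, represented here only by its string serialization."""
    serialization: str


@dataclass(frozen=True)
class Tagged:
    """A String or XML-thing paired with a language tag (the 'X Lang' cases)."""
    body: Union[str, XmlThing]
    lang: str


# Literal = String | String X Lang | XML-thing | XML-thing X Lang
Literal = Union[str, XmlThing, Tagged]


def is_plain_string(lit: Literal) -> bool:
    """True only for the String member of the union."""
    return isinstance(lit, str)
```

Under this reading a plain string such as "10" is itself a Literal,
while Tagged("chat", "fr") is a different member of the same union.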

> 2. Literals are 3-tuples, consisting of a bit and two strings, the 
> second of which is an XML lang tag.
> 
> There are some constraints on these three parts of a literal.
> 
> 3. If the bit is set (Im not sure if that is a one or a zero) then 
> the second string is required to be well-formed XML. 
> 4. If the bit is set, then the lang tag is understood to be the lang 
> tag of the second string.
> 5. If the bit is not set, then the lang tag is understood to indicate 
> that the second string is an expression of that language.

The lang tag works the same way whether the bit is set or not:
it's just an optional language tag.


> This has some curious consequences.
> 
> First, (3) has the consequence that any RDF engine must include an 
> XML parser, even if it is not expected to parse RDF/XML.

Yes, this is very unaesthetic, to me.

> The graph 
> syntax has all of XML syntax incorporated into it. This seems to me 
> to be the most serious problem. Under these circumstances it hardly 
> seems worth having the graph syntax: it would be better to just 
> identify RDF with RDF/XML and stop pretending.

No, for the purposes that the graph serves, the fact that
literals can be XML-things is sorta like the fact that
strings are over the Unicode alphabet; it's pretty opaque;
you don't need to consider the model-theoretic impact
of each part of the XML grammar any more than you need
to consider each Unicode character separately.


> Second, (4) and (5) interact in odd ways. Notice that the meaning of 
> the lang tag changes according to the bit setting. Suppose that the 
> second string is, in fact, well-formed XML and the lang tag is "FR". 
> If the bit is set, then tout est bien; but if the bit is not set, 
> then this combination is a disaster, since that well-formed XML is a 
> mere string, its resemblance to XML a mere accident, and then the 
> lang tag is claiming that it is a French text, which of course (being 
> well-formed XML) it is not, in fact. I am not sure what an RDF engine 
> is supposed to do at this point: would it be expected to reject this 
> as an incoherent literal?

No. Don't think of it as "French text"; think of it as
"Unicode text with a French-language label hanging off the side".
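
To make that concrete, here is a small Python sketch. TaggedString is
a hypothetical name, and the point is only that nothing checks the
text against the label: the pair is compared component by component
and the text is never parsed.

```python
from typing import NamedTuple


class TaggedString(NamedTuple):
    text: str  # Unicode text, treated as opaque
    lang: str  # language label carried alongside the text


# Text that happens to look like XML, labelled "fr": still just a pair.
looks_like_xml = TaggedString("<p>bonjour</p>", "fr")
same_text_other_label = TaggedString("<p>bonjour</p>", "en")
```

The two values differ only in the label, so they are distinct
literals; neither is rejected as incoherent.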


> Third, since we have decided that literals denote themselves, this 
> construction means that these ad-hoc 3-tuples must be in the semantic 
> domain of any RDF interpretation, and hence of any interpretation of 
> any language which extends RDF.

Ah; good point; I thought the difference between 3-tuples
and the union notion above was stylistic, so I didn't
make much noise about it previously, but here it
has teeth.

I suggest we say that IR includes
String, String X Lang, XML-thing, and XML-thing X Lang.

(And yes, I find it barely acceptable to include
String X Lang and XML-thing X Lang. Ugh! We can
use RDF properties for all sorts of things, except
language tags. Go figure.)

> This is at best an unattractive 
> consequence, and it might be disastrous if other languages expect to 
> handle literals differently (as I am sure they will). It seems 
> particularly weird to incorporate XML *syntax* into the semantic 
> domain of all web ontology languages.

Hmm... no more than including lists in all KIF domains, to me.
(well... not *much* more; lists are so much more obviously
minimal...)

As I said in my Oct 10 message, I'd prefer to build
XML-things out of RDF properties, but I can live
with just assuming they're in the domain.

> Notice that we do not have the 
> option of making these 3-tuples 'optional' once they are in the 
> semantic domain.
> 
> Further to the previous point, datatyping simply does not work. All 
> the datatyping proposals so far considered - ALL of them - have been 
> predicated on the assumption that literals are members of the lexical 
> space of XML datatypes. But 3-tuples consisting of a bit and two 
> strings are not in the lexical space of any XML datatype. So we 
> simply cannot do datatyping in RDF at present, seems to me, with 
> literals the way they are.

Seems easy enough to gloss over, by looking at the Literal
part of the domain as including Strings.
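
One possible reading of that gloss, sketched in Python under loud
assumptions: a non-String literal is stood in for by a (body, lang)
pair, and lexical_to_value is a hypothetical helper that applies a
datatype's lexical-to-value mapping only to the String members of
the Literal part of the domain.

```python
from typing import Callable, Optional, Tuple, Union

# Assumption: anything that is not a plain String is modelled as a pair.
Literal = Union[str, Tuple[str, str]]


def lexical_to_value(lit: Literal, l2v: Callable[[str], object]) -> Optional[object]:
    """Apply a datatype mapping only when the literal is a plain String."""
    if isinstance(lit, str):
        return l2v(lit)
    # Tagged literals are outside the lexical space in this sketch.
    return None
```

For example, lexical_to_value("10", int) maps the string to an
integer, while a pair such as ("10", "en") yields None.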


> Finally, there is a purely aesthetic reason which one might reject, 
> but it is worth mentioning: there is an obvious analogy between the 
> lang tag and a datatype, and it would be nice if the overall RDF 
> scheme of things preserved this analogy.

Quite. (i.e. no, this is not new information.)

> For all these reasons, I would suggest that the decision to make 
> literals (in the graph syntax) into things other than strings was a 
> very bad one, and should be reconsidered. (I would have said this a 
> long time ago if I had realized that this was in fact the decision 
> that had been taken, and Im sorry that I missed that in the weeks 
> following the Cannes meeting.)
> 
> ------
> 
> I gather that the motivation for this decision was twofold: the bit 
> is there to record the parsetype being XML, to allow accurate XML 
> round-tripping; and the lang tag is needed by some RDF users. (I find 
> myself sceptical that the bit is actually needed: would it really be 
> a disaster if a string which just happened to be legal XML, but 
> hadn't been parsed from XML input, was accidentally misreported as 
> being true XML?

Yes. Absolutely.

> This seems to me like worrying about the possibility 
> that monkeys might accidentally type out some Milton sonnets. But 
> never mind.)
> 
> Now, it seems to me that both of these are examples of information 
> *about* a literal string which needs to be recorded in an unambiguous 
> form. Since RDF is itself a language for asserting things about 
> things, the obvious way to record information in an RDF graph is to 
> use triples to make such assertions; but the obvious problem in doing 
> that is that the literal in question would naturally be the subject 
> of the triple, and literals cannot be subjects. Damn.

Again, I agree; but I don't see any new information that should
cause folks to change their minds.

[...]

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
Received on Friday, 28 June 2002 02:52:41 EDT
