Re: first pass parseType="Literal" text for primer from Martin Duerst on 2003-07-29 (w3c-rdfcore-wg@w3.org from July 2003)

From: Martin Duerst <duerst@w3.org>
Date: Tue, 29 Jul 2003 17:21:14 -0400
To: Brian McBride <bwm@hplb.hpl.hp.com>, Graham Klyne <gk@ninebynine.org>
Cc: rdf core <w3c-rdfcore-wg@w3.org>, i18n <w3c-i18n-ig@w3.org>
Message-Id: <4.2.0.58.J.20030729164148.068d9cf0@localhost>
Hello Brian,

I think it is important to stay as close as possible to the
actual specs, so I appreciate David's corrections, but I think
they are irrelevant to your main point at hand, and so I'm
answering to your original mail with the assumption that
you had already made these corrections (except that I have
changed it to <br>, because I agree it is good to have
something suggestive, but I think <em/> is a bad example
for an empty element). (and <br/> might actually be a way
to model book titles with multiple, clearly distinguished
lines).

At 10:45 03/07/29 +0100, Brian McBride wrote:

>On Tue, 2003-07-29 at 00:35, Graham Klyne wrote:
>
>[...]
>
> > >Their representations are different. But why do their denotations
> > >have to be different?
>
>[...]
>
> > >But note that we are not speaking about changing the interpretation
> > >of something by changing from plain literal to XML literal, we are
> > >speaking about two different representations ((1) and (2)) that
> > >could/should denote the same string of characters.
> >
> > I thought about that, but within the current scheme couldn't see any 
> way to
> > make it work, for the reason noted above.
> >
> > I don't think I've anything more constructive to add at this stage, and
> > should back off.  Maybe someone else can see a way past the block that I
> > perceive?
>
>I haven't followed this thread in detail, so may be off base, but in
>this last message I see something that may be relevant.
>
>Martin's use of the terms "interpretation" and denotation may be
>different to ours, as suggested elsewhere by Pat.

I have tried to address that in a reply to Pat.


>Perhaps we have not
>made clear, what I'll loosely call the substition rule.
>
>Martin, the issue is one of round tripping.  Given the following input:
>
><rdf:Description>
>   <eg:prop rdf:parseType="Literal"><br/></eg:prop>
></rdf:Description>
>
>If the xml literal *denotes* the character string "<br/>", and in,
>
><rdf:Description>
>   <eg:prop>&lt;br/&gt;</eg:prop>
></rdf:Description>
>
>the plain literal also denotes the character string "<br/>", then it is
>legal for an RDF processor to substitute one for another, e.g. an RDF
>copy program could read the first of these and write the second because
>their semantics, according to RDF, are exactly the same.
>
>I don't think anyone wants that.

I agree, I definitely wouldn't want that either.


>Martin: does the "substitution rule" explain why the denotations must be
>different?

It explains why the denotations must be different in this case.
It does not explain why the denotations must be different in
all cases.


>Now, Martin may have spotted something we have missed.  I suggest a way
>to describe this clearly.  We have three concepts, concrete syntax
>(rdf/xml), abstract syntax (closest rep is n-triples) and denotation.
>This can be described in a three column table, concrete syntax, abstract
>syntax and denotation.
>
>I think what we have at the moment is:
>
>Concrete Syntax                 | Abstract Syntax        | Denotation
>-----------------------------------------------------------------
><eg:prop>a</eg:prop>            | "a"                    | "a"
><eg:prop>&lt;br/&gt;</eg:prop>  | "<br/>"                | "<br/>"
><eg:prop pt="L"><br/></eg:prop> | "<br></br>^^rdf:XML    | C("<br/>")
><eg:prop pt="L">&amp;</eg:prop> | "&"^^rdf:XML           | C("&")
>
>I've abbreviated rdf:parseType="Literal" to pt="L" to fit on one line.

I've abbreviated rdf:XMLLiteral, too.


>C(x) is cannonicalization of x, encoded as a UTF8 octet sequence, e.g.
>C("&") is the octet sequence corresponding to "&amp;".

This is dangerous, because you have missed one escaping level.
What is C("&amp;")? If this works the same as C("<br/>"), it
should be "&amp;" (in UTF-8 if we still keep that, but I think
Patrick said that was removed). But I think again this is just
a detail. Let me use a somewhat more extensive, ad-hoc, notation,
and some more examples, to show what I mean.

(1)
Concrete Syntax: <eg:prop>a</eg:prop>

Abstract Syntax: "a"

Denotation:
     sequence(character('a'))

(2)
Concrete Syntax: <eg:prop>&lt;br/&gt;</eg:prop>

Abstract Syntax: "<br/>"

Denotation:
     sequence(character('<'), character('b'), character('r'),
          character('/'), character('>'))

(3)
Concrete Syntax: <eg:prop pt="L">&lt;br/&gt;</eg:prop>
(additional example)

Abstract Syntax: "&lt;br/&gt;"^^rdf:XML

Denotation:
     sequence(character('<'), character('b'), character('r'),
          character('/'), character('>'))

(4)
Concrete Syntax: <eg:prop pt="L"><br/></eg:prop>

Abstract Syntax: "<br></br>"^^rdf:XML

Denotation:
     sequence(markup('<br>'), markup('</br>'))


(5)
Concrete Syntax: <eg:prop pt="L">&amp;</eg:prop>

Abstract Syntax: "&amp"^^rdf:XML

Denotation:
     sequence(character('&'))

(6)
Concrete Syntax: <eg:prop>&amp;</eg:prop>

Abstract Syntax: "&"

Denotation:
     sequence(character('&'))

(7)
Concrete Syntax: <eg:prop pt="L"><em>Wow!</em></eg:prop>

Abstract Syntax: "<em>Wow!</em>"^^rdf:XML

Denotation:
     sequence(markup('<em>'), character('W'), character('o'),  character('w'),
         character('!'), markup('</em>'))

So the denotation for XML literals is a sequence of characters and markup.
Element content (including character escapes in canonical XML) becomes
characters; all the rest (start tags, end tags, PIs, comments,...)
becomes markup. We could refine this denotation to approximate the
infoset, but even the notation provided here is probably too much
detail for the spec.


>The relationship between the abstract syntax and the denotation must be
>functional - i.e. there is only one denotation (per interpretation) for
>any given fragment of abstract syntax.

I don't understand the 'per interpretation' bit, but I think the
system I have lined out above is indeed functional.

Please note that I don't think that the RDF Core WG is in any way
required to change to the above system, because neither I nor the
I18N WG have made a last call comment in this direction.

I hope however that this helps you understand why the issue of
consistent handling of language information on plain literals
and XML Literals is so important: If we want to have any chance
to recover the equivalence between (2) ad (3) above, or more generally,
between simple strings without markup in plain literals and XML literals,
then the different solutions for language information for plain
literals and XML Literals, and in particular the additional '<dummy>'
elements needed to put on language information, make a huge difference.


>The point I made above, is that for any two rows with the same value in
>the denotation column, it doesn't matter what form of concrete syntax
>one uses - they have equivalent meaning and one can be freely
>substituted for the other.

Yes, I understand that. That should still work in my system.


Regards,    Martin.
Received on Tuesday, 29 July 2003 18:01:58 UTC