RE: I18N Issue alternative: collapsing plain and xml literals from Patrick.Stickler@nokia.com on 2003-09-08 (w3c-rdfcore-wg@w3.org from September 2003)

From: <Patrick.Stickler@nokia.com>
Date: Mon, 8 Sep 2003 16:29:10 +0300
To: <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
Cc: <duerst@w3.org>, <ishida@w3.org>, <w3c-i18n-ig@w3.org>, <Patrick.Stickler@nokia.com>
Message-ID: <A03E60B17132A84F9B4BB5EEDE57957B02A2E958@trebe006.europe.nokia.com>

> -----Original Message-----
> From: ext Brian McBride [mailto:bwm@hplb.hpl.hp.com]
> Sent: 08 September, 2003 14:08
> To: rdf core
> Cc: Martin Duerst; Richard Ishida; i18n
> Subject: I18N Issue alternative: collapsing plain and xml literals
> 
> 
> 
> After discussing this informally over lunch, Danbri asked me to send it
> to the list to make our consideration of it explicit.
> 
> This is an alternative design for literals.  The idea is to drop the
> rdf:XMLLiteral datatype and allow plain literals to contain markup.  Two
> test cases illustrate:
> 
> <rdf:Description>
>    <eg:prop>foo &lt;br /&gt; bar</eg:prop>
> </rdf:Description>
> 
> parses to:
> 
> _:a eg:prop "foo &lt;br /&gt; bar" .

This is not what I would expect, given how XML works.

I would expect

  _:a eg:prop "foo <br /> bar" .

If you wanted

 _:a eg:prop "foo &lt;br /&gt; bar" .

you'd need to say

 <rdf:Description>
    <eg:prop>foo &amp;lt;br /&amp;gt; bar</eg:prop>
 </rdf:Description>
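To make the point about XML parsing concrete, here is a small check with Python's standard ElementTree parser (used only as a stand-in for an RDF/XML parser; the element name `prop` replaces `eg:prop` just to avoid declaring a namespace). One level of escaping comes back from the parser as "<" and ">", and only double escaping yields the literal characters "&lt;":

```python
import xml.etree.ElementTree as ET


def element_text(xml: str) -> str:
    """Return the character data the parser reports for a single-element document."""
    return ET.fromstring(xml).text


# One level of escaping: the parser turns &lt; and &gt; back into < and >.
once = element_text("<prop>foo &lt;br /&gt; bar</prop>")

# Two levels of escaping: &amp;lt; becomes the four characters "&lt;".
twice = element_text("<prop>foo &amp;lt;br /&amp;gt; bar</prop>")

print(repr(once))   # 'foo <br /> bar'
print(repr(twice))  # 'foo &lt;br /&gt; bar'
```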

> <rdf:Description>
>    <eg:prop rdf:parseType="Literal">foo <br /> bar</eg:prop>
> </rdf:Description>
> 
> parses to:
> 
> _:a eg:prop "foo <br></br> bar" .
> 
> The definition of a plain literal changes.  The lexical space of plain
> literal becomes the lexical space of rdf:XMLLiteral, i.e. is restricted
> to (the unicode representation of) canonicalised well formed balanced
> xml markup.  The denotation of a plain literal remains - it is a
> sequence of unicode characters - permitting string comparison for
> equality testing.

I proposed something like this earlier, but got pushback over
the loss of the distinction between things that are text+markup
versus things that simply look like text+markup.

C.f. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0165.html
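For reference, the restriction Brian proposes (plain literal lexical forms limited to canonicalised, well-formed, balanced XML content) could be sketched as a membership test like the one below. This is only an illustration under stated assumptions: the wrapper element `w` is a hypothetical device for parsing content fragments, Python's `ElementTree.canonicalize` (C14N 2.0) stands in for the Exclusive XML Canonicalization that rdf:XMLLiteral actually uses, and undeclared namespace prefixes are simply rejected.

```python
import xml.etree.ElementTree as ET

WRAPPER = "w"  # hypothetical wrapper element, used only to parse content fragments


def canonical_content(s: str) -> str:
    """Canonicalize s as XML element content.

    Raises ET.ParseError if s is not well-formed, balanced markup.
    """
    # Wrapping lets plain text, elements and mixed content all parse as one document.
    doc = ET.canonicalize(f"<{WRAPPER}>{s}</{WRAPPER}>")
    # Drop the wrapper's own start tag "<w>" and end tag "</w>".
    return doc[len(WRAPPER) + 2 : -(len(WRAPPER) + 3)]


def in_lexical_space(s: str) -> bool:
    """True if s is already canonical, well-formed, balanced XML content."""
    try:
        return canonical_content(s) == s
    except ET.ParseError:
        return False


print(in_lexical_space("foo &lt;br /&gt; bar"))  # True
print(in_lexical_space("foo <br></br> bar"))      # True
print(in_lexical_space("foo <br /> bar"))         # False (not canonical)
print(in_lexical_space("<"))                      # False (not well-formed)
```

Under this test a bare "<" is not in the lexical space at all, which is the restriction in question; the canonical form of "foo <br /> bar" is "foo <br></br> bar", matching the second test case.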


> Advantages:
> 
> I think this provides everything that Martin has been asking for:
> 
>    - no discontinuity between plain and xml literals
>    - lang on mixed content
>    - no use of datatypes
> 
> Disadvantages:
> 
> - a bigger change than alternatives
> - builds XML into the core of the RDF model

???

Can you elaborate?

> - breaks current implementations (but see below)
> 
> Ameliorating the Disadvantages - implementation strategy
> 
> The above design says that e.g. "<" is not in the lexical space of plain
> literals,

Why not? How could you constrain the infinite set of strings in 
such a fashion?

> and many (all?) current implementations will store
> "<" in their representation of a graph.  The point to note is that 
> implementations are free to represent literals any way they please. 
> Thus "<" is just the way this implementation represents the 
> literal "&lt;".

And how then do I represent the literal "<", distinct from "&lt;"???

Sorry, but I'm just gonzo confused...

> The implementation does need to distinguish between markup and plain
> text.  To do this, it adds a single bit to literals to indicate whether
> they are stored in escaped or unescaped form.  The above example was in
> unescaped form, which cannot represent markup.  To represent markup, the
> literal must be stored in escaped form.

So, we've got the XML bit back?

> Literal comparison becomes more complex - literals stored in unescaped
> form should first be escaped and then canonicalized.  Various
> optimization strategies can be employed here.
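The storage scheme described here (one escaped/unescaped bit per literal, with comparison done by escaping and then canonicalizing) might look roughly like the following minimal Python sketch. `StoredLiteral`, `lexical_form` and `same_literal` are hypothetical names, `xml.sax.saxutils.escape` performs the escaping step, and C14N 2.0 again stands in for exclusive canonicalization; no optimization is attempted.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class StoredLiteral:
    """Hypothetical in-memory literal: the stored text plus the single extra bit."""
    text: str
    escaped: bool  # False: plain characters, cannot hold markup; True: XML content


def _canonical_content(xml: str) -> str:
    # Wrap in a throwaway <w> element, canonicalize, then strip "<w>" and "</w>".
    return ET.canonicalize(f"<w>{xml}</w>")[3:-4]


def lexical_form(lit: StoredLiteral) -> str:
    """Escape an unescaped literal first, then canonicalize.

    Raises ET.ParseError if an escaped literal is not well-formed content.
    """
    xml = lit.text if lit.escaped else escape(lit.text)
    return _canonical_content(xml)


def same_literal(a: StoredLiteral, b: StoredLiteral) -> bool:
    """Literals are equal when their canonical lexical forms are equal."""
    return lexical_form(a) == lexical_form(b)


lt_plain = StoredLiteral("<", escaped=False)
lt_xml = StoredLiteral("&lt;", escaped=True)
markup = StoredLiteral("foo <br /> bar", escaped=True)
text = StoredLiteral("foo <br /> bar", escaped=False)

print(same_literal(lt_plain, lt_xml))  # True
print(same_literal(markup, text))      # False
print(lexical_form(markup))            # foo <br></br> bar
```

Under this scheme an unescaped "<" and an escaped "&lt;" compare equal, i.e. they are one and the same literal, while the escaped and unescaped forms of "foo <br /> bar" remain distinct.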
> 
> By this strategy, it may be possible to argue that this approach does
> not break current implementations of plain literals.  It simply makes
> clearer what xml literals are.

Well, it has certainly confused me...

Patrick

Received on Monday, 8 September 2003 09:29:34 UTC