- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 8 Sep 2003 16:29:10 +0300
- To: <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
- Cc: <duerst@w3.org>, <ishida@w3.org>, <w3c-i18n-ig@w3.org>, <Patrick.Stickler@nokia.com>
> -----Original Message-----
> From: ext Brian McBride [mailto:bwm@hplb.hpl.hp.com]
> Sent: 08 September, 2003 14:08
> To: rdf core
> Cc: Martin Duerst; Richard Ishida; i18n
> Subject: I18N Issue alternative: collapsing plain and xml literals
> 
> 
> 
> After discussing this informally over lunch, Danbri asked me 
> to send it 
> to the list to make our consideration of it explicit.
> 
> This is an alternative design for literals.  The idea is to drop the 
> rdf:XMLLiteral datatype and allow plain literals to contain 
> markup.  Two 
> test cases illustrate:
> 
> <rdf:Description>
>    <eg:prop>foo &lt;br /&gt; bar</eg:prop>
> </rdf:Description>
> 
> parses to:
> 
> _:a eg:prop "foo &lt;br /&gt; bar" .
This is not what I would expect, given how XML works.
I would expect
  _:a eg:prop "foo <br /> bar" .
If you wanted
  _:a eg:prop "foo &lt;br /&gt; bar" .
you'd need to say
  <rdf:Description>
     <eg:prop>foo &amp;lt;br /&amp;gt; bar</eg:prop>
  </rdf:Description>
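To illustrate the point about how XML works: a conforming XML parser undoes exactly one level of entity escaping in character content. The following is only a rough sketch using Python's standard xml.etree.ElementTree; the element name "prop" and the helper prop_text are made up for the illustration (an unprefixed element is used so that no namespace declaration is needed), and it says nothing about what an RDF parser should do under the proposal.

```python
import xml.etree.ElementTree as ET


def prop_text(xml_fragment: str) -> str:
    """Parse a one-element XML fragment and return its character content."""
    return ET.fromstring(xml_fragment).text


# One level of escaping in the source: the parser hands back literal
# angle brackets as ordinary characters.
once = prop_text("<prop>foo &lt;br /&gt; bar</prop>")

# Two levels of escaping in the source: the parser removes one level,
# so the entity references survive as text.
twice = prop_text("<prop>foo &amp;lt;br /&amp;gt; bar</prop>")

print(once)
print(twice)
```

In other words, getting entity references into the resulting string requires escaping the ampersands themselves in the source document.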
> <rdf:Description>
>    <eg:prop rdf:parseType="Literal">foo <br /> bar</eg:prop>
> </rdf:Description>
> 
> parses to:
> 
> _:a eg:prop "foo <br></br> bar" .
> 
> The definition of a plain literal changes.  The lexical space 
> of plain 
> literal becomes the lexical space of rdf:XMLLiteral, i.e. is 
> restricted 
> to (the unicode representation of) canonicalised well formed balanced 
> xml markup.  The denotation of a plain literal remains - it is a 
> sequence of unicode characters - permitting string comparison for 
> equality testing.
I proposed something like this earlier, but got backlash at
the loss of distinction between things that are text+markup
versus things that simply look like text+markup.
Cf. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0165.html
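To make the proposed restriction on the lexical space concrete, here is a rough sketch of a membership test for "well formed balanced xml markup". It is an approximation under stated assumptions: the function name and the wrapper element are hypothetical, the check is done by wrapping the string in a dummy element and handing it to Python's xml.etree.ElementTree, it does not check that the string is in canonical form, and it rejects any use of namespace prefixes because none are declared on the wrapper.

```python
import xml.etree.ElementTree as ET


def is_balanced_xml_content(s: str) -> bool:
    """Approximate test: is s well-formed, balanced XML element content?

    Assumption: wrapping s in a dummy element and parsing the result is a
    good enough stand-in for "balanced". Canonical form is NOT checked, and
    undeclared namespace prefixes cause a rejection.
    """
    try:
        ET.fromstring(f"<wrapper>{s}</wrapper>")
        return True
    except ET.ParseError:
        return False


print(is_balanced_xml_content("plain text"))          # no markup at all
print(is_balanced_xml_content("&lt;"))                # escaped less-than sign
print(is_balanced_xml_content("foo <br></br> bar"))   # balanced markup
print(is_balanced_xml_content("<"))                   # bare less-than sign
print(is_balanced_xml_content("foo <br> bar"))        # unbalanced markup
```

Under this reading, a bare less-than sign and unbalanced tags fall outside the lexical space, while escaped characters and balanced markup fall inside it.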
> Advantages:
> 
> I think this provides everything that Martin has been asking for:
> 
>    - no discontinuity between plain and xml literals
>    - lang on mixed content
>    - no use of datatypes
> 
> Disadvantages:
> 
> - a bigger change than alternatives
> - builds XML into the core of the RDF model
???
Can you elaborate?
> - breaks current implementations (but see below)
> 
> Ameliorating the Disadvantages - implementation strategy
> 
> The above design says that e.g. "<" is not in the lexical 
> space of plain 
> literals,
Why not? How could you constrain the infinite set of strings in 
such a fashion?
> and many (all?) current implementations will store
> "<" in their representation of a graph.  The point to note is that 
> implementations are free to represent literals any way they please. 
> Thus "<" is just the way this implementation represents the 
> literal "&lt;".
And how then do I represent the literal "&lt;", distinct from "<"???
Sorry, but I'm just gonzo confused...
> The implementation does need to distinguish between markup and plain 
> text.  To do this, it adds a single bit to literals to 
> indicate whether 
> they are stored in escaped or unescaped form.  The above 
> example was in 
> unescaped form, which cannot represent markup.  To represent 
> markup, the 
> literal must be stored in escaped form.
So, we've got the XML bit back?
> Literal comparison becomes more complex - literals stored in 
> unescaped 
> form should first be escaped and then canonicalized.  Various 
> optimization strategies can be employed here.
> 
> By this strategy, it may be possible to argue that this approach does 
> not break current implementations of plain literals.  It simply makes 
> clearer what xml literals are.
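As far as I can follow it, the implementation strategy quoted above amounts to something like the sketch below. Everything here is an assumption made for illustration: the names StoredLiteral, lexical_form and same_literal are hypothetical, escaping is done with xml.sax.saxutils.escape from the Python standard library, and canonicalization of markup is assumed to have happened already when an escaped-form literal was stored.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class StoredLiteral:
    text: str      # the characters as the implementation stores them
    escaped: bool  # the "single bit": True if text is already escaped markup

    def lexical_form(self) -> str:
        """Return the escaped form used for comparison.

        Unescaped storage is escaped on demand. Escaped storage is returned
        as is, on the assumption that it was canonicalized when stored.
        """
        return self.text if self.escaped else escape(self.text)


def same_literal(a: StoredLiteral, b: StoredLiteral) -> bool:
    """Compare two stored literals by their escaped lexical forms."""
    return a.lexical_form() == b.lexical_form()


# A bare "<" stored unescaped is the same literal as "&lt;" stored escaped.
plain_lt = StoredLiteral("<", escaped=False)
escaped_lt = StoredLiteral("&lt;", escaped=True)
print(same_literal(plain_lt, escaped_lt))

# Identical stored characters with different bits are different literals:
# one is markup, the other is text that merely looks like markup.
markup = StoredLiteral("foo <br></br> bar", escaped=True)
lookalike = StoredLiteral("foo <br></br> bar", escaped=False)
print(same_literal(markup, lookalike))
```

Note that the bit ends up carrying exactly the distinction between text+markup and text that looks like text+markup.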
Well, it has certainly confused me...
Patrick
Received on Monday, 8 September 2003 09:29:34 UTC