- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 8 Sep 2003 16:29:10 +0300
- To: <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
- Cc: <duerst@w3.org>, <ishida@w3.org>, <w3c-i18n-ig@w3.org>, <Patrick.Stickler@nokia.com>
> -----Original Message-----
> From: ext Brian McBride [mailto:bwm@hplb.hpl.hp.com]
> Sent: 08 September, 2003 14:08
> To: rdf core
> Cc: Martin Duerst; Richard Ishida; i18n
> Subject: I18N Issue alternative: collapsing plain and xml literals
>
>
>
> After discussing this informally over lunch, Danbri asked me to send it
> to the list to make our consideration of it explicit.
>
> This is an alternative design for literals. The idea is to drop the
> rdf:XMLLiteral datatype and allow plain literals to contain markup. Two
> test cases illustrate:
>
> <rdf:Description>
> <eg:prop>foo &lt;br /&gt; bar</eg:prop>
> </rdf:Description>
>
> parses to:
>
> _:a eg:prop "foo &lt;br /&gt; bar" .
This is not what I would expect, given how XML works.
I would expect
_:a eg:prop "foo <br /> bar" .
If you wanted
_:a eg:prop "foo &lt;br /&gt; bar" .
you'd need to say
<rdf:Description>
<eg:prop>foo &amp;lt;br /&amp;gt; bar</eg:prop>
</rdf:Description>
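A minimal sketch of the XML behaviour referred to here, assuming Python's standard xml.etree.ElementTree parser. The namespace URI bound to "eg" and the helper name property_text are hypothetical, chosen only for illustration; nothing here is taken from the proposal itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "eg" prefix; the thread does not give one.
EG_NS = "http://example.org/eg#"


def property_text(inner_xml: str) -> str:
    """Parse <eg:prop>inner_xml</eg:prop> and return its character content."""
    doc = f'<eg:prop xmlns:eg="{EG_NS}">{inner_xml}</eg:prop>'
    return ET.fromstring(doc).text


# An XML parser resolves entity references in character data,
# so the application sees "<" and ">" rather than "&lt;" and "&gt;".
single = property_text("foo &lt;br /&gt; bar")

# For the reference itself to survive parsing, its ampersand must be escaped.
double = property_text("foo &amp;lt;br /&amp;gt; bar")

print(single)
print(double)
```

With an ordinary XML parser, the application never sees the entity reference unless the ampersand is itself escaped in the source document.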
> <rdf:Description>
> <eg:prop rdf:parseType="Literal"><br /></eg:prop>
> </rdf:Description>
>
> parses to:
>
> _:a eg:prop "foo <br></br> bar" .
>
> The definition of a plain literal changes. The lexical space of plain
> literals becomes the lexical space of rdf:XMLLiteral, i.e. is restricted
> to (the Unicode representation of) canonicalised well-formed balanced
> XML markup. The denotation of a plain literal remains - it is a
> sequence of Unicode characters - permitting string comparison for
> equality testing.
I proposed something like this earlier, but got backlash at
the loss of distinction between things that are text+markup
versus things that simply look like text+markup.
Cf. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0165.html
> Advantages:
>
> I think this provides everything that Martin has been asking for:
>
> - no discontinuity between plain and xml literals
> - lang on mixed content
> - no use of datatypes
>
> Disadvantages:
>
> - a bigger change than alternatives
> - builds XML into the core of the RDF model
???
Can you elaborate?
> - breaks current implementations (but see below)
>
> Ameliorating the Disadvantages - implementation strategy
>
> The above design says that e.g. "<" is not in the lexical space of plain
> literals,
Why not? How could you constrain the infinite set of strings in
such a fashion?
> and many (all?) current implementations will store
> "<" in their representation of a graph. The point to note is that
> implementations are free to represent literals any way they please.
> Thus "<" is just the way this implementation represents the
> literal "&lt;".
And how then do I represent the literal "&lt;", distinct from "<"???
Sorry, but I'm just gonzo confused...
> The implementation does need to distinguish between markup and plain
> text. To do this, it adds a single bit to literals to indicate whether
> they are stored in escaped or unescaped form. The above example was in
> unescaped form, which cannot represent markup. To represent markup, the
> literal must be stored in escaped form.
So, we've got the XML bit back?
> Literal comparison becomes more complex - literals stored in unescaped
> form should first be escaped and then canonicalized. Various
> optimization strategies can be employed here.
>
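One possible reading of the comparison strategy described above, as a rough Python sketch. The Literal class, its field names, and the wrapper-element trick are assumptions made for illustration. xml.etree.ElementTree.canonicalize (Python 3.8+) implements C14N 2.0 and stands in here for whatever canonicalisation the design would actually require.

```python
from dataclasses import dataclass
from xml.etree.ElementTree import canonicalize  # available from Python 3.8
from xml.sax.saxutils import escape


@dataclass(frozen=True, eq=False)
class Literal:
    """Hypothetical stored literal: raw text plus the 'single bit'."""
    text: str
    escaped: bool  # True: text is XML markup; False: text is plain characters

    def lexical_form(self) -> str:
        # Unescaped storage is escaped first ("<" becomes "&lt;") ...
        xml_text = self.text if self.escaped else escape(self.text)
        # ... then canonicalised. A throwaway wrapper element <r> is an
        # assumption that lets the canonicaliser accept mixed content.
        wrapped = canonicalize(f"<r>{xml_text}</r>")
        return wrapped[len("<r>"):-len("</r>")]

    def __eq__(self, other):
        if not isinstance(other, Literal):
            return NotImplemented
        # Equality is plain string comparison on the canonical lexical forms.
        return self.lexical_form() == other.lexical_form()

    def __hash__(self):
        return hash(self.lexical_form())


stored_plain = Literal("<", escaped=False)       # what implementations store today
stored_markup = Literal("&lt;", escaped=True)    # same literal, escaped storage
print(stored_plain == stored_markup)
print(Literal("foo <br /> bar", escaped=True).lexical_form())
```

Under these assumptions, the four-character string "&lt;" stored in unescaped form gets the lexical form "&amp;lt;", so it stays distinct from the one-character string "<".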
> By this strategy, it may be possible to argue that this approach does
> not break current implementations of plain literals. It simply makes
> clearer what xml literals are.
Well, it has certainly confused me...
Patrick
Received on Monday, 8 September 2003 09:29:34 UTC