- From: <Patrick.Stickler@nokia.com>
- Date: Mon, 8 Sep 2003 16:29:10 +0300
- To: <bwm@hplb.hpl.hp.com>, <w3c-rdfcore-wg@w3.org>
- Cc: <duerst@w3.org>, <ishida@w3.org>, <w3c-i18n-ig@w3.org>, <Patrick.Stickler@nokia.com>
> -----Original Message-----
> From: ext Brian McBride [mailto:bwm@hplb.hpl.hp.com]
> Sent: 08 September, 2003 14:08
> To: rdf core
> Cc: Martin Duerst; Richard Ishida; i18n
> Subject: I18N Issue alternative: collapsing plain and xml literals
>
> After discussing this informally over lunch, Danbri asked me to send it
> to the list to make our consideration of it explicit.
>
> This is an alternative design for literals. The idea is to drop the
> rdf:XMLLiteral datatype and allow plain literals to contain markup. Two
> test cases illustrate:
>
> <rdf:Description>
>   <eg:prop>foo <br /> bar</eg:prop>
> </rdf:Description>
>
> parses to:
>
> _:a eg:prop "foo <br /> bar" .

This is not what I would expect, given how XML works. I would expect

_:a eg:prop "foo <br /> bar" .

If you wanted

_:a eg:prop "foo <br /> bar" .

you'd need to say

<rdf:Description>
  <eg:prop>foo &lt;br /&gt; bar</eg:prop>
</rdf:Description>

> <rdf:Description>
>   <eg:prop rdf:parseType="Literal"><br /></eg:prop>
> </rdf:Description>
>
> parses to:
>
> _:a eg:prop "foo <br></br> bar" .
>
> The definition of a plain literal changes. The lexical space of plain
> literal becomes the lexical space of rdf:XMLLiteral, i.e. is restricted
> to (the unicode representation of) canonicalised well formed balanced
> xml markup. The denotation of a plain literal remains - it is a
> sequence of unicode characters - permitting string comparison for
> equality testing.

I proposed something like this earlier, but got backlash at the
loss of distinction between things that are text+markup versus
things that simply look like text+markup.

Cf. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0165.html

> Advantages:
>
> I think this provides everything that Martin has been asking for:
>
> - no discontinuity between plain and xml literals
> - lang on mixed content
> - no use of datatypes
>
> Disadvantages:
>
> - a bigger change than alternatives
> - builds XML into the core of the RDF model

??? Can you elaborate?

> - breaks current implementations (but see below)
>
> Ameliorating the Disadvantages - implementation strategy
>
> The above design says that e.g. "<" is not in the lexical space of plain
> literals,

Why not? How could you constrain the infinite set of strings
in such a fashion?

> and many (all?) current implementations will store
> "<" in their representation of a graph. The point to note is that
> implementations are free to represent literals any way they please.
> Thus "<" is just the way this implementation represents the
> literal "<".

And how then do I represent the literal "<", distinct from "<"???

Sorry, but I'm just gonzo confused...

> The implementation does need to distinguish between markup and plain
> text. To do this, it adds a single bit to literals to indicate whether
> they are stored in escaped or unescaped form. The above example was in
> unescaped form, which cannot represent markup. To represent markup, the
> literal must be stored in escaped form.

So, we've got the XML bit back?

> Literal comparison becomes more complex - literals stored in unescaped
> form should first be escaped and then canonicalized. Various
> optimization strategies can be employed here.
>
> By this strategy, it may be possible to argue that this approach does
> not break current implementations of plain literals. It simply makes
> clearer what xml literals are.

Well, it has certainly confused me...

Patrick
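
For readers trying to follow the implementation strategy quoted above, here is one possible reading of it as a minimal Python sketch. It is not taken from the proposal or from any RDF toolkit: the names StoredLiteral, lexical_form and same_literal are hypothetical, the wrapper element <w> is only a device for canonicalizing a markup fragment, and Python's C14N 2.0 routine (Python 3.8 or later) stands in for whatever canonicalization the proposal intends.

```python
from dataclasses import dataclass
from xml.etree.ElementTree import canonicalize  # C14N 2.0, Python 3.8+
from xml.sax.saxutils import escape


@dataclass(frozen=True)
class StoredLiteral:
    """Hypothetical stored literal: the text plus the single 'escaped' bit."""
    text: str
    escaped: bool  # True: text is already XML markup; False: raw characters

    def lexical_form(self) -> str:
        # Unescaped storage is escaped first (&, <, > become entities).
        markup = self.text if self.escaped else escape(self.text)
        # Canonicalize by wrapping the fragment in a throwaway root element
        # (an assumption of this sketch), then strip the wrapper again.
        wrapped = canonicalize("<w>" + markup + "</w>")
        return wrapped[len("<w>"):-len("</w>")]


def same_literal(a: StoredLiteral, b: StoredLiteral) -> bool:
    """Compare two stored literals by their canonical escaped form."""
    return a.lexical_form() == b.lexical_form()


# The same markup in two spellings compares equal after canonicalization.
print(same_literal(StoredLiteral("foo <br /> bar", True),
                   StoredLiteral("foo <br></br> bar", True)))
# A raw "<" stored unescaped matches "&lt;" stored escaped.
print(same_literal(StoredLiteral("<", False), StoredLiteral("&lt;", True)))
# Raw text that merely looks like markup does not match real markup.
print(same_literal(StoredLiteral("foo <br /> bar", False),
                   StoredLiteral("foo <br /> bar", True)))
```

Under this reading the stored bit is what separates text+markup from text that only looks like markup, which is the distinction the questions above are asking about.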
Received on Monday, 8 September 2003 09:29:34 UTC