Re: I18N Issue alternative: collapsing plain and xml literals from Brian McBride on 2003-09-08 (w3c-rdfcore-wg@w3.org from September 2003)

From: Brian McBride <bwm@hplb.hpl.hp.com>
Date: Mon, 08 Sep 2003 17:22:13 +0100
To: Patrick.Stickler@nokia.com
Cc: w3c-rdfcore-wg@w3.org, duerst@w3.org, ishida@w3.org, w3c-i18n-ig@w3.org
Message-ID: <3F5CACB5.1050702@hplb.hpl.hp.com>

Patrick.Stickler@nokia.com wrote:

[...]

>>This is an alternative design for literals.  The  idea is to drop the 
>>rdf:XMLLiteral datatype and allow plain literals to contain 
>>markup.  Two 
>>test cases illustrate:
>>
>><rdf:Description>
>>   <eg:prop>foo &lt;br /&gt; bar</eg:prop>
>></rdf:Description
>>
>>parses to:
>>
>>_:a eg:prop "foo &lt;br /&gt; bar" .
> 
> 
> This is not what I would expect, given how XML works.

The idea is that literals in a graph are XML (fragments).  "<" is not a 
legal xml fragment, but "&lt;" is.

[...]


[...]

> 
> I proposed something like this earlier, but got backlash at
> the loss of distinction between things that are text+markup
> versus things that simply look like text+markup.
> 
> C.f. http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2003Jul/0165.html

Hmm, that's a bit different, but you do remind me of another issue that 
may be important to you.  In the design I described, xml literals are 
not a dataype, which wouldn't support the subclassing ideas you were 
proposing as a possible future extension.

[...]

>>
>>Disadvantages:
>>
>>- a bigger change than alternatives
>>- builds XML into the core of the RDF model
> 
> 
> ???
> 
> Can you elaborate?

All literals are required to balanced XML.  A graph that contained "<" 
would not be a legal graph.  Anyone building a strict inferencing engine 
would need to be able to check that they had a valid graph so would need 
to check that a literal was indeed a well formed xml fragment.

> 
> 
>>- breaks current implementations (but see below)
>>
>>Ameliorating the Disadvantages - implementation strategy
>>
>>The above design says that e.g. "<" is not in the lexical 
>>space of plain 
>>literals,
> 
> 
> Why not? 

To ensure we can distinguish text and markup.

How could you constrain the infinite set of strings in
> such a fashion?

Same way xml does - by fiat of the spec.

> 
> 
>>and many (all?) current implementations will store
>>"<" in their representation of a graph.  The point to note is that 
>>implementations are free to represent literals any way they please. 
>>Thus "<" is just the way this implementation represents the 
>>literal "&lt;".
> 
> 
> And how then do I represent the literal "<", distinct from "&lt;"???
> 
> Sorry, but I'm just gonzo confused...

Yup - I must have explained it real well, or maybe its me who is confused.

Wherever you have "<" as a literal with the current design, we replace 
it with "&lt;".  One way to think of it is that "&lt;" denotes "<". 
Formally doing that introduces some complexities into the semantics, so 
I was kinda hoping that could be avoided though.  Maybe we can't.  The 
idea is that application should treat "&lt;" as if it were "<".
> 
> 
>>The implementation does need to distinguish between markup and plain 
>>text.  To do this, it adds a single bit to literals to 
>>indicate whether 
>>they are stored in escaped or unescaped form.  The above 
>>example was in 
>>unescaped form, which cannot represent markup.  To represent 
>>markup, the 
>>literal must be be stored in escaped form.
> 
> 
> So, we've got the XML bit back?

That's one way to implement this, yes.  But the XML bit is not in the 
spec.  This is a possible implementation strategy.

[...]

> 
> Well, it has certainly confused me...

Me too probably.

Brian

Received on Monday, 8 September 2003 14:18:00 UTC