- From: <Patrick.Stickler@nokia.com>
- Date: Wed, 10 Sep 2003 12:21:47 +0300
- To: <Patrick.Stickler@nokia.com>, <gk@ninebynine.org>, <danbri@w3.org>, <bwm@hplb.hpl.hp.com>
- Cc: <w3c-rdfcore-wg@w3.org>, <Patrick.Stickler@nokia.com>
> along the lines indicated in [1] (also described earlier, with some embellishment, by Patrick [2]), and have not become aware of any fundamental problem with it.

The one problem that I've seen identified with our approach is that it loses the distinction between text with markup and text that just looks like text with markup. I.e., at the end of the day, we just have strings, not XML. With Brian's/Dan's approach, we'd just have XML, not strings.

Here's a twist on the two approaches which promises to give us the semantic distinction between text and markup (i.e. both XML and strings) but with a consistent treatment in the graph:

0. RDF/XML provides for the expression of both plain literals and XML literals, with an optionally associated language tag, via xml:lang scoping (i.e. no more rdf:XMLLiteral datatype).

1. All non-typed literals in the graph are canonicalized XML with an optionally associated language tag.

2. For plain literals in the RDF/XML (where parseType="Literal" is not specified), the content is *converted* to an XML-literal form, escaping all necessary characters, and then canonicalized, as part of the mapping to its graph representation.

3. If parseType="Literal" is specified, then the content is presumed to already be in an XML-legal form and is simply canonicalized on its way to the graph.
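A minimal sketch of steps 2 and 3, to make the mapping concrete. It is an illustration only, not part of the proposal: the function name to_graph_literal and the dummy wrapper element <w> are hypothetical, "escaping all necessary characters" is assumed to mean the usual XML escaping of &, < and >, and Python's built-in C14N 2.0 serializer (Python 3.8 or later) stands in for whichever canonicalization the graph mapping would actually mandate.

```python
from xml.etree.ElementTree import canonicalize
from xml.sax.saxutils import escape


def to_graph_literal(content: str, parse_type_literal: bool = False) -> str:
    """Map the content of an RDF/XML literal to its (assumed) graph form.

    content            -- the literal's content as a string of characters
    parse_type_literal -- True if parseType="Literal" was specified
    """
    if not parse_type_literal:
        # Step 2: a plain literal is first converted to XML-literal form
        # by escaping the markup-significant characters (&, <, >).
        content = escape(content)
    # Steps 2 and 3: canonicalize. The content may be mixed text and
    # elements, so it is wrapped in a dummy root element (an assumption
    # of this sketch) to obtain a well-formed document.
    canonical = canonicalize(f"<w>{content}</w>")
    # Strip the dummy wrapper again.
    return canonical[len("<w>"):-len("</w>")]


if __name__ == "__main__":
    print(to_graph_literal("2 > 3"))
    print(to_graph_literal("xxx <br/> zzz"))
    print(to_graph_literal("xxx <br/> zzz", parse_type_literal=True))
```

Under these assumptions a plain literal containing markup characters and an XML literal containing real markup end up as different strings, while a markup-free value such as "abc" comes out the same either way.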
Thus:

    <rdf:Description rdf:about="#something"
                     xmlns:ex="http://example.com/"
                     ex:p0="abc"
                     ex:p1="2 > 3"
                     ex:p2="2 > 3"
                     ex:p3="xxx <br/> zzz"
                     ex:p4="xxx <br/> zzz">
      <ex:p5>2 > 3</ex:p5>
      <ex:p6 xml:lang="en">xxx <br/> zzz</ex:p6>
      <ex:p7 parseType="Literal">xxx <br/> zzz</ex:p7>
      <ex:p8 parseType="Literal">xxx <br/> zzz</ex:p8>
      <ex:p9 parseType="Literal">xxx &lt;br/&gt; zzz</ex:p9>
      <ex:p10 parseType="Literal">abc</ex:p10>
    </rdf:Description>

gives us

    <#something>
        <ex:p0>  "abc" ;
        <ex:p1>  "2 > 3" ;
        <ex:p2>  "2 > 3" ;
        <ex:p3>  "xxx <br/> zzz" ;
        <ex:p4>  "xxx <br/> zzz" ;
        <ex:p5>  "2 > 3" ;
        <ex:p6>  "xxx <br/> zzz"@en ;
        <ex:p7>  "xxx <br></br> zzz" ;
        <ex:p8>  "xxx <br/> zzz" ;
        <ex:p9>  "xxx &lt;br/&gt; zzz" ;
        <ex:p10> "abc" .

Thus (in terms of their values):

    p1 = p2 = p5
    p3 = p4 = p8
    (p3|p4|p8) != p6
    p0 = p10

etc.

The benefit of adding the conversion/escaping on plain literals is that *all* legacy RDF/XML remains legal, even though RDF applications (or ideally, RDF APIs) will need to add the reverse conversion/unescaping to provide plain strings to applications that only want plain strings.

And since the escaping will protect the non-markup-based semantics of any markup characters occurring literally in plain strings, there is no confusion when comparing strings with markup and strings that just look like they have markup, since they won't be the same string in the graph. I.e.

              in RDF/XML      in graph
    plain:    "<b>foo</b>"    "<b>foo</b>"
    XML:      "<b>foo</b>"    "<b>foo</b>"

It also addresses the equivalence of plain literals and XML literals without markup, which denote the same thing even though one is specified as a plain literal and the other as an XML literal: plain literals in the RDF/XML are always converted to XML literals in the graph, and thus their equivalence becomes apparent. I.e.

              in RDF/XML      in graph
    plain:    "<b>foo</b>"    "<b>foo</b>"
    XML:      "<b>foo</b>"    "<b>foo</b>"
    plain:    "abc"           "abc"
    XML:      "abc"           "abc"

Eh?

Patrick
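For what it's worth, the reverse conversion that an RDF API would need for applications wanting plain strings could look roughly like the sketch below. The helper names has_markup and to_plain_string and the dummy wrapper element <w> are hypothetical, and the graph form of a plain literal is assumed to be its XML-escaped text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def has_markup(graph_literal: str) -> bool:
    """True if the graph literal contains real element markup."""
    # Wrap in a dummy root (an assumption of this sketch) so that mixed
    # content parses as a well-formed document.
    root = ET.fromstring(f"<w>{graph_literal}</w>")
    return len(root) > 0


def to_plain_string(graph_literal: str) -> str:
    """Reverse the escaping for applications that only want plain strings."""
    root = ET.fromstring(f"<w>{graph_literal}</w>")
    if len(root) > 0:
        raise ValueError("literal contains markup; no plain-string form")
    # The parser has already undone the entity escaping.
    return root.text or ""


if __name__ == "__main__":
    plain_in_graph = escape("<b>foo</b>")  # text that only looks like markup
    xml_in_graph = "<b>foo</b>"            # real markup

    print(plain_in_graph == xml_in_graph)   # the two are distinct in the graph
    print(to_plain_string(plain_in_graph))  # the original characters come back
    print(has_markup(xml_in_graph))
```

Refusing to hand back a plain string for a literal with real markup is a design choice of this sketch; an API could just as well return the serialized markup or the concatenated text.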
Received on Wednesday, 10 September 2003 05:22:01 UTC