Re: Summary of strings, markup, and language tagging in RDF (resend) from Martin Duerst on 2003-06-29 (w3c-rdfcore-wg@w3.org from June 2003)

From: Martin Duerst <duerst@w3.org>
Date: Sun, 29 Jun 2003 08:48:32 -0400
To: Graham Klyne <gk@ninebynine.org>, Dan Connolly <connolly@w3.org>
Cc: w3c-i18n-ig@w3.org, "Ralph R. Swick" <swick@w3.org>, misha.wolf@reuters.com, Tim Berners-Lee <timbl@w3.org>, w3c-rdfcore-wg@w3.org
Message-Id: <4.2.0.58.J.20030629081641.04b0eb78@localhost>

Hello Graham,

At 18:53 03/06/27 +0100, Graham Klyne wrote:

>Speaking for myself, and my understanding of our discussion...
>
>What I found "distasteful" was the suggestion that one would have to look 
>*inside* the content of a literal to figure out what type it is.

Obviously, to find out whether it is text with markup or text
without markup, one way is to look inside. Another way would be
to disallow rdf:parseType='Literal' on pure text strings.
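
As a rough sketch of what "looking inside" could amount to (the
function name contains_markup and the wrapper element are assumptions
made for illustration, not anything defined by RDF/XML), one could
parse the literal content and check for child elements:

```python
import xml.etree.ElementTree as ET

def contains_markup(literal_content: str) -> bool:
    """Return True if the content of a parseType='Literal' property
    contains at least one element, False if it is pure text.

    Assumption: the content is a well-formed XML fragment, so wrapping
    it in a dummy root element makes it parseable as a document.
    Comments and processing instructions are ignored by this parser
    and therefore do not count as markup here.
    """
    root = ET.fromstring("<wrapper>" + literal_content + "</wrapper>")
    return len(root) > 0  # len() of an Element is its number of child elements

plain = "\n   A Midsummer Night's Dream\n"
marked_up = "\n   <em>A Midsummer Night's Dream</em>\n"
print(contains_markup(plain), contains_markup(marked_up))
```

The alternative mentioned above, disallowing rdf:parseType='Literal'
on pure text, would make such a check unnecessary.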

>In discussion, I understood the request to be for:
>
>[[
><dc:title rdf:parseType='Literal'>
>   A Midsummer Night's Dream
></dc:title>
>]]
>
>to denote a plain string literal, but
>
>[[
><dc:title rdf:parseType='Literal'>
>   <em>A Midsummer Night's Dream</em>
></dc:title>
>]]
>
>to be a completely different kind of literal denoting an XML document in 
>some way (because of the presence of markup).
>
>(I originally read Martin's note to suggest that an XML document is itself 
>just a string of Unicode characters, not distinguished from non-XML 
>strings.  That is a position I could support but with which others have 
>expressed concerns.)

Can we please make sure that we separate syntax and semantics?

XML is defined as a syntax on a sequence of Unicode characters,
so treating it as such in a particular implementation is
possible. If you are a bit careful with escaping, you can store
text without markup in the same form. Other implementations are
easily possible. For example, one could observe that "<>" is illegal
in XML, and thus use "<>" to escape '<', not escape '&' at all, and
use '""' to escape '"' in an attribute. This would no longer look
like XML, but would store the same information.
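
The alternative escaping scheme described above can be sketched in a
few lines of Python. This is only an illustration under stated
assumptions: all function names are hypothetical, and the sketch
handles only the escaping of character data and attribute values, not
the serialization of tags.

```python
import re

def escape_text(text: str) -> str:
    """Escape character data: '<' is written as '<>'; '&' is left alone."""
    return text.replace("<", "<>")

def unescape_text(stored: str) -> str:
    """Inverse of escape_text, for a run of character data (no tags)."""
    return stored.replace("<>", "<")

def escape_attribute(value: str) -> str:
    """Escape an attribute value: '<' as '<>', and '"' doubled to '""'."""
    return value.replace("<", "<>").replace('"', '""')

def unescape_attribute(stored: str) -> str:
    """Inverse of escape_attribute."""
    return stored.replace('""', '"').replace("<>", "<")

def stored_has_markup(stored: str) -> bool:
    """A '<' not immediately followed by '>' can only start a tag,
    because a literal '<' is always stored as '<>'."""
    return re.search(r"<(?!>)", stored) is not None

print(escape_text("1 < 2 & 3"))
print(stored_has_markup(escape_text("1 < 2")))
print(stored_has_markup("<em>1 <> 2</em>"))
```

Text with and without markup ends up in the same stored form, and
the two remain distinguishable without any XML parser.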

For RDF to say that XML is *treated* as a string of Unicode characters
is perfectly okay. For RDF to say that XML *is* nothing but a string
of Unicode characters is a bad idea.

What is important is that the same semantic things, i.e.:
- Text (without markup or language information)
- Text with language information (but no markup)
- Text with markup (but no language information)
- Text with markup and language information
are in each of the above cases recognized as being the same, rather
than being split up into a number of different things based on some
representational details. On top of that, recognizing the continuity
between the four variants above and making it easy to deal with
this continuity would be a definite plus.
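
One way to picture this continuity is a single value type in which
markup and language information are optional properties rather than
separate kinds of literal. The sketch below is an assumption made for
illustration only (the class TextValue and its field names are
hypothetical and are not part of any RDF specification):

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class TextValue:
    """One type covering all four variants of text."""
    content: str                      # character data or serialized markup
    has_markup: bool = False          # True if content contains markup
    language: Optional[str] = None    # language tag such as "en", or None

    def with_language(self, tag: str) -> "TextValue":
        """Return the same text with language information added."""
        return replace(self, language=tag)

plain = TextValue("A Midsummer Night's Dream")
tagged = TextValue("A Midsummer Night's Dream", language="en")
marked_up = TextValue("<em>A Midsummer Night's Dream</em>", has_markup=True)
both = TextValue("<em>A Midsummer Night's Dream</em>",
                 has_markup=True, language="en")

# Moving between variants changes a property, not the kind of value.
print(plain.with_language("en") == tagged)
print(marked_up.with_language("en") == both)
```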

Regards,    Martin.

Received on Sunday, 29 June 2003 09:00:26 UTC