Re: Summary of strings, markup, and language tagging in RDF (resend) from pat hayes on 2003-07-02 (w3c-rdfcore-wg@w3.org from July 2003)

From: pat hayes <phayes@ihmc.us>
Date: Wed, 2 Jul 2003 00:12:20 -0500
To: Martin Duerst <duerst@w3.org>
Cc: w3c-rdfcore-wg@w3.org
Message-Id: <p06001211bb2815591cee@[10.0.100.7]>
>Hello Graham,
>
>At 18:53 03/06/27 +0100, Graham Klyne wrote:
>
>>Speaking for myself, and my understanding of our discussion...
>>
>>What I found "distasteful" was the suggestion that one would have 
>>to look *inside* the content of a literal to figure out what type 
>>it is.
>
>Obviously, to find out whether it is text with markup or text
>without markup, one way is to look inside. Another way would be
>to disallow rdf:parseType='Literal' on pure text strings.

I don't think that makes sense.  The only function of 
rdf:parseType='Literal' in the RDF/XML syntax is to tell the parser 
to include some well-balanced XML as an object in the RDF graph. So 
if it is applied to something that cannot be so parsed, then the 
result is illegal RDF/XML; and if it can be so parsed, then what 
results *is* an XML literal.
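
To illustrate the point, here is a minimal Python sketch. It assumes that wrapping the element content in a dummy element and parsing it is an adequate stand-in for the well-balancedness check an RDF/XML parser would perform. The helper name `is_well_balanced` and the `wrapper` element are invented for this illustration. Content using namespace prefixes declared elsewhere in the document would be rejected here, although a real parser with those prefixes in scope would accept it.

```python
import xml.etree.ElementTree as ET


def is_well_balanced(content: str) -> bool:
    """Rough check: does `content` parse as the content of an XML element?

    Hypothetical helper, not part of any RDF library.
    """
    try:
        # Wrap the content so that plain text and mixed content both
        # become a single parseable element.
        ET.fromstring(f"<wrapper>{content}</wrapper>")
    except ET.ParseError:
        return False
    return True


plain = "A Midsummer Night's Dream"
marked_up = "<em>A Midsummer Night's Dream</em>"
broken = "<em>A Midsummer Night's Dream"  # unclosed tag

for candidate in (plain, marked_up, broken):
    print(repr(candidate), is_well_balanced(candidate))
```

Under this check the pure text and the text with markup both pass, so on the reading above both would yield XML literals; only the unclosed fragment fails, which corresponds to illegal RDF/XML.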

>
>>In discussion, I understood the request to be for:
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>>   A Midsummer Night's Dream
>></dc:title>
>>]]
>>
>>to denote a plain string literal, but
>>
>>[[
>><dc:title rdf:parseType='Literal'>
>>   <em>A Midsummer Night's Dream</em>
>></dc:title>
>>]]
>>
>>to be a completely different kind of literal denoting an XML 
>>document in some way (because of the presence of markup).
>>
>>(I originally read Martin's note to suggest that an XML document is 
>>itself just a string of Unicode characters, not distinguished from 
>>non-XML strings.  That is a position I could support but with which 
>>others have expressed concerns.)
>
>Can we please make sure that we separate syntax and semantics?

I sincerely wish we could.

>
>XML is defined as a syntax on a sequence of Unicode characters,
>so treating it as such in a particular implementation,... is
>possible. If you are a bit careful with escaping, you can store
>text without markup in the same form. Other implementations are
>easily possible (for example, one could observe that "<>" is illegal
>in XML, and thus use "<>" to escape '<', and not escape &, and
>use '""' to escape '"' in an attribute. This would no longer look
>like XML, but would store the same information).
>
>For RDF to say that XML is *treated* as a string of Unicode characters
>is perfectly okay. For RDF to say that XML *is* nothing but a string
>of Unicode characters is a bad idea.

OK, that is one perspective, I agree. You seem to have just said that 
there is a clear conceptual distinction between a sequence of Unicode 
characters and an XML document: the latter might be *treated* as the 
former, but one should not say it *is* that.  That is exactly the 
intuition which suggests that XML documents, on the one hand, and 
Unicode character strings, on the other, are distinct kinds of entity 
and that a semantics therefore needs to keep them distinct. So for 
example on this view, there is a distinction *in kind* between a mere 
character string (such as a plain untyped literal in RDF, which 
refers to itself) and an XML document (such as the referent of an XML 
typed literal in RDF), and a semantics should therefore not identify 
them even when they are (considered now merely as strings of 
characters) indistinguishable from one another on their face, as it 
were.  But I have read your views in other messages as being quite 
opposed to making distinctions like this, so I am left puzzled as to 
what your views are.
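
The "distinct in kind" view described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `Literal` class is a made-up model, not any library's API, and it ignores the canonicalization that real XML literals involve. The datatype URI is the standard one for rdf:XMLLiteral.

```python
from dataclasses import dataclass
from typing import Optional

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


@dataclass(frozen=True)
class Literal:
    """Illustrative literal: a character string plus an optional datatype URI."""
    lexical: str
    datatype: Optional[str] = None  # None models a plain untyped literal


text = "<em>A Midsummer Night's Dream</em>"
plain_literal = Literal(text)                 # just the characters themselves
xml_literal = Literal(text, RDF_XMLLITERAL)   # stands for an XML value

# Same characters on their face, but not the same literal.
same_characters = plain_literal.lexical == xml_literal.lexical
same_literal = plain_literal == xml_literal
print(same_characters, same_literal)
```

The design choice here is that the datatype takes part in equality, so two literals can agree character for character and still be kept apart.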

>What is important is that the same semantic things

?? The following are all *syntactic* things, it seems to me: kinds of 
text. The *semantic* distinctions arise when one asks what these 
various things *mean* (denote, refer to; whatever). The question of 
semantics doesn't arise until we get to a language which has a 
semantics defined on it (which XML notoriously does not).

>, i.e.:
>- Text (without markup or language information)
>- Text with language information (but no markup)
>- Text with markup (but no language info)
>- Text with markup and language information
>are in each of the above cases recognized as being the same rather
>than being split up in a number of different things based on some
>representational details.

I cannot understand what you are saying. Is your point that these 
four categories are the same category, or that they are essentially 
distinct? For example, consider a string of characters regarded as 
text, e.g. 'A Midsummer Night's Dream' (that is 25 consecutive 
characters, three of which are whitespace and one an apostrophe), and 
the same string of characters labelled as, say, Italian: do you want 
these to be thought of as distinct entities or the same entity? Do 
you think of text with markup as in a distinct category from text 
without markup?
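
For concreteness, here is a small Python sketch of the example string and of one possible answer to the first question, namely the reading on which the language label is part of what the entity is. Modelling a literal as a (text, language tag) pair is an assumption made for this illustration, and "it" is used as the tag for Italian.

```python
title = "A Midsummer Night's Dream"

# Check the character counts quoted above.
length = len(title)
spaces = title.count(" ")
apostrophes = title.count("'")
print(length, spaces, apostrophes)

# One reading: the language tag is part of the identity of the literal.
untagged = (title, None)
italian = (title, "it")

print(untagged[0] == italian[0])  # the character strings agree
print(untagged == italian)        # the tagged and untagged pairs do not
```

On the opposite reading, where the label is a mere annotation, one would compare only the first component of each pair and the two would count as the same entity.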

In case you are wondering, the answers to questions like these are 
not obvious, and I genuinely have no idea what your answers will be.

Pat Hayes
-- 
---------------------------------------------------------------------
IHMC	(850)434 8903 or (650)494 3973   home
40 South Alcaniz St.	(850)202 4416   office
Pensacola			(850)202 4440   fax
FL 32501			(850)291 0667    cell
phayes@ihmc.us       http://www.ihmc.us/users/phayes
Received on Wednesday, 2 July 2003 01:12:23 UTC