Re: Bad job on literals?

[freed from spam trap -rrs]

Date: Tue, 21 May 2002 18:28:43 -0400 (EDT)
Message-Id: <a0510030bb9107995001b@[65.217.30.61]>
To: Sergey Melnik <melnik@db.stanford.edu>
From: pat hayes <phayes@mail.coginst.uwf.edu>
Cc: w3c-rdfcore-wg@w3.org

>At the last telecon we briefly discussed the issue related to the 
>semantics of literals.
>
>Per F2F decision, the literals have three components (unicode 
>string, language tag, and a bit). This representation may not be the 
>best. Here are several concerns:
>
>(1) Interpretation
>
>It is unclear what the literals represent. It seems that a literal can denote
>
>	a) a character string
>	b) a word in a natural language
>	c) an XML tree
>	d) an abstract structure that consists of a string,
>            a tag, and a bit.
>
>Choice d) seems ugly if we think of RDF as a foundation for the SW. 
>If we go for a)-c), then the literals become polymorphic... 
>Furthermore, defining rules for comparing trees and words seems 
>counterproductive.
>
>(2) Extensibility
>
>The language tags keep evolving. How do we accommodate new language 
>encoding schemes gracefully?
>
>The current XML standard may be surpassed. How do we indicate what 
>particular XML encoding or canonical form (or maybe a completely 
>different graph-like structure) is used?
>
>
>In short, I think that we might be doing a bad job on literals. I'm 
>afraid that additional difficulties may arise in datatyping (e.g., 
>we might need to deal with XML trees in lexical spaces of datatypes).
>
>BTW, did TimBL and DanC, the original issue raisers, finally take a 
>position on the F2F decision (cf. [1])? Unfortunately, I missed 
>that F2F.

I also missed it. My understanding of the decision was that a literal 
is best thought of as a unicode character string plus some additional 
decorations. Their function is to record XML-specific syntactic 
information which has no RDF semantic content, but which RDF 
nevertheless needs to keep in order to permit proper round-tripping 
from XML. In particular, both your (b) and (c) are ruled out: the 
correct syntactic answer is (d), but the model theory can treat (d) 
as though it were (a).
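
To make that reading concrete, here is a minimal sketch in Python. It 
is only an illustration, not anything the WG has specified: the names 
Literal and denotation are hypothetical, and I am assuming that the 
"bit" marks whether the string was written as XML markup.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """Syntactic literal, option (d): a string plus two decorations."""
    string: str                 # the unicode character string
    lang: Optional[str] = None  # language tag, kept only for round-tripping
    xml: bool = False           # the "bit"; assumed here to flag XML markup


def denotation(lit: Literal) -> str:
    """Semantic reading, option (a): the decorations are ignored."""
    return lit.string


english = Literal("chat", lang="en")
french = Literal("chat", lang="fr")

# The two literals differ syntactically, so a parser or serializer
# can still tell them apart and write each one back out unchanged.
assert english != french

# Under this reading they denote the same character string.
assert denotation(english) == denotation(french)
```

On this sketch the language tag and the bit survive a round trip 
through the syntax, but nothing in the semantics ever looks at them.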

>  A cleaner solution might be/have been to leave literals as strings 
>and to use bNodes with special properties for representing words and 
>XML structures.

That would be cleaner in the RDF graph but would probably break the RDF/XML.
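
For comparison, a rough sketch of what the bNode alternative might 
look like as a set of triples, again in Python. Everything here is 
hypothetical: the helper names and the ex:string and ex:lang 
properties are invented for the illustration and come from no 
vocabulary that I know of.

```python
from itertools import count

_ids = count(1)


def bnode() -> str:
    """Return a fresh blank node label such as '_:b1'."""
    return f"_:b{next(_ids)}"


def word(graph: set, text: str, lang: str) -> str:
    """Describe a natural-language word with a bNode and two properties."""
    node = bnode()
    graph.add((node, "ex:string", text))  # plain string literal
    graph.add((node, "ex:lang", lang))    # language as an ordinary property
    return node


graph = set()
title = word(graph, "chat", "fr")
graph.add(("ex:doc", "ex:title", title))

# Literals stay plain strings; the language is now part of the graph.
assert (title, "ex:lang", "fr") in graph
```

Here the literals themselves are bare strings, and the extra 
information moves into triples about the blank node.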

Pat



-- 

Received on Wednesday, 22 May 2002 08:23:14 UTC