Bad job on literals?

At the last telecon we briefly discussed the issue related to the 
semantics of literals.

Per F2F decision, the literals have three components (unicode string, 
language tag, and a bit). This representation may not be the best. Here 
are several concerns:

(1) Interpretation

It is unclear what the literals represent. It seems that a literal can 
denote

	a) a character string
	b) a word in a natural language
	c) an XML tree
	d) an abstract structure that consists of a string,
            a tag, and a bit.	

Choice d) seems ugly if we think of RDF as a foundation for the SW. If 
we go for a)-c), then the literals become polymorphic... Furthermore, 
defining rules for comparing trees and words seems counterproductive.

(2) Extensibility

The language tags keep evolving. How do we accommodate new language 
encoding schemes gracefully?

The current XML standard may be surpassed. How do we indicate what 
particular XML encoding or canonical form (or maybe a completely 
different graph-like structure) is used?


In short, I think that we might be doing a bad job on literals. I'm 
afraid that additional difficulties may arise in datatyping (e.g., we 
might need to deal with XML trees in lexical spaces of datatypes).

BTW, did TimBL and DanC, the original issue raisers, finally take a 
position to the F2F decision (comp. [1])? Unfortunately, I missed that 
F2F. A cleaner solution might be/have been to leave literals as strings 
and to use bNodes with special properties for representing words and XML 
structures.

Sergey

[1] http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Apr/0410.html

Received on Tuesday, 21 May 2002 10:34:31 UTC