Bermudan flowers: Query, I18N, and syntax vs semantics

Suggestion: migrate the whole datatyping discussion into syntax and out of
the model theory.

I'm feeling increasingly disheartened with both proposals.
"S is bad, and so is TDL[*]".

I find the latest issue, B10 (say what you mean), particularly pertinent
when cast (as Brian has done) as an I18N issue. It is an issue against both
proposals. I would not be surprised if the I18N WG saw it as a "cannot live
with" issue for both TDL & S (not that they blocked XML Schema Datatypes?).

I find comments about query and datatypes (e.g. Libby's on rdf-comments, or
Andy Seaborne's private communication) to be equally a curse on both
proposals. (Summary of what I understand of Andy's position: query is over
the datamodel. Both proposals have more than one way of expressing the same
data, which makes query substantially more difficult.)

I found Pat's Bermuda triangle posting encouraging in that Pat was prepared
to identify and retract some shared assumptions that the whole group has
been making.

My understanding of Pat's message amounted to:
- use rdf:type and rdfs:range for value space distinctions
- use rdf:dType for lexical=>value mapping distinctions.

It seems to me that the value space is the plausible global range, whereas
the lexical=>value mapping is a more local thing. However, it seems very
inconvenient to only permit a local mechanism for typing. A document scope
method of lexical=>value mapping seems the most obvious to me.

Sergey's flower power message suggested syntactic changes with increased
difference between the XML document and the abstract syntax.

Our abstract syntax is a partially labelled graph.

We could allow arbitrary typed values as node labels; e.g. in the B10
example the nodes are labelled with the number 10.5 and not with the string
"10.5" nor the string "10,5".

If we were to do that, then in a serialization of RDF each label would be
represented as a pair: a datatype URI and a string. The value would be found
by evaluating the datatype mapping on the string. Any of our local idiom
proposals could be read in this fashion. S-A is the most direct such local
idiom, but the DAML idiom is also plausible.
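
A minimal sketch of this reading, in Python. The registry, the function name
and the choice of Python types are all hypothetical; they are not part of
either proposal. The only point illustrated is that the label is the value
obtained by applying the datatype's lexical=>value mapping to the string.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical registry: datatype URI -> lexical-to-value mapping.
DATATYPE_MAPPINGS = {
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "string": str,
}


def label_value(datatype_uri, lexical):
    """Evaluate the datatype mapping on the lexical form."""
    mapping = DATATYPE_MAPPINGS.get(datatype_uri)
    if mapping is None:
        # A processor unaware of this datatype can only keep the pair.
        return (datatype_uri, lexical)
    return mapping(lexical)


# Two lexical forms, one value: the node label is the number itself.
same = label_value(XSD + "decimal", "10.5") == label_value(XSD + "decimal", "10.50")
```

Under these assumptions, "10.5" and "10.50" yield the same node label, so a
query over the graph sees one way of expressing the data rather than two.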

None of the global idioms work in this view, because the global idioms are
global, i.e. worldwide. If the global idiom interprets "10,500" as ten and a
half thousand then it does not interpret it as ten and a half.
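
To make the conflict concrete, here is a small Python sketch with two
lexical=>value mappings over the same strings. Both function names are
hypothetical, and the separator conventions are simplified assumptions; the
sketch only shows that one string cannot have both readings worldwide.

```python
from decimal import Decimal


def english_decimal(lexical):
    """Assumed convention: ',' groups thousands, '.' is the decimal point."""
    return Decimal(lexical.replace(",", ""))


def continental_decimal(lexical):
    """Assumed convention: '.' groups thousands, ',' is the decimal point."""
    return Decimal(lexical.replace(".", "").replace(",", "."))


# The same string denotes different numbers under the two mappings.
as_thousands = english_decimal("10,500")      # ten and a half thousand
as_fraction = continental_decimal("10,500")   # ten and a half

# The B10 example: different strings, same number, given the right mapping.
b10_same = english_decimal("10.5") == continental_decimal("10,5")
```

Which mapping applies has to be stated somewhere narrower than the whole
world, e.g. per literal or per document.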

The important global idiom is the single triple idiom (+ 'global' range
constraint) e.g.
_:b <age> "30.5".

Sergey's flower power message showed that we could propose a syntactic
change that still allowed the use of this direct property style by the
document author, but assigned to it a richer abstract graph that better
permitted features (such as [un]tidiness) that were important.

How about replacing the global idiom with a document-scope type declaration
idiom, possibly with an include mechanism?

e.g.

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:eg="http://example.org/#">

<!-- Datatype definitions
   No triples correspond to this part.
   This is merely a syntactic instruction.
-->
   <rdf:dataProperty rdf:about="http://example.org/#age"
               rdf:dRange="xsd:integer" />

<!-- data. This generates a triple. -->
    <rdf:Description eg:age="20" />

</rdf:RDF>


A datatype-aware processor produces the triple:

_:b <eg:age> 20 .

is produced. The 20 is a number not a string, and can only be represented
for non xsd:integer aware processors as a pair (<xsd:integer>,"20").

An old processor doesn't know this and produces three triples:

<eg:age> <rdf:type> <rdf:dataProperty> .
<eg:age> <rdf:dRange> "xsd:integer" .
_:b <eg:age> "20" .
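
The two behaviours can be sketched in Python roughly as follows. This is
only an illustration under stated assumptions: rdf:dataProperty and
rdf:dRange are the hypothetical names from the example above, the namespace
declarations are added so that the document parses, blank node names are
made up, and KNOWN_MAPPINGS stands in for whatever datatypes a processor
happens to be aware of.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EG = "http://example.org/#"

# The example document, with the namespace declarations it needs to parse.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:eg="{EG}">
  <rdf:dataProperty rdf:about="{EG}age" rdf:dRange="xsd:integer" />
  <rdf:Description eg:age="20" />
</rdf:RDF>"""

# Hypothetical table of datatypes this processor is aware of.
KNOWN_MAPPINGS = {"xsd:integer": int}


def uri(qualified):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    namespace, local = qualified[1:].split("}")
    return namespace + local


def property_attributes(element):
    """Yield (property URI, string) for each attribute except rdf:about."""
    for name, text in element.attrib.items():
        if name != f"{{{RDF}}}about":
            yield uri(name), text


def legacy_triples(doc):
    """Old processor: the declaration is just another typed node."""
    triples = []
    for n, element in enumerate(ET.fromstring(doc)):
        subject = element.get(f"{{{RDF}}}about", f"_:b{n}")
        if element.tag != f"{{{RDF}}}Description":
            triples.append((subject, RDF + "type", uri(element.tag)))
        for prop, text in property_attributes(element):
            triples.append((subject, prop, text))
    return triples


def aware_triples(doc, known=KNOWN_MAPPINGS):
    """Datatype-aware processor: declarations shape labels, not triples."""
    root = ET.fromstring(doc)
    # Declarations are syntactic instructions: collect them, emit nothing.
    ranges = {
        e.get(f"{{{RDF}}}about"): e.get(f"{{{RDF}}}dRange")
        for e in root
        if e.tag == f"{{{RDF}}}dataProperty"
    }
    triples = []
    for n, element in enumerate(root):
        if element.tag != f"{{{RDF}}}Description":
            continue
        subject = element.get(f"{{{RDF}}}about", f"_:b{n}")
        for prop, text in property_attributes(element):
            datatype = ranges.get(prop)
            if datatype in known:
                value = known[datatype](text)  # a true typed value
            elif datatype is not None:
                value = (datatype, text)       # unevaluated pair
            else:
                value = text                   # no declaration: plain string
            triples.append((subject, prop, value))
    return triples
```

Calling aware_triples(DOC) gives a single triple whose object is the number
20, legacy_triples(DOC) gives the three string-valued triples, and
aware_triples(DOC, known={}) gives the pair form for a processor that sees
the declaration but does not know xsd:integer.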


=====

Summary of idea:
   Use true typed values as node labels in abstract syntax.
   Use purely syntactic means to generate these labels (built on top of
XSD).


I'm sure we've seen this sort of idea before ...


Jeremy


[*] (just not quite as much)

Received on Monday, 4 February 2002 10:18:21 UTC