A basis for convergence and closure?

After a few comments, I outline below a possible scenario by which
we may reach convergence and closure for the datatyping challenge.

I've tried to state questions, actions, and decisions explicitly
so that we can assess to some degree whether there would be
consensus about such a scenario.

[Please read it slowly, and count to ten before posting any
 replies...  you all should know by now how first readings
 of my posts can be misleading...  ;-) and please read all of
 it before firing off comments on individual points, which
 may be clarified further on or may become clear on a second
 reading... Thanks]

> ... I could live with S-P only. Again, CC/PP and DAML folks
> might oppose trashing S-B. I wouldn't like disposing of S-A completely
> just for the reason that S-A supports datatyping decoupled from lexical
> representation, and S-P does not. However, I could live without S-A and
> without S-B.

That's good news, as I think it means we are reaching
that much craved convergence.

I agree that the S-P/TDL-local idiom is the least problematic
of the lot.

We do, however, have to have a global/implicit idiom, or else
we can't express constraints/expectations on datatypes for
property values.

So we need either TDL-global or S-B, or something like that.

Jeremy's revised global idiom, based on S-P/TDL-local with
rdf:type made optional, seems the most promising.

Given Pat's latest comments about the interpretation of literals
in the S-B/TDL-global idiom, perhaps we can sidestep the problems
he raised by simply dropping those idioms entirely.

That said.....

--


It seems that, from the past couple of weeks' discussions,
the following characteristics are on folks' wish lists
(this is a partial recap of the desiderata):

1. A working MT (duh ;-)
2. Tidy literals
3. A global/implicit idiom
4. A local/explicit idiom
5. Same vocabulary valid for both local and global idioms
6. Free combination of local and global idioms without conversion
7. The ability to conduct queries by value
8. The ability to conduct queries by literal
9. Datatype URIs denote the entire datatype, as defined by
   the datatype "owner", not only one of its components

It seems to me that (backwards compatibility issues aside
for the moment) the following may make everyone happy:

   Take TDL (sans its present MT) with its present local idiom,
   which is also the S-P idiom (apart from the designation of the
   datatype URI)

   Replace the global idiom with Jeremy's proposed bNode
   global idiom, which is a derivative of the local idiom
   with rdf:type omitted

   Make literals tidy (untidiness is borne by the bNodes)

   Extend the RDF vocabulary to include the property rdf:dtype
   which is an rdfs:subPropertyOf rdf:type, and which is
   to be used by both local and global idioms (I think this is
   a more conservative choice than adopting a completely separate
   vocabulary per Pat's recommendation)

   State for the benefit of the XML Schema community that
   datatype URIs in this solution denote the whole datatype
   as defined by the datatype owner with no extension or
   modification. The datatype simply serves as the context of
   interpretation for a typed literal.

   Fix/extend/refine the TDL MT to take these changes into
   account and make it all work ;-)

Thus, we have global and local datatyping idioms that look like the
following:

   Bob ex:age _:1 .
   _:1 rdf:value "30" .
   ex:age rdfs:range xsd:integer .

   Mary ex:age _:2 .
   _:2 rdf:value "30" .
   _:2 rdf:dtype xsd:integer .

where the literal "30" is a tidy literal shared by
both rdf:value statements and they live happily in
the same knowledge base with the same vocabulary with
no problems, and have a consistent and symmetrical
graph representation.
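To make this concrete, here is a small Python sketch of the two idioms above. The term classes, the `typed_value` helper, and the modelling of xsd:integer as nothing more than a lexical-to-value function are hypothetical simplifications. The literal "30" ends up as a single node because literals are identified purely by their string. Letting an explicit rdf:dtype take precedence over rdfs:range is also just a choice made for this sketch.

```python
from dataclasses import dataclass

# Hypothetical term types, for illustration only.
@dataclass(frozen=True)
class Literal:
    lexical: str  # a tidy literal: identified solely by its string

@dataclass(frozen=True)
class BNode:
    label: str

# The datatype is treated as a whole; here only its lexical-to-value mapping
# is modelled (an assumption standing in for the owner's full definition).
def xsd_integer_l2v(lexical):
    return int(lexical)  # simplification: not xsd:integer's exact lexical rules

DATATYPES = {"xsd:integer": xsd_integer_l2v}

graph = {
    ("Bob", "ex:age", BNode("1")),
    (BNode("1"), "rdf:value", Literal("30")),
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("Mary", "ex:age", BNode("2")),
    (BNode("2"), "rdf:value", Literal("30")),
    (BNode("2"), "rdf:dtype", "xsd:integer"),
}

def objects(g, s, p):
    return [o for (s2, p2, o) in g if s2 == s and p2 == p]

def typed_value(g, subject, prop):
    """Interpret subject's prop value under either idiom; None if untyped."""
    for node in objects(g, subject, prop):
        if not isinstance(node, BNode):
            continue
        # local idiom: rdf:dtype on the bNode; global idiom: rdfs:range on prop
        dtypes = objects(g, node, "rdf:dtype") or objects(g, prop, "rdfs:range")
        for lit in objects(g, node, "rdf:value"):
            for dt in dtypes:
                if dt in DATATYPES:
                    return DATATYPES[dt](lit.lexical)
    return None

literal_nodes = {t for triple in graph for t in triple if isinstance(t, Literal)}
print(len(literal_nodes))  # 1: the single tidy literal "30"
print(typed_value(graph, "Bob", "ex:age"), typed_value(graph, "Mary", "ex:age"))  # 30 30
```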

This, I believe, meets all the items in the above defined
wishlist (presuming the working MT of course ;-)

--

BACKWARDS COMPATIBILITY:

Issue 1: Intuitive use of old-style global idiom

The old-style global idiom (Bob ex:age "30"), which is more
convenient for users to edit and view manually, would be
considered a contracted form of the new-style global idiom.

The expansion of the old-style contracted idiom to
the new bNode global idiom would be performed by the
parser, just as all other abbreviated forms in RDF/XML
are expanded (or by an external transformation for legacy parsers).

Thus

   <rdf:Description rdf:ID="Bob">
      <ex:age>30</ex:age>
   </rdf:Description>

will produce the two triples

   Bob ex:age _:1 .
   _:1 rdf:value "30" .

rather than the single triple

   Bob ex:age "30" .

It would be acceptable for a parser to have an option for
generating the old-style single triples in order to support
legacy systems (see immediately below), though such behavior
would be deprecated and not the default.
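The parser-side expansion could look roughly like the following Python sketch. `expand_property` and its `legacy` flag are hypothetical names for illustration, not the API of any existing RDF/XML parser.

```python
import itertools

# Hypothetical sketch of parser-side expansion; not a real parser API.
_bnode_ids = itertools.count(1)

def expand_property(subject, prop, literal_text, legacy=False):
    """Expand the contracted idiom `subject prop "literal"` into bNode form.

    With legacy=True (a deprecated, non-default option), emit the
    old-style single triple instead.
    """
    if legacy:
        return [(subject, prop, f'"{literal_text}"')]
    bnode = f"_:{next(_bnode_ids)}"  # fresh bNode for each contracted value
    return [
        (subject, prop, bnode),
        (bnode, "rdf:value", f'"{literal_text}"'),
    ]

# <rdf:Description rdf:ID="Bob"><ex:age>30</ex:age></rdf:Description>
for triple in expand_property("Bob", "ex:age", "30"):
    print(" ".join(triple), ".")
# Bob ex:age _:1 .
# _:1 rdf:value "30" .
```

Allocating a fresh bNode per contracted value is what keeps the untidiness on the bNodes, so the literal nodes themselves can stay tidy.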

--

Issue 2: Queries on non-datatyped literal values

It has been clarified, I feel, that one can make both
literal-based and value-based queries on TDL datatyped
graphs, depending simply on whether the query ignores or
takes into account the datatyping. Thus, with minor
tweaks to existing query APIs, legacy systems based
on literal equality tests will continue to work fine
with this proposed convergence solution.
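The claim that one datatyped graph supports both kinds of query can be sketched in Python. The second resource (Ann, with lexical form "030") is a hypothetical addition so that the two queries give different answers. The `query` helper is likewise illustrative and not an existing query API.

```python
# Hypothetical graph using only the bNode local idiom; literals are stored
# as bare lexical strings for brevity.
graph = [
    ("Bob", "ex:age", "_:1"), ("_:1", "rdf:value", "30"), ("_:1", "rdf:dtype", "xsd:integer"),
    ("Ann", "ex:age", "_:3"), ("_:3", "rdf:value", "030"), ("_:3", "rdf:dtype", "xsd:integer"),
]

def value_of(node):
    """Return (lexical form, datatype or None) for a bNode-idiom node."""
    lit = next(o for (s, p, o) in graph if s == node and p == "rdf:value")
    dt = next((o for (s, p, o) in graph if s == node and p == "rdf:dtype"), None)
    return lit, dt

def query(prop, target, by_value):
    hits = []
    for (s, p, o) in graph:
        if p != prop or not o.startswith("_:"):
            continue
        lit, dt = value_of(o)
        if by_value:
            if dt != "xsd:integer":
                continue                     # no datatype, so no value to compare
            match = int(lit) == int(target)  # compare in the value space
        else:
            match = lit == target            # ignore datatyping: string equality
        if match:
            hits.append(s)
    return sorted(hits)

print(query("ex:age", "30", by_value=False))  # ['Bob']
print(query("ex:age", "30", by_value=True))   # ['Ann', 'Bob']
```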

Nevertheless, in case that is not acceptable to all, suppose
we still allow in the graph, in addition to the bNode
global and local idioms, statements such as

   Fred ex:age "30" .

In that case, the literal "30" is the same literal as in
the two rdf:value statements of the bNode datatyping
idioms, since literal nodes are tidy, but it does
*not* denote an integer insofar as the RDF-defined
interpretation is concerned, since the statement does not
conform to either of the datatyping idioms (this
is a crucial distinction; think about it and keep
reading).

As pointed out in issue 1 above, a parser can provide
backwards-compatible N-Triples generation (or one can
use a legacy parser ;-) to continue using RDF without
datatyping.

And both the old-style global idiom and the bNode
datatyping idioms can coexist in the same graph
with no problems -- as queries based on datatyped
values would simply disregard the non-datatyped
literal values, and likewise queries based on literals
would disregard the bNode-isolated literal values.

(warning, MT rapids ahead... life jackets on... ;-)

This coexistence of course requires the MT to exclude
literals from datatyping interpretation by rdfs:range such
that rdfs:range only asserts datatyping for non-literal
property values: either bNodes or URIrefs.
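A rough Python sketch of this coexistence follows. It assumes the MT does treat rdfs:range as datatyping only non-literal values. The encoding ("_:" for bNodes, double-quoted strings for literals) and the helper names are hypothetical. URIref values are not modelled here; only bNodes are.

```python
# Hypothetical encoding: "_:" marks bNodes, double-quoted strings are literals.
graph = [
    ("ex:age", "rdfs:range", "xsd:integer"),
    ("Bob", "ex:age", "_:1"), ("_:1", "rdf:value", '"30"'),
    ("Mary", "ex:age", "_:2"), ("_:2", "rdf:value", '"30"'), ("_:2", "rdf:dtype", "xsd:integer"),
    ("Fred", "ex:age", '"30"'),  # old-style triple: same tidy literal, but untyped
]

def objects(s, p):
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]

def integer_value(subject, prop):
    """Integer denoted by subject's prop value, or None if not datatyped."""
    for o in objects(subject, prop):
        if o.startswith('"'):
            continue  # literal object: rdfs:range does NOT datatype it
        dtypes = objects(o, "rdf:dtype") or objects(prop, "rdfs:range")
        if "xsd:integer" in dtypes:
            for lit in objects(o, "rdf:value"):
                return int(lit.strip('"'))
    return None

people = ["Bob", "Mary", "Fred"]
by_value = [s for s in people if integer_value(s, "ex:age") == 30]
by_literal = [s for s in people if '"30"' in objects(s, "ex:age")]
print(by_value)    # ['Bob', 'Mary']
print(by_literal)  # ['Fred']
```

Even though the range constraint on ex:age is present, Fred's value is never read as an integer, and the literal query never reaches the bNode-isolated literals.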

QUESTION: Can the MT exclude literals in the
          datatyping interpretation of rdfs:range?

Allowing the old-style global (or basic) idiom lets
folks who are treating literals as having globally
consistent meaning continue doing so, regardless of any
range-defined datatyping, and conduct their queries
in terms of literal string equality, etc.

Thus, current practices and systems are not impacted in
any way by the datatyping solution at all.

Those who want datatyping must use the bNode idioms, and
datatyped values expressed by those idioms cannot be
misinterpreted by literal-comparison queries.

--

Issue 3: Old-style global idioms with rdfs:range datatyping

Per the treatment of the old-style global idiom as a contracted
form of the bNode global idiom, there is no problem with
supporting legacy RDF instances which employ both the old-style
global and local idioms (e.g. DAML), since both receive a
consistent representation in the graph with a consistent
interpretation from the MT, and query APIs supporting this
datatyping solution will have a consistent foundation to
conduct queries.

--

OK, that's pretty much it. I guess it's time to duck and cover ;-)

Patrick

--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com

Received on Tuesday, 5 February 2002 06:43:17 UTC