Re: Datatyping Summary V4 from Patrick Stickler on 2002-02-05 (w3c-rdfcore-wg@w3.org from February 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Tue, 05 Feb 2002 13:50:52 +0200
To: ext Sergey Melnik <melnik@db.stanford.edu>, Brian McBride <bwm@hplb.hpl.hp.com>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B88593BC.D3D6%patrick.stickler@nokia.com>
On 2002-02-05 5:29, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:


> If the schema designers (e.g. of DublinCore) want to ensure that all
> three idioms S-A, S-B, and S-P are usable with a given property (e.g.
> dc:Date), they can simply define the range of the property as a UNION of
> xsd:date.val, xsd:date.lex and xsd:date.map. These three sets are
> disjoint, so no clash can occur.

But is that union then not another datatype?

Or are you saying that the URI of the datatype itself is
interpreted as a union of the members of its components
(lexical space, value space, and mapping)?

> Moreover, the schema designers have fine-grained control with respect to
> the lexical representations that each compliant DublinCore application
> *must* support. For example, imagine that there is another datatype for
> date, say uml:date, that shares the value space of xsd:date, but uses a
> different (disjoint) lexical representation. To enforce that each
> DublinCore application can handle both lexical forms we can make the
> range of dc:Date a union of xsd:date.val (=uml:date.val), xsd:date.lex,
> xsd:date.map, uml:date.lex, and uml:date.map.

But this only works if we don't get any new datatypes, and
can agree on the single standardized union.

Otherwise, a set of ranges is an intersection, not a union.
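
A minimal Python sketch of the union/intersection point, for illustration only. The set members are made-up toy values, and the sketch assumes (as stated above) that several separate range statements on one property combine conjunctively:

```python
# Toy stand-ins for the three component sets of one datatype.
# All members are hypothetical; only the set names come from the thread.
xsd_date_val = {"val:2002-02-05"}                        # value space
xsd_date_lex = {"lex:2002-02-05"}                        # lexical space
xsd_date_map = {("lex:2002-02-05", "val:2002-02-05")}    # lexical-to-value mapping

# A single range declared as the union of the three disjoint sets.
union_range = xsd_date_val | xsd_date_lex | xsd_date_map


def effective_range(*ranges):
    """Combine separately stated ranges as an intersection:
    a value must belong to every stated range at once."""
    result = set(ranges[0])
    for r in ranges[1:]:
        result &= set(r)
    return result


# The same three sets stated as three independent ranges.
separate_ranges = effective_range(xsd_date_val, xsd_date_lex, xsd_date_map)

print(len(union_range))      # 3
print(len(separate_ranges))  # 0
```

Under these assumptions the union only exists if someone declares it as one agreed range; independently stated ranges over disjoint sets leave nothing that satisfies all of them.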

I think that this proposed approach presumes far, far more
control over the data than anyone ever will have or has
ever had.

> If, in contrast,
> uml:date.lex and xsd:date.lex clash in some incompatible way, the range
> of dc:Date could comprise just xsd:date.map and uml:date.map, or a union
> of xsd:date.val, xsd:date.lex, xsd:date.map, and uml:date.map.
> 
> No "second" property is needed in the above examples.

But the reality is that DC does not impose any types on its values.

We must be able to deal with syndications of arbitrary graphs
without foreknowledge of the types employed.

> Remark:
> 
> Notice that a schema is like a contract. Imagine we are in the position
> of the DublinCore, i.e. we have to design a schema that ensures maximum
> interoperability between compliant applications. If, for example, we
> decide to enforce a specific lexical representation of a certain
> datatype, we could use S-P. On the other hand, if the schema needs
> maximum flexibility, we could take S-A to allow lexical representations
> to evolve with time. In such case, the "contract" merely states that
> certain value space is under consideration, but no further requirement
> is put forth with respect to the lexical encoding. Both variants, i.e.
> with "decoupled" and "coupled" lexical representation are useful.

Dublin Core is a vocabulary, not a schema. It has realizations other
than RDF. This is also true of most other vocabularies/ontologies.

We need a way to deal with the knowledge that different folks
express, based on their intersections. We cannot exclude any
knowledge on the basis of preferred or mandatory idioms.
 
Thus the DC folks cannot (and I expect will not) mandate that
folks use one or another idiom, or use one or another union type
in their schemas "just in case" their data might be syndicated
with someone else's.

>> Issue B6: S requires 4 URI's be registered for each data type
>> =============================================================
>> S requires that for each datatype 4 URI's be registered
>> datatype
>> datatype.lex
>> datatype.val
>> datatype.map
>> 
>> Sergey: Do you agree this is the case? If not, how many URI's are required
>> to implement ALL the idioms of S and coexist in the same model.
> 
> nope ;)
> 
> Surprise: only one URI is required.
> Price:    special vocabulary is needed to identify lexical spaces,
>         value spaces, and datatype mappings for a given datatype.
>
> Here is how it works. In the simplest scenario, we define additional three
> properties (in total, not for each datatype), say rdfdt:isValueSpaceOf,
> rdfdt:isLexicalSpaceOf, rdfdt:isDatatypeMappingOf. Then, we write e.g.
> 
> dc:Date rdf:range _:1
> _:1 rdfdt:isValueSpaceOf xsd:date
> 
> Voila! Defining the semantics of the above three rdfdt: properties is
> straightforward. Additionally, we can reuse xsd: URIs without concern.

In comparison to the TDL alternative

  dc:Date rdf:range xsd:date .

the price of your proposed approach is too
"expensive". Sorry.

Again, not "will it work?" but "is it the most
efficient way to do it?"
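
As a rough illustration of the comparison, the two forms can be written out as triples and counted. This is only a sketch: the triples are copied from the examples above, while the `cost` helper and the choice of "triples plus blank nodes" as a proxy for expense are assumptions made here, not a metric anyone in the thread has proposed.

```python
# Triples as (subject, predicate, object) tuples, names copied from the message.
s_approach = [
    ("dc:Date", "rdf:range", "_:1"),
    ("_:1", "rdfdt:isValueSpaceOf", "xsd:date"),
]
tdl_approach = [
    ("dc:Date", "rdf:range", "xsd:date"),
]


def cost(triples):
    """Return (triple count, distinct blank node count) for a list of triples.
    Hypothetical helper: a crude proxy for how much must be written and matched."""
    blanks = {term for triple in triples for term in triple if term.startswith("_:")}
    return len(triples), len(blanks)


print(cost(s_approach))    # (2, 1)
print(cost(tdl_approach))  # (1, 0)
```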


>> Issue B7: Complexity
>> ====================
>> 
>> status: agreed
>> 
>> S has several ways of expressing the same thing. An RDF processor has to be
>> aware of them all.
> 
> If by RDF processor you mean a general-purpose API and/or parser, I
> disagree. 

I both agree and disagree ;-) Whoa, how did I miss that one? ;-)

If we're speaking about a graph-access API or parser, then
like Sergey, I disagree that such applications must be
aware of such variability.

Likewise, if we are speaking about general-purpose applications
that may utilize RDF-encoded knowledge, I also disagree. Recent
discussions in rdf-interest regarding querying indicate that
many folks (myself included) expect that there will be query
and other APIs that hide all variation between idioms and
allow folks, within the context of that API, to interact with
values wherever possible.

That said, I *do* agree that there is complexity there
that has to be addressed. It's just not complexity that
must be addressed at all application levels.

Though, it goes without saying, I think, that if the
complexity can be avoided at *all* levels, all the better.


> Issue B11: Misuse of datatypes
> ==============================
> 
> Given untidy graphs it is possible to create a "datatype" for persons
> and another one for names, so that literal "Martyn" may represent a
> person if it occurs in one context, or it may represent a person's name
> in another context. Thus, untidy graphs facilitate ambiguous modeling
> techniques.

I think this is a valid point, with certain qualifications.

But.... I'm going to hold off commenting on this for the moment
(can you believe it?! ;-) as I think that this issue is resolved
by the proposal outlined in my recent posting with the subject
"A basis for convergence and closure?" (sorry, offline).

If not, say so, and I'll offer the comments in my cache ;-)

Cheers,

Patrick


--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Tuesday, 5 February 2002 06:51:12 UTC