Re: Datatyping Summary V4 from Sergey Melnik on 2002-02-05 (w3c-rdfcore-wg@w3.org from February 2002)

From: Sergey Melnik <melnik@db.stanford.edu>
Date: Tue, 05 Feb 2002 10:57:36 -0800
To: Patrick Stickler <patrick.stickler@nokia.com>
CC: Brian McBride <bwm@hplb.hpl.hp.com>, RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <3C602B20.D29C88E4@db.stanford.edu>
Patrick Stickler wrote:
> 
> On 2002-02-05 5:29, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:
> 
> > If the schema designers (e.g. of DublinCore) want to ensure that all
> > three idioms S-A, S-B, and S-P are usable with a given property (e.g.
> > dc:Date), they can simply define the range of the property as a UNION of
> > xsd:date.val, xsd:date.lex and xsd:date.map. These three sets are
> > disjoint, so no clash can occur.
> 
> But is that union then not another datatype?

In general, no (if by datatype you mean a 3-tuple of things). The above
union is just some class.
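
To make the point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the three sets are toy, finite stand-ins, and the names XSD_DATE_LEX, XSD_DATE_VAL, XSD_DATE_MAP and DC_DATE_RANGE are made up for this example. Strings play the role of lexical forms, date objects the role of values, and (string, date) pairs the role of the mapping.

```python
from datetime import date

# Toy stand-ins for the three components of xsd:date (assumed, finite).
XSD_DATE_LEX = {"2002-02-05", "2002-02-06"}              # lexical forms
XSD_DATE_VAL = {date(2002, 2, 5), date(2002, 2, 6)}      # values
XSD_DATE_MAP = {(s, date.fromisoformat(s)) for s in XSD_DATE_LEX}  # pairs


def pairwise_disjoint(*sets):
    """Return True if no element occurs in more than one of the sets."""
    seen = set()
    for s in sets:
        if seen & s:
            return False
        seen |= s
    return True


# The range of dc:Date as a plain class: the union of the three sets.
DC_DATE_RANGE = XSD_DATE_LEX | XSD_DATE_VAL | XSD_DATE_MAP

print(pairwise_disjoint(XSD_DATE_LEX, XSD_DATE_VAL, XSD_DATE_MAP))
print("2002-02-05" in DC_DATE_RANGE, date(2002, 2, 5) in DC_DATE_RANGE)
```

The union here is just a set of things; nothing about it is a (lexical space, value space, mapping) triple, which is why it is a class rather than another datatype.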
 
> Or are you saying that the URI of the datatype itself is
> interpreted as a union of the members of its components
> (lexical space, value space, and mapping)?

That was not my intention, but the idea sounds quite interesting indeed!
In that case one could allow all three idioms just by saying that the
range is xsd:date.
 
> > Moreover, the schema designers have fine-grained control with respect to
> > the lexical representations that each compliant DublinCore application
> > *must* support. For example, imagine that there is another datatype for
> > date, say uml:date, that shares the value space of xsd:date, but uses a
> > different (disjoint) lexical representation. To enforce that each
> > DublinCore application can handle both lexical forms we can make the
> > range of dc:Date a union of xsd:date.val (=uml:date.val), xsd:date.lex,
> > xsd:date.map, uml:date.lex, and uml:date.map.
> 
> But this only works if we don't get any new datatypes, and
> can agree on the single standardized union.
> 
> Otherwise, a set of ranges is an intersection, not a union.

Well, I'm not suggesting using rdfs:range to get a union. Another kind
of property would be required, of course.
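
A small Python sketch of the difference, under the conjunctive reading of multiple range statements that Patrick describes. The sets LEX and VAL and both function names are hypothetical; the "union" check stands in for whatever other property would be defined for that purpose.

```python
from datetime import date

# Toy classes (assumed): one lexical form and one value.
LEX = {"2002-02-05"}
VAL = {date(2002, 2, 5)}


def satisfies_ranges(obj, range_classes):
    """Conjunctive reading: obj must belong to every stated range class."""
    return all(obj in cls for cls in range_classes)


def satisfies_union(obj, member_classes):
    """Hypothetical union property: obj must belong to at least one class."""
    return any(obj in cls for cls in member_classes)


literal = "2002-02-05"
print(satisfies_ranges(literal, [LEX, VAL]))  # intersection of LEX and VAL
print(satisfies_union(literal, [LEX, VAL]))   # union of LEX and VAL
```

Because LEX and VAL are disjoint, stating both as ranges leaves nothing that can satisfy the constraint, whereas the union accepts either form.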
 
> I think that this proposed approach presumes far far more
> control over the data than anyone ever will have or has
> ever had.
> 
> > If, in contrast,
> > uml:date.lex and xsd:date.lex clash in some incompatible way, the range
> > of dc:Date could comprise just xsd:date.map and uml:date.map, or a union
> > of xsd:date.val, xsd:date.lex, xsd:date.map, and uml:date.map.
> >
> > No "second" property is needed in the above examples.
> 
> But the reality is that DC does not impose any types on its values.

DC is meant to be very general. It's very hard to agree on the ranges of
most properties (e.g. dc:Creator) so that everybody is happy. However,
dc:Date (or some specialized version of it) sounds like a more tractable
one.

> We must be able to deal with syndications of arbitrary graphs
> without foreknowledge of the types employed.

No doubt.
 
> > Remark:
> >
> > Notice that a schema is like a contract. Imagine we are in the position
> > of the DublinCore, i.e. we have to design a schema that ensures maximum
> > interoperability between compliant applications. If, for example, we
> > decide to enforce a specific lexical representation of a certain
> > datatype, we could use S-P. On the other hand, if the schema needs
> > maximum flexibility, we could take S-A to allow lexical representations
> > to evolve with time. In such a case, the "contract" merely states that
> > a certain value space is under consideration, but no further requirement
> > is put forth with respect to the lexical encoding. Both variants, i.e.
> > with "decoupled" and "coupled" lexical representation are useful.
> 
> Dublin Core is a vocabulary, not a schema. It has realizations other
> than RDF. This is also true of most other vocabularies/ontologies.

A vocabulary is nothing but a very simple flat schema...
 
> We need a way to deal with the knowledge that different folks
> express, based on their intersections. We cannot exclude any
> knowledge on the basis of preferred or mandatory idioms.
> 
> Thus the DC folks cannot (and I expect will not) mandate that
> folks use one or another idiom, or use one or another union type
> in their schemas "just in case" their data might be syndicated
> with someone else's.

I missed that one, but don't bother, let's go ahead with "convergence"
first...
 
> >> Issue B6: S requires 4 URI's be registered for each data type
> >> =============================================================
> >> S requires that for each datatype 4 URI's be registered
> >> datatype
> >> datatype.lex
> >> datatype.val
> >> datatype.map
> >>
> >> Sergey: Do you agree this is the case? If not, how many URIs are required
> >> to implement ALL the idioms of S and have them coexist in the same model?
> >
> > nope ;)
> >
> > Surprise: only one URI is required.
> > Price:    special vocabulary is needed to identify lexical spaces,
> >         value spaces, and datatype mappings for a given datatype.
> >
> > Here is how it works. In the simplest scenario, we define three additional
> > properties (in total, not for each datatype), say rdfdt:isValueSpaceOf,
> > rdfdt:isLexicalSpaceOf, rdfdt:isDatatypeMappingOf. Then, we write e.g.
> >
> > dc:Date rdfs:range _:1
> > _:1 rdfdt:isValueSpaceOf xsd:date
> >
> > Voila! Defining the semantics of the above three rdfdt: properties is
> > straightforward. Additionally, we can reuse xsd: URIs without concern.
> 
> In comparison to the TDL alternative
> 
>   dc:Date rdfs:range xsd:date .
> 
> the price of your proposed approach is too
> "expensive". Sorry.

Precision is costly. If no precision is needed, one could use the trick
you suggested earlier, i.e. define CEXT(xsd:integer) as the union of
the lexical space, the value space, and the mapping.
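
The trade-off can be sketched in Python as follows. This is an illustration only: the graph is modelled as a set of (subject, predicate, object) tuples, the range property is written as rdfs:range, and the function names precise_components and coarse_components are hypothetical. The three rdfdt: properties are the ones proposed above.

```python
# Precise variant: the range is a blank node tied to one component of xsd:date.
PRECISE = {
    ("dc:Date", "rdfs:range", "_:1"),
    ("_:1", "rdfdt:isValueSpaceOf", "xsd:date"),
}

# Coarse variant: the range is the datatype URI itself.
COARSE = {("dc:Date", "rdfs:range", "xsd:date")}

COMPONENT_PROPS = {
    "rdfdt:isValueSpaceOf": "val",
    "rdfdt:isLexicalSpaceOf": "lex",
    "rdfdt:isDatatypeMappingOf": "map",
}


def precise_components(prop, triples):
    """Follow rdfs:range to nodes described with an rdfdt: property."""
    found = set()
    for s, p, o in triples:
        if s == prop and p == "rdfs:range":
            for s2, p2, o2 in triples:
                if s2 == o and p2 in COMPONENT_PROPS:
                    found.add((o2, COMPONENT_PROPS[p2]))
    return found


def coarse_components(prop, triples):
    """Read CEXT(datatype) as the union of all three components."""
    return {
        (o, part)
        for s, p, o in triples
        if s == prop and p == "rdfs:range"
        for part in ("val", "lex", "map")
    }


print(precise_components("dc:Date", PRECISE))
print(sorted(coarse_components("dc:Date", COARSE)))
```

The precise variant costs one extra triple and three extra properties but names exactly which component is meant; the coarse variant needs only one URI and one triple but admits all three.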
 
> Again, not "will it work?" but "is it the most
> efficient way to do it?"
> 
> >> Issue B7: Complexity
> >> ====================
> >>
> >> status: agreed
> >>
> >> S has several ways of expressing the same thing. An RDF processor has to be
> >> aware of them all.
> >
> > If by RDF processor you mean a general-purpose API and/or parser, I
> > disagree.
> 
> I both agree and disagree ;-) Whoa, how did I miss that one? ;-)
> 
> If we're speaking about a graph-access API or parser, then
> like Sergey, I disagree that such applications must be
> aware of such variability.
> 
> Likewise, if we are speaking about general-purpose applications
> that may utilize RDF encoded knowledge, I also disagree, in that
> recent discussions in rdf-interest regarding querying
> indicate that many folks (myself included) expect that
> there will be query and other APIs that will hide all
> variation between idioms and allow -- within the context
> of that API -- folks to interact with values wherever
> possible.
> 
> That said, I *do* agree that there is complexity there
> that has to be addressed. It's just not complexity that
> must be addressed at all application levels.
> 
> Though, it goes without saying, I think, that if the
> complexity can be avoided at *all* levels, all the better.
> 
> > Issue B11: Misuse of datatypes
> > ==============================
> >
> > Given untidy graphs it is possible to create a "datatype" for persons
> > and another one for names, so that literal "Martyn" may represent a
> > person if it occurs in one context, or it may represent a person's name
> > in another context. Thus, untidy graphs facilitate ambiguous modeling
> > techniques.
> 
> I think this is a valid point, with certain qualifications.
> 
> But.... I'm going to hold off commenting on this for the moment
> (can you believe it?! ;-)

I'm impressed, indeed ;)

Sergey


> as I think that this issue is resolved
> by the proposal outlined in my recent posting with the subject
> "A basis for convergence and closure?" (sorry, offline).
> 
> If not, say so, and I'll offer the comments in my cache ;-)
> 
> Cheers,
> 
> Patrick
> 
> --
> 
> Patrick Stickler              Phone: +358 50 483 9453
> Senior Research Scientist     Fax:   +358 7180 35409
> Nokia Research Center         Email: patrick.stickler@nokia.com

-- 
E-Mail:      melnik@db.stanford.edu (Sergey Melnik)
WWW:         http://www-db.stanford.edu/~melnik
Tel:         OFFICE: 1-650-725-4312 (USA)
Address:     Room 438, Gates, Stanford University, CA 94305, USA
Received on Tuesday, 5 February 2002 13:40:43 UTC