Re: Re(buttal): Why I cannot live with S from Patrick Stickler on 2002-01-28 (w3c-rdfcore-wg@w3.org from January 2002)

From: Patrick Stickler <patrick.stickler@nokia.com>
Date: Mon, 28 Jan 2002 12:10:26 +0200
To: ext Sergey Melnik <melnik@db.stanford.edu>
CC: RDF Core <w3c-rdfcore-wg@w3.org>
Message-ID: <B87AF032.C6CE%patrick.stickler@nokia.com>
On 2002-01-25 21:39, "ext Sergey Melnik" <melnik@db.stanford.edu> wrote:

> Patrick Stickler wrote:
>> 
>> OK, here are the reasons why I cannot live with S, in no particular
>> order:
>> 
>> 1. Local and global idioms are not compatible in the same knowledge base
>> without having to resort to a duality of ontologies, one for local idiom
>> and one for global idiom. Cohabitation of local and global idioms
>> is IMO not just a desideratum, but a requirement.
> 
> Proposal S [1] suggests three idioms to choose from. In S-A and S-P,
> global and local typing can merrily coexist, so I don't see why the
> above counts as a cannot-live-with argument. S does not mandate that
> all three idioms are required to be accepted for use simultaneously by
> the same application.

But this precludes arbitrary and unrestricted syndication of
knowledge from disparate sources which may use different idioms.

From the perspective of a single, tightly controlled application
environment, it appears a reasonable restriction. From the perspective
of a global semantic web of knowledge, it is not acceptable.

>> 2. Allows definition of alternate notations in lexical forms (e.g. octal)
>> which are not supported by the actual datatype. I consider this to
>> violate the precision of the datatype definition and a bug. It also
>> means that there are two means to subclass a lexical datatype, either
>> by reference to a new, subordinate datatype or by localized definition
>> of (unsupported) lexical notations. TDL has only one single means of
>> doing so, which respects the boundaries of definition of the datatype
>> creator. With S, applications (and users) must know not only datatypes
>> but ontologies of notational variant properties (octal, decimal, inKg,
>> inOz, etc.) rather than, as with TDL, just datatypes.
> 
> Of course, each limitation can be twisted so that it looks like a
> feature. I believe, it is much easier to arrive at a stable definition
> of say integers (which has remained the same for centuries) than to find
> an appropriate lexical representation (roman/arabic, decimal/octal
> etc.). I see great value in being able to migrate to new encodings
> gracefully, without breaking the existing applications. In TDL, you'd
> have to use different formats for old and new systems. This is the same
> as saving your text document with MS Word 2000 and not being able to
> read it with MS Word 95. That's a horror scenario.

I don't quite follow your argument. Can you give an example of "old" and
"new" formats?

If the lexical space for xsd:integer is defined to be based on decimal
notation, then I hardly see how it is reasonable to expect applications
to support octal notation. An application supports standards, as and
insofar as those standards are defined, and the definition of xsd:integer
says nothing about octal or any other non-decimal notation. The S
approach would allow knowledge to be expressed in a non-standard manner.

If you need/want a way to express integer values in octal, then
define a new datatype, and if you want values of that new datatype to
also be treated as members of the value space of xsd:integer, then
define a relationship between your new datatype and xsd:integer which
reflects such a relationship.

But if an application supports xsd:integer, it should not be getting
values in octal notation and be expected to achieve a correct
interpretation of them.
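
To make the point concrete, here is a minimal Python sketch. It assumes a hypothetical datatype ex:octalInteger whose lexical forms are plain octal digits; that name, the function names and the regular expressions are illustrative assumptions, not part of XML Schema, S or TDL. The same string maps to different values under the two datatypes, so an application that only knows the decimal lexical space of xsd:integer cannot interpret an octal form correctly.

```python
import re

# Decimal-only lexical space, in the spirit of xsd:integer (simplified).
DECIMAL_FORM = re.compile(r"[+-]?[0-9]+")
# Lexical space of the hypothetical ex:octalInteger datatype.
OCTAL_FORM = re.compile(r"[0-7]+")


def parse_xsd_integer(lexical: str) -> int:
    """Map a lexical form to an integer value using decimal notation only."""
    if not DECIMAL_FORM.fullmatch(lexical):
        raise ValueError(f"{lexical!r} is not a decimal integer form")
    return int(lexical, 10)


def parse_octal_integer(lexical: str) -> int:
    """Map a lexical form of the hypothetical ex:octalInteger to its value."""
    if not OCTAL_FORM.fullmatch(lexical):
        raise ValueError(f"{lexical!r} is not an octal integer form")
    return int(lexical, 8)


# One string, two datatypes, two different members of the same value space.
as_decimal = parse_xsd_integer("17")
as_octal = parse_octal_integer("17")
```

Both functions return ordinary integers, which corresponds to relating the new datatype to the value space of xsd:integer while leaving the lexical space of xsd:integer untouched.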

> 
>> 3. Requires four (4) URIs for each datatype rather than just one. How
>> can we ask every authority that has already defined datatypes and given
>> them URI identity to now go back and mint three more URIs so that RDF
>> can use them?! TDL has already shown that one URI is sufficient.
> 
> S-A and S-P can live with a single URI. In [1], different URIs were
> introduced for clarity, so that a precise distinction can be made
> whether lexical spaces, value spaces, or mappings are used for typing in
> a given idiom.

Fair enough, though the definition of S does not suggest that.

But if S can be revised so that only one URI per datatype is required,
then certainly this specific objection is no longer valid.

>> 4. Requires users to understand the MT to understand which URI
>> variant to use in which idiom (*.lex, *.map, *.val, or *) rather
>> than addressing such distinctions in the MT alone, as TDL does.
>> TDL allows you to talk about those different components, but it
>> does not require you to do so simply to denote the type of a
>> literal so that some application knows which value it corresponds
>> to.
> 
> Same as reply for (3).

Same as reply to reply for (3) ;-)

>> 5. Requires additional clarification regarding semantics and use
>> of rdfs:subPropertyOf relation between properties which convey
>> datatyping.
> 
> This is untrue. Semantics of rdfs:subPropertyOf remains exactly as
> defined in the model theory draft.

I've detailed my concerns on this point numerous times. It has
nothing to do with the semantics of rdfs:subPropertyOf being
different, but rather having to clarify for users the
implications for relating properties which may involve datatyping
semantics.

>> 6. Use of rdfs:subPropertyOf relation between datatyping property
>> and non-datatyping property is valid in the absence of domain
>> constraints but not valid if domain constraints defined.
>> 
>> E.g.
>> 
>>    #Bob ex:age "30" .
>>    foo:integer rdfs:range xsd:integer.lex .
>>    ex:age rdfs:subPropertyOf foo:integer .
>> 
>> implies that "30" is a member of the lexical space of
>> xsd:integer, and this is correct and (even arguably)
>> intuitive and efficient, avoiding the need for
>> an anonymous node idiom.
>> 
>> But with the additional constraint
>> 
>>    foo:integer rdfs:domain xsd:integer.val .
>> 
>> which is valid for idioms using anonymous nodes, e.g.
>> 
>>    #Sue ex:age _:1 .
>>    _:1 foo:integer "30" .
>> 
>> where _:1 is then inferred to denote a member of
>> the value space of xsd:integer, it also means that,
>> when these two graphs are merged, #Bob, from the
>> example above is *also* inferred to be a member of
>> the value space of xsd:integer, and thus a perfectly
>> valid use becomes an invalid use.
> 
> What a discovery! If you happen to believe that cats are dogs you can
> hardly expect consistency. The use that you suggested above is simply
> incorrect...

It is not incorrect, though perhaps you have misunderstood it.
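
The merge problem in (6) can be reproduced with a small Python sketch. It applies only the usual rdfs:subPropertyOf, rdfs:domain and rdfs:range inferences to the triples quoted above; the function name and the representation of triples as tuples of strings are assumptions made for illustration, not any existing RDF API.

```python
SUBPROPERTY = "rdfs:subPropertyOf"


def inferred_types(triples):
    """Return (node, class) pairs implied by subPropertyOf, domain and range."""
    facts = set(triples)
    # Propagate statements up the subPropertyOf hierarchy until nothing new appears.
    while True:
        derived = {
            (s, sup, o)
            for (s, p, o) in facts
            for (sub, rel, sup) in facts
            if rel == SUBPROPERTY and sub == p
        }
        if derived <= facts:
            break
        facts |= derived
    # Apply domain (types the subject) and range (types the object).
    types = set()
    for (s, p, o) in facts:
        for (prop, rel, cls) in facts:
            if prop != p:
                continue
            if rel == "rdfs:domain":
                types.add((s, cls))
            elif rel == "rdfs:range":
                types.add((o, cls))
    return types


# First graph: the literal is typed through the range of the superproperty.
graph_bob = {
    ("#Bob", "ex:age", '"30"'),
    ("foo:integer", "rdfs:range", "xsd:integer.lex"),
    ("ex:age", SUBPROPERTY, "foo:integer"),
}
# Second graph: the anonymous node idiom with a domain constraint.
graph_sue = {
    ("foo:integer", "rdfs:domain", "xsd:integer.val"),
    ("#Sue", "ex:age", "_:1"),
    ("_:1", "foo:integer", '"30"'),
}

types_bob = inferred_types(graph_bob)
types_merged = inferred_types(graph_bob | graph_sue)
bob_is_value_before = ("#Bob", "xsd:integer.val") in types_bob
bob_is_value_after = ("#Bob", "xsd:integer.val") in types_merged
```

Under these assumptions, #Bob is not typed as a member of the value space in the first graph alone, but is once the two graphs are merged.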

>> 7. Provides no additional expressive power over TDL
>> yet requires significantly more machinery and deeper
>> understanding of the model by users, and is not as
>> compatible with current idioms as is TDL. Per Occam's
>> razor, TDL is the simpler and better choice.
> 
> That sounds more like a plug rather than an argument. I think it has
> been made sufficiently clear over the past several months which of the
> proposals requires "significantly more machinery and deeper
> understanding of the model by users" ;)

Right, S does, so then we agree ;-)

>> 8. The S model, particularly literal labeled nodes
>> participating in graph tidying, precludes any later
>> adoption of a P+ treatment, which is a more ideal
>> local idiom than D given the consistency of representation
>> of local and global typing in the graph; I.e. with P+
>> the literal object node is not shifted to an anonymous
>> node as with D, it just stays where it is and the local
>> type is specified for the literal node directly
>> via rdf:type -- elegant, efficient, and worth keeping
>> as a desirable future option. S would prevent this
>> option forever.
> 
> Talking about esthetics, what about this: in TDL, if you write
> 
> _1 rdf:value "bla"
> 
> then _1 and "bla" denote exactly the same thing. Do you consider this
> elegant?

They do not denote the same thing. Precisely where do
you get that? It may be that you do not actually understand
TDL?

In the above example, _1 denotes (if anything) a member
of the value space of a datatype, and "bla" (in the
context of a given datatype) denotes a lexical form,
a member of the lexical space of that datatype.

Are you saying that a member of the lexical space
and a member of the value space are the same thing?

Though, per my recent discussion of TDL with tidy literals,
I would say that it is reasonable to presume that, insofar
as the graph is concerned, _1 is simply an object node
and "bla" is simply a literal which, depending on other
characteristics of the graph, may participate in an
interpretation which will treat "bla" as a lexical form
for some datatype.
 
>> In short, S is too complex, too different from present
>> usage, able to result in conflicting knowledge on
>> merge that is valid separately, and too burdensome
>> from the practical point of URI management to be
>> acceptable.
> 
> Another plug... What about this: "In short, TDL is too complex, breaks
> most existing applications, APIs, querying, storage, ..."? ;)

TDL (the model, not necessarily the MT) is very simple and elegant,
and does not break any existing applications.

Furthermore, the argument that TDL with non-tidy literals breaks
existing applications seems to presume that those applications are
testing for literal node equality rather than string equality
of literal node labels -- a claim which I find *very* hard to
believe.

Can you clarify: do these applications make their equality tests
by actual node identity or by checking whether two property values
are equal as strings? If the latter, then I would assert
that TDL, even with non-tidy literals, does not break any such
application.
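
The distinction between the two kinds of equality test can be sketched in Python as follows. The LiteralNode class is a hypothetical stand-in for a literal node in an untidy graph, not the data structure of any particular RDF toolkit.

```python
class LiteralNode:
    """A hypothetical graph node labeled with a literal string."""

    def __init__(self, label: str) -> None:
        self.label = label


# With non-tidy literals, two occurrences of "30" are two distinct nodes.
bob_age = LiteralNode("30")
sue_age = LiteralNode("30")

# Test by node identity: fails, because the nodes are different objects.
same_node = bob_age is sue_age

# Test by string equality of the node labels: succeeds.
same_label = bob_age.label == sue_age.label
```

An application that compares labels gets the same answer whether literals are tidy or not; only an application that relies on node identity would notice the difference.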
 
>> If S were the only option, I'd still not choose to
>> use it. I'd look for some other KR solution that dealt
>> with datatyping in a more economical and user friendly
>> manner. But since there is another option, TDL, which
>> has none of the above faults yet meets all of the
>> specified desiderata, the choice for TDL is obvious.
> 
> As a developer, I don't have an option. With TDL-RDF, all of my own
> applications in mediation, model management, backend storage etc. would
> technically become non-RDF applications, or applications "formally known
> as complying with the deprecated RDF 1.0" ;)

I really don't believe that this would be the case. I think that
there has been a misunderstanding about what TDL is actually
requiring, or what perhaps such applications are expecting in
the RDF graph.

Still, even if your applications are doing tests for tidy literal
node equality rather than string equality, TDL works fine with
tidy literals (though the MT needs adjustment, perhaps, to
support that).

So, I still don't see any clear evidence that TDL breaks
your existing implementations.

> However, I strongly object to adopting untidy literals.

No problem. TDL works fine with tidy literals. Whether or not
literals are tidy or untidy is irrelevant insofar as the TDL
model is concerned (though, again, the present MT may need to
be tweaked).

Have a close look at my recent posting

http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0314.html

and please reconsider if TDL really breaks your applications.

Patrick



--
               
Patrick Stickler              Phone: +358 50 483 9453
Senior Research Scientist     Fax:   +358 7180 35409
Nokia Research Center         Email: patrick.stickler@nokia.com
Received on Monday, 28 January 2002 05:09:25 UTC