Dublin Core, the Primer and the Model Theory

Summary:
  - the current model theory
    misarticulates the meaning of the triple:
 <eg:doc1> <dc:creator> "John Smith" .
  - many such triples occur both in the primer
    and on the web.
  - the model theory should be overhauled along the 
    lines of Pat's simpledatatype2
    


I have been looking through the primer, particularly looking
at the Dublin Core examples (throughout the primer).

These seem like perfectly fair examples of how Dublin Core
is used. Unfortunately, there are many instances where
strings are used to represent people and things rather than
themselves. This is not in agreement with the model theory
in which strings denote themselves.

See the end of this message for a list of such strings.

Thus I conclude that either the primer is in need of an 
overhaul, or the model theory.

Since the examples follow standard usage, and the model theory 
is meant to be merely a rearticulation of what RDF means, I 
believe that it is the model theory that is at fault. (Not 
Pat's fault, he was merely following the WG's lead).

Within its own terms, the model theory talks about 
entailments.
Thus every model theory we have looked at uses an
entailment like:

Premise:
 <eg:doc1> <dc:creator> <urn:id:1> .
 <eg:doc2> <dc:creator> <urn:id:1> .
Conclusion:
 <eg:doc1> <dc:creator> _:blank .
 <eg:doc2> <dc:creator> _:blank .

to show that the two documents have the same author.

The current model theory also mandates the following 
entailment:

Premise:
 <eg:doc1> <dc:creator> "John Smith" .
 <eg:doc2> <dc:creator> "John Smith" .
Conclusion:
 <eg:doc1> <dc:creator> _:blank .
 <eg:doc2> <dc:creator> _:blank .

All Dublin Core users would recognise that it is not
always true that the premise entail that the two documents
have the same author i.e. they would recognise that there 
might be two Johns.

The model theory describes necessary truth (not optional 
or probable truth), which all uses of all RDF documents
must follow. Given a reading of the conclusion
as the two documents have the same author then, 
according to the april model theory
the Dublin Core users are simply wrong in this case.

So, in practice, we have decided to deprecate the single most 
common RDF triple
  <uri>  <dc:creator> "string" .


This seems a very peculiar decision of any standardization 
committee, to deprecate its single greatest use case. If that 
is indeed our decision it needs to be both highlighted and 
respected in the primer.


Moreover the examples in the primer seem to be very much in
accord with the datatyping proposal that we have not discussed
(simpledatatype2 [1]).
In that proposal a triple like:
  <eg:doc1.html> <dc:creator> "Eric J. Miller" .
is read as 'the dc:creator of eg:doc1.html can be written as 
"Eric J. Miller"'. The current model theory in contrast
says
  'the dc:creator of eg:doc1.html is the string 
"Eric J. Miller"'

In essence, the model theoretic problem is that we chose
a strictly typed system, in which strings always represents
strings, instead of a context sensitive typing in which
the type of thing being represented by each string is
determined by the context in which it is used. (This is the
tidy versus untidy debate at a semantic level rather than 
a syntactic level).

Context sensitive typing is much more robust against
simplifications in the data modelling used by the RDF author. 
Since all data modelling involves simplifications this feels 
like a good feature.

I believe that the sheer quantity of these examples in the
primer indicates that this a substantial issue: that RDF, as
deployed, particularly in Dublin Core, does not conform with
the model theoretic changes agreed on the 22nd of February [2].

I believe that we will have great difficulty persuading the 
Dublin Core community to stop using string literals to 
represent real world entities. 

Thus we have a legacy problem, in terms of both legacy data 
and RDF expertise of the people writing the data. We can 
resolve this either by deprecating the data and persuading 
dublin core experts of the error of their ways or by maintaining 
backward compatibility along the lines of simpledatatytpes2. 

I believe that if deprecation does not cause substantial 
problems at last call it will reflect that 
+ we have failed to communicate the impact of the
  model theory on DC usage.
+ much of that community is not intending to respect the 
  model theory.

I suggest that this is an important problem with our current 
position on datatyping and that we should reopen the topic 
of tidy semantics, with a view to seeing whether
simpledatatype2 can act as a basis to resolve this problem 
while retaining as much of the current datatyping work 
as possible.

Jeremy

References
==========
[1] Pat Hayes, 
Simple-datatypes-2,
http://www.coginst.uwf.edu/users/phayes/simpledatatype2.html 
[2] Aaron Swartz, Minutes of 22nd February
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0656.html

Appendix
========

The strings in question are:

strings as people, entities, complex values
...........................................

"English"
"1501 Grant Avenue, Bedford, Massachusetts 01730"
"Richard Roe"
"Corporation For National Research Initiatives"
"Research; statistical methods"
"Education, research, related topics"
"Library use Studies"
"World Wide Web Home Page"
"Amy Friedlander"
"electronic journal"
"library use studies"
"magazines and newspapers"
"Eric J. Miller"
"John Peterson"
"Sally Smith, lighting"
"Greece"
"Greece"-en
"Grece"-fr
"Garret Wilson"


strings as datatype values
..........................
[without clarifying that RDF global idiom
only delivers strings.
If the application can interpret these as
values then why wasn't Patrick allowed to 
say so?]

"urn:issn:1082-9873"
"August 16, 1999"
"27"
"2"
"2.4"
"127"
"1998-01-05"


Strings where the property name was such as 
to suggest a real world entity and a renaming
of the property name might be appropriate.
"tent"
"Overnighter"

"Bedford"
"Massachusetts"
"01730"
"1501 Grant Avenue"

I have worked through some of these examples in
more detail; I might post that if there is interest.
 ("English", "Bedford", "Eric J. Miller",
  "Amy Friedlander", "1501 Grant Avenue"
  "1501 Grant Avenue, Bedford, Massachusetts 01730"
  "Corporation For National Research Initiatives"
  "Research; statistical methods"
  "World Wide Web Home Page")


  

Received on Thursday, 16 May 2002 05:33:00 UTC