- From: Jeremy Carroll <jjc@hplb.hpl.hp.com>
- Date: Thu, 16 May 2002 10:32:42 +0100
- To: <w3c-rdfcore-wg@w3.org>
Summary:
- the current model theory misarticulates the meaning of the triple:
    <eg:doc1> <dc:creator> "John Smith" .
- many such triples occur both in the primer and on the web.
- the model theory should be overhauled along the lines of Pat's simpledatatype2

I have been looking through the primer, particularly looking at the Dublin Core examples (throughout the primer). These seem like perfectly fair examples of how Dublin Core is used.

Unfortunately, there are many instances where strings are used to represent people and things rather than themselves. This is not in agreement with the model theory, in which strings denote themselves. See the end of this message for a list of such strings.

Thus I conclude that either the primer is in need of an overhaul, or the model theory. Since the examples follow standard usage, and the model theory is meant to be merely a rearticulation of what RDF means, I believe that it is the model theory that is at fault. (Not Pat's fault, he was merely following the WG's lead).

Within its own terms, the model theory talks about entailments. Thus every model theory we have looked at uses an entailment like:

Premise:
  <eg:doc1> <dc:creator> <urn:id:1> .
  <eg:doc2> <dc:creator> <urn:id:1> .
Conclusion:
  <eg:doc1> <dc:creator> _:blank .
  <eg:doc2> <dc:creator> _:blank .

to show that the two documents have the same author.

The current model theory also mandates the following entailment:

Premise:
  <eg:doc1> <dc:creator> "John Smith" .
  <eg:doc2> <dc:creator> "John Smith" .
Conclusion:
  <eg:doc1> <dc:creator> _:blank .
  <eg:doc2> <dc:creator> _:blank .

All Dublin Core users would recognise that it is not always true that the premises entail that the two documents have the same author, i.e. they would recognise that there might be two Johns. The model theory describes necessary truth (not optional or probable truth), which all uses of all RDF documents must follow.
Given a reading of the conclusion as "the two documents have the same author", then according to the April model theory the Dublin Core users are simply wrong in this case.

So, in practice, we have decided to deprecate the single most common RDF triple:

  <uri> <dc:creator> "string" .

This seems a very peculiar decision for any standardization committee, to deprecate its single greatest use case. If that is indeed our decision it needs to be both highlighted and respected in the primer.

Moreover the examples in the primer seem to be very much in accord with the datatyping proposal that we have not discussed (simpledatatype2 [1]). In that proposal a triple like:

  <eg:doc1.html> <dc:creator> "Eric J. Miller" .

is read as 'the dc:creator of eg:doc1.html can be written as "Eric J. Miller"'. The current model theory in contrast says 'the dc:creator of eg:doc1.html is the string "Eric J. Miller"'.

In essence, the model theoretic problem is that we chose a strictly typed system, in which strings always represent strings, instead of a context sensitive typing in which the type of thing being represented by each string is determined by the context in which it is used. (This is the tidy versus untidy debate at a semantic level rather than a syntactic level).

Context sensitive typing is much more robust against simplifications in the data modelling used by the RDF author. Since all data modelling involves simplifications this feels like a good feature.

I believe that the sheer quantity of these examples in the primer indicates that this is a substantial issue: that RDF, as deployed, particularly in Dublin Core, does not conform with the model theoretic changes agreed on the 22nd of February [2].

I believe that we will have great difficulty persuading the Dublin Core community to stop using string literals to represent real world entities. Thus we have a legacy problem, in terms of both legacy data and RDF expertise of the people writing the data.
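
To make the difference between the two readings concrete, here is a rough sketch in Python. It is only an illustration: the function names, the Literal class and the representation of triples as tuples are my own assumptions, not anything taken from the model theory document or from simpledatatype2. The "untidy" function builds just one possible interpretation in which each literal occurrence denotes its own thing; a single such interpretation is enough to show that the shared-author conclusion is not forced.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A plain literal; URI references are modelled as bare str values."""
    text: str


URI_CASE = [
    ("eg:doc1", "dc:creator", "urn:id:1"),
    ("eg:doc2", "dc:creator", "urn:id:1"),
]
LITERAL_CASE = [
    ("eg:doc1", "dc:creator", Literal("John Smith")),
    ("eg:doc2", "dc:creator", Literal("John Smith")),
]


def tidy(index, obj):
    # Tidy reading: a literal denotes itself, so two equal strings
    # are necessarily the same object.
    return obj


def untidy_countermodel(index, obj):
    # One possible untidy interpretation (hypothetical): each literal
    # occurrence denotes a distinct thing that "can be written as" the
    # string. A URI reference still denotes a single thing.
    return (obj, index) if isinstance(obj, Literal) else obj


def creators(triples, denote):
    """Map each subject to the set of things denoted by its dc:creator objects."""
    result = {}
    for index, (subject, predicate, obj) in enumerate(triples):
        if predicate == "dc:creator":
            result.setdefault(subject, set()).add(denote(index, obj))
    return result


def share_a_creator(triples, denote):
    """True if some single thing is the creator of every subject."""
    return bool(set.intersection(*creators(triples, denote).values()))


for name, triples in (("URI", URI_CASE), ("literal", LITERAL_CASE)):
    for denote in (tidy, untidy_countermodel):
        print(name, denote.__name__, share_a_creator(triples, denote))
```

With the URI premises both readings agree that the documents share a creator. With the "John Smith" premises the tidy reading forces a shared creator, while the untidy interpretation above leaves room for two Johns.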

We can resolve this either by deprecating the data and persuading Dublin Core experts of the error of their ways, or by maintaining backward compatibility along the lines of simpledatatype2.

I believe that if deprecation does not cause substantial problems at last call it will reflect that:
+ we have failed to communicate the impact of the model theory on DC usage.
+ much of that community is not intending to respect the model theory.

I suggest that this is an important problem with our current position on datatyping and that we should reopen the topic of tidy semantics, with a view to seeing whether simpledatatype2 can act as a basis to resolve this problem while retaining as much of the current datatyping work as possible.

Jeremy

References
==========
[1] Pat Hayes, Simple-datatypes-2,
    http://www.coginst.uwf.edu/users/phayes/simpledatatype2.html
[2] Aaron Swartz, Minutes of 22nd February,
    http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Feb/0656.html

Appendix
========
The strings in question are:

strings as people, entities, complex values
...........................................
"English"
"1501 Grant Avenue, Bedford, Massachusetts 01730"
"Richard Roe"
"Corporation For National Research Initiatives"
"Research; statistical methods"
"Education, research, related topics"
"Library use Studies"
"World Wide Web Home Page"
"Amy Friedlander"
"electronic journal"
"library use studies"
"magazines and newspapers"
"Eric J. Miller"
"John Peterson"
"Sally Smith, lighting"
"Greece"
"Greece"-en
"Grece"-fr
"Garret Wilson"

strings as datatype values
..........................
[without clarifying that RDF global idiom only delivers strings. If the application can interpret these as values then why wasn't Patrick allowed to say so?]
"urn:issn:1082-9873"
"August 16, 1999"
"27"
"2"
"2.4"
"127"
"1998-01-05"

Strings where the property name was such as to suggest a real world entity and a renaming of the property name might be appropriate.
"tent"
"Overnighter"
"Bedford"
"Massachusetts"
"01730"
"1501 Grant Avenue"

I have worked through some of these examples in more detail; I might post that if there is interest.
("English", "Bedford", "Eric J. Miller", "Amy Friedlander", "1501 Grant Avenue", "1501 Grant Avenue, Bedford, Massachusetts 01730", "Corporation For National Research Initiatives", "Research; statistical methods", "World Wide Web Home Page")

Received on Thursday, 16 May 2002 05:33:00 UTC