- From: David Booth <dbooth@w3.org>
- Date: Tue, 23 Jul 2002 23:46:53 -0400
- To: www-rdf-comments@w3.org
- Cc: www-rdf-interest@w3.org
Brian McBride <bwm@hplb.hpl.hp.com> writes:
>If we choose the untidy option, the value of the object of the statement
>is unknown from this statement alone; a range constraint is required to
>determine the value from the literal string:
>
>  <jenny> <ageInYears> "10" .
>  <ageInYears> <rdfs:range> <xsd:decimal> .
>
>With a range constraint, we can know that the object of the property is
>the integer 10.

I have three comments: one a simple answer to the A versus D question; one on the "tidy" versus "untidy" alternatives; and the third on data types in general, which has direct bearing on the question at hand.

1. It is clearly more important for test A to be true than test D, in order for RDF to be Web scalable, as explained below. However, it should be possible for both test A and test D to be true, if you apply different equality operators to tests A and D. Test A could check for literal string equality; test D could check for integer equality. No contradiction and no unnecessary restriction. Further explanation is below.

2. It seems to me that the "untidy" option would be unscalable and incompatible with Web architectural principles. Allowing "anyone to say anything about anything" means that people should be able to make statements about UNtyped data as well as typed data. It would be very bad if the RDF processor were to throw up its hands and say "sorry, I don't know if they're equal" even if the application that's asking doesn't care at all about datatypes.

(Perhaps the statements didn't even originate as RDF. They may have originated in non-RDF XML, which was transformed through some kind of mapping to generate RDF. There's a LOT of XML and other structured data around whose semantics should be usable once the data is mapped to RDF. In fact, most RDF probably won't originate as RDF.)

If I don't have data type information, I should still be able to make *some* kind of inferences and comparisons with my data. I should not be dead in the water.
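To make point 1 and the paragraph above concrete, here is a minimal Python sketch, assuming literals are stored as plain strings with no datatype attached. The helper names (`value_of`, `string_equal`, `integer_equal`) and the extra "010" literal for Mary are hypothetical illustrations; they are not taken from the RDF test cases and are not meant to reproduce test D. String equality works with no type information at all, while integer equality is a separate, additional operator.

```python
# Hypothetical data: literals kept exactly as written, with no datatype info.
triples = [
    ("Jenny", "ageInYears", "10"),
    ("John", "ageInYears", "10"),
    ("Mary", "ageInYears", "010"),  # hypothetical extra literal
]

def value_of(subject, prop):
    """Return the literal string for (subject, prop), or None if absent."""
    for s, p, o in triples:
        if s == subject and p == prop:
            return o
    return None

def string_equal(a, b):
    """Literal string equality: needs no type information at all."""
    return a == b

def integer_equal(a, b):
    """Integer equality: only holds if both strings parse as integers."""
    try:
        return int(a) == int(b)
    except ValueError:
        return False  # not comparable as integers, so treated as not equal

jenny = value_of("Jenny", "ageInYears")
john = value_of("John", "ageInYears")
mary = value_of("Mary", "ageInYears")
print(string_equal(jenny, john))   # True
print(string_equal(jenny, mary))   # False: "10" != "010" as strings
print(integer_equal(jenny, mary))  # True: both denote the integer 10
```

Here "10" and "010" differ as strings but are equal as integers, so saying which operator is being applied is what removes the apparent conflict.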
If I *do* have (complete) data type information, and I wish to use it, then I should be able to make *additional* inferences about my data. Remember, I may be comparing data from wildly different sources across the Web -- some having very complete type information, and some having little or no type information. For Web-level scalability, complete type information must not be required. It is imperative that I still be able to make sensible literal comparisons in the absence of more sophisticated type information.

Furthermore, requiring <ageInYears> to have a datatype (beyond string) is almost like limiting it to have only a single datatype or interpretation, and I don't think RDF should have this limitation. (In other words, I think RDF should allow something to simultaneously have more than one type, like multiple inheritance: HP-LaserJet-3100 is-a-kind-of Printer, but also HP-LaserJet-3100 is-a-kind-of FaxMachine. But please correct me if you think I'm wrong here!) Would the requirement for a sensible equality comparison be that there exist at-least-one data type for <ageInYears>? Or would the requirement be that there exist one-and-only-one data type for <ageInYears>? If more than one data type is permitted for <ageInYears>, then how do we know which one should be used in the comparison? (See point 3 below for more on this.)

A typed comparison is very different from an untyped string comparison. It involves transforming the original (string) representation into a type+value pair and then comparing both the types and the values. This transformation is important and should be explicitly represented. It seems like the "untidy" option would gloss over this transformation and require it to be built into any interpretation of the data, rather than being an explicit overlay that one may optionally apply.

Bottom line: Data types should NOT be required. They should provide additional benefit if used.
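The transformation described above, from a string to a type+value pair applied only when wanted, can be sketched in Python as an explicit, optional step. This assumes a hypothetical table of lexical-to-value mappings keyed by datatype labels ("xsd:string" and "xsd:integer" are used only as labels here, not as a real XML Schema implementation), and the names `lift`, `typed_equal` and `compare` are illustrative only. Without an overlay, `compare` falls back to plain string equality.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical lexical-to-value mappings; the keys are just labels.
LEXICAL_MAPS: Dict[str, Callable[[str], object]] = {
    "xsd:string": str,
    "xsd:integer": int,
}

TypedValue = Tuple[str, object]

def lift(literal: str, datatype: str) -> Optional[TypedValue]:
    """Explicitly transform a string literal into a (type, value) pair.

    Returns None if the literal cannot be mapped to that datatype.
    An unknown datatype label raises KeyError.
    """
    try:
        return (datatype, LEXICAL_MAPS[datatype](literal))
    except ValueError:
        return None

def typed_equal(a: TypedValue, b: TypedValue) -> bool:
    """Typed comparison: both the types and the values must match."""
    return a[0] == b[0] and a[1] == b[1]

def compare(x: str, y: str, datatype: Optional[str] = None) -> bool:
    """With no datatype overlay, use plain string equality;
    with an overlay, lift both literals first and compare typed values."""
    if datatype is None:
        return x == y
    lx, ly = lift(x, datatype), lift(y, datatype)
    return lx is not None and ly is not None and typed_equal(lx, ly)

print(compare("10", "010"))                 # False: untyped string comparison
print(compare("10", "010", "xsd:integer"))  # True: both lift to ("xsd:integer", 10)
print(compare("ten", "10", "xsd:integer"))  # False: "ten" does not lift to an integer
```

Keeping `lift` separate means untyped data still supports comparison, and type information only adds comparisons rather than being a precondition for any.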
The "untidy" option would require datatypes in order to do any sensible processing, which is not Web scalable, and is therefore a BAD option.

3. Bill de hÓra <dehora@eircom.net> writes in http://lists.w3.org/Archives/Public/www-rdf-interest/2002Jul/0059.html :
> > Test A:
> >
> >   <Jenny> <ageInYears> "10" .
> >   <John> <ageInYears> "10" .
> >
> > Should an RDF processor conclude that the value of the ageInYears
> > properties for Jenny and John are the same?
>
>[The processor should ask:] what does ageInYears say the answer to this
>question is? [The answer is] it depends on the semantics of
>the RDF property, period.
>
>. . .
>
>I would certainly add these to your test case:
>
>  <John> <ageInYears> "ten" .
>  <John> <ageInYears> "Ten" .
>
>which should make clear the point about why properties need to be
>deferred to for questions such as this, *unless* literals are given
>types.

+1 to Bill's comments, except that I think it is perfectly reasonable (and natural) for multiple kinds of comparison to exist (for different data types) and for a string literal comparison to be already known to a processor on boot-up, without loading any rule sets. A string comparison is not the same as an integer comparison, which is not the same as a myPrivateDataType comparison, just as a string is not the same as an integer, which is not the same as a myPrivateDataType value.

If you want to compare two things as anything other than literal strings, to see if they are equal, you not only need to know the data types of the things that you wish to compare, but you also need to know what kind of COMPARISON you wish to make. I.e., you need to know the data type of the comparison operator that you wish to use. If a thing is only permitted to have one data type (an unwise restriction, in my opinion), and the data types of the things that you wish to compare happen to be the same, then the processor can easily guess which comparison operator to use.
However, if things can have more than one data type, and/or you wish to compare things of different types, then you need to know the data type of the comparison you wish to make, and seek a coercion from the things' initial types to the input type of the comparison operator. In other words, the choice of comparison operator depends on the application, or the kind of question that you are trying to ask about the data -- not only on the data itself.

For example, if we ask an RDF processor whether an ageInYears of "10" and a filmTitle of "10" are equal as strings, the answer should be yes. If we ask the processor whether they are equal as integers, the answer should be no (assuming a filmTitle has no defined coercion to type integer). And if we ask the processor whether they are equal as values of myPrivateDataType, then the answer should depend on: (a) whether ageInYears and filmTitle both have coercions to myPrivateDataType; and (b) the equality rules for myPrivateDataType.

--
David Booth
W3C Fellow / Hewlett-Packard
Telephone: +1.617.253.1273
Received on Tuesday, 23 July 2002 23:45:54 UTC