- From: Pat Hayes <phayes@ai.uwf.edu>
- Date: Mon, 22 Oct 2001 15:16:18 -0500
- To: Sergey Melnik <melnik@db.stanford.edu>
- Cc: w3c-rdfcore-wg@w3.org
>Folks, this is a current snapshot of the datatyping discussion taken >place on various lists. Nice survey!! Wish I had read it before I wrote http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0446.html I have some quibbles, but I think they are important for the conclusion. >This posting reviews several kinds of suggested approaches, lists some >proposed criteria for comparing them, and concludes with a short >scratch-the-surface discussion. A list of major relevant references is >given in the end (please send further pointers to me if I you have >some). > >1. SUGGESTED APPROACHES >======================= > >All suggested approaches can be roughly divided into two groups, >"typed instances" and "schema-based typing" (also called weak and >strong typing [OL]). In the former approach, the typing information is >attached directly to the data values, whereas in the latter the typing >information is provided in some (typically external) schema or rule >set. Quibble Q1: part of the merit (to me) of the suggestions S3 and S4 is that they do *not* require an external schema or rule set, but that all the relevant information can be asserted in conventional RDFS (with a slight extension in S3 to introduce rdf:value). This is important for the reason given in Q3. >Examples and references to some concrete suggestions follow >below. > >Typed instances: >--------------- > >(S1) encode literals as URIs, merge literals and resources [PS] > > Examples: x:urn:int:5, x:urn:data:a%20space > >(S2) make literals composite, e.g. a pair <resource, unicode string> >[SM1,SM2] > > Examples: <(http://www.w3.org/2001/XMLSchema-datatypes, integer), >"5"> > >(S3) use bNodes [M&S,TBL] > > Examples: John_Smith weight [units Pounds, rdf:value "10"], or > John_Smith weight [pounds [decimal "10"]] Q2. I think it is misleading to characterize this under this heading. It only looks that way if you write it in N3. If you wrote this out in Ntriples, the typing information is clearly asserted in the RDF graph, so this is better called 'schema-based'. (A general meta-quibble here is that this strong/weak classification, though traditional, is only meaningful relative to a particular syntax, and is therefore rather shallow, IMHO. I think a much more meaningful criterion for us is whether or not the datatyping information is available to an RDF inference engine. On that basis, S1 and S2 would be classified together (under 'not') and S3 and S4 classed together (under 'available')) >Schema-based or rule-based typing: >--------------------------------- > >(S4) type of property values is defined in a schema [PH,PPS1], possibly > by a set of rules [TBL] > > Examples: (weight rdfs:range >http://www.w3.org/2001/XMLSchema-datatypesinteger), or > > (John_Smith weight "160 1/8") goes together with a rule >like > 'if X is a person living in the US and (X weight Y), > then Y is a "pieces-of-eight" number that gives weight >in pounds' > >2. CRITERIA >=========== > >Below is a non-exhaustive list of several criteria that can be used >for deciding on the suggested approaches. I picked the criteria that >affect applications critically. > >(C1) backward compatibility wrt existing data and applications > >(C2) comparing values for custom or unknown datatypes > (Is myint:05==myint:5? Given _x1 decimal "5" and _x2 decimal "5", >is _x1==_x2?) Q3. This seems meaningless since RDF has no way to express identity. So these questions cannot even be posed in RDF; and if some external application could infer them, it would have no way to assert its conclusions in RDF . So I propose that this criterion simply be ignored for now. >(C3) is typing information self-contained or requires external schema >[DC2] I would like to know what 'external' means here. This seems like a three-or four-way distinction. Typing information can be: >> self-contained or not (ie potentially arising from any general inference) >> external (to RDF) or internal. These seem orthogonal dimensions to me. >(C4) are multiple type assignments allowed? (e.g. US dollar, decimal) Better, what happens when they occur? Eg suppose two sources/documents/whatever supply different such information; which part of the RDF machinery complains? (the lexical analyzer, the parser, an inference engine, or some other datatyping module?) >(C5) compactness (verbosity of serialization, storage efficiency in >databases, elegant APIs) > >3. DISCUSSION >============= > >The discussion can be found at >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/ ;) > >Seriously, it looks like encoding of data types using bNodes (S3) is >still our best bet. I prefer S4. It scores high on all of C1, C3, C4 , C5; and C2 should be ignored. I don't think that S3 does score on C1, by the way; that usage is *incompatible* with the M&S examples. That seems to me to be a major score against it. >It is backward compatible for obvious reasons Seems obviously not. Am I missing something? Eg a simple usage of a literal with no datatyping information: aaa bbb "56788" . is unchanged in S4, but would be illegal in S3 and must be rewritten as aaa bbb _:1 . _:1 rdf:value "56788" . >(C1), uses self-contained typing (C3) Why is it self-contained? Like S4, it imposes typing by making RDF assertions (using rdf:type). But any RDF assertion might be a consequence of some other assertions, eg about rdfs:range; so S3 and S4 seem to me to be indistinguishable on C3. >, and is flexible in allowing >multiple type assignments (C4). > >Of course, deficiencies in (C2) and (C5) are the back side of the >coin. All of (S1),(S2), and (S4) have a problem with (C2), RDF has a problem with C2 in general, and it applies just as forcibly to S3 (see Q3) >i.e. comparing typed values, but do well in compactness (C5). > >The semantics of datatyping has been investigated extensively in >[PH,PPS1,PPS2]. It seems that (S3) fits well into the suggested >theories. S4 (at least in the version discussed in [PH][PPS1/2] ) fits here better. S3 doesn't even require such fitting; it fits into the current MT without alteration. However, it doesn't fit with widespread practice in literal usage in other languages, so I think it would be short-sighted to build it into RDF. Also, frankly, the use of rdf:value seems ugly and ad-hoc. >In conclusion, my suggestion is to focus on (S3) and try to work out >detailed use scenarios, limitations and work-arounds. Note that (S3) >is orthogonal to the question whether namespaces are part of the >model. I'm not sure what you mean by 'model' here. S3, like S4, does require that datatypes (the things denoted by datatype names) are in every satisfying interpretation, and have their 'natural' interpretation. >It seems particularly desirable to be able to identify >namespaces of properties that carry data types and download the >associated schemas. Right, I agree. Pat >Sergey > > >REFERENCES >========== > >[DC1] Dan Connoly. http://www.w3.org/2001/01/ct24 >[DC2] Dan Connoly. >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0338.html >[JG] Jan Grant. http://ioctl.org/rdf/literals >[OL] Ora Lassila. >http://lists.w3.org/Archives/Public/www-rdf-logic/2001Oct/0099.html >[PH] Pat Hayes. >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0164.html >[PPS1] Peter F. Patel-Schneider. >http://lists.w3.org/Archives/Public/www-rdf-comments/2001OctDec/0057.html >[PPS2] Peter F. Patel-Schneider. >http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0054.html >[PS] Patrick Stickler. >http://lists.w3.org/Archives/Public/www-rdf-interest/2001Oct/0051.html >[SM1] Sergey Melnik. >http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Sep/0444.html >[SM2] Sergey Melnik. >http://lists.w3.org/Archives/Public/www-rdf-interest/2001Feb/0090.html >[TBL] Tim Berners-Lee. >http://www.w3.org/DesignIssues/InterpretationProperties.html >[M&S] http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/ -- --------------------------------------------------------------------- IHMC (850)434 8903 home 40 South Alcaniz St. (850)202 4416 office Pensacola, FL 32501 (850)202 4440 fax phayes@ai.uwf.edu http://www.coginst.uwf.edu/~phayes
Received on Monday, 22 October 2001 16:16:35 UTC