- From: Sergey Melnik <melnik@db.stanford.edu>
- Date: Mon, 04 Feb 2002 19:29:36 -0800
- To: Brian McBride <bwm@hplb.hpl.hp.com>
- CC: RDF Core <w3c-rdfcore-wg@w3.org>
Brian McBride wrote:
>
> An updated summary of the datatyping issues, as I currently understand them.
>
> Changes:
>
> B1 now disputed
> B7 status changed to agreed
> B9 withdrawn
> B10 added "say what you mean"
>
> Issue B1:
> =========
>
> status: disputed by Sergey.  Sergey you owe us an explanation of why.
>
> In S, if one wants to use both idiom A and idiom B, e.g.
>
> <mary> <age> "10" .
> <age> <rdfs:range> <xsd:integer.lex> .
>
> and
>
> <mary> <ageD> _:a .
> _:a <xsd:integer.map> "10" .
>
> two properties have to be used, <age> and <ageD>, in this example.
>
> I believe there is an agreement that this is a difference between the
> two proposals.  Indeed, it may be said that the main aim of TDL is
> to avoid requiring different properties for these different idioms.
>
> Can't Live With: PatrickS

If the schema designers (e.g. of DublinCore) want to ensure that all three
idioms S-A, S-B, and S-P are usable with a given property (e.g. dc:Date),
they can simply define the range of the property as a UNION of
xsd:date.val, xsd:date.lex and xsd:date.map. These three sets are
disjoint, so no clash can occur.

Moreover, the schema designers have fine-grained control with respect to
the lexical representations that each compliant DublinCore application
*must* support. For example, imagine that there is another datatype for
date, say uml:date, that shares the value space of xsd:date, but uses a
different (disjoint) lexical representation. To enforce that each
DublinCore application can handle both lexical forms we can make the
range of dc:Date a union of xsd:date.val (=uml:date.val), xsd:date.lex,
xsd:date.map, uml:date.lex, and uml:date.map.

If, in contrast, uml:date.lex and xsd:date.lex clash in some incompatible
way, the range of dc:Date could comprise just xsd:date.map and
uml:date.map, or a union of xsd:date.val, xsd:date.lex, xsd:date.map, and
uml:date.map.

No "second" property is needed in the above examples.

Remark: Notice that a schema is like a contract.
Imagine we are in the position of the DublinCore, i.e. we have to design a
schema that ensures maximum interoperability between compliant
applications. If, for example, we decide to enforce a specific lexical
representation of a certain datatype, we could use S-P. On the other
hand, if the schema needs maximum flexibility, we could take S-A to allow
lexical representations to evolve with time. In such a case, the
"contract" merely states that a certain value space is under
consideration, but no further requirement is put forth with respect to
the lexical encoding. Both variants, i.e. with "decoupled" and "coupled"
lexical representation, are useful.

> Issue B2: Multiple Lexical Representations of a data value
> ==========================================================
>
> status: agreed that S-A allows this and TDL does not.
>
> S, idiom A, permits multiple lexical representations of a data value:
>
> _:i <xsd:double> "10.1" .
> _:i <xsd:double.de> "10,1" .
>
> Issue B3: the self entailment issue
> ===================================
> status: Withdrawn in favour of B4:
>
> From:
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0410.html
>
> [[I accept the reasoning above; it doesn't address my objection;
> it just shows that my example wasn't very good.  Sergey's
> example makes the point better:]]
>
> B9 also added in response to Graham's request.
>
> Issue B4 - TDL breaks existing code
> ===================================
>
> status: facts agreed; significance disputed.
>
> This is similar to B3.  I've changed the example slightly from Sergey's.
>
> Under TDL, consider the graph:
>
> _:f <rdf:type> <film> .
> _:f <dc:Title> (_, "10") .
> <mary> <age> (_, "10").
>
> Does not entail:
>
> _:x <dc:Title> _:y .
> _:z <age> _:y .
>
> Can't Live With: DanC
>
> Issue B5: Storage Requirements
> ===============================
>
> status: disputed.
>
> TDL requires significantly more storage to implement.

In most recent suggestions, there is a way of indicating (by means of
syntax) which literals are to be treated as untidy. As long as not *all*
literals are required to be untidy, I withdraw the storage issue.

> Issue B6: S requires 4 URI's be registered for each data type
> =============================================================
> S requires that for each datatype 4 URI's be registered
>    datatype
>    datatype.lex
>    datatype.val
>    datatype.map
>
> Sergey: Do you agree this is the case?  If not, how many URI's are required
> to implement ALL the idioms of S and coexist in the same model.

nope ;)

Surprise: only one URI is required. Price: special vocabulary is needed
to identify lexical spaces, value spaces, and datatype mappings for a
given datatype.

Here is how it works. In the simplest scenario, we define three
additional properties (in total, not for each datatype), say
rdfdt:isValueSpaceOf, rdfdt:isLexicalSpaceOf, rdfdt:isDatatypeMappingOf.
Then, we write e.g.

   dc:Date rdfs:range _:1
   _:1 rdfdt:isValueSpaceOf xsd:date

Voila! Defining the semantics of the above three rdfdt: properties is
straightforward. Additionally, we can reuse xsd: URIs without concern.

> Issue B7: Complexity
> ====================
>
> status: agreed
>
> S has several ways of expressing the same thing.  An RDF processor has to be
> aware of them all.

If by RDF processor you mean a general-purpose API and/or parser, I
disagree. The applications do need to deal with the diverse
representation, but even then conditionally. Using different datatyping
idioms in schemas amounts to establishing distinct "contracts" among
applications. As explained above, it's in the hands of schema designers
to make it easier to comply with the schema (using a less flexible
representation), or harder (using a more general representation). We
only provide the tools (= datatyping idioms).
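
As a rough illustration of what "dealing with the diverse representation"
might involve inside an application, here is a minimal Python sketch that
reads the two forms from the B1 example uniformly. Everything in it is an
assumption made for illustration and not part of either proposal: the
tuple-based graph, the Literal class, the PARSERS table, the typed_values
function, and the convention of deriving the datatype by stripping the
.lex or .map suffix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    lex: str  # the lexical form, e.g. "10"


# Hypothetical table of parsers keyed by datatype name; only xsd:integer
# is sketched here.
PARSERS = {"xsd:integer": int}


def typed_values(graph, subject, prop):
    """Yield the values of `prop` on `subject`, whichever form the data uses."""
    ranges = [o for s, p, o in graph if s == prop and p == "rdfs:range"]
    for s, p, o in graph:
        if s != subject or p != prop:
            continue
        if isinstance(o, Literal):
            # Literal form: the datatype is taken from a range named <dt>.lex
            for r in ranges:
                if r.endswith(".lex"):
                    yield PARSERS[r[: -len(".lex")]](o.lex)
        else:
            # Blank-node form: the datatype is taken from a <dt>.map property
            for s2, p2, o2 in graph:
                if s2 == o and p2.endswith(".map") and isinstance(o2, Literal):
                    yield PARSERS[p2[: -len(".map")]](o2.lex)


# The two forms from the B1 example, as plain tuples.
graph = [
    ("mary", "age", Literal("10")),
    ("age", "rdfs:range", "xsd:integer.lex"),
    ("mary", "ageD", "_:a"),
    ("_:a", "xsd:integer.map", Literal("10")),
]
```

With this data, list(typed_values(graph, "mary", "age")) and
list(typed_values(graph, "mary", "ageD")) both give [10], so the caller
does not need to know which form the schema chose.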

The burden of translating between different datatyping idioms would hit
an application if it needs to interoperate with another, independently
developed(!), but related application. In such a case, the schemas used
in both applications are typically heterogeneous, i.e., use different
properties, classes etc. Thus, we have a standard problem of
data/application integration, where having different styles of datatyping
is the easiest issue by far and can even be fully automated (in contrast,
general schema mapping cannot be done fully automatically).

> Supported by Jeremy's error cases message
>
> http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2002Jan/0397.html
>
> and a message from Andy Seaborne to rdf comments:
>
> http://lists.w3.org/Archives/Public/www-rdf-comments/2002JanMar/0058.html
>
> Issue B8: S-B encourages logically (sic) errors in the
> application type processing.
> =======================================================
>
> status: ?

Sergey agrees with the significance of B8 (but can live with it).

> Given:
>
> _:f <rdf:type> <film> .
> _:f <dc:Title> "10" .
> <mary> <age> "10" .
>
> an application 'knows' that the range of <age> is an integer so it 'knows'
> that mary has <age> 10.  Under S-B, running a query:
>
> ?x <dc:Title> ?y .
> ?z <age> ?y .
>
> will return ?x = _:f and ?z = <mary>, and knowing that the age of <mary> is
> 10, may conclude that the title of the film is also 10.
>
> Can't Live With: Jeremy
>
> Issue B9: In TDL a document does not entail itself
> ==================================================
>
> status: Withdrawn.
>
> Under TDL, does:
>
> <foo> <dc:Title> "W3C" .
>
> entail
>
> <foo> <dc:Title> "W3C" .
>
> yes.
>
> Issue B10: Say what you mean
> ============================
>
> status: ?
>
> The concern here is that in TDL, a literal denotes a pair consisting of a
> value and a lexical representation of that value.  The problem is then that
> the german representation of floating point number, e.g.
> "10,5" is different from the english representation, e.g. "10.5".
>
> Thus under TDL a german 10 and a half is a different thing from an english
> 10 and a half.
>
> More formally, under TDL:
>
> <foo> <eg:size> _:s1 .
> _:s1 <rdf:value> "10,5" .
> _:s1 <rdf:type> <xsd:double-de> .
>
> <bar> <eg:size> _:s2 .
> _:s2 <rdf:value> "10.5" .
> _:s2 <rdf:type> <xsd:double> .
>
> does not entail:
>
> <foo> <eg:size> _:s .
> <bar> <eg:size> _:s .
>
> Does anyone dispute the facts, or that this is a significant issue?

I believe the above issue is closely related to B1 and B2...

I'd like to raise another issue:

Issue B11: Misuse of datatypes
==============================

Given untidy graphs it is possible to create a "datatype" for persons and
another one for names, so that literal "Martyn" may represent a person if
it occurs in one context, or it may represent a person's name in another
context. Thus, untidy graphs facilitate ambiguous modeling techniques.

Sergey

--
E-Mail:  melnik@db.stanford.edu (Sergey Melnik)
WWW:     http://www-db.stanford.edu/~melnik
Tel:     OFFICE: 1-650-725-4312 (USA)
Address: Room 438, Gates, Stanford University, CA 94305, USA
Received on Monday, 4 February 2002 22:12:39 UTC