Re: Action 2001-11-16#9 (datatypes)

Thanks Patrick.  Comments below.

Patrick.Stickler@nokia.com wrote:

> Great summary Frank. A few comments and clarifications below
> (to be sure I'm understanding everything as you intended) ...
> 
> 
>>-----Original Message-----
>>From: ext Frank Manola [mailto:fmanola@mitre.org]
>>Sent: 06 December, 2001 14:07
>>To: RDF core WG
>>Subject: Action 2001-11-16#9 (datatypes)
>>
>>
>>
>>>Action 2001-11-16#9 FrankM Clarify the architectural and other
>>>broader concerns with any datatyping scheme that must be considered.
>>>
>>As usual, the action recorded in the minutes sounds much grander and
>>well-thought-out than what I actually had in mind when I raised the
>>issue (thanks to charitable interpretation by the scribe!)  Anyway,
>>what I was getting at was that, in the ongoing datatype discussion,
>>what with all the discussion about the details of how the various
>>proposals worked, what things in those proposals denoted, and so on,
>>I was afraid we were losing track of what requirements we were trying
>>to support.  We need to be able to keep straight, in discussing the
>>various approaches, whether we are (for example) disagreeing about the
>>requirements, or disagreeing about how to support them (or both, or
>>neither).  This isn't to say there hasn't been discussion of
>>requirements, and a number of the submissions have described
>>individual requirements, or lists of them, but I'd like to see more
>>discussion (perhaps in parallel with what's going on now) specifically
>>targeting requirements, and determining what priority to give to them.
>>Here's a list of the sorts of requirements I have in mind (in no
>>particular order).  Most, if not all, of them have been cited in
>>earlier postings.  In some cases, these requirements may overlap or be
>>contradictory.  I meant (and still do mean) to develop this theme
>>further, but I thought I'd get this much out now.
>>
>>--Frank
>>
>>Example requirements:
>>
> 
> Should we follow Pat's original mental dump format by trying to
> construct a matrix showing which proposals meet which requirements?
> That way, we can perhaps more easily compare the pros/cons of
> each proposal insofar as the requirements are concerned.


Yes, that would be a good presentation (a purely hypothetical sketch of 
such a matrix appears below).  But things we also need to be thinking 
about (and identifying) in connection with the requirements (perhaps 
each of us individually first) are:

a.  whether there are necessarily conflicts among the requirements 
(i.e., whether satisfying one necessarily precludes satisfying another)

b.  the priorities we assign to satisfying the various requirements 
(which becomes important if there *are* conflicts).
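
To make that concrete, here's a purely hypothetical sketch of such a 
matrix (the proposal names and entries are invented placeholders, not 
actual assessments of anyone's proposal):

                 Req1   Req2   Req3   Req4   Req5   Req6   Req7
    Proposal A    yes    yes     ?     no     yes    yes    no
    Proposal B    yes   partly  yes    yes    no     yes    yes
    Proposal C    no     yes    yes    yes    yes    no     yes

For (a) we'd want to record any pairs of columns that *can't* both be 
"yes" for any proposal, and for (b) an ordering or weighting on the 
columns.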


>  
> 
>>1. backward compatibility with existing RDF data and application usage
>>
> 
> We should specifically identify what this means. I.e. specific
> idioms, interpretations, etc.


Yes (in fact, this is true for all the requirements;  stated generally, 
they may mean multiple things, and we need to be specific).
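
For instance, one specific thing #1 presumably means is that plain, 
untyped literals in existing RDF/XML keep parsing to exactly the 
triples they do today.  A made-up example (mine, not from any proposal):

   <rdf:Description rdf:about="http://example.org/doc1"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
     <dc:title>A Gentle Introduction</dc:title>
   </rdf:Description>

must go on yielding a statement whose object is the bare literal 
"A Gentle Introduction", and existing applications that compare or 
display that string must keep working unchanged.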


> 
> 
>>2. works with XML Schema data types (or, at least, doesn't preclude
>>using them)
>>[In considering this requirement, consider both the case where we're
>>processing RDF/XML containing uses of XML Schema data types, and the
>>case where we're processing triples.  Are there differences?]
>>
> 
> We need to clarify what "works with XML Schema data types" means. I.e.
> a) only simple data types
> b) simple and complex data types (though I don't see how the latter is
>    really possible)
> c) only referencing data type URIs
> d) employing XML Schema mechanisms/vocabulary
> e) ...


Yes.
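
To illustrate the RDF/XML-vs-triples point in the bracketed note, 
consider a made-up example where the author intends an XML Schema date:

   <rdf:Description rdf:about="http://example.org/doc1"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:ex="http://example.org/terms#">
     <ex:created>2001-12-06</ex:created>
   </rdf:Description>

which today parses to the triple

   <http://example.org/doc1>
      <http://example.org/terms#created> "2001-12-06" .

The question for #2 is how (or whether) the intention that "2001-12-06" 
is an http://www.w3.org/2001/XMLSchema#date survives into the triples; 
the proposals differ precisely here.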


> 
> 
>>3. compatibility (or at least non-interference) with the current
>>DAML+OIL approach to datatypes
>>
> 
> Should this be a sub-requirement/component of #1 above?


Both #1 and #2 (since the DAML+OIL mechanism specifically uses XML 
Schema types).  I think it's particularly important to explicitly deal 
with DAML+OIL (and the ongoing WebONT work) because I think we need to 
take seriously the need for a coherent set of W3C specifications in this 
area, and address how they are supposed to work together.
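
For reference, the DAML+OIL approach (roughly, from memory of the March 
2001 spec) references XML Schema datatypes by URI from a range 
constraint, along these lines:

   <daml:DatatypeProperty rdf:ID="shoesize"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        xmlns:daml="http://www.daml.org/2001/03/daml+oil#">
     <rdfs:range rdf:resource="http://www.w3.org/2000/10/XMLSchema#decimal"/>
   </daml:DatatypeProperty>

so whatever we do should at least let this idiom keep meaning what 
DAML+OIL says it means.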


> 
> 
>>4. ability to use non-XML-Schema datatypes (custom or user-defined
>>datatypes, or those from major components external to RDF, like SQL or
>>UML datatypes)
>>[The idea here is that I should be able to indicate that I'm using
>>datatypes from a given datatype scheme by citing its URI, and hand off
>>datatype-specific processing at appropriate points to a processor that
>>is associated with that scheme.  If XML Schema datatypes are handled
>>in RDF without building them into RDFS processors (presumably by
>>appropriately invoking an XML Schema processor), then it ought to be
>>possible to generalize this architecture to allow other processors.]
>>
> 
> This would likely suggest an interpretation of #2 such that, if we
> must support non-XML Schema data types (which I take as imperative),
> this would seem to preclude interpretation #2(d).


Depends on what #2(d) means in detail, but it certainly means that 
"employing XML Schema mechanisms/vocabulary" can't be done in such a 
way as to preclude using other datatype schemes.
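
To make the "handoff" idea in #4 concrete, here's a minimal, purely 
hypothetical sketch (mine, not from any of the proposals) in Python: 
the RDF processor keeps a registry of datatype handlers keyed by 
datatype URI, and builds no knowledge of any particular scheme into 
itself:

   # Map datatype URIs to functions taking a lexical form to a value.
   XSD = "http://www.w3.org/2001/XMLSchema#"
   handlers = {
       XSD + "integer": int,
       XSD + "boolean": lambda lex: lex == "true",
   }

   def interpret(lexical_form, datatype_uri):
       """Hand the lexical form off to whatever processor is registered
       for the datatype's URI; unknown datatypes stay uninterpreted."""
       handler = handlers.get(datatype_uri)
       if handler is None:
           return lexical_form  # no handler registered: pass through
       return handler(lexical_form)

   # A non-XML-Schema scheme (say, SQL types) is just another entry in
   # the table; the processor itself doesn't change.
   handlers["http://example.org/sql-types#INTEGER"] = int

   print(interpret("42", XSD + "integer"))  # -> 42

This also bears on #7: a processor with an empty registry still works; 
it just leaves every literal uninterpreted.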


> 
> 
>>5. ability to represent type information without an associated schema
>>
> 
> I take this to mean "local typing". I.e. type is associated with the
> object node in the graph by some idiom (DAML/DC/U/...). Right?


Yes.
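
For concreteness, one such local-typing idiom (roughly the "DC" style; 
the details vary across the proposals, and the example is mine):

   <rdf:Description rdf:about="http://example.org/john"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:ex="http://example.org/terms#">
     <ex:age rdf:parseType="Resource">
       <rdf:value>30</rdf:value>
       <rdf:type rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
     </ex:age>
   </rdf:Description>

Here the type travels with the object node itself; no schema is needed 
to interpret "30".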


> 
> 
>>6. ability to represent type information in an associated schema
>>
> 
> I take this to mean "global typing". I.e. the use of rdfs:range
> or similar mechanisms. And by "schema" you mean an RDF schema, not
> an XML schema.


Yes.
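
I.e., something like the following in an RDF schema (my example), so 
that the typing applies globally to every value of the property:

   <rdf:Description rdf:about="http://example.org/terms#age"
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
     <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#integer"/>
   </rdf:Description>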


> 
> 
>>7. ability for an RDF processor to handle typed literals without
>>building the types in
>>
> 
> This is likely related to #2 and #4 as to how generic "open" support 
> for data typing schemes is provided.


Yes.


> 
> 
>>***Others?****
>>
> 
> There should be a single conceptual model of data typing that is not tied
> to a specific idiom, and if multiple idioms are adopted then they
> should be defined as synonyms for this underlying model. We should
> not adopt multiple idioms which are based on different models, even
> if one can define the relations between the models and map between
> them.


I'm not sure I know what this means right now, but that doesn't matter. 
We should collect whatever requirements people think are appropriate, 
and then do the analysis as to whether the WG thinks they really are 
requirements, what priority they have vs. others, etc.


> 
> 
>>Architectural issues arise in, among other things, thinking about how
>>and where various bits of processing involving data types are going
>>to occur.  For example, in
>>http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Oct/0448.html,
>>Pat asks:
>>
>>
>>>>(C4) are multiple type assignments allowed? (e.g. US dollar, decimal)
>>>>
>>>Better, what happens when they occur?  E.g. suppose two
>>>sources/documents/whatever supply different such information; which
>>>part of the RDF machinery complains?  (the lexical analyzer, the
>>>parser, an inference engine, or some other datatyping module?)
>>>
>>So what machinery are we assuming, and how do we assume the various
>>pieces hand off information to each other?
>>
> 
> Is this basically the issue of validation? I.e. which layer tests
> that some lexical form is a member of the lexical space of a data
> type and, if multiple data types are specified, that the values
> denoted constitute the same or analogous values?  Where are lexical
> errors or contradictions trapped?  Right?


Validation is one place where we need to have a clear picture of the 
machinery, but it seems to me there must be others as well.  Any time I 
have to use the value of a property, I could (depending on how you 
assume processing is going to work) need to interpret a literal using 
its associated datatype machinery.  Suppose, for example, that my 
runtime processing gets information from two separate collections of 
RDF (using different data typing schemes) on the Web, and needs to 
check two values for equality.  This scenario may be considered beyond 
our specific scope (since you'd also presumably need to deal with 
vocabulary differences in the general case), but even if we don't need 
to have a "clear" picture of how this might work, I'd sure like to have 
at least a "vague" picture.  We might decide that, as Pat suggests in 
his example, some inference engine is involved, but that decision, 
based as it would be on some general agreement on how processing would 
be done, would be very helpful guidance in understanding the issues, in 
my opinion.
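
To pin down the scenario, here is a purely hypothetical Python sketch 
(the URIs and handlers are illustrative only) of comparing two literals 
from two sources by first mapping their lexical forms to values:

   from decimal import Decimal

   handlers = {
       "http://www.w3.org/2001/XMLSchema#integer": int,
       "http://www.w3.org/2001/XMLSchema#decimal": Decimal,
   }

   def value_of(lexical_form, datatype_uri):
       # hand off to the processor registered for this datatype
       return handlers[datatype_uri](lexical_form)

   # Source 1 says "10" as an xsd:integer; source 2 says "10.0" as an
   # xsd:decimal.
   a = value_of("10", "http://www.w3.org/2001/XMLSchema#integer")
   b = value_of("10.0", "http://www.w3.org/2001/XMLSchema#decimal")
   print("10" == "10.0")  # False: comparing lexical forms
   print(a == b)          # True: comparing values

Which of those two answers the machinery gives, and which layer 
computes it, is exactly the sort of thing I'd like us to be able to say 
something about, however vague.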


Regards,


Frank

 

-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875

Received on Tuesday, 11 December 2001 09:16:08 UTC