big issue (2001-09-28#13)

This is a follow-up clarification regarding issue #13, assigned to me
this Friday (what an inspiring number play) and christened "datatypes".
After giving some context, I'm also throwing the ball into the game by
putting forth some (hopefully provocative) suggestions.

It seems to be generally acknowledged that the following 4 issues are
closely related and, thus, may need to be resolved simultaneously:

1. Are literals resources?

  Tracked as: #rdfms-literals-as-resources
  Dependent issue: #rdfms-literalsubjects (would be resolved immediately
  if literals are resources)

2. Are resource URIs opaque or composed of namespace + local name?

  Tracked as: #rdfms-uri-substructure
  Intro:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0270.html

3. Are literals opaque or composed of unicode string + language ID/URI?

  Tracked as: #rdfms-xmllang
  Related: #rdfms-literal-is-xml-structure
  Summary:
http://lists.w3.org/Archives/Public/w3c-rdfcore-wg/2001Jul/0122.html
           (suggests literals are composite values)

4. How to use datatypes in RDF?

  Tracked as: #rdfs-xml-schema-datatypes
  Possible foundation: http://www.w3.org/TR/xmlschema-2/


Let me start the discussion backwards. What are the requirements for
using datatypes? Here is a quote from
http://www.w3.org/TR/xmlschema-2/#requirements:

- provide for primitive data typing, including byte, date, integer,
  sequence, SQL and Java primitive datatypes, etc.;
- define a type system that is adequate for import/export from database
  systems (e.g., relational, object, OLAP);
- distinguish requirements relating to lexical data representation vs.
  those governing an underlying information set;
- allow creation of user-defined datatypes, such as datatypes that are
  derived from existing datatypes and which may constrain certain of its
  properties (e.g., range, precision, length, format).

In other words, to use datatypes effectively, we must at least be able
to:

a) identify a resource (or literal) as a typed one
b) refer to a datatype as a resource (this allows defining datatypes,
determining their equivalence etc.)

Here is a tentative suggestion of how datatypes may be introduced into
RDF. This is also an attempt at a simultaneous attack on issues 1-4
above. Previously, I proposed several ways of treating literals as
resources. For a change, here is an alternative view:

s1) A resource is a pair (URI, local name). URIs may contain "#" etc.;
this satisfies the M&S requirement that, given a property, one can
retrieve the schema describing that property.

s2) A literal is a pair <resource, unicode string>. The first component
of a literal denotes its (data)type.

That's it for now. Here is an example of how (s1),(s2) address (a),(b):

Notice that the datatypes themselves are resources. For instance, a
resource for "integer" defined in XML schema is
(http://www.w3.org/2001/XMLSchema-datatypes, integer). Literals are
tagged using resources. For instance, the value "5" is tagged as
"integer" using <(http://www.w3.org/2001/XMLSchema-datatypes, integer),
"5">. The definition of the datatype of this primitive value can be
retrieved from http://www.w3.org/2001/XMLSchema-datatypes (per the M&S
requirement mentioned in s1).
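
To make (s1) and (s2) a little more concrete, here is a minimal sketch
of the two pairs as plain data structures. The class names Resource and
Literal, the field names, and the schema_location helper are
illustrative assumptions only; nothing here is defined by M&S or by XML
Schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """(s1): a resource is a pair (URI, local name)."""
    uri: str         # the URI part; may contain "#" etc.
    local_name: str  # the local name within that URI


@dataclass(frozen=True)
class Literal:
    """(s2): a literal is a pair <resource, unicode string>."""
    datatype: Resource  # first component: denotes the (data)type
    value: str          # second component: the uninterpreted string


def schema_location(literal: Literal) -> str:
    """Hypothetical helper: the URI from which the datatype definition
    of a literal would be retrieved."""
    return literal.datatype.uri


# The "integer" datatype from the example, as a resource (addresses b).
XSD_INTEGER = Resource("http://www.w3.org/2001/XMLSchema-datatypes", "integer")

# The value "5" tagged as an integer (addresses a).
five = Literal(XSD_INTEGER, "5")
```

Note that the string "5" is stored as-is: the sketch only records which
datatype the literal claims, and does not parse or validate the value.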

These are the (possible) consequences:

c1) Resources and literals are disjoint
c2) Language tagging in literals is done using the typing mechanism, e.g.
    <(http://iso.org/lang/, en-us), "rat">
c3) #rdfms-literalsubjects is still open.
c4) s1,s2 say nothing about the type system or creation of user-defined
types. XML schema introduces a very elaborate one (maybe ugly, but
comprehensive), which may or may not be worth mirroring in RDF.
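
Consequences (c1) and (c2) can be illustrated with a small sketch under
the same reading of (s1) and (s2). The class names and the "de" language
resource are assumptions made up for the illustration; only the en-us
example comes from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """(s1): a pair (URI, local name)."""
    uri: str
    local_name: str


@dataclass(frozen=True)
class Literal:
    """(s2): a pair <resource, unicode string>; the resource is the tag."""
    datatype: Resource
    value: str


# (c2) A language is just another resource used as the literal's tag.
EN_US = Resource("http://iso.org/lang/", "en-us")
DE = Resource("http://iso.org/lang/", "de")  # hypothetical second language

rat_en = Literal(EN_US, "rat")
rat_de = Literal(DE, "rat")

# Same string, different tag: the two literals are distinct values.
same_literal = rat_en == rat_de

# (c1) Resources and literals are disjoint: no value is both.
literal_is_resource = isinstance(rat_en, Resource)
```

In this sketch a literal carries exactly one tag, so a language tag and
a datatype tag cannot both be attached to the same literal; whether that
is acceptable is left open here, as is the type system itself (c4).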

Rotten tomatoes are welcome.

Sergey

Received on Friday, 28 September 2001 13:05:50 UTC