ACTION 2001-11-02#02: Datatyping use-cases from CC/PP

[I hope this is timely.  I fear the whole datatyping debate is going off 
the rails.]

CC/PP [1] (and UAProf [2], a CC/PP-compatible vocabulary) are existing 
specifications that use RDF, and make heavy use of implied datatyping for 
literals -- both in the sense of indicating a value space for literals and 
indicating the interpretation of literal strings denoting literal values.

CC/PP uses RDF properties to describe a user agent (client) system to an 
information provider (server), so that appropriately contextualized 
information can be delivered to the user agent.  Each such property is used to 
describe a CC/PP client attribute, and they constitute a CC/PP 
vocabulary.  Associated RDF schema indicate the value type of each property 
used to describe a CC/PP attribute;  in most cases the attribute value is 
expressed as a literal, and the value type also indicates a mapping from 
lexical space to value space.

NOTE:  folks may disagree with details of how things have been done, BUT 
the point is that CC/PP demonstrates some ways of using RDF that have been 
devised by reasonable people trying in good faith to use RDF as 
described.  Also, CC/PP has been implemented at least twice to my knowledge 
and maybe more.


There are three levels at which the CC/PP specification invokes literal 
data types:


(A)
     it specifies some preferred simple data values and encodings for use 
as CC/PP client attribute values, with lexical mapping rules for 
each.  Most (but not all) of these are based on XSD data types.  For example:

[[[
4.1.1.3 Integer number

Integer numbers may be positive, zero or negative. They are represented by 
a string containing a sequence of decimal digits, optionally preceded by a 
'+' or '-' sign. Leading zeros are  permitted and are ignored. The number 
value is always interpreted as decimal (radix 10). It is recommended that 
implementations generate and support integer values in the range 
-2147483647 to +2147483647, or -(2^31-1) to (2^31-1); i.e. integers whose 
absolute value can be expressed as a 31-bit unsigned binary number.
]]]
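
To make the lexical-to-value mapping concrete, here is a minimal Python 
sketch of the rule quoted above.  The function names are my own invention 
for illustration, not anything defined by CC/PP, and rejecting surrounding 
white space is an assumption (the quoted text does not say either way).

```python
import re

# Lexical form per the quoted section 4.1.1.3: an optional sign
# followed by one or more decimal digits.
_CCPP_INTEGER = re.compile(r"[+-]?[0-9]+")

# Recommended (not mandatory) bound from the same section: 2^31 - 1.
RECOMMENDED_MAX = 2**31 - 1


def parse_ccpp_integer(text):
    """Map a CC/PP integer lexical form to an int (hypothetical helper)."""
    if not _CCPP_INTEGER.fullmatch(text):
        raise ValueError("not a CC/PP integer: %r" % (text,))
    # Always radix 10; leading zeros are accepted and ignored.
    return int(text, 10)


def in_recommended_range(value):
    """True if value lies within -(2^31-1) .. +(2^31-1)."""
    return -RECOMMENDED_MAX <= value <= RECOMMENDED_MAX
```

Note that several lexical forms ("320", "+320", "0320") map to the same 
value, which is exactly the lexical space / value space distinction at 
issue.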


(B)
     it imports a (very) basic vocabulary for web clients from earlier IETF 
work (CONNEG).  An RDF schema for this vocabulary is in appendix C.

'pix-x' is an example of an integer-valued attribute:
[[[
   <ccpp:Attribute rdf:ID='pix-x'>
     <rdfs:label xml:lang="en">Pixel display width</rdfs:label>
     <rdfs:domain rdf:resource='&ns-ccpp;Component'/>
     <rdfs:range  rdf:resource='&ns-xsdt;integer'/>
     <rdfs:comment xml:lang="en">
       For raster displays, the width of the display in pixels.
     </rdfs:comment>
   </ccpp:Attribute>
]]]

'color' is an example of a string-valued attribute:
[[[
   <ccpp:Attribute rdf:ID='color'>
     <rdfs:label xml:lang="en">Color display capabilities</rdfs:label>
     <rdfs:domain rdf:resource='&ns-ccpp;Component'/>
     <rdfs:range  rdf:resource='&ns-xsdt;string'/>
     <rdfs:comment xml:lang="en">
       For display or print devices, an indication of the color
       rendering capabilities:
       binary  - indicates bi-level color (black-and-white, or similar).
       grey    - indicates gray scale capability, capable of sufficient
                 distinct levels for a monochrome photograph.
       limited - indicates a limited number of distinct colors, but
                 not with sufficient control for displaying a color
                 photograph (e.g. a pen plotter, high-light printer or
                 limited display).
       mapped  - indicates a palettized color display, with enough
                 levels and control for coarse display of color
                 photographs.
       full    - indicates full color display capability.
     </rdfs:comment>
   </ccpp:Attribute>
]]]

And here is an example that uses these attributes:

[[[
   <?xml version="1.0"?>
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:client="http://www.w3.org/2000/07/04-ccpp-client#">
      <rdf:Description rdf:about="http://example.com/WebDisplay">
        <rdf:type rdf:resource="http://example.com/Schema#WebDisplayPlatform"/>
        <client:pix-x>320</client:pix-x>
        <client:pix-y>200</client:pix-y>
        <client:color>limited</client:color>
      </rdf:Description>
   </rdf:RDF>
]]]
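
As a rough sketch of how an application might apply the schema's range 
declarations to this profile, the Python below pulls the literal strings 
out of the RDF/XML and converts them according to a per-property table. 
This is not a real RDF parser (it only handles the simple element form 
used above), and RANGE_PARSERS is a hand-written stand-in for what would 
be read from the schema.  Only the two ranges shown above are listed, so 
pix-y is deliberately left as an uninterpreted string.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CLIENT = "http://www.w3.org/2000/07/04-ccpp-client#"

PROFILE = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:client="http://www.w3.org/2000/07/04-ccpp-client#">
  <rdf:Description rdf:about="http://example.com/WebDisplay">
    <rdf:type rdf:resource="http://example.com/Schema#WebDisplayPlatform"/>
    <client:pix-x>320</client:pix-x>
    <client:pix-y>200</client:pix-y>
    <client:color>limited</client:color>
  </rdf:Description>
</rdf:RDF>
"""

# Assumption: a hand-built stand-in for the rdfs:range declarations
# in the schema excerpts (xsd integer -> int, xsd string -> str).
RANGE_PARSERS = {
    CLIENT + "pix-x": int,
    CLIENT + "color": str,
}


def literal_properties(rdf_xml):
    """Yield (property URI, literal string) pairs from simple RDF/XML."""
    root = ET.fromstring(rdf_xml)
    for description in root.iter("{%s}Description" % RDF):
        for prop in description:
            if prop.text and prop.text.strip():
                # ElementTree tags look like "{namespace}local".
                uri = prop.tag[1:].replace("}", "", 1)
                yield uri, prop.text.strip()


def interpret(uri, text):
    """Convert a literal if its property's range is known, else keep it."""
    to_value = RANGE_PARSERS.get(uri)
    return to_value(text) if to_value else text


values = {uri: interpret(uri, text) for uri, text in literal_properties(PROFILE)}
```

The extraction step never needs to know what the strings mean;  only 
interpret() depends on knowledge about the property.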


(C)
     The WAP Forum has created UAPROF, a vocabulary for CC/PP that uses 
literals in various ways (some of which I don't agree with).  CC/PP was 
designed to be compatible with the existing deployment of UAPROF.  Here's a 
simple example:

[[[
   <?xml version="1.0"?>
   <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
            xmlns:prf="http://www.wapforum.org/UAPROF/ccppschema-20000405#">
      <rdf:Description rdf:about="http://example.com/HWDefault">
        <rdf:type rdf:resource="http://example.com/Schema#HardwarePlatform"/>
        <prf:cpu>PPC</prf:cpu>
        <prf:display>320x200</prf:display>
      </rdf:Description>
   </rdf:RDF>
]]]

The literal value associated with the RDF property 'prf:cpu' encodes a 
member of an enumerated set in a literal string.

The value associated with the RDF property 'prf:display' encodes two 
integer values in a literal string.
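
For illustration, a Python sketch of what an application has to know in 
order to interpret these two literals.  Both helpers are hypothetical. 
The "<digits>x<digits>" grammar for prf:display is inferred from the one 
example above, not taken from the UAPROF specification, and the set of 
allowed cpu names has to be supplied by the caller because only "PPC" 
appears here.

```python
import re

# Assumed grammar, inferred from the single example "320x200".
_DISPLAY = re.compile(r"([0-9]+)x([0-9]+)")


def parse_display(text):
    """Split a prf:display literal into two ints, in the order written."""
    match = _DISPLAY.fullmatch(text)
    if match is None:
        raise ValueError("unrecognized display literal: %r" % (text,))
    return int(match.group(1)), int(match.group(2))


def parse_enumerated(text, allowed):
    """Check a literal against an enumerated set supplied by the caller."""
    if text not in allowed:
        raise ValueError("%r is not one of %s" % (text, sorted(allowed)))
    return text
```

Neither mapping can be derived from the RDF itself;  the knowledge lives 
with whoever understands the property.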


(D)
     IETF work on content negotiation (CONNEG) has created an IANA 
registry of media features [3].  (This isn't strictly part of CC/PP, but I 
am working on a proposal to allow all IANA registered media feature tags to 
be usable as CC/PP attributes -- by giving them a URI form.)

The goals of the CONNEG work were in some ways more ambitious than CC/PP, 
and (I think) overlap some of the description logic (DL) background that is 
informing the DAML+OIL work.  Feature matching is based on something like a 
DL subsumption calculation.  A fixed interpretation of literal strings was 
a simple way to make this work.  Certainly, *some* way to interpret and 
appropriately compare literal strings was needed.


CONCLUSIONS
-----------
I draw the following conclusions:

(a) given a literal string, there needs to be some way to decide how it 
should be interpreted as a literal value.

(b) applications should be able to perform basic RDF handling without 
knowing how to interpret the literal strings.

(c) there is an existing base of RDF usage in which interpretation of a 
literal string depends upon knowledge of and about a property that 
references it.

(d) if we are to support existing RDF applications, interpretation of 
literal strings cannot be confined to a single defined scheme of data 
typing and interpretation.  Flexibility to accommodate things like 
prf:display is needed.

(e) insistence on having complete information about data typing and 
representation embedded in any single piece of RDF is not an option.  In 
some cases, full knowledge 
may be embedded in a specific application that processes the 
RDF.  (Yes:  such information will not be accessible to generic RDF 
processors.  Tough luck.  We can encourage future RDF application designers 
to do better -- that was my goal in specifying preferred simple attribute 
types for CC/PP, based on my CONNEG work.)


#g
--


[1] http://www.w3.org/TR/CCPP-struct-vocab/
     (public version)
     http://www.w3.org/Mobile/CCPP/Group/Drafts/WD-CCPP-struct-vocab-20010906/
     (Near-final WG working draft with revisions - needs W3C member access)

[2] http://www.w3.org/Mobile/CCPP/Group/Drafts/uaprof-099.doc
     (This may need W3C member access --
      a public version can also be obtained from WAP forum site)

[3] http://www.iana.org/assignments/media-feature-tags
     See also:
     http://www.isi.edu/in-notes/rfc2506.txt -- describes the registry and
       procedures
     http://www.isi.edu/in-notes/rfc2533.txt -- describes an expression
       format and how literal data types are interpreted.



------------
Graham Klyne
(GK@ACM.ORG)

Received on Monday, 5 November 2001 07:05:13 UTC