FW: Comments on RDF last call working drafts

Dear Colleagues,

> Mark and Colleagues,
> 
> Thank you for taking the time to review RDFCore's documents and provide
> feedback.
> 
> Your comment has been recorded as
> 
>    http://www.w3.org/2001/sw/RDFCore/20030123-issues/#jsr188-01
> 
> The WG will consider your comment and you will hear further from us in due
> course.
> 
> I note your concerns with local datatyping as:
> 
>    a) it uses up more network bandwidth.
>    b) you are concerned about inconsistency
> 
> Concerning a:
> 
>    - is there a quantitative assessment of the impact on bandwidth
>    - has the use of entity declarations to provide a more compact 
> representation been considered
>    - has the use of DTD default attributes to provide a more compact 
> representation been considered
> 
> Concerning b:
> 
>    - could you provide an example of the sort of inconsistency you are
>      concerned about.

I'm sorry about the delay in replying with these additional comments. I
circulated an earlier version of these comments to the JSR-188 EG and then
revised them based on their feedback. However, I have not had time to
circulate the revised comments to the JSR-188 EG for a second review, so for
the moment please regard them as submitted by me rather than by JSR-188.

==================

Firstly, to help you resolve this issue, here is how I would like to see it
resolved:

1. Most preferred solution:

Allow both global and local datatyping. 

2. Secondary solution:

Your suggestion of using DTD default attributes (DTD-DA) is a good one, and
not something I had considered. The primary barrier to adopting it is the
difficulty of using DTDs with the current RDF/XML serialisation. However, if
the RDF/XML serialisation were simplified to make it more compatible with
DTDs, as proposed in
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12
then CC/PP and UAProf could adopt the simplified syntax. This in itself
would resolve many of the problems I outline below. In addition, DTD default
attributes would remove the need for authors to include datatyping
information "by hand" in CC/PP and UAProf documents.

My proposed solution here is therefore to resolve xmlsch-12 by setting up a
work item to produce a document that proposes a simplified RDF/XML syntax
allowing the DTD-DA solution to be used, which CC/PP and UAProf could then
adopt. Note that this work item should not delay progress on the existing
RDF documents.


==================

Secondly, here are some more detailed answers to your questions:

Of these two issues, I feel the inconsistency issue, i.e. the increased
difficulty for RDF/XML authors, is the more important.

RDF/XML is often described as a format for machines, not people. However,
many people using RDF today, including authors of UAProf and CC/PP profiles,
have to author RDF/XML directly. From my work on CC/PP and UAProf I have
observed that authors have difficulty writing RDF/XML, and I attribute this
to a number of factors:

- unfamiliarity with the RDF/XML format

- confusion over the multiple serialisation forms

- difficulty reading the striped syntax 

- confusion arising from revisions to the format (for example, most
profiles do not place a namespace prefix in front of attributes - see
http://w3development.de/rdf/uaprof_repository/)

- the lack of automatic validation tools. Here I don't mean tools like the
W3C RDF validator that just validate documents as being valid RDF. I mean
tools that validate a document to check it conforms to a specific schema
structure and a controlled vocabulary. For example in UAProf it is common to
encounter misspelt properties e.g.
<prf:AudioEncorder> instead of <prf:AudioEncoder>
<prf:BitsPerPixels> instead of <prf:BitsPerPixel>
In XML it is common practice to use DTDs or schemas to validate documents.
Although it is possible to use RDFS to perform some validation of RDF/XML,
this requires custom validation tools, which are unnecessary with XML - for
a fuller description of the problems see
http://www.hpl.hp.com/techreports/2002/HPL-2002-268.html
As issue
http://www.w3.org/2001/sw/RDFCore/20030123-issues/#xmlsch-12
points out, it is difficult to use the existing XML approaches with the
current RDF/XML format.
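The kind of vocabulary check described in the last point can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: VOCABULARY is a hypothetical, abbreviated list of UAProf property names (a real tool would load the full list from the vocabulary's schema), and unknown_properties is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

PRF_NS = "http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#"

# Hypothetical, abbreviated controlled vocabulary. A real checker would load
# every property name defined by the UAProf schema; "component" is the
# structural property that groups attributes.
VOCABULARY = {"component", "AudioEncoder", "BitsPerPixel", "ColorCapable"}


def unknown_properties(profile_xml: str) -> list:
    """Return the local names of prf: elements that are not in VOCABULARY."""
    root = ET.fromstring(profile_xml)
    prefix = "{" + PRF_NS + "}"
    unknown = []
    for elem in root.iter():  # document order, including the root element
        if elem.tag.startswith(prefix):
            name = elem.tag[len(prefix):]
            if name not in VOCABULARY:
                unknown.append(name)
    return unknown


PROFILE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#">
  <rdf:Description rdf:ID="MyDeviceProfile">
    <prf:component>
      <rdf:Description rdf:ID="HardwarePlatform">
        <prf:BitsPerPixels>8</prf:BitsPerPixels>
        <prf:ColorCapable>Yes</prf:ColorCapable>
      </rdf:Description>
    </prf:component>
  </rdf:Description>
</rdf:RDF>"""

print(unknown_properties(PROFILE))  # ['BitsPerPixels']
```

An RDF validator would accept this profile, because a misspelt property is still valid RDF; only a check against the controlled vocabulary catches it.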

As a result of this, I am wary of any change in RDF/XML which will place
additional burdens on authors. 

Specifically, in the case of CC/PP, all CC/PP and UAProf profiles that
conform to a specific vocabulary will give the same data type to a given
attribute, so why include this information in every document (i.e. local
datatyping) and increase the chance of errors by the author? Just as it is
common practice in relational databases to normalise the schema design in
order to avoid field replication and data integrity problems, it is
desirable to do the same thing here with data type information.

I anticipate that the current datatyping decision will cause authors to make
errors such as:
- omitting the data type altogether
- giving the attribute the wrong data type
- without suitable validation tools, introducing typographical errors into
the URI used to indicate the data type
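The three errors above can be made concrete with a small Python sketch that compares each local rdf:datatype against a single vocabulary-level ("global") table, which is the normalised arrangement argued for here. GLOBAL_TYPES, its choice of XML Schema types, and datatype_errors are all hypothetical illustrations, not part of CC/PP or UAProf.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRF_NS = "http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical vocabulary-level ("global") datatype table: each property's
# type is stated once for the vocabulary instead of in every profile.
# The XML Schema types chosen here are illustrative assumptions.
GLOBAL_TYPES = {
    "ColorCapable": XSD + "boolean",
    "BitsPerPixel": XSD + "integer",
}


def datatype_errors(profile_xml: str) -> list:
    """Compare each local rdf:datatype with GLOBAL_TYPES.

    Reports a missing datatype, or a datatype URI that differs from the
    global one (a wrong type and a misspelt URI look the same here).
    """
    root = ET.fromstring(profile_xml)
    prefix = "{" + PRF_NS + "}"
    errors = []
    for elem in root.iter():
        if not elem.tag.startswith(prefix):
            continue
        name = elem.tag[len(prefix):]
        expected = GLOBAL_TYPES.get(name)
        if expected is None:
            continue  # property has no declared datatype
        local = elem.get("{" + RDF_NS + "}datatype")
        if local is None:
            errors.append((name, "missing datatype"))
        elif local != expected:
            errors.append((name, f"wrong or misspelt datatype: {local}"))
    return errors


PROFILE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#">
  <rdf:Description rdf:ID="HardwarePlatform">
    <prf:ColorCapable rdf:datatype="http://www.w3.org/2001/XMLSchema#bolean">Yes</prf:ColorCapable>
    <prf:BitsPerPixel>8</prf:BitsPerPixel>
  </rdf:Description>
</rdf:RDF>"""

for prop, problem in datatype_errors(PROFILE):
    print(prop, "-", problem)
```

If the table lives with the vocabulary, a processor could simply supply the datatype itself, and authors would never need to write it.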


As for the network bandwidth problem, I anticipate that this approach to
datatyping will make profiles 20-30% larger than existing profiles,
depending on whether or not profiles use DTD entities as you suggest.
Details of my calculation are given below.

INCREASES RESULTING FROM DATATYPING

In UAProf version 20010330, 20 out of the 62 attributes will require
datatyping.

Here I assume that an attribute definition is approximately 41 characters
long, e.g.
<prf:ColorCapable>Yes</prf:ColorCapable>

With datatyping, this increases to 97 characters:
<prf:ColorCapable rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">Yes</prf:ColorCapable>
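The two character counts can be checked with a few lines of Python. This is a sketch under one assumption: each element is written on its own line, so one newline character is counted per definition (the element strings themselves are 40 and 96 characters long). The helper name line_length is hypothetical.

```python
# Sketch: character counts for one UAProf attribute definition, assuming each
# element is written on its own line (so one newline is counted per element).
PLAIN = "<prf:ColorCapable>Yes</prf:ColorCapable>"
TYPED = (
    '<prf:ColorCapable rdf:datatype="http://www.w3.org/2001/XMLSchema#boolean">'
    "Yes</prf:ColorCapable>"
)


def line_length(element: str) -> int:
    """Characters used by one element line, including its trailing newline."""
    return len(element) + 1


print(line_length(PLAIN), line_length(TYPED))  # 41 97
```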

Profiles also contain a fixed amount of structure, approximately 1270
characters long, e.g. namespace information:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#">
 <rdf:Description rdf:ID="MyDeviceProfile">

and six component declarations such as:

 <prf:component>
  <rdf:Description rdf:ID="HardwarePlatform">
   <rdf:type rdf:resource="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#HardwarePlatform"/>
  </rdf:Description>
 </prf:component>

Therefore, current profile length = 1270 + (62 * 41) = 1270 + 2542 = 3812
After datatyping, profile length = 1270 + (20 * 97) + (42 * 41)
                                 = 1270 + 1940 + 1722 = 4932

This increases profile length by approximately 30%.
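The arithmetic above can be reproduced with a short Python sketch. The constants are the estimates quoted in this message for UAProf 20010330, not measurements of real profiles, and the function names are hypothetical.

```python
# Sketch of the bandwidth estimate for UAProf 20010330. All constants are
# the estimates quoted in the text, not measurements of real profiles.
STATIC_STRUCTURE = 1270  # XML declaration, namespaces, six component blocks
TOTAL_ATTRS = 62         # attributes in the vocabulary
TYPED_ATTRS = 20         # attributes that need rdf:datatype
PLAIN_LEN = 41           # untyped attribute definition, e.g. ColorCapable
TYPED_LEN = 97           # same definition with a full rdf:datatype URI


def profile_length(typed_len: int) -> int:
    """Estimated profile size when each typed attribute costs typed_len chars."""
    untyped = TOTAL_ATTRS - TYPED_ATTRS
    return STATIC_STRUCTURE + TYPED_ATTRS * typed_len + untyped * PLAIN_LEN


def percent_increase(new: int, old: int) -> float:
    return 100.0 * (new - old) / old


current = profile_length(PLAIN_LEN)    # no datatyping: all 62 use PLAIN_LEN
datatyped = profile_length(TYPED_LEN)
print(current, datatyped, f"{percent_increase(datatyped, current):.1f}%")
# 3812 4932 29.4%
```

The exact figure is about 29.4%, which is the "approximately 30%" above.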

USING ENTITIES

Declaring the entities at the top of the document requires a fixed section
approximately 247 characters long:

<!DOCTYPE rdf:RDF [
<!ENTITY type-boolean  'http://www.w3.org/2001/XMLSchema#boolean'>
<!ENTITY type-number 'http://www.w3.org/2001/XMLSchema#integer'>
<!ENTITY type-dimension 'http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#dimension'>
]>
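To check that the entity approach works with an ordinary XML parser, the sketch below parses a cut-down profile with Python's standard xml.etree.ElementTree, which (via expat) expands internal entities declared in the DOCTYPE, including inside attribute values. The profile is a hypothetical minimal example, and datatype_of and is_well_formed are hypothetical helpers. Note that an entity reference must end with a semicolon (&type-boolean;); without it the document is not well-formed.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PRF_NS = "http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#"

# Hypothetical minimal profile using one entity for the datatype URI.
PROFILE = """<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
<!ENTITY type-boolean 'http://www.w3.org/2001/XMLSchema#boolean'>
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:prf="http://www.wapforum.org/profiles/UAPROF/ccppschema-20010430#">
  <rdf:Description rdf:ID="MyDeviceProfile">
    <prf:ColorCapable rdf:datatype="&type-boolean;">Yes</prf:ColorCapable>
  </rdf:Description>
</rdf:RDF>"""


def datatype_of(profile_xml: str, prop: str) -> Optional[str]:
    """Return the (entity-expanded) rdf:datatype of the first prf:<prop>."""
    root = ET.fromstring(profile_xml)  # internal entities are expanded here
    elem = root.find(".//{%s}%s" % (PRF_NS, prop))
    if elem is None:
        return None
    return elem.get("{%s}datatype" % RDF_NS)


def is_well_formed(xml_text: str) -> bool:
    """True if the parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


print(datatype_of(PROFILE, "ColorCapable"))
# http://www.w3.org/2001/XMLSchema#boolean
print(is_well_formed(PROFILE.replace("&type-boolean;", "&type-boolean")))
# False
```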

Using entities gives an estimated attribute definition length of about 70
characters:
<prf:ColorCapable rdf:datatype="&type-boolean;">Yes</prf:ColorCapable>

Therefore, using entities, profile length = 1270 + 247 + (20 * 70) + (42 * 41)
                                          = 1270 + 247 + 1400 + 1722 = 4639

Here profile length is increased by approximately 20%.
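The entity-based figure can be reproduced in the same way. This sketch uses only the estimates quoted in this message (247 for the DOCTYPE block, 70 per entity-typed attribute). It also compares the result with the full-URI case, which is a derived comparison rather than a number stated in the original calculation.

```python
# Sketch of the entity-based estimate, using the figures quoted in the text.
STATIC_STRUCTURE = 1270  # fixed profile structure
DOCTYPE_LEN = 247        # <!DOCTYPE ...> block declaring the datatype entities
TYPED_ATTRS, UNTYPED_ATTRS = 20, 42
PLAIN_LEN = 41           # untyped attribute definition
ENTITY_LEN = 70          # typed definition using rdf:datatype="&type-...;"
FULL_URI_LEN = 97        # typed definition spelling out the full URI

current = STATIC_STRUCTURE + (TYPED_ATTRS + UNTYPED_ATTRS) * PLAIN_LEN
with_entities = (STATIC_STRUCTURE + DOCTYPE_LEN
                 + TYPED_ATTRS * ENTITY_LEN + UNTYPED_ATTRS * PLAIN_LEN)
full_uris = STATIC_STRUCTURE + TYPED_ATTRS * FULL_URI_LEN + UNTYPED_ATTRS * PLAIN_LEN

increase = 100.0 * (with_entities - current) / current
print(with_entities, f"{increase:.1f}%", full_uris - with_entities)
# 4639 21.7% 293
```

Each entity-typed attribute saves about 27 characters over the full URI. The fixed DOCTYPE block therefore pays for itself once roughly ten attributes are typed.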

Mark H. Butler, PhD
Research Scientist                HP Labs Bristol
mark-h_butler@hp.com
Internet: http://www-uk.hpl.hp.com/people/marbut/

Received on Friday, 21 March 2003 10:38:48 UTC