Re: RDF syntax 'improvements'? from Lee Jonas on 2000-08-07 (www-rdf-interest@w3.org from August 2000)

From: Lee Jonas <lee@oakglen.netkonect.co.uk>
Date: Mon, 7 Aug 2000 11:03:48 +0100
To: <www-rdf-interest@w3.org>
Message-ID: <PLEJLECAFJELCIAEAEALCEKOCAAA.lee@oakglen.netkonect.co.uk>

I have only recently joined this list.  I have scanned most of the archive
but please forgive me if I am rehashing anything.

I don't doubt that a log syntax which closely resembles the triples is
useful to RDF developers.  However, it side-steps the issues I was raising,
which are far more fundamental to RDF Syntax.


Dan Brickley <mailto:danbri@w3.org> wrote:
>Here are some things that could happen. We could/should bring the
>errata document for RDF M&S up to date with the experience of RDF
>implementors, and make available answers to FAQs where these
>seem clear, and writeup summaries for topics (eg. the xmlns
>prefix pairing business) that are perhaps not so clear. We could
>explore possibility of new work on a 'better' rdf syntax, either
>as a W3C Working Group or as an informal effort amongst RDF
>implementors on this list, with the intention of publishing
>either a new 'better syntax' REC (which would be a substantial piece
>of work) or an informational W3C Note outlining an alternative
>XML syntax for RDF models. ('we' being the RDF implementor
>community, ie. RDF IG)


I for one think Dan's suggested course of action is not only reasonable but
necessary.  However, if I am correct, my concerns over the misuse of
namespaces are significant enough to warrant further activity to define a
'better' RDF Syntax (or some general web data graph syntax), which will
have a knock-on effect on the RDF Schema spec as well.

W3C could let RDF Model & Syntax 1.0 remain a recommendation and issue a
Note with an alternative syntax.  Although this would be a pragmatic
work-around, I believe that RDF Syntax is in error and hence am not
convinced it is sufficient.  If the problems are deemed serious enough, the
W3C could begin a working draft that will supersede it.  Only that way could
the qname-to-URI mapping be fully removed from the recommendation.  If the
latter is necessary, then I believe RDF Schema should not become a
recommendation until syntax issues are resolved.


Dan Brickley <mailto:danbri@w3.org> wrote:
>I'm not personally convinced that a new alternate RDF syntax is a
>priority right now, though I'd like to hear arguments to the contrary.


I realise that there are some early implementations and at this stage in the
recommendation process there would be an understandable reluctance to alter
the existing code base.  However, RDF is not yet widely adopted.  Any delay
now could mean far more dependencies on the current (erroneous?) RDF Syntax.
In addition, I believe the added complexity and confusing nature of current
RDF Syntax could be a barrier to its widespread adoption by the general web
community.  The sooner an alternative syntax is developed, the sooner RDF
can realise its full potential.

So, Dan, perhaps you could review the concerns at the end of this message
for inclusion in the 'developer issues' list.

In summary, I think that:
1) qname-to-URI mapping is a perversion of XML Namespaces and has subtle, yet
fundamental negative implications.  It should be withdrawn from the current
RDF M & S recommendation.
2) RDF Syntax must be a well-defined, finite set of element types and
attribute names encapsulated within the rdf namespace.
3) RDF Syntax must consist of one and only one clear way of serialising an
RDF Model, with no alternative 'abbreviation' syntax forms.  This is not to
say that RDF Syntax is the only way of serialising an RDF Model.  Other
general web data graph syntaxes could be investigated for this purpose
(though not XLink).
4) RDF Schema must not reflect any aspects of syntactic validity.
5) RDF Schema documents must not be locatable implicitly from any namespace
URI.


Dan Brickley <mailto:danbri@w3.org> wrote:
>we should step back and ask for characterisations of what we
>want from an XML syntax for RDF. What are the must-haves?  What would the
>goals be for any effort to provide a 'better' syntax? ie. what would make it
>better...?


If people agree that further activity is necessary, I will translate my
thoughts into what I believe should be included in the list of goals,
must-haves, etc.

Anyway, here are my main concerns:

1) Mapping a QName to an RDF Identifier
=======================================
RDF specifies that all resources should be identifiable by URIs.  This is a
good aspect of RDF as it allows decentralised (and even, in XLink parlance,
'third-party') resource descriptions and resource-type hierarchies (i.e.
using rdfs subClassOf & subPropertyOf).

However, IMHO RDF perverts the intended use of namespaces to achieve this.
Using a qname to map a localpart (element type or attribute name), together
with its namespace's unique URI, into an RDF identifier causes major
problems.  I believe this to be the case whether that qname is an element
type, an attribute name, or even an attribute value or text-node.
Specifically:

a) mixing RDF with other markup vocabularies
--------------------------------------------
From Namespaces in XML (http://www.w3.org/TR/1999/REC-xml-names-19990114):

'We envision applications of Extensible Markup Language (XML) where a single
XML document may contain elements and attributes (here referred to as a
"markup vocabulary") that are defined for and used by multiple software
modules. One motivation for this is modularity; if such a markup vocabulary
exists which is well-understood and for which there is useful software
available, it is better to re-use this markup rather than re-invent it.

Such documents, containing multiple markup vocabularies, pose problems of
recognition and collision. Software modules need to be able to recognize the
tags and attributes which they are designed to process, even in the face of
"collisions" occurring when markup intended for some other software package
uses the same element type or attribute name.'

Although mappings between XLink & RDF have been proposed (using XLink as an
alternative web data graph syntax in order to serialise RDF Model), I would
like to use a hybrid of the two that combines both metadata and hyperlinks -
hence, IMHO, XLink should *not* be used as an alternative web data graph
syntax for serialising RDF Model.

Consider the following:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:my="http://mydomain.com/mylinks#">
   <my:link xlink:type="extended">
      <my:locator xlink:type="locator" xlink:href="#A"/>
      <my:locator xlink:type="locator" xlink:href="#B"/>
   </my:link>
</rdf:RDF>

In the true nature of 'Namespaces in XML', with small changes to RDF Syntax
(i.e. not using the qname-to-URI mapping mechanism), this could become a
*very* straightforward way of combining RDF and XLink - RDF describes
resources in terms of metadata (note that the link and locator elements
should themselves be treatable as resources by RDF) and XLink interprets
this as a 'third-party' link between (RDF metadata) resources.

However, my understanding (though I could be wrong?) is that an RDF
processor would assume 'xlink:type', 'xlink:href', etc. to be RDF metadata
whose definitions are identified by the URIs
'http://www.w3.org/1999/xlinktype', 'http://www.w3.org/1999/xlinkhref', etc.
respectively.  This is clearly wrong as the XLink namespace has nothing to
do with RDF.
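
To make the concern concrete, here is a minimal Python sketch of the mapping
as I understand it.  It assumes the mapping is a plain concatenation of
namespace URI and local name, and it applies that to every element type and
attribute name in the example above.  It is not a conforming RDF parser; the
helper names qname_to_uri and mapped_names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The RDF/XLink hybrid document from the example above.
DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:xlink="http://www.w3.org/1999/xlink"
         xmlns:my="http://mydomain.com/mylinks#">
   <my:link xlink:type="extended">
      <my:locator xlink:type="locator" xlink:href="#A"/>
      <my:locator xlink:type="locator" xlink:href="#B"/>
   </my:link>
</rdf:RDF>
"""


def qname_to_uri(clark_name):
    """Assumed mapping: namespace URI and local name are concatenated.

    ElementTree reports qualified names as "{namespace-uri}localpart".
    """
    if clark_name.startswith("{"):
        namespace, local = clark_name[1:].split("}", 1)
        return namespace + local
    return clark_name


def mapped_names(xml_text):
    """Collect the URI produced for every element type and attribute name."""
    root = ET.fromstring(xml_text)
    uris = set()
    for element in root.iter():
        uris.add(qname_to_uri(element.tag))
        for attribute_name in element.attrib:
            uris.add(qname_to_uri(attribute_name))
    return uris


for uri in sorted(mapped_names(DOC)):
    print(uri)
```

Under that assumption the XLink attributes come out as identifiers in the
W3C XLink namespace, even though XLink never defined them as RDF properties.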

This problem stems from the fact that RDF uses namespaces to specify RDF
identifiers instead of their intended use - for demarcating markup
vocabularies.

Any counter-arguments specific to XLink w/ RDF aside, the example above
highlights a general problem mixing RDF with any other markup vocabulary.

b) Fuzzy segregation between RDF Model and RDF Syntax
-----------------------------------------------------
It seems to be a contentious issue that Model & Syntax aspects are so
closely intertwined, even to the extent of combining them into the same
specification (rightly so, IMHO).

The qname-to-URI mapping mechanism allows serialisation of resources from
the abstract RDF model into a multitude of different markup lexemes (i.e.
element types and attribute names).  I strongly suspect that this direct
encoding of abstract entities into (an open-ended collection of) syntactic
constructs is the reason why there is such a degree of 'tangling' between
RDF Model & RDF Syntax.

If a well-defined, finite RDF markup vocabulary were used, RDF Model would
become totally distinct from RDF Syntax.

c) XML Schema vs RDF Schema
---------------------------
XML Schema also utilises namespaces to imply the location of schema
documents.  Does the resulting ambiguous nature of schema document location
by namespace then make XML Schema and RDF Schema incompatible?  My guess
would be 'yes, unless you invent yet another overly complex way to untangle
the mess.'

AFAIK, XML Schema is an alternative to DTDs for describing the validity of
an XML document (markup and data).  As such I view it as a 'syntax schema'
and its association with namespaces (which after all distinguish different
syntaxes) is justified.  Also, as RDF Syntax should be the mere
serialisation of RDF Model into an XML document, it seems reasonable to want
to make assertions about the validity of those documents using XML Schema.

Due to the fuzziness between RDF Model & Syntax outlined above, RDF Schema is
forced to embody RDF Syntax validity assertions, hence a potential clash with
XML Schema.  However, I believe it would serve a far better purpose to
describe the validity of RDF models at an abstract level without regard to
their encoding in XML.  Note that this would be invaluable to facilitate
encoding RDF with some general web data graph syntax in common with other
XML technologies (if feasible) e.g. SOAP.

Totally separating RDF Model from RDF Syntax would allow RDF Schema to
become a pure 'model schema'.  It would still be specifiable as any other
RDF Model and hence serialised into RDF Syntax the same as RDF instance
documents.  Indeed, there is no reason why you couldn't have RDF validity
assertions internally within instance documents.

It follows that, as a 'model schema', there should be no association between
namespaces and RDF schema documents - an alternative mechanism should be
used, e.g. either implicitly from the URI specified by the 'rdf:type'
property, or by the 'isDefinedIn' property if the type is identified by a
URN (I retract my suggestion in a previous message that resource identifiers
should only be URLs).

The logical view then becomes:

++++++++++++++++++++++++++++++++++++
Application Layer
++++++++++++++++++++++++++++++++++++
RDF Model     |   Abstract, validated by RDF Schema
++++++++++++++++++++++++++++++++++++
RDF Syntax    |   XML doc, validated by DTD / XML Schema
++++++++++++++++++++++++++++++++++++


d) What real purpose does it serve?
-----------------------------------
I can't think why the current qname-to-URI mapping scheme is in place, apart
from for the sake of brevity.  Note that with resource URIs specified as
attribute values and / or text nodes, there are two ways to abbreviate them
that I can think of right now:

General entities:

<!DOCTYPE rdf:RDF [
   <!ENTITY my 'http://mydomain.com/myschema/'>
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description rdf:about="&my;#SomeResource"/>
</rdf:RDF>

or XML Base:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://mydomain.com/myschema/">
   <rdf:Description rdf:about="./#SomeResource"/>
</rdf:RDF>
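
As a rough check that both abbreviations yield the same full URI, here is a
small Python sketch.  It is only an illustration, not an RDF parser: the
helper about_uri is hypothetical, the rdf namespace URI is written with its
trailing '#', and xml:base is handled by simply resolving the attribute
value against the base found on the document element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Abbreviation via a general entity declared in the internal DTD subset.
ENTITY_DOC = """\
<!DOCTYPE rdf:RDF [
   <!ENTITY my 'http://mydomain.com/myschema/'>
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <rdf:Description rdf:about="&my;#SomeResource"/>
</rdf:RDF>
"""

# Abbreviation via xml:base and a relative URI reference.
BASE_DOC = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xml:base="http://mydomain.com/myschema/">
   <rdf:Description rdf:about="./#SomeResource"/>
</rdf:RDF>
"""


def about_uri(xml_text):
    """Return the absolute URI named by the first rdf:Description."""
    root = ET.fromstring(xml_text)  # the parser expands &my; for us
    description = root.find("{%s}Description" % RDF_NS)
    about = description.get("{%s}about" % RDF_NS)
    # Simplification: only an xml:base on the document element is honoured.
    base = root.get("{%s}base" % XML_NS, "")
    return urljoin(base, about)


print(about_uri(ENTITY_DOC))
print(about_uri(BASE_DOC))
```

Neither form needs the resource's URI to appear as an element type or
attribute name.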


2) XPath
========
Dan wrote:
<<
The XSLT / Semantic Web Screenscraping threads on this
list have shown how we can extract RDF models from all manner of well
managed XML data
>>

Let me clarify what I meant.  My concern is about going in the other
direction:
1) XPath and XSLT - Suppose I want to visualise RDF in a page by turning
resource descriptions into html tables containing name-value pairs of all
their properties.
2) XPath and XPointer - Suppose I want to mix RDF and XLink vocabularies and
add hyperlinks, using XPointer, between resource descriptions.

With the various abbreviation forms available, there are at least two or
three ways of saying the same thing with current RDF syntax.  This means
unless I know the style used in advance, I have to specify the union of
different nodesets, one for each syntax variant.
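
A small Python sketch of what that union looks like in practice.  It covers
only two of the variants (a literal property written as a child element, and
the same property written as an attribute of rdf:Description) and is not a
full RDF parser.  The my:title property and the helper literal_properties
are hypothetical, chosen just for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Variant 1: the property is a child element of rdf:Description.
LONG_FORM = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:my="http://mydomain.com/myschema/">
   <rdf:Description rdf:about="#A">
      <my:title>Hello</my:title>
   </rdf:Description>
</rdf:RDF>
"""

# Variant 2: the same property abbreviated to an attribute.
SHORT_FORM = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:my="http://mydomain.com/myschema/">
   <rdf:Description rdf:about="#A" my:title="Hello"/>
</rdf:RDF>
"""


def literal_properties(xml_text):
    """Collect (subject, property, value) from both syntax variants."""
    root = ET.fromstring(xml_text)
    found = set()
    for description in root.iter("{%s}Description" % RDF_NS):
        subject = description.get("{%s}about" % RDF_NS)
        # First node set: properties written as child elements.
        for child in description:
            found.add((subject, child.tag, (child.text or "").strip()))
        # Second node set: properties written as non-rdf attributes.
        for name, value in description.attrib.items():
            if not name.startswith("{%s}" % RDF_NS):
                found.add((subject, name, value))
    return found


print(literal_properties(LONG_FORM) == literal_properties(SHORT_FORM))
```

The two loops are the point: every consumer has to look in both places to
be sure of finding one and the same statement.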

In addition, with an unlimited, arbitrary set of element tags I cannot write
generic XPath selections that use specific element types as axes.  Instead,
the best I can do is rely on position and levels of nesting within the
document, if known.

Indeed there was an earlier post that contained a presentation proposing
just this.  However, it relied on the RDF being 'canonicalised' so that
*all* resource descriptions had to be children of the document element, and
properties referred to other resources by reference only.  This seems far
too restrictive to me.

Alternatively, I would have to select all nodes (i.e. an axis of
'//node()'), then filter down to the ones I want based on using attributes
and element tag names as predicates.

Any way you look at it, my XPath string gets unnecessarily complex.  I am
not sure what the impact is on performance, but I would guess that more
processing would make it slower.


Regards

Lee
Received on Monday, 7 August 2000 06:02:36 UTC