Re: Skipping the Sax-model transformation step in the new RDF/XML syntax spec from Dave Beckett on 2002-01-22 (www-rdf-comments@w3.org from January to March 2002)

From: Dave Beckett <dave.beckett@bristol.ac.uk>
Date: Tue, 22 Jan 2002 11:50:17 +0000
To: Sjoerd Visscher <sjoerd@w3future.com>
Cc: www-rdf-comments <www-rdf-comments@w3.org>
Message-ID: <4895.1011700217@tatooine.ilrt.bris.ac.uk>
Sorry for the delay in replying.

>>>Sjoerd Visscher said:
> I'm sorry to mail to you directly about this, but the www-rdf-comments lists
> seems to contain only spam.

Sure; I'm reading that list anyway.

> I like the direction the new RDF/XML Syntax Specification is going. However,
> I had the feeling the intermediate SAX-like model was a step too much. So I
> tried to express the RDF/XML Grammar directly in Infoset terms, using
> http://www.w3.org/2001/04/infoset

I did consider using infoset items directly and I'm trying to think
exactly what my reasoning was then.

It was something to do with the fact that 90+% of the RDF/XML parsers
use streaming SAX events, and familiar forms of grammars (BNFs etc.)
are a serialisation, so it seemed much more natural, when presenting
to parser writers, to give a grammar based on concepts very close to
the software, i.e. if you have SAX events, you can skip most of the
detail and just look at what to do with sequences of them.
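
As a rough illustration of "sequences of SAX events" (a sketch only, not
taken from the draft), the following Python uses the standard xml.sax
module to flatten a document into a list of event tuples. The names
EventRecorder and record_events, and the tuple layout, are assumptions
made for this example.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class EventRecorder(ContentHandler):
    """Record namespace-aware SAX events as a flat list of tuples."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        # name is a (namespace URI, local name) pair, as are attribute keys.
        attributes = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.events.append(("start", name, attributes))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))

    def characters(self, content):
        # Whitespace-only text is dropped here purely to keep the list short.
        if content.strip():
            self.events.append(("text", content))


def record_events(xml_bytes):
    """Parse xml_bytes and return the recorded list of events."""
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    recorder = EventRecorder()
    parser.setContentHandler(recorder)
    parser.parse(io.BytesIO(xml_bytes))
    return recorder.events
```

A grammar written over such a list only has to say which event may
follow which, rather than describe whole element trees.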

[Although we are being SAX-orientated, this is not a required
parsing method; any other method that produces the same output for
the same input, and passes the test cases, is also OK.]

I was looking at the XPath nodeset as a starting basis and, although
it was fine for XPath, for this problem it needed new nodes
(Identifier) and some node properties (identifier-type) as described in
http://www.w3.org/TR/2001/WD-rdf-syntax-grammar-20011218/#section-Data-Model

> 
> It turned out to be quite straight forward, it looks a lot like the Relax NG
> Schema. You'll find it attached. I tried to use the same notation as the
> Spec uses, i.e. classname(propertyname restriction [,...]) Properties that
> have no restrictions aren't shown, f.e.:
> 
> nodeElement = Element(
>   attributes=set((idAttr | aboutAttr)?, bagIdAttr?, propertyAttr*),
>   children=propertyEltList)
> 
> Which means that the namespaceName and localName are unrestricted.
> Unrestricted means unrestricted from the RDF Grammar point of view,
> properties like 'parent' are already restricted by the infoset
> specification.

I think that, although precise, it is too much to expect a parser app
taking in SAX events one by one to match big lumps of XML like the
above.  It isn't clear, given a particular event, what to do then or
when the next one arrives, i.e. the state machine.
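
To make the "state machine" point concrete, here is a deliberately
simplified Python sketch (not the draft's grammar). It decides what each
start event is from the current state alone, using the usual RDF/XML
alternation of node elements and property elements. It assumes the
outer rdf:RDF wrapper has already been stripped, and it ignores
parseType and property attributes; label_stripes and Stripe are
hypothetical names.

```python
from enum import Enum


class Stripe(Enum):
    NODE = "node element"
    PROPERTY = "property element"


def label_stripes(events):
    """Label each start event as it arrives, using only the open-element stack."""
    stack = []     # stripe of each currently open element
    labelled = []
    for event in events:
        kind, name = event[0], event[1]
        if kind == "start":
            # A child of a node element is a property element, and vice versa.
            if stack and stack[-1] is Stripe.NODE:
                stripe = Stripe.PROPERTY
            else:
                stripe = Stripe.NODE
            stack.append(stripe)
            labelled.append((name, stripe))
        elif kind == "end":
            stack.pop()
        # Other event kinds (e.g. text) do not change the state here.
    return labelled
```

Each event is handled on arrival with no lookahead, which is the
property that matching a whole Element(...) pattern does not make
obvious.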

Not sure what you mean by unrestricted.  Some things are allowed,
some are restricted, some are forbidden (like non-namespaced prefixed
attributes).  Does omitting the namespaceName and localName mean that
any values are allowed?  No.  We need to be more precise than that.


> Some specific features:
> 
> ws = Character(
>   elementContentWhitespace=Boolean.true)
> 
> Boolean.true is defined in the rdf version of the infoset.
> 
> parseTypeLiteralPropertyElt = Element(
>   attributes=set(idAttr?, parseLiteral))
> 
> Here the children property is unrestricted.

This area - parseType literal - is still under consideration, so I
can't really make a definitive change here until we have decided what
is allowed inside it.  There are *lots* of issues here for embedded XML
- namespaces, xml:base, XML Canonicalisation, ... and we aren't yet
at a stage to resolve them.

 
> It is also easy to check what Infoset features remain unrestricted:
> prefixes, namespace declarations, if attributes are specified in a DTD or
> not, CDATA, etc. And which are restricted: PI's or comments are not allowed,
> except inside the parseTypeLiteralPropertyElt, and no document type
> declaration. But maybe that is a too strict translation.

RDF/XML doesn't care about/use such things - in the next grammar I
will add that other infoset items are ignored (some of these don't
have SAX events).  I expect the list will be as follows:

  Processing Instruction
  Unexpanded Entity Reference
  Comment
  Document Type Declaration
  Unparsed Entity
  Notation

  -- reading http://www.w3.org/TR/xml-infoset/#infoitem
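
One way to picture "ignored" (a sketch under assumptions, not text from
the grammar): treat the parser input as (kind, payload) pairs and drop
the kinds listed above before the grammar sees them. The pair layout
and the names IGNORED_ITEMS and drop_ignored are invented for this
example, and the set mirrors the expected list, which may still change.

```python
# Infoset item kinds the grammar is expected to skip (tentative list).
IGNORED_ITEMS = frozenset({
    "processing instruction",
    "unexpanded entity reference",
    "comment",
    "document type declaration",
    "unparsed entity",
    "notation",
})


def drop_ignored(items):
    """Yield only the (kind, payload) items that are not in IGNORED_ITEMS."""
    for kind, payload in items:
        if kind not in IGNORED_ITEMS:
            yield (kind, payload)
```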

> 
> Kind regards,
> 
> Sjoerd Visscher
>   w3future.com

Original attachment removed; see
  http://lists.w3.org/Archives/Public/www-rdf-comments/2001OctDec/att-0391/01-infoset2rdf.txt

Thanks for the feedback

Dave
Received on Tuesday, 22 January 2002 06:50:27 UTC