References to CSS rules in RDFa syntax document

Hello all,

We've discussed defining two of our processing rules in terms of other
sets of rules, such as those in CSS:

  * the conversion of a sequence of elements that contain text into one
    text string;

  * the removal of leading and trailing whitespace.

Although there is still some discussion to be had on these, I realised
today that we could define these rules quite precisely by using the
XPath specification, and in many ways that would be more 'correct'. So
I'm sending this email mainly as a 'reminder-to-self-and-Shane' to look
into this further, but I thought I'd put it on the list in case anyone
has comments, particularly from an implementation perspective.

The key 'concepts' I think we need from XPath are the function
normalize-space() and the idea that every node has a 'string-value'.

For example, the 'string-value' of an element is defined as:

  The string-value of an element node is the concatenation of the
  string-values of all text node descendants of the element node in
  document order.

In turn, the 'string-value' of a text node is defined as:

  The string-value of a text node is the character data. A text node
  always has at least one character of data.

And so on, including a definition of the 'string-value' of the root node.
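To make the string-value definition concrete, here is a minimal Python
sketch using the standard library's xml.etree.ElementTree. ElementTree
stores text nodes as an element's .text (before its first child) and
each child's .tail (after the child's end tag), so the sketch walks the
tree collecting both in document order. The function name string_value
is only illustrative, and the sketch assumes the tree was parsed with
ElementTree's defaults, which discard comments and processing
instructions.

```python
import xml.etree.ElementTree as ET


def string_value(elem: ET.Element) -> str:
    """XPath 1.0 string-value of an element: the concatenation of all
    descendant text nodes, in document order."""
    parts = []

    def walk(e: ET.Element) -> None:
        if e.text:
            parts.append(e.text)          # text before the first child
        for child in e:
            walk(child)                   # the child's own descendants
            if child.tail:
                parts.append(child.tail)  # text after the child's end tag

    walk(elem)
    return "".join(parts)                 # the element's own tail is excluded


span = ET.fromstring('<span>The <em>RDFa <b>Syntax</b></em> document</span>')
print(repr(string_value(span)))  # 'The RDFa Syntax document'
```

The standard library's Element.itertext() yields the same sequence of
strings, so "".join(elem.itertext()) is an equivalent shortcut.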

You can see that the XPath definition is better than the one I added
to the syntax document:

  The actual literal is either the value of @content (if present) or a string
  created by concatenating the text content of each of the child elements
  of the [current element] in document order...

since I don't define "text content", whilst the idea of "character
data" is very familiar. Also, given that XPath underpins a number of
specifications, it would be wise to use its version of any concepts
that we share rather than writing them afresh.
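One way to see the gap: read literally, "the text content of each of
the child elements" would skip any text that sits directly inside the
[current element] rather than inside a child element. The hypothetical
child_elements_text below implements that literal reading purely for
comparison, next to the XPath string-value (computed with ElementTree's
itertext(), which yields descendant text in document order).

```python
import xml.etree.ElementTree as ET


def child_elements_text(elem: ET.Element) -> str:
    # Hypothetical literal reading of the draft wording:
    # concatenate the text of each *child element* only.
    return "".join("".join(child.itertext()) for child in elem)


def string_value(elem: ET.Element) -> str:
    # XPath 1.0 string-value: all descendant text nodes, in document order.
    return "".join(elem.itertext())


current = ET.fromstring('<span property="dc:title">The <em>RDFa</em> Syntax</span>')
print(repr(child_elements_text(current)))  # 'RDFa'
print(repr(string_value(current)))         # 'The RDFa Syntax'
```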

On the space normalisation, XPath defines the normalize-space()
function as follows:

  Function: string normalize-space(string?)

  The normalize-space function returns the argument string with
  whitespace normalized by stripping leading and trailing whitespace
  and replacing sequences of whitespace characters by a single space.
  Whitespace characters are the same as those allowed by the S
  production in XML. If the argument is omitted, it defaults to the context
  node converted to a string, in other words the string-value of the
  context node.

XPath's definition is again much better than what we have at the
moment, where the intention is to use CSS rules:

  ... by concatenating the text content of each of the child elements
  of the [current element] in document order, and then normalising
  white-space according to [WHITESPACERULES].

In my view the XPath approach is better since it refers specifically
to the "S production in XML", and given that we are using XHTML at the
moment, this seems to me to be suitably precise.
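To illustrate why pinning whitespace to the S production matters, the
sketch below compares it with a hypothetical processor that simply uses
Python's built-in notion of whitespace (str.split() with no arguments),
which also treats characters such as NO-BREAK SPACE (U+00A0) and FORM
FEED (U+000C) as whitespace. The two produce different literals from
the same input. Neither function is taken from any RDFa implementation;
both are illustrations.

```python
import re

_S = re.compile(r"[\x20\x09\x0D\x0A]+")  # XML 1.0 S production


def normalize_space(s: str) -> str:
    # XPath normalize-space(): only S characters count as whitespace.
    return _S.sub(" ", s).strip(" ")


def naive_normalize(s: str) -> str:
    # Hypothetical alternative: str.split() splits on any Unicode
    # whitespace, including U+00A0 NO-BREAK SPACE and U+000C FORM FEED.
    return " ".join(s.split())


text = "10\u00a0km\x0c away"
print(repr(normalize_space(text)))  # '10\xa0km\x0c away'
print(repr(naive_normalize(text)))  # '10 km away'
```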

So I believe we should either refer to these two ideas, or even import
the prose as is, if we have to.

One way of using XPath by reference would be to define our processing
in terms of the XPath concepts. At the moment we say this:

  The actual literal is either the value of @content (if present) or a string
  created by concatenating the text content of each of the child elements
  of the [current element] in document order, and then normalising
  white-space according to [WHITESPACERULES].

But we could say:

  The actual literal is either the value of @content (if present) or a string
  created by {processing that has the same effect as taking the XPath
  string-value of the [current element] and passing it to the XPath
  normalize-space() function.}

Or some such wording. Essentially, all we're saying is that a
processor must act as if it had done this:

  normalize-space( string-value of [current element] )
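For illustration, a processor could compute the literal roughly as
below. This is a sketch under stated assumptions: an ElementTree
element stands in for the [current element], @content is returned
unchanged when present (my reading of the wording above, which only
normalises the text-derived string), and otherwise normalize-space()
is applied to the string-value. The name literal_value is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

_S = re.compile(r"[\x20\x09\x0D\x0A]+")  # XML 1.0 S production


def normalize_space(s: str) -> str:
    # XPath 1.0 normalize-space()
    return _S.sub(" ", s).strip(" ")


def string_value(elem: ET.Element) -> str:
    # XPath 1.0 string-value: descendant text nodes in document order
    return "".join(elem.itertext())


def literal_value(current: ET.Element) -> str:
    """@content if present, else normalize-space(string-value of current)."""
    content = current.get("content")
    if content is not None:
        return content  # returned unchanged; only the text-derived string is normalised
    return normalize_space(string_value(current))


a = ET.fromstring('<span property="dc:title">\n  The <em>RDFa</em>\n  Syntax\n</span>')
b = ET.fromstring('<span property="dc:date" content="2007-10-31">31 October</span>')
print(repr(literal_value(a)))  # 'The RDFa Syntax'
print(repr(literal_value(b)))  # '2007-10-31'
```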

Regards,

Mark

-- 
  Mark Birbeck, formsPlayer

  mark.birbeck@formsPlayer.com | +44 (0) 20 7689 9232
  http://www.formsPlayer.com | http://internet-apps.blogspot.com

  standards. innovation.

Received on Wednesday, 31 October 2007 11:14:59 UTC