RE: Comments on Concepts editors' draft 2003-06-19

At 15:23 20/06/03 +0200, Jeremy Carroll wrote:
> > > Section 5, para 5, editorial suggestion.
> > >
> > > "A datatype is identified by one or more URI references."
> > >
> > > While strictly correct, I'm not sure that "one or more" adds any
> > > important information, reads awkwardly and possibly confusingly.  I
> > > would be inclined to delete it.
> > >
> > > ...
>
>discuss at telecon ...

I don't feel strongly.  As far as I'm concerned, just make a decision.

> > > Section 5.1, "The lexical space", final bullet, maybe serious.
> > >
> > > The phrase "start and end element" seems wrong to me.  I think this
> > > should be just "element" or "start and end element tags".
> > >
> > > ...
> > >
>
>This is in response to an editorial comment from XML Schema WG; I will
>double check (their comment was about the term "root element tag")

Hmmm... My point was one of terminology rather than substance of the change.


> > > Section 5.1, lexical-to-value mapping, muddled?
> > >
> > > I don't understand this, and it seems over-complicated.
> > >
> > > My understanding is that we decided that XML literals are self denoting
> > > like plain literals, hence the lexical mapping maps the lexical
> > value to
> > > itself.  The surrounding description of the lexical space and the value
> > > space, which are both canonical XML, supports this view.
> > >
> > > ...
>
>discuss at telecon
>
>"maps a string to the corresponding exclusive Canonical XML "
>
>click though on exclusive canonical XML to
>"XML that is in exclusive canonical form"
>look up to see that
>
>"The exclusive canonical form of a document subset is a physical
>representation of the XPath node-set, as an octet sequence, produced by the
>method described in this specification"
>
>i.e. the value space of XMLLiteral is a set of octet sequences whereas the
>lexical space is a set of strings (character) sequences.
>
>I pass on whether { 173 } is both a character sequence and an octet
>sequence.
>ditto for {} (which is exclusive canonical XML if it is an octet sequence)

Er, I hadn't appreciated that distinction.  Now I see why the wording seems 
so convoluted.  In case I'm not uniquely ignorant in these matters, it 
might be worth adding a note pointing out that C14N defines a mapping from 
a string (using XML syntax) to an octet sequence encoding of that string.

(Though I now like XML literals even less.  I fear they could be a serious 
drag on getting widespread full implementation of RDF/XML.  I'm not sure if 
that I'll ever bother.)

> > > Section 6.4, 3rd para (just after numbered list), editorial.
> > >
> > > I think this description of disallowed octets could be confusing.
> > >
> > > For example, the current work-in-progress draft of RFC2396bis [1] uses
> > > '['...']' in IPv6 literals in the 'authority' component of a URI, and
> > > requires them to be escaped everywhere else.
> > >
> > > I think the problem stems from the dual purpose of escaping in URIs (to
> > > protect special characters from interpretation, and to include
> > otherwise
> > > disallowed characters in a URI.  I'm not sure how to fix this, other
> > > than to suggest maybe "less is more";  e.g.
> > >
> > > "The disallowed octets that must be %-escaped include all those that do
> > > not correspond to US-ASCII characters, and any others that would
> > > otherwise be interpreted inappropriately according to the URI
> > > specification and URI scheme specification concerned."
> > >
> > > [1] http://www.apache.org/~fielding/uri/rev-2002/rfc2396bis.html
> > >
> > > ...
> > >
>
>
>Hmmm ... I think that's trying to jump ahead of rfc2396bis - and basically
>given the state of the TAG issue IRI Everywhere, we should really only be
>treading water.

I'm not trying to suggest we be tracking RFC2396bis ... I just used that as 
an example that sprang to mind about the difficulty of deciding what should 
be escaped.  Another example:

the mid: scheme has a syntax like
   mid: message-id / content-id
the '/' character can appear within the message-id or content-id, and 
according to the mid: spec it must be escaped to prevent confusion with the 
'/' that separated the two parts.

Hence my comment about the dual role of escaping in URIs.

Thinking some more, how about this:

"The disallowed octets that must be %-escaped include all those that do not 
correspond to US-ASCII characters."

My rationale is that any other disallowed character should already be 
escaped in the Unicode form of the URI.

I'm almost tempted to say nevermind... if we're just copying the 
recommended XML 1.1/I18N text, we can blame any inconsistencies on them ;-)

#g


-------------------
Graham Klyne
<GK@NineByNine.org>
PGP: 0FAA 69FF C083 000B A2E9  A131 01B9 1C7A DBCA CB5E

Received on Friday, 20 June 2003 13:50:29 UTC