RE: Charmod review from Jeremy Carroll on 2002-05-17 (w3c-rdfcore-wg@w3.org from May 2002)

From: Jeremy Carroll <jjc@HPLB.HPL.HP.COM>
Date: Fri, 17 May 2002 14:11:34 +0100
To: "Dave Beckett" <dave.beckett@bristol.ac.uk>
Cc: "w3c-rdfcore-wg" <w3c-rdfcore-wg@w3.org>
Message-ID: <JAEBJCLMIFLKLOJGMELDIEPOCDAA.jjc@hplb.hpl.hp.com>
>
> > I attach a summary of the things we would need to consider to conform with
> > charmod.
> > I do not propose we address these in our specs, but postpone this issue to
> > RDF 2.
>
> I wouldn't use the term RDF 2; but say, not in this working group
> under the present charter.

That wasn't intended as part of a formal comment, but point taken.



>
> > Here is my proposed comment on charmod.
> > Given that they use some web based system for recording issues the actual
> > form of any issues we raise may differ, but I suggest we discuss this text.
> >
> > Up to and including the first paragraph of the body of the message looks
> > like an e-mail, with the last two paragraphs each adding one issue to their
> > issue list.
> >
> > =========
> >
> > The RDF Core WG has feedback concerning the following sections
> > of charmod:
> >
> > 1. Introduction
> > 2. Conformance
> > 3.4 Strings
> > 3.5 Reference Processing Model
> > 4. Early Uniform Normalization
> > 6. String Identity Matching
> > 8. Character Encoding in URI References
> > 9. Referencing the Unicode Standard
> > A.2 Other References
> > C. Composing Characters
> > D. Resources for Normalization
> >
> >
> > {{ the other sections are not relevant to RDF }}
> >
> >
> > Dave, please review section 9.
> >
> > http://www.w3.org/TR/charmod#sec-RefUnicode
>
> The syntax WD doesn't refer to Unicode.  The test cases WD ntriples
> section needs updating to use the new wording.

Thanks

>
> > For the sections 1, 2, 3.4, 4, 6, 9, C, D,
> > RDF Core fully endorses the last call working draft.
>
> That's rather sweeping.  What does it say about the other sections of
> 3 and the other sections that you don't mention?

Maybe delete 'fully'. The point here is that the work we have done on character literals has been
helpfully informed by these sections (and their precursors). Significant substantive change to these
sections would be highly problematic; hence, IMO, we should support them.
Moreover, IMO, they have helped identify and solve a range of normalization issues, which
is helping us produce a better RDF recommendation.

It says nothing about the other bits of charmod. (Section 8, which is the only other relevant
section, is dealt with later.) For example, I don't think RDF Core should have an opinion on "3.6 Choice and
Identification of Character Encodings" since XML does that for us.


>
> In detail, most things are related to
> ntriples http://www.w3.org/TR/rdf-testcases/#ntriples

I had been imagining that n-triples is not and will not be conformant with charmod.
Maybe we should identify a lack of an escape clause. I certainly do not believe that charmod
conformance for n-triples is an appropriate goal at this stage. I agree with your analysis that
sections I have not considered would be relevant.


> or syntax http://www.w3.org/TR/rdf-syntax-grammar/
>
> ntriples
>   charmod 1.2 -  Have to change to use U+hhhh format
>
>   charmod 3.4 - the charmod Character String term is already used;
>     should we remove this, all references to charmod?
>
>   charmod 3.5 - the right range of Unicode characters is used
>
>   charmod 3.6.1 - ntriples mandates US-ASCII which is forbidden by
>     charmod, breaks the MUST in 2nd para.
>
>     The obvious fix would be to mandate utf-8, which would require an
>     extra processing stage for ntriples before doing the processing
>     of the resulting unicode characters.  The \u and \U escapes might
>     remain?
>
>   charmod 3.7 - mostly conforms here.  But the "escaped chars SHOULD
>     be acceptable wherever unescaped chars are" isn't yet met since
>     they aren't allowed in names such as blank node ids or comments.
>     The last sentence notes this is SHOULD so isn't actually
>     REQUIRED.
>
>   charmod 4.2.1 - I have no idea which normalizing phrase we use here
>     - not include-normalized, given ntriples has no inclusion.

The include phase expands the character escapes.
I think the goal is that it is fully normalized.
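
As a rough illustration of that reading (a sketch only, not anything from the test cases WD): expand the \u and \U escapes first, then check the result. The function names are made up here, only the \uXXXX and \UXXXXXXXX escapes are handled (not \\, \n and so on), and plain NFC is used as a stand-in for charmod's stricter "fully normalized".

```python
import re
import unicodedata

# Matches \uXXXX (4 hex digits) or \UXXXXXXXX (8 hex digits).
# Assumption: other escapes such as \\ or \n are ignored in this sketch.
_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def expand_escapes(text: str) -> str:
    """Replace each \\u / \\U escape with the character it names."""
    def _replace(match: re.Match) -> str:
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _ESCAPE.sub(_replace, text)


def is_nfc_after_expansion(text: str) -> bool:
    """True if the string is already in NFC once escapes are expanded.

    NFC is only an approximation of charmod's "fully normalized".
    """
    expanded = expand_escapes(text)
    return unicodedata.normalize("NFC", expanded) == expanded
```

With this, r"caf\u00E9" passes the check, while r"cafe\u0301" (base letter plus combining acute) does not, even though the escaped source text is pure US-ASCII in both cases.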

>
>   charmod 4.3.2 - should U+0338 also be excluded from characters of an
>     extended ntriples identifier class (if it went to a UTF-8 encoding)?
>
>
>
> syntax
>   charmod 2 conformance - need we add a conformance statement?  even
>      if not conforming to charmod?

IMO we should not attempt charmod conformance in this round.
We should omit a conformance statement.

>   charmod 2:
>     [I] Where this specification contains a procedural
> 	description, it MUST be understood as a way to specify the
> 	desired external behavior. Implementations MAY use other ways
> 	of achieving the same results, as long as observable behavior
> 	is not affected.
>    may need emphasising although we already have:
>        [[This document illustrates one way to create the N-Triples
>          from the XML - any other method that results in the same
>          N-Triples (RDF graph) may be used.]]
>        -- http://www.w3.org/TR/rdf-syntax-grammar/#section-Introduction
>
>   charmod 3.4 - the Character String term may need to be used in the
>     future graph model section
>
>   charmod 3.5 - reference processing model
>     we conform to this since we are based on XML
>
>   charmod 4.2.1 - I have no idea which normalizing phrase we use here

fully normalized
>
>   charmod 4.4 -
>     "[S] Specifications MUST document any security issues related to
>     normalization."
>     -- is this stuff we take from the internet draft or do we need to
>     demonstrate / point to example of encoding URIs that look the
>     same but are different?

I think we could meet this by simply pointing at appropriate pairs of test cases.
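
To make the kind of pair concrete, here is a small sketch. The example.org URIs are invented for illustration and are not taken from any existing test case. The two strings render the same but differ code point by code point, so they are different under the string equality we rely on.

```python
import unicodedata

# Hypothetical pair: same appearance, different code point sequences.
precomposed = "http://example.org/caf\u00e9"   # U+00E9 (e with acute)
decomposed = "http://example.org/cafe\u0301"   # "e" + U+0301 combining acute


def same_code_points(a: str, b: str) -> bool:
    """Plain string equality, code point by code point."""
    return a == b


def same_after_nfc(a: str, b: str) -> bool:
    """Equality after normalizing both strings to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here same_code_points(precomposed, decomposed) is false while same_after_nfc(precomposed, decomposed) is true, which is exactly the gap a security note would need to point at.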

>
>   charmod 8 - char encoding in uri references
>      several things here to conform to
>      do we really need to cite the IRI internet draft?
>      I agree with Jeremy that this is too early.
>
>   charmod 8 - fragment identifier (last para)
>     since we define the fragment identifier for RDF, this says we
>     need to address the chars outside US-ASCII and how they are
>     encoded in it.
>
>     I agree with Jeremy - our specification is in accordance with
>     (sufficient for?) handling IRIs but does not explicitly support
>     them
>
>   charmod 9 -
>     xml 1.1 proposes changes to cite unicode terms for legal name
>     chars.  I expect we aren't going to depend on 1.1 (also a WD) so
>     does this need spelling out for, say RDF IDs?

We shouldn't have a dependency on xml 1.1 but we should work just as well with it.
I don't understand the unicode versioning issues well enough to know how to say that. Can't we just have
a non-specific reference to Unicode and note that not all unicode versions can be serialized in XML
1.0?


>
>
> General comments
>
> charmod 4.3 is just baffling in the levels of inclusion, entities,
>   escaping and normalizing.

Some of that is XML and not charmod's fault.
Another aspect is that earlier simpler versions didn't work. I had made comments about earlier
drafts where the normalization rules did not work for RDF.

>
> charmod 4.3.1 sentences are very complex, some mentioning 3 types of
>   normalizing and two parenthetical clauses in one sentence.

My preference would be that you make these as personal editorial comments.
It seems heavy handed for a WG to make editorial suggestions; also, I think we should focus our
review efforts on substance.

>
> charmod 4.3.2 does not explain what '-' means?  No or 'N' to compare
>   with 'Y'?

Ditto.
>
> charmod 4.4 - I'd hate to have to explain this.  Some of these
>   statements seem web architectural and will need a lot of buy-in
>

We might differ here - probably worth airing at the telecon.
I gave an editorial comment that the normalization-sensitive definition was hard to understand.


> charmod 8 - charmod MUST NOT depend on the IRI I-D, and even if it
>   was an RFC, needs lots more buy in before the MUST could be used.
>   Charmod #2 maybe.

Was my suggested text strong enough? It seems not.
Would:

 RDF Core opposes any requirement on a specification which
 refers to the IRI draft.

be better?


>
> > For the section 3.5 we note that the language is somewhat offputting for us
> > as specification developers given that our specification explicitly does
> > not have a processing model. We have no particular suggestions about this,
> > nor would we object if the I18N WG chose not to address this issue.
>
> So this section should be labelled 'IF the spec has a processing model...'
>
>
> > Our main concern is with section 8 (and the IRI reference in A.2),
> > particularly the requirement that specs "SHOULD use Internationalized
> > Resource Identifiers (IRI) [I-D IRI]".
> > The IRI draft is only a draft, the reference to it is not normative, and
> > the strength of this SHOULD dependency appears excessive ("not optional").
>
> +1
>
> > In particular, the IRI draft does not
> > adequately address IRI equality (not merely functional equivalence in
> > retrieval).
>
> I haven't reviewed this but, if it doesn't at this draft stage we
> definitely shouldn't use it, since equality is all we require.
>
> > Moreover, the bidi section presents a learning curve which developers are
> > unlikely to want to climb before IRI has consensus around it; we have found
> > the text in Xlink section 5.4 and XML Erratum 26 adequately clear for some
> > of the IRI questions, particularly those ...
>
> Not sure what bidi impact you mean?  Is that the "logical order" requirement?

Bidi is bidirectional text.
The normal example is numbers in Arabic.
I think that the Arabic for

the number 1.234

is

1.234 REBMUN EHT

which is logically stored as

{ T, H, E,  , N, U, M, B, E, R, 1, ., 2, 3, 4, }
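
A small sketch of the same idea with real characters. The sample string is my own choice (an Arabic word for "number" followed by the digits) and not from charmod. The string is held in logical order, and the Unicode bidi class of each character is what a renderer uses to work out display order.

```python
import unicodedata

# Hypothetical sample, stored in logical (typing/reading) order:
# three Arabic letters, a space, then the digits 1.234.
logical = "\u0631\u0642\u0645 1.234"


def bidi_classes(text: str) -> list[tuple[str, str]]:
    """Pair each code point (as U+XXXX) with its Unicode bidi class."""
    return [(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch)) for ch in text]
```

The Arabic letters come back with class AL (right-to-left Arabic letter) and the digits with class EN (European number), which is why the digits still display left to right inside right-to-left text.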

Frankly I don't think RDF will be known to be bidi compatible until we have bidi users involved with
rdf-ig.


>
> > ...
> > that are most pressing for RDF and believe that charmod should merely:
> > - reiterate such text;
> > - reiterate the early uniform normalization model for the iris when
> > regarded as unicode strings
>
> Those seem sensible
>
> > -  note that section 8 will be superseded by IRI-draft when it becomes a
> > recommendation.
>
> I do not recommend putting such future-guessing statements in docs.
> If IRI becomes an RFC, charmod should make a new version (WD --> REC)
>
> > I am independently raising some very minor editorial issues.
> >
> > Jeremy
>
> Jeremy - can you take my comments, merge and edit them to give some
> consistent summary?
>
> Dave
>

I imagine I'll take that as an action in an hour's time.

Jeremy
Received on Friday, 17 May 2002 09:11:46 UTC