Re: Omissions, Errors, and Misleading Prose in the N-Triples Specification

On Fri, 5 Sep 2003 05:23:02 +0100
"Sean B. Palmer" <sean@mysterylights.com> wrote:

> 
> All section references are to rdf-testcases [1].

Which, unfortunately, was replaced by a new version on the same day
you wrote this email.

> 1) Section 3.2: "[t]he characters outside the US-ASCII range are made
> available by \-escape sequences as follows". However, some of the
> characters in the table are *inside* the US-ASCII range; i.e. #x5C,
> #x22, #x0A, #x0D, and #x09.
> 
> 2) It is not clear that #x0A, #x0D, and #x09 need to be encoded,
> except that they are not allowed in the character production of
> section 3.1.
> 
> 3) #x5C and #x22 (backslash and quote marks) are not disallowed from
> strings by the grammar, and there is no clear prose that disallows
> them either. Therefore, it is not stated that they are to be encoded
> within literals. This means that "\x" is a valid N-Triples literal,
> and "\" and """ are very ambiguous, and possibly valid.
> 
> 4) "#X5C" in the table in section 3.2 should be "#x5C".

I think these are all addressed in the 5th September version.
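
For anyone checking an implementation against the revised table, here is
a minimal Python sketch of literal escaping, based on my reading of the 5th
September draft. Backslash, quote, tab, LF and CR get their short escapes,
printable US-ASCII (#x20-#x7E) passes through unchanged, and everything else
becomes \uHHHH or \UHHHHHHHH with uppercase hex digits. The name
escape_literal is only for illustration. The draft itself is the
authoritative source for the table.

```python
def escape_literal(text: str) -> str:
    """Escape a string as the body of an N-Triples quoted literal (sketch)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "\\":
            out.append("\\\\")            # #x5C backslash
        elif ch == '"':
            out.append('\\"')             # #x22 quotation mark
        elif ch == "\t":
            out.append("\\t")             # #x09
        elif ch == "\n":
            out.append("\\n")             # #x0A
        elif ch == "\r":
            out.append("\\r")             # #x0D
        elif 0x20 <= cp <= 0x7E:
            out.append(ch)                # other printable ASCII: unchanged
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04X}")    # 4 uppercase hex digits
        else:
            out.append(f"\\U{cp:08X}")    # 8 uppercase hex digits
    return "".join(out)


print(escape_literal('caf\u00e9 "x"'))    # prints: caf\u00E9 \"x\"
```

With this approach, "\x" can never be produced, because a literal backslash
in the input always comes out as "\\".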

> 5) Since, as stated in section 3.1, the employed "EBNF cannot perform
> the counting required by the Primary-subtag and Subtag productions",
> perhaps it would be useful to either a) switch to an EBNF that *can*
> perform the counting, or b) note the counting in prose, and state
> whether conformant N-Triples parsers are required to perform such
> counting.

No, the EBNF is fine. There is no requirement to validate
language tags, just as there is none for XML or XHTML.

(Plus, the set of valid tags is defined by a mixture of many
specifications, with lists of terms that get updated now and then
and will never be complete.)
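
If a tool wants to do the counting anyway, it is cheap. The following
small Python sketch uses hypothetical names. The first pattern follows the
N-Triples language production as written, with no length limits. The
optional strict mode adds the RFC 3066 limit of 1 to 8 characters per
subtag. Neither mode checks tags against any registry, which is the part
that can never be complete. None of this is required for conformance.

```python
import re

# The N-Triples language production as written: lowercase, no length limits.
NTRIPLES_LANG = re.compile(r"[a-z]+(?:-[a-z0-9]+)*")

# Optional: the same lowercase syntax plus RFC 3066 counting (1-8 per subtag).
RFC3066_COUNTED = re.compile(r"[a-z]{1,8}(?:-[a-z0-9]{1,8})*")


def check_language(tag: str, strict: bool = False) -> bool:
    """Syntax-only check; never consults a registry of real language tags."""
    pattern = RFC3066_COUNTED if strict else NTRIPLES_LANG
    return pattern.fullmatch(tag) is not None


print(check_language("en-gb"))                   # True
print(check_language("abcdefghi"))               # True: the grammar has no limit
print(check_language("abcdefghi", strict=True))  # False: 9 letters is more than 8
```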

> 6) Conformance levels are not clearly specified. Does a conformant
> N-Triples parser have to fully check URI syntax, for example?
> Primary-subtag and Subtag counting?

There is only the pass/fail conformance implicit in any such specification,
and that is all such a statement would say.

On validating URI syntax or language tags (as explained above): no.
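
To show what passing or failing on N-Triples syntax alone looks like,
here is a hypothetical Python helper. It accepts a uriref as '<', one or
more printable US-ASCII characters other than '>', and then '>'. It does
not check the enclosed text against RFC 2396. This is a sketch of the lax
behaviour and is not taken from any particular parser.

```python
import re

# Lax uriref: '<', one or more printable US-ASCII characters except '>'
# (0x3E), then '>'. The enclosed text is NOT validated against RFC 2396.
URIREF = re.compile(r"<([\x20-\x3D\x3F-\x7E]+)>")


def parse_uriref(token: str) -> str:
    """Return the URI text inside a uriref token, or raise ValueError."""
    match = URIREF.fullmatch(token)
    if match is None:
        raise ValueError(f"not an N-Triples uriref: {token!r}")
    return match.group(1)


print(parse_uriref("<http://example.org/a>"))  # http://example.org/a
print(parse_uriref("<not a valid URI>"))       # accepted: no RFC 2396 check
```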

> 7) It is not clear that the absoluteURI production in N-Triples
> exactly matches (or imports) the absoluteURI production from RFC 2396,
> though the RFC is cited.

This has changed in the newer version.

> 8) Section 3.3: "[c]haracters above the US-ASCII range are made
> available by the \u or \U escapes". I am aware that this has been
> raised before, but this section should be removed, and UTF-8 + %HH
> encoding of non-US-ASCII characters used for synchronicity with the
> IRI mechanism (being employed in, e.g., XPointer, XInclude, and XML
> Base).

We considered this and kept the simpler \u and \U forms, with fixed,
required fields, rather than require UTF-8 encoding support or pick some
other UTF encoding. This has been found suitable and quick to implement,
and is not likely to change for this test case format.
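
The fixed widths are what make the escapes quick to implement. A decoder
needs no lookahead beyond the 4 or 8 required hex digits. The following
is a minimal Python sketch that assumes the draft's hex production allows
only uppercase A-F. It rejects anything else after a backslash, so "\x",
lowercase hex, and short escapes are all errors.

```python
import re

# One alternative per escape form. \u and \U take exactly 4 or 8 uppercase
# hex digits; anything else after a backslash falls into the last group.
ESCAPE = re.compile(r'\\(?:u([0-9A-F]{4})|U([0-9A-F]{8})|(["\\tnr])|(.?))', re.DOTALL)

SIMPLE = {'"': '"', "\\": "\\", "t": "\t", "n": "\n", "r": "\r"}


def unescape(body: str) -> str:
    """Decode the escapes in the body of an N-Triples literal (sketch)."""

    def replace(match):
        short, long_, simple, bad = match.groups()
        if short is not None:
            return chr(int(short, 16))
        if long_ is not None:
            code_point = int(long_, 16)
            if code_point > 0x10FFFF:
                raise ValueError(f"code point out of range: U+{code_point:X}")
            return chr(code_point)
        if simple is not None:
            return SIMPLE[simple]
        raise ValueError(f"invalid escape: \\{bad}")

    return ESCAPE.sub(replace, body)


print(unescape(r"caf\u00E9 \"x\""))  # café "x"
```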

> 9) Please indicate whether or not a charset parameter may or must not
> be used in conjunction with the text/plain MIME type, since according
> to section 3.1 the only allowed encoding is us-ascii.

Isn't that the default? N-Triples only uses a subset of 7-bit
ASCII (32..126), so it probably doesn't matter.
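
For reference, RFC 2046 makes US-ASCII the default charset for text/plain.
The following rough Python sketch, with hypothetical helper names, shows
why the parameter makes no practical difference here. A document limited
to 7-bit bytes decodes to the same text under us-ascii or under any
ASCII-compatible charset, such as utf-8 or iso-8859-1.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte is below 0x80, i.e. plain US-ASCII."""
    return all(byte < 0x80 for byte in data)


def decode_ntriples(data: bytes, charset: str = "us-ascii") -> str:
    """Decode a 7-bit N-Triples document; ASCII-compatible charsets agree."""
    if not is_seven_bit(data):
        raise ValueError("N-Triples documents are restricted to 7-bit US-ASCII")
    return data.decode(charset)


doc = b'<http://example.org/s> <http://example.org/p> "caf\\u00E9" .\n'
same = {decode_ntriples(doc, cs) for cs in ("us-ascii", "utf-8", "iso-8859-1")}
print(len(same))  # 1: each charset yields the same text
```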

> Note that many of the comments above are based on implementor
> experience, in building a Python RDF API that includes N-Triples
> tools.
> 
> Thanks,
> 
> [1] http://www.w3.org/TR/rdf-testcases/
> - W3C Working Draft 23 January 2003

Thanks

Dave

Received on Tuesday, 9 September 2003 11:10:47 UTC