- From: Dave Beckett <dave.beckett@bristol.ac.uk>
- Date: Tue, 18 Sep 2001 10:38:59 +0100
- To: Bjoern Hoehrmann <derhoermi@gmx.net>, www-rdf-comments@w3.org
- CC: barstow@w3.org
You are mostly addressing section 3: N-Triples:
  http://www.w3.org/TR/2001/WD-rdf-testcases-20010912/#ntriples
which I edited, so I will respond.

>>>Bjoern Hoehrmann said:
> Hi,
>
> I wonder why the working drafts doesn't reference RFC 2396 for the
> absoluteURI syntax ...

A missing citation I guess.  The section defines a syntax creating a
graph whose meaning is defined in a (still being drafted) RDF model
theory document.  So where tokens like 'subject', 'predicate',
'object', 'uriref' etc. appear, their syntax is defined but their
meaning is left out.

> ...and instead uses a very loose syntax definition with
> incompatible character escape sequences. The [CHARMOD] requires
> specifications to specify that URIs are escaped like
>
>   http://www.hoehrmann.invalid/~bj%C3%B6rn/
>
> but the RDF Test Cases WD implies, one should use
>
>   http://www.hoehrmann.invalid/~bj\uF6rn/
>
> or
>
>   http://www.hoehrmann.invalid/~bj\u00F6rn/

We started with escaping rules taken from Python (which you mention
later), i.e. \-escapes for strings.

CHARMOD says, for Character Escaping (not URIs):

  * Specifications MUST NOT invent a new escaping mechanism if an
    appropriate one already exists.
  -- http://www.w3.org/TR/charmod/#sec-Escaping

so the \-escaping for strings seemed appropriate.  The choice for URI
escaping was either to recommend a second way to escape characters
(such as %xx) or to use the same method.  For simplicity, the same
method was used, but the familiarity of %xx might make it a better
choice, although it would require a little more code.

Looking further, CHARMOD says, for URIs:

  A W3C specification that defines new syntax for URIs, such as a new
  kind of fragment identifier, MUST specify that characters outside
  the US-ASCII repertoire are encoded in URIs using UTF-8 and
  %HH-escaping
  -- http://www.w3.org/TR/charmod/#sec-URIs

so we have to (MUST) change our URI escaping to match that
requirement.  Thanks for catching it.
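
For illustration only, here is a minimal sketch in present-day Python 3
of the UTF-8 plus %HH escaping that CHARMOD asks for.  The function
name and the set of characters left unescaped are assumptions made for
this example and are not taken from the Working Draft; the only library
call used is urllib.parse.quote from the standard library.

```python
from urllib.parse import quote

# Characters left untouched: the URI reserved characters, "%" (so that
# existing %HH escapes are not escaped a second time), and "~".
# This choice is an assumption for the example, not something the WD specifies.
SAFE = ":/?#[]@!$&'()*+,;=%~"


def percent_escape_uri(uri: str) -> str:
    """Encode every other non-alphanumeric character as UTF-8 bytes,
    writing each byte as %HH (hypothetical helper)."""
    return quote(uri, safe=SAFE, encoding="utf-8")


print(percent_escape_uri("http://www.hoehrmann.invalid/~bj\u00f6rn/"))
print(percent_escape_uri("http://www.example.org/test case/"))
```

With this sketch the first URI comes out in the %C3%B6 form quoted
above, and a space becomes %20 rather than \u0020.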

> The specification should clearly state that four characters must follow
> the \u and eight characters the \U. ...

I thought that was what we wrote:

  \uxxxx      Hexadecimal digits xxxx encoding character ...
  \Uxxxxxxxx  Hexadecimal digits xxxxxxxx encoding character ...

> ... I don't see any good reason why \U
> is defined for
>
>   [[#x10000-#xFFFFFFFF]
>
> (note the unmatched bracket) instead of ...-#x10FFFF, Unicode doesn't
> define anything above. ...

True - the present latest version of Unicode doesn't, but we were
neutral on that by allowing the full 32 bit range.  I took the
recommendation from http://www.w3.org/TR/charmod/#unicode and cited 3.0.

Charmod says:

  * The specification MUST NOT arbitrarily restrict the range of
    characters that can be used, which must cover all Unicode code
    points from 0 to 0x10FFFF inclusive.
  -- http://www.w3.org/TR/charmod/#sec-RefProcModel

Our range includes that one, so it is allowed.  Charmod doesn't say we
should exclude code points beyond that range.  We can change it.

> The \U should IMO only require six hex digits
> instead of eight, otherwise authors have always to specify two
> superflous zero digits. I would recommend a more perlish approach for \u
> and \U in general, i.e. use \u{ <one to six hex digits> } in place of
> them.

The \-escapes come from deployed Python code reading this format.
Python happens to use fixed lengths for the escapes
  http://www.python.org/doc/current/ref/strings.html
and allows encoding Unicode chars with the full 32 bits, so we kept
that.  It is easier to have a fixed size field, since this is meant to
be a simple format, which is why we require absolute URIs,
line-by-line handling and other simple structures.  Furthermore, it
should be useful to retain the option of expanding this field later to
encode the full 32 bits if Unicode grew to require that (and there is
plenty of room for growth there).

> I _really_ wonder why #20, #3C and #3E should be additionally allowed
> for absoluteURIs.
> They have to be URI-escaped, the WD implies I should
> use
>
>   http://www.example.org/test\u0020case/
>
> instead of
>
>   http://www.example.org/test%20case/
>
> That's IMO pure nonsense.

It is not nonsense - it has a meaning - but as I say above, given the
requirements of CHARMOD in this area, I expect we will change to the
second example.  Those particular characters are escaped since they
were used in the syntax:

  uriref ::= '<' absoluteURI '>'
  -- http://www.w3.org/TR/2001/WD-rdf-testcases-20010912/#uriref

> The reference to Python string literals should be removed, I don't care
> about Python string literals and they are of no relevance here.

They were explanatory and probably should be removed, but I had to use
the same references above to explain to you the reason for choosing
this string escaping method and some other choices.

> I don't see no need for the trailing '.' character required for each
> n-triple line.

This is for compatibility with the existing N3 format
  http://www.w3.org/DesignIssues/Notation3
which remains useful to retain.  It is possible we might want to take
on more syntax from that other format and hence be able to use its
tools.  We are unlikely to change this.

This N-Triples format is meant to be a simple, complete format for
encoding RDF graphs, compatible with existing tools, and it is proving
very useful in our work.

Thanks for your feedback.

Dave

Received on Tuesday, 18 September 2001 05:39:05 UTC