- From: Patrik Fältström <paf@swip.net>
- Date: Wed, 07 Jan 1998 16:12:14 +0100
- To: Larry Masinter <masinter@parc.xerox.com>
- Cc: Harald Tveit Alvestrand <Harald.Alvestrand@maxware.no>, Dan Connolly <connolly@w3.org>, Leslie Daigle <leslie@bunyip.com>, "Roy T. Fielding" <fielding@kiwi.ics.uci.edu>, jcurran@bbn.com, harald.t.alvestrand@uninett.no, moore@cs.utk.edu, uri@bunyip.com, urn-ietf@bunyip.com
At 00:03 1998-01-07 PST, Larry Masinter wrote:
>I should point out that the syntax (and any scheme-specific semantics)
>are assigned to the character sequence, not to any octet sequence.
>In fact, the mapping of character sequences to octet sequences is
>part of the semantics that a scheme specifies. That's the reason
>why some schemes might employ different encoding mechanisms than
>%XX.

I don't agree with this, but that might be because of the overloaded use of the word "character".

The way I interpret what you are saying is that a URI parser (yes, a URI parser) should operate on the _characters_ in the URI string and not the octets?

That means that I should be able to use percent encoding of the fragment identifier and still have the fragment delimiter, which in turn means that the encoding does not have any meaning at all.

I.e. what I am talking about, and what I think we agree on, is that we have to define "characters", and we also have to agree on which octets are valid at the various levels in the chain of parsing URIs.

I see that we have four layers:

Client
   [BIG5]
      Maps from the native charset to some known one, which is
      specified in the scheme definition.
   [UNICODE]
URI string
   [UNICODE]
      This is mapped into whatever the transliterated string is
      defined to be according to the _URI_SYNTAX_ document.
   [UTF-8 encoded UNICODE]
Transliterated string
   [UTF-8 encoded UNICODE]
      Here we can do some %-encoding if needed.
   [String in "US-ASCII"]
URI sequence of bytes

The processes above are described in various documents, and I want everything from the transliterated string and downwards to be described in a URI syntax document, while what is above the transliterated string should go in a URL/URN syntax document and various scheme definition documents.

When _I_ talk about characters, I talk about characters in the URI string, while the URI syntax document, when talking about the fragment delimiter '#' as being forbidden in a URI, talks about the "Transliterated string".

I.e. semantics for schemes are on the URI string, while syntax and semantics for URIs are on the transliterated string.

   Patrik

Email: paf@swip.net
URL: http://www.tele2.se
PGP: 4D38 91A4 27D9 C8B2 6975 D6BB 21D0 4C57 BD23 6602
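
The four layers in the message can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the example.org URI, and the choice of which characters to leave unescaped are all hypothetical and come from no URI specification; only Python's standard codecs and urllib.parse are used. The last function shows one reading of the fragment-delimiter point: splitting on '#' before and after undoing the %-encoding gives different answers.

```python
from urllib.parse import quote_from_bytes, unquote_to_bytes


def client_to_uri_string(native: bytes, charset: str = "big5") -> str:
    """Client -> URI string: decode the native charset into Unicode."""
    return native.decode(charset)


def transliterate(uri_string: str) -> bytes:
    """URI string -> transliterated string: UTF-8 encoded Unicode."""
    return uri_string.encode("utf-8")


def to_uri_octets(transliterated: bytes, safe: str = "/:#") -> bytes:
    """Transliterated string -> URI sequence of bytes in US-ASCII.

    Octets outside the unreserved set and `safe` are %-encoded.
    The `safe` set here is an assumption made for this example.
    """
    return quote_from_bytes(transliterated, safe=safe).encode("ascii")


def split_fragment(octets: bytes) -> tuple[bytes, bytes]:
    """Split at the first literal '#', without undoing any %-encoding."""
    head, _, fragment = octets.partition(b"#")
    return head, fragment


if __name__ == "__main__":
    # Simulate a client that holds the URI in Big5 (hypothetical URI).
    native = "http://example.org/中文#top".encode("big5")

    uri_string = client_to_uri_string(native)
    octets = to_uri_octets(transliterate(uri_string))
    print(octets)

    # A '#' that is data travels as %23; the literal '#' is the delimiter.
    wire = b"doc%23one#sec"
    print(split_fragment(wire))                    # split on the octets
    print(split_fragment(unquote_to_bytes(wire)))  # split after decoding
```

If the parser splits only after decoding, the escaped '#' and the literal '#' become indistinguishable, which is why it matters at which layer the delimiter is defined.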
Received on Wednesday, 7 January 1998 10:17:47 UTC