- From: Marcos Caceres <marcosc@opera.com>
- Date: Thu, 26 Mar 2009 12:18:30 +0100
- To: Thomas Roessler <tlr@w3.org>
- Cc: WebApps WG <public-webapps@w3.org>
Hi Thomas,

On Fri, Mar 20, 2009 at 1:31 PM, Thomas Roessler <tlr@w3.org> wrote:
> On 20 Mar 2009, at 10:46, Marcos Caceres wrote:
>
>> To complement the new i18n model, I've added the following
>> restrictions on XML base:
>> [[
>> xml:base attribute
>> The xml:base attribute may be used in a configuration document to
>> specify a base URI other than the base URI of the document. For the
>> purpose of this specification, the value of xml:base attribute is
>> restricted to an absolute path to a folder that must exist inside the
>> widget package.
>
> That would be a relative URI reference.
>

Ok, things get a bit screwy here (for me) because we are again entering
URI land. Basically, I need your help here translating the following
from "Marcos speak" into URI speak. URIs are complex and I don't know
the intricacies of either the URI or the IRI spec, so I need help.

Assuming I know _nothing_ of URIs, which is mostly true it seems, I'm
going to break the problem down as much as I can, in the hope that we
can specify something useful. (I've added a "Converting a zip relative
path to a URI reference" section to the widget spec, where I hope this
discussion will end up.)

Basically in the spec we have this:

1. In a zip file, the identity of a file is stored as a text string of
variable length that always looks like: "path/to/file.ext" or
"file.ext" or "folder/". The only parts that are reserved by the zip
specification are:
  a. the "/" character as the delimiter
  b. the character encoding of the string.
There is no notion of a "file name" per se. Implementations just
translate anything before a "/" into a folder on a file system and
anything after the last "/" into a file. If there is nothing after the
slash, it is an empty folder. The data is (obviously) written into the
file whose name is represented by the string after the last "/".

2. In a zip file, the text string that identifies the file is a
sequence of octets encoded as either CP437 or UTF-8. Which encoding was
used depends on whether a bit flag is set to 1 or 0, where 1 means
UTF-8. CP437 is the default, but most OSs ignore this and just do a
1-to-1 encoding from the native OS encoding to the Zip file name (thus
not using CP437 and hence making invalid zip files; e.g., the file
names of a zip file made on a Mac are in fully decomposed Unicode,
which then breaks on Windows). But that is irrelevant here because in
the spec we just say if it's not UTF-8, then it's CP437. Implementers,
fix your darn Zip implementations!

3. In the spec, we call the above a "zip relative path". It is not
strictly a path in the URI sense because the encoding can be in UTF-8
without being "URL encoded" or whatever, right?

4. A zip relative path cannot contain any of the following characters:
<,>,:,",/,\,|,?,*,^,`,{,}. I introduced that restriction, not Zip. The
reason these characters are banned is that they are not allowed on
some OSs.

I assume that a zip relative path must be converted (URL-encoded or
whatever it is called) to be used as a URI or IRI (or LEIRI) to be
useful in any XML context that accepts URIs (e.g., a content element's
src attribute)?

> The XML Base specification defines
> xml:base as a LEIRI-valued attribute.

Argh! The LEIRIs note (or xml:base) is confusing. Why does the LEIRIs
note say "New protocols and formats should not use Legacy Extended
IRIs.", yet xml:base Second Ed. still uses LEIRIs?! And why did the W3C
allow something to go to Rec that has a dependency on a non-normative
note that discourages its own use? (Ok, that's just me venting, I don't
expect a real answer, but it just seems really stupid.)
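A minimal sketch of the conversion that items 2 to 4 above are asking about, under stated assumptions: the function names are hypothetical, the banned-character check leaves out "/" because it is also the path delimiter, and plain percent-encoding of the UTF-8 octets is only one possible answer to "what do I need to specify", not something the spec text quoted here prescribes.

```python
from urllib.parse import quote

# Bit 11 of the zip "general purpose bit flag": set means the name is UTF-8.
UTF8_FLAG = 0x0800

# Characters banned from a zip relative path in the list above.
# Assumption: "/" is left out because it doubles as the path delimiter.
BANNED = set('<>:"\\|?*^`{}')


def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Decode a zip entry name: UTF-8 if the flag is set, otherwise CP437."""
    encoding = "utf-8" if flag_bits & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


def has_banned_chars(zip_relative_path: str) -> bool:
    """Return True if the path contains any character from the banned list."""
    return any(ch in BANNED for ch in zip_relative_path)


def to_uri_reference(zip_relative_path: str) -> str:
    """Percent-encode the UTF-8 octets of every character that is not
    an ASCII letter, digit, one of "_.-~", or the "/" delimiter."""
    return quote(zip_relative_path, safe="/", encoding="utf-8")


# Hypothetical entry name stored as CP437 octets (0x82 is "é" in CP437).
name = decode_zip_name(b"locales/fr/caf\x82.gif", flag_bits=0)
print(to_uri_reference(name))  # locales/fr/caf%C3%A9.gif
```

In a URI, a non-ASCII character shows up as %HH escapes of its UTF-8 octets, as in the output above; in an IRI the same character may appear unescaped.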
So, (again pretending I have never heard of URIs) what do I need to
specify in the specification to convert a zip relative path into one of
these things called a "URI reference" or "IRI reference" (or a LEIRI)
that would be useful in the context of xml:base? I also don't
understand: does xml:base work with IRIs at all? Are IRIs a stricter
subset of LEIRIs? How can a zip relative path become a LEIRI? Does it
have to? Or, given our understanding of what a zip relative path is,
can a zip relative path just be an IRI?

> So, the relative URI reference would
> be evaluated with respect to whatever base URI the configuration document
> has anyway.

This is _not_ what we want. In Marcos speak (not URI speak!) there are
two kinds of path:

1. A relative path: a path that looks like this: hello/there.file or
./hello/there.file or ../hello/there.file. That is, a path that has no
slash at the front, or has a dot or two at the front. What is this
called in URI speak?

2. An absolute path: a path that looks like this: /hello/there.file.
That is, a path that has a slash at the front. The slash at the front
denotes the root of a package. What is that called in URI speak?

Ok, again pretending I know nothing of these "URI" things... so here is
what I want:

1. the (Marcos) relative paths in a configuration document to be
resolved to a (Marcos) absolute path based on the locale folder (a zip
relative path) that matches the user agent's locale.

So, for instance, the user agent's locale is "en":

widget.wgt
  config.xml
  locales/en/myfile.gif

I want config.xml to be parsed (and all Marcos relative paths to be
converted to start with /locales/en/ - i.e., they become Marcos
absolute paths based on xml:base). So:

<widget>
  <icon src="myfile.gif"/>
</widget>

Becomes the following DOM:

widget
|_ xml:base = "/locales/en/" (set dynamically)
|_icon
   |_src = "/locales/en/myfile.gif"

So, the zip relative path (cp437 encoded octets) "locales/en" must be
converted to a URI (ascii) or IRI.
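A minimal sketch of the resolution wanted in the example above, assuming ordinary RFC 3986 reference resolution is applied against a dynamically set xml:base of "/locales/en/". The helper name resolve_src is hypothetical, and using Python's urljoin is just one way to illustrate the behaviour, not what a widget user agent is required to do.

```python
from urllib.parse import urljoin


def resolve_src(xml_base: str, src: str) -> str:
    """Resolve a src attribute value against the xml:base in scope."""
    return urljoin(xml_base, src)


# xml:base as it would be set dynamically for a user agent locale of "en".
base = "/locales/en/"

# "Marcos relative" paths pick up the base folder.
print(resolve_src(base, "myfile.gif"))         # /locales/en/myfile.gif
print(resolve_src(base, "./myfile.gif"))       # /locales/en/myfile.gif

# A leading ".." climbs one folder above the base.
print(resolve_src(base, "../hello/there.file"))  # /locales/hello/there.file

# "Marcos absolute" paths ignore the base folder entirely.
print(resolve_src(base, "/hello/there.file"))  # /hello/there.file
```

In RFC 3986 terms, the first kind of path is a relative-path reference and the second an absolute-path reference; both are relative references because neither carries a scheme.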
What happens if a zip relative path has characters outside the ASCII
range? What does that IRI thing look like? Does it contain #%%, where %%
are UTF-8 code points?

>> If the said folder does not exist inside the widget
>> package, then the user agent must ignore this attribute, meaning that
>> the user agent must continue to either use the configuration
>> document's location within the package as the value of xml:base;

Ok, the above is wrong.

>> or
>> continue to use the value of any correctly declared xml:base attribute
>> in the ancestor chain.

Ok, I guess the equiv of a 404 is fine here. Will remove.

>> When the xml:base attribute is absent, the base
>> URI will be the folder in which the configuration document resides.

Ok, this is now wrong too, because of the new i18n model. I can't fully
fix this text until I finish working out the i18n model behavior. This
is the subject for a different email, but I will try to work on this.

>> The value of xml:base attribute must be declared as a URL encoded zip
>> relative path (the term URL encoded is defined in the [URI]
>> specification).
>> ]]
>
> -1
>
> You're redefining the meaning of xml:base completely; the likely effect is
> that XML processors will get all confused. With these restrictions, I'd
> suggest to use a separate widget:base attribute -- or just a configuration
> element that says where things will sit.

No, I want to use xml:base here. I don't want to respecify xml:base in
our spec.

>> The use case here is that an author might want to override
>> element-based localization so URIs are dereferenced to a folder of
>> their choice. This might be the case if the author has the following
>> folder/file structure:
>
> I think the use case is reasonable, but the use of xml:base to solve it
> isn't.

Why?

--
Marcos Caceres
http://datadriven.com.au
Received on Thursday, 26 March 2009 11:19:33 UTC