- From: Marcos Caceres <marcosc@opera.com>
- Date: Thu, 26 Mar 2009 12:18:30 +0100
- To: Thomas Roessler <tlr@w3.org>
- Cc: WebApps WG <public-webapps@w3.org>
Hi Thomas,
On Fri, Mar 20, 2009 at 1:31 PM, Thomas Roessler <tlr@w3.org> wrote:
> On 20 Mar 2009, at 10:46, Marcos Caceres wrote:
>
>> To complement the new i18n model, I've added the following
>> restrictions on XML base:
>> [[
>> xml:base attribute
>> The xml:base attribute may be used in a configuration document to
>> specify a base URI other than the base URI of the document. For the
>> purpose of this specification, the value of xml:base attribute is
>> restricted to an absolute path to a folder that must exist inside the
>> widget package.
>
> That would be a relative URI reference.
>
Ok, things get a bit screwy here (for me) because we are again
entering URI land. Basically, I need your help here translating the
following from "Marcos speak" into URI speak. URIs are complex and I
don't know the intricacies of either the URI or the IRI spec, so I
need help.
Assuming I know _nothing_ of URIs, which is mostly true it seems, I'm
going to break the problem down as much as I can, in the hope that we
can specify something useful. (I've added a "Converting a zip relative
path to a URI reference" section to the widget spec, where I hope this
discussion will end up)
Basically in the spec we have this:
1. In a zip file, the identity of a file is stored as a text string of
variable length that always looks like: "path/to/file.ext" or
"file.ext" or "folder/". The only parts that are reserved by the zip
specification are:
a. the "/" character as the delimiter
b. the character encoding of the string.
There is no notion of a "file name" per se. Implementations just
translate anything before a "/" into a folder on a file system and
anything after the last "/" into a file. If there is nothing after the
slash, it is an empty folder. The data is (obviously) written into the
file whose name is represented by the string after the last "/".
2. In a zip file, the text string that identifies the file is a
sequence of octets encoded as either CP437 or UTF-8. Which encoding
was used depends on a bit flag: 1 means UTF-8, 0 means CP437, which is
the default. Most OSs ignore this and just copy the file name from the
native OS encoding straight into the zip file name, thus not using
CP437 and hence making invalid zip files. E.g., the file names of a
zip file made on a Mac are in fully decomposed Unicode, which then
breaks on Windows. That is irrelevant here, though, because in the
spec we just say that if it's not UTF-8, then it's CP437.
Implementers, fix your darn Zip implementations!
3. In the spec, we call the above a "zip relative path". It is not
strictly a path in the URI sense because the encoding can be in UTF-8
without being "URL encoded" or whatever, right?
4. A zip relative path cannot contain any of the following characters:
<,>,:,",/,\,|,?,*,^,`,{,}. I introduced that restriction, not Zip.
These characters are banned because they are not allowed on some OSs.
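To make points 2 and 4 concrete, here is a minimal Python sketch (not
spec text). It assumes the flag in question is bit 11 (0x800) of the
zip general purpose bit flag, and that the banned characters are
checked per path segment, since "/" is also the delimiter. The names
decode_zip_name and is_valid_zip_relative_path are made up for
illustration.

```python
UTF8_FLAG = 0x800  # bit 11 of the zip general purpose bit flag

# Characters banned by the widget spec (not by Zip itself).
FORBIDDEN = set('<>:"/\\|?*^`{}')


def decode_zip_name(raw: bytes, flag_bits: int) -> str:
    """Decode a raw zip file name: UTF-8 if the flag is set, else CP437."""
    encoding = "utf-8" if flag_bits & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


def is_valid_zip_relative_path(path: str) -> bool:
    """Check every "/"-delimited segment against the banned characters.

    Assumption: a trailing "/" marks a folder entry and is allowed;
    a leading "/" or an empty segment is not.
    """
    segments = path.split("/")
    if segments and segments[-1] == "":
        segments = segments[:-1]  # drop the empty piece after a trailing "/"
    if not segments:
        return False
    return all(seg and not (set(seg) & FORBIDDEN) for seg in segments)
```

With this sketch, "locales/en/myfile.gif" and "folder/" pass the check,
while "a:b/c" and "/abs" do not.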
I assume that a zip relative path must be converted (URL-encoded or
whatever it is called) to be used as a URI or IRI (or LEIRI) to be
useful in any XML context that accepts URIs (e.g., a content element's
src attribute)?
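A rough sketch of what that conversion could look like, assuming
"URL-encoded" means percent-encoding in the RFC 3986 sense: each
character outside the unreserved set is replaced by one %XX triplet
per octet of its UTF-8 encoding, and "/" is kept as the delimiter. The
function name to_uri_path is hypothetical; this is an illustration,
not what the spec mandates.

```python
from urllib.parse import quote


def to_uri_path(zip_relative_path: str) -> str:
    """Percent-encode an already-decoded zip relative path.

    quote() encodes the string as UTF-8 and escapes every octet that
    is not an ASCII letter, a digit, or one of "_.-~"; "/" is kept
    because it is listed as safe.
    """
    return quote(zip_relative_path, safe="/")


print(to_uri_path("locales/en/myfile.gif"))
print(to_uri_path("locales/fr/caf\u00e9 menu.gif"))
# second line prints: locales/fr/caf%C3%A9%20menu.gif
```

A pure-ASCII path comes out unchanged, so only paths with spaces or
non-ASCII characters look any different after conversion.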
> The XML Base specification defines
> xml:base as a LEIRI-valued attribute.
Argh! The LEIRI note (or xml:base) is confusing. Why does the LEIRI
note say "New protocols and formats should not use Legacy Extended
IRIs.", yet xml:base Second Ed. still uses LEIRIs?! And why did the
W3C allow something to go to Rec that has a dependency on a
non-normative note that discourages its own use? (Ok, that's just me
venting, I don't expect a real answer, but it just seems really
stupid.)
So, (again pretending I have never heard of URIs) what do I need to
specify in the specification to convert a zip relative path into
one of these things called a "URI reference" or "IRI reference" (or a
LEIRI) that would be useful in the context of xml:base?
I also don't understand: does xml:base work with IRIs at all? Are IRIs
a stricter subset of LEIRIs? How can a zip relative path become a
LEIRI? Does it have to? Or, given our understanding of what a zip
relative path is, can a zip relative path just be an IRI?
> So, the relative URI reference would
> be evaluated with respect to whatever base URI the configuration document
> has anyway.
This is _not_ what we want. In Marcos speak (not URI speak!) there are
two kinds of path:
1. A relative path: a path that looks like this: hello/there.file or
./hello/there.file or ../hello/there.file.
That is, a path that has no slash at the front, or has one or two
dots at the front. What is this called in URI speak?
2. An absolute path: a path that looks like this /hello/there.file.
That is, a path that has a slash on the front. The slash at the front
denotes the root of a package. What is that called in URI speak?
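For reference, RFC 3986 (section 4.2) names three kinds of relative
reference by how they start. The sketch below maps the two "Marcos
speak" kinds onto those names. It assumes the input has no scheme
(i.e., it really is a relative reference); classify_reference is a
made-up helper name.

```python
def classify_reference(ref: str) -> str:
    """Name a scheme-less reference using the RFC 3986 section 4.2 terms."""
    if ref.startswith("//"):
        return "network-path reference"
    if ref.startswith("/"):
        return "absolute-path reference"
    return "relative-path reference"


for ref in ("hello/there.file", "./hello/there.file",
            "../hello/there.file", "/hello/there.file"):
    print(ref, "->", classify_reference(ref))
```

So a Marcos relative path corresponds to a relative-path reference,
and a Marcos absolute path to an absolute-path reference.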
Ok, again pretending I know nothing of these "URI" things... so here
is what I want:
1. the (Marcos) relative paths in a configuration document to be
resolved to a (Marcos) absolute path based on the locale folder (a zip
relative path) that matches the user agent's locale. So, for instance,
say the user agent's locale is "en":
widget.wgt
  config.xml
  locales/en/myfile.gif
I want config.xml to be parsed and all Marcos relative paths to be
converted to start with /locales/en/ (i.e., they become Marcos
absolute paths based on xml:base).
So:
<widget>
  <icon src="myfile.gif"/>
</widget>
Becomes the following DOM:
widget
 |_ xml:base = "/locales/en/" (set dynamically)
 |_ icon
     |_ src = "/locales/en/myfile.gif"
So, the zip relative path (cp437 encoded octets) "locales/en" must be
converted to a URI (ascii) or IRI.
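The resolution step in the example above can be sketched in Python
with the standard reference-resolution routine. This is only an
illustration: the host "widget.invalid" is a placeholder invented here
so that the package root has something to hang off, and
resolve_in_package is a hypothetical name, not anything from the spec.

```python
from urllib.parse import urljoin, urlsplit

# Placeholder root for the widget package (an assumption of this sketch).
_PLACEHOLDER_ROOT = "http://widget.invalid"


def resolve_in_package(xml_base: str, ref: str) -> str:
    """Resolve ref against xml_base and return a package-rooted path.

    xml_base is expected to start and end with "/", e.g. "/locales/en/".
    """
    absolute = urljoin(_PLACEHOLDER_ROOT + xml_base, ref)
    return urlsplit(absolute).path


print(resolve_in_package("/locales/en/", "myfile.gif"))
# prints: /locales/en/myfile.gif
```

Note that under this kind of resolution, a src value that already
starts with "/" ignores the base path and stays rooted at the package,
and "../" can step out of the locale folder.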
What happens if zip relative path has characters outside the ASCII
range? What does that IRI thing look like? Does it contain #%%, where
%% are UTF-8 code points?
>> If the said folder does not exist inside the widget
>> package, then the user agent must ignore this attribute, meaning that
>> the user agent must continue to either use the configuration
>> document's location within the package as the value of xml:base;
Ok, the above is wrong.
>> or
>> continue to use the value of any correctly declared xml:base attribute
>> in the ancestor chain.
Ok, I guess the equiv of a 404 is fine here. Will remove.
>> When the xml:base attribute is absent, the base
>> URI will be the folder in which the configuration document resides.
Ok, this is now wrong too, because of the new i18n model. I can't
fully fix this text until I finish working out the i18n model
behavior. This is the subject for a different email but I will try to
work on this.
>> The value of xml:base attribute must be declared as a URL encoded zip
>> relative path (the term URL encoded is defined in the [URI]
>> specification).
>> ]]
>
> -1
>
> You're redefining the meaning of xml:base completely; the likely effect is
> that XML processors will get all confused. With these restrictions, I'd
> suggest to use a separate widget:base attribute -- or just a configuration
> element that says where things will sit.
No, I want to use xml:base here. I don't want to respecify what
xml:base means in our spec.
>> The use case here is that an author might want to override
>> element-based localization so URIs are dereferenced to a folder of
>> their choice. This might be the case if the author has the following
>> folder/file structure:
>
> I think the use case is reasonable, but the use of xml:base to solve it
> isn't.
Why?
--
Marcos Caceres
http://datadriven.com.au
Received on Thursday, 26 March 2009 11:19:33 UTC