Re: [widgets] restrictions on XML base

Hi Thomas,
On Fri, Mar 20, 2009 at 1:31 PM, Thomas Roessler <tlr@w3.org> wrote:
> On 20 Mar 2009, at 10:46, Marcos Caceres wrote:
>
>> To compliment the new i18n model, I've added the following
>> restrictions on XML base:
>> [[
>> xml:base attribute
>> The xml:base attribute may be used in a configuration document to
>> specify a base URI other than the base URI of the document. For the
>> purpose of this specification, the value of xml:base attribute is
>> restricted to an absolute path to a folder that must exist inside the
>> widget package.
>
> That would be a relative URI reference.
>

Ok, things get a bit screwy here (for me) because we are again
entering URI land. Basically, I need your help translating the
following from "Marcos speak" into URI speak. URIs are complex and I
don't know the intricacies of the URI and IRI specs, so I need
help.

Assuming I know _nothing_ of URIs, which is mostly true it seems, I'm
going to break the problem down as much as I can, in the hope that we
can specify something useful. (I've added a "Converting a zip relative
path to a URI reference" section to the widget spec, where I hope the
outcome of this discussion will end up.)

Basically in the spec we have this:

1. In a zip file, the identity of a file is stored as a text string of
variable length that always looks like: "path/to/file.ext" or
"file.ext" or "folder/". The only parts that are reserved by the zip
specification are:

  a. the "/" character as the delimiter

  b. the character encoding of the string.

There is no notion of a "file name" per se. Implementations just
translate anything before a "/" into a folder on a file system and
anything after the last "/" into a file. If there is nothing after the
slash, it is an empty folder. The data is (obviously) written into the
file whose name is represented by the string after the last "/".

2. In a zip file, the text string that identifies the file is a
sequence of octets encoded as either CP437 or UTF-8. Which encoding
was used depends on whether a bit flag is set to 1 or 0, where 1
means UTF-8. CP437 is the default, but most OSs ignore this and just
do a 1 to 1 encoding from the native OS encoding to the Zip file name
(thus not using CP437 and hence making invalid zip files; e.g., the
file names of a zip file made on a Mac are in fully decomposed
Unicode, which then breaks on Windows). But that is irrelevant here
because in the spec we just say if it's not UTF-8, then it's CP437.
Implementers, fix your darn Zip implementations!

3. In the spec, we call the above a "zip relative path". It is not
strictly a path in the URI sense because the encoding can be in UTF-8
without being "URL encoded" or whatever, right?

4. A zip relative path cannot contain any of the following characters:
<,>,:,",/,\,|,?,*,^,`,{,}. I introduced that restriction, not Zip.
These characters are banned because they are not allowed on some OSs.
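
To make points 1 and 2 concrete, here is a minimal sketch in Python. It
is only an illustration: the function names are made up, and the
assumption that the "bit flag" is bit 11 (0x0800) of the zip general
purpose bit flag comes from my reading of the zip format notes, not
from the widget spec.

```python
from typing import List, Optional, Tuple

# Assumption: bit 11 of the general purpose bit flag marks a UTF-8 name.
UTF8_FLAG = 0x0800


def decode_entry_name(raw: bytes, flags: int) -> str:
    """Decode a zip entry name: UTF-8 if the flag is set, otherwise CP437."""
    encoding = "utf-8" if flags & UTF8_FLAG else "cp437"
    return raw.decode(encoding)


def split_entry_name(name: str) -> Tuple[List[str], Optional[str]]:
    """Split an entry name into (folders, file).

    Everything before the last "/" is treated as folders; whatever
    follows it is the file. The file is None when the name ends in "/".
    """
    folders, _, last = name.rpartition("/")
    folder_parts = folders.split("/") if folders else []
    return folder_parts, (last or None)


if __name__ == "__main__":
    for entry in ("path/to/file.ext", "file.ext", "folder/"):
        print(entry, "->", split_entry_name(entry))
```

Note the sketch does not enforce the character restrictions in point 4;
it only shows how the string is interpreted.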

I assume that a zip relative path must be converted (URL-encoded or
whatever it is called) to be used as a URI or IRI (or LEIRI) to be
useful in any XML context that accepts URIs (e.g., a content element's
src attribute)?
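
For illustration, this is roughly what I imagine the conversion to
look like, sketched in Python. It assumes the zip relative path has
already been decoded to a Unicode string, that "/" is kept as the
delimiter, and that everything outside the URI unreserved set is
percent-encoded from its UTF-8 octets. The function name is
hypothetical and this may well not be what the spec ends up saying.

```python
from urllib.parse import quote, unquote


def zip_path_to_uri_reference(zip_path: str) -> str:
    """Percent-encode a decoded zip relative path, keeping "/" intact."""
    return quote(zip_path, safe="/", encoding="utf-8")


if __name__ == "__main__":
    plain = zip_path_to_uri_reference("locales/en/myfile.gif")
    fancy = zip_path_to_uri_reference("locales/fr/caf\u00e9 menu.gif")
    print(plain)  # ASCII-only path comes out unchanged
    print(fancy)  # non-ASCII and space become %XX escapes
    # Decoding the escapes gives back the original string.
    print(unquote(fancy) == "locales/fr/caf\u00e9 menu.gif")
```

As far as I understand it, the IRI form would simply leave the
non-ASCII characters as they are rather than escaping them.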

>The XML Base specification defines
> xml:base as a LEIRI-valued attribute.

Argh! The LEIRI note (or xml:base) is confusing: why does the LEIRI
note say "New protocols and formats should not use Legacy Extended
IRIs." yet xml:base Second ed. still uses LEIRIs?! And why did the W3C
allow something to go to Rec that has a dependency on a non-normative
note that discourages its own use? (Ok, that's just me venting, I
don't expect a real answer, but it just seems really stupid.)

So (again pretending I have never heard of URIs), what do I need to
specify in the specification to convert a zip relative path into one
of these things called a "URI reference" or "IRI reference" (or a
LEIRI) that would be useful in the context of xml:base?

I also don't understand: does xml:base work with IRIs at all? Are IRIs
a stricter subset of LEIRIs? How can a zip relative path become a
LEIRI? Does it have to? Or, given our understanding of what a zip
relative path is, can a zip relative path just be an IRI?

> So, the relative URI reference would
> be evaluated with respect to whatever base URI the configuration document
> has anyway.

This is _not_ what we want. In Marcos speak (not URI speak!) there are
two kinds of path:

1. A relative path: a path that looks like this: hello/there.file or
./hello/there.file or ../hello/there.file. That is, a path that has no
slash at the front, or has a dot or two at the front. What is this
called in URI speak?

2. An absolute path: a path that looks like this /hello/there.file.
That is, a path that has a slash on the front. The slash at the front
denotes the root of a package. What is that called in URI speak?

Ok, again pretending I know nothing of these "URI" things... so here
is what I want:

1. I want the (Marcos) relative paths in a configuration document to
be resolved to (Marcos) absolute paths based on the locale folder (a
zip relative path) that matches the user agent's locale. So, for
instance, say the user agent's locale is "en":

widget.wgt
    config.xml
    locales/en/myfile.gif

I want config.xml to be parsed and all Marcos relative paths to be
converted to start with /locales/en/ (i.e., they become Marcos
absolute paths based on xml:base).

So:

<widget>
   <icon src="myfile.gif"/>
</widget>

Becomes the following DOM:

widget
    |_ xml:base = "/locales/en/" (set dynamically)
    |_icon
         |_src  = "/locales/en/myfile.gif"
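
In code terms, the behaviour I am after looks something like the
sketch below (Python, using the standard library's generic URI
resolution). The placeholder origin and the function name are made up
for the example; the widget spec defines no such thing. The only point
is that the package root acts as "/" and that xml:base is applied
before the src value is resolved.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical stand-in for "the root of the widget package"; it is
# only here so that generic URI resolution has an absolute base.
PLACEHOLDER_ROOT = "http://widget.invalid/"


def resolve_in_package(xml_base: str, reference: str) -> str:
    """Resolve a reference against xml_base; return a package-rooted path."""
    base = urljoin(PLACEHOLDER_ROOT, xml_base)
    return urlsplit(urljoin(base, reference)).path


if __name__ == "__main__":
    for ref in ("myfile.gif", "./hello/there.file",
                "../hello/there.file", "/hello/there.file"):
        print(ref, "->", resolve_in_package("/locales/en/", ref))
```

With xml:base set to "/locales/en/", "myfile.gif" comes out as
"/locales/en/myfile.gif", while a path that already starts with a
slash is left pointing at the package root.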

So, the zip relative path (CP437 encoded octets) "locales/en" must be
converted to a URI (ASCII) or IRI.

What happens if a zip relative path has characters outside the ASCII
range? What does that IRI thing look like? Does it contain #%%, where
%% are UTF-8 code points?

>> If the said folder does not exist inside the widget
>> package, then the user agent must ignore this attribute, meaning that
>> the user agent must continue to either use the configuration
>> document's location within the package as the value of xml:base;

Ok, the above is wrong.

>> or
>> continue to use the value of any correctly declared xml:base attribute
>> in the ancestor chain.

Ok, I guess the equiv of a 404 is fine here. Will remove.

>> When the xml:base attribute is absent, the base
>> URI will be the folder in which the configuration document resides.

Ok, this is now wrong too, because of the new i18n model. I can't
fully fix this text until I finish working out the i18n model
behavior. This is the subject for a different email but I will try to
work on this.

>> The value of xml:base attribute must be declared as a URL encoded zip
>> relative path (the term URL encoded is defined in the [URI]
>> specification).
>> ]]
>
> -1
>
> You're redefining the meaning of xml:base completely; the likely effect is
> that XML processors will get all confused.  With these restrictions, I'd
> suggest to use a separate widget:base attribute -- or just a configuration
> element that says where things will sit.

No, I want to use xml:base here. I don't want to respecify what
xml:base is in our spec.

>> The use case here is that an author might want to override
>> element-based localization so URIs are dereferenced to a folder of
>> their choice. This might be the case if the author has the following
>> folder/file structure:
>
> I think the use case is reasonable, but the use of xml:base to solve it
> isn't.

Why?

-- 
Marcos Caceres
http://datadriven.com.au

Received on Thursday, 26 March 2009 11:19:33 UTC