Re: Request for Comments: Last Call WD of Widgets 1.0: Packaging & Configuration spec; deadline 31 Jan 2009 from Marcos Caceres on 2009-01-23 (public-webapps@w3.org from January to March 2009)

From: Marcos Caceres <marcosscaceres@gmail.com>
Date: Fri, 23 Jan 2009 07:28:02 +0000
To: Boris Zbarsky <bzbarsky@mit.edu>
Cc: public-webapps <public-webapps@w3.org>
Message-ID: <b21a10670901222328s60f4e64bn35b16c26443d364@mail.gmail.com>
Hi Boris,

On Thu, Jan 22, 2009 at 3:14 PM, Boris Zbarsky <bzbarsky@mit.edu> wrote:
> Marcos Caceres wrote:
>>
>> Ok, I've removed it. This may cause implementations to overwrite files
>> on file systems that are case-insensitive. This should not be a real
>> problem, as most file systems won't let you create files with the same
>> name but different cases. And, on Windows at least, if you try to add
>> a file to a zip archive that already contains a file with the same
>> name (regardless of case), it will ask whether you want to overwrite
>> the file.
>
> That sounds fine to me.  An informative note may be merited.  On the other
> hand, you could require widget UAs to not do such overwriting (e.g. use
> memory if the filesystem can't deal).  Might be worth it.
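Here is a minimal Python sketch of how a widget user agent might follow that suggestion. It rests on several assumptions:
- It uses the standard zipfile module.
- It treats Unicode case folding (str.casefold) as a stand-in for the host file system's case rules, which differ in detail between platforms.
- The helper names find_case_collisions and load_entries_in_memory are hypothetical.

The sketch reports entries that would overwrite each other on a case-insensitive file system. As an alternative to extracting them, it keeps every entry in memory under its exact name.

```python
import io
import zipfile
from typing import Dict, List


def find_case_collisions(archive: zipfile.ZipFile) -> Dict[str, List[str]]:
    """Group entry names that differ only by case.

    Only groups with more than one member are returned, i.e. entries that
    would clobber each other if extracted to a case-insensitive file system.
    """
    groups: Dict[str, List[str]] = {}
    for name in archive.namelist():
        groups.setdefault(name.casefold(), []).append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


def load_entries_in_memory(archive: zipfile.ZipFile) -> Dict[str, bytes]:
    """Read every file entry into memory, keyed by its exact name.

    Nothing touches the file system, so "icon.png" and "ICON.PNG" stay
    distinct even on a case-insensitive platform.
    """
    return {
        info.filename: archive.read(info)
        for info in archive.infolist()
        if not info.is_dir()
    }


if __name__ == "__main__":
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as writer:
        writer.writestr("icon.png", b"lower")
        writer.writestr("ICON.PNG", b"upper")
        writer.writestr("config.xml", b"<widget/>")
    with zipfile.ZipFile(buffer) as reader:
        print(find_case_collisions(reader))
        print(sorted(load_entries_in_memory(reader)))
```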
>
>>> 3)  When parsing a non-negative integer (Section 8.2, step 8), what's the
>>> expected behavior for integers larger than 2^32?  2^64?  Are
>>> implementations of this specification required to do integer arithmetic
>>> on arbitrarily large integers?  If not, is the behavior just
>>> implementation-dependent?
>>
>> I think that is an implementation detail.
>
> That should be mentioned explicitly, in my opinion.

Ok. I'll need to run this by the working group, as I had something like
this in very early drafts of the spec and received criticism for being
overly prescriptive (it could have been that I wrote the text
incorrectly). Can you please suggest some text that we could use?
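As a possible starting point for that text, here is a minimal Python sketch of HTML5-style "rules for parsing non-negative integers" with the overflow behavior made explicit. Any value above an implementation-defined maximum is reported as an error rather than computed with arbitrary-precision arithmetic. The following are assumptions for illustration:
- the 2^32 - 1 limit;
- the function name;
- returning None to mean "error";
- the set of space characters, taken to be HTML5's (U+0020, U+0009, U+000A, U+000C, U+000D).

Sign handling is omitted.

```python
from typing import Optional

# Assumed to match HTML5's "space characters": SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\x0c\r"

# Hypothetical implementation-defined ceiling (largest unsigned 32-bit value).
MAX_VALUE = 2**32 - 1


def parse_non_negative_integer(text: str, max_value: int = MAX_VALUE) -> Optional[int]:
    """Parse a non-negative integer; return None to signal an error."""
    position = 0
    # Skip any leading space characters.
    while position < len(text) and text[position] in SPACE_CHARACTERS:
        position += 1

    start = position
    value = 0
    # Collect ASCII digits only (str.isdigit() would also accept other Unicode digits).
    while position < len(text) and "0" <= text[position] <= "9":
        value = value * 10 + (ord(text[position]) - ord("0"))
        if value > max_value:
            # Give up as soon as the limit is exceeded instead of growing the number.
            return None
        position += 1

    if position == start:
        return None  # no digits at all
    # As in HTML5's algorithm, anything after the digits is ignored.
    return value
```

A spec could instead say that out-of-range values are clamped to the maximum. What matters is that the text states which behavior applies, or says explicitly that it is implementation-dependent.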

>>> 4)  Section 8.2, step 8, it would be good to make sure that the image
>>> identification table matches the one in HTML5 (possibly by having both
>>> specifications refer to a single table, if that's workable).
>>
>> The tables match (because I ripped the values straight from HTML5).
>> Hixie and A. Barth are working on a separate Internet-Draft for
>> sniffing [1]. We will probably end up referencing that.
>
> Sounds good.
>
>>> 5)  Section 8.2, step 8, I'm not sure why image/svg+xml is required to be
>>> processed according to SVGTiny.  This means that an SVG 1.1 or SVG 1.2
>>> Full (whenever that happens) user-agent cannot implement this
>>> specification, as far as I can see.
>>
>> Hmmm... that's not what I meant. Is SVGTiny a subset of 1.1 or 1.2?
>
> The SVGTiny _language_ you cite is a subset of SVG 1.1, unless someone
> screwed up.  Content authored within the constraints of SVG Tiny 1.1 should
> render identically in SVG Tiny 1.1 and SVG Full 1.1 UAs.
>
> However, since SVG requires that "unknown" attributes and tags cause things
> not to render, a UA that processes according to SVG Tiny 1.1 will in fact
> render various SVG Full 1.1 markup differently than a UA that processes
> according to SVG Full 1.1, if I understand the setup correctly.
>
>> How do you recommend we proceed here?
>
> That really depends on what the goal is.  What _is_ the goal?

The goals are as follows:
  1. Widget engines optionally support SVG Tiny for the icon format
(though they may also be capable of rendering full SVG).
  2. For the purpose of widgets, authors write icons that conform to
SVG Tiny (not SVG Full).
  3. Widget engines that support SVG Full can render icons written in
SVG Tiny, but, for interoperability, widget engines should not render
icons written in SVG Full 1.1 (unless the icon also conforms to SVG Tiny).

>>> 6)  Section 6.2 talks about using file extensions followed by
>>> content-type sniffing to determine MIME types.  This sounds to me like
>>> the exact process is up to the UA.  Then Section 8.2, step 8, has
>>> specific lists of extensions and magic numbers that UAs need to
>>> recognize.  Is the sniffing allowed in Section 6.2 required to be a
>>> superset of what Section 8.2 allows?  If so, this should be made
>>> clearer.
>>
>> Understood. I added the following text:
>> "For sniffing the content type of image formats supported by this
>> specification, a widget user agent must use the Rules for Identifying
>> the MIME type of an Image. For other file formats supported by the
>> specification, a widget user agent must use the Rules for Identifying
>> the MIME Type of a file."
>
> Sounds good.
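For illustration, here is a Python sketch of what sniffing an image by magic numbers amounts to. The signatures listed are a subset of the well-known ones (GIF, PNG, JPEG, BMP, ICO) and are assumed here. The table in the spec (taken from HTML5) is the normative list, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed subset of well-known image signatures: (leading bytes, MIME type).
IMAGE_SIGNATURES: List[Tuple[bytes, str]] = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"BM", "image/bmp"),
    (b"\x00\x00\x01\x00", "image/vnd.microsoft.icon"),
]


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return the MIME type whose signature prefixes data, or None if none match."""
    for signature, mime_type in IMAGE_SIGNATURES:
        if data.startswith(signature):
            return mime_type
    return None
```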
>
>> We might need a manifest format... something like:
>>   <manifest>
>>     <resource type="some/type" src="/path/to/file" />
>>   </manifest>
>>
>> Or, better still...
>>
>> <mediatypes>
>>   <type name="some/type" extension="gif"/>
>> </mediatypes>
>>
>> Or a mix of both solutions.
>
> Sure.  I have no real opinions on the form this would take, to be honest.
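If the <mediatypes> form were adopted, a user agent could turn it into an extension-to-type map along the lines of this Python sketch. The element and attribute names come from the proposal above. Everything else is an assumption made for the example:
- no XML namespace;
- case-insensitive extensions;
- the first declaration wins;
- incomplete entries are skipped;
- the function name parse_mediatypes.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def parse_mediatypes(xml_text: str) -> Dict[str, str]:
    """Build a map from file extension to media type from a <mediatypes> element."""
    root = ET.fromstring(xml_text)
    if root.tag != "mediatypes":
        raise ValueError(f"expected <mediatypes>, got <{root.tag}>")
    mapping: Dict[str, str] = {}
    for element in root.findall("type"):
        name = element.get("name")
        extension = element.get("extension")
        if not name or not extension:
            continue  # ignore declarations missing either attribute
        # Assumption: extensions compare case-insensitively; the first declaration wins.
        mapping.setdefault(extension.lower(), name)
    return mapping
```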

Just to be clear, do you feel strongly that this should be a feature
in Widgets 1.0?

>> We had thought about deferring that feature to version 2.0 (no widget
>> engine on the market has required such a manifest thus far, because
>> they all seem to just rely on sniffing).
>
> A number of them presumably do sniffing by extension.  Gecko certainly does
> for its jar: handling.  This specification explicitly prohibits that,
> though.

Sorry, I don't understand: we give file-extension-to-MIME-type mapping
priority over sniffing. Step 1 of the section "Rules for Identifying the
MIME Type of a file" reads as follows:

"1. If the file entry has a file extension, attempt to match the file
extension to one in the first column in the file identification table.
If there is a match, then return the MIME Type value. "
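Step 1, with the fall-back to sniffing, can be sketched in Python as follows. Three assumptions apply:
- The table entries are a hypothetical subset; the spec's file identification table is authoritative.
- The extension match is case-insensitive.
- The sniffer is passed in as a function, so the example does not depend on any particular sniffing rules.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

# Hypothetical subset of a file identification table: extension -> MIME type.
FILE_IDENTIFICATION_TABLE = {
    "html": "text/html",
    "htm": "text/html",
    "css": "text/css",
    "xml": "application/xml",
    "svg": "image/svg+xml",
    "png": "image/png",
}


def identify_mime_type(
    path: str, data: bytes, sniff: Callable[[bytes], Optional[str]]
) -> Optional[str]:
    """Try the file extension first; fall back to content sniffing."""
    # Zip entry names use "/" as the separator, hence PurePosixPath.
    suffix = PurePosixPath(path).suffix  # ".png" for "icons/icon.png", "" if none
    if suffix:
        # Assumption: extensions are matched case-insensitively.
        mime_type = FILE_IDENTIFICATION_TABLE.get(suffix[1:].lower())
        if mime_type is not None:
            return mime_type
    # No extension, or an extension not in the table: sniff the bytes.
    return sniff(data)
```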

>> Because the Zip spec mandates CP437 unless the implementation supports
>> version 6.3 or above of the Zip spec. Sadly, most Zip implementations
>> do whatever they want when it comes to character encoding. This is
>> probably the biggest barrier to interoperability of packaging.
>
> That seems truly unfortunate, especially since from what I can tell ZIP
> libraries _also_ do whatever they want with character encodings.  If people
> are going to be forced to write their ZIP decompressors from scratch to
> implement this specification, what exactly are the benefits of using ZIP at
> all?

I guess the thing to do would be to lobby Microsoft, Apple, and others to
change/update their Zip implementations. I imagine Microsoft will do
this anyway, because OOXML's packaging format has the same issues we
have here... who knows, maybe it's not too late to have this fixed in
Windows 7 :) The other thing is that, for widgets, this will only be a
problem in some small segments of the market. Most people will only
write widgets in one language and distribute them amongst people who use
the same character encoding on their systems. This would mirror
today's reality, I guess. And as you said, it does open an opportunity
for a vendor to create conforming packaging tools.
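The decoding rule behind this is small. In the Zip spec (APPNOTE 6.3 and later), bit 11 of the general purpose bit flag (the "language encoding flag") marks an entry whose name is UTF-8. Otherwise, the name is read as CP437. Below is a Python sketch; the function name is hypothetical. Python's own zipfile module applies essentially this rule when reading archives.

The trouble described above is that many tools write names in the local code page without setting the flag. A conforming reader then decodes those names as CP437 and gets the wrong characters.

```python
# General purpose bit 11, the "language encoding flag" (APPNOTE 6.3+).
UTF8_NAME_FLAG = 0x800


def decode_zip_filename(raw_name: bytes, flag_bits: int) -> str:
    """Decode a raw Zip entry name according to the language encoding flag."""
    if flag_bits & UTF8_NAME_FLAG:
        return raw_name.decode("utf-8")
    # CP437 maps all 256 byte values, so this never raises, but it silently
    # yields the wrong text if the writer actually used another code page.
    return raw_name.decode("cp437")
```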

>>> In the same algorithm, there's mention of "the input's text nodes". This
>>> relationship is not defined in this specification or elsewhere.  I assume
>>> you mean the text nodes which have input as their ancestor, right?
>>
>> The "input" is the element being processed.
>
> That doesn't answer my question.  There is no concept of "this element's
> text nodes" in the DOM that I know of.  There are concepts like "parent
> node", "child nodes", "next sibling", "previous sibling", etc.  You
> presumably want to express whatever you're trying to say in terms of those.

>> Agreed. Ok, that section was totally screwed:) I've rewritten the
>> algorithm and added a new algorithm that normalizes the white space:
>
>> 2. If the widget user agent supports [ITS]: If the element has the dir
>> attribute from the [ITS] namespace with a valid its:dir value, then
>> process its text nodes in accordance with the [ITS] specification.
>
> This still has this "its text nodes" thing.  Presumably you mean "its
> descendant text nodes" or something?

Ok, yes: the word "descendant" is needed. Sorry about that.
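In DOM terms, "its descendant text nodes" means every Text (or CDATASection) node reached by walking the element's child nodes recursively, in document order. Below is a minimal Python sketch using xml.dom.minidom; the helper names are hypothetical. The test case also shows why "descendant" matters: the element's child text nodes alone would miss text inside nested elements.

```python
from typing import List
from xml.dom import Node, minidom


def descendant_text_nodes(element: minidom.Element) -> List[minidom.Text]:
    """Collect the element's descendant text nodes in document order."""
    result: List[minidom.Text] = []
    for child in element.childNodes:
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE):
            result.append(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            # Recurse into nested elements so their text is included too.
            result.extend(descendant_text_nodes(child))
    return result


def descendant_text(element: minidom.Element) -> str:
    """Concatenate the data of all descendant text nodes."""
    return "".join(node.data for node in descendant_text_nodes(element))
```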

>> 3. In result, convert any sequence of one or more U+000A LINE FEED
>> (LF) or U+000D CARRIAGE RETURN (CR) or U+0009 CHARACTER TABULATION
>> (tab) character into a single U+0020 SPACE.
>
> You probably want to include U+0020 SPACE in your list of things which are
> to be collapsed.  That said, why not just use the existing "space
> characters" that's already defined in this spec?

I guess I wrote it that way so that single spaces don't get replaced with
single spaces. However, you raise a good point (that there will be
sequences of two or more space characters after the substitutions in
step 3 above have taken place). I added the following as step 4: "In
result, convert any sequence of two or more U+0020 SPACE characters
into a single U+0020 SPACE."
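Steps 3 and 4 can be sketched in Python as two regular-expression substitutions. Alongside them is the single pass Boris suggests, which treats U+0020 as just another character to collapse. The function names are hypothetical. The character set follows the four characters named in the steps; HTML5's space characters also include U+000C FORM FEED. For any input, the two functions give the same result. Neither trims leading or trailing spaces.

```python
import re


def normalize_steps_3_and_4(result: str) -> str:
    # Step 3: each run of LF, CR or tab characters becomes one U+0020 SPACE.
    result = re.sub(r"[\n\r\t]+", " ", result)
    # Step 4: each run of two or more U+0020 SPACE characters becomes one.
    return re.sub(r" {2,}", " ", result)


def normalize_single_pass(result: str) -> str:
    # One pass over all four characters, with U+0020 included in the set.
    return re.sub(r"[ \t\n\r]+", " ", result)
```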

>> Ok, turns out that the Rules for Removing White Space are not actually
>> needed anywhere (and would have caused problems, because "10 00 11"
>> would have been interpreted as "100011" instead of an error). I
>> rewrote the Rules for Parsing Non-Negative Integers to skip space
>> characters instead (as should have been done in the first place, and
>> as is defined in HTML5).
>
> Sounds good (and algorithm looks good).
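The difference between the dropped approach and the rewritten rules can be shown in a few lines of Python. The first function removes every space character before parsing, which turns "10 00 11" into 100011. The second is a hypothetical strict variant that only skips leading and trailing space characters and treats any other non-digit as an error. That matches the outcome described above. The exact text of the rewritten rules is not quoted here, so the second function is an assumption, not the spec's algorithm.

```python
from typing import Optional

# Assumed to match HTML5's "space characters": SPACE, TAB, LF, FF, CR.
SPACE_CHARACTERS = " \t\n\x0c\r"


def _all_ascii_digits(text: str) -> bool:
    return bool(text) and all("0" <= ch <= "9" for ch in text)


def parse_after_removing_all_space(text: str) -> Optional[int]:
    """The dropped approach: delete every space character, then parse."""
    stripped = "".join(ch for ch in text if ch not in SPACE_CHARACTERS)
    return int(stripped) if _all_ascii_digits(stripped) else None


def parse_skipping_space(text: str) -> Optional[int]:
    """Hypothetical strict variant: skip leading/trailing space characters only."""
    candidate = text.strip(SPACE_CHARACTERS)
    return int(candidate) if _all_ascii_digits(candidate) else None
```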

Thank you again for the quick response and for all your help!

Kind regards,
Marcos
-- 
Marcos Caceres
http://datadriven.com.au
Received on Friday, 23 January 2009 07:28:42 UTC