Re: More clarifications on object processing

Hello

I'm sorry that this has been languishing in my inbox for so long, but
here are some comments. One thing is for sure: it's a complicated
business, and the current structure of the document doesn't make it easy
to follow ...

... it could be that an explanatory note is needed. But that would mean
finding an editor....

... my comments in-line.

Jo

On 12/06/2008 16:56, Dominique Hazael-Massieux wrote:
> Le jeudi 12 juin 2008 à 17:37 +0200, Francois Daoust a écrit :
>> Dominique Hazael-Massieux wrote:
>>> * when analyzing external resources (in ContentFormatSupport,
>>> PageSizeLimit, ExternalResources), the objects and images that are set
>>> as fallback of an object that is in an acceptable format shouldn't be
>>> counted. For instance,
>>> <object data="myimage.gif"><img src="myimage.png" alt=""/></object>
>>> shouldn't trigger an error in ContentFormatSupport, the weight of
>>> myimage.png shouldn't be counted in PageSizeLimit and ExternalResources
>> Part to clarify:
>>   "object means "object" and "img" in the note on objects to include in 
>> 2.4.6 Included Resources"
> 
> Yes, I agree that replacing "object elements" with "object and img
> elements" in the note under 2.4.6 should suffice to clarify the
> situation.

I'm not sure how this helps. According to the definition of included
resources (those that form part of the final representation), the <img>
in the above case is not "included". Nor is it tasted for its content
type. I don't think there are any circumstances in which an IMG is
tasted for its content type without subsequently either causing a FAIL
or becoming an included resource.


> 
> In the said note (copied below):
>> "Note: object elements that are accessed in order to test their 
>> Content-Type HTTP header, but do not form part of the ultimate 
>> representation of the resource under test (see 3.15 OBJECTS_OR_SCRIPT ), "
> 
> .... it would be nice if "form[ing] part of the ultimate representation"
> was actually defined (in 3.15.1, most probably).

Please see the proposal below, in which this offensive wording is eradicated.

As above, I think there is an opportunity to write some (probably
non-normative) wording about this, either as an appendix or, perhaps,
given that we want to release this doc on a short timeline, as a Group Note.

> 
>>> * similarly, I don't think we want to raise a ContentFormatSupport error
>>> on <object data="myimage.png"><img src="myimage.gif" alt="" /></object>
>>> since this is using correctly the fallback mechanism; while this gets
>>> accepted by ObjectsOrScript, this would currently raise an error in the
>>> way I read ContentFormatSupport; 
>> Part to clarify:
>>   "ultimate representation of the resource under test"
>>
>> [Side note:
>> The Accept header precises that only image/gif and image/jpeg are supported.
>> Is it still correct to send a PNG image even with a fallback mechanism?
>> I think it is, I'm raising the point in case it's not...]
> 
> (that's actually the gist of my comment, I think :)
> 
>>  From the note above, since "myimage.png" does "not form part of the 
>> ultimate representation of the resource under test", the test should not 
>> apply on "myimage.png".
> 
> That is true, indeed; but this reiterates the need for clarifying
> "ultimate representation of the resource under test" (as you note). 

All right, already, see below!

> 
> 
>>> * I don't think "myimage.gif" should be counted as external
>>> resources/page size limit in the following instance:
>>> <object data="myimage.gif" type="image/png">Hello</object> - the current
>>> text says to "include those objects whose content type is either
>>> "image/jpeg" or "image/gif" irrespective of whether the type attribute
>>> is specified.", but it's not clear why.
>> Part to clarify:
>>   The note on how to select objects at the beginning of 
>> EXTERNAL_RESOURCES and PAGE_SIZE_LIMIT
>>
>> It may deserve clarification, but this restriction applies to the set of 
>> "objects retrieved under the 3.15.1 Object Element Processing Rule". An 
>> object with an "image/png" type won't be retrieved by the Object Element 
>> Processing Rule, so "myimage.gif" won't be counted.

The object with type="image/png" _will_ be retrieved under 3.15.1, which
says: "Retrieve the object (ignoring the type attribute)"



> 
> I think there is a risk of confusion on the meaning of "type" here:
>  * there is the actual Content-Type under which the image is served (and
> which is authoritative)
>  * and there is the type attribute on the <object> element, which serves
> as a hint to browsers to determine whether they need to bother
> downloading the referenced image

There's an inconsistency in 3.15.1, where the 'type' attribute should be
in monospace to be consistent with the rest of the document.

There's also an inconsistency in how Content-Type etc. are referred
to. I think the style should be "HTTP 'Content-Type' header", with
Content-Type in monospace.

I'll fix that up independently of the rest of this discussion.

> 
> Assuming that myimage.gif is indeed a GIF image (and served as such in
> HTTP), but that the hint given in the type attribute is wrong (set to
> image/png), my reading of the current algorithm is that myimage.gif
> would be counted in the PAGE_SIZE_LIMIT and EXTERNAL_RESOURCES since
> 3.15.1 says to include GIF and JPEG images "irrespective of whether the
> type attribute is specified".
> 
> I think that a GIF image embedded in an object with a type attribute set
> to "image/png" wouldn't actually be part of the ultimate representation
> of the document; the current algorithm suggests that it would.

We don't know whether it would or not in any given real implementation.
I recall that earlier discussion around this point suggested that some
real implementations do include it, and some don't. We ended up with the
current wording because we thought that, in general, it would reduce the
number of FAILs to a minimum. As noted at the time of the discussion, the
carve-out on PAGE_SIZE_LIMIT etc. says to include in the count those that
are definitely included resources, and to include tasted resources only
where their retrieval could not be avoided because they lack the type
attribute.


> 
>>> * if I hit an HTTP redirect, does the size of the page served as the
>>> redirect page counts in PAGE_SIZE_LIMIT-1 or only
>>> under PAGE_SIZE_LIMIT-2? I've implemented the latter since I find it
>>> less confusing, but the spec could be clearer about it
>> Part to clarify:
>>   2.4.3 HTTP_RESPONSE
>>   3.16 PAGE_SIZE_LIMIT
>>
>> Suggestion:
>>   In 2.4.3 HTTP_RESPONSE, precise the total to which the size of the 
>> response should be included (I propose the second as well)
>>   In 3.16 PAGE_SIZE_LIMIT, link back to 2.4.3 HTTP_RESPONSE to precise 
>> what we mean by "the size of the document" and "the size of the response 
>> body". This would be consistent with 3.6 EXTERNAL_RESOURCES that 
>> includes such a link.

Hmmm, yes, I agree that this is indeed unclear.

I think I'd like to offer the following change:


PAGE_SIZE_LIMIT

Retrieve the document under test. If its size (excluding any
redirections discussed under 2.4.3 HTTP Response) exceeds 10 kilobytes, FAIL

Add to a running total (total size) the size of all the HTTP response 
bodies that are required to retrieve the document under test (see 2.4.3 
HTTP Response).

For each unique included resource (as defined in 2.4.6 Included Resources):

 Add the size of all the response bodies that are required to retrieve 
the resource (see 2.4.3 HTTP Response) to the running total. Include in 
the total only those objects retrieved under the 3.15.1 Object Element 
Processing Rule whose type attribute is not specified, and those whose 
content type is either "image/jpeg" or "image/gif" irrespective of 
whether the type attribute is specified.

If the total size exceeds 20 kilobytes, FAIL

Note:

In the case of resources that are referenced more than once in the 
document under test, and where, as discussed under 2.4.6, they are 
cached, it is the initial retrieval of that resource (as determined by 
the first reference in document order) that counts towards the total.

Note:

Where the Object Processing Rule (see 3.15.1) yields a resource that is 
found to be cached, objects that must be assessed in the course of 
yielding the cached resource count towards the total.
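
To make the proposed PAGE_SIZE_LIMIT wording easier to check, here is a
rough sketch of how I read it, in Python. It is only an illustration: the
names (Retrieval, counts_towards_total, page_size_limit) are made up, a
kilobyte is assumed to be 1024 bytes, the "PASS" outcome is assumed, and I
assume the restriction on objects applies only to resources retrieved
under 3.15.1, with every other included resource always counting. The
caller is expected to pass each unique included resource once.

```python
from dataclasses import dataclass
from typing import List, Optional

KB = 1024  # assumption: the proposal does not define the size of a kilobyte


@dataclass
class Retrieval:
    """One resource plus every HTTP response body needed to obtain it."""
    body_sizes: List[int]               # bytes; redirect bodies first, final body last
    via_object_rule: bool = False       # retrieved under 3.15.1
    type_attr: Optional[str] = None     # value of the type attribute, if specified
    content_type: Optional[str] = None  # Content-Type of the final response


def counts_towards_total(r: Retrieval) -> bool:
    """Apply the carve-out for objects retrieved under 3.15.1."""
    if not r.via_object_rule:
        return True  # assumption: other included resources always count
    return r.type_attr is None or r.content_type in ("image/jpeg", "image/gif")


def page_size_limit(document: Retrieval, included: List[Retrieval]) -> str:
    # First check: the document itself, excluding any redirections.
    if document.body_sizes[-1] > 10 * KB:
        return "FAIL"
    # Second check: running total, including redirect response bodies.
    total = sum(document.body_sizes)
    for r in included:
        if counts_towards_total(r):
            total += sum(r.body_sizes)
    return "FAIL" if total > 20 * KB else "PASS"
```

Under this reading, a GIF served from an object whose type attribute
wrongly says "image/png" still counts towards the total, whereas a PNG
with a matching type attribute does not.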

====

Proposed Change to 2.4.3 - change the wording of

Include the size of the response in the total as described under 3.16 
PAGE_SIZE_LIMIT

to

Include the size of the response in the "total size" as described under 
3.16 PAGE_SIZE_LIMIT

in the various places in which it is found.

====

Proposed Change to EXTERNAL_RESOURCES

(just to move the note, to make it consistent with PAGE_SIZE_LIMIT above)

Retrieve the resource under test, and add the number of retrievals 
required to obtain the resource (see 2.4.3 HTTP Response) to a running 
total.

For each unique included resource, as defined in 2.4.6 Included Resources:

 Request the referenced resource

 Add the number of HTTP requests that are required to retrieve the 
resource (see 2.4.3 HTTP Response) to the running total. Include in the 
count only those objects retrieved under the 3.15.1 Object Element 
Processing Rule whose type attribute is not specified, and those whose 
content type is either "image/jpeg" or "image/gif" irrespective of 
whether the type attribute is specified.

If the total exceeds 10, warn

If this total exceeds 20, FAIL
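
For illustration, the counting in the proposed EXTERNAL_RESOURCES text
could be sketched as follows. Again this is just my reading in Python,
not normative: Fetched and external_resources are hypothetical names, the
"PASS" outcome is assumed, and resources not retrieved under 3.15.1 are
assumed to always count. Each redirect is one extra request.

```python
from typing import List, NamedTuple, Optional


class Fetched(NamedTuple):
    requests: int                       # HTTP requests needed, redirects included
    via_object_rule: bool = False       # retrieved under 3.15.1
    type_attr: Optional[str] = None     # value of the type attribute, if specified
    content_type: Optional[str] = None  # Content-Type of the final response


def external_resources(document_requests: int, included: List[Fetched]) -> str:
    total = document_requests
    for r in included:
        counted = (
            not r.via_object_rule
            or r.type_attr is None
            or r.content_type in ("image/jpeg", "image/gif")
        )
        if counted:
            total += r.requests
    # Check the higher threshold first so that a FAIL is not reported as a warn.
    if total > 20:
        return "FAIL"
    if total > 10:
        return "warn"
    return "PASS"
```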

===

Proposed Change to 3.15.1 Object Processing Rule

(addition of caching)

change

 Retrieve the object (ignoring the type attribute)

to

 If the resource is not already cached (see 2.4.6 Included Resources), 
retrieve the object (ignoring the type attribute)
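
As a sketch of what the caching addition means in practice (assuming a
simple in-memory cache keyed by URI; retrieve_object, the cache dict and
the fetch callable are all hypothetical, and 2.4.6 is what actually
governs whether something is cached):

```python
from typing import Callable, Dict, Tuple

Response = Tuple[str, bytes]  # (Content-Type header value, response body)


def retrieve_object(
    url: str,
    cache: Dict[str, Response],
    fetch: Callable[[str], Response],
) -> Tuple[Response, bool]:
    """Return the response for url and whether a new retrieval was made.

    The type attribute is deliberately not a parameter: the rule says to
    retrieve the object ignoring it.
    """
    if url in cache:
        return cache[url], False
    cache[url] = fetch(url)
    return cache[url], True
```

The boolean lets the caller decide whether the retrieval contributes to
the PAGE_SIZE_LIMIT and EXTERNAL_RESOURCES totals.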


===

Proposed Change to 2.4.6 Included Resources

Current Text

Note:

object elements that are accessed in order to test their Content-Type 
HTTP header, but do not form part of the ultimate representation of the 
resource under test (see 3.15 OBJECTS_OR_SCRIPT ), are not considered to 
be included resources. Their treatment, as regards 3.16 PAGE_SIZE_LIMIT 
and 3.6 EXTERNAL_RESOURCES , is described in the relevant section.

Proposed Text:

Note:

Resources that are retrieved as references from object elements and 
whose Content-Type HTTP header is not set to image/jpeg or image/gif are 
not considered to be included resources (see 3.15 OBJECTS_OR_SCRIPT). 
Their treatment, as regards 3.16 PAGE_SIZE_LIMIT and 3.6 
EXTERNAL_RESOURCES , is described in the relevant section.
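
The proposed note boils down to a check on the Content-Type header alone.
A minimal sketch in Python, assuming that media type parameters are
ignored and that the comparison is case-insensitive (the note itself says
neither); the function name is made up:

```python
def object_reference_is_included(content_type_header: str) -> bool:
    """True if a resource referenced from an object element is an included resource."""
    # Assumption: drop any parameters (e.g. "; charset=...") and ignore case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type in {"image/jpeg", "image/gif"}
```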

Received on Wednesday, 2 July 2008 11:53:11 UTC