RE: Proposed changes to Moki - External Resources Test from Jo Rabin on 2007-07-25 (public-mobileok-checker@w3.org from July 2007)

From: Jo Rabin <jrabin@mtld.mobi>
Date: Wed, 25 Jul 2007 17:23:42 +0100
To: "public-mobileok-checker" <public-mobileok-checker@w3.org>
Message-ID: <C8FFD98530207F40BD8D2CAD608B50B4484198@mtldsvr01.DotMobi.local>
Hi Laura

 

Sorry for taking a long time to get back to you; you ask good questions! Some
thoughts.

 

We should include all resources pointed to by img tags. I noticed in the
preprocess method that the checker seems to assess content type by
looking at the file extension, but that is really not right at all.
Images do not necessarily have a file extension, and even when they do,
it's an error to infer the content type from it; see, e.g., [1] and [2],
which make it clear that the resource must be retrieved and its
Content-Type header examined in order to determine its type.

 

[1] http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous

[2] http://www.w3.org/2001/tag/doc/mime-respect#missing
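To make the point concrete, here is a minimal Java sketch, assuming retrieval is done with the JDK's java.net.HttpURLConnection. The ContentTypes class and its methods are hypothetical, not existing checker code, and the mobileOK request headers the checker would send are omitted. The type comes only from the Content-Type header of the response; the URI's extension is never consulted.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;
import java.util.Locale;

// Hypothetical helper (not part of the checker): the type of a resource is taken
// from the Content-Type header of the response, never from a file extension.
final class ContentTypes {
    private ContentTypes() {}

    /**
     * Reduces a Content-Type header value to its bare media type, e.g.
     * "image/GIF; foo=bar" -> "image/gif". Returns null when no type was sent.
     */
    static String mediaTypeOf(String contentTypeHeader) {
        if (contentTypeHeader == null) {
            return null;
        }
        int semicolon = contentTypeHeader.indexOf(';');
        String type = semicolon >= 0 ? contentTypeHeader.substring(0, semicolon) : contentTypeHeader;
        type = type.trim().toLowerCase(Locale.ROOT);
        return type.isEmpty() ? null : type;
    }

    /** Retrieves the resource and returns the media type declared by the server. */
    static String fetchMediaType(String uri) throws IOException {
        URLConnection connection = URI.create(uri).toURL().openConnection();
        if (!(connection instanceof HttpURLConnection)) {
            throw new IOException("not an HTTP(S) URI: " + uri);
        }
        HttpURLConnection http = (HttpURLConnection) connection;
        try {
            http.setRequestMethod("GET");
            http.getResponseCode(); // sends the request and reads the status line and headers
            return mediaTypeOf(http.getContentType());
        } finally {
            http.disconnect();
        }
    }
}
```

With this approach a reference such as /logo (no extension) is typed just as reliably as /logo.gif, and a .gif that the server labels image/png is treated as PNG.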

 

Even though the object element allows the content type to be specified,
browsers typically taste the content of nested objects, even in the
presence of this information, to determine the actual content type.
Given that they stop when they find something they like, it's a good
question whether the checker should continue past that point and, if so,
whether and where it should put those references in moki. An even better
question is how the XSLT would distinguish the objects that should be
counted from those that should not.
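One way the checker could mirror the browser behaviour described above is to walk the nested objects from the outside in and stop counting at the first one whose retrieved type is acceptable. This Java sketch assumes a simple linked model of nesting (NestedObject) and a resolver standing in for retrieval; both are hypothetical, and the set of acceptable types is a parameter rather than the actual list from the mobileOK request header. It deliberately ignores the object's declared type, for the reason given above.

```java
import java.util.Set;

// Hypothetical model of nested <object> elements: each object names the
// resource it would retrieve and may contain one fallback object.
final class NestedObject {
    final String uri;
    final NestedObject fallback; // the object nested inside this one, or null

    NestedObject(String uri, NestedObject fallback) {
        this.uri = uri;
        this.fallback = fallback;
    }
}

final class NestedObjectCounter {
    private NestedObjectCounter() {}

    /** Stand-in for "retrieve the resource and read its Content-Type". */
    interface TypeResolver {
        String mediaTypeOf(String uri);
    }

    /**
     * Counts the objects that have to be assessed: walk from the outermost
     * object inwards and stop at the first whose actual media type is
     * acceptable. Objects nested inside that one are never reached, so they
     * are not counted. The declared type attribute is not consulted.
     */
    static int countAssessed(NestedObject outermost, Set<String> acceptable, TypeResolver resolver) {
        int count = 0;
        for (NestedObject o = outermost; o != null; o = o.fallback) {
            count++;
            String type = resolver.mediaTypeOf(o.uri);
            if (type != null && acceptable.contains(type)) {
                break; // the browser stops here; inner fallbacks are not fetched
            }
        }
        return count;
    }
}
```

For a chain Flash, PNG, GIF, JPEG with only GIF and JPEG acceptable, this counts three objects; the JPEG inside the GIF is never reached.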

 

So some thoughts about the code:

 

1. Given that the image type is not known until the image has been
retrieved, and given that the image may not be of a known type, there
seems to be a need for a factory somewhere that constructs a JPEG
resource, a GIF resource or a generic image resource depending on the
result of the retrieval. It looks like the image element in moki needs
to be extended with an image type, set to the media type of the
response, under the imageInfo element.

 

2. When processing images (and links and so on) in the primary document,
I think that duplicates should not be suppressed; duplicate detection
should instead be handled in the preprocess method. Aside from anything
else, duplicates should be detected on a canonical URI, not just by a
text match (and on the absolute version of the URI, for that matter).
Though, as we saw from a little test that Dom put together, real
browsers do appear to do a textual match, so that aspect of the
behaviour needs to be centralized so we can change it easily or control
it with a switch.

 

3. In the CSSResource class, an image list needs to be constructed and
then processed as above.

 

4. The same observation applies to link elements as to images. Since CSS
files can include other CSS files, they need a list of included CSS,
and that list needs to be preprocessed according to the same URI
matching strategy.

 

5. Ideally, each entry in the lists of URIs should record where it was
found in the source of whatever document it was found in, for error
reporting purposes (did I hear a collective groan about line and column
number references? :-(), and so that the moki document can report that
an image/CSS was in error and is referenced in 7 places rather than
just one.

 

6. I think there is a need for an objects element in moki. It should
contain objects, and each object should say a) what its content type is
and b) whether it should be counted as an external reference. That
should be easy enough to do. What's less obvious is what to do about
text/html when it is found in an object; I think the answer is that it
should be counted and skipped.

 

7. Oh, and finally, before I forget: there is the case (401
Authentication) where both the page presented with the response and the
primary document are tested, and the external resources from the
authentication page are added to the total. On reflection, I think we
should reconsider this behaviour before we go to the next Last Call of
the mobileOK doc, and not worry about it in the code for now.
(Famous last words.)
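A minimal Java sketch of the factory in item 1, assuming the media type has already been read from the response. All the class names are hypothetical and may not match the checker's. The concrete class is chosen only after retrieval, and anything other than JPEG or GIF (including a missing type) falls back to a generic image resource that still carries the media type, which is the value item 1 suggests recording under imageInfo.

```java
// Hypothetical resource classes; the checker's real class names may differ.
abstract class ImageResource {
    final String uri;
    final String mediaType; // as reported in the Content-Type header, e.g. "image/png"

    ImageResource(String uri, String mediaType) {
        this.uri = uri;
        this.mediaType = mediaType;
    }
}

final class JpegImageResource extends ImageResource {
    JpegImageResource(String uri) {
        super(uri, "image/jpeg");
    }
}

final class GifImageResource extends ImageResource {
    GifImageResource(String uri) {
        super(uri, "image/gif");
    }
}

/** Any other (or missing) type: kept so it can still be counted and reported. */
final class GenericImageResource extends ImageResource {
    GenericImageResource(String uri, String mediaType) {
        super(uri, mediaType);
    }
}

final class ImageResourceFactory {
    private ImageResourceFactory() {}

    /** Chooses the concrete class from the media type of the retrieved response. */
    static ImageResource create(String uri, String mediaType) {
        if ("image/jpeg".equals(mediaType)) {
            return new JpegImageResource(uri);
        }
        if ("image/gif".equals(mediaType)) {
            return new GifImageResource(uri);
        }
        return new GenericImageResource(uri, mediaType); // null-safe fallback
    }
}
```

Using if/equals rather than a switch on the string keeps the factory safe when the server sends no Content-Type at all (a null media type).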

 

I've just checked in some updates with a couple of TODOs in the relevant
places, I hope.

 

I'll also update the moki example doc with the suggestions I made. And
while I am about it I will generate a schema for moki. It's about time.

 

Hope this helps. Oh, and these are just my suggestions; you or anyone
else may have better ones.

 

Jo

 

________________________________

From: public-mobileok-checker-request@w3.org
[mailto:public-mobileok-checker-request@w3.org] On Behalf Of Laura Holmes
Sent: 24 July 2007 23:54
To: public-mobileok-checker
Subject: Proposed changes to Moki - External Resources Test

 

Hi all,
I just wanted to run some changes by you all and get some feedback.
Currently, I'm working on the ExternalResourcesTest and am running into
conditions that haven't been accounted for in the existing code. These
conditions include:

1) counting references contained in objects that are not jpeg or gif:
there are many other image types and other types of objects (such as
applications or audio) that may be included on a page. I'm assuming that
we want to include these references, even if they can't be rendered on a
mobile phone, because of a comment made regarding nested objects: "For nested
object elements, count only the number of objects that need to be
assessed before content matching the request header defined in 2.3.2
HTTP Request <http://www.w3.org/TR/mobileOK-basic10-tests/#http_request>
is found." So we want to assess content types other than jpeg and
gif when counting external resources.

2) keeping track of unique references to resources other than
jpeg or gif:
If the primary document references the same image twice, it is
counted only once, but if the same image is also referenced in CSS, we
currently don't have a way of tracking this.

3) references contained in nested objects are counted regardless of
whether or not the reference is actually reached:
We only identify object nodes by name, not by their position in the
nesting order, so we can't tell at which object a browser would stop.
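For references found in CSS (item 2 above, and items 3 and 4 of the reply), the stylesheet has to yield its own lists: included stylesheets from @import, and other url() references such as images. This Java sketch is a regex-based simplification, not a CSS parser: it strips comments but does not handle escaped quotes or parentheses inside URLs, and CssReferences is a hypothetical name. The references are returned as written; they would still need resolving against the stylesheet's own URI before duplicate detection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: collects references from a stylesheet so they can be
// fed into the same duplicate tracking as references from the primary document.
// Regex-based simplification: escaped quotes/parentheses inside URLs are not handled.
final class CssReferences {
    private static final Pattern COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    // @import url(x) | @import url("x") | @import "x" | @import 'x'
    private static final Pattern IMPORT = Pattern.compile(
            "@import\\s+(?:url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)|(['\"])(.*?)\\3)",
            Pattern.CASE_INSENSITIVE);
    private static final Pattern URL = Pattern.compile(
            "url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    final List<String> imports = new ArrayList<>(); // included stylesheets
    final List<String> urls = new ArrayList<>();    // other url() references, e.g. images

    CssReferences(String css) {
        String text = COMMENT.matcher(css).replaceAll(" ");
        Matcher m = IMPORT.matcher(text);
        while (m.find()) {
            // group 2 is set by the url(...) form, group 4 by the quoted-string form
            imports.add(m.group(2) != null ? m.group(2) : m.group(4));
        }
        // Drop the @import rules so their url() is not also listed as an image.
        Matcher u = URL.matcher(IMPORT.matcher(text).replaceAll(" "));
        while (u.find()) {
            urls.add(u.group(2));
        }
    }
}
```

Keeping imports and other url() references in separate lists lets included stylesheets be fetched and scanned recursively, while both lists go through the same URI matching strategy.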

Here are my proposed changes I want to make, which would entail changing
the shape of the moki doc a bit:

We would create an ArrayList of URIs that is maintained throughout the
entire parsing process. When a reference to a resource is
encountered, we check whether the list already contains that URI. The
list will hold all the resources referenced in both the
primary doc and CSS files. At the end of the parsing process, we can add
an additional node anywhere in the moki that states the length of
the list (i.e. how many unique resources were encountered). I propose
adding this as its own node under moki, as it spans information in the
primary doc, images, and CSS. Because we only want to record the number
of unique references, I can't see any other way to pull it from the moki
document using XSL. I'm open to any other suggestions.
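A Java sketch of this proposal with the refinements from the reply above: instead of an ArrayList (whose contains check is a linear scan), a map keyed on the resolved absolute URI gives expected constant-time lookups, its size is the unique count, and each entry keeps the locations where the resource was referenced. ReferenceRegistry, Location and canonicalKey are hypothetical names. The normalizations in canonicalKey (lower-casing scheme and host, dropping default ports, fragments and user info, removing dot segments) are an assumption rather than a complete canonicalization, and the textualMatch switch assumes a browser-style match means comparing the resolved URI string as written.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one registry shared by the primary document and its CSS,
// keyed on the resolved URI, remembering every place each resource was referenced.
final class ReferenceRegistry {
    /** Where a reference was found (hypothetical shape, for error reporting). */
    static final class Location {
        final String document;
        final int line;
        final int column;

        Location(String document, int line, int column) {
            this.document = document;
            this.line = line;
            this.column = column;
        }

        @Override
        public String toString() {
            return document + ":" + line + ":" + column;
        }
    }

    // The single switch for the matching strategy (true: assumed browser-style textual match).
    private final boolean textualMatch;
    private final Map<String, List<Location>> occurrences = new LinkedHashMap<>();

    ReferenceRegistry(boolean textualMatch) {
        this.textualMatch = textualMatch;
    }

    /**
     * Records one reference and returns true if it is the first reference to
     * that resource. Throws IllegalArgumentException for syntactically invalid URIs.
     */
    boolean add(String baseUri, String reference, Location where) {
        URI absolute = URI.create(baseUri).resolve(reference);
        String key = textualMatch ? absolute.toString() : canonicalKey(absolute);
        List<Location> list = occurrences.computeIfAbsent(key, k -> new ArrayList<>());
        list.add(where);
        return list.size() == 1;
    }

    /** Number of distinct resources: the figure the proposed moki node would hold. */
    int uniqueCount() {
        return occurrences.size();
    }

    /** Every resource with all the places it was referenced from. */
    Map<String, List<Location>> byResource() {
        return Collections.unmodifiableMap(occurrences);
    }

    /** A few common normalizations; not a complete canonicalization. */
    static String canonicalKey(URI absolute) {
        URI u = absolute.normalize(); // removes "." and ".." path segments
        if (u.isOpaque() || u.getScheme() == null) {
            return u.toString();
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        String host = u.getHost() == null ? "" : u.getHost().toLowerCase(Locale.ROOT);
        int port = u.getPort();
        if ((port == 80 && scheme.equals("http")) || (port == 443 && scheme.equals("https"))) {
            port = -1; // an explicit default port is the same as no port
        }
        String path = (u.getRawPath() == null || u.getRawPath().isEmpty()) ? "/" : u.getRawPath();
        String query = u.getRawQuery() == null ? "" : "?" + u.getRawQuery();
        // The fragment is dropped: "a.gif#x" and "a.gif" retrieve the same resource.
        return scheme + "://" + host + (port == -1 ? "" : ":" + port) + path + query;
    }
}
```

In either mode, img/a.gif in the page and ../img/a.gif in css/style.css resolve to the same absolute URI and count once; only the canonical mode also merges a spelling such as HTTP://Example.ORG:80/dir/img/a.gif#top.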

As to the nested object problem, I'm at a loss for solutions given our
current implementation of the DOM. Suggestions?

Thanks for your input in advance, 
Laura
Received on Wednesday, 25 July 2007 16:24:17 UTC