Re: Proposed changes to Moki - External Resources Test from Sean Owen on 2007-07-30 (public-mobileok-checker@w3.org from July 2007)

From: Sean Owen <srowen@google.com>
Date: Mon, 30 Jul 2007 18:34:15 -0400
To: "Jo Rabin" <jrabin@mtld.mobi>
Cc: public-mobileok-checker <public-mobileok-checker@w3.org>
Message-ID: <e920a71c0707301534j6400f42em100cd70826aaa6b4@mail.gmail.com>
My general reaction is that this is getting too complicated -- for
version 1.0 at the very least. Retrieve images, examine their
Content-Type. If missing, assume it's not a supported image. If
present and it's GIF or JPEG, great, parse it. If it's something else,
assume it's not supported.
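
A minimal sketch of that rule in Java, assuming the Content-Type header
value is already in hand as a string (null when the header is absent). The
class and method names are hypothetical and not taken from the checker's
source; header parameters such as charset are simply ignored.

```java
import java.util.Locale;

/** Hypothetical helper; names are illustrative, not from the checker's code. */
final class ImageTypeCheck {
    enum Kind { GIF, JPEG, UNSUPPORTED }

    /** Classifies a response by its Content-Type header value alone. */
    static Kind classify(String contentType) {
        if (contentType == null) {
            return Kind.UNSUPPORTED; // missing header: assume not a supported image
        }
        // Drop any parameters after ';' and normalise case before comparing.
        int semi = contentType.indexOf(';');
        String mediaType = (semi >= 0 ? contentType.substring(0, semi) : contentType)
                .trim().toLowerCase(Locale.ROOT);
        switch (mediaType) {
            case "image/gif":  return Kind.GIF;
            case "image/jpeg": return Kind.JPEG;
            default:           return Kind.UNSUPPORTED; // anything else: not supported
        }
    }
}
```

Only a GIF or JPEG result would then be handed to an image parser; every
other outcome is treated the same way.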

I strongly believe we need to favor simple solutions that solve the
problem of "implement mobileOK Basic 1.0" first. It's good to develop
this into a more general platform for evaluating a web resource but we
haven't quite signed on for that just yet. At the moment scope and
complexity appear to be outpacing progress towards an implementation.
This particular issue -- unidentified content types -- feels
corner-case-ish to me. I am happy for version 1.0 to go out with a
crude reaction to this situation as long as it's handling the 99% of
other cases usefully. And then this can be tackled.

On 7/25/07, Jo Rabin <jrabin@mtld.mobi> wrote:
>
>
>
>
> Hi Laura
>
>
>
> Sorry about taking a long time to get back, you ask good questions! Some
> thoughts.
>
>
>
> We should include all resources pointed to by img tags. I noticed in the
> preprocess method that the checker seems to want to assess content type by
> looking for the file extension. But that is really not right at all. It's
> not necessarily the case that images will have a file extension, and even
> when they do, it's an error to infer the content type from them – see e.g.
> [1] and [2] which make it clear that the resource must be retrieved and its
> Content-Type header examined in order to determine its type.
>
>
>
> [1]
> http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous
>
> [2] http://www.w3.org/2001/tag/doc/mime-respect#missing
>
>
>
> Even though the object element allows the specification of content type,
> browsers typically taste the content of nested objects even in the presence
> of this information to determine the actual content type. Given that they
> stop when they find something they like, it's a good question to ask whether
> the checker should continue and whether and where it should put those
> references in moki. It's an even better question to wonder how the xslt
> would differentiate between those objects that should be counted and those
> that should not.
>
>
>
> So some thoughts about the code:
>
>
>
> 1. Given that the image type is not known in advance of retrieving it, and
> given that the image may not be of a known type, there seems to be the need
> for a factory somewhere which constructs a JPEG resource, a GIF resource or
> a generic image resource depending on the result of the retrieval. It looks
> like the image element in moki needs to be extended to include an image type
> which should be set to the media type of the response under the imageInfo
> element.
>
>
>
> 2. When processing images (and links and so on) in the primary document, I
> think that duplicates should not be suppressed and the duplicate detection
> should be handled in the preprocess method. Aside from anything else, the
> detection of duplicates should be done on a canonical URI not just a text
> match (and on the absolute version of the URI, for that matter). Though, as
> we saw from a little test that Dom put together, real browsers do appear to
> do a textual match, so that aspect of the behaviour needs to be centralized
> so we can change it easily or control it by a switch.
>
>
>
> 3. In the CSSResource class, an image list needs to be constructed and then
> processed as above.
>
>
>
> 4. The same observation applies to link elements as to images. Since CSS
> files can include other CSS files they need to have a list of included CSS
> and that needs to be preprocessed according to the same URI matching
> strategy.
>
>
>
> 5. Ideally, each of the lists of URIs should provide a reference to where
> each URI was found in the source of its containing document, both for
> error reporting purposes (did I hear a collective groan about line and
> column number references? :-( ) and so that the moki document can report
> that an image or CSS file in error is referenced in, say, 7 places rather
> than just one.
>
>
>
> 6. I think there is a need for an objects element in moki. It should contain
> objects and the objects should say a) what their content type is and b)
> whether they should be counted as an external reference. That should be easy
> enough to do. What's not so obvious is what to do about text/html when it is
> found in an object and I think the answer is that it should be counted and
> skipped.
>
>
>
> 7. Oh, and finally, before I forget. There is the case (401 Authentication)
> where both the page presented with the response and the primary document are
> tested and the external resources from the authentication page are added to
> the total. On reflection, I think we should think again about this behaviour
> before we go to the next last call of the mobileOK doc. And not worry about
> it in the code for now. (Famous last words)
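
One way items 2 and 5 could fit together, sketched in Java under stated
assumptions: ReferenceIndex and its members are hypothetical names, the
"where" argument is a free-form location string, and "canonical" here means
only what the code does (resolve against the base, remove dot segments,
lower-case scheme and host, drop default ports and the fragment). It is not
a full URI normalisation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical index of references, keyed by a switchable matching strategy. */
final class ReferenceIndex {
    enum Match { TEXTUAL, CANONICAL }

    private final Match mode;
    private final Map<String, List<String>> locations = new LinkedHashMap<>();

    ReferenceIndex(Match mode) { this.mode = mode; }

    /** Records one reference and where it was found (e.g. "line 12, col 5"). */
    void add(URI base, String reference, String where) {
        String key = (mode == Match.TEXTUAL) ? reference : canonical(base, reference);
        locations.computeIfAbsent(key, k -> new ArrayList<>()).add(where);
    }

    int uniqueCount() { return locations.size(); }

    List<String> locationsOf(String key) {
        return locations.getOrDefault(key, Collections.emptyList());
    }

    /** Absolute, dot-segment-free form with lower-cased scheme and host. */
    static String canonical(URI base, String reference) {
        URI u = base.resolve(reference).normalize();
        if (u.getScheme() == null || u.getHost() == null) {
            return u.toString(); // opaque or non-server-based URI: leave as is
        }
        String scheme = u.getScheme().toLowerCase(Locale.ROOT);
        StringBuilder sb = new StringBuilder(scheme).append("://");
        if (u.getRawUserInfo() != null) sb.append(u.getRawUserInfo()).append('@');
        sb.append(u.getHost().toLowerCase(Locale.ROOT));
        int port = u.getPort();
        boolean isDefault = ("http".equals(scheme) && port == 80)
                || ("https".equals(scheme) && port == 443);
        if (port != -1 && !isDefault) sb.append(':').append(port);
        sb.append(u.getRawPath());
        if (u.getRawQuery() != null) sb.append('?').append(u.getRawQuery());
        return sb.toString(); // fragment deliberately dropped
    }
}
```

Constructing the index with Match.TEXTUAL instead gives the browser-like
behaviour, so the choice stays in one place.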
>
>
>
> I've just checked in some updates with a couple of TODOs in the relevant
> places, I hope.
>
>
>
> I'll also update the moki example doc with the suggestions I made. And while
> I am about it I will generate a schema for moki. It's about time.
>
>
>
> Hope this helps. Oh and these are just my suggestions, you or anyone else
> may have better ones.
>
>
>
> Jo
>
>
>
>
>  ________________________________
>
>
> From: public-mobileok-checker-request@w3.org
> [mailto:public-mobileok-checker-request@w3.org] On Behalf
> Of Laura Holmes
>  Sent: 24 July 2007 23:54
>  To: public-mobileok-checker
>  Subject: Proposed changes to Moki - External Resources Test
>
>
>
>
> Hi all,
>  I just wanted to run some changes by you all and get some feedback.
> Currently, I'm working on the ExternalResourcesTest and am running into
> conditions that haven't been accounted for in the existing code. These
> conditions include:
>
>  1) counting references contained in objects that are not jpeg or gif:
>  there are many other image types and other types of objects (such as
> applications or audio) that may be included on a page. I'm assuming that we
> want to include these references even if they can't be rendered on a mobile
> phone due to a comment made regarding nested objects: "For nested object
> elements, count only the number of objects that need to be assessed before
> content matching the request header defined in 2.3.2 HTTP Request is found."
> So, we want to assess content types other than jpeg and gif when
> counting external resources.
>
>  2) keeping track of unique references to resources that are other than jpeg
> or gif:
>  If two references are made in the primary document to the same image, it is
> only counted once, but if we reference the same image in css, we currently
> don't have a way of tracking this.
>
>  3) references contained in nested objects are counted regardless of whether
> or not the reference is actually reached:
>  We only identify object nodes by name, not in serial order.
>
>  Here are my proposed changes I want to make, which would entail changing
> the shape of the moki doc a bit:
>
>  We create an ArrayList of URIs that is maintained throughout the entirety
> of the parsing process. When a reference to a resource is encountered, we
> check to see if the list already contains that URI. This list will contain
> all the resources referenced in both the primary doc and css files.
> At the end of the parsing process, we can add an additional node to any
> location in the moki that states the length of the list (i.e. how many
> unique resources were encountered). I propose adding this as its own node
> under moki, as it spans information in the primary doc, images, and css.
> Because we only want to record the number of unique references, I can't see
> any other way to pull it from the moki document using xsl. I'm open to any
> other suggestions.
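
The bookkeeping proposed here could be sketched as follows. The class name
is hypothetical; a set stands in for the ArrayList so that the "already
seen?" check and the insert are a single call, and URIs are compared as
plain strings with no normalisation.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical tracker shared by the primary-document and CSS parsers. */
final class UniqueResources {
    private final Set<String> seen = new LinkedHashSet<>();

    /** Returns true if this URI had not been recorded before. */
    boolean record(String uri) {
        return seen.add(uri);
    }

    /** The number that would go into the proposed count node. */
    int count() {
        return seen.size();
    }
}
```

A single instance would be passed to every parser, so an image referenced
from both the primary document and a stylesheet is counted once.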
>
>  As to the nested object problem, I'm at a loss for solutions given our
> current implementation of the DOM. Suggestions?
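
One possible approach to the nested object problem, sketched in Java with
the standard org.w3c.dom API: descend from the outermost object element
through its nested fallbacks in document order, counting each one assessed,
and stop at the first whose content type is accepted. The class name, the
typeOf function (a stand-in for retrieving the resource and reading its
Content-Type) and the accepted set are all assumptions. The sketch looks
only at direct child elements named "object", so fallbacks wrapped in other
markup would need a deeper search.

```java
import java.util.Set;
import java.util.function.Function;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical helper for counting nested object elements. */
final class NestedObjects {
    /**
     * Counts the object elements that must be assessed, starting at 'outer'
     * and descending into nested fallbacks, until one has an accepted type.
     * 'typeOf' maps the data attribute to a content type (null if unknown).
     */
    static int countAssessed(Element outer,
                             Function<String, String> typeOf,
                             Set<String> accepted) {
        int count = 0;
        for (Element obj = outer; obj != null; obj = firstChildObject(obj)) {
            count++; // this object had to be assessed
            String type = typeOf.apply(obj.getAttribute("data"));
            if (type != null && accepted.contains(type)) {
                break; // content found: deeper fallbacks are never reached
            }
        }
        return count;
    }

    /** First direct child element named "object", or null if there is none. */
    private static Element firstChildObject(Element parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && "object".equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }
}
```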
>
>  Thanks for your input in advance,
>  Laura
>
>
Received on Monday, 30 July 2007 22:39:43 UTC