Re: Proposed changes to Moki - External Resources Test from Laura Holmes on 2007-07-26 (public-mobileok-checker@w3.org from July 2007)

From: Laura Holmes <holmes@google.com>
Date: Thu, 26 Jul 2007 19:08:50 -0400
To: "Jo Rabin" <jrabin@mtld.mobi>, public-mobileok-checker <public-mobileok-checker@w3.org>
Message-ID: <135a9f560707261608n3e0704f3h548e607f84d4d151@mail.gmail.com>
> > 3. In the CSSResource class, an image list needs to be constructed and
> > then processed as above.
>
> It sounds as if you're suggesting building up a list of references within
> the style sheet independent of the images contained in the primary document.
> I proposed the idea of a universal list of references because it's very
> possible that a stylesheet could reference the same images as referenced in
> the primary document. If we construct an independent resource list from the
> css and the primary document, we might get some duplications.
>
> Well, what I am suggesting is that rather than passing a single list
> around the place, each resource is responsible for managing and maintaining
> its own list. I agree that there needs to be a central "cache" list so that
> when about to check a resource you look at the list and see if you already
> did it. I think that it should be noted specifically in the moki document
> that you didn't retrieve it because you think it's the same as some
> specified other resource.
What I think you mean is that each resource will manage its own list of
resources, but we'll pull all those resources together to create a master
list before we process them. If each resource manages its own list as a
list of HTTPResource objects, duplicate resources will exist under different
resources' lists. To make sure we have the whole master list of resources
before processing them, we'd probably have to dive completely into the css,
create the list of image and object resources to be processed, and then
process that list.
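A minimal sketch of that merge step in Java, assuming each resource can hand over its own list of references with the URI already made absolute. `SourceRef` and `MasterList` are hypothetical names, not existing checker classes. Keying a `LinkedHashMap` on the URI collapses duplicates across the per-resource lists while still keeping every place each URI was referenced, in first-seen order:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical: one place in one document where a URI was referenced.
class SourceRef {
    final String uri;      // referenced URI, assumed already absolute
    final String foundIn;  // URI of the document containing the reference
    final int line;        // line number within that document

    SourceRef(String uri, String foundIn, int line) {
        this.uri = uri;
        this.foundIn = foundIn;
        this.line = line;
    }
}

// Hypothetical: merges the per-resource lists into one master list.
class MasterList {
    // URI -> every reference to it, in the order first encountered
    private final Map<String, List<SourceRef>> byUri = new LinkedHashMap<>();

    void addAll(List<SourceRef> perResourceList) {
        for (SourceRef r : perResourceList) {
            byUri.computeIfAbsent(r.uri, k -> new ArrayList<>()).add(r);
        }
    }

    // Unique resources, i.e. what would actually be retrieved (once each).
    int uniqueCount() {
        return byUri.size();
    }

    // All references to a URI, for listing under its entry in moki.
    List<SourceRef> referencesTo(String uri) {
        List<SourceRef> refs = byUri.get(uri);
        return refs == null ? Collections.<SourceRef>emptyList() : refs;
    }
}
```

Retrieval would then walk the map's keys once, and each key's list gives the references to report.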

Just so I'm completely certain of the structure of the moki, here's an
example of how I see things being structured right now. Correct me if I'm
thinking about this the wrong way or missing any nuances.

<moki>
    <primaryDoc/>
    <stylesheets>
        <stylesheet type="external"/>
        <!-- references the stylesheet listed below, but you'd only know
             if you looked at the source of this css document -->
        <stylesheet type="external"/>
    </stylesheets>
    <images>
        <image>
            <reference/>
            <reference/>
        </image>
        <image>
            <reference/>
            <reference/>
        </image>
    </images>
    <objects>
        <object>
            <reference/>
            <reference/>
        </object>
    </objects>
</moki>

The way of keeping track of resources that most closely resembles the final
output represented by this model would be an ArrayList of external
resources, where each entry keeps track of where it is referenced. This
might involve a simple external resource class, such as:

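import java.util.ArrayList;
import java.util.List;
public class ExternalResource {
    // every place (document and line) that refers to this resource
    private final List<Reference> references = new ArrayList<Reference>();
    // the retrieved resource, as whichever HTTPResource subclass fits its type
    private HTTPResource resource;
}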

I'm still figuring out how this would affect the flow of preprocessing, but
I think fully processing the stylesheets as we encounter them would work,
with each CSS stylesheet maintaining its own list of resource URIs. If one
of those resources is a stylesheet, then we process that stylesheet too,
while maintaining a simple cache list of stylesheets already processed.
Then, as soon as all stylesheets are processed, we coalesce all the external
resources that are either images or objects into a master list, and from
that master list we create the fully fledged HTTPResources.
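That flow can be sketched as follows, with retrieval and CSS parsing replaced by an in-memory map so the example runs on its own; `ParsedSheet`, `StylesheetWalker` and the `fetch` map are assumptions for illustration only. Each stylesheet keeps its own image list, a set of processed stylesheet URIs acts as the simple cache (and also stops @import cycles), and the master list is built only once every stylesheet is done:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a retrieved and parsed CSS file.
class ParsedSheet {
    final List<String> imports;  // URIs of @import-ed stylesheets
    final List<String> images;   // image URIs referenced from this sheet

    ParsedSheet(List<String> imports, List<String> images) {
        this.imports = imports;
        this.images = images;
    }
}

// Hypothetical walker over the stylesheet graph.
class StylesheetWalker {
    private final Map<String, ParsedSheet> fetch;                // assumed: URI -> parsed sheet
    private final Set<String> processed = new LinkedHashSet<>(); // cache of sheets already done
    private final List<List<String>> perSheetImages = new ArrayList<>();

    StylesheetWalker(Map<String, ParsedSheet> fetch) {
        this.fetch = fetch;
    }

    // Fully process a stylesheet and, recursively, everything it imports.
    void process(String uri) {
        if (!processed.add(uri)) {
            return;                        // already processed; also breaks @import cycles
        }
        ParsedSheet sheet = fetch.get(uri);
        if (sheet == null) {
            return;                        // retrieval failed; a real checker would record this
        }
        perSheetImages.add(sheet.images);  // each sheet keeps its own list
        for (String imported : sheet.imports) {
            process(imported);
        }
    }

    // After all sheets are processed: coalesce into one master list, no duplicates.
    List<String> masterImageList(List<String> primaryDocImages) {
        Set<String> master = new LinkedHashSet<>(primaryDocImages);
        for (List<String> images : perSheetImages) {
            master.addAll(images);
        }
        return new ArrayList<>(master);
    }

    Set<String> processedSheets() {
        return processed;
    }
}
```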

Thoughts?

Cheers,
Laura

P.S. Jo - I'd recommend Coupa Cafe as it's one of my favs and has free
wireless internet. :)

On 7/26/07, Jo Rabin <jrabin@mtld.mobi> wrote:
>
>   *From:* Laura Holmes [mailto:holmes@google.com]
> *Sent:* 26 July 2007 16:13
> *To:* Jo Rabin
> *Subject:* Re: Proposed changes to Moki - External Resources Test
>
>
>
> > It looks like the image element in moki needs to be extended to include
> an image type which should be set to the media type of the response under
> the imageInfo element.
> Given that there may be a difference between the declared image type and
> the actual image type, if we include one element that states the retrieved
> file type and another that states the declared image type, should we
> issue a warning if these two pieces of information don't match?
>
> Sounds like a good idea, though this will only happen for Objects, right?
> Img doesn't allow you to state the content type.
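If such a warning were added, the comparison might look like this sketch. It assumes the declared value comes from the object element's `type` attribute and the actual one from the response's Content-Type header; `TypeCheck` and `mismatchWarning` are hypothetical names. Parameters such as `charset` are dropped and case is ignored before comparing, since media type and subtype are case-insensitive:

```java
import java.util.Locale;

// Hypothetical helper comparing declared and retrieved media types.
class TypeCheck {
    // Media type part of a Content-Type value: parameters dropped, lower-cased.
    static String mediaType(String value) {
        if (value == null) {
            return null;
        }
        int semi = value.indexOf(';');
        String type = (semi >= 0 ? value.substring(0, semi) : value).trim();
        return type.isEmpty() ? null : type.toLowerCase(Locale.ROOT);
    }

    // A warning message if the declared type differs from the type actually
    // retrieved; null if they agree or if either one is missing.
    static String mismatchWarning(String uri, String declaredType, String contentTypeHeader) {
        String declared = mediaType(declaredType);
        String actual = mediaType(contentTypeHeader);
        if (declared == null || actual == null || declared.equals(actual)) {
            return null;
        }
        return "object " + uri + " declares type " + declared
                + " but was served as " + actual;
    }
}
```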
>
> > 2. When processing images (and links and so on) in the primary document,
> I think that duplicates should not be suppressed and the duplicate detection
> should be handled in the preprocess method.
>
> I'm unclear as to what we do after we've detected a duplicate, or how it
> should be handled. I think we had a conversation a while ago about how
> exactly multiple references should be represented within the moki document,
> but I'm not entirely sure what we decided on. What I felt was the most
> likely conclusion was that we record all references to that image, request
> the image once, and then in the corresponding image moki information have
> the image info mentioned once and all the listed references with line
> numbers included.
>
>  Yes, that sounds good to me
>
> Is some of this answered in the moki example doc? Sean's out of town and
> I don't have the link. If someone could send that to me, that'd be great.
>
>  No, it's not, as I haven't updated it yet :( I will do so over the next
> couple of days if it is not too late. Fwiw the current example doc is
> referenced from the TF home page [1].
>
> [1] http://www.w3.org/2005/MWI/BPWG/Group/TaskForces/Checker/Overview.html
>
>
> > 3. In the CSSResource class, an image list needs to be constructed and
> then processed as above.
>
> It sounds as if you're suggesting building up a list of references within
> the style sheet independent of the images contained in the primary document.
> I proposed the idea of a universal list of references because it's very
> possible that a stylesheet could reference the same images as referenced in
> the primary document. If we construct an independent resource list from the
> css and the primary document, we might get some duplications.
>
>  Well, what I am suggesting is that rather than passing a single list
> around the place, each resource is responsible for managing and maintaining
> its own list. I agree that there needs to be a central "cache" list so that
> when about to check a resource you look at the list and see if you already
> did it. I think that it should be noted specifically in the moki document
> that you didn't retrieve it because you think it's the same as some
> specified other resource.
>
>
> > 4. The same observation applies to link elements as to images. Since CSS
> files can include other CSS files they need to have a list of included CSS
> and that needs to be preprocessed according to the same URI matching
> strategy.
>
> How deep are we diving as far as included resources? If a stylesheet
> @imports another css file, do we evaluate that stylesheet as well? Or do we
> just include the URI as an external resource?
>
>  We go as far as it takes (modulo media type restrictions) as that is what
> a browser would (should) do.
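A sketch of the "modulo media type restrictions" part: deciding from its media list whether an @import or link should be followed. The rule used here (follow when the list is empty or names `all` or `handheld`) is an assumption about which restriction is meant, and `ImportFilter` is a hypothetical name:

```java
import java.util.Locale;

// Hypothetical filter for @import / <link> media lists.
class ImportFilter {
    // Assumption: only media "all" or "handheld" apply to the checker,
    // and an absent or empty media list means "all".
    static boolean shouldFollow(String mediaList) {
        if (mediaList == null || mediaList.trim().isEmpty()) {
            return true;
        }
        for (String medium : mediaList.split(",")) {
            String m = medium.trim().toLowerCase(Locale.ROOT);
            if (m.equals("all") || m.equals("handheld")) {
                return true;
            }
        }
        return false;
    }
}
```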
>
>
> > (Did I hear a collective groan about line and column number references :( )
>
> Yes, but not because it's impossible; it's just imperfect. As for final
> recording of errors, we can reference line numbers (not column numbers) from
> the primaryDoc/docContent of the moki document. This would give a roughly
> accurate location, except that if the original source document had reduced
> white space (as we suggest), the line numbers we report would not be the
> actual line numbers of the source document. However, if we chose to report
> the snippet of code that corresponds to that line number, that snippet would
> provide much more specific and useful information.
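A sketch of reporting both the line number and the snippet, assuming an error is located by a character offset into whichever source text moki stores (primaryDoc/docContent, or a CSS file's source); `SourceLocator` is a hypothetical helper. The snippet stays useful even when whitespace reduction has shifted the line numbers:

```java
// Hypothetical helper for locating errors in stored source text.
class SourceLocator {
    // 1-based line number of a character offset within the source text.
    static int lineOf(String source, int offset) {
        int line = 1;
        for (int i = 0; i < offset && i < source.length(); i++) {
            if (source.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // Text of a 1-based line, to report next to the line number; trim() also
    // drops a trailing '\r' left by CRLF line endings.
    static String snippet(String source, int line) {
        String[] lines = source.split("\n", -1);
        if (line < 1 || line > lines.length) {
            return "";
        }
        return lines[line - 1].trim();
    }
}
```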
>
> However, right now we only include the docContent for the primary doc, not
> for the included CSSResources. For our current line-reporting solution, we
> would also have to include the source of the CSS files. That would help
> error reporting at the expense of a larger moki. Would everyone like to
> include that information?
>
> Yes, I think that is quite important. If we report an error in a resource
> that is not actually referenced from the primary document it could leave a
> developer scratching their head for quite a long time, unless there is some
> way of finding what resource the error is in and where it is in that
> resource.
>
> > 6. b) whether they should be counted as an external reference
>
> I think counting objects as an external resource is a good idea, but I'm
> not sure what the criteria for being counted as an external resource would
> include. As soon as I know, I can start working on it.
>
> That it would actually be retrieved in realistic situations. So each
> object and its fall-backs get retrieved in turn until one is found that
> matches the request criteria.
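That counting rule can be sketched as below. It assumes the fallback chain has already been retrieved and reduced to a list of media types (outermost object first), and that "matches the request criteria" means membership in a set of acceptable media types taken from the request's Accept header; `ObjectFallback` is a hypothetical name:

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical counter for nested-object fallback chains.
class ObjectFallback {
    // Each object in the chain counts as retrieved; the walk stops at the
    // first one whose retrieved media type is acceptable.
    static int countRetrieved(List<String> chainMediaTypes, Set<String> acceptable) {
        int count = 0;
        for (String type : chainMediaTypes) {
            count++;
            if (type != null && acceptable.contains(type.toLowerCase(Locale.ROOT))) {
                break;
            }
        }
        return count;
    }
}
```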
>
>
>
> Thanks for your detailed response, Jo.
>
>  Sorry I didn't update the moki example doc yet; I will do that soon. I'm
> flying to Palo Alto tomorrow and will do it once there, from a coffee shop
> on University Avenue, maybe :)
>
> Jo
>
>
>
>  On 7/25/07, *Jo Rabin* <jrabin@mtld.mobi> wrote:
>
> Hi Laura
>
>
>
> Sorry about taking a long time to get back, you ask good questions! Some
> thoughts.
>
>
>
> We should include all resources pointed to by img tags. I noticed in the
> preprocess method that the checker seems to want to assess content type by
> looking for the file extension. But that is really not right at all. It's
> not necessarily the case that images will have a file extension, and even
> when they do, it's an error to infer the content type from them; see e.g.
> [1] and [2] which make it clear that the resource must be retrieved and its
> Content-Type header examined in order to determine its type.
>
>
>
> [1] http://www.w3.org/2001/tag/doc/metaDataInURI-31.html#erroneous
>
> [2] http://www.w3.org/2001/tag/doc/mime-respect#missing
>
>
>
> Even though the object element allows the specification of content type,
> browsers typically taste the content of nested objects even in the presence
> of this information to determine the actual content type. Given that they
> stop when they find something they like, it's a good question to ask whether
> the checker should continue and whether and where it should put those
> references in moki. It's an even better question to wonder how the xslt
> would differentiate between those objects that should be counted and those
> that should not.
>
>
>
> So some thoughts about the code:
>
>
>
> 1. Given that the image type is not known in advance of retrieving it, and
> given that the image may not be of a known type, there seems to be the need
> for a factory somewhere which constructs a JPEG resource, a GIF resource or
> a generic image resource depending on the result of the retrieval. It looks
> like the image element in moki needs to be extended to include an image type
> which should be set to the media type of the response under the imageInfo
> element.
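A minimal sketch of such a factory, with hypothetical class names (`ImageResource`, `JpegResource`, `GifResource`, `ImageResourceFactory`); the checker's real resource classes may be shaped differently. The choice is driven only by the Content-Type of the retrieved response, never by the file extension, so a URI ending in `.gif` that is served as `image/png` becomes a generic image resource, and the recorded media type is what the proposed image type under imageInfo would carry:

```java
import java.util.Locale;

// Hypothetical resource types.
class ImageResource {
    final String uri;
    final String mediaType;  // from the response's Content-Type, e.g. "image/png"

    ImageResource(String uri, String mediaType) {
        this.uri = uri;
        this.mediaType = mediaType;
    }
}

class JpegResource extends ImageResource {
    JpegResource(String uri) { super(uri, "image/jpeg"); }
}

class GifResource extends ImageResource {
    GifResource(String uri) { super(uri, "image/gif"); }
}

// Hypothetical factory applied after retrieval.
class ImageResourceFactory {
    static ImageResource fromResponse(String uri, String contentType) {
        // Media type only: parameters dropped, case normalised.
        String type = contentType == null
                ? ""
                : contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (type) {
            case "image/jpeg":
                return new JpegResource(uri);
            case "image/gif":
                return new GifResource(uri);
            default:
                return new ImageResource(uri, type);  // generic resource, type recorded as-is
        }
    }
}
```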
>
>
>
> 2. When processing images (and links and so on) in the primary document, I
> think that duplicates should not be suppressed and the duplicate detection
> should be handled in the preprocess method. Aside from anything else, the
> detection of duplicates should be done on a canonical URI not just a text
> match (and on the absolute version of the URI, for that matter). Though, as
> we saw from a little test that Dom put together, real browsers do appear to
> do a textual match, so that aspect of the behaviour needs to be centralized
> so we can change it easily or control it by a switch.
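A sketch of keeping that decision in one place, assuming `java.net.URI` resolution is an acceptable notion of "canonical": it resolves against the document URI and removes `.` and `..` segments, and this sketch also drops the fragment, but it does not, for instance, normalise host case or default ports. `DuplicateDetector` and its `textualMatch` switch are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical central duplicate detector.
class DuplicateDetector {
    // Central switch: true mimics the plain textual match that real browsers
    // appeared to use; false compares canonical absolute URIs.
    private final boolean textualMatch;
    private final Set<String> seen = new HashSet<>();

    DuplicateDetector(boolean textualMatch) {
        this.textualMatch = textualMatch;
    }

    // Comparison key for a reference found in the document at baseUri.
    String key(URI baseUri, String reference) {
        if (textualMatch) {
            return reference;
        }
        URI parsed;
        try {
            parsed = new URI(reference.trim());
        } catch (URISyntaxException e) {
            return reference;  // unparseable: fall back to the raw text
        }
        // resolve() makes the reference absolute and removes "." and ".."
        // segments; normalize() does the same for references already absolute.
        String absolute = baseUri.resolve(parsed).normalize().toString();
        // The fragment is not sent to the server, so it cannot identify
        // a different resource.
        int hash = absolute.indexOf('#');
        return hash >= 0 ? absolute.substring(0, hash) : absolute;
    }

    // True if an equivalent reference was already seen; otherwise records it.
    boolean isDuplicate(URI baseUri, String reference) {
        return !seen.add(key(baseUri, reference));
    }
}
```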
>
>
>
> 3. In the CSSResource class, an image list needs to be constructed and
> then processed as above.
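A deliberately simplified sketch of building that list from a stylesheet's text: a regular expression that finds `url(...)` values and notes the line each was found on. It is an assumption-laden stand-in for a real CSS parser. It ignores comments and escapes, misses the `@import "file.css"` string form, and does not tell `@import url(...)` apart from image references, which a fuller version would route to the stylesheet list. `CssUrlScanner` is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, regex-based simplification of CSS reference extraction.
class CssUrlScanner {
    // One url(...) occurrence and the line it was found on.
    static final class CssRef {
        final String uri;
        final int line;

        CssRef(String uri, int line) {
            this.uri = uri;
            this.line = line;
        }
    }

    // url(x), url('x') or url("x"); group 2 is the URI text between optional quotes.
    private static final Pattern URL =
            Pattern.compile("url\\(\\s*(['\"]?)(.*?)\\1\\s*\\)", Pattern.CASE_INSENSITIVE);

    // Every url(...) in source order, duplicates kept so each location can be reported.
    static List<CssRef> scan(String css) {
        List<CssRef> found = new ArrayList<>();
        Matcher m = URL.matcher(css);
        int line = 1;
        int pos = 0;  // newlines before pos have already been counted
        while (m.find()) {
            for (; pos < m.start(); pos++) {
                if (css.charAt(pos) == '\n') {
                    line++;
                }
            }
            found.add(new CssRef(m.group(2).trim(), line));
        }
        return found;
    }
}
```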
>
>
>
> 4. The same observation applies to link elements as to images. Since CSS
> files can include other CSS files they need to have a list of included CSS
> and that needs to be preprocessed according to the same URI matching
> strategy.
>
>
>
> 5. Ideally, each of the lists of URIs should record where each URI was
> found in the source of whatever document it was found in, for error
> reporting purposes (Did I hear a collective groan about line and column
> number references :( ) and so that the moki document can say that an
> image/css was in error and is referenced in 7 places rather than just
> one.
>
>
>
> 6. I think there is a need for an objects element in moki. It should
> contain objects and the objects should say a) what their content type is and
> b) whether they should be counted as an external reference. That should be
> easy enough to do. What's not so obvious is what to do about text/html when
> it is found in an object and I think the answer is that it should be counted
> and skipped.
>
>
>
> 7. Oh, and finally, before I forget. There is the case (401
> Authentication) where both the page presented with the response and the
> primary document are tested and the external resources from the
> authentication page are added to the total. On reflection, I think we should
> reconsider this behaviour before we go to the next last call of the
> mobileOK doc, and not worry about it in the code for now. (Famous last
> words)
>
>
>
> I've just checked in some updates with a couple of TODOs in the relevant
> places, I hope.
>
>
>
> I'll also update the moki example doc with the suggestions I made. And
> while I am about it I will generate a schema for moki. It's about time.
>
>
>
> Hope this helps. Oh and these are just my suggestions, you or anyone else
> may have better ones.
>
>
>
> Jo
>
>
>   ------------------------------
>
> *From:* public-mobileok-checker-request@w3.org [mailto:public-mobileok-checker-request@w3.org] *On Behalf Of* Laura Holmes
> *Sent:* 24 July 2007 23:54
> *To:* public-mobileok-checker
> *Subject:* Proposed changes to Moki - External Resources Test
>
>
>
> Hi all,
> I just wanted to run some changes by you all and get some feedback.
> Currently, I'm working on the ExternalResourcesTest and am running into
> conditions that haven't been accounted for in the existing code. These
> conditions include:
>
> 1) counting references contained in objects that are not jpeg or gif:
> there are many other image types and other types of objects (such as
> applications or audio) that may be included on a page. I'm assuming that we
> want to include these references even if they can't be rendered on a mobile
> phone, because of a comment made regarding nested objects: "For nested
> object elements, count only the number of objects that need to be assessed
> before content matching the request header defined in *2.3.2 HTTP Request*
> <http://www.w3.org/TR/mobileOK-basic10-tests/#http_request> is found." So
> we want to assess content types other than jpeg and gif when counting
> external resources.
>
> 2) keeping track of unique references to resources that are other than
> jpeg or gif:
> If two references are made in the primary document to the same image, it
> is only counted once, but if we reference the same image in css, we
> currently don't have a way of tracking this.
>
> 3) references contained in nested objects are counted regardless of
> whether or not the reference is actually reached:
> We only identify object nodes by name, not in serial order.
>
> Here are the changes I propose, which would entail changing the shape of
> the moki doc a bit:
>
> We create an ArrayList of URIs that is maintained throughout the entirety
> of the parsing process. When a reference to a resource is encountered, we
> check to see if the list already contains that URI. This list will contain a
> list of all the resources contained in both the primary doc and css files.
> At the end of the parsing process, we can add an additional node to any
> location in the moki that states the length of the list (i.e. how many
> unique resources were encountered). I propose adding this as its own node
> under moki, as it spans information in the primary doc, images, and css.
> Because we only want to record the number of unique references, I can't see
> any other way to pull it from the moki document using xsl. I'm open to any
> other suggestions.
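The same check-then-add logic can be sketched with a `LinkedHashSet` in place of the `ArrayList`: `add` returns false for a URI already present, so the membership test and the insertion are one step, and lookups do not slow down as the list grows (`ArrayList.contains` scans the whole list on every check). `UniqueResourceRegistry` is a hypothetical name, and URIs are assumed to be absolute already:

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical registry shared across the primary document and all CSS files.
class UniqueResourceRegistry {
    private final Set<String> uris = new LinkedHashSet<>();  // keeps first-seen order

    // Records a reference; returns true only the first time this URI is seen.
    boolean register(String uri) {
        return uris.add(uri);
    }

    // Value for the proposed moki node: number of unique resources encountered.
    int uniqueCount() {
        return uris.size();
    }
}
```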
>
> As to the nested object problem, I'm at a loss for solutions given our
> current implementation of the DOM. Suggestions?
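One possible approach, sketched with the standard `org.w3c.dom` API rather than the checker's own DOM (an assumption): `getElementsByTagName` returns elements in document order, and following each outermost object's nested object children gives its fallback chain, which can then be retrieved and counted in order. `NestedObjects` is a hypothetical name, and treating only the first nested object at each level as the fallback is a simplifying assumption:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper; uses standard org.w3c.dom, not the checker's DOM.
class NestedObjects {
    // One fallback chain per outermost <object>, in document order:
    // the outer object's data URI first, then each nested fallback object's.
    static List<List<String>> fallbackChains(Document doc) {
        List<List<String>> chains = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("object");  // document (serial) order
        for (int i = 0; i < all.getLength(); i++) {
            Element obj = (Element) all.item(i);
            if (hasObjectAncestor(obj)) {
                continue;  // nested: handled as part of its outermost object's chain
            }
            List<String> chain = new ArrayList<>();
            for (Element cur = obj; cur != null; cur = firstChildObject(cur)) {
                chain.add(cur.getAttribute("data"));
            }
            chains.add(chain);
        }
        return chains;
    }

    private static boolean hasObjectAncestor(Element e) {
        for (Node p = e.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE && "object".equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // The first <object> among the direct children, taken as the next fallback.
    private static Element firstChildObject(Element e) {
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE && "object".equals(c.getNodeName())) {
                return (Element) c;
            }
        }
        return null;
    }

    // Parses well-formed markup for this example only.
    static Document parse(String xhtml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed: " + e.getMessage(), e);
        }
    }
}
```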
>
> Thanks for your input in advance,
> Laura
>
>
>
Received on Thursday, 26 July 2007 23:09:19 UTC