Re: Defining website scope using Sitemap protocol

Hi all, 

A single Sitemap file is apparently limited to 50,000 URLs and 10 MB, so it is common practice for website owners to describe large sites using 'sitemap indexes'. The use of 'sitemap indexes' (please, please read http://en.wikipedia.org/wiki/Sitemap_index and http://en.wikipedia.org/wiki/Sitemaps) would, I hope, answer your first reservation.
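
To give a feel for the numbers, here is a minimal sketch of how a long list of URLs could be split into Sitemap-sized chunks. It assumes the 50,000 URLs per file limit and ignores the file size limit; the function name split_into_sitemaps and the example.com URLs are made up for illustration.

```python
MAX_URLS_PER_SITEMAP = 50000  # assumed per-file URL limit of the Sitemap protocol


def split_into_sitemaps(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into chunks small enough for one Sitemap file each."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]


# Hypothetical site with one million pages.
urls = ["http://www.example.com/page/%d" % n for n in range(1000000)]
chunks = split_into_sitemaps(urls)
print(len(chunks))  # 20 Sitemap files, to be referenced from one sitemap index
```

So even a site with a million pages would need only around 20 Sitemap files plus one index.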

In short, a 'Sitemap index' is an XML file that can be used to reference multiple sitemap files. The XML format of a Sitemap index file is very similar to the XML format of a Sitemap file - as it is an extension to the sitemap protocol. It allows webmasters to include additional information about each sitemap (e.g. when it was last updated).  Again, use of / support for 'sitemap indexes' appears widespread...
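
As an illustration only, the sketch below reads a small sitemap index with Python's standard xml.etree.ElementTree module and lists each referenced sitemap with its optional last-modified date. The example.com URLs, the date and the function name list_sitemaps are assumptions, not taken from any real site.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemap protocol (sitemaps.org), version 0.9.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index; the URLs and the date are made up.
SITEMAP_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-education.xml</loc>
    <lastmod>2011-11-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-forms.xml</loc>
  </sitemap>
</sitemapindex>
"""


def list_sitemaps(index_xml):
    """Return (location, last modified or None) for each sitemap in the index."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    entries = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in list_sitemaps(SITEMAP_INDEX):
    print(loc, lastmod)
```

Each sitemap file referenced this way could then correspond to one section of the site covered by the scope.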

If you were then to provide a summary of the sitemap index, in plain language, this could answer your second reservation. This simple summary could even be generated automatically for a user (through XSLT stylesheets), as the sitemap index and sitemap files are in XML...
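
A rough sketch of what such a generated summary might look like, written in Python rather than XSLT purely for brevity. The wording of the summary, the function name summarise_index and the example.com URLs are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Sitemap protocol (sitemaps.org), version 0.9.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sitemap index; the URLs and the date are made up.
SITEMAP_INDEX = """<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap-education.xml</loc>
    <lastmod>2011-11-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap-forms.xml</loc>
  </sitemap>
</sitemapindex>
"""


def summarise_index(index_xml):
    """Turn a sitemap index into a short plain-language scope summary."""
    root = ET.fromstring(index_xml.encode("utf-8"))
    lines = []
    for sitemap in root.findall("sm:sitemap", NS):
        loc = sitemap.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = sitemap.findtext("sm:lastmod", namespaces=NS)
        line = "- Pages listed in " + loc
        if lastmod:
            line += " (last updated " + lastmod.strip() + ")"
        lines.append(line)
    header = "This evaluation covers %d group(s) of pages:" % len(lines)
    return "\n".join([header] + lines)


print(summarise_index(SITEMAP_INDEX))
```

A user without any knowledge of XML would only ever see the generated text, not the underlying files.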

Thoughts / comments?

Alistair Garrison

On 28 Nov 2011, at 09:44, Wilco Fiers wrote:

> Hi all,
> 
> I like the idea of using the sitemap protocol. But I have my reservations about using it to define the scope of a website. A few problems come to mind here; I'd be interested to see if we can solve them. First up, the sitemap protocol seems to be about identifying specific pages, whereas for scope it seems to me that we should find a way to express large collections of pages. We can't exactly define the scope of a site with a million pages by simply listing every URI. So that's where our needs differ from those of the sitemap protocol.
> 
> The second problem I see is about clarity to users. An XML format is a very transparent format, but it isn't exactly user friendly. I think visitors of the website should be able to understand the scope of the conformance claim without having an understanding of XML. So perhaps we could find an alternative way to express the scope that is understandable for users without a tech degree.
> 
> Wilco
> ________________________________________
> From: Alistair Garrison [alistair.j.garrison@gmail.com]
> Sent: Sunday 27 November 2011 22:00
> To: RichardWarren; Eval TF
> Subject: Re: Defining website scope using Sitemap protocol
> 
> Hi Richard, EVAL TF,
> 
> It was with the exact hope of defining a super clear, unambiguous and verifiable scope that I suggested using the sitemap protocol (or possibly something like it)... The sitemap protocol is, I believe, an XML-based format for clearly stating relevant descriptions of URIs for inclusion in a sitemap (or in fact anything which is a collection of web pages), and it is already apparently widely used and adopted. To be very clear, the sitemap protocol is not in itself a tool; however, output in the sitemap protocol format is widely supported by many crawling / site mapping tools.
> 
> I suppose what I was thinking of was to use the sitemap protocol to allow the owner, the evaluation team and any future evaluators or moderators to define exactly what was included in the evaluation and what was not.  In a manner which they are already familiar with...
> 
> Still, I think, central to the scoping issue is the question "Are we seeking to attest a conformance claim for a web site (or other) made by someone (the developer, the owner, etc...) using WCAG 2.0 Understanding Conformance document  http://www.w3.org/TR/UNDERSTANDING-WCAG20/conformance; or are we seeking to test what we think is good to test (Section 7) then provide a conformance claim for that?"
> 
> If we are seeking to attest a conformance claim made by someone else, Section 7 could be defined in a simple line: "The scope of the evaluation is defined as all URLs for which a conformance claim is being made, formatted using the sitemap or other similar protocol". Otherwise, if we are seeking to test what we think is good to test and then provide a conformance claim for that, I would support Richard's Section 7 concept as a reasonable place to start.
> 
> All the best
> 
> Alistair
> 
> On 27 Nov 2011, at 17:25, RichardWarren wrote:
> 
> Dear Kathy and Alistair,
> 
> The purpose of the scope statement in the eventual evaluation should be to help future evaluators or moderators know what has been evaluated. A clear scope statement ensures that :-
> a) The owner, the evaluation team and any future evaluators or moderators know exactly what was included in the evaluation and what was not.
> b) Users of the website can distinguish which parts are compliant and which parts may not be compliant.
> 
> We cannot dictate what tools, if any, are used to help define the scope. Our concern is that the eventual scope statement is clear, unambiguous and verifiable. That is why I suggested the following as an initial draft for section 7
> 
> 
> 7. Procedure to express the Scope of the evaluation
> 
> While the WCAG 2.0 Recommendation focuses on web pages, the Evaluation
> Methodology focuses on evaluating the conformance of a website. This means
> that it is important to define what is considered to be part of the website
> and what is not. To be more precise, to define what is part of the coherent
> collection of one or more related web pages that together provide common use
> or functionality. This can include static web pages, dynamically generated
> web pages, and/or web applications.
> 
> A clear scope statement ensures that
> a) The owner, the evaluation team and any future evaluators or moderators
> know exactly what was included in the evaluation and what was not.
> b) Users of the website can distinguish from the URI and any link text which
> parts are compliant and which are not.
> 
> 7.1 Key functionalities
> The primary purpose of the website should be defined. This sets the context
> of the evaluation. It is important that any exceptions (see 7.3 below) do
> not prevent the website from performing this function in a compliant way.
> 
> 7.2 Base URI
> The use of the URI is the clearest way to define the scope of an evaluation.
> The base URI is the most appropriate starting point. This would normally be
> a domain name such as mysite.com, but it could be a subsection such as
> mysite.com/education/. If everything within that domain or subsection is to
> be included in the evaluation then that single statement is sufficient.
> 
> 7.3 Exceptions
> Where part of a site or sub-site is to be excluded from the evaluation it is
> important to clearly state the relevant descriptions and URIs of sections
> to be excluded or included. The decision regarding whether to specify the
> inclusion or exclusion areas will depend upon the size and detail of the
> part to be excluded or included. In addition to the URI the
> inclusion/exclusion statement should include a text description. For example
> "The application process initiated by the form at
> mysite.com/forms/application/appform.php is excluded from this evaluation".
> 
> 7.4 Complete Processes
> Whilst it is desirable to test individual components of an application
> during development this approach is not supported by this evaluation
> methodology. If any part of a process is to be excluded from the evaluation
> then the whole process should be excluded.
> 
> 7.5 If the overall evaluation process is to be divided into teams of
> evaluators (e.g. for a large site) then each team will have its own
> specific scope statement for the relevant task in hand. A single, combined,
> meaningful scope statement must also be prepared to cover the whole
> evaluation.
> 
> Regards
> 
> Richard
> 
> From: Kathy Wahlbin <kathy@interactiveaccessibility.com>
> Sent: Sunday, November 27, 2011 3:29 PM
> To: 'Alistair Garrison' <alistair.j.garrison@gmail.com>; 'Eval TF' <public-wai-evaltf@w3.org>
> Subject: RE: Defining website scope using Sitemap protocol
> 
> Hi Alistair –
> 
> I think that is a good place to get started and to see the pages on the site but we need to keep in mind that these are only for the areas that webmasters want search engines to crawl.  There may be areas of the site that need to be reviewed that were excluded from this sitemap.xml file.
> 
> Another way to get an idea of scope is to use the functions in the content management system (CMS) to see all the pages on the site. Within the CMS, the number of templates and common components could also be identified, which could help define scope.
> 
> Regards,
> 
> Kathy
> 
> Phone:  978.443.0798
> Cell:  978.760.0682
> Fax:  978.560.1251
> KathyW@ia11y.com
> 
> From: Alistair Garrison [mailto:alistair.j.garrison@gmail.com]
> Sent: Saturday, November 26, 2011 8:06 AM
> To: Eval TF
> Subject: Defining website scope using Sitemap protocol
> 
> Dear all,
> 
> It seems that a good number of website owners / webmasters submit a sitemap to search engines.  The schema of choice for this sitemap (file) is apparently the Sitemap protocol - http://www.sitemaps.org/ (supported by Google, Yahoo!, and Microsoft.)
> 
> What would your thoughts be on using this sitemap file, generated for a website by the website owners / webmaster, to define the scope of a website? This could possibly answer the question: how can an evaluator express the scope of a website?
> 
> Would be interested to hear thoughts / comments.
> 
> All the best
> 
> Alistair
> 

Received on Monday, 28 November 2011 10:07:14 UTC