Re: Action Item: WCL URI matching

Carlos Iglesias wrote:

>>>- LESS THAN A PAGE
>>>
>>>A snippet of code:
>>>
>>>e.g. A claim about the following snippet
>>>
>>>...
>>><h3>Snippet section</h3>
>>><p class="foo-paragraph">Some dummy text and an <img 
>>>class="beauty-image" alt="Beautiful image" /></p> ...
>>>
>>>which can be found at http://www.example.org/foo.xhtml
>>>
>>>--> Apparently this is NOT COVERED in any of the WCL URI 
>>>matching requirements, but as discussed before within the 
>>>group, it may not be necessary since we have a snippet pointer.
>>
>>For XML resources this could be done using XPointer in the 
>>URI fragment.
> 
> Unfortunately the model must be applicable to HTML and maybe other
> non-XML resources.

Yep. As you said, use the snippet pointer for non-XML resources.
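
For the XML case, here is a minimal sketch of what such a fragment pointer 
could look like, assuming the XPointer element() scheme with a child 
sequence. The helper resolve_child_sequence, the fragment element(/1/2/1) 
and the tiny stand-in document are made up for illustration; the real 
foo.xhtml is not shown in this thread.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag


def resolve_child_sequence(root, pointer):
    """Resolve an element() child sequence such as 'element(/1/2/1)'.

    Hypothetical helper: only bare child sequences are handled,
    not the NCName form of the element() scheme.
    """
    prefix = "element(/"
    if not (pointer.startswith(prefix) and pointer.endswith(")")):
        raise ValueError("only element() child sequences are handled here")
    steps = [int(s) for s in pointer[len(prefix):-1].split("/")]
    # The first step addresses the document element, which is always child 1.
    if steps[0] != 1:
        return None
    node = root
    for step in steps[1:]:
        children = list(node)  # child elements in document order
        if step < 1 or step > len(children):
            return None
        node = children[step - 1]  # child sequences are 1-based
    return node


# Stand-in document (an assumption, not the real resource).
doc = ET.fromstring(
    "<html><head><title>t</title></head>"
    "<body><h3>Snippet section</h3><p>Some text</p></body></html>"
)

# Split the claim subject into the resource URI and the pointer.
uri, fragment = urldefrag("http://www.example.org/foo.xhtml#element(/1/2/1)")
target = resolve_child_sequence(doc, fragment)
```

Here target would be the h3 element of the stand-in document. Nothing 
comparable works for tag-soup HTML, which is why the snippet pointer is 
still needed there.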

>>>- GROUPS OF PAGES
>>>
>>>  * A domain (All the resources within the specified domain)
>>>
>>>e.g. A claim about the http://www.helloworld.net/ domain
>>
>>What is the domain here? www.helloworld.net? helloworld.net? net?
> 
> If I make a claim about the www.helloworld.net domain, this is the domain.
> If I make a claim about the helloworld.net domain, this is the domain.
> If I make a claim about the .net domain, this is the domain.

SCNR :-)

>>>--> It is COVERED by 3 [Match a (sub-)domain and all sub-domains, 
>>>--> except for those sub-domain patterns given by a list.]
>>>
>>>A potential issue at this point is that the XG has decided 
>>>[4] to adopt RDF-CL [5] in which subdomains of a given host are 
>>>always in scope [6], but as noted in the group minutes they 
>>>will carry out whatever changes are needed to make RDF-CL meet 
>>>their requirements.
>>
>>You mean, this is an issue with exclusions?
> 
> I mean this could be a potential issue with sub-domain exclusions. 

I see
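
To make the sub-domain exclusion case concrete, a rough sketch of 
requirement 3 as I read it. The function names host_in_scope and 
uri_in_scope are hypothetical, and treating each excluded entry as a plain 
sub-domain (including everything below it) rather than a richer pattern is 
an assumption; neither WCL nor RDF-CL defines this code.

```python
from urllib.parse import urlsplit


def _within(host, domain):
    """True if host equals domain or is a sub-domain of it."""
    return host == domain or host.endswith("." + domain)


def host_in_scope(host, domain, excluded=()):
    """Match a (sub-)domain and all its sub-domains, minus exclusions.

    Assumption: each excluded entry is a plain domain name that also
    takes all of its own sub-domains out of scope.
    """
    host = host.lower().rstrip(".")
    domain = domain.lower().strip(".")
    if not _within(host, domain):
        return False
    return not any(_within(host, e.lower().strip(".")) for e in excluded)


def uri_in_scope(uri, domain, excluded=()):
    """Apply host_in_scope to the host part of a URI."""
    host = urlsplit(uri).hostname
    return host is not None and host_in_scope(host, domain, excluded)


# The claimed domain decides the scope, whichever level it names.
a = uri_in_scope("http://www.helloworld.net/", "helloworld.net")
b = uri_in_scope("http://www.helloworld.net/", ".net")
c = uri_in_scope("http://helloworld.net/", "www.helloworld.net")
# An excluded sub-domain drops out even though its parent is in scope.
d = uri_in_scope("http://beta.helloworld.net/x", "helloworld.net",
                 excluded=["beta.helloworld.net"])
```

If sub-domains of a given host are always in scope, as in RDF-CL, the 
excluded parameter has no counterpart, which is the potential issue.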

>>>Additionally the XG has requirements on scheme, port, query 
>>>and fragment patterns, but as CarlosV noted in the past there 
>>>are other options, frequently used by crawler tools (e.g. 
>>>path depth limits), that are not covered by the current 
>>>requirements.
>>
>>AFAIR, there should be a way to compress statements in an 
>>EARL report, so that not every resource/web unit has to be 
>>listed explicitly. This compression is most likely not 
>>lossless. I very much doubt that we can create a lossless 
>>compression. There are too many parameters. And what would be 
>>the benefit? If we wanted to know whether a specific resource 
>>is part of the subject of the compressed statement, we would 
>>have to run a crawler with all the specified parameters first? Hmm
> 
> AFAIR, this has something to do with semantics (logical groups of
> resources) and not only compression.

I can group resources by listing them explicitly, or I can compress this 
by using some shorter form (regexp or WCL/matching).
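
A small sketch of the two options, assuming made-up URIs under 
www.example.org and a hypothetical regexp. It also shows why the short 
form is usually lossy: the pattern accepts resources that were never in 
the explicit list.

```python
import re

# Option 1: the group is an explicit list of resources.
explicit_group = {
    "http://www.example.org/docs/a.html",
    "http://www.example.org/docs/b.html",
}

# Option 2: the group is "compressed" into a pattern (hypothetical regexp).
pattern_group = re.compile(r"http://www\.example\.org/docs/[^/?#]+\.html")


def in_explicit(uri):
    """Membership test against the explicit list."""
    return uri in explicit_group


def in_pattern(uri):
    """Membership test against the pattern; the whole URI must match."""
    return pattern_group.fullmatch(uri) is not None


listed = "http://www.example.org/docs/a.html"
unlisted = "http://www.example.org/docs/c.html"

# Both forms agree on a listed resource ...
agree = in_explicit(listed) and in_pattern(listed)
# ... but the pattern also covers a resource nobody listed.
lossy = in_pattern(unlisted) and not in_explicit(unlisted)
```

Checking membership against the pattern needs no crawler run, only the 
URI, but the answer is about the pattern and not about what was actually 
tested.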

-- 
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628

Received on Wednesday, 19 July 2006 12:30:41 UTC