Re: Lars's comments on the BP document (was: BP document is FROZEN pending vote to release next WD)

Hi Lars,

regarding 2): 

You are correct about the nesting of sitemaps, and also that the current "will not work for larger datasets" is an oversimplification. However, while your proposed text is correct, I think we should add a bit more context and explanation to guide data providers. If a dataset contains millions of spatial things (e.g. many building, address or cadastral parcel datasets), generating and maintaining the sitemaps is at the very least quite complex and typically resource-intensive, especially since the dataset will see frequent changes (although most of the spatial things rarely change). Essentially, the sitemaps contain a register of all spatial things, datasets, etc. on a site, and standard sitemap builder tools will often not work, i.e. a custom approach is required. At least, this was our experience when we looked at it.
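To illustrate what such a custom approach might involve, here is a minimal sketch in Python (standard library only). It assumes the URLs of all spatial things can be streamed from the data store as (URL, last-modified date) pairs; the file names (sitemap-00001.xml, sitemap-index.xml), the example.org URLs and all helper names are hypothetical. It splits the URLs into sitemap files of at most 50,000 entries, checks the 50 MB limit from the sitemap protocol, and writes one index file listing them.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Iterator, List, Tuple
from xml.sax.saxutils import escape

# Limits stated in the sitemap protocol (sitemaps.org).
MAX_URLS = 50_000        # URLs per sitemap file, and sitemaps per index file
MAX_BYTES = 52_428_800   # 50 MB, uncompressed
NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# escape() handles &, < and >; the protocol also requires ' and " to be escaped.
XML_ENTITIES = {"'": "&apos;", '"': "&quot;"}


@dataclass
class Entry:
    loc: str        # URL of one spatial thing (hypothetical URL scheme)
    lastmod: date   # last modification date of that thing


def chunk(entries: Iterable[Entry], size: int = MAX_URLS) -> Iterator[List[Entry]]:
    """Yield consecutive batches of at most `size` entries, in input order."""
    batch: List[Entry] = []
    for entry in entries:
        batch.append(entry)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def check_size(xml: str) -> str:
    """Reject a file that exceeds the 50 MB limit."""
    if len(xml.encode("utf-8")) > MAX_BYTES:
        raise ValueError("file exceeds 50 MB; use smaller batches")
    return xml


def render(root: str, child: str, items: List[Tuple[str, date]]) -> str:
    """Render a <urlset> or <sitemapindex> document from (loc, lastmod) pairs."""
    lines = ['<?xml version="1.0" encoding="UTF-8"?>', f'<{root} xmlns="{NS}">']
    for loc, lastmod in items:
        lines.append(
            f"  <{child}><loc>{escape(loc, XML_ENTITIES)}</loc>"
            f"<lastmod>{lastmod.isoformat()}</lastmod></{child}>"
        )
    lines.append(f"</{root}>")
    return check_size("\n".join(lines) + "\n")


def build_sitemaps(entries: Iterable[Entry], base_url: str) -> Dict[str, str]:
    """Return {file name: XML} for all sitemap files plus one index file."""
    files: Dict[str, str] = {}
    index: List[Tuple[str, date]] = []
    for n, batch in enumerate(chunk(entries), start=1):
        name = f"sitemap-{n:05d}.xml"
        files[name] = render("urlset", "url", [(e.loc, e.lastmod) for e in batch])
        # Record the newest change in each file; a crawler may use the index
        # lastmod to skip sitemap files that have not changed.
        index.append((f"{base_url}/{name}", max(e.lastmod for e in batch)))
    if len(index) > MAX_URLS:
        raise ValueError("more than 50,000 sitemaps; split into several index files")
    files["sitemap-index.xml"] = render("sitemapindex", "sitemap", index)
    return files
```

For example, 120,001 hypothetical building URLs yield three sitemap files (50,000, 50,000 and 20,001 entries) plus the index. Note that this sketch assigns entries to files purely by input order, so inserting a single new thing shifts all later entries and changes every subsequent file. Keeping the sitemaps maintainable under frequent updates would probably need a stable partitioning (e.g. by identifier range), which is not attempted here.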

If others have found a way to make it work for such cases, that would indeed be a good example. It would also be good to have some practical experience of whether such sitemap structures with millions of entries (significantly) help get such larger sites indexed.

Jeremy, Linda, maybe something to discuss in the BP call next week?

Best regards,
Clemens

 
> On 8 Feb 2017, at 11:43, Svensson, Lars <L.Svensson@dnb.de> wrote:
> 
> All,
> 
> On Monday, February 06, 2017 12:01 PM, Jeremy Tandy [mailto:jeremy.tandy@gmail.com] wrote:
> 
>> BP document is FROZEN and ready for people to read/review (see emails in this thread
>> [1] for the change-log).
> 
> First of all: the changes have made the document much easier to read, and it is much clearer what the proposed outcome is when someone wants to implement the BPs. A large bunch of kudos to the editors and contributors! And +1 from me to publish this as a WD.
> 
> And I have some comments.
> 
> 1) What has happened to the references? I cannot find them in the github version... [1]
> 
> 2) BP4 [2] says that "sitemaps currently are limited to several thousands of entries and will not work for larger datasets". IMHO this is not correct. The sitemap specification [3] says that "each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB (52,428,800 bytes)". It then goes on to state that you can provide multiple sitemaps and list them in an index file and that "index files may not list more than 50,000 Sitemaps and must be no larger than 50MB (52,428,800 bytes)". You can, however, have multiple index files, too. But even using just one index file means that you can list 50,000^2 (i.e. 2.5 billion) URLs in your sitemaps, which should be enough for most applications. For the next iteration, I propose the following text:
> [[
> You may also consider using Sitemaps to direct the Web-crawler; please refer to the sitemap protocol specification [https://www.sitemaps.org/protocol.html] for more information.
> ]]
> 
> 3) BP4 (again) in sec 3 (Decide what spatial relationships to use) says "The geographical, topological and social hierarchy should be described with clear semantics and registered with IANA Link relations." What exactly should be registered with IANA link relations? Is the following meant:
> [[
> The geographical, topological and social hierarchy should be described with clear semantics and use relations registered in the IANA Link relations registry.
> ]]
> or
> [[
> The geographical, topological and social hierarchy should be described with clear semantics. If you use relations not registered in the IANA Link Relations registry, please register them there.
> ]]
> Put differently: Is the BP to use only relations already registered with IANA, or is the BP to register new relations with IANA?
> 
> The rest of my comments are only editorial:
> 1) In §5 [4] you refer to the Deutsche Nationalbibliothek (yay!). Please don't use the URL you see in the browser. Instead use the CMS-independent one [5].
> 2) There are two places in the document where references start with two square brackets "[[". As a result there are no hyperlinks to the (missing) references section.
> 3) s/converstion/conversion/ (somewhere in sec 8)
> 4) §8 and BP 17 say "Alternatively you can re-project your coordinates to WGS84 Long/Lat using many available tools online." Do we want to point to specific tools?
> 5) §8 says "So we are now at the point where 99.9% of people can stop reading". If we really assume that 99.9% of all readers stop at that point, they will never reach the very interesting information, two paragraphs further down, about the surface of the earth moving and the impact of that on self-driving cars... Maybe we should move the final paragraph up to become the third paragraph of §8.
> 
> [1] https://w3c.github.io/sdw/bp/
> [2] https://w3c.github.io/sdw/bp/#indexable-by-search-engines
> [3] https://www.sitemaps.org/protocol.html#index
> [4] https://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
> [5] http://www.dnb.de/
> 
> Talk to you later,
> 
> Lars
> 
> 
> *** Lesen. Hören. Wissen. Deutsche Nationalbibliothek *** 
> -- 
> Dr. Lars G. Svensson
> Deutsche Nationalbibliothek
> Informationsinfrastruktur
> Adickesallee 1
> 60322 Frankfurt am Main
> Telefon: +49 69 1525-1752
> Telefax: +49 69 1525-1799
> mailto:l.svensson@dnb.de 
> http://www.dnb.de

Received on Wednesday, 8 February 2017 11:42:16 UTC