- From: Svensson, Lars <L.Svensson@dnb.de>
- Date: Thu, 9 Feb 2017 07:59:11 +0000
- To: Clemens Portele <portele@interactive-instruments.de>
- CC: Jeremy Tandy <jeremy.tandy@gmail.com>, SDW WG Public List <public-sdw-wg@w3.org>
Good Morning Clemens,

On Wednesday, February 08, 2017 12:42 PM, Clemens Portele [mailto:portele@interactive-instruments.de] wrote:

> You are correct about the nesting of sitemaps and also that the current "will not work
> for larger datasets" is oversimplifying things. However, while your proposed text is
> correct, I think we should add a bit more context and explanation to guide data
> providers. Yes, sitemaps look simple but their creation can indeed be quite complex...
> If a dataset contains millions of spatial things (e.g. many building, address
> or cadastral parcel datasets), generating and maintaining the sitemaps is at the very
> least quite complex and typically resource intensive, also considering that the dataset
> will see frequent changes (although most of the spatial things rarely change). Basically
> the sitemaps contain a register of all spatial things, datasets, etc. on a site and using
> standard sitemap builder tools will often not work, i.e. a custom approach is required.
> At least this was our experience when we looked at it.

That is my experience, too. We had a custom sitemap generator for a subset of our data that could only generate one sitemap file, so when the subset grew the search engines simply stopped crawling it... There will be a re-implementation sometime this year, I hope.

> If others have found a way to make it work for such cases, that would indeed be a
> good example. Also, it would be good to have some practical experience, if such
> sitemap structures with millions of entries (significantly) help getting such larger sites
> indexed.

We'll have two fairly large sitemaps: one with ~10M, one with ~27M URLs in them, so while I can't provide any experience now, I hope that I can in about six months.
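As a rough sizing sketch for sitemaps of that order (~10M and ~27M URLs), the protocol limits of 50,000 URLs per sitemap file and 50,000 sitemaps per index file can be turned into a small calculation. The function names, the file-naming scheme and the example.org base URL are assumptions made purely for illustration; this is not the generator mentioned above, and it does not check the 50MB per-file size limit.

```python
from xml.sax.saxutils import escape

# Limits taken from the sitemap protocol (sitemaps.org).
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap_plan(total_urls):
    """Return (sitemap_files, index_files) needed for total_urls URLs."""
    files = -(-total_urls // MAX_URLS_PER_SITEMAP)   # ceiling division
    indexes = -(-files // MAX_SITEMAPS_PER_INDEX)
    return files, indexes


def build_index(base_url, file_count):
    """Build one sitemap index document listing file_count sitemap files.

    The 'sitemap-00001.xml' naming scheme is hypothetical.
    """
    if file_count > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("more than one index file is needed")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        f'<sitemapindex xmlns="{SITEMAP_NS}">',
    ]
    for i in range(1, file_count + 1):
        loc = escape(f"{base_url}/sitemap-{i:05d}.xml")
        lines.append(f"  <sitemap><loc>{loc}</loc></sitemap>")
    lines.append("</sitemapindex>")
    return "\n".join(lines)


if __name__ == "__main__":
    for total in (10_000_000, 27_000_000):
        print(total, sitemap_plan(total))
    print(build_index("https://example.org", 2))
```

Under these assumptions, ~10M URLs need 200 sitemap files and ~27M URLs need 540, and each set fits in a single index file; the ceiling for one index file is 50,000 x 50,000 = 2.5 billion URLs.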
Best,

Lars

>
> On 8 Feb 2017, at 11:43, Svensson, Lars <L.Svensson@dnb.de> wrote:
> >
> > All,
> >
> > On Monday, February 06, 2017 12:01 PM, Jeremy Tandy [mailto:jeremy.tandy@gmail.com] wrote:
> >
> >> BP document is FROZEN and ready for people to read/review (see emails in this thread
> >> [1] for the change-log).
> >
> > First of all: The changes have made the document much easier to read and it's much
> > clearer what the proposed outcome is when someone wants to implement the BPs. A large
> > bunch of kudos to the editors and contributors! And +1 from me to publish this as a WD.
> >
> > And I have some comments.
> >
> > 1) What has happened to the references? I cannot find them in the github version... [1]
> >
> > 2) BP4 [2] says that "sitemaps currently are limited to several thousands of entries
> > and will not work for larger datasets". IMHO this is not correct. The sitemap
> > specification [3] says that "each Sitemap file that you provide must have no more than
> > 50,000 URLs and must be no larger than 50MB (52,428,800 bytes)". It then goes on to
> > state that you can provide multiple sitemaps and list them in an index file and that
> > "index files may not list more than 50,000 Sitemaps and must be no larger than 50MB
> > (52,428,800 bytes)". You can, however, have multiple index files, too. But even using
> > just one index file means that you can list 50,000^2 URLs in your sitemaps, which
> > should be enough for most applications. For the next iteration, I propose the following
> > text:
> > [[
> > You may also consider using Sitemaps to direct the Web-crawler; please refer to the
> > sitemap protocol specification [https://www.sitemaps.org/protocol.html] for more
> > information.
> > ]]
> >
> > 3) BP4 (again) in sec 3 (Decide what spatial relationships to use) says "The
> > geographical, topological and social hierarchy should be described with clear semantics
> > and registered with IANA Link relations." What exactly should be registered with IANA
> > link relations? Is the following meant:
> > [[
> > The geographical, topological and social hierarchy should be described with clear
> > semantics and use relations registered in the IANA Link relations registry.
> > ]]
> > or
> > [[
> > The geographical, topological and social hierarchy should be described with clear
> > semantics. If you use relations not registered with IANA Link relations registry, please
> > register them there.
> > ]]
> > Put differently: Is the BP to use only relations already registered with IANA, or is the
> > BP to register new relations with IANA?
> >
> > The rest of my comments are only editorial:
> > 1) In §5 [4] you refer to the Deutsche Nationalbibliothek (yay!). Please don't use the
> > URL you see in the browser. Instead use the CMS-independent one [5].
> > 2) There are two places in the document where references start with two square
> > brackets "[[". As a result there are no hyperlinks to the (missing) references section.
> > 3) s/converstion/conversion/ (somewhere in sec 8)
> > 4) §8 and BP 17 say "Alternatively you can re-project your coordinates to WGS84
> > Long/Lat using many available tools online." Do we want to point to specific tools?
> > 5) §8 says "So we are now at the point where 99.9% of people can stop reading". If
> > we really assume that 99.9% of all readers stop at that point, they will never reach the
> > very interesting information about the surface of the earth moving and the impact of
> > that on self-driving cars that is two paragraphs further down... Maybe we should put the
> > final paragraph as number three in §8.
> >
> > [1] https://w3c.github.io/sdw/bp/
> > [2] https://w3c.github.io/sdw/bp/#indexable-by-search-engines
> > [3] https://www.sitemaps.org/protocol.html#index
> > [4] https://w3c.github.io/sdw/bp/#spatial-things-features-and-geometry
> > [5] http://www.dnb.de/
> >
> > Talk to you later,
> >
> > Lars
> >
> >
> > *** Lesen. Hören. Wissen. Deutsche Nationalbibliothek ***
> > --
> > Dr. Lars G. Svensson
> > Deutsche Nationalbibliothek
> > Informationsinfrastruktur
> > Adickesallee 1
> > 60322 Frankfurt am Main
> > Telefon: +49 69 1525-1752
> > Telefax: +49 69 1525-1799
> > mailto:l.svensson@dnb.de
> > http://www.dnb.de
Received on Thursday, 9 February 2017 07:59:51 UTC