- From: Maik Riechert <maik.riechert@arcor.de>
- Date: Tue, 6 Oct 2015 16:22:18 +0100
- To: public-ceo-ld@w3.org
Hi all,

To get things rolling, I'd like to copy some text that I wrote a few days ago for MELODIES. Please read it and say which parts you agree or disagree with.

Disclaimer: This is just my own (and Jon's) opinion, so don't take this as final! Here it comes:

# A vision of the future

Metadata for coverages can be given at many levels and in different forms. A useful way to concentrate on the essentials is to imagine a future global search engine for geospatial datasets, similar in simplicity and usability to Google. What would such a search engine crawl? Which details would it need? Who would use it, and what are their needs? How domain-specific does it have to be? How does it crawl datasets? Which formats would it look at, and which would it very likely ignore? What do typical user queries look like? ...

From this vision, a common denominator must emerge in terms of metadata. Ultimately, there should be one extensible format from which search engines can easily choose how deep they want to go. They could start by indexing basic metadata like title, keywords, producer, spatial region and time range, and over time they could extend to more complex details if these are provided: technical data source (which sensor), data resolution, measured properties (like temperature), etc.

This first vision stops at the metadata level and does not go further. Since actual measurement data can be expressed in many ways and different domains require different formats, it is not feasible to select a winner just now. If a search engine still wants to dig into the data and provide live previews or similar features, then it has to support custom formats. Whichever formats are actually used for the data must be clearly identified in the metadata, and a link to the data should be established if possible. Since not all datasets are public, this link may end at a website for ordering the data.
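To make the idea of "levels" a bit more concrete, here is a minimal Python sketch, assuming a JSON-LD-like record that borrows DCAT and Dublin Core term names as keys. The `ex:` properties, the example URIs, the simplified bbox/time-range values, the `LEVELS` table and `index_record` are all hypothetical and only illustrate how a crawler could pick its depth while still finding the data format and link in the distribution entry.

```python
import json

# Hypothetical metadata record. Keys borrow DCAT / Dublin Core term names;
# values for spatial/temporal extent are simplified for illustration.
record = {
    "@id": "http://example.org/datasets/sst-2015",
    "@type": "dcat:Dataset",
    "dct:title": "Sea surface temperature 2015",
    "dcat:keyword": ["ocean", "temperature"],
    "dct:publisher": "Example Agency",
    "dct:spatial": {"bbox": [-180.0, -90.0, 180.0, 90.0]},  # west, south, east, north
    "dct:temporal": {"start": "2015-01-01", "end": "2015-12-31"},
    # Optional, more detailed metadata ("ex:" is a placeholder namespace).
    "ex:sensor": "AVHRR",
    "ex:resolution": "0.25 deg",
    "ex:observedProperty": "sea_surface_temperature",
    # The data format is identified explicitly, next to the link to the data.
    "dcat:distribution": [
        {"dcat:mediaType": "application/x-netcdf",
         "dcat:downloadURL": "http://example.org/data/sst-2015.nc"},
    ],
}

# Which fields a crawler reads at each (hypothetical) indexing depth.
LEVELS = {
    "basic": ["dct:title", "dcat:keyword", "dct:publisher",
              "dct:spatial", "dct:temporal"],
    "detailed": ["ex:sensor", "ex:resolution", "ex:observedProperty",
                 "dcat:distribution"],
}


def index_record(rec, depth):
    """Return the part of `rec` a crawler would index at the given depth.

    depth="basic" keeps only the core fields; depth="detailed" also keeps
    the optional ones. Missing fields are skipped, so a record that only
    provides the basics is still indexable at any depth.
    """
    wanted = list(LEVELS["basic"])
    if depth == "detailed":
        wanted += LEVELS["detailed"]
    return {key: rec[key] for key in wanted if key in rec}


print(json.dumps(index_record(record, "basic"), indent=2))
```

The point of the design is that a simple crawler and a sophisticated one read the same record; they just stop at different depths.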
No matter which links are provided, they should always be unambiguous so that a search engine knows what they mean.

# What is a coverage? What is a dataset?

To come closer to the vision, we first need to be clear about what a coverage is and what a dataset is. The definition of a coverage is quite clear; citing Wikipedia: "A coverage is represented by its 'domain' (the universe of extent) and a number of range of values representing the coverage's value at each defined location." This includes metadata about the range values, that is, what the values represent, like wind speed or temperature.

On the other hand, there is no clear definition of what a dataset is. DCAT describes a dataset as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". Clearly, a coverage fits the definition of a DCAT dataset. But what if a provider groups several coverages into one "dataset" (e.g. many "granules" make up a global dataset)? Is this grouping itself again a DCAT dataset that consists of those coverage datasets? These are questions to be asked and answered.

From the point of view of a search engine, it would make a lot of sense if it could understand such a parent-child relationship. If it did not understand it and indexed the parent and all its child datasets equally, users searching for datasets by geographical region etc. could be confused, since the parent's extent overlaps each child's and a single query would return both. If a search returns a particular coverage described as a DCAT dataset, then from an end user's point of view it would be very helpful to discover that this coverage is part of a parent dataset, from which the user could explore sibling coverages.

Hence, anything that "contains" data, even if nested within some hierarchy where only the last layer represents the actual coverages, is a dataset. Handling this generic concept is only possible if rich metadata can be given about the contents and relationships of such datasets.
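As a rough illustration of how a search engine could use such a hierarchy, here is a minimal Python sketch, assuming each dataset carries a `dct:isPartOf` link (a Dublin Core term) to its parent. The catalogue contents, the URIs and the helpers `parent_of`, `siblings_of`, `root_of` and `collapse_results` are hypothetical; only the leaf entries would be actual coverages.

```python
# Hypothetical catalogue of DCAT-style datasets keyed by URI.
# "dct:isPartOf" points from a child to its parent; the monthly
# entries (the leaves) are the actual coverages.
catalogue = {
    "http://example.org/sst": {"dct:title": "Global SST"},
    "http://example.org/sst/2015": {
        "dct:title": "Global SST, 2015",
        "dct:isPartOf": "http://example.org/sst"},
    "http://example.org/sst/2015/01": {
        "dct:title": "Global SST, January 2015",
        "dct:isPartOf": "http://example.org/sst/2015"},
    "http://example.org/sst/2015/02": {
        "dct:title": "Global SST, February 2015",
        "dct:isPartOf": "http://example.org/sst/2015"},
}


def parent_of(uri):
    """Return the parent dataset URI, or None for a top-level dataset."""
    return catalogue[uri].get("dct:isPartOf")


def siblings_of(uri):
    """Other datasets sharing the same parent, e.g. neighbouring granules."""
    parent = parent_of(uri)
    if parent is None:
        return []
    return sorted(other for other, meta in catalogue.items()
                  if other != uri and meta.get("dct:isPartOf") == parent)


def root_of(uri):
    """Walk up the hierarchy (assumed acyclic) to the top-level dataset."""
    while parent_of(uri) is not None:
        uri = parent_of(uri)
    return uri


def collapse_results(hits):
    """Group search hits under their top-level dataset so that a parent and
    its children are not shown as unrelated, equally ranked results."""
    grouped = {}
    for uri in hits:
        grouped.setdefault(root_of(uri), []).append(uri)
    return grouped


print(siblings_of("http://example.org/sst/2015/01"))
# -> ['http://example.org/sst/2015/02']
```

Grouping by the top-level dataset is just one possible choice; an engine could equally group by the direct parent and let users navigate up and down from there.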
# Subsets of Coverages

A recurring use case is being able to reference subsets of coverages with a unique URI. An example is annotating a region of a coverage (let's say a global grid) with some information, which could even be user feedback. In terms of metadata, what would live under such subset URLs if they are dereferenceable (which they should be)? Since a subset is again a dataset, it only makes sense to have similar DCAT metadata for that subset, with the information relevant to the subset (bounding box etc.) adjusted appropriately. And again, to establish the connection to the full "parent" coverage, there must be some link to it. That's it.

By the way, I don't usually send such long emails, so expect shorter ones soon. But I felt this was important for us to clarify, just so that we all have the same picture in our heads. So please comment and say whether you agree or disagree with parts or all of it.

Cheers,
Maik
Received on Tuesday, 6 October 2015 15:22:49 UTC