Re: ACTION-155 Review domains and ranges as implied/stated by the spec and encoded in the schema

Thanks to Phil for the helpful review.

+1 to Richard's comments.

While it's true that :theme, :keyword and :contactPoint in particular
could be made general-purpose properties, DCAT doesn't seem like the right
place to do that.
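
For anyone weighing the two fixes Richard offers, a minimal sketch of what a formal rdfs:domain declaration does under RDFS entailment may help. It is written in plain Python rather than with an RDF library, and the ex: resources are hypothetical. It shows that a domain does not reject use of a property on other kinds of resource. Instead, the subject of the property is inferred to be an instance of the domain class.

```python
# Minimal sketch of RDFS domain/range entailment (rules rdfs2 and rdfs3).
# Triples are plain (subject, predicate, object) tuples of prefixed-name
# strings; no RDF library is used. All ex: resources are hypothetical.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def infer_types(triples):
    """Return the new rdf:type triples entailed by rdfs:domain/rdfs:range."""
    triples = set(triples)
    domains = {(s, o) for s, p, o in triples if p == RDFS_DOMAIN}
    ranges = {(s, o) for s, p, o in triples if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in triples:
        for prop, cls in domains:
            if p == prop:
                # rdfs2: any subject of the property is typed with the domain class
                inferred.add((s, RDF_TYPE, cls))
        for prop, cls in ranges:
            if p == prop:
                # rdfs3: any object of the property is typed with the range class
                inferred.add((o, RDF_TYPE, cls))
    return inferred - triples


# Suppose dcat:keyword were given the domain dcat:Dataset (the option under discussion).
schema = {("dcat:keyword", RDFS_DOMAIN, "dcat:Dataset")}
# A hypothetical PDF described with dcat:keyword outside any catalog.
data = {("ex:report.pdf", "dcat:keyword", '"budget"')}

print(infer_types(schema | data))
# → {('ex:report.pdf', 'rdf:type', 'dcat:Dataset')}
```

The PDF is not rejected. It is simply inferred to be a dcat:Dataset. This is the sense in which a domain "wouldn't restrict the usage", and it is also why the formal domain and the prose definition need to say the same thing.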

Dave

On 22/11/13 15:39, Richard Cyganiak wrote:
> Phil,
>
> 1. I object to the current situation where we have properties whose domains are implicitly stated in the prose definitions but left undefined in the RDFS.
>
> If the definition says “The XXX of the dataset”, then that’s an implicit domain declaration. In that case, either the explicit RDFS definition has to agree with the prose text (in other words, formally add a domain of dcat:Dataset), or the definition has to be changed (in other words, change the definition to “The XXX of the resource/entity/whatever”).
>
> I will accept either fix, but I see no way to intellectually justify semantic doublespeak.
>
>
> 2. Even though I will accept either way of fixing the situation, I would like to put the following on the record, so that I can point to it when people inevitably complain about DCAT’s lack of domain declarations in the future:
>
> You want to leave domains in DCAT undeclared because “the properties might be useful in other contexts”. This reasoning seems flawed to me. Vocabularies have scope and purpose. The scope of DCAT is data catalogs. Hence the name: “Data Catalog Vocabulary”. The problem we need to solve is representing data catalogs as RDF, not fixing perceived omissions in SKOS. That would be a different WG with a different charter and different skills and backgrounds present at the table.
>
> If I were, let’s say, managing metadata about medieval paintings and wanted to add index terms to the metadata, the suggestion that there’s a handy little property in the “data catalog vocabulary” for that purpose would do more harm than good. People who do metadata for medieval paintings should not be expected to look at vocabularies for data catalogs. That just makes their life harder.
>
> If you disagree with this, then explain to me why DCAT doesn’t re-use the mpv:theme property, which would be perfectly suitable here if we only consider its formal RDFS definition. MPV is the Medieval Paintings Vocabulary [1].
>
> Best,
> Richard
>
>
> [1] It’s fictional, but let’s assume for the sake of argument that it exists, is well-designed, well-documented, well-maintained, well-established in the relevant community, and you’ve never heard of it before.
>
>
> On 22 Nov 2013, at 11:51, Phil Archer <phila@w3.org> wrote:
>
>> I took an action on yesterday's call to review domain and range statements for DCAT properties. This was sparked by a comment by Luke Blaney [1] who said "... I found the inclusion of rdfs:domain on Properties quite inconsistent.  In my view, all rdfs:Properties should have rdfs:domain and rdfs:range specified."
>>
>> Let's see.
>>
>> dcat:theme
>> ==========
>> is defined as a sub property of dcterms:subject and has a range of skos:Concept. The comment is "The main category of the dataset. A dataset can have multiple themes." and the usage note says: "The set of skos:Concepts used to categorize the datasets are organized in a skos:ConceptScheme describing all the categories and their relations in the catalog."
>>
>> So we have an unambiguous range of skos:Concept.
>>
>> Domain? It has always seemed odd to me that SKOS doesn't include a property like this (no doubt there were reasons). Bottom line: this looks like a property that could be useful for linking resources other than a dcat:Dataset to a skos:Concept.
>>
>> Recommendation: leave domain undefined.
>>
>> dcat:keyword
>> ============
>> Comment: "A keyword or tag describing the dataset."
>> (No usage note)
>> Range: rdfs:Literal
>>
>> Domain is undefined. My immediate thought is that this is a very useful little property in any number of circumstances within and outwith a data catalogue and so we should leave the domain undefined.
>>
>> *However*, the definition clearly says "it's for describing the dataset" and that will make some people think it's not for them when describing something like a PDF. Now, we know that's not the case - we have defined dcat:Dataset as "A collection of data, published or curated by a single source, and available for access or download in one or more formats" and we agreed that this is about as close to the definition of "anything digital" as makes little difference - but the perception would be that dcat:keyword is not usable outside a catalogue when actually it is.
>>
>> Recommendation: leave domain undefined.
>>
>> dcat:contactPoint
>> =================
>> Comment: Links a dataset to relevant contact information which is provided using VCard.
>>
>> The range is defined as v:VCard but domain is undefined.
>>
>> VCard has been updated recently [2], with the VCard class deprecated in favour of v:Kind, except that old and new are declared as equivalent classes. That means we could leave the definition as it is and change the range to v:Kind. I have no strong feeling either way but lean towards changing it to v:Kind.
>>
>> The domain is undefined and I would say that the same arguments apply here as for dcat:keyword. Defining the domain as dcat:Dataset actually wouldn't restrict the usage but might *appear* to do so in a way that is probably unhelpful.
>>
>> Recommendation: Update range to v:Kind and leave domain undefined.
>>
>> dcat:accessURL, dcat:downloadURL
>> ================================
>>
>> accessURL definition: Could be any kind of URL that gives access to a distribution of the dataset. E.g. landing page, download, feed URL, SPARQL endpoint. Use when your catalog does not have information on which it is or when it is definitely not a download.
>>
>> downloadURL definition: This is a direct link to a downloadable file in a given format. E.g. CSV file or RDF file. The format is described by the distribution's dc:format and/or dcat:mediaType.
>>
>> The range is currently defined as rdfs:Resource for both which seems sensible, but there's no domain defined and I'm struggling to think of a use case where either would not refer to a dcat:Distribution. In the absence of that it seems to me that the domain for both properties should be defined as dcat:Distribution.
>>
>> Recommendation: define the domain of dcat:accessURL and dcat:downloadURL as dcat:Distribution.
>>
>> N.B. This is orthogonal to the resolution to adopt Dave's second suggestion on how to handle Luke's comments on these two properties.
>>
>> dcat:byteSize & dcat:mediaType
>> ==============================
>>
>> dcat:byteSize definition: The size of a distribution in bytes.
>> Usage Note: The size in bytes can be approximated when the precise size is not known. The literal value of dcat:byteSize should be typed as xsd:decimal.
>>
>> dcat:mediaType definition: This property SHOULD be used when the media type of the distribution is defined in IANA, otherwise dct:format MAY be used with different values.
>>
>> Both of these have defined ranges but no defined domain. Either could be used in contexts other than a catalogue and the definition of a Distribution, i.e. "Represents a specific available form of a dataset. Each dataset might be available in different forms, these forms might represent different formats of the dataset or different endpoints. Examples of distributions include a downloadable CSV file, an API or an RSS feed" - seems more restrictive than we might wish for these two.
>>
>> Recommendation: leave domain undefined for these two.
>>
>> For the sake of completeness, I'll list the properties for which both domain and range are already specified:
>>
>> dcat:themeTaxonomy
>> dcat:dataset
>> dcat:record
>> dcat:distribution
>> dcat:landingPage
>>
>>
>> [1] http://lists.w3.org/Archives/Public/public-gld-comments/2013Nov/0017.html
>>
>> [2] http://www.w3.org/TR/2013/WD-vcard-rdf-20130924/
>>
>> --
>>
>> Phil Archer
>> W3C eGovernment
>>
>> http://philarcher.org
>> +44 (0)7887 767755
>> @philarcher1
>>
>
>

Received on Friday, 22 November 2013 16:00:17 UTC