Re: ACTION-155 Review domains and ranges as implied/stated by the spec and encoded in the schema

From: Phil Archer <phila@w3.org>
Date: Fri, 22 Nov 2013 16:38:45 +0000
Message-ID: <528F8895.1060406@w3.org>
To: Dave Reynolds <dave.e.reynolds@gmail.com>, public-gld-wg@w3.org
Thanks everyone, that's

a) helpful
b) clear

I will act on that in the schema in the coming days.

Phil.

On 22/11/2013 15:59, Dave Reynolds wrote:
> Thanks to Phil for the helpful review.
>
> +1 to Richard's comments.
>
> While it's true that specifically :theme, :keyword and :contactPoint
> could be made general purpose concepts, dcat doesn't seem like the right
> place to do that.
>
> Dave
>
> On 22/11/13 15:39, Richard Cyganiak wrote:
>> Phil,
>>
>> 1. I object to the current situation where we have properties with
>> implicitly stated domains in prose definitions and undefined domains
>> in the RDFS.
>>
>> If the definition says “The XXX of the dataset”, then that’s an
>> implicit domain declaration. In that case, either the explicit RDFS
>> definition has to agree with the prose text (in other words, formally
>> add a domain of dcat:Dataset), or the definition has to be changed (in
>> other words, change the definition to “The XXX of the
>> resource/entity/whatever”).
>>
>> I will accept either fix, but I see no way to intellectually justify
>> semantic doublespeak.
>>
>>
>> 2. Even though I will accept either way of fixing the situation, I
>> would like to put the following on the record, so that I can point to
>> it when people inevitably complain about DCAT’s lack of domain
>> declarations in the future:
>>
>> You want to leave domains in DCAT undeclared because “the properties
>> might be useful in other contexts”. This reasoning seems flawed to me.
>> Vocabularies have scope and purpose. The scope of DCAT is data
>> catalogs. Hence the name: “Data Catalog Vocabulary”. The problem we
>> need to solve is representing data catalogs as RDF, not fixing
>> perceived omissions in SKOS. That would be a different WG with a
>> different charter and different skills and backgrounds present at the
>> table.
>>
>> If I were, let’s say, managing metadata about medieval paintings and
>> wanted to add index terms to the metadata, the suggestion that
>> there’s a handy little property in the “data catalog vocabulary” for
>> that purpose does more harm than good. People who do metadata for
>> medieval paintings should not be expected to look at vocabularies for
>> data catalogs. That just makes their life harder.
>>
>> If you disagree with this, then explain to me why DCAT doesn’t re-use
>> the mpv:theme property, which would be perfectly suitable here if we
>> only consider its formal RDFS definition. MPV is the Medieval
>> Paintings Vocabulary [1].
>>
>> Best,
>> Richard
>>
>>
>> [1] It’s fictional, but let’s assume for the sake of argument that it
>> exists, is well-designed, well-documented, well-maintained,
>> well-established in the relevant community, and you’ve never heard of
>> it before.
>>
>>
>> On 22 Nov 2013, at 11:51, Phil Archer <phila@w3.org> wrote:
>>
>>> I took an action on yesterday's call to review domain and range
>>> statements for DCAT properties. This was sparked by a comment by Luke
>>> Blaney [1] who said "... I found the inclusion of rdfs:domain on
>>> Properties quite inconsistent.  In my view, all rdfs:Properties
>>> should have rdfs:domain and rdfs:range specified."
>>>
>>> Let's see.
>>>
>>> dcat:theme
>>> ==========
>>> is defined as a sub property of dcterms:subject and has a range of
>>> skos:Concept. The comment is "The main category of the dataset. A
>>> dataset can have multiple themes." and the usage note says: "The set
>>> of skos:Concepts used to categorize the datasets are organized in a
>>> skos:ConceptScheme describing all the categories and their relations
>>> in the catalog."
>>>
>>> So we have an unambiguous range of skos:Concept.
>>>
>>> Domain? It has always seemed odd to me that SKOS doesn't include a
>>> property like this (no doubt there were reasons) - bottom line - this
>>> looks like a property that could be useful for linking a class other
>>> than a dcat:Dataset to a skos:Concept.
>>>
>>> Recommendation: - leave domain undefined.
>>>
>>> dcat:keyword
>>> ============
>>> Comment: "A keyword or tag describing the dataset."
>>> (No usage note)
>>> Range: rdfs:Literal
>>>
>>> Domain is undefined. My immediate thought is that this is a very
>>> useful little property in any number of circumstances within and
>>> outwith a data catalogue and so we should leave the domain undefined.
>>>
>>> *However* the definition clearly says "it's for describing the
>>> dataset" and that will make some people think it's not for them when
>>> describing something like a PDF. Now, we know that's not the case -
>>> we have defined dcat:Dataset as "A collection of data, published or
>>> curated by a single source, and available for access or download in
>>> one or more formats" and we agreed that this is about as close to the
>>> definition of "anything digital" as makes little difference - but the
>>> perception would be that dcat:keyword is not usable outside a
>>> catalogue when actually it is.
>>>
>>> Recommendation: - leave domain undefined.
>>>
>>> dcat:contactPoint
>>> =================
>>> Comment: Links a dataset to relevant contact information which is
>>> provided using VCard.
>>>
>>> The range is defined as v:VCard but domain is undefined.
>>>
>>> VCard has been updated recently [2] with the VCard class being
>>> deprecated in favour of v:Kind - except that old and new are declared
>>> to be equivalent classes. That means that we could leave the
>>> definition the same and change the range to v:Kind - I have no strong
>>> feeling either way but tend towards changing it to v:Kind.
>>>
>>> The domain is undefined and I would say that the same arguments apply
>>> here as for dcat:keyword. Defining the domain as dcat:Dataset
>>> actually wouldn't restrict the usage but might *appear* to do so in a
>>> way that is probably unhelpful.
>>>
>>> Recommendation: Update range to v:Kind and leave domain undefined.
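
The claim above that a declared domain "wouldn't restrict the usage" follows from how rdfs:domain works: it licenses an inference about the subject rather than rejecting data. Below is a minimal sketch of that one RDFS entailment rule (rdfs2) in plain Python. It is not a full reasoner, and the domain declaration, the ex: names and the helper function are assumptions made up for illustration, not anything taken from the DCAT schema.

```python
# Minimal sketch of the RDFS "rdfs2" entailment rule only; not a full reasoner.
# Triples are plain (subject, predicate, object) tuples of prefixed names.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"


def infer_domain_types(triples):
    """Return the rdf:type triples entailed by rdfs:domain declarations."""
    # Collect the declared domain classes for each property.
    domains = {}
    for s, p, o in triples:
        if p == RDFS_DOMAIN:
            domains.setdefault(s, set()).add(o)

    # rdfs2: if P has domain C and (x P y) holds, then x is an instance of C.
    inferred = set()
    for s, p, o in triples:
        for cls in domains.get(p, ()):
            inferred.add((s, RDF_TYPE, cls))
    return inferred - set(triples)


# Hypothetical schema choice: give dcat:contactPoint a domain of dcat:Dataset.
schema = {("dcat:contactPoint", RDFS_DOMAIN, "dcat:Dataset")}
# Hypothetical data: a PDF report described with dcat:contactPoint.
data = {("ex:report.pdf", "dcat:contactPoint", "ex:helpdesk")}

entailed = infer_domain_types(schema | data)
print(entailed)
```

Under these assumptions nothing is rejected: the PDF is simply inferred to be a dcat:Dataset. With the domain left undeclared, the same data entails no type at all.
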
>>>
>>> dcat:accessURL, dcat:downloadURL
>>> ================================
>>>
>>> accessURL definition: Could be any kind of URL that gives access to a
>>> distribution of the dataset. E.g. landing page, download, feed URL,
>>> SPARQL endpoint. Use when your catalog does not have information on
>>> which it is or when it is definitely not a download.
>>>
>>> downloadURL definition: This is a direct link to a downloadable file
>>> in a given format. E.g. CSV file or RDF file. The format is described
>>> by the distribution's dc:format and/or dcat:mediaType
>>>
>>> The range is currently defined as rdfs:Resource for both which seems
>>> sensible, but there's no domain defined and I'm struggling to think
>>> of a use case where either would not refer to a dcat:Distribution. In
>>> the absence of that it seems to me that the domain for both
>>> properties should be defined as dcat:Distribution.
>>>
>>> Recommendation: define the domain of dcat:accessURL and
>>> dcat:downloadURL as dcat:Distribution
>>>
>>> N.B. This is orthogonal to the resolution to adopt Dave's second
>>> suggestion on how to handle Luke's comments on these two properties.
>>>
>>> dcat:byteSize & dcat:mediaType
>>> ==============================
>>>
>>> dcat:byteSize definition: The size of a distribution in bytes.
>>> Usage Note: The size in bytes can be approximated when the precise
>>> size is not known. The literal value of dcat:byteSize should be typed
>>> as xsd:decimal.
>>>
>>> dcat:mediaType definition: This property SHOULD be used when the media
>>> type of the distribution is defined in IANA, otherwise dct:format MAY
>>> be used with different values.
>>>
>>> Both of these have defined ranges but no defined domain. Either could
>>> be used in contexts other than a catalogue and the definition of a
>>> Distribution, i.e. "Represents a specific available form of a
>>> dataset. Each dataset might be available in different forms, these
>>> forms might represent different formats of the dataset or different
>>> endpoints. Examples of distributions include a downloadable CSV file,
>>> an API or an RSS feed" - seems more restrictive than we might wish
>>> for these two.
>>>
>>> Recommendation: leave domain undefined for these two.
>>>
>>> For the sake of completeness, I'll list the properties for which both
>>> domain and range are already specified:
>>>
>>> dcat:themeTaxonomy
>>> dcat:dataset
>>> dcat:record
>>> dcat:distribution
>>> dcat:landingPage
>>>
>>>
>>> [1]
>>> http://lists.w3.org/Archives/Public/public-gld-comments/2013Nov/0017.html
>>>
>>>
>>> [2] http://www.w3.org/TR/2013/WD-vcard-rdf-20130924/
>>>
>>> --
>>>
>>> Phil Archer
>>> W3C eGovernment
>>>
>>> http://philarcher.org
>>> +44 (0)7887 767755
>>> @philarcher1
>>>
>>
>>
>
>
>

-- 

Phil Archer
W3C eGovernment

http://philarcher.org
+44 (0)7887 767755
@philarcher1
Received on Friday, 22 November 2013 16:39:17 UTC