Re: Organizing Use Cases for F2F / Agenda proposal / DCAT Usage from Jaroslav Pullmann on 2017-07-14 (public-dxwg-wg@w3.org from July 2017)

From: Jaroslav Pullmann <jaroslav.pullmann@fit.fraunhofer.de>
Date: Fri, 14 Jul 2017 11:02:52 +0200
To: Dataset Exchange Working Group <public-dxwg-wg@w3.org>
Message-ID: <998e0fe4-8e0b-d351-6f60-a43d5860cdc3@fit.fraunhofer.de>
   Dear Riccardo, dear Makx, and all,

   Even a brief evaluation of DCAT annotation habits and portal queries
   can inspire new use cases, e.g. clarifying the relationship between
   Catalogs and the portals they are deployed in, and among Catalogs.

   I started looking at the sources provided by Makx [1], which are mostly
   governmental initiatives. Are you aware of further data portals, e.g.
   commercial or scientific ones, that are based on DCAT and expose a
   SPARQL endpoint?

     Thank you!
    Jaro

   [1] https://ec.europa.eu/isa2/solutions/dcat-application-profile-data-portals-europe_en
   
   
   
On 13.07.2017 12:26, Riccardo Albertoni wrote:
> Dear Makx,  and all,
> 
> The following SPARQL query should help in the analysis you are suggesting (see http://yasgui.org/short/Hy8HtaVS- for a nicer view). For each property, it counts the distinct DCAT entities (Dataset, Distribution, Catalog, CatalogRecord) that use it.
> 
> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
> PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
> PREFIX dcat: <http://www.w3.org/ns/dcat#>
> PREFIX dct: <http://purl.org/dc/terms/>
> 
> SELECT ?prop (COUNT(DISTINCT ?sub) AS ?numOfDCATEntityUsingTheProperty) WHERE {
>    {
>    ?sub a dcat:Dataset ;
>         ?prop ?obj .
>    } UNION
>    {
>    ?sub a dcat:Distribution ;
>         ?prop ?obj .
>    } UNION
>    {
>    ?sub a dcat:Catalog ;
>         ?prop ?obj .
>    } UNION
>    {
>    ?sub a dcat:CatalogRecord ;
>         ?prop ?obj .
>    }
> } GROUP BY ?prop
> 
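A per-class breakdown might make rarely used parts of the vocabulary easier to spot. The following is only a sketch that has not been run against any portal: it assumes the endpoint supports the SPARQL 1.1 VALUES keyword, and the variable names ?class and ?numOfEntities are illustrative, not taken from the original query.

```sparql
PREFIX dcat: <http://www.w3.org/ns/dcat#>

# Count, per DCAT class and property, the distinct entities using that property.
SELECT ?class ?prop (COUNT(DISTINCT ?sub) AS ?numOfEntities)
WHERE {
  # Restrict ?class to the four DCAT classes covered by the UNION query.
  VALUES ?class { dcat:Dataset dcat:Distribution dcat:Catalog dcat:CatalogRecord }
  ?sub a ?class ;
       ?prop ?obj .
}
GROUP BY ?class ?prop
ORDER BY ?class DESC(?numOfEntities)
```

Unlike the UNION query, this groups by class as well as by property, so an entity typed with more than one of the four classes is counted once per class.
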
> For example, if I run the above query on https://www.europeandataportal.eu/sparql-manager/en/, it returns the following results:
> 
> prop,numOfDCATEntityUsingTheProperty
> http://purl.org/dc/terms/conformsTo,75482
> http://purl.org/dc/terms/provenance,414727
> http://www.w3.org/ns/adms#identifier,749805
> http://www.w3.org/ns/dcat#themeTaxonomy,79
> http://xmlns.com/foaf/0.1/primaryTopic,749805
> http://purl.org/dc/terms/temporal,99731
> http://www.w3.org/ns/adms#status,749805
> http://www.w3.org/ns/dcat#byteSize,13568
> http://spdx.org/rdf/terms#checksum,6507
> http://purl.org/dc/terms/publisher,309489
> http://purl.org/dc/terms/language,418733
> http://www.w3.org/ns/dcat#theme,453902
> http://purl.org/dc/terms/relation,1
> http://www.w3.org/1999/02/22-rdf-syntax-ns#type,2418170
> http://purl.org/dc/terms/modified,1367877
> http://purl.org/dc/terms/format,105731
> http://purl.org/dc/terms/issued,1143662
> http://xmlns.com/foaf/0.1/homepage,18
> http://xmlns.com/foaf/0.1/page,74839
> http://www.w3.org/ns/dcat#mediaType,240863
> http://www.w3.org/ns/dcat#accessURL,909707
> http://purl.org/dc/terms/accrualPeriodicity,342936
> http://www.w3.org/ns/dcat#distribution,367600
> http://purl.org/dc/terms/spatial,527782
> http://purl.org/dc/terms/rights,67717
> http://purl.org/dc/terms/description,1256889
> http://www.w3.org/ns/dcat#contactPoint,332807
> http://www.w3.org/ns/dcat#keyword,701558
> http://www.w3.org/ns/dcat#landingPage,281865
> http://www.w3.org/ns/dcat#downloadURL,239767
> http://purl.org/dc/terms/title,1457491
> http://www.w3.org/ns/dcat#record,77
> http://purl.org/dc/terms/license,253977
> http://purl.org/dc/terms/identifier,749707
> http://www.w3.org/ns/dcat#dataset,75
> Cheers,
> Riccardo
> 
> On 13 July 2017 at 11:11, Makx Dekkers <mail@makxdekkers.com> wrote:
> 
>     Jaroslav, Karen,
> 
>     It is indeed good to consider actual usage of DCAT in real life.
> 
>     From my point of view, it would be really interesting to have a statistical analysis of the use of the various elements of DCAT. There are large collections of DCAT data available for analysis -- for example, https://www.europeandataportal.eu/ makes available descriptions of three-quarters of a million datasets in DCAT through its SPARQL endpoint. It's just a matter of someone having the time and resources to do such an analysis.
> 
>     For a list of countries and organisations in Europe that have DCAT profiles already in operation, see https://ec.europa.eu/isa2/solutions/dcat-application-profile-data-portals-europe_en.
> 
>     Makx.
> 
> 
>     -----Original Message-----
>     From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>     Sent: 13 July 2017 00:45
>     To: Jaroslav Pullmann <jaroslav.pullmann@fit.fraunhofer.de>
>     Cc: public-dxwg-wg@w3.org
>     Subject: Re: Organizing Use Cases for F2F / Agenda proposal
> 
>     Thanks, Jaroslav,
> 
>     You may have seen that Caroline and I have put together an agenda for the meeting. Unfortunately I begin traveling tomorrow so she and I will probably not meet again before Monday, but we will compare your list with ours and we can always make adjustments as we discuss during the F2F. I think that the categorizing of the use cases into groups is very useful, and seeing them from different points of view also helps.
> 
>     As for the introductory part that you propose, some of that may be best suited to the DCAT 1.1 sub-group, which hopefully will begin meeting soon. It does seem logical that a first step would be to survey the uses of DCAT and the DCAT-related APs that exist today. I would definitely encourage the sub-group to get started on such an analysis as soon as possible. That group, of course, can also come back with new use cases and requirements to propose to the "plenary" working group.
> 
>     Let's try to use some of our time (including coffee breaks) to get some action going in the sub-groups that are forming.
> 
>     kc
>     p.s. From my Dublin Core perspective I totally agree with your "simpler is better" hypothesis. For DC I did some work similar to what you propose:
>     * https://kcoyle.blogspot.com/2013/10/who-uses-dublin-core-original-15.html
>     * https://kcoyle.blogspot.com/2013/10/who-uses-dublin-core-dcterms.html
>     * https://kcoyle.blogspot.com/2013/10/dublin-core-usage-in-lod.html
> 
>     That last one seems to prove your point.
> 
>     On 7/12/17 3:17 PM, Jaroslav Pullmann wrote:
>      >
>      >   Dear Karen, dear all
>      >
>      >   Within an introductory part we may want to summarize the status quo. Some questions to look at in session 1) would be:
>      >
>      >      - How is DCAT currently used?
>      >      - Do data portals, catalogs etc. use the DCAT core model or rely on a specific application profile?
>      >      - What is the average complexity of DCAT Dataset descriptions?
>      >      - Are some parts of the vocabulary seldom used or not used at all (e.g. CatalogRecord, theme)? This would point to deprecation candidates.
>      >      - What user/service interfaces and search options are available in order to exploit the DCAT potential?
>      >
>      >   Would the above support the hypothesis that the main and obvious benefit of DCAT is its simplicity
>      >   and brevity where, in the end, only a subset of the vocabulary is used? If this is the case, does
>      >   it imply that we should be conservative in extending the core model and add only a few, well-argued
>      >   properties while substantially extending guidance on using the existing ones?
>      >
>      >  Afterwards I suggest sorting and discussing the UCs from a
>      > user/task-oriented perspective, i.e. how would the various
>      > stakeholders make use of the DCAT concepts in order to perform their
>      > task (describe/publish/find/retrieve a data set etc.)?
>      >
>      >  *Catalog*
>      >
>      >    Some motivating questions:
>      >
>      >     - How are the Catalogs found at all, e.g. using Web search engines that evaluate RDFa metadata?
>      >     - How does DCAT help a data publisher identify an appropriate (specialized) Catalog in which to publish her data?
>      >     - E.g. are Concept schemes available/used as a means of annotation and browsing?
>      >
>      >    Session 2)
>      >
>      >    - ID40: Discoverability by mainstream search engines
>      >    - ID35: Datasets and catalogues
>      >    - ID25: Distribution and synchronization of catalog information
>      >
>      >    - Identification of missing UC with regard to Catalog
>      >
>      >  *DataSet*
>      >
>      >    Some motivating questions:
>      >
>      >     - Is there general guidance on the (minimal) amount of detail for describing a data set?
>      >     - What are the search strategies to look for a data set?
>      >     - How does temporal/spatial, keyword or theme annotation help data consumers to locate relevant data sets given a particular information need?
>      >
>      >   Session 3 +4)
>      >
>      >    Annotation properties, description and documentation
>      >    - ID33: Summarization/Characterization of datasets
>      >
>      >   Dataset concept analysis
>      >   - ID8: Scope or type of dataset with a DCAT description
>      >   - ID20: Modelling resources different from datasets
>      >   - ID36: Cross-vocabulary relationships (comparison of Dataset
>      > concepts)
>      >
>      >   Dataset (& Distribution) versioning
>      >   - ID4: Dataset Versioning Information
>      >
>      >   Dataset co-relation and organization
>      >   - ID32: Relationships between Datasets
>      >
>      >   Dataset semantics
>      >   - ID7: Support associating fine-grained semantics for datasets and
>      > resources within a dataset
>      >
>      >   Session 5 + 6)
>      >
>      >   Data quality, precision and accuracy
>      >   - ID15: Modeling data precision and accuracy
>      >   - ID16: Modeling conformance test results on data quality
>      >   - ID14: Data quality modeling patterns
>      >   - ID23: Data Quality Vocabulary
>      >
>      >   Scope and context
>      >   - ID28: Modeling reference systems
>      >   - ID29: Modeling spatial coverage
>      >   - ID38: Time-related aspects
>      >   - ID27: Modeling temporal coverage
>      >
>      >   Provenance, actors and obligations
>      >    - ID12: Modeling data lineage
>      >    - ID13: Modeling agent roles
>      >    - ID31: Modeling funding sources
>      >
>      >    Usage control
>      >    - ID17: Data access restrictions
>      >
>      >    - Identification of missing UC with regard to Dataset
>      >
>      >  Tuesday
>      >
>      >   Session 7)
>      >
>      >   *Distribution*
>      >
>      >    Some motivating questions:
>      >    - Are there distribution patterns evident from 1)?
>      >    - How is an interactive Distribution endpoint to be described to enable human or service-based interaction?
>      >    - Should we consider PUB/SUB protocols like MQTT?
>      >
>      >   - ID1: DCAT packaged distributions
>      >
>      >   Interactive, dynamic access
>      >   - ID6: DCAT Distribution to describe web services
>      >   - ID18: Modeling service-based data access
>      >   - ID21: Machine actionable link for a mapping client
>      >   - ID22: Template link in metadata
>      >
>      >   - ID34: Relationships between Distributions of a Dataset
>      >
>      >   - Identification of missing UC with regard to Distribution
>      >
>      >  Session 8 + 9)
>      >
>      >  *DCAT Profiles*
>      >
>      >   - ID10: Requirements for data citation
>      >   - ID9: Common requirements for scientific data
>      >   - ID24: Harmonising INSPIRE-obligations and DCAT-distribution
>      >   - ID37: Europeana profile ecosystem
>      >
>      >  *Profile querying and negotiation*
>      >
>      >   - ID2: Specifying media type interpretation beyond the content type
>      >   - ID3: Combining multiple types of content sections in a single response
>      >   - ID5: Discover available content profiles
>      >   - ID30: Standard APIs for metadata profile negotiation
>      >
>      >   - Identification of missing UC with regard to Profile handling
>      >
>      >  Session 10)
>      >
>      >  *Metamodel*
>      >   - ID11: Modeling identifiers and making them actionable
>      >   - ID19: Guidance on the use of qualified forms
>      >   - ID26: Extension points to 3rd party vocabularies
>      >
>      >   - Identification of missing UC with regard to meta-modeling, methodology etc.
>      >
>      >    Given the overall time frame of 12h I assumed approx. 60min + 10 min break per session.
>      >
>      >      Best regards
>      >    Jaroslav
>      >
>      >
>      >
>      >
>      > On Thursday, July 6, 2017 21:13 CEST, Karen Coyle <kcoyle@kcoyle.net> wrote:
>      >
>      >> In preparing for the face-to-face, Caroline and I would like to ask
>      >> the group, especially the UCR editors, to suggest what they see as
>      >> the logical groupings for our discussion sessions. It would be ideal
>      >> for us to have this by the end of the working day (European time) on Tuesday.
>      >>
>      >> We have eight 90-minute slots that we can make use of. If we assume
>      >> that at least part of the first slot will be introductions and
>      >> establishing an overall working hypothesis, then we have 7 slots in
>      >> which to discuss actual use cases. We may also wish to reserve 30
>      >> minutes at the end of the second day to prepare a list of missing use
>      >> cases and immediate tasks relating to this deliverable.
>      >>
>      >> Remember that the primary goal of the F2F meeting is to provide the
>      >> UCR editors with the information and decisions that they need to
>      >> create a First Public Working Draft of the Use Cases and
>      >> Requirements. A FPWD is a "heart-beat" document that is not expected
>      >> to be final but that gives the W3C management and community an
>      >> indication of the direction of the group, as well as proof that it is
>      >> indeed getting its work done. We will expect the UCR to be issued in
>      >> additional versions as the work progresses. Our goal for the FPWD is
>      >> to meet the August 9 W3C deadline for publishing documents, which
>      >> means that the group needs to approve the document before that.
>      >>
>      >> Also, it would be good to have by the end of the Oxford meeting an
>      >> idea of how the DCAT group will proceed once the UCR FPWD is in
>      >> place. We should also determine if the work so far informs the Profile and
>      >> Content Negotiation groups, or if we have more to do in gathering use
>      >> cases in those areas.
>      >> --
>      >> Karen Coyle
>      >> kcoyle@kcoyle.net http://kcoyle.net
>      >> m: 1-510-435-8234 (Signal)
>      >> skype: kcoylenet/+1-510-984-3600
>      >>
>      >
>      >
>      >
>      >
> 
>     --
>     Karen Coyle
>     kcoyle@kcoyle.net http://kcoyle.net
>     m: 1-510-435-8234 (Signal)
>     skype: kcoylenet/+1-510-984-3600
> 
> 
> 
> 
> 
> -- 
> ----------------------------------------------------------------------------
> Riccardo Albertoni
> Istituto per la Matematica Applicata e Tecnologie Informatiche "Enrico Magenes"
> Consiglio Nazionale delle Ricerche
> via de Marini 6 - 16149 GENOVA - ITALIA
> tel. +39-010-6475624 - fax +39-010-6475660
> e-mail: Riccardo.Albertoni@ge.imati.cnr.it
> Skype: callto://riccardoalbertoni/
> LinkedIn: http://www.linkedin.com/in/riccardoalbertoni
> www: http://www.imati.cnr.it/
> http://pers.ge.imati.cnr.it/albertoni/PersonalPage/albertoni.html
> FOAF:http://purl.oclc.org/NET/RiccardoAlbertoni/foaf

-- 
Jaroslav Pullmann
Fraunhofer Institute for Applied Information Technology FIT
User-Centered Ubiquitous Computing
Schloss Birlinghoven | D-53757 Sankt Augustin | Germany
Phone: +49-2241-143620 | Fax: +49-2241-142146
Received on Friday, 14 July 2017 09:03:37 UTC