
Re: using DCAT for scraped data

From: Gray, Alasdair J G <A.J.G.Gray@hw.ac.uk>
Date: Tue, 6 Jun 2017 13:04:37 +0000
To: "Simon.Cox@csiro.au" <Simon.Cox@csiro.au>
CC: Phil Archer <phila@w3.org>, "public-dxwg-comments@w3.org" <public-dxwg-comments@w3.org>, "semantic-web@w3.org" <semantic-web@w3.org>, "longo@dmi.unict.it" <longo@dmi.unict.it>
Message-ID: <F9541BEE-3444-4CA6-BB13-7AAAD9AC61A7@hw.ac.uk>
We had some similar use cases in the Open PHACTS project, where we wanted to capture that one dataset had been derived from another. We used PAV properties to capture where and when the source data was retrieved, and by whom. See section 4 of
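As a rough illustration of the kind of retrieval provenance described above, the sketch below emits three N-Triples statements. It assumes the properties meant are retrievedFrom, retrievedOn and retrievedBy from the PAV ontology (namespace http://purl.org/pav/); the function name and all example.org URIs are hypothetical and are not taken from the Open PHACTS documents.

```python
from datetime import datetime, timezone

# Assumption: the PAV (Provenance, Authoring and Versioning) namespace.
PAV = "http://purl.org/pav/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def retrieval_provenance(dataset, source, retrieved_on, agent):
    """Return N-Triples lines saying where, when and by whom `source` was retrieved.

    `retrieved_on` must be a timezone-aware datetime; it is normalised to UTC.
    """
    stamp = retrieved_on.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        f"<{dataset}> <{PAV}retrievedFrom> <{source}> .",
        f'<{dataset}> <{PAV}retrievedOn> "{stamp}"^^<{XSD_DATETIME}> .',
        f"<{dataset}> <{PAV}retrievedBy> <{agent}> .",
    ]


# Hypothetical identifiers, for illustration only.
lines = retrieval_provenance(
    dataset="http://example.org/dataset/derived",
    source="http://example.org/dataset/original",
    retrieved_on=datetime(2017, 6, 3, 14, 32, 30, tzinfo=timezone.utc),
    agent="http://example.org/people/scraper-developer",
)
print("\n".join(lines))
```

Plain string formatting is used here only to keep the sketch free of dependencies; an RDF library would normally handle escaping and serialisation.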

Best regards,


On 4 Jun 2017, at 22:12, Simon.Cox@csiro.au wrote:

Yes, looks like a well thought out use case.

From: Phil Archer <phila@w3.org>
Sent: Sunday, 4 June 2017 7:29:57 PM
To: public-dxwg-comments@w3.org
Cc: Semantic Web IG; Cristiano Longo
Subject: Fwd: using DCAT for scraped data

Forwarding to the Dataset Exchange WG [1], recently launched, which is
chartered to work on DCAT. This may be a use case.


[1] https://www.w3.org/2017/dxwg/

-------- Forwarded Message --------
Subject: using DCAT for scraped data
Resent-Date: Sat, 03 Jun 2017 14:33:25 +0000
Resent-From: semantic-web@w3.org
Date: Sat, 3 Jun 2017 16:32:30 +0200
From: Cristiano Longo <longo@dmi.unict.it>
To: semantic-web@w3.org, Alessio Cimarelli <alessio.cimarelli@gmail.com>

Dear All,

I'm writing from a Hackathon at the Open Data Fest 2017
(http://opendatafest.it) in Sicily. We are building an ontology of Albo POP
(http://albopop.it) using DCAT and its specialization DCAT-AP_IT.
Roughly speaking, an Albo POP is an automated tool which provides, as an RSS
feed, a set of notices and announcements from a Public Administration (usually
a municipality) by scraping the notices from the web site of the Public
Administration itself.

We use DCAT to model the RSS feed we provide as a distribution, but we
would like to make explicit that the data come from the public
administration. We adopted the following:

a) we reference the notices web page of the public administration using the
source property of the Dublin Core Terms vocabulary, attached to the

b) as rights holder we specify the municipality, and

c) as publisher we indicate the developer who created the scraper which
converts the notices page to RSS.

An example is attached to this mail.
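
Independently of the attachment, the three choices in a) to c) can be sketched as a handful of triples. This is only an illustration under stated assumptions: the sentence in a) is cut off, so attaching dct:source to the distribution itself is a guess, and the function names and every example.org URI are hypothetical. Only the vocabulary terms (dcat:Distribution, dct:source, dct:rightsHolder, dct:publisher) are real.

```python
DCAT = "http://www.w3.org/ns/dcat#"
DCT = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe_feed(feed, notices_page, municipality, developer):
    """Return (subject, predicate, object) URI triples for one Albo POP RSS feed."""
    return [
        (feed, RDF_TYPE, DCAT + "Distribution"),
        # a) the scraped notices page; attaching it to the feed is an assumption
        (feed, DCT + "source", notices_page),
        # b) the municipality as rights holder
        (feed, DCT + "rightsHolder", municipality),
        # c) the developer of the scraper as publisher
        (feed, DCT + "publisher", developer),
    ]


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Hypothetical identifiers, for illustration only.
triples = describe_feed(
    feed="http://example.org/albopop/feed.rss",
    notices_page="http://example.org/comune/albo-pretorio",
    municipality="http://example.org/comune",
    developer="http://example.org/people/scraper-developer",
)
print(to_ntriples(triples))
```

Whether the developer or the municipality is the better fit for dct:publisher is exactly the question raised in this message; the sketch simply mirrors the choice described above.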

We would like to know whether this approach can be considered acceptable. Any
suggestions are welcome.

Thanks in advance,

Cristiano Longo

Alasdair J G Gray
Fellow of the Higher Education Academy
Assistant Professor in Computer Science,
School of Mathematical and Computer Sciences
(Athena SWAN Bronze Award)
Heriot-Watt University, Edinburgh UK.

Email: A.J.G.Gray@hw.ac.uk
Web: http://www.macs.hw.ac.uk/~ajg33
ORCID: http://orcid.org/0000-0002-5711-4872
Office: Earl Mountbatten Building 1.39
Twitter: @gray_alasdair


Received on Tuesday, 6 June 2017 13:05:25 UTC
