[dxwg] Monthly DBpedia releases (#1085) from Sebastian Hellmann via GitHub on 2019-09-18 (public-dxwg-wg@w3.org from September 2019)

From: Sebastian Hellmann via GitHub <sysbot+gh@w3.org>
Date: Wed, 18 Sep 2019 16:27:33 +0000
To: public-dxwg-wg@w3.org
Message-ID: <issues.opened-495331509-1568824052-sysbot+gh@w3.org>

kurzum has just created a new issue for https://github.com/w3c/dxwg:

== Monthly DBpedia releases ==
### DBpedia Releases
Status:
Identifier: https://databus.dbpedia.org/dbpeda/
Creator: Sebastian Hellmann

### Description
We are now releasing several thousand files per month, and I have specific questions about `dcat:Distribution`.

In our case, we group each version according to the generating Scala code: https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/2019.09.01
In this example, the code is run each month over 40 different Wikipedia dumps and generates 40 different files, one per language variant. Together these files make up the dataset, and each file is a partial distribution. See the metadata here:
https://downloads.dbpedia.org/repo/lts/mappings/mappingbased-objects/2019.09.01/dataid.ttl#Dataset

I could not find an appropriate model in the current draft to describe this properly. Our data is more structured than the bag-of-files approach, as it uses the Maven model with group/artifact/version and then content/format/compression variants.
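To make the structure concrete, here is a minimal sketch of how a version URL like the one above breaks down into Maven-style coordinates. The class and function names are hypothetical, chosen only for illustration, and the assumption that the first path segment is the publisher account is mine; this is not the Databus API.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class DatabusVersion:
    """Maven-style coordinates (hypothetical class, for illustration only)."""
    publisher: str  # assumption: first path segment is the publisher account
    group: str
    artifact: str
    version: str


@dataclass(frozen=True)
class FileVariant:
    """The three variant dimensions mentioned above (hypothetical class)."""
    content: str      # e.g. the language of the source dump
    format: str       # e.g. a serialization format
    compression: str  # e.g. a compression scheme


def parse_version_url(url: str) -> DatabusVersion:
    """Split a version URL into publisher/group/artifact/version."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) != 4:
        raise ValueError(f"expected 4 path segments, got {len(parts)}: {url}")
    return DatabusVersion(*parts)


coords = parse_version_url(
    "https://databus.dbpedia.org/dbpedia/mappings/mappingbased-objects/2019.09.01"
)
print(coords)
```

Each file in a version would then carry one `FileVariant` on top of the shared coordinates, which is the part I do not see how to express with `dcat:Distribution` alone.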

Note that we consider a different language or a different source to be a variant. All files together make up the version snapshot dataset, while you would only need a subset of the files for any given use case. A similar example would be splitting the data into consecutive compressed parts (e.g. 20 × 50 MB for 1 GB of data), with the difference that you would need all of the files there to get the distribution. How would this be modelled in the current draft?
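The difference between the two cases can be sketched as follows. This is only an illustration under assumptions: the file names, the language codes, and both helper functions are hypothetical and not taken from our actual release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantFile:
    """One file of a version snapshot (hypothetical structure)."""
    language: str
    filename: str


def select_languages(files, wanted):
    """Case 1: variants. Any subset of the files is usable on its own."""
    return [f for f in files if f.language in wanted]


def missing_parts(present, total):
    """Case 2: split parts. Return the part numbers (1..total) still missing;
    the distribution is only usable when this list is empty."""
    return sorted(set(range(1, total + 1)) - set(present))


# Hypothetical file names, for illustration only.
files = [
    VariantFile("en", "objects_lang=en.ttl.bz2"),
    VariantFile("de", "objects_lang=de.ttl.bz2"),
    VariantFile("fr", "objects_lang=fr.ttl.bz2"),
]

subset = select_languages(files, {"en", "de"})   # a valid partial use
gaps = missing_parts(present=[1, 2, 4], total=20)  # incomplete: parts missing
print([f.filename for f in subset])
print(gaps)
```

In the first case every file is a meaningful partial distribution; in the second, a single part has no meaning without the others.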

Please view or discuss this issue at https://github.com/w3c/dxwg/issues/1085 using your GitHub account

Received on Wednesday, 18 September 2019 16:27:35 UTC