Re: Notes from today's meeting

Hi Jerven,


On Mon, Jun 3, 2013 at 6:30 PM, Jerven Bolleman <me@jerven.eu> wrote:

> Hi All,
>
> I wanted to discuss one more thing that has been decided in an earlier
> meeting.
> And that is the choice for dcterms:created or pav:createdOn.
> As a large data provider I want to only share the date that I published
> the data on.
> i.e. dcterms:issued. Could we change the MUST to include issued next to
> created or createdOn?
>
> This is also a crucial date for the general public while created is not.
> (e.g. for patent court cases date of publication is critical, the day the
> file was internally ready is not)
>
>
Under the availability section, we have yet to discuss "issued". From a
provenance perspective, "created" is the primary metadata, and it may
coincide with "issued" in some cases.



> Also continuing the discussion on e-mail that started on the call.
> We should have a clear definition of data item if we are going to record
> information about such things. e.g. baseURI, what happens if we have 2 data
> item types in a single dataset?
>
>
Ultimately, what I want is to:
i) validate the syntax of an identifier in some dataset or cross-reference
(legacy, RDF)
ii) compose a URI from a preferred or alternative prefix and an
identifier (legacy to RDF)
iii) decompose a URI into a preferred prefix and identifier pair (RDF to
legacy)
iv) translate one URI pattern to another URI pattern (RDF)
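
The four operations above can be sketched roughly as follows. This is only
an illustration under assumptions: the registry layout, the function names,
the simplified accession pattern, and the choice of prefixes are all
hypothetical and not something the group has agreed on.

```python
import re

# Hypothetical registry entry. The prefixes and the (simplified) identifier
# pattern are illustrative assumptions, not an agreed specification.
REGISTRY = {
    "uniprot": {
        "pattern": re.compile(r"[A-Z][0-9][A-Z0-9]{3}[0-9]"),
        "preferred": "http://purl.uniprot.org/uniprot/",
        "alternatives": ["http://identifiers.org/uniprot/"],
    }
}


def validate(dataset, identifier):
    """(i) Check the identifier syntax for a dataset."""
    return REGISTRY[dataset]["pattern"].fullmatch(identifier) is not None


def compose(dataset, identifier, prefix=None):
    """(ii) Build a URI from a prefix (preferred by default) and an identifier."""
    entry = REGISTRY[dataset]
    prefix = prefix or entry["preferred"]
    if prefix != entry["preferred"] and prefix not in entry["alternatives"]:
        raise ValueError(f"unknown prefix for {dataset}: {prefix}")
    if not validate(dataset, identifier):
        raise ValueError(f"invalid {dataset} identifier: {identifier}")
    return prefix + identifier


def decompose(uri):
    """(iii) Split a URI into (dataset, preferred prefix, identifier)."""
    for dataset, entry in REGISTRY.items():
        for prefix in [entry["preferred"], *entry["alternatives"]]:
            if uri.startswith(prefix):
                identifier = uri[len(prefix):]
                if validate(dataset, identifier):
                    return dataset, entry["preferred"], identifier
    raise ValueError(f"no known prefix matches {uri}")


def translate(uri, target_prefix):
    """(iv) Rewrite a URI from one known pattern to another."""
    dataset, _, identifier = decompose(uri)
    return compose(dataset, identifier, target_prefix)
```

With such a registry, a dataset that holds two data item types would simply
carry two entries, each with its own pattern and prefixes.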



> About void:inDataset I personally don't like it. I suspect it would cost
> me a 13% growth in triple size for negligible benefits. This also means
> that the dataset description starts to affect the data. Although I could
> only present this in the REST / linked data interface and not in the SPARQL
> endpoint. I am worried that I cannot put it into the FTP data dump RDF, as
> the data item concept does not map 1:1 on a set of triples that are atomic.
>
>
I'm not sure that I completely understand your objection. The primary use
of void:inDataset is to link data items to the dataset description, and as
such it supports linked data applications without their having to look in
the graph for a potential, but not guaranteed, provenance description.
Using void:inDataset is normal practice in the RDF / linked data community.
It would be strange not to include it in an RDF dataset if you have the
dataset description.

http://www.w3.org/TR/void/#backlinks
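
As a rough illustration of what such a back-link looks like in a dump, the
sketch below emits one N-Triples line per item. The helper names are
hypothetical; only the void:inDataset property URI comes from the VoID
vocabulary, and any item and dataset URIs passed in are the caller's.

```python
# Property URI from the VoID vocabulary.
VOID_IN_DATASET = "http://rdfs.org/ns/void#inDataset"


def backlink(item_uri, dataset_uri):
    """Return one N-Triples line linking item_uri to its dataset description."""
    return f"<{item_uri}> <{VOID_IN_DATASET}> <{dataset_uri}> ."


def with_backlinks(item_uris, dataset_uri):
    """Return the back-link lines for every item (one extra triple per item)."""
    return [backlink(uri, dataset_uri) for uri in item_uris]
```

Under this sketch the cost is one additional triple per item, so the
relative growth depends on how many triples each item already has.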



> e.g. someone can use just the UniProtKB sequences. Once they have done
> that, is it still the same dataset that I published? I don't think so. Which
> means UniProt end users need to be careful to remove more triples. Which is
> why I disagree with Alasdair's call for MUST.
>
>
If one wanted to know which version/issue of UniProt the sequences came
from, it would be necessary to provide access to the dataset description.
If the void:inDataset predicate is used, the user need not even retrieve
that description to store it locally, as you should provide resolution
services for those dataset descriptions.

m.


> Regards,
> Jerven
>
>
> On Mon, Jun 3, 2013 at 6:11 PM, Alasdair J G Gray <
> Alasdair.Gray@manchester.ac.uk> wrote:
>
>>  Hi All,
>>
>> Seems that fuse threw everyone out at the crucial moment. We will pick up
>> the void:inDataset discussion next week as well as addressing the
>> provenance section.
>>
>> Attached are the notes I made during today's call.
>>
>> Alasdair
>>
>>
>>
>> Dr Alasdair J G Gray
>> Research Associate
>> Alasdair.Gray@manchester.ac.uk
>> +44 161 275 0145
>>
>> http://www.cs.man.ac.uk/~graya/
>>
>> Please consider the environment before printing this email.
>>
>>
>>
>
>
> --
> Jerven Bolleman
> me@jerven.eu
>



-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Received on Monday, 3 June 2013 16:52:30 UTC