Re: Some thoughts on the Q&G vocab

Hi Bernadette,

> Hi Jeremy,

> Great work! I agree with you that we shouldn't duplicate effort and that we should consider a domain-independent approach. I guess that categories and dimensions are domain-independent, whereas metrics may be domain-dependent. For example, some of the metrics described in [1] are specific to linked data. In that case, we may have more than one metric for the same dimension, each defined for a particular domain (or data model, for example). Does that make sense to you?

In a way, categories and dimensions might also be domain-dependent (a dimension that matters for one use case might not apply to another). The metrics we are defining there are based on [1]. Basically, in the project I’m working on, we identified the metrics required for each of the pilots' use cases. Yes, you are right: the same dimension can have more than one metric. For example, the Availability dimension has the RDFAvailability and SPARQLEndPointAvailability metrics, among others. A dimension per se can be assessed using different metrics.
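A minimal Python sketch of the dimension/metric relationship described above: each metric is computed on its own, and a dimension simply collects one score per metric. The `Metric` and `Dimension` classes, the boolean reachability flags and the 0/1 scoring are hypothetical simplifications, not part of daQ; only the metric and dimension names come from this thread.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stand-in for a dataset description; real inputs
# (RDF dumps, SPARQL endpoints) are not modelled here.
Dataset = Dict[str, bool]


@dataclass
class Metric:
    name: str
    # Each metric is computed independently and returns a score in [0, 1].
    compute: Callable[[Dataset], float]


@dataclass
class Dimension:
    name: str
    metrics: List[Metric] = field(default_factory=list)

    def assess(self, dataset: Dataset) -> Dict[str, float]:
        # One score per metric; no aggregation into a single dimension value.
        return {m.name: m.compute(dataset) for m in self.metrics}


# Hypothetical, deliberately simplistic checks: a real availability metric
# would dereference URIs or actually query the endpoint.
rdf_availability = Metric(
    "RDFAvailability", lambda d: 1.0 if d.get("rdf_dump_reachable") else 0.0
)
sparql_availability = Metric(
    "SPARQLEndPointAvailability",
    lambda d: 1.0 if d.get("sparql_endpoint_reachable") else 0.0,
)

availability = Dimension("Availability", [rdf_availability, sparql_availability])

scores = availability.assess(
    {"rdf_dump_reachable": True, "sparql_endpoint_reachable": False}
)
print(scores)  # {'RDFAvailability': 1.0, 'SPARQLEndPointAvailability': 0.0}
```

A domain-specific metric would just be another `Metric` added to the same dimension's list, which is one way to accommodate the domain-dependence discussed above.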

> I saw that there are more metrics than dimensions. Could you please explain the relation between metrics and dimensions? I'd like to understand whether the different metric values are combined to produce a single value; for example, are IntensionalConcisenessMetric and ExtensionalConcisenessMetric combined to produce a value for Conciseness?

A dimension can have one or more metrics; similarly, a category can have one or more dimensions. This is explained in the paper I published at LDOW [2]. What we are doing here is calculating each metric on its own and then presenting the user with views of the dataset's quality at the metric level (and eventually at the dimension level). We still need to figure out how to produce a correct aggregate value for a dimension (e.g. Conciseness), since simply averaging the metric values is a naive approach and might not reflect the dataset's actual quality.
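To illustrate why a plain average can be misleading, here is a small Python sketch of the category/dimension/metric hierarchy with two candidate aggregation functions. The scores, the "Intrinsic" category label and the minimum-based alternative are assumptions for illustration only; the thread leaves the aggregation method open.

```python
from statistics import mean
from typing import Dict

# Hypothetical hierarchy: category -> dimension -> metric -> score in [0, 1].
# The Conciseness metric names follow the thread; the scores and the
# "Intrinsic" category label are illustrative assumptions.
quality: Dict[str, Dict[str, Dict[str, float]]] = {
    "Intrinsic": {
        "Conciseness": {
            "IntensionalConcisenessMetric": 1.0,
            "ExtensionalConcisenessMetric": 0.5,
        },
    },
}


def aggregate_mean(metric_scores: Dict[str, float]) -> float:
    # The naive approach: every metric counts equally.
    return mean(metric_scores.values())


def aggregate_min(metric_scores: Dict[str, float]) -> float:
    # A pessimistic alternative: the dimension is only as good
    # as its weakest metric.
    return min(metric_scores.values())


conciseness = quality["Intrinsic"]["Conciseness"]
print(aggregate_mean(conciseness))  # 0.75
print(aggregate_min(conciseness))  # 0.5
```

With these made-up scores the mean reports 0.75 for Conciseness even though one of its two metrics only reaches 0.5; the minimum surfaces that weakness but ignores how well the other metric scores, so neither is an obvious default.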

> I was also wondering why you didn't consider dimensions like completeness and correctness in your proposal. In [2], we discussed the use of those dimensions to evaluate the relevance of web data sources.

Still in progress :) We will consider completeness, as it is described in [1]. With respect to correctness, I think some metrics are described in that survey paper (I believe they fall under the Verifiability dimension). Having said that, daQ is still in its early stages and we will improve it further, so any ideas are welcome :)

Thanks for sharing your paper. Do you have an implementation of those metrics that we can reuse?

> I think it is also possible to include the quality metadata (using daQ) in CKAN together with the information about data usage.

Yes, sure. I also think it is possible.
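One possible way to attach such metadata to CKAN, sketched in Python under several assumptions: CKAN datasets accept free-form key/value "extras" with string values, so per-metric scores could be flattened into that list and sent through an update action such as package_patch. The "quality:" key prefix, the dataset id and the scores are hypothetical, and this only carries flattened scores, not the daQ RDF graph itself.

```python
import json
from typing import Dict, List

# Hypothetical metric scores produced by a daQ-based assessment.
metric_scores: Dict[str, float] = {
    "RDFAvailability": 1.0,
    "SPARQLEndPointAvailability": 0.0,
}


def quality_extras(scores: Dict[str, float]) -> List[Dict[str, str]]:
    # CKAN extras are key/value pairs with string values, so each score
    # is stringified. The "quality:" key prefix is a made-up convention.
    return [
        {"key": f"quality:{name}", "value": str(score)}
        for name, score in sorted(scores.items())
    ]


# Partial dataset dict of the kind an update action could receive;
# "example-dataset" is a placeholder identifier.
payload = {"id": "example-dataset", "extras": quality_extras(metric_scores)}
print(json.dumps(payload, indent=2))
```

Sending an "extras" list in an update would likely replace the dataset's existing extras, so in practice the current extras would need to be fetched and merged first.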

Cheers,
Jer


[1] http://www.semantic-web-journal.net/system/files/swj556.pdf
[2] http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_09.pdf

Received on Tuesday, 6 May 2014 13:48:25 UTC