The meaning of publishing Data - quality issues

Hi,

Following up on what I asked at the last telecon...

The discussion on what counts as data for publication on the web seems to be reaching some agreement. For PDF files, the 'semantic gap' is actually not so big between their content (esp. text, tables) and what is targeted in more 'conservative' approaches to data (CSV, relational tables, RDF...), even if it's more cumbersome to extract data from some formats.

I feel, however, that we're facing a different range of issues for the other kinds of files mentioned on Friday, which also appear in the use cases: videos, pictures, etc.

So should we count them as data too? I will not try to answer that myself here, but I'm interested in the consequences for the quality and granularity vocabulary.

So, those of you who think JPEG, AVI, etc. would count as data: should these be included in data quality assessments?
And if so, what sort of metrics would you expect for testing 'media' files along quality dimensions such as the ones listed in our UCR doc [1]?

Best,

Antoine

[1] http://www.w3.org/TR/dwbp-ucr/#requirements-for-quality-and-granularity-description-vocabulary

Received on Sunday, 29 March 2015 15:11:24 UTC