Re: ISSUE-12 (valuesForDataFormat): What values to use to describe formats of dcat:Distribution? [DCAT]

I did forget one, didn't I: the PRONOM work that Dave Reynolds pointed 
us to. That looks like the kind of thing we need, but I'll look at it in 
more detail.
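
To make the "stable URI if available, falling back to the MIME type" rule quoted below a bit more concrete, here is a minimal Python sketch. Both lookup tables are hypothetical and hand-written for illustration, not a proposed registry; the only format URI in it is the /ns/formats one from the thread, and the function name describe_format is made up. Note that all the Excel labels collapse onto one media type, which is exactly the lack of 1-1 mapping discussed below.

```python
import re

# Hypothetical table: media type -> stable format URI (illustrative only).
FORMAT_URIS = {
    "application/rdf+xml": "http://www.w3.org/ns/formats/RDF_XML",
}

# Hypothetical table: free-text labels found in catalogues -> media type.
# "rdf" is deliberately absent because it is too ambiguous to map.
LABEL_TO_MEDIA_TYPE = {
    "rdf/xml": "application/rdf+xml",
    "xls": "application/vnd.ms-excel",
    "excel": "application/vnd.ms-excel",
    "excel 95": "application/vnd.ms-excel",
    "ms office spreadsheet": "application/vnd.ms-excel",
}

# Loose syntactic check for "type/subtype"; it does not consult any registry.
MEDIA_TYPE_RE = re.compile(r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*$")


def describe_format(label):
    """Return ("uri", u), ("media-type", m) or ("unknown", original label)."""
    key = " ".join(label.split()).lower()
    # Check the label table first: "RDF/XML" looks like a media type but is not one.
    if key in LABEL_TO_MEDIA_TYPE:
        media_type = LABEL_TO_MEDIA_TYPE[key]
    elif MEDIA_TYPE_RE.match(key):
        media_type = key
    else:
        return ("unknown", label)
    # Prefer a stable URI; otherwise fall back to the media type string.
    if media_type in FORMAT_URIS:
        return ("uri", FORMAT_URIS[media_type])
    return ("media-type", media_type)
```

Anything that comes back as "unknown" would still need a human to look at it, which is the maintenance cost mentioned below.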



On 14/02/2012 17:51, Phil Archer wrote:
> Pls see below.
>
> On 14/02/2012 17:04, Richard Cyganiak wrote:
>> On 10 Feb 2012, at 16:27, Phil Archer wrote:
>>> I think I'm happiest with, again, pointing to the "this is what we
>>> mean by a stable URI scheme" in the best practices doc and then maybe
>>> giving one or two examples and finally saying that if they can't find
>>> one then "foo/bar" is a reasonable fall-back.
>>
>> Can't we do better than this?
>
> Let's try.
>
>>
>> The goal is interoperability. If five catalogs use these five
>> different ways of denoting RDF/XML:
>>
>> <http://dbpedia.org/resource/RDF/XML>
>> <http://www.w3.org/ns/formats/RDF_XML>
>> "RDF/XML"
>> "application/rdf+xml"
>> "rdf"
>>
>> … then we have failed to reach the goal of interoperability.
>
> True.
>
>>
>> IMO the requirements for our recommended representation of formats are:
>>
>> 1. Easy to convert from the de facto standard identifiers (IETF media
>> types) to our chosen representation (ideally incl. possibility to
>> validate)
>
> Yes. From your list I'd say that leaves us with any of
>
> <http://dbpedia.org/resource/RDF/XML>
> <http://www.w3.org/ns/formats/RDF_XML>
> "application/rdf+xml"
>
> i.e. remove "rdf" as it's way too ambiguous. Also "RDF/XML". If we just
> suggest people use a string it's not controlled enough and we'll get
> junk (the Excel example being an excellent case in point).
>
>>
>> 2. Reasonably complete coverage of file formats
>
> Yes. That knocks out /ns/formats which is currently very small.
>
> <http://dbpedia.org/resource/RDF/XML>
> "application/rdf+xml"
>
>>
>> 3. Ability to handle existing data that doesn't use a controlled
>> vocabulary (e.g., what if you have all of “XLS”, “Excel”, “Excel 95”,
>> “MS Office Spreadsheet” in the input data?)
>
> And here's where we hit the reason why Ivan created /ns/formats. MIME
> types don't provide a 1-1 mapping to actual file formats. So I'm going
> to put it back in and knock out the plain MIME type.
>
> <http://dbpedia.org/resource/RDF/XML>
> <http://www.w3.org/ns/formats/RDF_XML>
>
>>
>> 4. Some recommendation for how to deal with file formats that are not
>> registered anywhere, e.g., shapefiles
>
> Wikipedia doesn't distinguish between the different versions of Excel
> (and presumably the same is true for Word, PPT etc. I didn't bother to
> check). Also, is there a wiki/DBpedia entry for every file format?
>
> Which drives me, with all due respect and acknowledgement to the person
> I'm about to say this to, to knock out DBpedia to leave us with:
>
> <http://www.w3.org/ns/formats/RDF_XML>
>
> Another issue is the push-back from governments on using DBpedia. It may
> not be considered stable enough (I know, I know...) and if we propose
> something that people don't like, well, the outcome is obvious.
>
> The /ns/formats solution has come up in the ADMS work as being
> "something it would be really good to have extended." It's not a huge
> job to do this, but it is a human task. And that means it needs
> maintenance.
>
> We would need to set up a system that made it easy to add new entries
> (at the moment you have to write a few files, update a .htaccess file and
> what have you). So, OK, let's take this out again which leaves us with:
>
> <empty />
>
>
> Have we overlooked anything?
>
> Well, there's Ed Summers' work that Michael pointed us to
> http://mediatypes.appspot.com/ which gets around most of the problems
> but still leaves us with the lack of 1-1 mapping. But that might be the
> best we can do.
>
> To be usable, we'd need to bring it into w3.org or some other
> über-stable domain. And that means sys team support. It's not
> impossible, especially as the code exists, and if there's a community
> willing to maintain it, OK, but I wonder if this is the time to test out
> Sandro's idea of a single domain for a single purpose.
>
> That would mean setting up, say, fileformats.org (it's for sale) and
> then managing it as part of the W3C 'estate'. That's probably an even
> higher hurdle than getting the necessary permissions to run Ed's code on
> w3.org.
>
> I really hope that others can blow a hole in my thinking and point out
> the easy answer!
>
> Phil.
>
>>
>>>
>>> On 10/02/2012 15:51, Michael Hausenblas wrote:
>>>>
>>>> Or, why not re-deploy Ed's excellent http://mediatypes.appspot.com/
>>>> under a W3C domain? :)
>>>>
>>>> Cheers,
>>>> Michael
>>>> --
>>>> Dr. Michael Hausenblas, Research Fellow
>>>> LiDRC - Linked Data Research Centre
>>>> DERI - Digital Enterprise Research Institute
>>>> NUIG - National University of Ireland, Galway
>>>> Ireland, Europe
>>>> Tel. +353 91 495730
>>>> http://linkeddata.deri.ie/
>>>> http://sw-app.org/about.html
>>>>
>>>> On 10 Feb 2012, at 15:19, Phil Archer wrote:
>>>>
>>>>> I'm getting some push-back from gov data publishers on using DBpedia
>>>>> sadly (it's third party, it's not real, it's not stable, not like all
>>>>> our wonderful government department Web sites that sometimes stay on
>>>>> line for whole months!). The PRONOM effort that Dave has highlighted
>>>>> looks like the kind of thing they'd like more - government agency to
>>>>> government agency - as long as there's no ".uk" anywhere in the URIs I
>>>>> guess.
>>>>>
>>>>> How about "use a stable URI scheme for file formats if available,
>>>>> falling back to the MIME type if not available" ?
>>>>>
>>>>> Phil.
>>>>>
>>>>>
>>>>>
>>>>> On 10/02/2012 15:06, John Erickson wrote:
>>>>>>>> The Right Thing to do would be to get IETF to mint URIs for all
>>>>>>>> media
>>>>>>>> types, and get ESRI to register a media type for their file format,
>>>>>>>> etc.
>>>>>>>> This may not be feasible.
>>>>>>
>>>>>> ...or maybe we could just follow the same, de facto convention we've
>>>>>> been following of using URIs from A Certain Third party:
>>>>>>
>>>>>> http://dbpedia.org/resource/TIFF
>>>>>> http://dbpedia.org/resource/JPEG
>>>>>> http://dbpedia.org/resource/GZIP
>>>>>>
>>>>>> ...etc. ;)
>>>>>>
>>>>>
>>>>> --
>>>>>
>>>>>
>>>>> Phil Archer
>>>>> W3C eGovernment
>>>>> http://www.w3.org/egov/
>>>>>
>>>>> http://philarcher.org
>>>>> +44 (0)7887 767755
>>>>> @philarcher1
>>>>>
>>>>
>>>>
>>>
>>
>>
>
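
On Richard's requirement 1 above (easy conversion from IETF media types, ideally with validation), a minimal Python sketch of what a mechanical mapping could look like. The base URI http://example.org/mediatypes/ is a placeholder I have made up; it is not the URI pattern of Ed's service or of anything on w3.org. The validation is purely syntactic and does not check the IANA registry.

```python
import re
from urllib.parse import quote, unquote

# Placeholder base; a real deployment would need a stable domain.
BASE = "http://example.org/mediatypes/"

# Syntactic check for "type/subtype" only; registration is not verified.
MEDIA_TYPE_RE = re.compile(r"^[a-z0-9][a-z0-9!#$&^_.+-]*/[a-z0-9][a-z0-9!#$&^_.+-]*$")


def media_type_to_uri(media_type):
    """Map a media type string to a URI under BASE, or raise ValueError."""
    normalised = media_type.strip().lower()
    if not MEDIA_TYPE_RE.match(normalised):
        raise ValueError("not a syntactically valid media type: %r" % media_type)
    # Keep "/" and "+" readable in the path; escape everything else unsafe.
    return BASE + quote(normalised, safe="/+")


def uri_to_media_type(uri):
    """Inverse mapping: recover the media type from a URI under BASE."""
    if not uri.startswith(BASE):
        raise ValueError("URI is not under %s" % BASE)
    return unquote(uri[len(BASE):])
```

This gives a reversible 1-1 mapping between media types and URIs, but it inherits the limitation that one media type can cover several actual file formats.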

-- 


Phil Archer
W3C eGovernment
http://www.w3.org/egov/

http://philarcher.org
+44 (0)7887 767755
@philarcher1

Received on Tuesday, 14 February 2012 23:08:03 UTC