
Re: ISSUE-12 (valuesForDataFormat): What values to use to describe formats of dcat:Distribution? [DCAT]

From: Phil Archer <phila@w3.org>
Date: Tue, 14 Feb 2012 17:51:47 +0000
Message-ID: <4F3A9F33.4060407@w3.org>
To: Richard Cyganiak <richard@cyganiak.de>
CC: Michael Hausenblas <michael.hausenblas@deri.org>, Government Linked Data Working Group WG <public-gld-wg@w3.org>
Pls see below.

On 14/02/2012 17:04, Richard Cyganiak wrote:
> On 10 Feb 2012, at 16:27, Phil Archer wrote:
>> I think I'm happiest with, again, pointing to the "this is what we mean by a stable URI scheme" in the best practices doc and then maybe giving one or two examples and finally saying that if they can't find one then "foo/bar" is a reasonable fall-back.
> Can't we do better than this?

Let's try,

> The goal is interoperability. If catalogs use these five different ways of denoting RDF/XML:
>      <http://dbpedia.org/resource/RDF/XML>
>      <http://www.w3.org/ns/formats/RDF_XML>
>      "RDF/XML"
>      "application/rdf+xml"
>      "rdf"
> … then we have failed to reach the goal of interoperability.


> IMO the requirements for our recommended representation of formats are:
> 1. Easy to convert from the de facto standard identifiers (IETF media types) to our chosen representation (ideally incl. possibility to validate)

Yes. From your list I'd say that leaves us with any of

     <http://dbpedia.org/resource/RDF/XML>
     <http://www.w3.org/ns/formats/RDF_XML>
     "application/rdf+xml"

i.e. remove rdf as it's way too ambiguous. Also RDF/XML. If we just 
suggest people use a string it's not controlled enough and we'll get 
junk (the Excel example being an excellent case in point).
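To make requirement 1 concrete, here is a minimal sketch of converting a media type into a URI with a basic validity check. The base URI, the function name, and the regular expression are all assumptions made for illustration; nothing in this thread defines such a scheme, and the check is only syntactic plus a test against the long-standing top-level types, not a registry lookup.

```python
import re

# Long-standing top-level media types. A real validator would consult the
# IANA registry instead of this hard-coded set.
TOP_LEVEL_TYPES = {
    "application", "audio", "image", "message",
    "model", "multipart", "text", "video",
}

# Approximate syntax for a type or subtype name (illustrative only).
_NAME = r"[a-z0-9][a-z0-9!#$&^_.+-]{0,126}"
MEDIA_TYPE_RE = re.compile(rf"^({_NAME})/({_NAME})$")

# Hypothetical namespace, chosen only for this example.
BASE_URI = "http://example.org/mediatypes/"


def media_type_to_uri(media_type: str) -> str:
    """Return a format URI for a plausible media type, else raise ValueError."""
    candidate = media_type.strip().lower()
    match = MEDIA_TYPE_RE.match(candidate)
    if match is None or match.group(1) not in TOP_LEVEL_TYPES:
        raise ValueError(f"not a recognised media type: {media_type!r}")
    return BASE_URI + candidate
```

With this check, "application/rdf+xml" converts cleanly, while "rdf" and "RDF/XML" are rejected rather than silently turned into identifiers.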

> 2. Reasonably complete coverage of file formats

Yes. That knocks out /ns/formats which is currently very small.


> 3. Ability to handle existing data that doesn't use a controlled vocabulary (e.g., what if you have all of “XLS”, “Excel”, “Excel 95”, “MS Office Spreadsheet” in the input data?)

And here's where we hit the reason why Ivan created /ns/formats. MIME 
types don't provide a 1-1 mapping to actual file formats. So I'm going 
to put it back in and knock out the plain MIME type.
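As a rough illustration of requirement 3, the sketch below maps uncontrolled labels onto a single media type. The lookup table and the function name are hypothetical, and the choice of "application/vnd.ms-excel" for every label is an assumption made for the example. Note that it also shows the limitation just described: "Excel 95" and plain "Excel" collapse to the same value, so the version information is lost.

```python
from typing import Optional

# Hypothetical lookup table: free-text labels found in catalog data, mapped
# to a media type. The entries are examples only, not a complete list.
LABEL_TO_MEDIA_TYPE = {
    "xls": "application/vnd.ms-excel",
    "excel": "application/vnd.ms-excel",
    "excel 95": "application/vnd.ms-excel",
    "ms office spreadsheet": "application/vnd.ms-excel",
}


def normalise_label(label: str) -> Optional[str]:
    """Map a free-text format label to a media type, or None if unknown."""
    # Lower-case and collapse runs of whitespace before the lookup.
    key = " ".join(label.lower().split())
    return LABEL_TO_MEDIA_TYPE.get(key)
```

A label that is not in the table, such as "Shapefile", yields None, which is where requirement 4 comes in.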


> 4. Some recommendation for how to deal with file formats that are not registered anywhere, e.g., shapefiles

Wikipedia doesn't distinguish between the different versions of Excel 
(and presumably the same is true for Word, PPT etc. I didn't bother to 
check). Also, is there a wiki/DBpedia entry for every file format?

Which drives me, with all due respect and acknowledgement to the person 
I'm about to say this to, to knock out DBpedia to leave us with:

     <http://www.w3.org/ns/formats/RDF_XML>

Another issue is the push-back from governments on using DBpedia. It may 
not be considered stable enough (I know, I know...) and if we propose 
something that people don't like, well, the outcome is obvious.

The /ns/formats solution has come up in the ADMS work as being 
"something it would be really good to have extended." It's not a huge 
job to do this, but it is a human task. And that means it needs 

We would need to set up a system that made it easy to add new entries 
(at the moment you have to write a few files, update a .htaccess file and 
what have you). So, OK, let's take this out again which leaves us with:

<empty />

Have we overlooked anything?

Well, there's Ed Summers' work that Michael pointed us to 
http://mediatypes.appspot.com/ which gets around most of the problems 
but still leaves us with the lack of 1-1 mapping. But that might be the 
best we can do.

To be usable, we'd need to bring it into w3.org or some other 
über-stable domain. And that means sys team support. It's not 
impossible, especially as the code exists, and if there's a community 
willing to maintain it, OK, but I wonder if this is the time to test out 
Sandro's idea of a single domain for a single purpose.

That would mean setting up, say, fileformats.org (it's for sale) and 
then managing it as part of the W3C 'estate'. That's probably an even 
higher hurdle than getting the necessary permissions to run Ed's code on 

I really hope that others can blow a hole in my thinking and point out 
the easy answer!


>> On 10/02/2012 15:51, Michael Hausenblas wrote:
>>> Or, why not re-deploy Ed's excellent http://mediatypes.appspot.com/
>>> under a W3C domain? :)
>>> Cheers,
>>> Michael
>>> --
>>> Dr. Michael Hausenblas, Research Fellow
>>> LiDRC - Linked Data Research Centre
>>> DERI - Digital Enterprise Research Institute
>>> NUIG - National University of Ireland, Galway
>>> Ireland, Europe
>>> Tel. +353 91 495730
>>> http://linkeddata.deri.ie/
>>> http://sw-app.org/about.html
>>> On 10 Feb 2012, at 15:19, Phil Archer wrote:
>>>> I'm getting some push-back from gov data publishers on using DBpedia
>>>> sadly (it's third party, it's not real, it's not stable, not like all
>>>> our wonderful government department Web sites that sometimes stay on
>>>> line for whole months!). The PRONOM effort that Dave has highlighted
>>>> looks like the kind of thing they'd like more - government agency to
>>>> government agency - as long as there's no ".uk" anywhere in the URIs I
>>>> guess.
>>>> How about "use a stable URI scheme for file formats if available,
>>>> falling back to the MIME type if not available" ?
>>>> Phil.
>>>> On 10/02/2012 15:06, John Erickson wrote:
>>>>>>> The Right Thing to do would be to get IETF to mint URIs for all media
>>>>>>> types, and get ESRI to register a media type for their file format,
>>>>>>> etc.
>>>>>>> This may not be feasible.
>>>>> ...or maybe we could just follow the same, de facto convention we've
>>>>> been following of using URIs from A Certain Third party:
>>>>> http://dbpedia.org/resource/TIFF
>>>>> http://dbpedia.org/resource/JPEG
>>>>> http://dbpedia.org/resource/GZIP
>>>>> ...etc. ;)
>>>> --
>>>> Phil Archer
>>>> W3C eGovernment
>>>> http://www.w3.org/egov/
>>>> http://philarcher.org
>>>> +44 (0)7887 767755
>>>> @philarcher1


Phil Archer
W3C eGovernment

+44 (0)7887 767755
Received on Tuesday, 14 February 2012 17:52:19 UTC
