Re: Sketch of an idea to address widget/package addressing with fragID syntax and media-type defn.

Marcos Caceres wrote:
> On Mon, Dec 1, 2008 at 5:31 PM, Dan Brickley <danbri@danbri.org> wrote:
>> Williams, Stuart (HP Labs, Bristol) wrote:
>>
>>>>> Well there are ways around that, add a package description
>>>>> or meta-data file either at the root of the package or at
>>>>> each directory level and have it carry media-type information
>>>>> - or use 'magic numbers' or (if you really must - in the
>>>>> absence of other authoritative information), sniff/guess
>>>>> though I think that should be the least preferred option.
>>>>>
>>>> Right. The new proposal is that we use file extension mappings to MIME
>>>> types, and if that fails, resort to sniffing. We are reluctant to
>>>> introduce a meta-data format at this point.
>> (Just allow RDFa+XHTML and leave it to the marketplace...)
>>
> 
> right :)

Really? Just so we are clear here: does the widgets spec allow
<content src="index.html"/> to point to an XHTML document that begins
with something like

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN" 
"http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" 
xmlns:foaf="http://xmlns.com/foaf/0.1/"
   xmlns:dc="http://purl.org/dc/elements/1.1/">
<head>
...?

(see design 4 below)

>>>> For version 2 of widgets,
>>>> it might be useful to either introduce the meta-data format or  have
>>>> an Apache-like file extensions to MIME type mapping. For example:
>>>>
>>>> image/gif .gif
>>>>
>>>> Note however, that widget engines in the wild have no problem working
>>>> without MIME info. From what I have seen, they all do just fine either
>>>> sniffing or using file extensions to derive the content types.
>>>>
>>>>> Anyway - that zip files don't intrinsically maintain such
>>>>> info is not a show stopper - though I would have thought that
>>>>> carrying media-type information is a natural requirement for
>>>>> a packaging format for the web.
>>>>>
>>>> I'm not sure it is. When a MIME type is registered with IANA, the file
>>>> extension is also registered.
>>> What is registered (RFC 4288 section 4.11) is a list of file name
>>> extensions commonly used with the media-type.
>>> It does *not* reserve the extension for exclusive use with that
>>> media-type.
>>> It does *not* prevent other arbitrary file name extension or indeed
>>> no-extension being used.
>>>
>>> So... yes not a bad hint, but nothing is certain.
>>>
>>>> So one has a standardized way to derive
>>>> the media type for a file by the file extension.
>>> Not with certainty...
>> So this seems like a very small piece of metadata ('this filetree follows
>> the IANA filename to media type mappings') has a lot of value. If the
>> versions of the IANA mapping are easily identified, the metadata becomes a
>> URI rather than a single bit. Either way, you can gain a lot from not a lot,
>> I think.
>>
> 
> So we are clear, what do you have in mind here?

some strawpeople:

1.
<mediatypes iana_mappings="true"/>

Simple. It basically means, "if this is set to true, the filenames
you'll find in this zip correspond to (some / latest) version of the
IANA mappings, as of the time the widget zip was created". Would need
some rules re precedence/ordering.

2.
<mediatypes url="..."/> (except I can't find a single URI for versions
of their registry)

3.
<mediatypes iana_mappings="true" iana_as_of_date="2008-12-01"/>

This allows one to be more explicit about which version of the IANA
registry is meant.

4.
An alternative design would be to lean entirely on RDFa, and put the 
media type information into the hyperlinks:

index.html might have

<div typeof="foaf:Person">  This widget made by ...
   <img rel="foaf:depiction" src="marcos.jpg" property="dc:format" 
content="image/jpg" alt="Marcos!" />
</div>



So designs 1-3 are based on IANA specifying the filename to media type
mapping. I'm not sure how this handles contention if three or four
registrations all claim an association with e.g. "*.png".
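
For illustration only, here is a minimal Python sketch of the lookup
that designs 1-3 imply: try the extension mapping first, and fall back
to sniffing only when that fails. It rests on assumptions that are not
part of any proposal in this thread: Python's standard mimetypes table
stands in for "the IANA mappings" (it is a local table, not the
registry), and the MAGIC list, sniff(), media_type_for() and
media_types() are hypothetical names.

```python
import io
import mimetypes
import zipfile

# Hypothetical magic-number table; a real engine would need far more entries.
MAGIC = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def sniff(head):
    """Guess a media type from the leading bytes, or return None."""
    for prefix, media_type in MAGIC:
        if head.startswith(prefix):
            return media_type
    return None


def media_type_for(archive, name):
    """Extension mapping first; sniff the content only if that fails."""
    # Assumption: the stdlib table stands in for the IANA mappings.
    guessed, _encoding = mimetypes.guess_type(name)
    if guessed is not None:
        return guessed
    with archive.open(name) as member:
        return sniff(member.read(16))


def media_types(archive):
    """Map every file entry in an open ZipFile to a media type (or None)."""
    return {
        name: media_type_for(archive, name)
        for name in archive.namelist()
        if not name.endswith("/")  # skip directory entries
    }


# Usage: build a small zip in memory and resolve its entries.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("icon.gif", b"GIF89a" + b"\x00" * 10)
    z.writestr("logo", b"\x89PNG\r\n\x1a\n" + b"\x00" * 8)
with zipfile.ZipFile(buf) as z:
    print(media_types(z))
```

Note that a lookup like this returns a single type per extension, so
it does nothing to settle the contention question above; it simply
takes whatever the local table says.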

Design 4 is based on RDF statements that use the dc:format property, 
whose definition (see 
http://dublincore.org/documents/dcmi-terms/#terms-format) explicitly 
covers this ("Examples of dimensions include size and duration. 
Recommended best practice is to use a controlled vocabulary such as the 
list of Internet Media Types [MIME]."). The pedants amongst us will note 
that the mere use of dc:format doesn't guarantee that its values be 
interpreted as IANA media types, but I'm going to ignore that for now 
since other vocab (XMP etc) could equally be used without changes to the 
core spec.


If I run an RDF parser against
<div typeof="foaf:Person"><img rel="foaf:depiction" src="marcos.jpg" 
property="dc:format" content="image/jpg" /></div> I get the following:

_:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 
<http://xmlns.com/foaf/0.1/Person> .
<file:///Users/danbri/working/rdfa/marcos.jpg> 
<http://purl.org/dc/elements/1.1/format> "image/jpg"@en .

This seems enough to work with. So media type metadata could be
collected by parsing RDFa from all likely files in the ZIP and
aggregating the results. The parser could of course have a base URI
passed to it, but that's another story (albeit the one this thread
started with).
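
A rough Python sketch of that aggregation step, for illustration. It
is deliberately NOT a conforming RDFa parser: it only picks up
elements that carry a src or href together with property="dc:format"
and a content attribute, and it assumes the dc: prefix is bound to
Dublin Core without checking the xmlns declarations. The names
FormatCollector and collect_formats, and the file:///widget/ base URI,
are hypothetical.

```python
import io
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormatCollector(HTMLParser):
    """Collect (resource URI -> dc:format value) pairs from one document."""

    def __init__(self, base_uri):
        super().__init__()
        self.base_uri = base_uri
        self.formats = {}

    def handle_starttag(self, tag, attrs):
        # Also reached for self-closing tags such as <img ... />.
        a = dict(attrs)
        target = a.get("src") or a.get("href")
        # Assumption: "dc:" means Dublin Core; prefixes are not resolved.
        if target and a.get("property") == "dc:format" and a.get("content"):
            self.formats[urljoin(self.base_uri, target)] = a["content"]


def collect_formats(archive, base_uri):
    """Aggregate dc:format statements from the (X)HTML files in a ZipFile."""
    found = {}
    for name in archive.namelist():
        if not name.lower().endswith((".html", ".xhtml", ".htm")):
            continue
        collector = FormatCollector(urljoin(base_uri, name))
        collector.feed(archive.read(name).decode("utf-8", errors="replace"))
        found.update(collector.formats)
    return found


# Usage: the index.html fragment from design 4, packed into a zip.
page = (
    '<div typeof="foaf:Person"><img rel="foaf:depiction" src="marcos.jpg" '
    'property="dc:format" content="image/jpg" alt="Marcos!" /></div>'
)
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("index.html", page)
with zipfile.ZipFile(buf) as z:
    print(collect_formats(z, "file:///widget/"))
```

A real implementation would hand each file to a proper RDFa parser and
query the resulting triples for the dc:format property; the point here
is only the shape of the loop over the package contents.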

Give me a shout if anything's unclear,

cheers,

Dan

--
http://danbri.org/

Received on Monday, 1 December 2008 18:21:15 UTC