- From: Dan Brickley <danbri@danbri.org>
- Date: Mon, 01 Dec 2008 18:20:29 +0000
- To: Marcos Caceres <marcosscaceres@gmail.com>
- Cc: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>, "www-tag@w3.org" <www-tag@w3.org>, public-webapps <public-webapps@w3.org>
Marcos Caceres wrote:
> On Mon, Dec 1, 2008 at 5:31 PM, Dan Brickley <danbri@danbri.org> wrote:
>> Williams, Stuart (HP Labs, Bristol) wrote:
>>
>>>>> Well there are ways around that, add a package description
>>>>> or meta-data file either at the root of the package or at
>>>>> each directory level and have it carry media-type information
>>>>> - or use 'magic numbers' or (if you really must - in the
>>>>> absence of other authoritative information), sniff/guess
>>>>> though I think that should be the least preferred option.
>>>>>
>>>> Right. The new proposal is that we use file extension mappings to MIME
>>>> types, and if that fails, resort to sniffing. We are reluctant to
>>>> introduce a meta-data format at this point.
>> (Just allow RDFa+XHTML and leave it to the marketplace...)
>>
>
> right :)

Really? So we are clear here: does the widgets spec allow <content src="index.html"/> to point to an XHTML document that begins with something like the following? (See design 4 below.)

  <?xml version="1.0" encoding="utf-8"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
    "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <head> ...

>>>> For version 2 of widgets,
>>>> it might be useful to either introduce the meta-data format or have
>>>> an Apache-like file extension to MIME type mapping. For example:
>>>>
>>>> image/gif .gif
>>>>
>>>> Note however, that widget engines in the wild have no problem working
>>>> without MIME info. From what I have seen, they all do just fine either
>>>> sniffing or using file extensions to derive the content types.
>>>>
>>>>> Anyway - that zip files don't intrinsically maintain such
>>>>> info is not a show stopper - though I would have thought that
>>>>> carrying media-type information is a natural requirement for
>>>>> a packaging format for the web.
>>>>>
>>>> I'm not sure it is. When a MIME type is registered with IANA, the file
>>>> extension is also registered.
>>> What is registered (RFC 4288 section 4.11) is a list of file name
>>> extensions commonly used with the media-type.
>>> It does *not* reserve the extension for exclusive use with that
>>> media-type.
>>> It does *not* prevent other arbitrary file name extensions, or indeed
>>> no extension, being used.
>>>
>>> So... yes not a bad hint, but nothing is certain.
>>>
>>>> So one has a standardized way to derive
>>>> the media type for a file by the file extension.
>>> Not with certainty...
>> So it seems that a very small piece of metadata ('this filetree follows
>> the IANA filename to media type mappings') has a lot of value. If the
>> versions of the IANA mapping are easily identified, the metadata becomes a
>> URI rather than a single bit. Either way, you can gain a lot from not a lot,
>> I think.
>>
>
> So we are clear, what do you have in mind here?

some strawpeople:

1. <mediatypes iana_mappings="true"/>

   Simple. It basically means: "if this is set to true, the filenames you'll find in this zip follow (some / the latest) version of the IANA mappings, as of the time the widget zip was created." It would need some rules regarding precedence/ordering.

2. <mediatypes url="..."/>

   (except I can't find a single URI for versions of their registry)

3. <mediatypes iana_mappings="true" iana_as_of_date="2008-12-01"/>

   Allows the author to be more explicit about which version of the IANA registry is meant.

4. An alternative design would be to lean entirely on RDFa, and put the media type information into the hyperlinks. index.html might have:

   <div typeof="foaf:Person">
     This widget made by ...
     <img rel="foaf:depiction" src="marcos.jpg" property="dc:format" content="image/jpg" alt="Marcos!" />
   </div>

So designs 1-3 rely on IANA specifying the filename-to-media-type mapping. I'm not sure how this handles contention if three or four registrations all claim an association with e.g. "*.png".
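To make the ordering question concrete, the approach discussed in this thread (file extension mappings first, then sniffing), combined with optional per-file declarations of the kind design 4 would yield, can be sketched as a small lookup chain. This is a minimal Python sketch, not part of any widgets spec: the function name resolve_media_type, the table contents, the precedence order and the application/octet-stream fallback are all hypothetical choices for illustration.

```python
import os

# Hypothetical excerpt of an extension-to-media-type table. IANA registrations
# only list extensions "commonly used" with a type (RFC 4288, section 4.11),
# so a match here is a hint, not a guarantee.
EXTENSION_MAP = {
    ".html": "text/html",
    ".xhtml": "application/xhtml+xml",
    ".gif": "image/gif",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}

# Well-known leading byte signatures ("magic numbers"), used as a last resort.
MAGIC_NUMBERS = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def resolve_media_type(path, data, declared=None):
    """Return (media_type, source) for one file inside a widget zip.

    Assumed precedence (the thread leaves this open):
      1. explicit per-file metadata in `declared` (e.g. values gathered
         from dc:format statements, as in design 4),
      2. the file extension table (designs 1-3),
      3. sniffing the first bytes of `data`.
    """
    if declared and path in declared:
        return declared[path], "declared"
    extension = os.path.splitext(path)[1].lower()
    if extension in EXTENSION_MAP:
        return EXTENSION_MAP[extension], "extension"
    for signature, media_type in MAGIC_NUMBERS:
        if data.startswith(signature):
            return media_type, "sniffed"
    # Hypothetical fallback when nothing else applies.
    return "application/octet-stream", "default"
```

For example, a file named "icon" with no extension whose bytes start with the PNG signature resolves to ("image/png", "sniffed"), while "marcos.jpg" resolves via its extension unless a declaration overrides it. Putting declared metadata first lets an author correct a misleading extension; it does not settle contention between registrations claiming the same extension, which still needs its own rule.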
Design 4 is based on RDF statements that use the dc:format property, whose definition (see http://dublincore.org/documents/dcmi-terms/#terms-format) explicitly covers this ("Examples of dimensions include size and duration. Recommended best practice is to use a controlled vocabulary such as the list of Internet Media Types [MIME]."). The pedants amongst us will note that the mere use of dc:format doesn't guarantee that its values will be interpreted as IANA media types, but I'm going to ignore that for now, since other vocabularies (XMP, etc.) could equally be used without changes to the core spec.

If I run an RDF parser against

  <div typeof="foaf:Person"><img rel="foaf:depiction" src="marcos.jpg" property="dc:format" content="image/jpg" /></div>

I get the following:

  _:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
  <file:///Users/danbri/working/rdfa/marcos.jpg> <http://purl.org/dc/elements/1.1/format> "image/jpg"@en .

This seems enough to work with. So media type metadata could be collected by parsing RDFa from all likely files in the ZIP and aggregating the results. The parser could of course have a base URI passed to it, but that's another story (albeit the one this thread started with).

Give me a shout if anything's unclear,

cheers,

Dan

--
http://danbri.org/
Received on Monday, 1 December 2008 18:21:15 UTC