- From: Marcos Caceres <marcosscaceres@gmail.com>
- Date: Mon, 1 Dec 2008 20:13:16 +0000
- To: "Dan Brickley" <danbri@danbri.org>
- Cc: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>, "www-tag@w3.org" <www-tag@w3.org>, public-webapps <public-webapps@w3.org>
Hi Dan,

On Mon, Dec 1, 2008 at 6:20 PM, Dan Brickley <danbri@danbri.org> wrote:
> Marcos Caceres wrote:
>>
>> On Mon, Dec 1, 2008 at 5:31 PM, Dan Brickley <danbri@danbri.org> wrote:
>>>
>>> Williams, Stuart (HP Labs, Bristol) wrote:
>>>
>>>>>> Well there are ways around that, add a package description
>>>>>> or meta-data file either at the root of the package or at
>>>>>> each directory level and have it carry media-type information
>>>>>> - or use 'magic numbers' or (if you really must - in the
>>>>>> absense of other authoritative information), sniff/guess
>>>>>> though I think that should be the least preferred option.
>>>>>>
>>>>> Right. The new proposal is that we use file extension mappings to MIME
>>>>> types, and if that fails, result to sniffing. We are reluctant to
>>>>> introduce a meta-data format at this point.
>>>
>>> (Just allow RDFa+XHTML and leave it to the marketplace...)
>>>
>>
>> right :)
>
> Really? So we are clear here .... does the widgets spec allow <content
> src="index.html"/> to point to an XHTML document that begins with something
> like
>
> <?xml version="1.0" encoding="utf-8"?>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
> "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
> xmlns:foaf="http://xmlns.com/foaf/0.1/"
> xmlns:dc="http://purl.org/dc/elements/1.1/">
> <head>
> ...?
>
> (see design 4 below)
>

It sure does. But, and this is a big *but*, XHTML support is purely
OPTIONAL. However, authors can certainly do the following:

<content src="index.html" content-type="application/xhtml+xml"/>

>>>>> For version 2 of widgets,
>>>>> it might be useful to either introduce the meta-data format or have
>>>>> an Apache-like file extensions to MIME type mapping. For example:
>>>>>
>>>>> image/gif .gif
>>>>>
>>>>> Note however, that widget engine in the wild have no problem working
>>>>> without MIME info. From what I have seen, they all do just fine either
>>>>> sniffing or using file extensions to derive the content types.
>>>>>
>>>>>> Anyway - that zip files don't intrinically maintain such
>>>>>> info is not a show stopper - though I would have thought that
>>>>>> carrying media-type information is a natural requirement for
>>>>>> a packaging format for the web.
>>>>>>
>>>>> I'm not sure it is. When a MIME type is registered with IANA, the file
>>>>> extension is also registered.
>>>>
>>>> What is registered (RFC 4288 section 4.11) is a list of file name
>>>> extensions commonly used with the media-type.
>>>> It does *not* reserve the extension for exclusive use with that
>>>> media-type.
>>>> It does *not* prevent other arbitrary file name extension or indeed
>>>> no-extension being used.
>>>>
>>>> So... yes not a bad hint, but nothing is certain.
>>>>
>>>>> So one has a standardized way to derive
>>>>> the media type for a file by the file extension.
>>>>
>>>> Not with certainty...
>>>
>>> So this seems like a very small piece of metadata ('this filetree follows
>>> the IANA filename to media type mappings') has a lot of value. If the
>>> versions of the IANA mapping are easily identified, the metadata becomes a
>>> URI rather than a single bit. Either way, you can gain a lot from not a lot,
>>> I think.
>>>
>>
>> So we are clear, what do you have in mind here?
>
> some strawpeople:
>
> 1.
> <mediatypes iana_mappings="true"/>
>
> simple. It basically means, "if this is set to true, the filenames you'll
> find in this zip correspond to (some / latest) version of IANA, at time of
> widget zip creation. Would need some rules re precedence/ordering.
>

I like this, but I think this should be the default (always on). I
think we should introduce the override in Widgets version 2.

> 2.
> <mediatypes url="..."/> (except i can't find a single URI for versions of
> their registry)
>

Yeah. I did a search for this too... no luck. Would be helpful.

> 3.
> <mediatypes iana_mappings="true" iana_as_of_date="2008-12-01"/>
>
> allows to be more explicit about which version of the IANA registry
>

This is getting a bit too fancy, I think.

> 4.
> An alternative design would be to lean entirely on RDFa, and put the media
> type information into the hyperlinks:
>
> index.html might have
>
> <div typeof="foaf:Person"> This widget made by ...
> <img rel="foaf:depiction" src="marcos.jpg" property="dc:format"
> content="image/jpg" alt="Marcos!" />
> </div>
>
>

This would require that all widgets be written this way. It seems very
labor-intensive to have to specify the content type for every resource
one references. I think it would also be more prone to errors, as it
puts the burden on authors to identify the content types. For
references that dereference locally (i.e., not HTTP, but widget:// or
whatever scheme we end up with), it would be better to have the widget
engine resolve the type via the file extension or sniffing. Like I
said, today's widget engines work just fine by deriving types from
file extensions and/or sniffing.

>
> So designs 1-3 are based on IANA specifying the filename to media type
> mapping. I'm not sure how this handles contention if three or four
> registrations all claim associations with eg. "*.png".
>

I think we resolve this by baking the types and extensions into the
Widget spec. There are not that many formats that are supported by
widget engines. We list the media types that most widget engines
support in the widget landscape [1]. Worst case, it will be something
like 20 file extensions. Proprietary types can be supported by
implementers, if they so choose (e.g., Flash).

> Design 4 is based on RDF statements that use the dc:format property, whose
> definition (see http://dublincore.org/documents/dcmi-terms/#terms-format)
> explicitly covers this ("Examples of dimensions include size and duration.
> Recommended best practice is to use a controlled vocabulary such as the list
> of Internet Media Types [MIME]."). The pedants amongst us will note that the
> mere use of dc:format doesn't guarantee that its values be interpreted as
> IANA media types, but I'm going to ignore that for now since other vocab
> (XMP etc) could equally be used without changes to the core spec.
>
>
> If I run an RDF parser against
> <div typeof="foaf:Person"><img rel="foaf:depiction" src="marcos.jpg"
> property="dc:format" content="image/jpg" /></div> I get the following:
>
> _:bnode0 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
> <http://xmlns.com/foaf/0.1/Person> .
> <file:///Users/danbri/working/rdfa/marcos.jpg>
> <http://purl.org/dc/elements/1.1/format> "image/jpg"@en .
>
> This seems enough to work with. So media type metadata could be collecting
> by parsing RDFa from all likely files in the ZIP and aggregating the
> results. The parser could of course have a base URI passed to it, but that's
> another story (albeit the one this thread started with.
>
> Give me a shout if anything's unclear,

That's pretty clear. Thank you.

[1] http://www.w3.org/TR/widgets-land/#authoring

--
Marcos Caceres
http://datadriven.com.au
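
For illustration only, here is a minimal Python sketch of the lookup order discussed in this thread: try a fixed file-extension table first, and fall back to magic-number sniffing only when the extension is unknown. Nothing here comes from the Widgets spec; the function name `resolve_media_type`, the table contents, and the choice of signatures are all assumptions made for the example.

```python
import posixpath
from typing import Optional

# Hypothetical extension table. The thread suggests a spec could fix
# roughly 20 such entries; these few are illustrative, not normative.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".css": "text/css",
    ".gif": "image/gif",
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
}

# Well-known leading byte signatures ("magic numbers") for a few formats.
MAGIC_NUMBERS = [
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
]


def resolve_media_type(entry_name: str, head: bytes) -> Optional[str]:
    """Return a media type for a zip entry, or None if it cannot be derived.

    entry_name is the path inside the package; head is the first few
    bytes of the entry's content, used only when the extension is unknown.
    """
    # Step 1: the file extension, compared case-insensitively.
    extension = posixpath.splitext(entry_name)[1].lower()
    if extension in EXTENSION_TYPES:
        return EXTENSION_TYPES[extension]

    # Step 2: sniffing, the least preferred option per the thread.
    for signature, media_type in MAGIC_NUMBERS:
        if head.startswith(signature):
            return media_type

    return None
```

Because the extension is consulted first, a mislabeled file (say, PNG bytes stored as `logo.gif`) is reported by its extension. That matches the "extension is a hint, nothing is certain" caveat raised earlier in the thread.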
Received on Monday, 1 December 2008 20:13:57 UTC