Re: Media Type Sub-Sub-types? from Xiaoshu Wang on 2010-04-06 (www-tag@w3.org from April 2010)

From: Xiaoshu Wang <xiao@renci.org>
Date: Tue, 06 Apr 2010 11:19:19 -0400
To: "nathan@webr3.org" <nathan@webr3.org>
CC: Larry Masinter <LMM@acm.org>, "'Story Henry'" <henry.story@bblfish.net>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <4BBB50F7.2010500@renci.org>
On 4/5/10 9:15 PM, Nathan wrote:
> Xiaoshu Wang wrote:
>    
>> On 4/5/10 7:47 PM, Nathan wrote:
>>      
>>> Following suit with the top-posting :)
>>>
>>> In all honesty I only see need for this in very specific use cases,
>>> going through most of the HTTP methods here's how I (personally) see it:
>>>
>>> GET - You either GET a resource blindly and when you get a successful
>>> response you process the information you receive if you understand it,
>>> and in the case of rdf, xml, json etc you only process what you
>>> understand and ignore the rest, so it's a non-issue. (Because you're not
>>> going to keep on re-requesting the same resource expecting it to
>>> suddenly change to what you want, no amount of negotiation will do that
>>> - and any indication of sub-negotiation where representations contain
>>> different content is imho a blatant misuse or some kind of weird
>>> anti-pattern)
>>>
>>>        
>> Well, I don't think software programs work like that. When we write
>> software, we pretty much know exactly what we can or cannot process.
>> Having a semantic type makes that possible. If I know that I can
>> understand a document of type X written in syntax Y, I explicitly
>> request that. Giving me another type X1 in Y or X in Y1 doesn't suit
>> the purpose.
>>
>> Take XML as an example. The schema of an XML file is, in fact, its
>> semantic type. If I get an XML document with a different schema, my XML
>> parser can still parse it into a tree. But that is all I can do.
>>      
> Perhaps we misunderstand each other..
>    
It could be because all problems are communication problems. :-)
> Are you indicating that a GET to a single resource, might return not
> only different representations of the same information, but different
> information in different representations too?
>    
Here, "different" needs a clearer definition. For 
instance, suppose I own a URI -- http://what.I.know.org/about/SSH2 -- 
denoting my knowledge about a particular gene, SSH2. Given two XML 
schemas whose semantic domains overlap, with neither subsuming the other, 
are we talking about the same thing or different things?
> or - are you indicating, that say in the case of an xml based RSS feed
> that you should be able to request the version of RSS your client
> understands? (and negotiate that way, which would fit the sub-sub-type
> model I'm discussing).
>    
Yes.
>>> With DELETE and OPTIONS it's irrelevant, likewise with POST.
>>>
>>>        
>> I disagree. POST+form is syntactically equivalent to GET+path or query
>> parameter. There is no essential difference here, only a different
>> methodology.
>>      
>>> With PUT there is no way for a server to indicate what it Accepts
>>> anyway. Additionally, the general pattern on the web is to create a resource
>>> using POST and then update it using PUT, this itself indicates that in
>>> order to update you must already know and understand the entity at the
>>> resource so is a bit of a non-issue. The other use case is where people
>>> blindly try and PUT things to any old location on the web, but I'd
>>> assume that in virtually 100% of cases clients (or machines) are
>>> configured to use a specific location / directory to store things in, so
>>> again a non-issue.
>>>
>>>        
>> Really? What if I want to PUT something in a database?
>>      
> Wouldn't you POST something in to a database or PUT an entirely new
> database..?
>    
A URI denotes a resource, which can be anything. Hence, PUT/GET/POST 
etc. are really doing the same thing -- communication. GET appears not to 
have a message body, but its message is carried in the URI itself. 
Unless you have something specific in mind for a particular resource, 
defining different semantics for GET/POST/PUT etc. will be a waste 
of time.
> HTTPBIS
> http://tools.ietf.org/html/draft-ietf-httpbis-p2-semantics-09#section-7.5
> "POST is designed to allow a uniform method to cover the following
> functions: ... Extending a database through an append operation."
>
> or RFC2616 if you prefer:
> http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html#sec9.5
> "The posted entity is subordinate to that URI in the same way that ... a
> record is subordinate to a database."
>    
Who said there will be a database at the back-end? Do you think in 
practice we really care? What is the difference for the following two 
ways of communication?

GET http://example.com/x/2
POST http://example.com/?x=2 (I use a query parameter in place of a form 
body since it is easier to write)

POST is syntactic sugar. The bottom line is REQUEST-RESPONSE. It is 
about communication: conveying meaning, or intention in Larry's 
words.

Who said there is a database behind http://example.com/? Do we really 
care? I don't. I only care what response I will get. I don't care how it 
gets implemented.
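To make this concrete, here is a minimal Python sketch. It assumes a 
hypothetical front end (the function `logical_request` and the 
/x/<value> path convention are made up for illustration, not any real 
framework's API) that reduces both requests above to the same 
(resource, x) pair, so whatever sits behind it never learns which 
method or encoding the client used:

```python
from typing import Tuple
from urllib.parse import parse_qs, urlsplit


def logical_request(method: str, url: str) -> Tuple[str, str]:
    """Reduce a request to (base resource, value of x).

    Hypothetical convention: GET carries x as a path segment (/x/2),
    POST carries it as a query/form parameter (?x=2).
    """
    parts = urlsplit(url)
    base = f"{parts.scheme}://{parts.netloc}/"
    if method == "GET":
        segments = [s for s in parts.path.split("/") if s]
        if len(segments) == 2 and segments[0] == "x":  # /x/<value>
            return base, segments[1]
    elif method == "POST":
        values = parse_qs(parts.query).get("x")
        if values:
            return base, values[0]
    raise ValueError(f"unsupported request: {method} {url}")


get_req = logical_request("GET", "http://example.com/x/2")
post_req = logical_request("POST", "http://example.com/?x=2")
print(get_req == post_req, get_req)  # True ('http://example.com/', '2')
```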
>>> With HEAD there may be some scope for a Content-Features type header
>>> which indicates that the would-be returned entity contains ontologies
>>> x&y or in the case of XML primarily rss or atom data etc.
>>>
>>> The only uncovered functionality I can see is specifically when it comes
>>> to sending RDF data from client to server, using the PATCH method, and
>>> where server depends on that RDF data including data from a specific
>>> diff/patch ontology, the server can 4** fail the request but has no way
>>> of communicating that it Accept-Patch's of type RDF/XML using ontology
>>> X. This will lead to a series of machine to machine try.catch attempts
>>> which will add strain to the web in general and it could be avoided if
>>> this were addressed.
>>>
>>> AFAICT PATCH is the first verb which introduces a server media type
>>> dependency in order to be used (successfully) by a client (other than
>>> possibly form encoding for POSTs).
>>>
>>>        
>> I don't know much about these two methods. But I do not think they are any
>> different from GET/POST/PUT.
>>      
> http://www.rfc-editor.org/rfc/rfc5789.txt
>    
Again, they get caught up by assuming the nature of a URI's 
implementation. A URI is an interface, not an implementation. If 
http://example.com/anApple denotes an apple, what does it mean to PATCH 
an apple? Even if it is a database, what does it mean to PATCH it?

The Semantic Web activity faces a serious dilemma. On the one hand, 
it wants to generalize the expressive power of URIs, i.e., use them to 
denote everything (which is good). On the other hand, it cannot get 
over the hump of a traditional file system. This is how the TAG got caught 
up in this "information resource" stuff....

>>> Perhaps the real question is: does an ontology weigh in heavily enough
>>> to be considered a definition of syntax, in the specific use case of a
>>> functionality dependent http verb?
>>>
>>>        
>> It is a misunderstanding to equip RDF with ontology. Any typing defines
>> an ontology. RDF are two things, one is URI and the other is logic. The
>> former intends to unify our symbol space; the latter tries to unify
>> language.
>>      
> Can you expand / re-write (if important) I don't quite follow
>    
Sorry, "equip" was a typo for "equal". What I want to say is: any 
definition is an ontology, regardless of the syntax. An ontology can be 
defined in RDF, but doesn't have to be.
>    
>>> side: personally I don't see this having anything to do with conneg or
>>> "content sniffing", they are more niceties than necessities as far as I
>>> know.
>>>
>>>        
>> It is just the opposite from my standpoint. PATCH etc. are more niceties
>> than necessities. I can define whatever patching semantics I need in a
>> document type and GET/PUT it. I don't think that needs a different vocabulary.
>> If we follow the PATCH practice, the number of HTTP verbs will soon explode.
>>      
> PATCH is pretty critical to updating large graphs and files (send a
> couple of hundred bytes rather than a few million).
>
> http://www.w3.org/DesignIssues/Diff
>    
Of course it is. But can't you define a document type carrying the 
PATCH message? Say,

http://example.com/a/patch/method

And then POST it with the document-type URI as its semantic type, in 
different syntaxes to suit different agents.

And at this document-URI, you can define various ways to serialize the 
well-defined semantics, such as in XML, RDF, YAML, JSON etc. In fact, if 
you use a syntax which has a schema language, e.g., XML,

GET http://example.com/a/patch/method
Accept: application/xml

should resolve to the schema.

Furthermore, because the semantics are now well defined, you can even 
provide a service for your clients to transform from one syntax to 
another. Wouldn't that be a much better way to "patch" things up than 
defining another HTTP verb?

Xiaoshu

>>> Larry Masinter wrote:
>>>
>>>        
>>>> An "Internet Media Type" is more than a definition of syntax --
>>>> it is an indication of intent, by the sender, for how the sender
>>>> wishes the receiver to interpret the content being sent.
>>>>
>>>> While it's desirable, alas it is not supported:
>>>> The space of "Internet Media Types" does not provide sufficient
>>>> granularity for many applications that want to use the accept:
>>>> header to control negotiation.
>>>>
>>>> But the types available do not partition the space of
>>>> syntax and semantics.
>>>>
>>>> One of the sources of analysis for thinking about file types
>>>> and file formats is the archival community.
>>>>
>>>> http://hul.harvard.edu/ois/digpres/guidance.html
>>>>
>>>> In addition, there is the IETF work on "media features":
>>>> http://tools.ietf.org/html/rfc2912
>>>> http://tools.ietf.org/html/rfc2506
>>>> http://tools.ietf.org/html/rfc2938
>>>>
>>>> which was intended to provide additional information about
>>>> the file format other than the Internet Media Type.
>>>>
>>>> I think part of the advice I'd like to include are things
>>>> that Internet Media Types (MIME types) *aren't* good for,
>>>> even though people have tried or might think it is desirable.
>>>>
>>>> http://en.wikipedia.org/wiki/Internet_media_type
>>>>
>>>>
>>>> RTP and SIP have extended MIME types in ways that don't
>>>> exactly match MIME for email, but so has the web.
>>>>
>>>> Larry
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Xiaoshu Wang [mailto:xiao@renci.org]
>>>> Sent: Monday, April 05, 2010 2:09 PM
>>>> To: Story Henry
>>>> Cc: nathan@webr3.org; Larry Masinter; www-tag@w3.org
>>>> Subject: Re: Media Type Sub-Sub-types?
>>>>
>>>> On 4/5/10 2:44 PM, Story Henry wrote:
>>>>
>>>>          
>>>>> On 5 Apr 2010, at 01:07, Nathan wrote:
>>>>>
>>>>>
>>>>>
>>>>>            
>>>>>> For instance, if the need for an ****+rdf media type scenario came
>>>>>> about, then the specification / media type could determine a fixed
>>>>>> serialization, as in only n3 not rdf/xml or other.
>>>>>>
>>>>>>
>>>>>>              
>>>>> This is really the wrong way to do things. Media types should not
>>>>> determine what the content you are going to get back is about, only
>>>>> what the syntax is.
>>>>>
>>>>>
>>>>>            
>>>> This is not true. A document should have two types -- one syntactic and
>>>> the other semantic. (Syntax also deals with semantics, like everything
>>>> else in the world, but with structural semantics as opposed to
>>>> domain semantics.)
>>>>
>>>> I work in the life science domain, and often there is more than one
>>>> markup language with the same syntax, e.g., XML. The same data thus
>>>> can be expressed in more than one way. And since the domain semantics of
>>>> these markup languages overlap without either subsuming the other, we face
>>>> a dilemma of how to serve them. Giving each ML its own URI raises the question
>>>> of how to link them together. Even with the same markup language, we
>>>> face the problem of how to serve different versions of data under the same
>>>> URI. Content negotiation helps solve these problems, but without a
>>>> document-type definition it does not work as well. Also, giving the type
>>>> definition a URI makes it linkable in turn. I have discussed the issue at
>>>> http://wot.renci.org/tr/doc_uri.
>>>>
>>>> I used to define the document type as an extension to the current type,
>>>> similar to Jan's proposal of using a profile but with a different token.
>>>> However, I found that I could not serve that convention with a
>>>> straightforward Apache type-map (.var) file. The convention that I used is
>>>>
>>>> mime/type/semantic-type.
>>>>
>>>> Thus, in Jan's case, it would be as follows:
>>>>
>>>> application/http://xmlns.com/foaf/0.1/
>>>>
>>>>
>>>>
>>>>          
>>>>> The way the Atom people are working on creating a MIME type for every
>>>>> application is crazy. There are an infinite number of things one can
>>>>> speak about. Should we have an infinite number of mime types?
>>>>> Clearly not.
>>>>>
>>>>> The better way to do things is to do it through links. So when
>>>>> you have
>>>>>
>>>>> :joe foaf:knows :jack .
>>>>>
>>>>> Then the document in which :jack is defined is clearly going to be
>>>>> some form of personal profile document.... At least you should find
>>>>> more info about jack.
>>>>>
>>>>>
>>>>>            
>>>> Well, I don't think that is good enough. If I am a JavaScript engine,
>>>> what should I get once I get to foaf:knows? I don't read RDF, only JSON.
>>>> Even if I can read JSON, that does not mean that I know how to
>>>> meaningfully process all the JSON constructs that are out there, right?
>>>>
>>>> Xiaoshu
Received on Tuesday, 6 April 2010 15:20:05 UTC