Re: Revisiting Authoritative Metadata

From: Eric J. Bowman <eric@bisonsystems.net>
Date: Wed, 27 Feb 2013 15:39:44 -0700
To: Robin Berjon <robin@w3.org>
Cc: "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <20130227153944.cdaffc62116cdb7a0edd2b32@bisonsystems.net>
Robin Berjon wrote:
> 
> This discussion is taking place within the specific context of the 
> Authoritative Metadata TAG finding, which contains two paragraphs
> about the proclaimed superiority of authoritative metadata over
> embedded typing:
> 
>      http://www.w3.org/2001/tag/doc/mime-respect#embedded
> 
> As explained in detail here:
> 
>      http://lists.w3.org/Archives/Public/www-tag/2013Feb/0130.html
> 
> neither of those paragraphs is in any way substantiated. They just 
> proclaim their content with no reference to fact.
> 

I'll agree that those paragraphs do assume familiarity with REST, when
they should reference or quote it; the hint is "self-descriptive." Your
framing of the issue, "assuming someone does have a use case, is it
worth the cost of requiring a content type on every response," both
consistently rejects presented use-cases as somehow irrelevant, and
mistakenly assumes that these use-cases are the rationale behind
self-descriptive messaging, rather than a by-product.

The middle paragraph of section 3.3 gives us another hint, obviously
derived from REST, to those of us familiar with it:

"Intermediaries (i.e., proxies and gateways) perform significant
functions in Web architecture, such as encapsulating legacy services,
enhancing client functionality, and moderating the risk of interactions
across firewalls. Those functions can only be performed correctly if
the semantics of a given message are expressed within that message."

Emphasis on the last sentence.  My security policy for myself or my
clients has at one time or another disallowed various file formats at
the firewall, based on media type:

http://www.opera.com/support/kb/view/852/
http://technet.microsoft.com/en-us/security/bulletin/ms04-028
http://technet.microsoft.com/en-us/security/bulletin/MS09-047
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2009-0263
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2011-2949

Of course, this assumes a JPEG labelled as image/gif simply won't cause
any harm.  Browser sniffing defeats that assumption entirely, which is a
rather glaring security hole -- it takes away the sysadmin's ability to
prevent clients from decoding JPEG files, even just to sniff the magic
number, which could itself trigger a buffer overrun.

Requiring components to decode payloads, instead of self-descriptive
messaging, introduces a known class of vulnerabilities arising from
merely decoding embedded metadata in binary formats *without* requiring
an attempt be made to render the payload.
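
As a rough illustration of the point above, here is a minimal sketch of
a firewall-style check that decides from the Content-Type header alone
and never opens the payload.  The function names and the blocklist are
hypothetical, and rejecting unlabelled responses is an assumed policy,
not something stated in the finding or in REST.

```python
from typing import Mapping, Optional

# Hypothetical policy: media types this intermediary refuses to forward.
BLOCKED_MEDIA_TYPES = frozenset({"image/jpeg"})


def media_type(headers: Mapping[str, str]) -> Optional[str]:
    """Return the lowercased media type from Content-Type, minus parameters."""
    for name, value in headers.items():
        if name.lower() == "content-type":
            # Drop parameters such as "; charset=utf-8".
            return value.split(";", 1)[0].strip().lower() or None
    return None


def allow_response(headers: Mapping[str, str]) -> bool:
    """Decide from message metadata alone; the payload is never decoded."""
    mtype = media_type(headers)
    if mtype is None:
        return False  # assumption: unlabelled responses are rejected
    return mtype not in BLOCKED_MEDIA_TYPES
```

Note that a response labelled image/gif passes this check whatever its
bytes are.  The policy therefore only holds if the recipient treats the
label as authoritative; a client that sniffs the payload decodes the
very format the check was meant to keep away from it.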

> 
> The REST thesis is a 180 pages long document...
>

It never ceases to amaze me when TAG members are unfamiliar with what
should be required (re-) reading prior to accepting their position.  I
didn't even *think* about posting to this list until I'd gained working
knowledge of REST, naively holding myself to a higher standard than TAG
membership turns out to actually require.  Anyway, I've highlighted the
specific concepts which substantiate Authoritative Metadata, below.  This
is shorter than reading the thesis, but certainly not meant as a
substitute for reading it in its entirety:

=====================
http://www.ics.uci.edu/~fielding/pubs/dissertation/top.htm
=====================

>
> [W]hile I admit that it's been a while since I last read it, I'm
> pretty sure that it says a few things other than some variation on
> "authoritative media types are really good". Keeping that in mind, if
> falsifying REST is what you expect me to do, then I would find it
> most helpfully urbane of you were you to point me to a specific
> section.
> 

The most important constraint in REST is the Uniform Interface.  One of
its four sub-constraints is self-descriptive messaging.  The rationale
is intermediary processing, without which the Web would collapse under
its own weight.  Referring to specific sections of REST, with pertinent
concepts annoyingly capitalized...

Starting with Section 2.3, "Architectural Properties of Key Interest,"
we can identify certain goals pertinent to this discussion; "I have
included only those properties that are clearly influenced by the
restricted set of styles surveyed."

2.3.1.3 Network Efficiency
"An interesting observation about network-based applications is that
the best application performance is obtained by not using the network."

2.3.4.5 Reusability
"The primary mechanisms for inducing reusability within architectural
styles is reduction of coupling (knowledge of identity) between
components and constraining the GENERALITY OF COMPONENT INTERFACES."

2.3.5 Visibility
"Styles can also influence the visibility of interactions within a
network-based application by restricting interfaces via GENERALITY or
providing ACCESS TO MONITORING. Visibility in this case refers to the
ability of a component to MONITOR or MEDIATE the interaction between
two other components. Visibility can enable improved performance via
shared CACHING of interactions, SCALABILITY through layered services,
RELIABILITY through reflective monitoring, and SECURITY by allowing the
interactions to be inspected by mediators (e.g., network firewalls)."

2.3.7 Reliability
"Styles can improve reliability by avoiding single points of failure,
enabling redundancy, ALLOWING MONITORING, or reducing the scope of
failure to a recoverable action."

REST is a hybrid style derived from preexisting styles, with its
constraints chosen from the constraints of those styles, based on the
desirable properties they induce.  No new constraints are introduced by
REST.  Chapter 3 is all about the properties induced by the constraints
inherent in these various, established styles.

Chapters 3 and 4 are the nitty-gritty of the computer science involved,
i.e. how styles may be classified by induced properties, that the
inherent constraints of established styles may be combined to yield new
styles with known properties, and that this methodology may be applied
to the early Web architecture to overcome its shortcomings.

Chapter 5 walks us through derivation of the new style.  Falsification
of any part of Chapter 5 would require falsification of the pertinent
parts in Chapters 2-4; i.e. visibility is either not a desirable
property for the Web architecture, or is desirable but somehow also
accomplished via embedded metadata as it is in such-and-such a style
which Roy either failed to consider, or interpreted erroneously.

Or, Roy was outright wrong in Chapter 4, and this is no way to design a
Web architecture; or, monitoring and mediation don't require
visibility...  Point is, my formal training may be in Chemistry, but the
scientific method is invariant and I recognize its use here.  I would
like to know where and how Roy screwed up, instead of a bare statement
that his conclusion (making self-descriptive messaging a constraint) was
wrong, for whatever reasons.

5.1.5 Uniform Interface
"REST is defined by four interface constraints: identification of
resources; manipulation of resources through representations;
SELF-DESCRIPTIVE MESSAGES; and, hypermedia as the engine of application
state."

5.1.6 Layered System
"Within REST, intermediary components can actively transform the
content of messages because the messages are SELF-DESCRIPTIVE and their
semantics are VISIBLE to intermediaries."

5.2.1 Data Elements
"REST provides a hybrid of all three options by focusing on a SHARED
UNDERSTANDING OF DATA TYPES WITH METADATA, but limiting the scope of
what is revealed to a standardized interface... [gaining] the
separation of concerns of the client-server style without the server
scalability problem, allows information hiding through a GENERIC
INTERFACE to enable encapsulation and evolution of services, and
provides for a diverse set of functionality through downloadable
feature-engines."

5.3.1 Process View
"REST enables intermediate processing by constraining messages to be
SELF-DESCRIPTIVE: interaction is stateless between requests, standard
methods and media types are used to indicate semantics and exchange
information, and responses explicitly indicate cacheability."

5.3.3 Data View
"The user-perceived performance of a browser application is determined
by the latency between steady-states: the period of time between the
selection of a hypermedia link on one web page and the point when
usable information has been rendered for the next web page. The
optimization of browser performance is therefore centered around
reducing this communication latency."

I'm not a browser developer, but that last bit certainly makes sense;
deprecating self-descriptiveness also takes out the layered-system and
cache constraints clearly dependent upon it, which can only increase
latency, which can only hurt browsers.
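
To make the dependency on self-descriptiveness concrete, here is a
minimal sketch of how a shared cache might decide whether to store a
response using only the Cache-Control header.  The function name is
hypothetical and the rules are a deliberate simplification of HTTP
caching (no heuristic freshness, no Expires, no Vary), so treat it as
an illustration rather than a conforming implementation.

```python
from typing import Mapping


def shared_cache_may_store(headers: Mapping[str, str]) -> bool:
    """Simplified check: does Cache-Control explicitly permit shared storage?"""
    cache_control = ""
    for name, value in headers.items():
        if name.lower() == "cache-control":
            cache_control = value

    # Collect directive names, ignoring any "=value" part.
    directives = {
        part.strip().split("=", 1)[0].lower()
        for part in cache_control.split(",")
        if part.strip()
    }

    # Explicit prohibitions win over everything else.
    if directives & {"no-store", "private"}:
        return False
    # Assumption: store only when the response explicitly opts in.
    return bool(directives & {"public", "max-age", "s-maxage"})
```

The decision never touches the body: everything the intermediary needs
is expressed in the message's own metadata.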

5.4 Related Work
"REST component interactions are structured in a layered client-server
style, but the added constraints of the generic resource interface
create the opportunity for SUBSTITUTABILITY AND INSPECTION BY
INTERMEDIARIES... [T]he C2 style lacks the intermediary-friendly
constraints of REST, such as the generic resource interface, guaranteed
stateless interactions, and intrinsic support for caching."

6.2.5 REST Mismatches in URI
"Another conflict with the resource interface of REST occurs when
software attempts to treat the Web as a distributed file system...
[A]ttempts to mirror the content of a Web server as files will fail
because the resource interface does not always match the semantics of a
file system, and because both data and metadata are included within,
and significant to, the semantics of a representation."

6.3.2 Self-descriptive Messages
"REST constrains messages between components to be self-descriptive in
order to support intermediate processing of interactions."

It is IMNSHO impossible to read through REST's hammering home the point
about intermediary processing, and honestly come to the conclusion that
Authoritative Metadata is in any way unsubstantiated.

Deprecating self-descriptiveness treats the Web as a distributed file
system, a known REST mismatch; IOW, this suggestion represents not a
clarification, but a wholesale change to the Web architecture which
throws lots of babies out with the bathwater.  If such work proceeds on
an ad-hoc basis without any grounding in established architectural
styles and constraints, I don't see how it could have any legitimacy --
why on Earth would we expect the results to work, if changes to such a
massively deployed system are made by just winging it?

-Eric
Received on Wednesday, 27 February 2013 22:39:51 GMT
