Re: ISSUE-41/ACTION-97 decentralized-extensibility

On Oct 3, 2009, at 02:45, Shelley Powers wrote:

> I can speak for an SVG editor, Inkscape, which uses namespaced  
> elements and attributes to record information about the SVG it  
> produces. And before you mention using HTML comments, you should  
> spend time with an Inkscape SVG file, to see how extensive the use  
> of namespaced elements and attributes are in an Inkscape managed SVG  
> file.

Inkscape is indeed a good example.

It isn't clear to me why Inkscape couldn't serialize its state using a  
proprietary key-value syntax inside comment nodes instead of using  
attributes as the key-value syntax. I can see why one might consider  
it unnatural to invent a product-specific key-value syntax when the  
off-the-shelf part of the underlying XML language framework already  
offers attributes. (OTOH, xml-stylesheet is precedent for minting  
above-XML-layer key-value syntax.)
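
To illustrate the comment-node idea, here is a minimal sketch in Python. It assumes a made-up "myeditor" marker and made-up keys; none of this is Inkscape's actual format. The key="value" pairs mimic the pseudo-attribute style of xml-stylesheet, and the regular expression is a simplification rather than a full parser.

```python
import re
from xml.dom import minidom

# Hypothetical document: the "myeditor" marker and the key names are made up
# for illustration; they are not Inkscape's actual format.
SVG = """<svg xmlns="http://www.w3.org/2000/svg">
  <!--myeditor zoom="1.5" layer="background"-->
  <rect width="10" height="10"/>
</svg>"""

# Simplified key="value" matcher in the style of xml-stylesheet pseudo-attributes.
PSEUDO_ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')


def editor_state(node, marker="myeditor"):
    """Collect key-value pairs from comments that start with the marker."""
    state = {}
    for child in node.childNodes:
        if child.nodeType == child.COMMENT_NODE:
            text = child.data.strip()
            if text.startswith(marker + " "):
                state.update(PSEUDO_ATTR.findall(text[len(marker):]))
        elif child.nodeType == child.ELEMENT_NODE:
            # Recurse so comments anywhere in the tree are found.
            state.update(editor_state(child, marker))
    return state


doc = minidom.parseString(SVG)
print(editor_state(doc))
```

Consumers that don't know the marker simply see an ordinary comment and ignore it.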

The copious amounts of internal state that Inkscape dumps into files
raise the question of whether it's a good idea to dump that kind of
state in files used for interchange.

> I'm also aware of other applications that have defined attributes to  
> be applied to HTML elements so that JavaScript libraries can do  
> something with the data. A case in point is an accordion JS
> application that would depend on these attributes. To use the  
> application, I had to add these attributes to the div elements that  
> formed the accordion panels and header bar.
> Now, the HTML5 custom data  could be useful for something like this,  
> except for a problem: According to the HTML5 Specification, "These  
> attributes are not intended for use by software that is independent  
> of the site that uses the attributes."

As I understand it, JS libraries included by the page itself are not  
"independent of the site" for the purpose of the spec sentence even if  
the libraries are treated as off-the-shelf commodity parts that aren't  
modified by the site author.

> Probably the reason why is there is nothing in data-* to handle name  
> collision. After all, there's only so many types of words to use,  
> and there could very well be collision between a JavaScript library  
> that does an accordion, and perhaps another one that does a tabbed  
> UI, and both are used in the same site.

If library authors are concerned about collisions, the probability of
a collision gets close enough to zero if you use the name of the
library as part of the identifier and rely on social and trademark
structures to avoid collisions of library names. E.g. data-dojo-foo
and data-jquery-foo don't collide with each other and also don't
collide with data-foo.

> Using RDFa for these purposes would be inappropriate because that's  
> not the underlying purpose for RDFa.

I think a stronger argument for why RDFa (or Microdata for that  
matter) is inappropriate for this use case is that the RDF graph  
represented by RDFa doesn't have data model-level correspondence to  
particular elements in the DOM even though syntactically the graph and  
the DOM are overlaid.

> An extensibility mechanism that is decentralized would be  
> appropriate, because such a mechanism would have, built in, the  
> capability of dealing with name collision. That means JS library 1  
> could define its own "js-tab" attribute, and JS library 2, could
> define its own "js-tab2" attribute, and a person could use both  
> libraries without a problem.

You can do this as data-library1-tab and data-library2-tab. These  
don't collide with data-tab, so this scheme works even without  
everyone participating.
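
The prefix convention can be sketched as follows. This is only an illustration in Python: "library1" and "library2" are the placeholder names from the paragraph above, and the "(site)" bucket for unprefixed names is an assumption of the sketch, not anything the spec defines.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical library names; a bare data-* name is filed under "(site)".
KNOWN_LIBRARIES = {"library1", "library2"}


class DataAttrCollector(HTMLParser):
    """Group data-* attributes by the library prefix that follows "data-"."""

    def __init__(self):
        super().__init__()
        self.by_owner = defaultdict(dict)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not name.startswith("data-"):
                continue
            rest = name[len("data-"):]
            prefix, sep, key = rest.partition("-")
            if sep and prefix in KNOWN_LIBRARIES:
                self.by_owner[prefix][key] = value
            else:
                # No recognized library prefix: treat it as the site's own name.
                self.by_owner["(site)"][rest] = value


collector = DataAttrCollector()
collector.feed('<div data-library1-tab="a" data-library2-tab="b" data-tab="c"></div>')
print(dict(collector.by_owner))
```

All three "tab" values end up in separate buckets, including the one from the author who used no prefix at all.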

> In my opinion, RDFa is a decentralized extensibility, yes, but with  
> a limiting constraint: the data that is recorded using RDFa is based  
> on a specific model, RDF.


> As such, it's extensible, in that many data vocabularies can be  
> defined within RDF, and recorded with RDFa. It's extensible in the  
> same way that the SQL data model is extensible. Like RDF, the SQL  
> model is decentralized, too, in that you don't have to go to some  
> governing body in order to define a new use for SQL. Wordpress has  
> its own database design that differs from Drupal's, but both are  
> based on the SQL Data model.

I don't follow this example. Surely typical SQL databases have a table  
called 'users' and you can't just go and merge two SQL databases  
without a table name or schema collision.
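
A minimal sketch of that collision, using Python's sqlite3 module. The two schemas are hypothetical and heavily simplified; they are not the real WordPress or Drupal table definitions.

```python
import sqlite3

# Hypothetical, heavily simplified schemas; the real WordPress and Drupal
# table definitions differ.
SCHEMA_A = "CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT);"
SCHEMA_B = "CREATE TABLE users (uid INTEGER PRIMARY KEY, name TEXT, mail TEXT);"


def try_merge(*schemas):
    """Load every schema into one database; return the first error message, if any."""
    conn = sqlite3.connect(":memory:")
    try:
        for schema in schemas:
            conn.executescript(schema)
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        conn.close()
    return None


# Each schema loads fine alone; loading both into one database fails
# because the name "users" is already taken.
print(try_merge(SCHEMA_A))
print(try_merge(SCHEMA_A, SCHEMA_B))
```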

>>>> How does the Web become better if additional pieces of
>>>> native-code software hook to the DOM in addition to hooking to
>>>> <object>/<embed> and a byte stream?
>>> Native code software?
>> Code implemented as instructions native to the CPU. The way NPAPI  
>> plug-ins and ActiveX controls are implemented.
> You're talking about the additional functionality necessary in order  
> to parse the namespaced elements in HTML, and attributes, and store  
> them in the DOM? We're not talking about XHTML, I'm assuming, since  
> this capability already exists. You're talking about the proposal  
> and its impact on the parsing, and JS access, in HTML?

I'm not talking about additional functionality in building the DOM in  
the parser. I'm assuming that the MS proposal meant to specify changes  
to the functionality shipped in every HTML parser.

I'm talking about the processing of the DOM subtrees that contain  
extension elements or attributes once the tree has been built by the  
browser-supplied parser. I'm asking if the purpose of enabling such  
subtrees is to increase the plug-in API surface of browsers and enable  
third-party native code to implement the presentation of such subtrees?

>>> Well, when it comes to namespaced elements in SVG in an HTML  
>>> document, I can see immediate benefit to JS libraries accessing  
>>> those elements.
>> SVG is not a "decentralized" extension to HTML, AFAICT. It's  
>> centralized right here at the W3C together with HTML.
> I think you misunderstood my answer. And I'm not quite sure if your  
> understanding of "centralization" is the same as mine, either, from  
> how you're using it in the above sentence.
> I wasn't talking about SVG as an extension to HTML. I was talking  
> about SVG being part of HTML, and the fact that it's fairly common  
> for namespaced elements and attributes to be used in SVG. Common,  
> and permissible based on the SVG specification.

I see. Are existing JS libraries operating on SVG trees in existing
Web content using namespaced attributes and elements in a way that
data-* attributes don't address?

> I tried to explain some uses and interests in distributed  
> extensibility above. Let me know if these weren't sufficient.
> I believe that Tony also referenced a view of distributed  
> (decentralized) extensibility, as well as some possible use cases.

I'm interested in seeing a definition of what "decentralized
extensibility" means so that alternative proposals can be tested
against the definition to determine if they constitute "decentralized
extensibility". (I guess it wouldn't be unexpected if you, Tony and  
Sam came up with different definitions, although so far Sam seems to  
be avoiding writing down a definition even when asked.)

Currently, the WG lacks a definition against which to assess if e.g.  
the naming scheme Jonas mentioned (<org_example_foo>) would be  
"decentralized extensibility".

>> Is the set of characteristics a proper subset of the  
>> characteristics of Namespaces in XML or are the sets one and the  
>> same?
> Being able to handle name collision is probably the biggest area of  
> concern.
> Any form of extensibility has to be able to handle a wide, diverse  
> audience, who may or may not be aware of the work others are doing  
> when they take advantage of the extensibility.

For clarity, do you mean element and attribute names?

Are there any other salient characteristics of "decentralized
extensibility", in your view, than
  * a diverse audience being able to mint names
  * collision avoidance when the minters are unaware of each other?

Would a scheme where names with an underscore are reserved for  
extensions and anyone minting an extension name is told to prefix the  
name with their domain name with components reversed and joined with  
underscores (followed by an underscore), e.g. org_example_foo, be  
"decentralized extensibility"?

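The naming scheme described above can be sketched in a few lines of Python. The function names are hypothetical, and the sketch does not address domain names containing characters (such as hyphens) that the scheme as stated doesn't cover.

```python
def extension_name(domain, local_name):
    """Reverse the domain's components, join them with underscores,
    then append an underscore and the local name."""
    components = domain.lower().split(".")
    return "_".join(reversed(components)) + "_" + local_name


def is_extension_name(name):
    """Under this scheme, any name containing an underscore is an extension."""
    return "_" in name


print(extension_name("example.org", "foo"))   # org_example_foo
print(is_extension_name("org_example_foo"))   # True
print(is_extension_name("div"))               # False
```

The result is a plain string, so no prefix-to-URI expansion or API change is involved.
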
>>>> * When content depends on language extensions that need client  
>>>> software extensions to process, the ability of users to read Web  
>>>> content is harmed in software/hardware environments for which the  
>>>> client software extensions aren't available.
>>> I would say that a significant proportion of HTML5 falls into the  
>>> category of needing implementation that isn't universally  
>>> available in all environments.
>> As far as I know, the HTML5 spec is royalty-free and it's being  
>> implemented in multiple engines some of which are Open Source.  
>> There doesn't seem to be any one party controlling the availability  
>> of an HTML5 implementation for a given computing platform.
> You really misunderstood me on this one. I think it's the different  
> views we each have.
> By this I meant that there are elements of HTML5 that are not  
> currently supported in any browser, and won't be available in all  
> environments for potentially years in the future.
> I can see now you must have meant something about royalty or  
> patents? Is that it? I'm not sure where you're coming from with this  
> one, based on your most recent answer.

I mean that the ability of a user to read HTML5 or SVG content is not  
gated by one party. The user's ability to read content in HTML+FooML  
(where FooML is an extension) could be gated by one party if the  
ability to process FooML is only available from the proprietor of  
FooML as a binary extension to those browsers that the proprietor of  
FooML chooses to support.

>>>> * Working with string tuple identifiers is harder than working  
>>>> with simple string identifiers.
>>> Again, this has nothing to do with your concern about  
>>> decentralized extensibility. I think we should focus on the most  
>>> significant concern, address it, and then move on to  
>>> implementation once past that initial concern. Don't you think?
>> It depends on whether 'decentralized extensibility' is synonymous  
>> with Namespaces.
> No, I don't believe that decentralized, or distributed,  
> extensibility has been defined as another name for Namespaces.


>>>> * Prefix-based indirection (where the prefix expands to something  
>>>> as opposed to being just a naming convention) confuses people.
>>> Again, outside of the initial concern about decentralized  
>>> extensibility as a whole.
>> It depends on whether 'decentralized extensibility' is synonymous  
>> with Namespaces.
> So, what you're saying, then, is that your concern really isn't  
> about distributed/decentralized extensibility. Your concern is about  
> the specific implementation. Do I have that right?

Assuming that the <org_example_foo> naming convention without API  
changes counts as decentralized extensibility, then this particular  
concern doesn't apply to decentralized extensibility in general but to  
a particular implementation.

However, my concern about needing proprietary processing software for  
a FooML extension seems to apply to any extension scheme where  
extensions are processed via broadened plug-in API surface.

> Hopefully others will jump in and provide you the answers I don't  
> seem to be able to provide. I do feel you deserve answers to your  
> questions.

Thanks. I'm hoping Tony (when he returns from vacation) and Sam will
clarify what their goals are when it comes to the "decentralized
extensibility" label.

Henri Sivonen

Received on Monday, 5 October 2009 12:09:57 UTC