Re: Why bound prefixes are an anti-pattern in language design from Ian Hickson on 2009-08-06 (public-rdf-in-xhtml-tf@w3.org from August 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Thu, 6 Aug 2009 22:07:53 +0000 (UTC)
To: Danny Ayers <danny.ayers@gmail.com>, Martin McEvoy <martin@weborganics.co.uk>
Cc: RDFa Developers <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <Pine.LNX.4.62.0908062144000.28566@hixie.dreamhostps.com>

On Thu, 6 Aug 2009, Danny Ayers wrote:
> 
> I think I'm fairly safe in assuming for example more people use the 
> WYSIWYG interface of Wordpress than copy-and-pasting raw markup.

I don't understand the relevance of that. Just because 90% of people don't
write HTML doesn't mean that the 10% of people who do suddenly find it
easier or can handle prefix mechanisms easily.


> >> > Copy-and-paste is how the Web evolved, so I think it is important 
> >> > to keep it functional and easy.
> >>
> >> Yes, but it *did* evolve -
> >
> > Should it stop evolving? That's a depressing thought.
> 
> Quite the opposite of what I'm suggesting. Namespaces (and their 
> prefixes) offer an open-ended route to evolution, not locked to whatever 
> a specific interpretation mechanism (i.e. your favourite browser) can 
> do.

Whether extensibility is desirable or not is completely orthogonal to
whether the syntax for said extensibility is easy to use or not.

If we want extensibility in this manner (and it's not clear to me that we 
do, but that's another story altogether) -- if we want extensibility in 
this manner and if we want it to be actually used, then it is absolutely 
critical that such extensibility mechanisms be easy to author.

By making the namespace syntax hard, we are putting up barriers to 
adoption of namespaces. By making RDFa use a prefix indirection mechanism, 
we are putting barriers up to the adoption of the RDFa annotation system.


> >> what proportion of Web pages do you think were created through c&p 
> >> now?
> >
> > A huge amount. I'd be shocked if many of the WordPress templates were 
> > created from scratch, for instance -- I expect most of them are 
> > created by varying pre-existing templates. That's certainly how we 
> > seem to write specs at the W3C -- has anyone ever written a W3C spec 
> > by starting from a blank page rather than starting from a template or 
> > another spec and replacing the meat?
> 
> What proportion of Wordpress users do you think use the templates off 
> the shelf? Ok arguably that's copy & paste, but it's independent of the 
> complexity of the template.

I don't understand the relevance of people who don't author HTML to the 
complexity of authoring HTML.


> >> > Prefixes are notoriously hard for authors to understand. As far 
> >> > back as 2004, Micah wrote "As the author of an O'Reilly book on 
> >> > XForms, I can report that 90% of the technical questions from 
> >> > readers involve confusion related to namespaces".
> >>
> >> Yet most programmers somehow seem to get their heads around 
> >> namespaces for their libraries, and XML namespaces has 
> >> not-insignificant deployment.
> >
> > Every year, people in the US wrap their heads around the US Tax code, 
> > but that doesn't mean that its complexity is acceptable.
> 
> Every year people put one foot in front of the other and manage to walk. 
> The physics is non-trivial yet it works.

People walking don't have to understand the physics of walking in order to 
avoid stumbling around. People filing their taxes and people using prefix 
mechanisms _do_ have to understand those mechanisms in order to avoid 
stumbling around and misfiling taxes or writing markup that isn't what 
they want.


> >> >   http://www.w3.org/2004/04/webapps-cdf-ws/papers/verity.html
> >> >
> >> > Parand Darugar has said similar things: "Experience shows XML
> >> > namespaces can be a common cause of confusion and a major complicating
> >> > factor in XML adoption."
> >> >
> >> >   http://www.ibm.com/developerworks/library/x-abolns.html
> >> >
> >> > Fundamentally, prefixes are an indirection model. Indirection models
> >> > are very, very hard for people to understand.
> >>
> >> Spurious claim.
> >
> > I disagree; I think the (admittedly circumstantial and anecdotal) evidence
> > I listed in my e-mail is enough to demonstrate that this is true.
> 
> Right. If you give me 5 minutes with Google I'm sure I can find
> anecdotal evidence that Jesus loves SOAP.

To be blunt, from my point of view it doesn't really matter if I convince
you here -- I think that technologies that depend on prefix binding
mechanisms are making themselves excessively complicated, and so I'm going
to avoid making what I perceive to be that mistake in technologies I work on.
You are welcome to continue using prefix binding mechanisms in 
technologies you work on.


> >> The whole notion of a markup language is based on indirection - the 
> >> syntax is attached to some kind of dispatching mechanism to interpret 
> >> it - you don't get to follow a link by magic.
> >
> > The more layers of indirection you add, the more complex a system 
> > becomes. Just because we got away with one layer, doesn't mean we 
> > should jump up and add another.
> 
> There's no jumping up needed - it can be ignored by them that don't want 
> it.

Unfortunately, it can't. For example, Henri would have loved to ignore the
namespace prefix nonsense when writing his HTML5 validator, but it ended
up being one of his biggest time sinks.

People who write HTML will be exposed to this syntax whenever the 
documents they work on have had RDFa added by someone else. They can't 
just ignore the RDFa -- the slightest mistake while moving stuff around 
can completely break the RDFa.
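
To illustrate how moving markup can break it, here is a minimal Python
sketch. It is not an RDFa processor: the resolve() helper and the
list-of-dicts model of ancestor elements are assumptions made for this
example, and "foaf:name" is just a sample identifier.

```python
def resolve(curie, ancestor_bindings):
    """Expand a prefixed name using bindings declared on ancestors.

    ancestor_bindings is a hypothetical model of the document: one dict
    of prefix -> URI per ancestor element, outermost first.
    """
    prefix, _, reference = curie.partition(":")
    # The nearest ancestor that declares the prefix wins.
    for bindings in reversed(ancestor_bindings):
        if prefix in bindings:
            return bindings[prefix] + reference
    return None  # the prefix is not bound in this context


# Where the markup was written: an ancestor binds "foaf".
original_context = [{"foaf": "http://xmlns.com/foaf/0.1/"}]
print(resolve("foaf:name", original_context))
# -> http://xmlns.com/foaf/0.1/name

# The same markup pasted somewhere without the declaration.
pasted_context = []
print(resolve("foaf:name", pasted_context))
# -> None
```

The element being moved is byte-for-byte identical in both cases; only
its surroundings changed.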


> > (But I don't think authors see markup as an indirection mechanism, I 
> > think they see it as a direct means of expression.)
> 
> Fair point, but I reckon most authors don't care about the markup at all 
> - the Wordpress WYSIWYG thing again.

When I say "author" I mean someone who is authoring HTML. So by 
definition, when I refer to "authors" I am talking about people who, to at 
least some extent, care about markup.


> Hoping to make the point that software is a world of indirection (for 
> practical purposes by definition) so pretending one form of indirection 
> is more palatable than another is unwise.

Some forms of indirection _are_ more palatable than others. You are
over-generalising, IMHO.


On Thu, 6 Aug 2009, Martin McEvoy wrote:
> > 
> > Prefixes are notoriously hard for authors to understand. As far back 
> > as 2004, Micah wrote "As the author of an O'Reilly book on XForms, I 
> > can report that 90% of the technical questions from readers involve 
> > confusion related to namespaces"
> 
> Here is where I have to stop you I am afraid, I am not talking about 
> XML/Namespaces, I am only talking about prefixing mechanisms to convey 
> semantic meaning in a way it was intended by the author.

As far as I can tell, the authoring problems of one apply equally to the 
other.


> for example:
> http://microformats.org/wiki/hatom-faq#Why_does_hAtom_use_class_names_with_prefixes

Those are not indirection-based bound prefixes. They are just identifiers 
that happen to have a common beginning. That's a completely different 
kettle of fish.

If we dropped the xmlns:foaf="..." bit and just defined that foaf:Person 
was a FOAF person and you could never change the "foaf:" part, I wouldn't 
be complaining. The problem is that you _can_ change the "foaf:" part.


> This is more what I mean by prefixing mechanisms, these kind of mechanisms
> give wider, richer semantic scope than just simple generic keywords.

The Microformats case isn't using anything more than generic keywords that 
happen to all start with a common string. I'm fine with that, if that's 
what you want to do. My problem is when you start allowing that string to 
vary and define a mechanism to bind that string to some other string.


> In hAtom prefixing works and doesn't cause many issues or confusion, 
> sure the question "is that a namespace" comes up and the answer is 
> simply no.

Indeed.


> consider this example:
> 
> <div prefix-entry="http://tools.ietf.org/html/rfc4287#"
>       class="hentry">
>    <h2 class="entry-title">My foo article</h2>
>    <div class="entry-content">Hey this is foo.</div>
> </div>

The problem here is with the "prefix-entry" part. As soon as you introduce
a way to bind a prefix to another string, then you have the indirection
problem I describe.

Consider this:

  <div prefix-x="http://tools.ietf.org/html/rfc4287#"
        class="hentry">
     <h2 class="x-title">My foo article</h2>
     <div class="x-content">Hey this is foo.</div>
  </div>

If things are defined such that this and the previous example have the 
same meaning, then there's a problem, IMHO.
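
A rough Python sketch of what "the same meaning" would imply. The
expansion rule here is an assumption for illustration only (the proposed
prefix-* attribute has no defined processing model), and
expand_classes() is a hypothetical helper.

```python
def expand_classes(attributes, class_names):
    # Assumed rule: an attribute prefix-NAME="URI" binds class names of
    # the form NAME-term to URI + term.
    bindings = {
        name[len("prefix-"):]: uri
        for name, uri in attributes.items()
        if name.startswith("prefix-")
    }
    expanded = []
    for class_name in class_names:
        prefix, sep, term = class_name.partition("-")
        if sep and prefix in bindings:
            expanded.append(bindings[prefix] + term)
        else:
            expanded.append(class_name)  # no binding: plain keyword
    return expanded


ATOM = "http://tools.ietf.org/html/rfc4287#"

first = expand_classes({"prefix-entry": ATOM},
                       ["entry-title", "entry-content"])
second = expand_classes({"prefix-x": ATOM},
                        ["x-title", "x-content"])

print(first == second)  # -> True
```

Under that assumed rule, "entry-title" and "x-title" are the same
identifier, and neither can be interpreted without first finding the
declaration that binds its prefix.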



> All the above example tries to demonstrate  is creating a scope for which
> terms can be used, there Is no namespace voodoo going on, just prefixes used
> in a "meaningful" way, the content of prefix-entry is just text that may or
> may not be referenced sometime later, Its a reference to where the meaning of
> these scoped terms are being used.

In hAtom, the scope is defined by class="hentry", not by a prefix 
declaration.


On Thu, 6 Aug 2009, Danny Ayers wrote:
>
> PS. Ian, please name me a programming language that doesn't have a long 
> name -> short name thing. The Web uses computers too.

Programming is hard. Do we really want markup authoring to be as hard as 
programming?

Also, most programming languages don't use prefix binding, they use scoped
identifier imports, or qualified identifiers. Both of these mechanisms
are IMHO significantly simpler than binding prefixes. Qualified 
identifiers are ideal, since they are context-free. Importing names is ok, 
though still somewhat brittle.
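
For instance, in Python (chosen here only as a convenient illustration;
the standard-library collections module is used as the example):

```python
import collections                    # enables the qualified identifier
from collections import OrderedDict  # scoped identifier import

# Qualified identifier: the full name is spelled out at the point of
# use, so the line can be read without looking anywhere else.
qualified = collections.OrderedDict(a=1)

# Imported name: shorter, but only meaningful together with the import
# statement at the top of the file.
imported = OrderedDict(a=1)

print(type(qualified) is type(imported))  # -> True
```

The second form is the brittle one: copy the line that uses OrderedDict
into a file that lacks the import and it no longer works.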


On Thu, 6 Aug 2009, Martin McEvoy wrote:
> 
> RDFa is *NOT* evoking some kind of namespace routine or indirection 
> behaviour, at least that is not what is intended, its creating Scope 
> within the document which uses RDF terms and values hence the 
> "attributes" to emulate a certain behaviour, the re-use of xmlns for 
> something "meaningful" makes no difference because in HTML5 *xmlns* has 
> "no meaning" its just a token value much the same as your data- 
> attribute, its just a convenient "hook" for something other than the 
> browser, ":" is only text that represents a union of things much the 
> same way as the little stick thing people seem to be attached to "-"  
> ;)

Does RDFa rely on strings that are bound to other strings in order to form 
identifiers when combined with a third set of strings?

I.e., when you see an identifier in RDFa, like "foaf:Person" or "Ja:son",
can you immediately know what it means without having to look at all the
ancestor elements?

Consider:

  <div xmlns:Ja="http://xmlns.com/foaf/0.1/Per"
       xmlns:foaf="http://example.com/">
   <span property="foaf:Person">A</span>
   <span property="Ja:son">B</span>
  </div>

What are the triples? Is that intuitive? If the answer is no, then the 
problem I've been describing is present.

Can you change the triples without changing the <span> elements on which 
the triples are defined? If the answer to that last question is yes, then 
the problem I've been describing is present.
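
As a sanity check on the example above, here is a minimal Python sketch
of prefix expansion by plain string concatenation, which is my
understanding of how these prefixed names are meant to be expanded. It
is not a real RDFa parser, and expand_curie() is a hypothetical helper.

```python
def expand_curie(curie, bindings):
    # Split at the first colon and prepend whatever the prefix is
    # bound to in the current scope.
    prefix, _, reference = curie.partition(":")
    return bindings[prefix] + reference


# The two xmlns declarations from the <div> in the example.
bindings = {
    "Ja": "http://xmlns.com/foaf/0.1/Per",
    "foaf": "http://example.com/",
}

print(expand_curie("foaf:Person", bindings))
# -> http://example.com/Person
print(expand_curie("Ja:son", bindings))
# -> http://xmlns.com/foaf/0.1/Person
```

So the span that says "foaf:Person" is not a FOAF person, the span that
says "Ja:son" is, and editing only the declarations on the <div> changes
both results without touching either <span>.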

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 6 August 2009 22:08:30 UTC