[whatwg] on bibtex-in-html5 from Ian Hickson on 2009-06-10 (public-whatwg-archive@w3.org from June 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Wed, 10 Jun 2009 09:44:36 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0906100857240.1648@hixie.dreamhostps.com>

On Wed, 20 May 2009, Bruce D'Arcus wrote:
>
> Re: the recent microdata work and the subsequent effort to include 
> BibTeX in the spec, I summarized my argument against this on my blog:
> 
> <http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5>

| 1. BibTeX is designed for the sciences, which typically only cite
|    secondary academic literature. It is thus neither adequate for, nor
|    widely used in, many fields outside of the sciences: the humanities
|    and law being quite obvious examples. For this reason, BibTeX cannot by
|    default adequately represent even the use cases Ian has identified.
|    For example, there are many citations on Wikipedia that can only be
|    represented using effectively useless types such as "misc" and which
|    require new properties to be invented.

We will probably have to increase the coverage in due course, yes. 
However, we should verify that the mechanism works in principle before 
investing the time to extend the vocabulary.


| 2. Related, BibTeX cannot represent much of the data in widely used
|    bibliographic applications such as Endnote, RefWorks and Zotero except
|    in very general ways.

If such data is important, we can always add support when this becomes 
clear.


| 3. The BibTeX extensibility model puts a rather large burden on inventing
|    new properties to accommodate data not in the core model. For example,
|    the core model has no way to represent a DOI identifier (this is no
|    surprise, as BibTeX was created before DOIs existed). As a
|    consequence, people have gradually added this to their BibTeX records
|    and styles in a more ad hoc way. This ad hoc approach to extensibility
|    has one of two consequences: either the vocabulary terms are
|    understood as completely uncontrolled strings, or one needs to
|    standardize them. If we assume the first case, we introduce potential
|    interoperability problems. If we assume the second, we have an
|    organizational and process problem: that the WHATWG and/or the
|    W3C, neither of which has expertise in this domain, become the
|    gate-keepers for such extensions. In either case, we have a rather
|    brittle and anachronistic approach to extension.

I don't see any of this as a problem.


| 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
|    which are quite sensibly used elsewhere in the microdata spec to
|    encode information related to the document proper. There seems little
|    justification in having two different ways to represent a document
|    depending on whether it is THIS document or THAT document.

I don't understand this point. Could you provide an example of this 
conflict?


| 5. Aspects of BibTeX's core model are ambiguous/confusing. For example,
|    what number does "number" refer to? Is it a document number, or an
|    issue number?

What's the difference? Why does it matter?
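
To make the quoted concern concrete: in the standard BibTeX styles, the 
same "number" field holds the issue number of an @article but the report 
number of a @techreport. The following is only a rough Python sketch of a 
consumer that disambiguates by entry type; the names NUMBER_MEANING and 
describe_number, and the label strings, are hypothetical and not part of 
BibTeX or of the spec.

```python
# Meaning of the "number" field per entry type in the standard BibTeX styles.
# The label strings are illustrative assumptions, not defined by any spec.
NUMBER_MEANING = {
    "article": "issue number",
    "techreport": "report number",
    "book": "number in series",
}


def describe_number(entry_type, number):
    """Label a "number" value according to the entry type it appears in."""
    meaning = NUMBER_MEANING.get(entry_type.lower(), "unspecified number")
    return f"{meaning}: {number}"


print(describe_number("article", "3"))
print(describe_number("techreport", "TR-42"))
```

A consumer that ignores the entry type cannot tell the two cases apart, 
which is the ambiguity the quoted point describes.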


| My suggestion instead?
| 1. reuse Dublin Core and vCard for the generic data: titles,
|    creators/contributors, publisher, dates, part/version relations, etc.,
|    and only add those properties (volume, issue, pages, editors, etc.)
|    that they omit

This seems unduly heavy duty (especially the use of vCard for author 
names) when all that is needed is brief bibliographic entries.


| 2. typing should NOT be handled by a bibtex-type property, but the same way
|    everything else is typed in the microdata proposal: a global
|    identifier

Why?


| 3. make it possible for people to interweave other, richer, vocabularies
|    such as bibo within such item descriptions. In other words, extension
|    properties should be URIs.

This is already possible.


| 4. define the mapping to RDF of such an "item" description; can we say,
|    for example, that it constitutes a dct:references link from the
|    document to the described source?

The mapping to RDF is already defined; further mappings can be done using 
the "sameAs" mechanism.


On Thu, 21 May 2009, Henri Sivonen wrote:
> 
> The set of fields is more of an issue, but it can be fixed by inventing 
> more fields--it doesn't mean the whole base solution needs to be 
> discarded. Fortunately, having custom fields in .bib doesn't break 
> existing pre-Web, pre-ISBN bibliography styles. I've used at least these 
> custom fields:
> 
> key: Show this citation pseudo-id in rendering instead of the actual id used
> for matching.
> url: The absolute URL of a resource that is on the Web.
> refdate: The date when the author made the reference to an ephemeral source
> such as a Web page.
> isbn: The ISBN of a publication.
> stdnumber: RFC or ISO number. e.g. "RFC 2397" or "ISO/IEC 10646:2003(E)"
> 
> Particularly the 'url' and 'isbn' field names should be obvious and 
> uncontroversial additions.

"url" seems widely supported and I included it. I haven't added any other 
fields yet; I imagine that once this feature gets traction, we'll have 
more direct data as to which fields would be most useful, and then we can 
see what common practices are in the bibtex world for those cases and use 
compatible mechanisms.
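
As an illustration of why extra fields in a flat record need not break 
existing consumers, here is a minimal Python sketch. It models an entry as 
a dictionary. The field names "stdnumber" and "refdate" come from Henri's 
list above; CORE_FIELDS, split_fields, and the sample "refdate" value are 
assumptions made up for this example and are not taken from the spec.

```python
# Hypothetical set of fields a consumer already understands. This is an
# assumption for illustration, not the vocabulary defined by the spec.
CORE_FIELDS = {"author", "title", "year", "url"}


def split_fields(entry, known=CORE_FIELDS):
    """Separate a flat record into recognised and unrecognised fields."""
    recognised = {k: v for k, v in entry.items() if k in known}
    extra = {k: v for k, v in entry.items() if k not in known}
    return recognised, extra


# Sample record using two of the custom fields from Henri's list.
# The refdate value is invented for the example.
entry = {
    "author": "L. Masinter",
    "title": 'The "data" URL scheme',
    "year": "1998",
    "stdnumber": "RFC 2397",
    "refdate": "2009-05-21",
}

recognised, extra = split_fields(entry)
print(sorted(recognised))
print(sorted(extra))
```

A consumer written this way can render the recognised fields and carry the 
extra ones along untouched, so new fields can be added later without 
invalidating older records.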


On Thu, 21 May 2009, Bruce D'Arcus wrote:
> Henri wrote:
> > This doesn't mean that BibTeX is a bad basis. The set of types and 
> > fields is limited, though.
> 
> It's limited, and it's flat.

Right. That's a good thing. It makes the vocabulary more usable.


> > Since renderings of bibliography don't show the type of the reference 
> > usually, having to use 'misc' for almost everything isn't a practical 
> > problem although it is aesthetically displeasing.
> 
> But this is not the point of adding structured data to HTML; it's to 
> allow it to be extracted, and subsequently processed, as data.

No, not at all. The point here is exclusively to address the use cases 
described here:

   http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2009-May/019833.html


On Tue, 2 Jun 2009, James Graham wrote:
> 
> 1) It seems like this and similar sections (bibtex, vCard, iCalendar) 
> could be productively split out of the main spec into separate normative 
> documents, since they are rather self-contained and have rather obvious 
> interest for communities who are unlikely to find them at present or to 
> be interested in the rest of the spec. Although the drag and drop stuff 
> being dependent on them does mean that you'd need some circular 
> references.

So far, based on my experience with the Workers, Storage, Web Sockets, and 
Server-sent Events sections, I'm not convinced that the advantage of 
getting more review is real. Those sections in particular got more review 
while in the HTML5 spec proper than they have since.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 10 June 2009 02:44:36 UTC