- From: Ian Hickson <ian@hixie.ch>
- Date: Wed, 8 Jul 2009 02:45:12 +0000 (UTC)
Based on the feedback below, I've removed the BibTeX vocabulary from
HTML5. The primary use case -- enabling drag-and-drop in a manner that
the target document could automatically add a reference to the source
document -- can still be done between cooperating sources; it's just no
longer a first-class citizen in the automatically generated drag-and-drop
JSON object. (The previous mechanism found relevant citation information
in the page or section footer and automatically included that.)

I would encourage people interested in enabling this use case to develop
a format for this to expose in the drag-and-drop API, along with some
scripts to enable it. This doesn't really require built-in support so
long as scripting is enabled; the APIs already provide the power to do
this.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
> > >
> > > Most of them are defined as aliases and are handled just fine by
> > > biblatex. For example, journal works just as well as journaltitle.
> > > While there may be small differences, they're definitely not
> > > essential. In real life, most of the bibtex data publicly available
> > > differs from "pure bibtex" to about the same degree. There are very
> > > few places where you can get 100% correct bibtex. Biblatex certainly
> > > doesn't bring a new level of incompatibility here.
> >
> > My original point was just that it seems unnecessarily incompatible
> > with BibTeX, and that the latter appears to have more deployed
> > support.
> >
> > I disagree that using the same term to mean something else (as in the
> > "inbook" case) is a "small difference" that is "not essential".
>
> Are we talking on a theoretical level about what would be best "in
> principle", or do we talk about what actually happens? From the fact
> that you originally chose BibTeX I inferred that you want to go for a
> "practical" solution which takes account of what is used in the real
> world. Now if we do that, we also must take a look at what actually
> happens in the real world.
> And although this may just be anecdotal evidence, I can assure you
> that in my experience a) 100% correct BibTeX is the exception
> and b) the compatibility problems between BibTeX data that you can
> download from various sites and biblatex are no big deal. About every
> BibTeX style introduces its own quirks; in the majority of cases you
> have to clean your data anyway after you download it. So I really
> don't see a fundamental problem here. But I certainly do see a
> fundamental problem -- both theoretical and practical -- if you go for a
> standard which is limited in major ways and which from the start
> excludes about everything which is not English-speaking hard science.
>
> There will always be a tradeoff; the question is which is the lesser
> evil.

On Wed, 10 Jun 2009, Simon Spiegel wrote:
> On 10.06.2009, at 11:44, Ian Hickson wrote:
> > On Wed, 20 May 2009, Bruce D'Arcus wrote:
> > >
> > > Re: the recent microdata work and the subsequent effort to include
> > > BibTeX in the spec, I summarized my argument against this on my
> > > blog:
> > >
> > > <http://community.muohio.edu/blogs/darcusb/archives/2009/05/20/on-the-inclusion-of-bibtex-in-html5>
> >
> > | 1. BibTeX is designed for the sciences, which typically only cite
> > | secondary academic literature. It is thus inadequate for, and not
> > | widely used in, many fields outside of the sciences: the humanities
> > | and law being quite obvious examples. For this reason, BibTeX cannot
> > | by default adequately represent even the use cases Ian has identified.
> > | For example, there are many citations on Wikipedia that can only be
> > | represented using effectively useless types such as "misc" and which
> > | require new properties to be invented.
> >
> > We will probably have to increase the coverage in due course, yes.
> > However, we should verify that the mechanism works in principle before
> > investing the time to extend the vocabulary.
>
> I really don't think that a body like WHATWG is suited for this task,
> especially since other groups have already been working on this exact
> issue.
>
> > | 2. Related, BibTeX cannot represent much of the data in widely used
> > | bibliographic applications such as Endnote, RefWorks and Zotero except
> > | in very general ways.
> >
> > If such data is important, we can always add support when this becomes
> > clear.
>
> What does this mean? When would it become clear? BibTeX's deficits have
> been clear for ages. About everyone who works in the humanities knows
> that every bibliographic solution which has been introduced in the past
> was too limited. Why do we have to go through the same things over and
> over again? The problems of the current standards are known; that's why
> new solutions like biblatex or the bibliographic ontology have been
> developed.

On Wed, 10 Jun 2009, Bruce D'Arcus wrote:
>
> No; you should drop this proposal and move it to an experimental annex.
>
> If you do insist, against all reason, on pushing forward with this
> without modification, then I suggest you explain how this process of
> extension will work. If, as I suspect, it'll be another case of a
> centralized authority (you; who have admitted you really know nothing
> about this space), then that's a deal-breaker from my perspective.
>
> [...]
> The two biggest problems in bibtex are two properties:
>
> book
> journal
>
> They're a problem because they're both horribly concrete/narrow, and
> (arguably) redundant.
>
> If those were instead replaced with something more generic like either:
>
> 1) publication-title
>
> ... or, better yet ...
>
> 2) a nested/related object (call it "publication" or "container" or
> "isPartOf")
>
> ... then extension becomes easier. If I need to encode a newspaper
> article, then I just do:
>
> title = Some Article
> publication-title = Some Newspaper
>
> ... or (better, because I can attach other information to the container):
>
> title = Some Article
> publication = [ title = Some Newspaper ]
>
> As is, you need to add stuff like this just to resolve the problems
> I've repeatedly pointed out:
>
> newspaper-title
> magazine-title
> court-reporter-title
> television-program-title
> radio-program-title
>
> Aside: of course, some of the above could be collapsed into more
> generic stuff like "broadcast-title", but I'm just following the same,
> broken, approach as bibtex.
>
> This stuff isn't theoretical, Ian. Just look through this wikipedia
> page, for example:
>
> <http://en.wikipedia.org/wiki/Guantanamo_Bay_detention_camp>
>
> The citations include references to legal cases and briefs, and news
> articles (television, radio and print). Your proposal doesn't cover
> this stuff.
>
> OTOH, applications like Zotero can.
>
> > | 4. The BibTeX model conflicts with Dublin Core and with vCard, both of
> > |    which are quite sensibly used elsewhere in the microdata spec to
> > |    encode information related to the document proper. There seems
> > |    little justification in having two different ways to represent a
> > |    document depending on whether it is THIS document or THAT document.
> >
> > I don't understand this point. Could you provide an example of this
> > conflict?
>
> Here's an academic article in an open access biology journal.
>
> <http://www.plosbiology.org/article/info:doi/10.1371/journal.pbio.1000082>
>
> THIS article refers to the metadata about the document proper, with
> the title "Accelerated Adaptive Evolution on a Newly Formed X
> Chromosome."
>
> The metadata about the documents referenced in the text are included in
> the bibliography. This is what I mean by THAT document.
>
> My point -- and this is an important one -- is that one should be able
> to use the same mechanism to describe both, but still be able to
> distinguish them. I'd think this journal would insist on it.
>
> > | 5. Aspects of BibTeX's core model are ambiguous/confusing. For example,
> > |    what number does "number" refer to? Is it a document number, or an
> > |    issue number?
> >
> > What's the difference? Why does it matter?
>
> I can't find the example, but I've come across cases where one needed
> both an issue and a document number. Since I haven't cited it, though, I
> guess you can leave it aside ;-).
>
> > | My suggestion instead?
> > | 1. reuse Dublin Core and vCard for the generic data: titles,
> > |    creators/contributors, publisher, dates, part/version relations,
> > |    etc., and only add those properties (volume, issue, pages, editors,
> > |    etc.) that they omit
> >
> > This seems unduly heavy duty (especially the use of vCard for author
> > names) when all that is needed is brief bibliographic entries.
>
> On what basis do you make this claim? "[A]ll that is needed" for whom?
>
> I'll point out here that the article I link to above includes
> affiliation information for the authors.
>
> But this isn't the most critical point.
>
> > | 2. typing should NOT be handled by a bibtex-type property, but the
> > |    same way everything else is typed in the microdata proposal: a
> > |    global identifier
> >
> > Why?
>
> a) consistency; why introduce a new mechanism (from the standpoint of
> microdata)?
>
> b) flexibility (since I've made clear that bibtex is not adequate, and
> I have no intention of relying on the WHATWG to determine what's
> important)
>
> > | 3. make it possible for people to interweave other, richer,
> > |    vocabularies such as bibo within such item descriptions. In other
> > |    words, extension properties should be URIs.
> >
> > This is already possible.
>
> OK, possible; but hardly very easy. See above.
>
> > | 4. define the mapping to RDF of such an "item" description; can we
> > |    say, for example, that it constitutes a dct:references link from
> > |    the document to the described source?
> >
> > The mapping to RDF is already defined; further mappings can be done
> > using the "sameAs" mechanism.
>
> How so? I'm asking: what's the relationship between the document and
> the cited document?

On Wed, 10 Jun 2009 simon at simifilm.ch wrote:
>
> Related to this, I want to remark on some things on a more general
> level: We are currently experiencing major changes in the world of
> bibliographic software. At least, this is how I experience it. After
> years of limited and/or closed formats and models like BibTeX or Endnote
> we finally see new models like CSL or biblatex emerging which try to
> learn from the lessons of the past. Of course, I do not know how things
> will evolve, but looking at the success of solutions like Zotero I think
> it's not so bold to say that things will change quite a bit in the
> coming years.
>
> And then we have HTML5, an emerging standard which is now getting
> support from the newest and latest browsers. I know even less about how
> HTML5 will evolve and what impact it will have on the web. But it's
> probably fair to say that widespread adoption of HTML5 will not happen
> overnight.
>
> Honestly, I really don't get why a coming web standard should support a
> bibliographic standard which is obviously outdated. The fact that BibTeX
> is widely used is really a non-argument, because if we follow this logic
> we won't have any development. By the same logic you should avoid
> something like <video> -- after all, there isn't any support for it
> *yet*. If HTML5 wants to be forward-looking, it certainly shouldn't
> adopt a twenty-year-old standard but should instead try to support
> something new which is really up to date and has a chance of being
> useful in the future.

On Wed, 10 Jun 2009, Jonas Sicking wrote:
>
> [...] I'd prefer to see these things developed elsewhere, mostly because
> the group of people with expertise in developing a better version of
> bibtex is not the people in this WG.
>
> I do think it's important to show that microdata is able to express
> something like bibtex. And I do think that the discussion in the past
> weeks has been interesting since people haven't actually been finding
> problems in microdata's ability to express something like bibtex, but
> rather in the exact bibtex format itself.
>
> But the exact microdata format does not seem productive to have here. It
> seems completely orthogonal to the rest of HTML, so there seems to be no
> win in putting it in the HTML 5 spec.
>
> If bibtex-in-microdata can't gather enough interest outside of the HTML
> 5 spec, it probably is a bad spec.

On Thu, 11 Jun 2009, Simon Spiegel wrote:
>
> I completely agree with this conclusion. I also think that it would be a
> big mistake to include bibtex and then extend it later as Ian has
> suggested.
>
> Let me give a concrete example; take the following bibliographic entry:
> Doe, John: Foreword. In: Doe, Jane: The Book. Middle-Earth 2008.
>
> What we have here is a chapter by an author in a book by someone else.
> This someone else is not the editor, though, but the author of the book.
> This kind of text is fairly common in my field but it cannot be
> expressed in bibtex since bibtex originally only has fields for 'author'
> and 'editor', but not for 'bookauthor'.
>
> According to Ian, something like this could be covered by extending the
> bibtex vocabulary. For me, two problems pop up here:
>
> Who will decide how the vocabulary gets extended? And on what will these
> decisions be based?
>
> Now let's say that some kind of process to extend the bibtex vocabulary
> can be established and that the addition of a 'bookauthor' field will be
> decided. The problem then is that something gets added to bibtex which
> no existing bibtex style (and no other tool which can import bibtex)
> knows about. AFAIK only biblatex has a 'bookauthor' field.
> In other words: We then have data which is not usable with the
> traditional bibtex tools (they don't break, they just won't process the
> new fields). If bibtex gets extended (which would be absolutely
> necessary since all kinds of additional fields are needed), we
> unavoidably end up with some kind of superbibtex which no tool in the
> world can process. In other words: We then have a new format which looks
> like bibtex but which cannot be used in a traditional bibtex workflow.
> At this point the whole argument why bibtex should be used in this spec
> breaks down. Ian is in favor of bibtex because it is widely used; but if
> we unavoidably end up with an unusable superbibtex, this argument
> becomes moot.
>
> If compatibility with existing formats is the main objective, we simply
> can't extend an old format like bibtex. If the goal is to cover
> substantially more than bibtex does, we need a different format.

On Thu, 11 Jun 2009, David Gerard wrote:
>
> I was about to mention Wikipedia! The citation templates there would be
> an excellent set of examples of what a citation format would need to
> cover in practical use. See:
>
> http://en.wikipedia.org/wiki/Category:Citation_templates
>
> There's a lot there, but many aren't that heavily used. You can see how
> many uses there are of a template, or if there are any at all, by going
> to the template page and clicking on "What links here" in the sidebar.
> The ones whose name starts "Template:Cite ..." include the biggies.
>
> These constitute a bunch of special cases, but you'll be pleased to know
> that similar templates tend to get combined with time. I certainly
> wouldn't suggest a set of special cases in a spec for this. But these
> will be useful for ideas and examples of what sort of citations are in
> demand on the web.

On Thu, 11 Jun 2009, Bruce D'Arcus wrote:
>
> My immediate concern has been this particular use case, and I've been
> assuming that the microdata proposal will be included in HTML5.
>
> In a vacuum, I think microdata is fine technically.
>
> In the context of an existing spec that covers the same use cases
> (RDFa), I think it's creating unnecessary and unproductive duplication.
>
> Just to go back to the use case I'm focusing on here, it puts metadata
> producers and consumers in the awkward position of likely having to
> support two different specs; that means double work with no obvious
> benefits. This is happening JUST as RDFa is starting to be implemented
> by major players, and starting to build up a head of steam in terms of
> tools.
>
> And to put this in some context, the only reasonable technical point
> that Ian has made in favor of throwing out RDFa and creating a new spec
> is the prefix issue. But I have a really hard time seeing how prefixes
> are so onerous a burden as to justify the costs (to the WHATWG, and to
> metadata producers and consumers) of creating and maintaining a new
> spec.
>
> FWIW, some possibly relevant background from the OpenDocument
> experience:
>
> To make a long story short, ODF 1.2 will have an extensible metadata
> system based on RDF/XML (for in-package metadata) and a subset of
> RDFa (for embedded). Getting to this solution was a long and torturous
> process, and the original proposal effectively forked RDFa by requiring
> fully qualified URIs for names. The technical reasons were
> more-or-less the same as those that drove Ian to invent an entirely new
> spec: that in a GUI environment where users are copy-and-pasting
> content, dealing with prefixes was an additional burden on implementers.
> In addition, people don't hand-author ODF files, so prefixes have no
> authoring benefit.
>
> In the end, though, I understand the ODF TC decided to include prefixes,
> since implementers found the burdens largely theoretical (OpenOffice
> should see an initial implementation in 3.2, I understand), and because
> in general the group prefers to stick as closely to existing specs as
> reasonable.
>
> On predefined vocabularies, we thought about doing something similar
> informally, but decided it was out-of-scope; better initially to put a
> solid extensible system in place and let developers start working with
> it.
>
> My work on the Bibliographic Ontology was in part done with that in
> mind, though it has the added benefit that it can be repurposed for
> RDFa in XHTML.

Cheers,
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Tuesday, 7 July 2009 19:45:12 UTC
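Ian's opening paragraph says cooperating pages can still do this with script through the drag-and-drop API. What follows is a minimal TypeScript sketch of that idea, not a proposed format: the `application/x-citation+json` type, the `Citation` shape and the helper names are all hypothetical. The payload nests the containing work as its own object, roughly along the lines of Bruce's "publication"/"container" suggestion, which also lets it carry the foreword-in-a-book example from Simon's message with a separate book author.

```typescript
// Sketch only: "application/x-citation+json" and the Citation shape are
// hypothetical; neither HTML5 nor this thread defines them.
const CITATION_FORMAT = "application/x-citation+json";

// A cited work. The work it appears in (book, journal, newspaper, broadcast)
// is a nested Citation with its own fields, instead of fixed flat fields
// such as "book" or "journal".
interface Citation {
  type: string; // free-form label, e.g. "chapter", "article", "book"
  title: string;
  author?: string[];
  place?: string;
  issued?: string;
  url?: string;
  container?: Citation;
}

// The two DataTransfer methods used here. A browser's
// DragEvent.dataTransfer object satisfies this interface structurally.
interface TransferLike {
  setData(format: string, data: string): void;
  getData(format: string): string;
}

// Render a short human-readable reference, recursing into the container.
function formatReference(c: Citation): string {
  const who = c.author && c.author.length > 0 ? `${c.author.join("; ")}: ` : "";
  const imprint = [c.place, c.issued].filter((s): s is string => !!s).join(" ");
  let text = `${who}${c.title}.`;
  if (c.container) text += ` In: ${formatReference(c.container)}`;
  if (imprint) text += ` ${imprint}.`;
  return text;
}

// Source page: call from a "dragstart" listener. The plain-text fallback
// keeps the drag useful for targets that do not understand the format.
function attachCitation(dt: TransferLike, cite: Citation): void {
  dt.setData(CITATION_FORMAT, JSON.stringify(cite));
  dt.setData("text/plain", formatReference(cite));
}

// Target page: call from a "drop" listener. Returns null when the source
// did not provide a citation or the payload is not usable.
function readCitation(dt: TransferLike): Citation | null {
  const raw = dt.getData(CITATION_FORMAT);
  if (!raw) return null;
  try {
    const value: unknown = JSON.parse(raw);
    if (
      typeof value === "object" &&
      value !== null &&
      typeof (value as { title?: unknown }).title === "string"
    ) {
      return value as Citation;
    }
    return null;
  } catch {
    return null; // malformed JSON
  }
}
```

In a page, `attachCitation(event.dataTransfer, ...)` would typically run in a `dragstart` listener and `readCitation(event.dataTransfer)` in a `drop` listener, on a target whose `dragover` listener calls `preventDefault()` so that the drop is allowed. Data is generally only readable by script during `drop`, not during `dragover`. Nesting the container means a newspaper, court reporter or broadcast is just another `Citation` with its own type and fields, rather than a new flat `*-title` property.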