- From: Kevin Ford <kefo@3windmills.com>
- Date: Wed, 05 Dec 2012 19:06:02 -0500
- To: public-schemabibex@w3.org
> 1) document itemprops in Book ScholarlyArticle, etc for all the
> identifier types that we think are relevant for the bibliographic
> universe: doi, oclcnum, pmid, etc.

This has been in the hopper for quite some time and we've not brought
it all the way to completion for a variety of reasons but I can offer
this up:

http://id.loc.gov/vocabulary/identifiers

It represents the complete (I think) list of identifier types for MARC.
That said, a few identifier types would/should be added, not least of
which are oclcnum, pmid, gpo item number, among others (we've actually
been using identifiers:oclcnum behind the scenes even though it is not
formally on the MARC list).

In short, there is a (relatively) finite list of identifier types
commonly found in bibliographic data to consider if this is the decided
way to go.

Personally, I prefer a defined list of types, at least for the most
"popular" identifier types, because alternate methods (a number of which
have already been outlined) invariably result in verbosity.

This may not be possible in this instance, but we've tended to embrace
solutions where we try to have our cake and eat it too: define a set of
identifier-type properties but also design an abstract "Identifier"
entity that could capture both the type of identifier and the identifier
value.

Yours,
Kevin

On 12/05/2012 09:58 AM, Ed Summers wrote:
> Offlist Alf Eaton kindly reminded me that the HTML Microdata spec does
> not appear to allow you to encode multiple identifiers for a given
> item using itemid. I don't think I'm going out on a limb here by
> saying that this is problematic, for example in use cases like
> ScholarlyArticle where it would be useful to encode a PubMedID and a
> DOI.
>
> So I emailed the WHATWG mailing list to make sure that this is
> actually the case, and to propose that the Microdata spec allow for it
> [1]. As you can see from Ian Hickson's response, itemid doesn't allow
> for multiple identifiers by design.
> He also had some suggestions for
> workarounds using meta and link with a generic 'id' itemprop [2].
>
> So I think this leaves us with two options:
>
> 1) document itemprops in Book ScholarlyArticle, etc for all the
> identifier types that we think are relevant for the bibliographic
> universe: doi, oclcnum, pmid, etc.
> 2) document a pattern for expressing identifiers of different types:
> using meta, link (as Ian suggested) or some other mechanism.
>
> I'm not sure I have a preference at this point, but I just wanted to
> point out that relying entirely on itemid for expressing identifiers
> is not going to work. Perhaps it would be useful to document some of
> the design choices on the wiki for further discussion?
>
> //Ed
>
> [1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038256.html
> [2] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038257.html
>
> PS. Sorry for sending this to you twice Karen :-)
>
> On Tue, Dec 4, 2012 at 8:42 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>> I did check these fields on what I can find of the Moen statistics (a large
>> study of MARC field frequency), so there may be some we can defer.
>> Unfortunately, what I have of those stats only covers books, not, for
>> example, serials or music, so I am making a guess here, but these fields
>> seem to be used less in less than 80% of the relevant records:
>>
>> 013 - Patent Control Information (R)
>> 017 - Copyright or Legal Deposit Number (R)
>> 024 - Other Standard Identifier (R)
>> 025 - Overseas Acquisition Number (R)
>> 026 - Fingerprint Identifier (R)
>> 027 - Standard Technical Report Number (R)
>> 031 - Musical Incipits Information (R)
>> 035 - System Control Number (R)
>>
>> I rather expected the GPO item number (074) to be higher, but it is not.
>> However, I've lost access to the full set of stats so I don't know its
>> actual frequency. (Some files on the original site are giving me 404.)
>> I'll see if I can rectify this.
>>
>> kc
>>
>> On 12/4/12 11:45 AM, Karen Coyle wrote:
>>>
>>> It kind of depends on what you consider a bibliographic identifier. So
>>> maybe our first step should be to define that.
>>>
>>> Here are the ones that I find in the MARC21 format:
>>>
>>> 010 - Library of Congress Control Number (NR)
>>> 013 - Patent Control Information (R)
>>> 015 - National Bibliography Number (R)
>>> 016 - National Bibliographic Agency Control Number (R)
>>> 017 - Copyright or Legal Deposit Number (R)
>>> 020 - International Standard Book Number (R)
>>> 022 - International Standard Serial Number (R)
>>> 024 - Other Standard Identifier (R)
>>> 025 - Overseas Acquisition Number (R)
>>> 026 - Fingerprint Identifier (R)
>>> 027 - Standard Technical Report Number (R)
>>> 028 - Publisher Number (R)
>>> 030 - CODEN Designation (R)
>>> 031 - Musical Incipits Information (R)
>>> 032 - Postal Registration Number (R)
>>> 035 - System Control Number (R)
>>> ?036 - Original Study Number for Computer Data Files (NR)
>>> 074 - GPO Item Number (R)
>>>
>>> I think this is all of them....
>>> Then we go on to the classification codes:
>>>
>>> 050 - Library of Congress Call Number (R)
>>> 052 - Geographic Classification (R)
>>> 055 - Classification Numbers Assigned in Canada (R)
>>> 060 - National Library of Medicine Call Number (R)
>>> 070 - National Agricultural Library Call Number (R)
>>> ?072 - Subject Category Code (R)
>>>
>>> And that doesn't cover thesauri. However, we may want to ignore any
>>> thesauri that cannot provide URIs?
>>>
>>> kc
>>>
>>> On 12/4/12 11:28 AM, Ross Singer wrote:
>>>>
>>>> On Dec 4, 2012, at 2:23 PM, Ed Summers <ehs@pobox.com> wrote:
>>>>
>>>>> Call me naive, but I contend that most bibliographic identifiers are
>>>>> expressible as URIs (URNs, info-uris, URLs) and that as such they can
>>>>> use microdata's itemid [1]. Is there really a problem here?
>>>>
>>>> +1
>>>>
>>>> I was hoping to suggest something along these lines, but had lacked the
>>>> cycles to actually do the research to back it up.
>>>>
>>>> -Ross.
>>>>>
>>>>> //Ed
>>>>>
>>>>> [1]
>>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#global-identifiers-for-items
>>>>>
>>>>> On Tue, Dec 4, 2012 at 9:00 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>>>
>>>>> On 12/4/12 5:01 AM, Shlomo Sanders wrote:
>>>>>
>>>>> For what it is worth, I prefer:
>>>>>
>>>>> ISBN-10<span property="identifier"
>>>>> typeof="ISBN">0316769487</span>
>>>>>
>>>>> I don't think this is correct -- unless you have a property that
>>>>> is "ISBN". The "typeof" takes a property, not a value.
>>>>>
>>>>> Any values have to be outside of the <> unless you use a meta tag.
>>>>> see:
>>>>> http://schema.org/docs/gs.html#advanced_missing
>>>>>
>>>>> Maybe that's how we'll have to go - with meta.
>>>>>
>>>>> kc
>>>>>
>>>>> Or
>>>>> ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>>
>>>>> These are short and clean.
>>>>> The itemprop="isbn" is not generic since the valid values for
>>>>> itemprop is enumerated?
>>>>> Is that the same issue for typeof?
>>>>>
>>>>> -----Original Message-----
>>>>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>> Sent: Tuesday, December 04, 2012 14:58
>>>>> To: public-schemabibex@w3.org
>>>>> Subject: Re: Missing Schema.Org properties
>>>>>
>>>>> Do we need to consider how this might be displayed, since
>>>>> schema.org generally wraps around a
>>>>> display? These two options would result in different displays:
>>>>>
>>>>> On 12/4/12 3:33 AM, Shlomo Sanders wrote:
>>>>>
>>>>> How is this as a schema.org
>>>>> "friendly" version of the ONIX structure:
>>>>>
>>>>> <div typeof="identifier">
>>>>>   <span property="identifierValue">0316769487</span>
>>>>>   <span property="identifierType">ISBN</span>
>>>>> </div>
>>>>>
>>>>> 0316769487 ISBN
>>>>>
>>>>> Seems too long to me, perhaps:
>>>>> <span property="identifier" typeof="ISBN">0316769487</span>
>>>>>
>>>>> 0316769487
>>>>>
>>>>> The schema.org documentation shows a
>>>>> similar example to this latter approach using price:
>>>>>
>>>>> Price: <span itemprop="price">$6.99</span>
>>>>> <meta itemprop="priceCurrency" content="USD" />
>>>>>
>>>>> This gets the "$6.99" display for the human reader, plus the
>>>>> currency type for processing.
>>>>>
>>>>> The current use of ISBN is illustrated as:
>>>>>
>>>>> ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>>
>>>>> If we go with id type and value, then display is limited by
>>>>> the defined types, unless we leave type very loose.
>>>>> To get the
>>>>> same display as the ISBN immediately above, we'd need:
>>>>>
>>>>> <div itemprop="identifier"
>>>>> itemscope="http://schema.org/Identifier">
>>>>>   <span itemprop="idType">ISBN-10: </span>
>>>>>   <span itemprop="idValue">0316769487</span>
>>>>> </div>
>>>>>
>>>>> Does identifier type do what we want if it's not a controlled
>>>>> value? Or would we need a <meta> with a controlled value?
>>>>>
>>>>> kc
>>>>>
>>>>> -----Original Message-----
>>>>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>> Sent: Monday, December 03, 2012 20:28
>>>>> To: Graham Bell
>>>>> Cc: public-schemabibex@w3.org
>>>>> Subject: Re: Missing Schema.Org properties
>>>>>
>>>>> I do, however, see a significant difference between
>>>>> schema.org and the XML structure of
>>>>> ONIX (or any other XML-based metadata): schema.org
>>>>> allows the data to be flattened to a
>>>>> single horizon of data. This is for the sake of
>>>>> simplicity, if I understand correctly. There seems to be a
>>>>> philosophy in schema.org that avoids
>>>>> a strict division of descriptions into "right" and
>>>>> "wrong." XML, instead, is really an enforcement mechanism.
>>>>>
>>>>> I'm leery of adding much structure to schema.org.
>>>>> Or at least, of either requiring it
>>>>> or relying on it. That makes the identifier "problem"
>>>>> particularly difficult. It is for this reason that I
>>>>> asked, in response to Shlomo's post, whether one can make
>>>>> use of the self-identifying nature of URIs. That doesn't
>>>>> help us with non-URI identifiers, but it seems that we are
>>>>> moving increasingly in the direction of "fully formed"
>>>>> identifiers.
>>>>>
>>>>> kc
>>>>>
>>>>> On 12/3/12 8:41 AM, Graham Bell wrote:
>>>>>
>>>>> Worth saying at this point that this is EXACTLY how
>>>>> ONIX is structured:
>>>>>
>>>>> <entityIdentifier>
>>>>>   <entityIDType>
>>>>>   <IDTypeName>
>>>>>   <IDValue>
>>>>> </entityIdentifier>
>>>>>
>>>>> where 'entity' might be 'product', 'work', 'name', or
>>>>> whatever. There
>>>>> is a controlled vocabulary for common IDTypes, and if
>>>>> you have some
>>>>> proprietary identifier not in the list, you must
>>>>> include a 'likely to
>>>>> be unique' name for it in <IDTypeName> instead.
>>>>>
>>>>> A point of history -- ONIX started (in 1999) with a
>>>>> property per
>>>>> identifier type: there were tags called <ISBN> and
>>>>> <UPC>, but as
>>>>> pointed out below, that isn't really practical, so the
>>>>> above XML
>>>>> structure is used extensively now. It's easy to add to
>>>>> the controlled
>>>>> vocabulary when a new identifier comes along, without
>>>>> having to
>>>>> change the schema. In UML, it looks like the attached,
>>>>> and I leave
>>>>> the RDF as an exercise for the reader...
>>>>>
>>>>> Graham
>>>>>
>>>>> Graham Bell
>>>>> EDItEUR
>>>>>
>>>>> Tel: +44 20 7503 6418
>>>>> Mob: +44 7887 754958
>>>>>
>>>>> EDItEUR Limited is a company limited by guarantee,
>>>>> registered in
>>>>> England no 2994705. Registered Office: United House,
>>>>> North Road,
>>>>> London N7 9DP, UK. Website: http://www.editeur.org
>>>>>
>>>>> On 3 Dec 2012, at 16:18, Laura Dawson wrote:
>>>>>
>>>>> That might work, actually.
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Dec 3, 2012, at 4:05 PM, Karen Coyle
>>>>> <kcoyle@kcoyle.net> wrote:
>>>>>
>>>>> On 12/3/12 7:19 AM, Richard Wallis wrote:
>>>>>
>>>>> Hi Shlomo,
>>>>>
>>>>> Couple of points.
>>>>>
>>>>> *Identifiers:* This is a particular
>>>>> concern of mine.
>>>>>
>>>>> Me, too!
>>>>>
>>>>> The approach of
>>>>> having a named property for each possible
>>>>> identifier that a
>>>>> CreativeWork or a Person could have, just
>>>>> does not scale. However
>>>>> to handle this you will always be
>>>>> disenfranchising some identifier
>>>>> backing group. ISBN seems to have got in
>>>>> because it is known by everyone, oclcnum is
>>>>> obvious
>>>>> from where I sit (but that does not make
>>>>> it right). I think we (in all
>>>>> of Schema, not just the bib domain) need
>>>>> an identifier Type with
>>>>> properties of 'identifierValue' and
>>>>> 'identifierType' - which could
>>>>> handle either an enumerated list or at
>>>>> least well known identifier
>>>>> names.
>>>>>
>>>>> I believe that this means that "Identifier"
>>>>> becomes a "schema" in
>>>>> schema.org.
>>>>>
>>>>> kc
>>>>>
>>>>> ~Richard.
>>>>>
>>>>> --
>>>>> Karen Coyle
>>>>> kcoyle@kcoyle.net http://kcoyle.net
>>>>> ph: 1-510-540-7596
>>>>> m: 1-510-435-8234
>>>>> skype: kcoylenet
>>>>>
>>>>
>>>
>>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
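
The "have our cake and eat it too" option discussed in this thread (named properties for the most popular identifier types, plus a generic Identifier item for everything else) can be sketched roughly as follows. This is only an illustration in Python, not an agreed design: the property names "identifier", "idType" and "idValue" are borrowed from the example quoted above, the http://schema.org/Identifier type is hypothetical, and the set of dedicated property names is an assumption ("isbn" is the only one the thread shows in current use). The sketch writes the nested item with microdata's itemscope and itemtype attribute pair.

```python
from dataclasses import dataclass
from html import escape

# Assumed short list of "popular" identifier types that get their own
# itemprop. Only "isbn" is shown in use in the thread; the rest are proposals.
DEDICATED_PROPS = {"isbn", "doi", "oclcnum", "pmid"}

# Hypothetical type URL; the thread only floats the idea of such a type.
IDENTIFIER_TYPE = "http://schema.org/Identifier"


@dataclass(frozen=True)
class Identifier:
    id_type: str  # e.g. "isbn", "doi", or a local scheme name
    value: str


def to_microdata(ident: Identifier) -> str:
    """Render one identifier as an HTML microdata fragment."""
    value = escape(ident.value)
    key = ident.id_type.lower()
    if key in DEDICATED_PROPS:
        # Short form: one named property per well-known identifier type.
        return f'<span itemprop="{key}">{value}</span>'
    # Generic form: a nested item carrying both the type and the value.
    # The type goes in a <meta> so it is machine-readable but not displayed.
    id_type = escape(ident.id_type)
    return (
        f'<div itemprop="identifier" itemscope itemtype="{IDENTIFIER_TYPE}">'
        f'<meta itemprop="idType" content="{id_type}" />'
        f'<span itemprop="idValue">{value}</span>'
        "</div>"
    )


if __name__ == "__main__":
    # The ISBN is the one used in the thread; the second identifier is made up.
    print(to_microdata(Identifier("isbn", "0316769487")))
    print(to_microdata(Identifier("example-scheme", "12345")))
```

The trade-off is the one raised in the thread: the short form stays compact for common identifiers, while the generic form costs more markup but can carry any type without a schema change.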
Received on Thursday, 6 December 2012 00:05:04 UTC