- From: Ross Singer <rxs@talis.com>
- Date: Wed, 5 Dec 2012 10:09:28 -0500
- To: Ed Summers <ehs@pobox.com>
- Cc: Karen Coyle <kcoyle@kcoyle.net>, "public-schemabibex@w3.org" <public-schemabibex@w3.org>
I'm still not sure why this is a problem. Doesn't RDF deal with this all the time? Can't we use something along the lines of dcterms:identifier? That is, let the 'other' identifiers be the object, rather than the subject?

-Ross.

On Dec 5, 2012, at 9:58 AM, Ed Summers <ehs@pobox.com> wrote:

> Offlist Alf Eaton kindly reminded me that the HTML Microdata spec does
> not appear to allow you to encode multiple identifiers for a given
> item using itemid. I don't think I'm going out on a limb here by
> saying that this is problematic, for example in use cases like
> ScholarlyArticle where it would be useful to encode a PubMedID and a
> DOI.
>
> So I emailed the WHATWG mailing list to make sure that this is
> actually the case, and to propose that the Microdata spec allow for it
> [1]. As you can see from Ian Hickson's response, itemid doesn't allow
> for multiple identifiers by design. He also had some suggestions for
> workarounds using meta and link with a generic 'id' itemprop [2].
>
> So I think this leaves us with two options:
>
> 1) document itemprops in Book, ScholarlyArticle, etc. for all the
> identifier types that we think are relevant for the bibliographic
> universe: doi, oclcnum, pmid, etc.
> 2) document a pattern for expressing identifiers of different types:
> using meta, link (as Ian suggested) or some other mechanism.
>
> I'm not sure I have a preference at this point, but I just wanted to
> point out that relying entirely on itemid for expressing identifiers
> is not going to work. Perhaps it would be useful to document some of
> the design choices on the wiki for further discussion?
>
> //Ed
>
> [1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038256.html
> [2] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038257.html
>
> PS.
> Sorry for sending this to you twice Karen :-)
>
> On Tue, Dec 4, 2012 at 8:42 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>> I did check these fields on what I can find of the Moen statistics (a large
>> study of MARC field frequency), so there may be some we can defer.
>> Unfortunately, what I have of those stats only covers books, not, for
>> example, serials or music, so I am making a guess here, but these fields
>> seem to be used in less than 80% of the relevant records:
>>
>> 013 - Patent Control Information (R)
>> 017 - Copyright or Legal Deposit Number (R)
>> 024 - Other Standard Identifier (R)
>> 025 - Overseas Acquisition Number (R)
>> 026 - Fingerprint Identifier (R)
>> 027 - Standard Technical Report Number (R)
>> 031 - Musical Incipits Information (R)
>> 035 - System Control Number (R)
>>
>> I rather expected the GPO item number (074) to be higher, but it is not.
>> However, I've lost access to the full set of stats so I don't know its
>> actual frequency. (Some files on the original site are giving me 404s.)
>> I'll see if I can rectify this.
>>
>> kc
>>
>>
>> On 12/4/12 11:45 AM, Karen Coyle wrote:
>>>
>>> It kind of depends on what you consider a bibliographic identifier. So
>>> maybe our first step should be to define that.
>>>
>>> Here are the ones that I find in the MARC21 format:
>>>
>>> 010 - Library of Congress Control Number (NR)
>>> 013 - Patent Control Information (R)
>>> 015 - National Bibliography Number (R)
>>> 016 - National Bibliographic Agency Control Number (R)
>>> 017 - Copyright or Legal Deposit Number (R)
>>> 020 - International Standard Book Number (R)
>>> 022 - International Standard Serial Number (R)
>>> 024 - Other Standard Identifier (R)
>>> 025 - Overseas Acquisition Number (R)
>>> 026 - Fingerprint Identifier (R)
>>> 027 - Standard Technical Report Number (R)
>>> 028 - Publisher Number (R)
>>> 030 - CODEN Designation (R)
>>> 031 - Musical Incipits Information (R)
>>> 032 - Postal Registration Number (R)
>>> 035 - System Control Number (R)
>>> ?036 - Original Study Number for Computer Data Files (NR)
>>> 074 - GPO Item Number (R)
>>>
>>> I think this is all of them.... Then we go on to the classification codes:
>>>
>>> 050 - Library of Congress Call Number (R)
>>> 052 - Geographic Classification (R)
>>> 055 - Classification Numbers Assigned in Canada (R)
>>> 060 - National Library of Medicine Call Number (R)
>>> 070 - National Agricultural Library Call Number (R)
>>> ?072 - Subject Category Code (R)
>>>
>>> And that doesn't cover thesauri. However, we may want to ignore any
>>> thesauri that cannot provide URIs?
>>>
>>> kc
>>>
>>>
>>> On 12/4/12 11:28 AM, Ross Singer wrote:
>>>>
>>>> On Dec 4, 2012, at 2:23 PM, Ed Summers <ehs@pobox.com> wrote:
>>>>
>>>>> Call me naive, but I contend that most bibliographic identifiers are
>>>>> expressible as URIs (URNs, info-uris, URLs) and that as such they can
>>>>> use microdata's itemid [1]. Is there really a problem here?
>>>>
>>>> +1
>>>>
>>>> I was hoping to suggest something along these lines, but had lacked the
>>>> cycles to actually do the research to back it up.
>>>>
>>>> -Ross.
>>>>>
>>>>> //Ed
>>>>>
>>>>> [1]
>>>>> http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#global-identifiers-for-items
>>>>>
>>>>> On Tue, Dec 4, 2012 at 9:00 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>>>
>>>>>     On 12/4/12 5:01 AM, Shlomo Sanders wrote:
>>>>>
>>>>>         For what it is worth, I prefer:
>>>>>
>>>>>         ISBN-10<span property="identifier" typeof="ISBN">0316769487</span>
>>>>>
>>>>>     I don't think this is correct -- unless you have a property that
>>>>>     is "ISBN". The "typeof" takes a property, not a value.
>>>>>
>>>>>     Any values have to be outside of the <> unless you use a meta tag.
>>>>>     see:
>>>>>     http://schema.org/docs/gs.html#advanced_missing
>>>>>
>>>>>     Maybe that's how we'll have to go - with meta.
>>>>>
>>>>>     kc
>>>>>
>>>>>         Or
>>>>>         ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>>
>>>>>         These are short and clean.
>>>>>         The itemprop="isbn" is not generic since the valid values for
>>>>>         itemprop are enumerated?
>>>>>         Is that the same issue for typeof?
>>>>>
>>>>> -----Original Message-----
>>>>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>> Sent: Tuesday, December 04, 2012 14:58
>>>>> To: public-schemabibex@w3.org
>>>>> Subject: Re: Missing Schema.Org properties
>>>>>
>>>>> Do we need to consider how this might be displayed, since
>>>>> schema.org generally wraps around a display? These two options
>>>>> would result in different displays:
>>>>>
>>>>> On 12/4/12 3:33 AM, Shlomo Sanders wrote:
>>>>>
>>>>>     How is this as a schema.org "friendly" version of the ONIX
>>>>>     structure:
>>>>>
>>>>>     <div typeof="identifier">
>>>>>       <span property="identifierValue">0316769487</span>
>>>>>       <span property="identifierType">ISBN</span>
>>>>>     </div>
>>>>>
>>>>> 0316769487 ISBN
>>>>>
>>>>>     Seems too long to me, perhaps:
>>>>>     <span property="identifier" typeof="ISBN">0316769487</span>
>>>>>
>>>>> 0316769487
>>>>>
>>>>> The schema.org documentation shows a similar example to this
>>>>> latter approach using price:
>>>>>
>>>>>     Price: <span itemprop="price">$6.99</span>
>>>>>     <meta itemprop="priceCurrency" content="USD" />
>>>>>
>>>>> This gets the "$6.99" display for the human reader, plus the
>>>>> currency type for processing.
>>>>>
>>>>> The current use of ISBN is illustrated as:
>>>>>
>>>>>     ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>>
>>>>> If we go with id type and value, then display is limited by
>>>>> the defined types, unless we leave type very loose.
>>>>> To get the same display as the ISBN immediately above, we'd need:
>>>>>
>>>>>     <div itemprop="identifier" itemscope
>>>>>          itemtype="http://schema.org/Identifier">
>>>>>       <span itemprop="idType">ISBN-10: </span>
>>>>>       <span itemprop="idValue">0316769487</span>
>>>>>     </div>
>>>>>
>>>>> Does identifier type do what we want if it's not a controlled
>>>>> value? Or would we need a <meta> with a controlled value?
>>>>>
>>>>> kc
>>>>>
>>>>> -----Original Message-----
>>>>> From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>> Sent: Monday, December 03, 2012 20:28
>>>>> To: Graham Bell
>>>>> Cc: public-schemabibex@w3.org
>>>>> Subject: Re: Missing Schema.Org properties
>>>>>
>>>>> I do, however, see a significant difference between
>>>>> schema.org and the XML structure of ONIX (or any other
>>>>> XML-based metadata): schema.org allows the data to be
>>>>> flattened to a single horizon of data. This is for the sake of
>>>>> simplicity, if I understand correctly. There seems to be a
>>>>> philosophy in schema.org that avoids a strict division of
>>>>> descriptions into "right" and "wrong." XML, instead, is really
>>>>> an enforcement mechanism.
>>>>>
>>>>> I'm leery of adding much structure to schema.org. Or at least,
>>>>> of either requiring it or relying on it. That makes the
>>>>> identifier "problem" particularly difficult. It is for this
>>>>> reason that I asked, in response to Shlomo's post, whether one
>>>>> can make use of the self-identifying nature of URIs. That
>>>>> doesn't help us with non-URI identifiers, but it seems that we
>>>>> are moving increasingly in the direction of "fully formed"
>>>>> identifiers.
>>>>>
>>>>> kc
>>>>>
>>>>> On 12/3/12 8:41 AM, Graham Bell wrote:
>>>>>
>>>>>     Worth saying at this point that this is EXACTLY how
>>>>>     ONIX is structured:
>>>>>
>>>>>     <entityIdentifier>
>>>>>       <entityIDType>
>>>>>       <IDTypeName>
>>>>>       <IDValue>
>>>>>     </entityIdentifier>
>>>>>
>>>>>     where 'entity' might be 'product', 'work', 'name', or
>>>>>     whatever. There is a controlled vocabulary for common
>>>>>     IDTypes, and if you have some proprietary identifier not in
>>>>>     the list, you must include a 'likely to be unique' name for
>>>>>     it in <IDTypeName> instead.
>>>>>
>>>>>     A point of history -- ONIX started (in 1999) with a
>>>>>     property per identifier type: there were tags called <ISBN>
>>>>>     and <UPC>, but as pointed out below, that isn't really
>>>>>     practical, so the above XML structure is used extensively
>>>>>     now. It's easy to add to the controlled vocabulary when a
>>>>>     new identifier comes along, without having to change the
>>>>>     schema. In UML, it looks like the attached, and I leave
>>>>>     the RDF as an exercise for the reader...
>>>>>
>>>>>     Graham
>>>>>
>>>>>     Graham Bell
>>>>>     EDItEUR
>>>>>
>>>>>     Tel: +44 20 7503 6418
>>>>>     Mob: +44 7887 754958
>>>>>
>>>>>     EDItEUR Limited is a company limited by guarantee,
>>>>>     registered in England no 2994705. Registered Office: United
>>>>>     House, North Road, London N7 9DP, UK.
>>>>>     Website: http://www.editeur.org
>>>>>
>>>>>     On 3 Dec 2012, at 16:18, Laura Dawson wrote:
>>>>>
>>>>>         That might work, actually.
>>>>>
>>>>>         Sent from my iPhone
>>>>>
>>>>>         On Dec 3, 2012, at 4:05 PM, Karen Coyle
>>>>>         <kcoyle@kcoyle.net> wrote:
>>>>>
>>>>>             On 12/3/12 7:19 AM, Richard Wallis wrote:
>>>>>
>>>>>                 Hi Shlomo,
>>>>>
>>>>>                 Couple of points.
>>>>>
>>>>>                 *Identifiers:* This is a particular concern of mine.
>>>>>
>>>>>             Me, too!
>>>>>
>>>>>                 The approach of having a named property for each
>>>>>                 possible identifier that a CreativeWork or a Person
>>>>>                 could have, just does not scale. However to handle
>>>>>                 this you will always be disenfranchising some
>>>>>                 identifier backing group. ISBN seems to have got in
>>>>>                 because it is known by everyone, oclcnum is obvious
>>>>>                 from where I sit (but that does not make it right).
>>>>>                 I think we (in all of Schema, not just the bib
>>>>>                 domain) need an identifier Type with properties of
>>>>>                 'identifierValue' and 'identifierType' - which could
>>>>>                 handle either an enumerated list or at least well
>>>>>                 known identifier names.
>>>>>
>>>>>             I believe that this means that "Identifier" becomes a
>>>>>             "schema" in schema.org.
>>>>>
>>>>>             kc
>>>>>
>>>>>                 ~Richard.
>>>>>
>>>>>             --
>>>>>             Karen Coyle
>>>>>             kcoyle@kcoyle.net http://kcoyle.net
>>>>>             ph: 1-510-540-7596
>>>>>             m: 1-510-435-8234
>>>>>             skype: kcoylenet
>>>>>
>>>>> --
>>>>> Karen Coyle
>>>>> kcoyle@kcoyle.net http://kcoyle.net
>>>>> ph: 1-510-540-7596
>>>>> m: 1-510-435-8234
>>>>> skype: kcoylenet
>>>>>
>>>>> --
>>>>> Karen Coyle
>>>>> kcoyle@kcoyle.net http://kcoyle.net
>>>>> ph: 1-510-540-7596
>>>>> m: 1-510-435-8234
>>>>> skype: kcoylenet
>>>>>
>>>>
>>>
>>
>> --
>> Karen Coyle
>> kcoyle@kcoyle.net http://kcoyle.net
>> ph: 1-510-540-7596
>> m: 1-510-435-8234
>> skype: kcoylenet
>>
>
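
For illustration only, here is a rough sketch of the idea raised in the reply at the top of this message: keep one subject for the item and attach each "other" identifier as an object of dcterms:identifier. Nothing here was agreed on the list. The function name, the example.org subject, and the choice of URI patterns (urn:isbn, info:doi, info:pmid) are assumptions made for the example; identifier types without a known URI form fall back to a plain literal.

```python
# Sketch only: emit N-Triples lines that attach several identifiers to one
# subject via dcterms:identifier. All names below are hypothetical.

DCTERMS_IDENTIFIER = "http://purl.org/dc/terms/identifier"

# Assumed mapping from identifier type to a URI pattern.
URI_PATTERNS = {
    "isbn": "urn:isbn:{}",
    "doi": "info:doi/{}",
    "pmid": "info:pmid/{}",
}


def identifier_triples(subject_uri, identifiers):
    """Return one N-Triples line per (type, value) pair.

    Types found in URI_PATTERNS become URI objects; anything else
    becomes a string literal with backslashes and quotes escaped.
    """
    lines = []
    for id_type, value in identifiers:
        pattern = URI_PATTERNS.get(id_type.lower())
        if pattern is not None:
            obj = "<" + pattern.format(value) + ">"
        else:
            escaped = value.replace("\\", "\\\\").replace('"', '\\"')
            obj = '"' + escaped + '"'
        lines.append(f"<{subject_uri}> <{DCTERMS_IDENTIFIER}> {obj} .")
    return lines


if __name__ == "__main__":
    article = "http://example.org/article/1"  # placeholder subject
    ids = [("doi", "10.1000/182"), ("pmid", "12345"), ("oclcnum", "42")]
    for line in identifier_triples(article, ids):
        print(line)
```

The point of the sketch is that the number of identifiers per item is unbounded on the object side, which is the property that itemid alone does not give you. Whether the objects should be URIs or typed literals is exactly the open design question in the thread.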
Received on Wednesday, 5 December 2012 15:10:32 UTC