Re: Missing Schema.Org properties

On 5 Dec 2012, at 14:58, Ed Summers wrote:

> Off-list, Alf Eaton kindly reminded me that the HTML Microdata spec does
> not appear to allow you to encode multiple identifiers for a given
> item using itemid. I don't think I'm going out on a limb here by
> saying that this is problematic, for example in use cases like
> ScholarlyArticle where it would be useful to encode a PubMedID and a
> DOI.
> 
> So I emailed the WHATWG mailing list to make sure that this is
> actually the case, and to propose that the Microdata spec allow for it
> [1]. As you can see from Ian Hickson's response, itemid doesn't allow
> for multiple identifiers by design. He also had some suggestions for
> workarounds using meta and link with a generic 'id' itemprop [2].
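
For what it's worth, here is roughly how a consumer might pull several identifiers out of markup that repeats a generic 'id' itemprop on meta and link elements. This is a minimal Python sketch using only the standard library's html.parser. The sample markup, the DOI and the placeholder PMID are my own assumptions rather than Ian's exact suggestion, and the parser ignores item nesting, so it only makes sense for a page describing a single item.

```python
from html.parser import HTMLParser

# Hypothetical markup in the spirit of the meta/link workaround: one generic
# "id" itemprop per identifier. The PMID is a placeholder value.
SAMPLE = """
<div itemscope itemtype="http://schema.org/ScholarlyArticle">
  <link itemprop="id" href="http://dx.doi.org/10.1000/182">
  <meta itemprop="id" content="info:pmid/12345678">
  <h1 itemprop="name">An Example Article</h1>
</div>
"""


class IdCollector(HTMLParser):
    """Collects every value of one itemprop from <meta> and <link> tags."""

    def __init__(self, prop="id"):
        super().__init__()
        self.prop = prop
        self.values = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # itemprop may carry several space-separated property names
        if self.prop not in (a.get("itemprop") or "").split():
            return
        if tag == "meta" and a.get("content") is not None:
            self.values.append(a["content"])
        elif tag == "link" and a.get("href") is not None:
            self.values.append(a["href"])


def collect_ids(html, prop="id"):
    """Return all values of `prop` found on meta/link elements, in order."""
    parser = IdCollector(prop)
    parser.feed(html)
    parser.close()
    return parser.values


print(collect_ids(SAMPLE))
```

Run as-is, this prints ['http://dx.doi.org/10.1000/182', 'info:pmid/12345678'], i.e. both identifiers survive, which is exactly what a single itemid can't hold.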
> 
> So I think this leaves us with two options:

Is the library world the only one with this problem?
> 
> 1) document itemprops in Book, ScholarlyArticle, etc. for all the
> identifier types that we think are relevant for the bibliographic
> universe: doi, oclcnum, pmid, etc.

This would require a separate property for each relevant identifier, and the set would need to be changed each time a new identifier scheme comes along.

> 2) document a pattern for expressing identifiers of different types:
> using meta, link (as Ian suggested) or some other mechanism.

+1: A design pattern seems more appropriate to me.
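
A minimal sketch of such a pattern, in Python, emitting one microdata item per identifier: the value stays visible to readers and the type rides along in a <meta>, the same way as the price/priceCurrency example from the schema.org docs. The Identifier type and the identifierType/identifierValue properties are hypothetical (names borrowed from Richard's proposal further down the thread), not existing schema.org terms.

```python
from html import escape

# Hypothetical type URL: schema.org has no Identifier type; the name comes
# from the proposal in this thread.
ITEMTYPE = "http://schema.org/Identifier"


def render_identifier(id_type, value, label=None):
    """Render one identifier as a microdata item.

    The (optional) label and the value are shown to human readers; the
    machine-readable type goes in a <meta>, like priceCurrency does.
    """
    shown = escape(label if label is not None else id_type)
    return (
        f'<span itemprop="identifier" itemscope itemtype="{ITEMTYPE}">'
        f'{shown}: <span itemprop="identifierValue">{escape(value)}</span>'
        f'<meta itemprop="identifierType" content="{escape(id_type)}">'
        f'</span>'
    )


print(render_identifier("ISBN", "0316769487", label="ISBN-10"))
print(render_identifier("DOI", "10.1000/182"))
```

Several identifiers are then just several identifier items, and a new scheme needs only a new type value rather than a new property.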

-Jodi


> 
> I'm not sure I have a preference at this point, but I just wanted to
> point out that relying entirely on itemid for expressing identifiers
> is not going to work. Perhaps it would be useful to document some of
> the design choices on the wiki for further discussion?
> 
> //Ed
> 
> [1] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038256.html
> [2] http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2012-December/038257.html
> 
> PS. Sorry for sending this to you twice Karen :-)
> 
> On Tue, Dec 4, 2012 at 8:42 PM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>> I did check these fields against what I can find of the Moen statistics (a
>> large study of MARC field frequency), so there may be some we can defer.
>> Unfortunately, what I have of those stats only covers books, not, for
>> example, serials or music, so I am making a guess here, but these fields
>> seem to be used in less than 80% of the relevant records:
>> 
>> 
>> 013 - Patent Control Information (R)
>> 017 - Copyright or Legal Deposit Number (R)
>> 024 - Other Standard Identifier (R)
>> 025 - Overseas Acquisition Number (R)
>> 026 - Fingerprint Identifier (R)
>> 027 - Standard Technical Report Number (R)
>> 031 - Musical Incipits Information (R)
>> 035 - System Control Number (R)
>> 
>> I rather expected the GPO item number (074) to be higher, but it is not.
>> However, I've lost access to the full set of stats so I don't know its
>> actual frequency. (Some files on the original site are giving me 404s.)
>> I'll see if I can rectify this.
>> 
>> kc
>> 
>> 
>> On 12/4/12 11:45 AM, Karen Coyle wrote:
>>> 
>>> It kind of depends on what you consider a bibliographic identifier. So
>>> maybe our first step should be to define that.
>>> 
>>> Here are the ones that I find in the MARC21 format:
>>> 
>>> 010 - Library of Congress Control Number (NR)
>>> 013 - Patent Control Information (R)
>>> 015 - National Bibliography Number (R)
>>> 016 - National Bibliographic Agency Control Number (R)
>>> 017 - Copyright or Legal Deposit Number (R)
>>> 020 - International Standard Book Number (R)
>>> 022 - International Standard Serial Number (R)
>>> 024 - Other Standard Identifier (R)
>>> 025 - Overseas Acquisition Number (R)
>>> 026 - Fingerprint Identifier (R)
>>> 027 - Standard Technical Report Number (R)
>>> 028 - Publisher Number (R)
>>> 030 - CODEN Designation (R)
>>> 031 - Musical Incipits Information (R)
>>> 032 - Postal Registration Number (R)
>>> 035 - System Control Number (R)
>>> ?036 - Original Study Number for Computer Data Files (NR)
>>> 074 - GPO Item Number (R)
>>> 
>>> I think this is all of them.... Then we go on to the classification codes:
>>> 
>>> 
>>> 050 - Library of Congress Call Number (R)
>>> 052 - Geographic Classification (R)
>>> 055 - Classification Numbers Assigned in Canada (R)
>>> 060 - National Library of Medicine Call Number (R)
>>> 070 - National Agricultural Library Call Number (R)
>>> ?072 - Subject Category Code (R)
>>> 
>>> And that doesn't cover thesauri. However, we may want to ignore any
>>> thesauri that cannot provide URIs?
>>> 
>>> kc
>>> 
>>> 
>>> 
>>> On 12/4/12 11:28 AM, Ross Singer wrote:
>>>> 
>>>> 
>>>> On Dec 4, 2012, at 2:23 PM, Ed Summers <ehs@pobox.com> wrote:
>>>> 
>>>>> Call me naive, but I contend that most bibliographic identifiers are
>>>>> expressible as URIs (URNs, info URIs, URLs) and that as such they can
>>>>> use microdata's itemid [1]. Is there really a problem here?
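
To make the URI idea concrete, a rough Python sketch that turns (scheme, value) pairs into URIs suitable for itemid. The prefixes reflect my understanding of RFC 3187 (urn:isbn), RFC 3044 (urn:issn) and the info URI registry (info:doi/, info:pmid/, info:oclcnum/, info:lccn/); please check them before relying on them. The table and function name are hypothetical, and nothing here normalizes values or escapes reserved characters.

```python
# Hypothetical lookup table: scheme name -> URI prefix.
URI_PREFIXES = {
    "isbn": "urn:isbn:",
    "issn": "urn:issn:",
    "doi": "info:doi/",
    "pmid": "info:pmid/",
    "oclcnum": "info:oclcnum/",
    "lccn": "info:lccn/",
}


def identifier_uri(scheme, value):
    """Return a URI for (scheme, value), or None if the table has no
    URI form for that scheme. No escaping or normalization is done."""
    prefix = URI_PREFIXES.get(scheme.strip().lower())
    if prefix is None:
        return None
    return prefix + value.strip()


for scheme, value in [("ISBN", "0316769487"), ("DOI", "10.1000/182")]:
    print(scheme, identifier_uri(scheme, value))
```

Anything not in the table (a GPO item number, say) comes back as None, and even when every identifier has a URI form, itemid still takes only one of them per item.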
>>>> 
>>>> 
>>>> +1
>>>> 
>>>> I was hoping to suggest something along these lines, but lacked the
>>>> cycles to actually do the research to back it up.
>>>> 
>>>> -Ross.
>>>>> 
>>>>> 
>>>>> //Ed
>>>>> 
>>>>> [1] http://www.whatwg.org/specs/web-apps/current-work/multipage/microdata.html#global-identifiers-for-items
>>>>> 
>>>>> 
>>>>> 
>>>>> On Tue, Dec 4, 2012 at 9:00 AM, Karen Coyle <kcoyle@kcoyle.net> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>    On 12/4/12 5:01 AM, Shlomo Sanders wrote:
>>>>> 
>>>>>        For what it is worth, I prefer:
>>>>> 
>>>>>             ISBN-10: <span property="identifier" typeof="ISBN">0316769487</span>
>>>>> 
>>>>> 
>>>>>    I don't think this is correct -- unless you have a property that
>>>>>    is "ISBN". The "typeof" takes a property, not a value.
>>>>> 
>>>>>    Any values have to be outside of the <> unless you use a meta tag.
>>>>>    see:
>>>>>    http://schema.org/docs/gs.html#advanced_missing
>>>>> 
>>>>>    Maybe that's how we'll have to go - with meta.
>>>>> 
>>>>>    kc
>>>>> 
>>>>> 
>>>>> 
>>>>>        Or
>>>>>             ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>> 
>>>>>        These are short and clean.
>>>>>        Is itemprop="isbn" not generic, since the valid values for
>>>>>        itemprop are enumerated? Is the same true of typeof?
>>>>> 
>>>>>        -----Original Message-----
>>>>>        From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>>        Sent: Tuesday, December 04, 2012 14:58
>>>>>        To: public-schemabibex@w3.org
>>>>>        Subject: Re: Missing Schema.Org properties
>>>>> 
>>>>>        Do we need to consider how this might be displayed, since
>>>>>        schema.org generally wraps around a display? These two
>>>>>        options would result in different displays:
>>>>> 
>>>>>        On 12/4/12 3:33 AM, Shlomo Sanders wrote:
>>>>> 
>>>>>            How is this as a schema.org "friendly" version of the
>>>>>            ONIX structure:
>>>>> 
>>>>>            <div typeof="identifier">
>>>>>                        <span property="identifierValue">0316769487</span>
>>>>>                        <span property="identifierType">ISBN</span>
>>>>>            </div>
>>>>> 
>>>>> 
>>>>>        0316769487 ISBN
>>>>> 
>>>>> 
>>>>> 
>>>>>            Seems too long to me, perhaps:
>>>>>            <span property="identifier" typeof="ISBN">0316769487</span>
>>>>> 
>>>>> 
>>>>>        0316769487
>>>>> 
>>>>>        The schema.org documentation shows a similar example to this
>>>>>        latter approach using price:
>>>>> 
>>>>>            Price: <span itemprop="price">$6.99</span>
>>>>>            <meta itemprop="priceCurrency" content="USD" />
>>>>> 
>>>>>        This gets the "$6.99" display for the human reader, plus the
>>>>>        currency type for processing.
>>>>> 
>>>>>        The current use of ISBN is illustrated as:
>>>>> 
>>>>>             ISBN-10: <span itemprop="isbn">0316769487</span>
>>>>> 
>>>>>        If we go with id type and value, then display is limited by
>>>>>        the defined types, unless we leave type very loose. To get the
>>>>>        same display as the ISBN immediately above, we'd need:
>>>>> 
>>>>>        <div itemprop="identifier" itemscope
>>>>>             itemtype="http://schema.org/Identifier">
>>>>>            <span itemprop="idType">ISBN-10: </span>
>>>>>            <span itemprop="idValue">0316769487</span>
>>>>>        </div>
>>>>> 
>>>>>        Does identifier type do what we want if it's not a controlled
>>>>>        value? Or would we need a <meta> with a controlled value?
>>>>> 
>>>>>        kc
>>>>> 
>>>>> 
>>>>> 
>>>>>            -----Original Message-----
>>>>>            From: Karen Coyle [mailto:kcoyle@kcoyle.net]
>>>>>            Sent: Monday, December 03, 2012 20:28
>>>>>            To: Graham Bell
>>>>>            Cc: public-schemabibex@w3.org
>>>>>            Subject: Re: Missing Schema.Org properties
>>>>> 
>>>>>            I do, however, see a significant difference between
>>>>>            schema.org and the XML structure of ONIX (or any other
>>>>>            XML-based metadata): schema.org allows the data to be
>>>>>            flattened to a single horizon of data. This is for the sake
>>>>>            of simplicity, if I understand correctly. There seems to be
>>>>>            a philosophy in schema.org that avoids a strict division of
>>>>>            descriptions into "right" and "wrong." XML, instead, is
>>>>>            really an enforcement mechanism.
>>>>> 
>>>>>            I'm leery of adding much structure to schema.org, or at
>>>>>            least of either requiring it or relying on it. That makes
>>>>>            the identifier "problem" particularly difficult. It is for
>>>>>            this reason that I asked, in response to Shlomo's post,
>>>>>            whether one can make use of the self-identifying nature of
>>>>>            URIs. That doesn't help us with non-URI identifiers, but it
>>>>>            seems that we are moving increasingly in the direction of
>>>>>            "fully formed" identifiers.
>>>>> 
>>>>>            kc
>>>>> 
>>>>>            On 12/3/12 8:41 AM, Graham Bell wrote:
>>>>> 
>>>>>                Worth saying at this point that this is EXACTLY how
>>>>>                ONIX is structured:
>>>>> 
>>>>>                      <entityIdentifier>
>>>>>                           <entityIDType>
>>>>>                           <IDTypeName>
>>>>>                           <IDValue>
>>>>>                      </entityIdentifier>
>>>>> 
>>>>> 
>>>>>                where 'entity' might be 'product', 'work', 'name', or
>>>>>                whatever. There is a controlled vocabulary for common
>>>>>                IDTypes, and if you have some proprietary identifier not
>>>>>                in the list, you must include a 'likely to be unique'
>>>>>                name for it in <IDTypeName> instead.
>>>>> 
>>>>>                A point of history: ONIX started (in 1999) with a
>>>>>                property per identifier type; there were tags called
>>>>>                <ISBN> and <UPC>. But as pointed out below, that isn't
>>>>>                really practical, so the above XML structure is now used
>>>>>                extensively. It's easy to add to the controlled
>>>>>                vocabulary when a new identifier comes along, without
>>>>>                having to change the schema. In UML, it looks like the
>>>>>                attached, and I leave the RDF as an exercise for the
>>>>>                reader...
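
The ONIX approach (a controlled list of types, plus a 'likely to be unique' name when the type isn't on the list) could be modelled roughly like this. A Python sketch only: the KNOWN_TYPES set, the "proprietary" marker and the example scheme name are hypothetical stand-ins, not the real ONIX code list, which uses its own codes.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical controlled vocabulary; the real ONIX code list differs.
KNOWN_TYPES = {"ISBN", "ISSN", "DOI", "UPC"}


@dataclass(frozen=True)
class Identifier:
    id_type: str                     # a controlled value, or "proprietary"
    value: str
    type_name: Optional[str] = None  # only set for proprietary identifiers


def make_identifier(scheme: str, value: str) -> Identifier:
    """Use the controlled type when the scheme is known; otherwise fall
    back to a generic type plus a free-text type name, in the spirit of
    ONIX's <IDTypeName>."""
    code = scheme.strip().upper()
    if code in KNOWN_TYPES:
        return Identifier(code, value)
    return Identifier("proprietary", value, type_name=scheme.strip())


print(make_identifier("isbn", "0316769487"))
print(make_identifier("Acme stock number", "A-42"))  # hypothetical scheme
```

Supporting a new scheme is then a one-line change to the vocabulary (e.g. KNOWN_TYPES.add("PMID")), with no change to the Identifier structure itself.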
>>>>> 
>>>>>                Graham
>>>>> 
>>>>> 
>>>>> 
>>>>>                Graham Bell
>>>>>                EDItEUR
>>>>> 
>>>>>                Tel: +44 20 7503 6418
>>>>>                Mob: +44 7887 754958
>>>>> 
>>>>>                EDItEUR Limited is a company limited by guarantee,
>>>>>                registered in England no 2994705. Registered Office:
>>>>>                United House, North Road, London N7 9DP, UK.
>>>>>                Website: http://www.editeur.org
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>                On 3 Dec 2012, at 16:18, Laura Dawson wrote:
>>>>> 
>>>>>                    That might work, actually.
>>>>> 
>>>>>                    On Dec 3, 2012, at 4:05 PM, Karen Coyle
>>>>>                    <kcoyle@kcoyle.net> wrote:
>>>>> 
>>>>> 
>>>>> 
>>>>>                        On 12/3/12 7:19 AM, Richard Wallis wrote:
>>>>> 
>>>>> 
>>>>>                            Hi Shlomo,
>>>>> 
>>>>>                            Couple of points.
>>>>> 
>>>>> 
>>>>>                            *Identifiers:* This is a particular
>>>>>                            concern of mine.
>>>>> 
>>>>> 
>>>>>                        Me, too!
>>>>> 
>>>>>                            The approach of having a named property for
>>>>>                            each possible identifier that a CreativeWork
>>>>>                            or a Person could have just does not scale.
>>>>>                            Handled that way, you will always be
>>>>>                            disenfranchising some identifier's backing
>>>>>                            group. ISBN seems to have got in because it
>>>>>                            is known by everyone; oclcnum is obvious from
>>>>>                            where I sit (but that does not make it
>>>>>                            right). I think we (in all of Schema, not
>>>>>                            just the bib domain) need an identifier Type
>>>>>                            with properties of 'identifierValue' and
>>>>>                            'identifierType', which could handle either
>>>>>                            an enumerated list or at least well-known
>>>>>                            identifier names.
>>>>> 
>>>>> 
>>>>>                        I believe that this means that "Identifier"
>>>>>                        becomes a "schema" in schema.org.
>>>>> 
>>>>>                        kc
>>>>> 
>>>>> 
>>>>>                            ~Richard.
>>>>> 
>>>>> 
>>>>>            --
>>>>>            Karen Coyle
>>>>>            kcoyle@kcoyle.net
>>>>>            http://kcoyle.net
>>>>>            ph: 1-510-540-7596
>>>>>            m: 1-510-435-8234
>>>>>            skype: kcoylenet
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> 
> 

Received on Wednesday, 5 December 2012 15:25:54 UTC