Fwd: Updated Schema Architypes Straw Man Proposal



Hi Richard,

Thanks, that’s cleared up a few things, but there are a few I’m still unsure about.

> 3. just a small error really, but schema:identifier should be GB71 THM/407 and not 407/8. You’ve put temporal coverage as 1954-2005 but it’s actually 1929-2005 for the whole collection.
> 
> In my example, I made the Audio Recordings part of the collection a sub collection of the whole collection. In which case the identifier and temporalCoverage values are as per the relevant html page.
> 
> However I did get the temporalCoverage wrong for the main collection which I have now corrected.

I’m definitely not getting this. 

You have given an example of an ‘Archive Collection’, along with an example of a sub-collection (Audio Recordings) and an item (Sound Recording of Lines from My Grandfather's Forehead).
The example of an archive collection is “The Ronnie Barker Collection”, and the reference for that collection is THM/407. So when you say you made the Audio Recordings part of the collection a sub-collection of the whole collection: yes, I can see that is the example given of a sub-collection. But surely the example of the Ronnie Barker Collection itself should have schema:identifier "GB 71 THM/407"?
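
A minimal sketch of the distinction being drawn here, written in Python purely so the two descriptions can be generated and compared. The helper name, the use of the proposed ArchiveCollection type for the sub-collection, the sub-collection identifier "GB 71 THM/407/8" and the "start/end" form of the temporalCoverage values are all assumptions for illustration, not taken from the wiki examples.

```python
import json

# URL of the whole collection as used elsewhere in this message.
COLLECTION_URL = "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407"


def describe_collection(url, name, identifier, temporal_coverage, part_of=None):
    """Hypothetical helper: build a JSON-LD description of a (sub-)collection."""
    doc = {
        "@context": "http://schema.org",
        "@type": "ArchiveCollection",  # type name as proposed in this thread
        "@id": url,
        "name": name,
        "identifier": identifier,
        "temporalCoverage": temporal_coverage,
    }
    if part_of is not None:
        # Only the sub-collection points upwards to its parent.
        doc["isPartOf"] = part_of
    return doc


# The whole collection carries the collection-level reference and dates.
whole = describe_collection(
    COLLECTION_URL, "The Ronnie Barker Collection", "GB 71 THM/407", "1929/2005"
)

# The sub-collection carries its own reference and dates (assumed values).
audio = describe_collection(
    COLLECTION_URL + "/8", "Audio Recordings", "GB 71 THM/407/8", "1954/2005",
    part_of=COLLECTION_URL,
)

print(json.dumps([whole, audio], indent=2))
```

On this reading, 407/8 and 1954-2005 belong only to the Audio Recordings description, and THM/407 with 1929-2005 belongs to the collection itself.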

> 4. However I concur that in such a case in practice it probably would not be practical to list all 500 in the JSON-LD insert on the collection page.   In such a case however use of isPartOf in the description of the ArchiveItem would be sufficient to assert the relationship to a search engine: 
> "isPartOf": "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/8" (JSON-LD syntax)

Yes, that’s exactly my thinking. 

But it would be "isPartOf": "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407" (is part of the Ronnie Barker Collection)
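
As a rough illustration of why isPartOf on each part should be enough, the sketch below gives four parts an isPartOf pointing at the collection and then recovers the collection-to-parts listing from those statements alone. The function names are hypothetical, and the part URLs simply reuse the /1 to /4 pattern from the earlier hasPart question; no @type is set because the parts may be series, sub series or items.

```python
from collections import defaultdict

COLLECTION_URL = "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407"


def describe_part(part_url, parent_url):
    """Hypothetical helper: minimal JSON-LD for one part of a collection."""
    return {
        "@context": "http://schema.org",
        "@id": part_url,
        "isPartOf": parent_url,
    }


def parts_by_parent(descriptions):
    """Invert the isPartOf statements into a parent -> [parts] index."""
    index = defaultdict(list)
    for description in descriptions:
        index[description["isPartOf"]].append(description["@id"])
    return dict(index)


# Four parts, each asserting its own relationship to the collection.
parts = [describe_part(f"{COLLECTION_URL}/{n}", COLLECTION_URL) for n in range(1, 5)]

# The equivalent of a hasPart list, derived without ever writing one.
recovered = parts_by_parent(parts)
print(recovered[COLLECTION_URL])
```

A consumer that has crawled the part pages can build the same index, so the collection page would not need to list every part itself.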

> 5. Extent

> Potentially, with ArchiveCollection we could propose a property to describe the size of a collection, property names that come to mind include collectionSize, itemQuantity, collectionExtent, with expected types of Text and Integer.   Would usage across archives be consistent enough to support use of a general property such as this?

OK, I think this is worth a bit more discussion. For us (the Hub, representing a large number of UK archives), extent is mandatory at the collection level and we see it as vital for researchers. What is put down as extent is not consistent! But the use of this field is very common. If we did use something like collectionSize, that would imply the top level (collection level), and I don’t know whether that works... or maybe I’m reverting to worrying too much about semantics.

> 6. archiveHeld
> 
> Could this description include:
> 
> schema:archiveHeld "V&A Theatre and Performance Collections"
> 
> Whoops! - I missed out that property in the examples - now corrected:
> "archiveHeld": "https://archiveshub.jisc.ac.uk/data/gb71-thm/407",


I don’t understand why you’ve made the assertion Archive Held = The Ronnie Barker Collection. Why wouldn’t it be the name of the repository? Or a URI for the repository?

Also, you’ve put that into #An Archive, which is describing the repository itself. I’m talking about the #Archive Collection and stating where it is held. 
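
A rough sketch of the two readings side by side, for comparison only. The first follows the corrected example as described above, with archiveHeld on the description of the archive and the collection URL as its value. The second is the statement being asked about: something on the collection description saying where it is held. The "Archive" type name and the reuse of archiveHeld as the property name in the second reading are assumptions, not part of the proposal.

```python
import json

COLLECTION_URL = "https://archiveshub.jisc.ac.uk/data/gb71-thm/407"
REPOSITORY_NAME = "V&A Theatre and Performance Collections"

# Reading 1: the archive (repository) description lists a collection it holds.
archive_description = {
    "@context": "http://schema.org",
    "@type": "Archive",  # assumed type name for the "#An Archive" example
    "name": REPOSITORY_NAME,
    "archiveHeld": COLLECTION_URL,
}

# Reading 2: the collection description states which repository holds it.
# Reusing "archiveHeld" here is an assumption; the proposal may need a
# differently named property for this direction.
collection_description = {
    "@context": "http://schema.org",
    "@type": "ArchiveCollection",
    "@id": COLLECTION_URL,
    "name": "The Ronnie Barker Collection",
    "archiveHeld": REPOSITORY_NAME,
}

print(json.dumps([archive_description, collection_description], indent=2))
```

The same property name ends up pointing in opposite directions in the two descriptions, which is the source of the confusion.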


cheers,
Jane


> 
>> On 17 May 2017, at 17:15, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>> 
>> Hi Jane,
>> 
>> Comments in line.
>> 
>> On 17 May 2017 at 14:52, Jane Stevenson <Jane.Stevenson@jisc.ac.uk> wrote:
>> Hi Richard,
>> 
>> Great, thanks.
>> 
>> Just looking at the #Archive Collection for the time being:
>> 
>> 1. Overall this tallies with my attempts - so I’m pleased that I seem to be going in the right direction.
>> 
>> Good to hear ;-)
>> 
>> 
>> 2. can you just clarify for me the syntax re. the creator
>> 
>> schema:creator [ a schema:Person ;
>>            schema:name "Ronnie Barker" ;
>>            schema:sameAs <http://viaf.org/viaf/2676198> ] ;
>> 
>> You would do this every time you introduce Types?  So you might do it if you had, for example,  schema:publisher [ a schema:Organization ]
>> 
>> This depends much on what information you have about the creator.  For example if you had available a separate description of that Person with its own identifying URI, you would just use that.
>> 
>> e.g.  schema:creator <https://archiveshub.jisc.ac.uk/data/person/1234>
>> 
>> Or you could use an external authoritative source such as <http://viaf.org/viaf/2676198>, or <http://www.wikidata.org/entity/Q963893> 
>> 
>> Note the syntax referenced in the examples here is Turtle, which I used for clarity when displaying the whole of the example model.  It would be worth checking out the JSON-LD examples of what would be inserted into the individual html pages for search engines to crawl.
>> 
>> 
>> 3. just a small error really, but schema:identifier should be GB71 THM/407 and not 407/8. You’ve put temporal coverage as 1954-2005 but it’s actually 1929-2005 for the whole collection.
>> 
>> In my example, I made the Audio Recordings part of the collection a sub collection of the whole collection. In which case the identifier and temporalCoverage values are as per the relevant html page.
>> 
>> However I did get the temporalCoverage wrong for the main collection which I have now corrected.
>> 
>> 
>> 4. hasPart
>> 
>> This particular collection has something like 500 parts (series, sub series, items). To my mind it is not generally going to be practical to use ‘has part’ in this way. I don’t think that matters, as I guess the principle is that you can indicate parts of a whole if you wish to. If I did want to do this, would it be:
>> 
>> schema:hasPart "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/1" ;
>> schema:hasPart "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/2" ;
>> schema:hasPart "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/3" ;
>> schema:hasPart "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/4" ;
>> 
>> I’m not yet sure of the benefits of listing 500 parts in this way.
>> 
>> The benefits would be for the search engines to gain a detailed understanding of the relationship between the collection and the items it contains.
>> 
>> However I concur that in such a case in practice it probably would not be practical to list all 500 in the JSON-LD insert on the collection page.   In such a case however use of isPartOf in the description of the ArchiveItem would be sufficient to assert the relationship to a search engine: 
>> "isPartOf": "https://archiveshub.jisc.ac.uk/data/gb71-thm/407/thm/407/8" (JSON-LD syntax)
>> 
>> 
>> 5. extent
>> 
>> I definitely don’t want to include all the descriptive information within the schema.org representation, but I would tend to include the size of the collection as core information. At present I don’t think there is a property that we could use for this?
>> 
>> Extent was an issue that caused much discussion in the bibliographic extension work, as there was much variation as to what extent could be used for and mean.  It was never therefore recommended as a property, and the use of description was recommended.
>> 
>> Potentially, with ArchiveCollection we could propose a property to describe the size of a collection, property names that come to mind include collectionSize, itemQuantity, collectionExtent, with expected types of Text and Integer.   Would usage across archives be consistent enough to support use of a general property such as this?
>> 
>> 
>> 6. archiveHeld
>> 
>> Could this description include:
>> 
>> schema:archiveHeld "V&A Theatre and Performance Collections"
>> 
>> Whoops! - I missed out that property in the examples - now corrected:
>> "archiveHeld": "https://archiveshub.jisc.ac.uk/data/gb71-thm/407",
>> 
>> 
>> 7. Language
>> 
>> From what I gather, to be compliant we would have to use ISO639-1 codes? i.e. inLanguage: “EN” and not “eng”? All of our descriptions use ISO 639-2 so it’s a shame if we can't use them!
>> 
>> That’s the trouble with standards - there are so many to choose from! ;-)
>> 
>> Schema.org inLanguage encourages the use of BCP47, which as you indicate, is ISO639-1 based.  This is the generic standard used across many domains, across the web and in html.  However, there are many domains that, like yours, do not yet use that standard.  This is an area where data consumers (search engines) almost certainly apply Postel’s Law and would probably recognise your language codes.
>> 
>> 
>> 8. Aboutness
>> 
>> Finally, one of the things I assumed with schema.org is that it would be useful to include what the archive is about. So I thought about using e.g:
>> 
>> schema:about "Comedy"
>> schema:about "Television comedy"
>> 
>> I was thinking in terms of discoverability. What do you think about adding subjects/people/places in this way?
>> 
>> Most definitely!
>> 
>> I have added a couple of about references in the examples.  I used text values, but equally they could have been URIs for the concepts, person, etc.
>> 
>> ~Richard
>> 
>> 
>> cheers
>> Jane
>> 
>> 
>> 
>> 
>>> On 16 May 2017, at 13:26, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> Following discussions on the mailing list and taking into account general evolution of the schema.org vocabulary over recent months, I have produced an updated version of the straw man initial proposal in the Wiki.
>>> 
>>> ~Richard.
>>> Richard Wallis
>>> Founder, Data Liberate
>>> http://dataliberate.com
>>> Linkedin: http://www.linkedin.com/in/richardwallis
>>> Twitter: @rjw
>> 
> 

Received on Thursday, 18 May 2017 09:35:02 UTC