Re: Tweaks to the Archives proposal [via Schema Architypes Community Group] from Owen Stephens on 2017-07-14 (public-architypes@w3.org from July 2017)

From: Owen Stephens <owen@ostephens.com>
Date: Fri, 14 Jul 2017 15:46:57 +0100
To: Richard Wallis <richard.wallis@dataliberate.com>
Cc: public-architypes <public-architypes@w3.org>, Jane Stevenson <Jane.Stevenson@jisc.ac.uk>
Message-Id: <F7751F95-5B4D-43E4-95E2-E98D93BEE0B6@ostephens.com>
OK - I’ve put up something on the wiki - I’ve not checked it through as carefully as I’d like but I wanted to get something up sooner rather than later - I think it captures the essence of the proposal even if there are errors or tweaks to be made.

https://www.w3.org/community/architypes/wiki/Alternative_1_model_proposal

Comments welcome.

Owen

Owen Stephens
Owen Stephens Consulting
Web: http://www.ostephens.com
Email: owen@ostephens.com
Telephone: 0121 288 6936

> On 12 Jul 2017, at 14:22, Richard Wallis <richard.wallis@dataliberate.com> wrote:
> 
> Yes Owen, let’s see what that proposal might look like and compare the two.
> 
> ~Richard.
> 
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
> 
> On 12 July 2017 at 14:09, Owen Stephens <owen@ostephens.com> wrote:
> On 12 Jul 2017, at 13:08, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>> 
>> Hi Owen,
>> 
>> Thanks for your input - never too late until the proposal is submitted and adopted :-)
>> 
>> I feel that by focusing on Jane’s particular use case we are looking at an edge case (in Jane’s situation a substantial one, nevertheless), but I want to be careful we don’t skew our proposal away from potentially broader generic cases.
> 
> I don’t agree that this is an edge case - or at least I don’t think there is any evidence that it is. If this is the data that Archives Hub is getting, then I think it is reasonable to assume that this is the data that is available for many archives - and in that case many archives will be faced with the same issue of not knowing, on any automated basis, whether a particular description is a collection or an individual item.
> 
>> In describing archive collections and the things within them our objective is to make them more discoverable widely on the web.  I believe that in general non-archivists understand the concept of an Archive holding organisation which is responsible for collection(s) that contain things or items.  Reflecting that simplistic understanding was one of the starting points for the proposed model.
> 
> I agree with this - I don’t think that anything I suggested went against this (or at least it wasn’t my intention).
> 
>> 
>> As to your particular points :
>> 
>> The naming of a Type as ArchiveItem or ArchiveObject. Checking some online definitions I see that object and item are both synonyms of each other, however I still feel that the description of item (an individual article or unit, especially one that is part of a list, collection, or set.) is closer to what we are trying to express than that of object (a material thing that can be seen and touched)
> 
> Fair enough - I was trying to find a term that didn’t suggest either an item or a collection - I obviously failed!
> 
>> 
>> Effectively merging the properties of ArchiveCollection and ArchiveItem. Using other types to define not only whether it is a collection or not, but also what type of item it is, I believe may be pushing the multi-type generic capabilities of Schema.org a little too far to be understandable to implementing archivists.  It has many similarities to my original proposal, where ArchiveCollection was made a subtype of both Collection and ArchiveItem, an approach that, although logically correct, caused much discussion and confusion early on.
> 
> I can see the potential to create confusion here, but I think this already exists in the current proposal which mixes two approaches to adding archive properties to a Thing. I think my proposal is simpler in that it adopts a single way of doing this. I’m not entirely happy with this (I was initially against the use of Intangible type) but I’d argue it is simpler as it reduces the number of new types and groups all the relevant properties in a single type.
> 
>> 
>> So going back to the proposal as it currently stands, it works well when you know what you are describing - an item or collection of items held by an organisation.
>> 
>> Where it is difficult is when you don’t know what you are describing.  What do you default to?  The two options are a collection (which would be wrong if it is, for example, an individual document) or an item (which would be wrong if, for example, it was a folder containing several yet to be described items).  Whichever of these is chosen, it will be wrong some of the time.
>> 
>> My thoughts are that it should be up to the describing organisation to decide, based on probabilities within their collection(s), which of these to choose.  Not ideal, but I believe preferable to creating a fuzzy type that would work for either case but lose useful specificity when what is being described is known to be an individual item or a collection of items.
> 
> While I don’t disagree it’s up to the describing organisation to decide (of course), it’s about the decision they are having to make.
> I’m proposing that the question of whether it is a Collection or not should be separate to whether the thing is in an Archive or not. At the moment this seems problematic as you have to decide up front whether you want to use ArchiveCollection or ArchiveItem.
> 
> The intent of my proposal was to separate out the question of ‘what’ it is, from the fact it is in an archive and therefore has a set of archive specific properties related to it.
> 
> I’m inclined to write up a proposal using this approach on the wiki so I can be a bit more explicit and we can see how the approaches compare. Does this seem like a good way forward?
> Does anyone else have a comment or view on this?
> 
> Owen
> 
> 
>> 
>> ~Richard.
>> 
>> 
>> 
>> 
>> 
>> 
>> Richard Wallis
>> Founder, Data Liberate
>> http://dataliberate.com
>> Linkedin: http://www.linkedin.com/in/richardwallis
>> Twitter: @rjw
>> 
>> On 12 July 2017 at 11:56, Owen Stephens <owen@ostephens.com> wrote:
>> Hi all,
>> 
>> I’ve not had the time to contribute to this discussion so much, and it’s good to see some practical progress, but this latest point has brought me back to some slight unhappiness with the structure of the current proposal and the use of ArchiveItem. Apologies if this is either too late, or I’ve missed how the model has developed over the last few months. I’m looking at https://www.w3.org/community/architypes/wiki/Initial_model_proposal
>> 
>> As I understand it, the current proposal has ArchiveCollection as a subtype of Collection (which is a CreativeWork), while ArchiveItem is an Intangible, and intended to be applied alongside other types (such as CreativeWork or Thing) to enable the addition of ArchiveItem properties to existing Schema.org types.
>> 
>> The case that Jane has highlighted here is that it is unknown whether what we are looking at is a Collection or a specific Item.
>> 
>> In this case, giving something that may be a collection or may be an item the type ArchiveCollection seems wrong - it suggests a level of specificity we don’t know.
>> Also it seems to me that giving it a type ArchiveItem doesn’t imply it is actually a specific item - because ArchiveItem can be applied alongside other types (presumably including ArchiveCollection).
>> 
>> So it would make more sense to me in this case to state that the thing is a Thing or CreativeWork, with an additional type of ArchiveItem - this doesn’t imply it is either a single item or a collection, it would leave this open to question - which seems to me to reflect the reality of the situation.
>> 
>> Trying to draw this up into the modelling of archives in Schema.org, the question it brings me to is: what is the advantage of splitting archival properties between ArchiveCollection and ArchiveItem? Why not bundle all the properties (there aren’t that many) into a single type based on Intangible (taking the current ArchiveItem approach)? I’ll call it ‘ArchiveObject’ for now. When you know you have a Collection you apply type of Collection and ArchiveObject, and when you have a CreativeWork you apply type of CreativeWork and ArchiveObject etc.
>> 
>> At the moment applying ArchiveCollection when you aren’t sure whether it is actually a Collection seems wrong to me. If there is any ambiguity then I think you can apply ArchiveItem (you know it is in an Archive) but you can’t assert Collection.
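
A minimal sketch of the multi-type idea described above, in Python, building a JSON-LD style description. ‘ArchiveObject’ is only the placeholder name used in this message, the holdingArchive property and Archive type are taken from the proposal under discussion rather than from published Schema.org vocabulary, and the function name, example values, and the choice of Thing for the unknown case are assumptions made for illustration.

```python
import json


def describe(name, holding_archive, is_collection=None):
    """Return a JSON-LD style dict for one archival description.

    is_collection: True (known collection), False (known single work),
    or None (unknown, as with the aggregated data discussed here).
    """
    if is_collection is True:
        what = "Collection"
    elif is_collection is False:
        what = "CreativeWork"
    else:
        what = "Thing"  # assumption: least specific type when unknown

    return {
        "@context": "http://schema.org",
        # 'what it is' and 'it is in an archive' are stated separately
        "@type": [what, "ArchiveObject"],  # ArchiveObject is a placeholder name
        "name": name,
        "holdingArchive": {"@type": "Archive", "name": holding_archive},
    }


# Hypothetical example values
print(json.dumps(describe("Minute books", "Example Record Office"), indent=2))
```

The point of the sketch is that the archive-specific properties are attached the same way in all three cases; only the first entry in the type list changes.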
>> 
>> Owen
>> 
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: owen@ostephens.com
>> Telephone: 0121 288 6936
>> 
>> > On 12 Jul 2017, at 11:29, Jane Stevenson <Jane.Stevenson@jisc.ac.uk> wrote:
>> >
>> > Hi Richard,
>> >
>> > Yes, we are an awkward case! But at least we then bring benefits to over 300 repositories when we implement schema.org.
>> >
>> >> As to your A/B decision, I can only suggest from a non-archivist point of view, but if something has already been identified in some way as an item or piece, it would be worth reflecting that in the description shared with the web (using the ArchiveItem type), then defaulting, in your case, to ArchiveCollection where this is not known.
>> >>
>> > Perfect - I was going to go with that, as I’m thinking be accurate where you can be accurate.
>> >
>> >> A minor syntax point:  The convention within Schema.org is for the names of Types to begin with an uppercase letter (Archive, ArchiveCollection, ArchiveItem) and properties with a lowercase letter (itemLocation, holdingArchive, accessConditions, etc.).   I know we are only in discussion mode, but looking back on this documentation it can be confusing for some if we don’t follow these conventions here as well as in the type definitions etc.
>> >
>> > Thanks. I may have been a bit inconsistent with this….but we’ll ensure we implement it correctly.
>> >
>> > OK….we’ll crack on then.
>> >
>> > Thanks to all - the discussion has been really useful.
>> >
>> > cheers,
>> > Jane
>> >
>> >
>> > Jane Stevenson
>> > Archives Hub Service Manager
>> > jane.stevenson@jisc.ac.uk
>> > (Work days: Monday to Thursday)
>> >
>> > Tel: 0161 413 7555
>> > Web: archiveshub.jisc.ac.uk
>> > Skype:  janestevenson
>> > Twitter: @archiveshub, @janestevenson
>> >
>> >
>> >
>> >> On 12 Jul 2017, at 10:20, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>> >>
>> >> Thanks Jane for your insight into the issues surrounding this within Archives Hub.  As effectively an aggregator of archives this provides a test of the model at one end of the spectrum of use cases we are looking to satisfy.
>> >>
>> >> As you say, from the information you are provided with you may not know if something being described is a collection or a single item.  Also it is unlikely that you would know if a single item is located with the rest of the collection or not.
>> >>
>> >> Those responsible for other individual archives may well be very clear on these things for their collections.  Hopefully we are in a position to satisfy the broad spectrum of use cases with this proposal.
>> >>
>> >> As to your A/B decision, I can only suggest from a non-archivist point of view, but if something has already been identified in some way as an item or piece, it would be worth reflecting that in the description shared with the web (using the ArchiveItem type), then defaulting, in your case, to ArchiveCollection where this is not known.
>> >>
>> >> If there are no further discussion points from the group, I intend in the next couple of weeks to forward this proposal to the Schema.org group for consideration.
>> >>
>> >> A minor syntax point:  The convention within Schema.org is for the names of Types to begin with an uppercase letter (Archive, ArchiveCollection, ArchiveItem) and properties with a lowercase letter (itemLocation, holdingArchive, accessConditions, etc.).   I know we are only in discussion mode, but looking back on this documentation it can be confusing for some if we don’t follow these conventions here as well as in the type definitions etc.
>> >>
>> >> ~Richard.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Richard Wallis
>> >> Founder, Data Liberate
>> >> http://dataliberate.com
>> >> Linkedin: http://www.linkedin.com/in/richardwallis
>> >> Twitter: @rjw
>> >>
>> >> On 12 July 2017 at 09:36, Jane Stevenson <Jane.Stevenson@jisc.ac.uk> wrote:
>> >> Hi Richard,
>> >>
>> >>> It would work to describe a collection of one or more things. However, if you have a known physical item (book, article, photograph, etc) or file (video, audio, image, web page, etc.) why would you not describe it as such?
>> >>
>> >> This is the nub of the matter….it is because we won’t always know. We can definitely decide that if the level is described as “item” we apply the archiveItem type. But (1) levels are not always given values: on the Hub we do ask for this, but in general, within EAD, values are not mandatory; (2) you can have a level that is a sub-series, or a folder or a file that is effectively one physical item, but the level value does not identify this. Archivists will describe ‘one folder’ but it may have one item in it.  Is something described as ‘one folder’ an item? Should ‘one box’ always be treated as a collection of items, although it may only have one item in it, e.g. an account book is a sub-series in one box?
>> >>
>> >> It is maybe possible for an individual repository to sort out single item descriptions from ‘more than one item’ descriptions, but it’s not possible for us to do that in an automated way across all our data. People aren’t consistent enough with cataloguing for that, and to be fair, the standards have never emphasised the importance of distinguishing one physical item in this way.
>> >>
>> >>> This comes back to describing information about an individual item.  Potentially the ArchiveCollection the item is part of could be held by an organisation (Archive), yet an individual item could be located, on extended loan for example, at a different location.
>> >>
>> >>
>> >> OK. I get the logic. It is just quite rare for that to happen, unlike in museums. And if it was temporarily elsewhere, we wouldn’t know. Something on loan would not be flagged as such in the description. But that’s OK - we would always just use the repository as the holding institution, so itemLocation, if we use it, would always have the same value as holdingArchive. If an item was on loan it simply wouldn’t show up in our schema.org data.  I don’t think that matters. As you say, it’s optional anyway.
>> >>
>> >> I think we’re ready to go now. I just have to decide on either
>> >>
>> >> A. Always use archiveCollection, including for items, because we can’t distinguish all items anyway
>> >> B. Use archiveItem where we have a level value of “item” or “piece”, which will give us a majority of items (my estimate is that we would get something like 70% of single entities this way). A fair number of items won’t be described as items because they don’t have that level value, even if they are single physical entities, so they will be single physical items but described as type archiveCollection.
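
Option B above amounts to a small rule on the level value, which could be sketched in Python roughly as follows. The function name is hypothetical, and trimming whitespace and ignoring case are assumptions about how level values might need normalising; the type names follow the uppercase convention mentioned elsewhere in the thread.

```python
def choose_archive_type(level):
    """Pick a type for one unit of description from its level value.

    Returns "ArchiveItem" only when the level is explicitly "item" or
    "piece"; anything else, including a missing level, falls back to
    "ArchiveCollection" (the default suggested for this use case).
    """
    # Assumption: level values may vary in case and surrounding whitespace
    normalised = (level or "").strip().lower()
    if normalised in {"item", "piece"}:
        return "ArchiveItem"
    return "ArchiveCollection"


# Hypothetical level values as they might appear in source descriptions
for level in ["item", "Piece", "file", "sub-series", None]:
    print(level, "->", choose_archive_type(level))
```

Option A would simply return "ArchiveCollection" in every case. Under option B, a single physical item catalogued at, say, file level still comes out as ArchiveCollection, which is the inaccuracy accepted above.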
>> >>
>> >> cheers,
>> >> Jane.
>> >>
>> >>
>> >> Jane Stevenson
>> >> Archives Hub Service Manager
>> >> jane.stevenson@jisc.ac.uk
>> >> (Work days: Monday to Thursday)
>> >>
>> >> Tel: 0161 413 7555
>> >> Web: archiveshub.jisc.ac.uk
>> >> Skype:  janestevenson
>> >> Twitter: @archiveshub, @janestevenson
>> >>
>> >>
>> >>
>> >>> On 11 Jul 2017, at 17:26, Richard Wallis <richard.wallis@dataliberate.com> wrote:
>> >>>
>> >>> Hi Jane,
>> >>>
>> >>> Sorry for being slow in responding.
>> >>>
>> >>> Answers inline.
>> >>>
>> >>> ~Richard.
>> >>>
>> >>>
>> >>> On 3 July 2017 at 07:48, Jane Stevenson <Jane.Stevenson@jisc.ac.uk> wrote:
>> >>> Hi Richard and everyone,
>> >>>
>> >>> If I decided to only use #archiveCollection for all of the units of description, would that work?  We don’t necessarily know if units described are single items or more than one item anyway, and it seems to me we can effectively describe each unit with the properties now provided, which is the main thing. So my question is, why would I need to use #archiveItem?
>> >>>
>> >>> It would work to describe a collection of one or more things. However, if you have a known physical item (book, article, photograph, etc) or file (video, audio, image, web page, etc.) why would you not describe it as such?
>> >>>
>> >>>
>> >>> Just one more question…. we have properties archiveHeld and holdingArchive, and we also have itemLocation. How is itemLocation different from holdingArchive? In the example, for Ronnie Barker, itemLocation is given as the V&A Theatre & Performance Archive (URL). But surely the property of holdingArchive would do just as well.
>> >>>
>> >>> This comes back to describing information about an individual item.  Potentially the ArchiveCollection the item is part of could be held by an organisation (Archive), yet an individual item could be located, on extended loan for example, at a different location.
>> >>>
>> >>> All properties within Schema.org are optional, so you probably would only provide an itemLocation when an item is located separately from the holdingArchive of the ArchiveCollection of which it is part.
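
The relationship between the two properties described above could be sketched in Python like this. The function name and example values are hypothetical; holdingArchive and itemLocation are the property names from the proposal under discussion, and representing their values as plain strings is a simplification.

```python
def location_properties(holding_archive, current_location=None):
    """Build the location-related properties for one item.

    holdingArchive is always given; itemLocation is added only when the
    item is known to be somewhere other than the holding archive.
    """
    props = {"holdingArchive": holding_archive}
    if current_location and current_location != holding_archive:
        props["itemLocation"] = current_location
    return props


# Hypothetical values: an item on extended loan, then one held in place
print(location_properties("Example Record Office", "Example City Museum"))
print(location_properties("Example Record Office"))
```

For an aggregator that never knows about loans, the second call is the only case that arises, so itemLocation would simply never be emitted.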
>> >>>
>> >>> ~Richard.
>> >>>
>> >>>
>> >>> cheers
>> >>> Jane
>> >>>
>> >>> Jane Stevenson
>> >>> Archives Hub Service Manager
>> >>> jane.stevenson@jisc.ac.uk
>> >>>
>> >>>
>> >>
>> >>
>> >
>> 
>> 
> 
>
Received on Friday, 14 July 2017 14:47:32 UTC