Re: Tweaks to the Archives proposal [via Schema Architypes Community Group] from Richard Wallis on 2017-07-12 (public-architypes@w3.org from July 2017)

From: Richard Wallis <richard.wallis@dataliberate.com>
Date: Wed, 12 Jul 2017 14:22:51 +0100
To: Owen Stephens <owen@ostephens.com>
Cc: public-architypes <public-architypes@w3.org>, Jane Stevenson <Jane.Stevenson@jisc.ac.uk>
Message-ID: <CAD47Kz6xg9ZaMC0b_fMbuYfi8Tz5J1C3VVmMpOvLdaJfBxwVmA@mail.gmail.com>
Yes Owen, let's see what that proposal might look like and compare the two.

~Richard.

Richard Wallis
Founder, Data Liberate
http://dataliberate.com
Linkedin: http://www.linkedin.com/in/richardwallis
Twitter: @rjw

On 12 July 2017 at 14:09, Owen Stephens <owen@ostephens.com> wrote:

> On 12 Jul 2017, at 13:08, Richard Wallis <richard.wallis@dataliberate.com>
> wrote:
>
>
> Hi Owen,
>
> Thanks for your input - never too late until the proposal is submitted and
> adopted :-)
>
> I feel that by focusing on Jane’s particular use case we are looking at an
> edge case, in Jane’s situation a nevertheless substantial one, but I want
> to be careful we don’t skew our proposal away from potentially broader
> generic cases.
>
>
> I don’t agree that this is an edge case - or at least I don’t think there
> is any evidence that it is. If this is the data that Archives Hub is
> getting, then I think it is reasonable to assume that this is the data that
> is available for many archives - and in that case many archives will be
> faced with the same issue of not knowing, on any automated basis, whether a
> particular description is a collection or an individual item
>
> In describing archive collections and the things within them our objective
> is to make them more discoverable widely on the web.  I believe that in
> general non-archivists understand the concept of an Archive holding
> organisation which is responsible for collection(s) that contain things or
> items.  Reflecting that simplistic understanding was one of the starting
> points for the proposed model.
>
> I agree with this - I don’t think that anything I suggested went against
> this (or at least it wasn’t my intention)
>
>
> As to your particular points :
>
> *The naming of a Type as ArchiveItem or ArchiveObject*. Checking some
> online definitions I see that object and item are both synonyms of each
> other, however I still feel that the description of item (*an individual
> article or unit, especially one that is part of a list, collection, or set.*)
> is closer to what we are trying to express than that of object (*a
> material thing that can be seen and touched*)
>
> Fair enough - I was trying to find a term that didn’t suggest either an
> item or a collection - I obviously failed!
>
>
> *Effectively merging the properties of ArchiveCollection and ArchiveItem.* Using
> other types to define not only if it is a collection or not, but also what
> type of item it is, I believe may be pushing the multi-type generic
> capabilities of Schema.org a little too far to be understandable to
> implementing archivists.  It has many similarities to my original proposal
> where *ArchiveCollection* was made a subtype of both *Collection* and
> *ArchiveItem*, an approach that, although logically correct, caused much
> discussion and confusion early on.
>
> I can see the potential to create confusion here, but I think this already
> exists in the current proposal which mixes two approaches to adding archive
> properties to a Thing. I think my proposal is simpler in that it adopts a
> single way of doing this. I’m not entirely happy with this (I was initially
> against the use of Intangible type) but I’d argue it is simpler as it
> reduces the number of new types and groups all the relevant properties in a
> single type.
>
>
> So going back to the proposal as it currently stands, it works well when
> you know what you are describing - an item or collection of items held by
> an organisation.
>
> Where it is difficult is when you don’t know what you are describing.
> What do you default to?  The two options being a collection (which would be
> wrong if it is for example an individual document) or; an item (which would
> be wrong if for example it was a folder containing several as yet to be
> described items).  Whichever of these is chosen it will be wrong some of
> the time.
>
> My thoughts are that it should be up to the describing organisation to
> decide, based on probabilities within their collection(s), as to which of
> these to choose.  Not ideal, but I believe preferable when compared with
> creating a fuzzy type that would work for either case but lose useful
> specificity when what is being described is known to be an individual item
> or a collection of items.
>
>
> While I don’t disagree it’s up to the describing organisation to decide (of
> course), it’s about the decision they are having to make.
> I’m proposing that the question of whether it is a Collection or not
> should be separate to whether the thing is in an Archive or not. At the
> moment this seems problematic as you have to decide up front whether you want
> to use ArchiveCollection or ArchiveItem.
>
> The intent of my proposal was to separate out the question of ‘what’ it
> is, from the fact it is in an archive and therefore has a set of archive
> specific properties related to it.
>
> I’m inclined to write up a proposal using this approach on the wiki so I
> can be a bit more explicit and we can see how the approaches compare. Does
> this seem like a good way forward?
> Does anyone else have a comment or view on this?
>
> Owen
>
>
>
> ~Richard.
>
>
>
>
>
>
> Richard Wallis
> Founder, Data Liberate
> http://dataliberate.com
> Linkedin: http://www.linkedin.com/in/richardwallis
> Twitter: @rjw
>
> On 12 July 2017 at 11:56, Owen Stephens <owen@ostephens.com> wrote:
>
>> Hi all,
>>
>> I’ve not had the time to contribute to this discussion so much, and it’s
>> good to see some practical progress, but this latest point has brought me
>> back to some slight unhappiness with the structure of the current proposal
>> and the use of ArchiveItem. Apologies if this is either too late, or I’ve
>> missed how the model has developed over the last few months. I’m looking at
>> https://www.w3.org/community/architypes/wiki/Initial_model_proposal
>>
>> As I understand it, the current proposal has ArchiveCollection as a
>> subtype of Collection (which is a CreativeWork), while ArchiveItem is an
>> intangible, and intended to be applied alongside other types (such as
>> CreativeWork or Thing) to enable the addition of ArchiveItem properties to
>> existing Schema.org types.
>>
>> The case that Jane has highlighted here is that it is unknown whether
>> what we are looking at is a Collection or a specific Item.
>>
>> In this case, giving something that may be a collection or may be an item
>> the type ArchiveCollection, seems wrong - it suggests a level of
>> specificity we don’t know.
>> Also it seems to me that giving it a type ArchiveItem doesn’t imply it is
>> actually a specific item - because ArchiveItem can be applied alongside
>> other types (presumably including ArchiveCollection).
>>
>> So it would make more sense to me in this case to state that the thing is
>> a Thing or CreativeWork, with an additional type of ArchiveItem - this
>> doesn’t imply it is either a single item or a collection, it would leave
>> this open to question - which seems to me to reflect the reality of the
>> situation.
>>
>> Trying to draw this up into the modelling of archives in Schema.org, the
>> question it brings me to - is what is the advantage of splitting archival
>> properties between ArchiveCollection and ArchiveItem? Why not bundle all
>> the properties (there aren’t that many) into a single type based on
>> intangible (taking the current ArchiveItem approach) - I’ll call it
>> ‘ArchiveObject’ for now. When you know you have a Collection you apply type
>> of Collection and ArchiveObject, and when you have a CreativeWork you apply
>> type of CreativeWork and ArchiveObject etc.
>>
>> At the moment applying ArchiveCollection when you aren’t sure whether it
>> is actually a Collection seems wrong to me. If there is any ambiguity then
>> I think you can apply ArchiveItem (you know it is in an Archive) but you
>> can’t assert Collection.
>>
>> Owen
>>
>> Owen Stephens
>> Owen Stephens Consulting
>> Web: http://www.ostephens.com
>> Email: owen@ostephens.com
>> Telephone: 0121 288 6936
>>
>> > On 12 Jul 2017, at 11:29, Jane Stevenson <Jane.Stevenson@jisc.ac.uk>
>> wrote:
>> >
>> > Hi Richard,
>> >
>> > Yes, we are an awkward case! But at least we then bring benefits to
>> over 300 repositories when we implement schema.org.
>> >
>> >> As to your A/B decision, I can only suggest from a non archivist point
>> of view, but if something has already been identified in some way as an item
>> or piece, it would be worth reflecting that in the description shared with
>> the web (using the ArchiveItem type), then defaulting, in your case, to
>> ArchiveCollection where this is not known.
>> >>
>> > Perfect - I was going to go with that, as I’m thinking be accurate
>> where you can be accurate.
>> >
>> >> A minor syntax point:  The convention within Schema.org is for the
>> names of Types to begin with an uppercase letter (Archive,
>> ArchiveCollection, ArchiveItem)  and properties with a lowercase
>> (itemLocation, holdingArchive, accessConditions, etc.).   I know we are
>> only in discussion mode, but looking back on this documentation it can be
>> confusing for some if we don’t follow these conventions here as well as in
>> the type definitions etc.
>> >
>> > Thanks. I may have been a bit inconsistent with this….but we’ll ensure
>> we implement it correctly.
>> >
>> > OK….we’ll crack on then.
>> >
>> > Thanks to all - the discussion has been really useful.
>> >
>> > cheers,
>> > Jane
>> >
>> >
>> > Jane Stevenson
>> > Archives Hub Service Manager
>> > jane.stevenson@jisc.ac.uk
>> > (Work days: Monday to Thursday)
>> >
>> > Tel: 0161 413 7555
>> > Web: archiveshub.jisc.ac.uk
>> > Skype:  janestevenson
>> > Twitter: @archiveshub, @janestevenson
>> >
>> >
>> >
>> >> On 12 Jul 2017, at 10:20, Richard Wallis <
>> richard.wallis@dataliberate.com> wrote:
>> >>
>> >> Thanks Jane for your insight into the issues surrounding this within
>> Archives Hub.  As effectively an aggregator of archives this provides a
>> test of the model at one end of the spectrum of use cases we are looking to
>> satisfy.
>> >>
>> >> As you say, from the information you are provided with you may not
>> know if something being described is a collection or a single item.  Also
>> it is unlikely that you would know if a single item is located with the
>> rest of the collection or not.
>> >>
>> >> Those responsible for other individual archives may well be very clear
>> on these things for their collections.  Hopefully we are in a position to
>> satisfy the broad spectrum of use cases with this proposal.
>> >>
>> >> As to your A/B decision, I can only suggest from a non archivist point
>> of view, but if something has already been identified in some way as an item
>> or piece, it would be worth reflecting that in the description shared with
>> the web (using the ArchiveItem type), then defaulting, in your case, to
>> ArchiveCollection where this is not known.
>> >>
>> >> If there are no further discussion points from the group, I intend in
>> the next couple of weeks to forward this proposal to the Schema.org
>> group for consideration.
>> >>
>> >> A minor syntax point:  The convention within Schema.org is for the
>> names of Types to begin with an uppercase letter (Archive,
>> ArchiveCollection, ArchiveItem)  and properties with a lowercase
>> (itemLocation, holdingArchive, accessConditions, etc.).   I know we are
>> only in discussion mode, but looking back on this documentation it can be
>> confusing for some if we don’t follow these conventions here as well as in
>> the type definitions etc.
>> >>
>> >> ~Richard.
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> Richard Wallis
>> >> Founder, Data Liberate
>> >> http://dataliberate.com
>> >> Linkedin: http://www.linkedin.com/in/richardwallis
>> >> Twitter: @rjw
>> >>
>> >> On 12 July 2017 at 09:36, Jane Stevenson <Jane.Stevenson@jisc.ac.uk>
>> wrote:
>> >> Hi Richard,
>> >>
>> >>> It would work to describe a collection of one or more things.
>> However, if you have a known physical item (book, article, photograph, etc)
>> or file (video, audio, image, web page, etc.) why would you not describe it
>> as such?
>> >>
>> >> This is the nub of the matter….it is because we won’t always know. We
>> can definitely decide that if the level is described as “item” we apply the
>> archiveItem type. But (1) levels are not always given values - although on
>> the Hub we do ask for this, but in general, within EAD, values are not
>> mandatory (2) You can have a level that is a sub-series, or a folder or a
>> file that is effectively one physical item, but the level value does not
>> identify this. Archivists will describe ‘one folder’ but it may have one
>> item in it.  Is something described as ‘one folder’ an item? Should ‘one
>> box’ always be treated as a collection of items, although it may only have
>> one item in it, e.g. an account book is a sub-series in one box.
>> >>
>> >> It is maybe possible for an individual repository to sort out single
>> item descriptions from ‘more than one item’ descriptions, but it’s not
>> possible for us to do that in an automated way across all our data. People
>> aren’t consistent enough with cataloguing for that, and to be fair, the
>> standards have never emphasised the importance of distinguishing one
>> physical item in this way.
>> >>
>> >>> This comes back to describing information about an individual item.
>> Potentially the ArchiveCollection the item is part of could be held by an
>> organisation (Archive), yet an individual item could be located, on
>> extended loan for example, at a different location.
>> >>
>> >>
>> >> OK. I get the logic. It is just quite rare for that to happen, unlike
>> museums. And if it was temporarily elsewhere, we wouldn’t know. Something
>> on loan would not be flagged as such in the description. But that’s OK - we
>> would always just use the repository as the holding institution, so
>> itemLocation, if we use it, would always have the same value as
>> holdingArchive. If an item was on loan it simply wouldn’t show up in our
>> schema.org data.  I don’t think that matters. As you say, it’s optional
>> anyway.
>> >>
>> >> I think we’re ready to go now. I just have to decide on either
>> >>
>> >> A. Always use archiveCollection, including for items, because we can’t
>> distinguish all items anyway
>> >> B. use archiveItem where we have a level value of “item” or “piece”,
>> which will give us a majority of items (my estimate is that we would get
>> something like 70% of single entities this way), but it will be the case
>> that a fair number of items won’t be described as items because they don’t
>> have that level value, even if they are single physical entities, so they
>> will be single physical items but described as type archiveCollection.
>> >>
>> >> cheers,
>> >> Jane.
>> >>
>> >>
>> >> Jane Stevenson
>> >> Archives Hub Service Manager
>> >> jane.stevenson@jisc.ac.uk
>> >> (Work days: Monday to Thursday)
>> >>
>> >> Tel: 0161 413 7555
>> >> Web: archiveshub.jisc.ac.uk
>> >> Skype:  janestevenson
>> >> Twitter: @archiveshub, @janestevenson
>> >>
>> >>
>> >>
>> >>> On 11 Jul 2017, at 17:26, Richard Wallis <
>> richard.wallis@dataliberate.com> wrote:
>> >>>
>> >>> Hi Jane,
>> >>>
>> >>> Sorry for being slow in responding.
>> >>>
>> >>> Answers inline.
>> >>>
>> >>> ~Richard.
>> >>>
>> >>>
>> >>> On 3 July 2017 at 07:48, Jane Stevenson <Jane.Stevenson@jisc.ac.uk>
>> wrote:
>> >>> Hi Richard and everyone,
>> >>>
>> >>> If I decided to only use #archiveCollection for all of the units of
>> description, would that work?  We don’t necessarily know if units described
>> are single items or more than one item anyway, and it seems to me we can
>> effectively describe each unit with the properties now provided, which is
>> the main thing. So my question is, why would I need to use #archiveItem?
>> >>>
>> >>> It would work to describe a collection of one or more things.
>> However, if you have a known physical item (book, article, photograph, etc)
>> or file (video, audio, image, web page, etc.) why would you not describe it
>> as such?
>> >>>
>> >>>
>> >>> Just one more question…. we have properties archiveHeld and
>> holdingArchive, and we also have itemLocation. How is itemLocation
>> different from holdingArchive? In the example, for Ronnie Barker,
>> itemLocation is given as the V&A Theatre & Performance Archive (URL). But
>> surely the property of holdingArchive would do just as well.
>> >>>
>> >>> This comes back to describing information about an individual item.
>> Potentially the ArchiveCollection the item is part of could be held by an
>> organisation (Archive), yet an individual item could be located, on
>> extended loan for example, at a different location.
>> >>>
>> >>> All properties within Schema.org are optional, so you probably would
>> only provide an itemLocation when an item is located separate from the
>> holdingArchive of the ArchiveCollection of which it is part.
>> >>>
>> >>> ~Richard.
>> >>>
>> >>>
>> >>> cheers
>> >>> Jane
>> >>>
>> >>> Jane Stevenson
>> >>> Archives Hub Service Manager
>> >>> jane.stevenson@jisc.ac.uk
>> >>>
>> >>
>> >>
>> >
>>
>>
>
>
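
The two typing approaches compared in this thread can be sketched roughly as below. This is only an illustration under stated assumptions, not either author's implementation: the function names, the "Thing" fallback, the example names and the JSON-LD shape are hypothetical, and "ArchiveObject" is just the placeholder name Owen uses above. The level values "item" and "piece" and the holdingArchive property come from the messages themselves.

```python
import json

# Level values that Jane's option B treats as marking a single item.
ITEM_LEVELS = {"item", "piece"}


def option_b_type(level):
    """Option B: use ArchiveItem only when the level value says so,
    otherwise default to ArchiveCollection (level may be missing)."""
    if level is not None and level.strip().lower() in ITEM_LEVELS:
        return "ArchiveItem"
    return "ArchiveCollection"


def single_type_types(known_kind=None):
    """Owen's alternative: one archive type applied alongside whatever
    is known about the thing. known_kind is e.g. "Collection" or
    "CreativeWork"; None means unknown (assumed here to fall back to Thing)."""
    base = known_kind if known_kind else "Thing"
    return [base, "ArchiveObject"]


def describe(name, types, holding_archive):
    """Build a hypothetical JSON-LD description as a plain dict."""
    return {
        "@context": "http://schema.org",
        "@type": types,
        "name": name,
        "holdingArchive": holding_archive,
    }


if __name__ == "__main__":
    # A unit of description with no level value, typed both ways.
    unit = {"name": "Account book", "level": None}
    print(json.dumps(describe(unit["name"],
                              option_b_type(unit["level"]),
                              "Example Archive"), indent=2))
    print(json.dumps(describe(unit["name"],
                              single_type_types(),
                              "Example Archive"), indent=2))
```

Under option B the unit with no level value is asserted to be an ArchiveCollection; under the single-type approach it is only asserted to be a Thing that is held in an archive, which leaves the collection-or-item question open.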
Received on Wednesday, 12 July 2017 13:23:28 UTC