Re: Updated wish list from Bob Wyman on 2023-04-06 (public-swicg@w3.org from April 2023)

From: Bob Wyman <bob@wyman.us>
Date: Wed, 5 Apr 2023 23:03:19 -0400
To: Johannes Ernst <johannes.ernst@gmail.com>
Cc: public-swicg@w3.org
Message-ID: <CAA1s49UGaToCUoy9czLjRJi2H3BOtQrMMtWWnd7EqqQhVbyXJA@mail.gmail.com>
Johannes,
You wrote:

> [Search] is probably far less a technical problem than one of successful
> communication


I think there are more issues associated with "search" than you suggest.

Below are just a few issues, other than those related to "communications,"
that should be considered:

   - Rights and Obligations:
      - Assuming that the law establishes that "All Rights Are Reserved" to
      the content creator, what rights must be granted, by a creator, to permit
      search?
      - What rights are not reserved to the creator? (Note: These may vary
      by jurisdiction.)
         - May individuals maintain searchable databases of content
         received for their own personal use?
         - What, if any, rights are granted by law and do not need to be
         granted by creators? (Fair use, etc.?)
      - What mechanism or syntax will be used by creators to
      express grants of rights to others? (A Rights Expression Language?)
      - Can content creators constrain the audience who may discover their
      content via search (e.g., to just members of particular groups)? If so, how?
      - What obligations or limitations do search providers have?
         - If content is signed, must the result of a search be verifiable?
         - If a license requires attribution, how is that requirement
         satisfied? Also, what about licenses embedded in indexed posts?
         - May content be summarized? If so, to what degree?
         - If a post includes images or media, must they be retained in the
         search result?
         - If creators limit the "right to store or archive" how does that
         affect search providers?
         - May content from multiple posts be combined to produce
         derivative works (e.g., by large language model (LLM) systems)?
   - Kinds of search:
      - Retrospective search: Searching for things that have been published
      in the past (i.e. traditional "search")
      - Prospective search: Requesting notification whenever an object
      matching some query is published in the future.
         - Should results of prospective searches be delivered in the same
         manner as posts addressed to a user, or should they be displayed via
         some other mechanism?
      - Cross-matching: Enforcement of creator-specified audience
      constraints on the delivery of search results. (That is, while search
      results must match the searcher's constraints, the searcher's attributes
      must also match the creator's audience constraints. See the question
      about audience constraints above.)
   - Search API?
      - Should the specs be extended to provide a standard search
      interface, for both retrospective and prospective search?
      - Should the standard API provide "universal search" (i.e., both
      retrospective and prospective search in a single interface)?
      - If a standard API is provided, where should it be defined?
         - Search addendum to ActivityStreams Collections?
         - Extension to the ActivityPub Client2Server interface?
         - Extension to the ActivityPub Server2Server interface?
      - Query syntax?
         - Traditional text search engine syntax? (Google-like and easy to
         use)
         - SQL-like filters? (as in SQL WHERE clauses)
         - JSONPath? (with XPath for searching within HTML/XML content?)
         - SPARQL? (Semantic Web, very powerful, but very hard to use.)
      - How should result rate limits be expressed and enforced? (e.g., no
      more than XXX results/hour...)
   - Search implementation:
      - Are there useful systems for effectively and efficiently
      implementing distributed
      <https://en.wikipedia.org/wiki/Distributed_search_engine> or federated
      search <https://en.wikipedia.org/wiki/Federated_search>? (If so,
      should normal instances be encouraged to participate in such distributed
      or federated systems?) Will the European Common DataSpaces
      <https://dataspaces.info/#concepts> project provide anything of use
      here?
      - Can/Should IPFS (InterPlanetary File System
      <https://en.wikipedia.org/wiki/InterPlanetary_File_System>) be
      leveraged?
      - Given that search systems will often have broad audiences, and can
      be much more resource-intensive than Social Web instances, is there a
      need to find ways to monetize these systems? If so, what means are
      acceptable?
      - Alternatives to crawling. (How do we prevent search crawlers from
      overloading instances?)
         - FeedMesh for ActivityPub? For blogging, we built a system that
         allowed major blog search providers (Blogdigger, Blo.gs, Google,
         PubSub, VeriSign and Yahoo) to share what their crawlers found. This
         reduced load on individual blogs and also ensured that search
         providers distinguished their services based on their quality of
         service, not just the number of blogs they crawled. This may have
         later led to PubSubHubbub and then to WebSub
         <https://www.w3.org/TR/websub/>...
         - WebSub for ActivityPub? We could define an Activity* variant of
         WebSub to which instances would forward copies of public, searchable
         posts for distribution to others, including providers of either
         retrospective or prospective search. This would eliminate the need
         for search crawlers to impose load on instances.
      - Can/Should we build standard Web Components
      <https://www.webcomponents.org/> for entering search queries and
      displaying search results, to make it easier for people to adopt this
      capability?
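A minimal Python sketch of the prospective-search and cross-matching items under "Kinds of search": each standing query is evaluated against a newly published post, and a subscriber is notified only if the post matches the query and the subscriber also satisfies the creator's audience constraint. The Post, Subscriber and StandingQuery types, the group-based audience model, and the convention that an empty audience means "public" are hypothetical illustrations, not anything defined by ActivityStreams or ActivityPub.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical, simplified models; NOT part of ActivityStreams or ActivityPub.
@dataclass
class Post:
    id: str
    text: str
    # Creator-specified audience constraint: groups allowed to discover the
    # post via search. An empty set is assumed here to mean "public".
    audience_groups: frozenset[str] = frozenset()

@dataclass
class Subscriber:
    id: str
    groups: frozenset[str]

@dataclass
class StandingQuery:
    subscriber: Subscriber
    matches: Callable[[Post], bool]  # the searcher's own constraints

def audience_allows(post: Post, searcher: Subscriber) -> bool:
    """The searcher's attributes must satisfy the creator's audience constraint."""
    return not post.audience_groups or bool(post.audience_groups & searcher.groups)

def on_publish(post: Post, queries: list[StandingQuery]) -> list[str]:
    """Prospective search: run every standing query against a newly published
    post and return the ids of subscribers to notify. Both directions must
    match (cross-matching)."""
    return [
        q.subscriber.id
        for q in queries
        if q.matches(post) and audience_allows(post, q.subscriber)
    ]

alice = Subscriber("alice", frozenset({"swicg"}))
bob = Subscriber("bob", frozenset())
queries = [
    StandingQuery(alice, lambda p: "websub" in p.text.lower()),
    StandingQuery(bob, lambda p: "websub" in p.text.lower()),
]
members_only = Post("p1", "Notes on WebSub", frozenset({"swicg"}))
public = Post("p2", "WebSub for ActivityPub?")

print(on_publish(members_only, queries))  # ['alice']
print(on_publish(public, queries))        # ['alice', 'bob']
```

Both subscribers' queries match both posts, but only alice satisfies the members-only audience constraint, so bob is never told that the first post exists.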
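One possible way to enforce a result rate limit such as "no more than XXX results/hour" is a sliding-window counter kept for each requester. The ResultRateLimiter class and its max_results and window_seconds parameters are hypothetical; this is a sketch under that assumption, not a proposal for how such limits should be expressed on the wire.

```python
import time
from collections import deque
from typing import Callable

class ResultRateLimiter:
    """Hypothetical sliding-window limit on search results delivered to ONE
    requester: at most max_results results per window_seconds.
    (Use one instance per requester.)"""

    def __init__(self, max_results: int, window_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_results = max_results
        self.window = window_seconds
        self.clock = clock  # injectable so the window logic can be tested
        self._delivered: deque[float] = deque()  # timestamp per delivered result

    def _expire(self, now: float) -> None:
        # Forget deliveries that have fallen out of the window.
        while self._delivered and now - self._delivered[0] >= self.window:
            self._delivered.popleft()

    def admit(self, results: list) -> list:
        """Return the prefix of `results` that fits in the remaining budget."""
        now = self.clock()
        self._expire(now)
        budget = max(self.max_results - len(self._delivered), 0)
        allowed = results[:budget]
        self._delivered.extend([now] * len(allowed))
        return allowed

t = [0.0]
limiter = ResultRateLimiter(max_results=5, window_seconds=3600, clock=lambda: t[0])
print(len(limiter.admit(list(range(3)))))  # 3
print(len(limiter.admit(list(range(4)))))  # 2 (budget exhausted)
t[0] = 3600.0                              # one hour later
print(len(limiter.admit(list(range(4)))))  # 4
```

This sketch truncates a result set that exceeds the remaining budget; a real design would also have to decide whether to truncate, reject, or defer, and how to tell the requester which happened.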

This is just a quick summary of issues off the top of my head. I'm sure
that others in the group can add additional issues that should be
considered.

bob wyman
Received on Thursday, 6 April 2023 03:03:38 UTC