- From: Bob Wyman <bob@wyman.us>
- Date: Wed, 5 Apr 2023 23:03:19 -0400
- To: Johannes Ernst <johannes.ernst@gmail.com>
- Cc: public-swicg@w3.org
- Message-ID: <CAA1s49UGaToCUoy9czLjRJi2H3BOtQrMMtWWnd7EqqQhVbyXJA@mail.gmail.com>
Johannes,

You wrote:
> [Search] is probably far less a technical problem than one of successful
> communication

I think there are more issues associated with "search" than you suggest. Below are just a few issues, other than those related to "communications," that should be considered:

- Rights and Obligations:
  - Assuming that the law establishes that "All Rights Are Reserved" to the content creator, what rights must be granted, by a creator, to permit search?
  - What rights are not reserved to the creator? (Note: These may vary by jurisdiction.)
    - May individuals maintain searchable databases of content received for their own personal use?
    - What, if any, rights are granted by law and do not need to be granted by creators? (Fair use, etc.?)
  - What mechanism or syntax will be used by creators to express grants of rights to others? (A Rights Expression Language?)
  - Can content creators constrain the audience who may discover their content via search (e.g., to just members of groups)? If so, how?
  - What obligations or limitations do search providers have?
    - If content is signed, must the result of a search be verifiable?
    - If a license requires attribution, how is that requirement satisfied? Also, what about licenses embedded in indexed posts?
    - May content be summarized? If so, to what degree?
    - If a post includes images or media, must they be retained in the search result?
    - If creators limit the "right to store or archive," how does that affect search providers?
    - May content from multiple posts be combined to produce derivative works (e.g., by large language model (LLM) systems)?
- Kinds of search:
  - Retrospective search: Searching for things that have been published in the past (i.e. traditional "search").
  - Prospective search: Requesting notification whenever an object matching some query is published in the future.
    - Should results of prospective searches be delivered in the same manner as posts addressed to a user, or should they be displayed via some other mechanism?
  - Cross-matching: Enforcement of creator-specified audience constraints on delivery of search results. (i.e. While search results must match the searcher's constraints, the searcher's attributes must also match the creator's audience constraints. See the question about audience constraints above.)
- Search API?
  - Should the specs be extended to provide a standard search interface for both retrospective and prospective search?
  - Should the standard API provide "universal search" (i.e. both retrospective and prospective search in a single interface)?
  - If a standard API is provided, where should it be defined?
    - A search addendum to ActivityStreams Collections?
    - An extension to the ActivityPub Client2Server interface?
    - An extension to the ActivityPub Server2Server interface?
  - Query syntax?
    - Traditional text search engine syntax? (Google-like and easy to use)
    - SQL-like filters (i.e. as in WHERE clauses)
    - JSONPath? (with XPath for searching within HTML/XML content?)
    - SPARQL? (Semantic Web, very powerful, but very hard to use.)
  - How should result rate limits be expressed and enforced? (e.g., no more than XXX results/hour...)
- Search implementation:
  - Are there useful systems for effectively and efficiently implementing distributed search <https://en.wikipedia.org/wiki/Distributed_search_engine> or federated search <https://en.wikipedia.org/wiki/Federated_search>? (If so, should normal instances be encouraged to participate in such distributed or federated systems?) Will the European Common DataSpaces <https://dataspaces.info/#concepts> project provide anything of use here?
  - Can/Should IPFS (InterPlanetary File System <https://en.wikipedia.org/wiki/InterPlanetary_File_System>) be leveraged?
  - Given that search systems will often have broad audiences, and can be much more resource-intensive than Social Web instances, is there a need to find ways to monetize these systems? If so, what means are acceptable?
  - Alternatives to crawling. (How do we prevent search crawlers from overloading instances?)
    - FeedMesh for ActivityPub? For blogging, we built a system that allowed major blog search providers (Blogdigger, Blo.gs, Google, PubSub, VeriSign and Yahoo) to share what their crawlers found. This reduced load on individual blogs and also ensured that all search providers distinguished their services based on their quality of service, not just the number of blogs they crawled. This may have later led to PubSubHubbub and then to WebSub <https://www.w3.org/TR/websub/>...
    - WebSub for ActivityPub? We could define an Activity* variant of WebSub to which instances would forward copies of public, searchable posts for distribution to others, including providers of either retrospective or prospective search. This would eliminate the need for search crawlers to impose load on instances.
  - Can/Should we build standard Web Components <https://www.webcomponents.org/> for entering search queries and displaying search results, in order to make it easier for people to adopt this capability?

This is just a quick summary of issues off the top of my head. I'm sure that others in the group can add further issues that should be considered.

bob wyman
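The cross-matching requirement listed under "Kinds of search" can be made concrete with a minimal sketch. It assumes a hypothetical in-memory model: the `Post`, `Subscription`, and `cross_match` names are illustrative only and are not part of ActivityPub, ActivityStreams, or any existing spec. It also assumes naive whitespace tokenization for the searcher's standing (prospective) query, and a creator audience expressed as a set of group names, where `None` means public.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Post:
    # Hypothetical post model: text plus an optional creator-specified audience.
    # audience=None means public; otherwise only members of these groups may discover it.
    text: str
    audience: Optional[FrozenSet[str]] = None


@dataclass(frozen=True)
class Subscription:
    # Hypothetical standing (prospective) query registered by a searcher,
    # together with the searcher's own attributes (here, group memberships).
    searcher: str
    terms: FrozenSet[str]
    groups: FrozenSet[str] = frozenset()


def query_matches(sub: Subscription, post: Post) -> bool:
    # Searcher's constraint: every query term appears as a whitespace-separated word.
    words = set(post.text.lower().split())
    return all(term.lower() in words for term in sub.terms)


def audience_matches(sub: Subscription, post: Post) -> bool:
    # Creator's constraint: public posts match anyone; otherwise the searcher
    # must belong to at least one of the groups the creator allowed.
    return post.audience is None or bool(sub.groups & post.audience)


def cross_match(post: Post, subs: List[Subscription]) -> List[str]:
    # A notification is delivered only when BOTH directions match.
    return [s.searcher for s in subs if query_matches(s, post) and audience_matches(s, post)]


if __name__ == "__main__":
    subs = [
        Subscription("alice", frozenset({"activitypub"}), frozenset({"swicg"})),
        Subscription("carol", frozenset({"activitypub"})),
    ]
    restricted = Post("New draft on ActivityPub search", frozenset({"swicg"}))
    print(cross_match(restricted, subs))  # prints ['alice']
```

Both subscriptions match the query, but only `alice` satisfies the creator's audience constraint, so `carol` is not notified. Checking the audience at delivery time, rather than only at indexing time, is one way to keep restricted posts from reaching searchers outside the intended audience.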
Received on Thursday, 6 April 2023 03:03:38 UTC