Federation for Preprint Servers

I’ve recently started thinking about what it would look like to federate preprint servers. Federation relies on a common language between servers to enable communication and interoperability across multiple servers, services, and clients, as opposed to centralization, where all user connections and interactions occur on the servers of one operator. In social media, Twitter is an example of a centralized service: all Twitter users must have accounts with Twitter in order to interact with each other on that platform. Compare this with Mastodon, which provides a Twitter-like experience but does not require users to have an account with any one provider in order to interact with others on the network, known as the Fediverse. Users can create an account on the server of the community they feel most connected to, and regardless of which community they choose, they can still interact with users of every other community.

So would a similar model be possible for preprint servers?

Over the last few years, we have seen a drastic uptick in the number of preprint servers available to researchers. The software behind each server may be different, but the core functionality is the same: allowing users to publish their work to the internet in a way that makes it accessible to the public with some basic metadata attached. If the various servers were able to communicate using a common protocol, perhaps something like ActivityPub, which is used by federated services such as Mastodon, then users would be free to join any server they wish while ensuring that their work is discoverable across the broader network.
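
To make the idea a little more concrete, here is a rough sketch of how a preprint and its basic metadata might be announced to other servers as an ActivityStreams "Create" activity, the kind of message ActivityPub servers exchange. The vocabulary terms (Create, Article, attributedTo, and so on) come from ActivityStreams 2.0, but the choice of "Article" for a preprint, the function name, and every example.org URL are my own assumptions for illustration, not an established convention among preprint servers.

```python
import json

AS_CONTEXT = "https://www.w3.org/ns/activitystreams"
# Special ActivityStreams collection meaning "addressed to everyone".
AS_PUBLIC = "https://www.w3.org/ns/activitystreams#Public"


def preprint_to_activity(actor, preprint_id, title, abstract, pdf_url, published):
    """Wrap a preprint's basic metadata in a 'Create' activity (hypothetical helper)."""
    article = {
        "id": preprint_id,
        "type": "Article",        # assumption: a preprint modeled as an Article
        "attributedTo": actor,
        "name": title,
        "summary": abstract,
        "url": pdf_url,
        "published": published,
        "to": [AS_PUBLIC],
    }
    return {
        "@context": AS_CONTEXT,
        "id": preprint_id + "/activity",
        "type": "Create",
        "actor": actor,
        "published": published,
        "to": [AS_PUBLIC],
        "object": article,
    }


# All identifiers below are made up for the example.
activity = preprint_to_activity(
    actor="https://preprints.example.org/users/researcher",
    preprint_id="https://preprints.example.org/papers/123",
    title="An Example Preprint",
    abstract="A short abstract describing the work.",
    pdf_url="https://preprints.example.org/papers/123.pdf",
    published="2021-03-01T12:00:00Z",
)
print(json.dumps(activity, indent=2))
```

A server that understands the same vocabulary could receive this message and list the preprint for its own users, whichever software it happens to run.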

In practice, an individual researcher or their institution would also be able to start their own server to host their own research, while gaining greater reach by allowing that research to be distributed across a network of servers. Many universities and other institutions already operate institutional repositories today, but they are often cut off from the rest of the network by being siloed on the basis of employer or funder. Similarly, disciplinary repositories segregate research along the lines of subject area, which can make it difficult to discover work that is interdisciplinary or otherwise doesn’t fit the mold of traditional disciplinary boundaries.

Is this issue solved by indexing services?

Perhaps this idea is not representative of a real problem, or the problem is already solved by indexing services such as Google Scholar, Microsoft Academic, or the newer Internet Archive Scholar. These services operate like specialized web search engines and scrape metadata from preprint servers, journal publishers, and sometimes researchers’ personal websites in order to make research products discoverable from their search boxes. Personally, I think that allowing the indexing services to be the arbiters of what constitutes scholarly research, and therefore of what is surfaced in the index, grants them too much control. As someone who has been publishing for a while now, both as an author/researcher and as a journal editor and director of a preprint server, I am aware of how unpredictable it is what gets indexed by a service like Google Scholar and what gets left out.

What would it take?

I should point out that nothing I’ve discussed here is really unique to preprint servers; it could equally be applied to the entire academic publishing enterprise. There isn’t any particular reason why academic journals couldn’t also be federated in the same way that I am proposing for preprint servers. In fact, allowing individuals or smaller institutions to join a federated network on equal footing may help us to retake ownership of academic publishing and begin to alleviate the challenges faced by disadvantaged researchers. Of course, the issues discussed here are nothing new, and people have been discussing and proposing new systems and infrastructure for the act of scholarship for a long time.

Considering all of this together, what would the academic publishing infrastructure look like if research were published on a federated network of servers that all worked together to enable broad discoverability and linking of research products? Data linked to plots, plots linked to manuscripts, manuscripts linked to peer reviews. All discoverable from any one community chosen by the user and not controlled by any one publisher, funding agency, or institution. No one provider can dictate what constitutes scholarship or what is allowed on the network.
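
As a toy illustration of that kind of linking, the sketch below follows links from a dataset to a plot, to a manuscript, and on to a peer review, with each item imagined to live on a different server. Everything here is hypothetical: the URLs, the "kind" and "linked_to" fields, and the discover function are assumptions made for the example, and an in-memory dictionary stands in for what would really be requests to separate servers.

```python
from typing import Dict, List

# Hypothetical research products, each hosted on a different (made-up) server.
PRODUCTS: Dict[str, dict] = {
    "https://data.example.edu/datasets/42": {
        "kind": "dataset",
        "linked_to": ["https://lab.example.org/plots/7"],
    },
    "https://lab.example.org/plots/7": {
        "kind": "plot",
        "linked_to": ["https://preprints.example.net/papers/123"],
    },
    "https://preprints.example.net/papers/123": {
        "kind": "manuscript",
        "linked_to": ["https://reviews.example.com/reviews/9"],
    },
    "https://reviews.example.com/reviews/9": {
        "kind": "peer review",
        "linked_to": [],
    },
}


def discover(start: str, products: Dict[str, dict]) -> List[str]:
    """Return every product reachable from `start`, in breadth-first order."""
    seen = [start]
    queue = [start]
    while queue:
        current = queue.pop(0)
        # A real federation would fetch `current` from its home server here.
        for target in products.get(current, {}).get("linked_to", []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


chain = discover("https://data.example.edu/datasets/42", PRODUCTS)
kinds = [PRODUCTS[url]["kind"] for url in chain]
print(" -> ".join(kinds))  # dataset -> plot -> manuscript -> peer review
```

The point is only that no single operator needs to hold all four items: as long as each server exposes its links in a shared format, a reader starting from any one of them can find the rest.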

Disclosure: I am the founder and director of Engineering Archive.
