Our aspiration is for this to be an open, collaborative project, with individuals and organization of all sizes able to participate. There is not much structure or documentation on how volunteers can get started or be most helpful, but perhaps we can work together on that as well!
The best place to organize and coordinate right now is the gitter chatroom. Gitter is described as "for developers", but we use it for everybody, and you don't need an invitation.
Want to help out? Below are a few example roles you could play.
The user sign-up and editing workflow on fatcat.wiki is currently pretty poor. How could this experience be improved and better documented? Specific ideas, suggestions and diagrams would be very helpful. You don't need to know how to program or about web technologies to contribute; hand drawings and example text can be sufficient.
Are you passionate about Open Access and want to help build a community around preservation and universal access to knowledge? We could use help structuring an editing community, and communicating with partner projects like Wikidata to ensure we are not duplicating efforts.
A good example of a project to organize would be improving journal-level metadata in wikidata, including journal homepages, and linking to fatcat "container" entities.
If you have an interest in a specific scholarly field, you could give us feedback on how good of a job fatcat is doing preserving at-risk open access content. We know we have a lot of work to do, but both specific examples of missing publications, as well as broader patterns and missing holes are helpful to know about. Some missing content we know we don't have, but there are surely entire categories of in-scope content that we do not even know are missing!
Are you an experienced wrangler of BibFrame, MARC, bibtext, RDF, OAI-PMH, and Citation Style Language? Our data model and entity schemas are bespoke (sorry!) and designed to evolved over time. There might be related efforts and new controlled vocabularies we could adopt or align with, or small changes to the schema might enable new use cases. It could be as simple as identifying and prioritizing new external identifiers (PIDs) to allow. Let us know what we got right and what needs improvement!
Are you super experienced with data entry, editing, and corrections? Do you have ideas on how our interface could be improved, or what kinds of new interfaces and tools could be build to support effective editing? Our open API allows third-party interfaces to make edits on individuals' behalf, meaning new tools can be build for specific patterns of editing or user contribution.
We have hundreds of gigabytes of metadata to transform and normalize before importing, and already have a rich open dataset with millions of linked entities. Our elasticsearch analytics database has an open read-only endpoint (https://search.fatcat.wiki), which are used to power our coverage interface. What other interactive visualizations could be built? What tools should we be using to wrangle bibliographic metadata better and faster?
Do you publish research documents, and want to ensure it is accessible to the broadest audience today and in the future? Like many academic search engines, you can add papers and link an author profile to specific publications. Unlike others, you can also ensure uploaded pre-prints and other open versions of your research are found and linked using the "save paper now" feature, and you can any errors made by publishers and bots.
Some of our web interfaces have existing internationalization infrastructure, and translations can be contributed directly.
Other projects need help getting translation infrastructure in place, and all of our projects could use review and recommendations for improvement by experts in web accessibility. For example, if you use a screen reader, feedback on which parts of our services are most difficult to use are very helpful.
Fatcat is structured such that all changes to the catalog go through an open API. This includes human edits through the web interface, but the large majority of edits are made by bots. You could write a new bot to help...
- review human edits (from the "reviewable" queue) to "lint" for typos, missing fields, or other problems, and then leave an annotation
- harvest, transform, and import metadata from addition subject- and region-specific sources
- find and clean-up patterns of poor or incorrect metadata already in the catalog
We have a large (500+ GByte) PostgreSQL database backing the catalog. This is working great so far, but we have concerns about how the catalog will scale further, especially if bots start making multiple updates per entity. You could review our SQL schema and recommend improvements, or give feedback and advice on how to switch to a distributed primary datastore.
Short on time? As a US 501(c)(3) non-profit, the Internet Archive always appreciates and makes good use of donations.
Bugs and patches can be filed on Github at: https://github.com/internetarchive/fatcat
When considering making a non-trivial contribution, it can save review time and duplicated work to post an issue with your intentions and plan. New code and features must include unit tests before being merged, though we can help with writing them.