Building the Open Science Graph
An open graph of data about published research would be broadly useful. It would speed research, ease studying and teaching, and help inform models of the world and of different fields, augmented by analysts, scripts and learning networks.
Today there are a few open data sets (Wikidata), a few half-open data sets (the Open Academic Graph, the Lens), and a number of competing, closed collections of such data, which fuel overlay services like the Web of Science and Dimensions. Most of these services could be built equally well on top of open data.
Some completable elements that such a science graph could include:
A catalog of articles (current + future, completed set of articles)
With original metadata + abstract, extracted metadata + topics
A unique ID per article + figure; w/ preferred secondary ID
Name resolvers (for people, institutions, article titles, funder institutions)
Data partners + sources (groups currently providing data, under various agreements, maybe not public)
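To make the elements above concrete, here is a minimal sketch of what one catalog record and one name resolver might look like. Every class name, field name, and ID format here (`Article`, `NameResolver`, `art:1`, `inst:0001`) is a hypothetical choice for illustration, not an existing schema from any of the projects mentioned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Article:
    """One catalog entry. All field names are illustrative assumptions."""
    article_id: str                          # unique ID per article
    secondary_id: Optional[str] = None       # preferred secondary ID (e.g. a DOI)
    title: str = ""
    abstract: str = ""
    original_metadata: Dict[str, str] = field(default_factory=dict)
    extracted_metadata: Dict[str, str] = field(default_factory=dict)
    topics: List[str] = field(default_factory=list)
    figure_ids: List[str] = field(default_factory=list)  # unique ID per figure
    source: str = ""                         # data partner that supplied the record


class NameResolver:
    """Maps spelling variants of a name to one canonical ID.

    The same structure could serve people, institutions, article titles,
    or funders; only the registered variants differ.
    """

    def __init__(self) -> None:
        self._index: Dict[str, str] = {}

    @staticmethod
    def _normalize(name: str) -> str:
        # Lowercase and collapse runs of whitespace. Real resolvers need
        # far more than this (transliteration, abbreviations, and so on).
        return " ".join(name.lower().split())

    def register(self, canonical_id: str, *variants: str) -> None:
        for variant in variants:
            self._index[self._normalize(variant)] = canonical_id

    def resolve(self, name: str) -> Optional[str]:
        # Returns None when the name has not been registered.
        return self._index.get(self._normalize(name))


institutions = NameResolver()
institutions.register("inst:0001", "Example University", "Example Univ.")

paper = Article(
    article_id="art:1",
    title="A placeholder title",
    original_metadata={"affiliation": "example   UNIVERSITY"},
)
resolved = institutions.resolve(paper.original_metadata["affiliation"])
```

Keeping original and extracted metadata in separate fields means a record can be re-extracted later without losing what the data partner originally supplied.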
I find it helpful to describe this work in terms of completable subproblems, so that we can see how much is left to do in each.
e.g.: “The citation graph of all peer-reviewed academic papers in these languages since 1600”
and then “the subset of citation graph 1 in these topical journals since 1900”
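One way to read "how much is left to do" is as a ratio: define each subproblem as a filter over the catalog, then count how many in-scope papers already have their citations resolved. The sketch below assumes the catalog already lists every in-scope paper (otherwise the denominator is itself an estimate). The `Paper` fields, the `citations_resolved` flag, and the journal and language values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Paper:
    paper_id: str
    year: int
    language: str
    journal: str
    citations_resolved: bool  # assumed flag: outgoing citations are linked


def completion(
    papers: Iterable[Paper], in_scope: Callable[[Paper], bool]
) -> Tuple[int, int, Optional[float]]:
    """Return (done, total, fraction done) for one subproblem."""
    scoped = [p for p in papers if in_scope(p)]
    if not scoped:
        return (0, 0, None)  # nothing in scope, so no meaningful fraction
    done = sum(1 for p in scoped if p.citations_resolved)
    return (done, len(scoped), done / len(scoped))


# Subproblem 1: all papers in these languages since 1600.
def graph_1(p: Paper) -> bool:
    return p.year >= 1600 and p.language in {"en", "fr"}


# Subproblem 2: the subset of graph 1 in these topical journals since 1900.
def graph_1_topical(p: Paper) -> bool:
    return graph_1(p) and p.year >= 1900 and p.journal in {"Journal B"}


catalog = [
    Paper("p1", 1650, "en", "Journal A", True),
    Paper("p2", 1910, "en", "Journal B", True),
    Paper("p3", 1950, "fr", "Journal B", False),
    Paper("p4", 1590, "en", "Journal A", True),   # before 1600: out of scope
    Paper("p5", 2001, "de", "Journal C", False),  # language out of scope
]

progress_1 = completion(catalog, graph_1)
progress_2 = completion(catalog, graph_1_topical)
```

Because the second scope is defined in terms of the first, narrowing a subproblem never requires redefining the larger one.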
What are the completable problems making up SemScholar? Other important efforts?