Full details of the methods, along with example data analyses, are available in a Nature Methods paper describing this database [1]. The methodology combines two techniques. The first is topic modeling using Latent Dirichlet Allocation [2-5], a Bayesian statistical algorithm that automatically discovers meaningful categories (topics) in unstructured text, independent of keywords or preconceived categorical designations. The second is a graph-based layout algorithm [6-8], which produces a two-dimensional visual output in which documents are clustered according to their overall topic- and word-based similarity to one another. These two complementary methods are combined in an interactive web-based format [9] in which grants are categorized and clustered according to the language researchers use to describe them.
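To make the two-stage idea concrete, here is a minimal pure-Python sketch, assuming documents are already tokenized into word lists. It fits LDA by collapsed Gibbs sampling (one standard inference method; the cited work may use a different one), builds a graph whose edge weights are cosine similarities between documents' topic proportions, and places documents in 2-D with a simple Fruchterman-Reingold-style force-directed layout. Several things are assumptions for illustration only: the function names `lda_gibbs`, `cosine`, `force_layout` and `map_documents` are hypothetical; the hyperparameters (`alpha`, `beta`, `iters`, cooling rate) are arbitrary; the similarity uses topic proportions only, not the combined topic- and word-based similarity described above; and the layout is not necessarily the algorithm cited in [6-8].

```python
# Illustrative sketch only: function names and default parameters are
# hypothetical and not taken from the cited paper or its software.
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-document topic proportions.

    docs is a list of token lists; theta[d][k] estimates the share of topic k
    in document d, and each row sums to 1.
    """
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for doc in docs for w in doc}))}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]        # tokens in doc d assigned to topic k
    nkw = [[0] * V for _ in range(n_topics)]    # times word w is assigned to topic k
    nk = [0] * n_topics                          # total tokens assigned to topic k
    z = []                                       # current topic of every token
    for d, doc in enumerate(docs):               # random initial assignments
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, k in zip(doc, z[d]):
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
    topics = range(n_topics)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = vocab[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Resample from p(z = t | all other assignments), up to a constant.
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1
    return [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in topics]
            for d, doc in enumerate(docs)]


def cosine(a, b):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def force_layout(weights, iters=300, seed=0):
    """Fruchterman-Reingold-style 2-D layout: every pair of nodes repels,
    and each pair is also attracted in proportion to weights[i][j]."""
    rng = random.Random(seed)
    n = len(weights)
    pos = [[rng.random(), rng.random()] for _ in range(n)]
    k = 1.0 / math.sqrt(n)   # ideal spacing for n nodes in a unit square
    temp = 0.1               # maximum step per iteration; cools over time
    for _ in range(iters):
        disp = [[0.0, 0.0] for _ in range(n)]
        for i in range(n):
            for j in range(i + 1, n):
                dx, dy = pos[i][0] - pos[j][0], pos[i][1] - pos[j][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive f pushes i and j apart; negative f pulls them together.
                f = k * k / dist - weights[i][j] * dist * dist / k
                fx, fy = dx / dist * f, dy / dist * f
                disp[i][0] += fx
                disp[i][1] += fy
                disp[j][0] -= fx
                disp[j][1] -= fy
        for i in range(n):
            length = math.hypot(*disp[i]) or 1e-9
            step = min(length, temp)
            pos[i][0] += disp[i][0] / length * step
            pos[i][1] += disp[i][1] / length * step
        temp *= 0.98
    return pos


def map_documents(docs, n_topics, seed=0):
    """Topic-model the documents, then lay them out by topic similarity."""
    theta = lda_gibbs(docs, n_topics, seed=seed)
    n = len(docs)
    weights = [[cosine(theta[i], theta[j]) if i != j else 0.0 for j in range(n)]
               for i in range(n)]
    return theta, force_layout(weights, seed=seed)
```

A call such as `theta, pos = map_documents(tokenized_abstracts, n_topics=20)` would return each document's topic mixture together with an (x, y) coordinate, so documents with similar topic mixtures tend to land near one another. The pairwise force loop is quadratic in the number of documents, which is fine for a toy corpus; a large grant collection would call for a more scalable layout method.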


