Tag Archives: topic modeling

Comparing methods to extract technical content for technological intelligence

We are developing indicators for the emergence of science and technology (S&T) topics. To do so, we extract information from various S&T information resources. This paper compares alternative ways of consolidating messy sets of key terms (e.g., terms obtained by applying Natural Language Processing to abstracts and titles, together with various keyword sets). Our process includes combinations of stopword removal, fuzzy term matching, association rules, and term commonality weighting. We compare topic modeling to Principal Components Analysis on a test set of 4104 abstract records on Dye-Sensitized Solar Cells. The results suggest that these methods can improve understanding of technological topics and so help track technological emergence.

Author(s): Nils C. Newman, Alan L. Porter, David Newman, Cherie Courseault Trumbach and Stephanie D. Bolan
Organization(s): Georgia Institute of Technology, University of California, University of New Orleans
Source: Journal of Engineering and Technology Management
Year: 2014
http://www.sciencedirect.com/science/article/pii/S0923474813000556
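
To make the term-consolidation steps named in the abstract above more concrete, here is a minimal Python sketch of three of them: stopword removal, fuzzy term matching, and a commonality weighting. It is an illustration under stated assumptions, not the authors' implementation. The stopword list, the 0.85 similarity threshold, the function names, and the use of a TF-IDF-style score as a stand-in for "term commonality weighting" are all hypothetical choices. Association rules are not covered.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher

# Hypothetical stopword list; a real pipeline would use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase, split on non-letters, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def consolidate(terms, threshold=0.85):
    """Map each term to the first canonical term it closely resembles.

    The threshold is an assumed value, not one taken from the paper.
    """
    canonical = []
    mapping = {}
    for term in sorted(set(terms)):
        for rep in canonical:
            if SequenceMatcher(None, term, rep).ratio() >= threshold:
                mapping[term] = rep
                break
        else:
            # No close match found: the term becomes its own representative.
            canonical.append(term)
            mapping[term] = term
    return mapping


def weight_terms(docs):
    """Score each merged term by total frequency times log inverse document frequency.

    Terms that occur in every document score zero, which down-weights
    vocabulary common to the whole set (an assumed reading of "commonality").
    """
    token_lists = [tokenize(d) for d in docs]
    mapping = consolidate(t for toks in token_lists for t in toks)
    merged = [[mapping[t] for t in toks] for toks in token_lists]
    tf = Counter(t for toks in merged for t in toks)
    df = Counter(t for toks in merged for t in set(toks))
    n = len(docs)
    return {t: tf[t] * math.log(n / df[t]) for t in tf}


docs = [
    "Dye sensitized solar cells",
    "Dye-sensitised solar cell efficiency",
    "Electrolyte for the solar cell",
]
weights = weight_terms(docs)
```

In this toy set, spelling and plural variants such as "sensitized"/"sensitised" and "cells"/"cell" are merged before weighting, and terms shared by all three records ("solar", "cell") receive zero weight.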

Clustering scientific documents with topic modeling

A topic model is a type of statistical model that uses machine learning to discover the latent “topics” that occur in a collection of documents. Latent Dirichlet allocation (LDA) is currently a widely used approach. In this paper, we investigate methods, including LDA and its extensions, for separating a set of scientific publications into several clusters.
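
As a rough illustration of how LDA can separate documents into clusters, the sketch below fits a small LDA model with collapsed Gibbs sampling and assigns each document to its most probable topic. The excerpt does not say which inference method or which LDA extensions the paper uses, so the sampler, the hyperparameters (`alpha`, `beta`), the iteration count, and the argmax clustering rule are all assumptions made for the example.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampler for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]         # tokens in doc d with topic k
    topic_word = [Counter() for _ in range(num_topics)]  # count of word w under topic k
    topic_total = [0] * num_topics                       # total tokens under topic k
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for doc, counts in zip(docs, doc_topic)
    ]


def cluster(theta):
    """Assign each document to its highest-probability topic."""
    return [max(range(len(row)), key=row.__getitem__) for row in theta]


# Toy corpus of pre-tokenized documents (hypothetical data).
docs = [
    ["solar", "cell", "dye"],
    ["dye", "solar", "electrolyte"],
    ["topic", "model", "cluster"],
    ["cluster", "topic", "document"],
]
theta = lda_gibbs(docs, num_topics=2)
labels = cluster(theta)
```

Taking the single most probable topic per document is only one way to turn topic proportions into clusters; the proportions in `theta` could also be fed to a separate clustering algorithm.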