Automated Handwriting Processing

Erick Peirson's picture

Thanks to Allison for passing on this article about automated handwriting processing! From the abstract:

We describe our efforts with the National Archives and Records Administration (NARA) to provide a form of automated search of handwritten content within large digitized document archives. With a growing push towards the digitization of paper archives there is an imminent need to develop tools capable of searching the resulting unstructured image data as data from such collections offer valuable historical records that can be mined for information pertinent to a number of fields from the geosciences to the humanities.

Week 5: Topic Modeling with Paper Machines

Topic modeling is the process of using a topic model to discover the hidden (latent) topics represented in a large collection of texts. This process involves Bayesian statistics and optimization algorithms, and (unfortunately for the digital humanist) most currently available topic modeling tools require the user to have at least basic programming skills. This is beginning to change, however, and in this tutorial we will use a user-friendly tool for latent Dirichlet allocation (LDA) to analyze our texts.
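For readers curious about what happens under the hood, here is a minimal sketch of LDA in plain Python. It uses collapsed Gibbs sampling, which is one common way to fit the model and not necessarily what the tool in this tutorial does. The function names (`lda_gibbs`, `top_words`), the toy corpus, and the hyperparameter values (`alpha`, `beta`, the number of iterations) are all assumptions chosen for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling (illustrative sketch only).

    docs is a list of documents, each a list of word tokens.
    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})

    doc_topic = [[0] * num_topics for _ in docs]        # tokens per topic, per document
    topic_word = [Counter() for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                       # total tokens per topic
    assignments = []                                     # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1

                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]

                # Resample the topic and restore the counts.
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Return the n most frequent words assigned to each topic."""
    return [[w for w, _ in counts.most_common(n)] for counts in topic_word]


# A made-up toy corpus; real corpora are far larger and need preprocessing.
docs = [
    "darwin species selection species evolution".split(),
    "evolution selection darwin variation species".split(),
    "archive manuscript handwriting archive letters".split(),
    "letters manuscript archive handwriting scans".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

With so few words the result depends heavily on the random seed and the hyperparameters, so treat the printed topics as a demonstration of the mechanics, not as a meaningful analysis.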

Week 5: Topic Modeling

We're going to mix up the schedule a bit, and jump to latent semantic analysis and topic modeling! Our objective this week is to get our feet wet with some more abstract techniques at the interface of information retrieval, computational linguistics, and statistical modeling. Topic modeling is a super-hot (sometimes contentious!) field in digital humanities, and is being used quite fruitfully to design "smart" search algorithms and find relationships among very large collections of texts.

