Digital Research with Regular Expressions

Posted by Erick Peirson

This week in class, Julia will teach us how to use regular expressions to find features in texts. The following article is an excellent example of how regular expressions can be used in a computational workflow to address a historical question:

Chen, Shih-Pei, Yu-Ming Huang, Jieh Hsiang, Hsieh-Chang Tu, Hou-Ieong Ho, and Ping-Yen Chen. “Discovering Land Transaction Relations from Land Deeds of Taiwan.” Literary and Linguistic Computing 28, no. 2 (June 1, 2013): 257–270. doi:10.1093/llc/fqs063.
Land deeds were the only proof of ownership in pre-1900 Taiwan. They are indispensable for the studies of Taiwan’s social, anthropological, and economic evolution. We have built a full-text digital library that contains almost 40,000 land deeds. The deeds in our collection range over 250 years and are collected from over 100 sources. The unprecedented volume and diversity of the sources provide an exciting source of primary documents for historians. But they also pose an interesting challenge: how to tell if two land deeds are related. In this article, we describe an approach to discover two important relations: successive transactions and allotment agreements involving the same property. Our method enabled us to construct 6,035 such transaction pairs. We also introduce a notion of ‘land transitivity graph’ to capture the transitivity embedded in these transactions. We discovered 2,436 such graphs, the largest of which includes 104 deeds. Some of these graphs involve land behavior that had never been studied before. 
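Here is a toy Python sketch that makes the idea concrete. It uses a regular expression to pull features (a year, a seller, a buyer, and a description of the land) out of short texts. The two "deeds" are invented English sentences, not documents from the Taiwan collection. The `possibly_successive` rule at the end is a hypothetical illustration of how extracted features might link two records. It is not the method described in the article.

```python
import re

# Hypothetical, invented English-language "deeds" (NOT from the Taiwan corpus).
deeds = [
    "1843: Wang sold the field east of the river to Lin.",
    "1851: Lin sold the field east of the river to Chen.",
]

# One pattern with named groups for the features we want:
# year, seller, property description (non-greedy), and buyer.
DEED_PATTERN = re.compile(
    r"(?P<year>\d{4}): (?P<seller>\w+) sold (?P<land>.+?) to (?P<buyer>\w+)\."
)


def parse(deed):
    """Return a dict of extracted features, or None if the text doesn't match."""
    match = DEED_PATTERN.search(deed)
    return match.groupdict() if match else None


records = [parse(d) for d in deeds]


def possibly_successive(earlier, later):
    """Hypothetical rule: flag a *candidate* successive transaction when the
    earlier buyer is the later seller, the land descriptions are identical,
    and the dates are in order."""
    return (
        earlier["buyer"] == later["seller"]
        and earlier["land"] == later["land"]
        and int(earlier["year"]) < int(later["year"])
    )


print(records[0])
print(possibly_successive(records[0], records[1]))
```

Real deeds are far messier than these sentences. Spelling variants, partial descriptions, and formulaic language would all need handling, so an exact-match rule like this one would only be a starting point.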

Browser extension for ASU library proxy

Posted by Erick Peirson

If you frequently access material in scholarly databases via the ASU library's subscriptions, you may find this helpful: I've thrown together browser extensions for Chrome and Firefox that can help you to quickly access online content via the library's proxy. These extensions add a button to your toolbar that, when clicked, reloads the current page via the library proxy.
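Conceptually, the button rewrites the current page's address so that the page loads through the proxy. The sketch below is written in Python for illustration only. It assumes an EZproxy-style proxy that accepts the target address after a `login?url=` prefix. The real ASU proxy address is not given in this post, so `proxy.example.edu` is a placeholder, and `via_proxy` is a hypothetical helper, not code from the extensions.

```python
# Placeholder prefix in the common EZproxy "login?url=" style (an assumption);
# substitute the library's actual proxy address.
PROXY_PREFIX = "https://proxy.example.edu/login?url="


def via_proxy(url: str, prefix: str = PROXY_PREFIX) -> str:
    """Return the address that loads `url` through the library proxy."""
    if url.startswith(prefix):
        # Already proxied: don't wrap the address a second time.
        return url
    return prefix + url


print(via_proxy("https://example.com/article"))
```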

You can download the extensions from GitHub. These have not been extensively tested, so if your browser isn't quite up to date you may encounter problems.

Notes from week 2

Posted by Erick Peirson

Tutorials (video and text) are available for building a Zotero collection, extracting and exporting text using Paper Machines, and performing co-occurrence analysis using AntConc. Installation instructions are included in each tutorial; please let us know if they are unclear, or if you have trouble getting things to work. We would greatly appreciate feedback on the tutorial videos, in terms of both content and format. Just leave a comment at the bottom of the relevant tutorial's page.

The co-occurrence tutorial also includes links to the papers that I briefly described at the beginning of class; I think that they give a great sense of the kinds of research that can benefit from co-occurrence analysis.
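AntConc handles co-occurrence through its interface, but the basic idea of window-based co-occurrence is simple enough to sketch. The Python function below is a hypothetical `cooccurrences` helper, not part of AntConc or the tutorial. For each occurrence of a target word, it counts which words fall within a fixed number of tokens on either side. It assumes plain lowercase whitespace tokenization with no punctuation handling.

```python
from collections import Counter


def cooccurrences(text, target, window=2):
    """Count words within `window` tokens of each occurrence of `target`."""
    tokens = text.lower().split()  # naive tokenization: no punctuation handling
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        # Clamp the window to the bounds of the token list.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # don't count the target occurrence itself
                counts[tokens[j]] += 1
    return counts


sample = "the land was sold and the land was divided"
print(cooccurrences(sample, "land", window=2))
```

With a window of 2, the two occurrences of "land" in the sample are too far apart to count each other. In practice, function words such as "the" and "was" dominate raw counts, which is one reason co-occurrence tools usually report association statistics in addition to frequencies.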

The plan for week three is available under Weekly Activities. Note the reading for Tuesday: we may not discuss it at length, but it is a great primer for thinking about project design.

If you haven't already talked to us, we'd love to hear more about your ideas for your semester project! Check out the discussion thread here, and don't be shy about commenting on your classmates' ideas. There may be some opportunities for fruitful collaboration!

Finally, thanks for your posts in the Building a Corpus thread. We encourage you to look at the problems and solutions that others have encountered, and to offer advice based on your own experience.

Have a great weekend!


Subscribe to Introduction to Digital & Computational Methods in the Humanities (HPS) RSS