Thursday, March 20

Erick Peirson's picture

This week is a bit of a mash-up of various things. On Tuesday we introduced the idea of a web service, and talked specifically about geocoding web services. To dive a bit deeper into web services, you may find these slides from a recent course on data collection for the humanities at Cambridge useful. A tutorial on generating geocoded coauthorship and institutional networks can be found here.

Today we're going to dive a bit deeper into producing structured datasets from texts themselves. When we used Named Entity Recognition (NER), we were able to locate instances of particular kinds of entities -- e.g. people, places, and institutions -- in texts. The advantage of approaches like NER is that they require little input from the operator, and can therefore be used on a very large collection of texts without "supervision." One downside of this approach, however, is that although we know that we have found instances of particular kinds of things, we do not know anything about those instances. NER can find names of people, but it can't tell us who those people are. Another downside is that we don't learn anything about the relationships between those entities: we may find two personal names in the same sentence or paragraph, but it is difficult to know precisely how they are related. Unsupervised techniques also leave no room for differences of interpretation between readers. There are other issues as well.
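To make the relationship problem concrete, here is a minimal sketch in plain Python. It assumes an NER step has already handed us a list of personal names (the names, the sample text, and the function cooccurring_names are all made up for illustration, and the sentence splitting is deliberately naive). All it can report is which names share a sentence.

```python
import re
from itertools import combinations

# Hypothetical output of an NER step: personal names found in the text.
known_names = ["Darwin", "Hooker", "Lyell"]

# Made-up sample text, for illustration only.
text = ("Darwin wrote to Hooker about the manuscript. "
        "Lyell disagreed with Darwin. Hooker traveled alone.")


def cooccurring_names(text, names):
    """Return pairs of names that appear in the same sentence."""
    pairs = []
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        found = sorted(n for n in names if n in sentence)
        pairs.extend(combinations(found, 2))
    return pairs


pairs = cooccurring_names(text, known_names)
for a, b in pairs:
    # We know the two names co-occur, but not how they are related.
    print(a, "co-occurs with", b)
```

Note that the pair (Darwin, Hooker) comes from a sentence about correspondence and the pair (Darwin, Lyell) from one about disagreement, but the output treats them identically. Recovering that difference is the kind of judgment a human reader supplies.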

To deal with some of those issues, we've been advocating a 'meso-level' approach that brings human readers back into the mix. This evening, we'll introduce some of the main concepts and components of this approach. You may find it helpful to read this paper, which we presented at a conference last year.

Tutorial: Generating coauthorship networks from bibliographic data

A tutorial based on yesterday's meeting can be found in the Tethne documentation: Coauthorship Networks.

For instructions on how to install Tethne, see: Installation.

For a review of how to collect bibliographic data from the Web of Science, see Getting Bibliographic Data.
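For a rough sense of what a coauthorship network is before working through the tutorial, here is a minimal sketch in plain Python. It does not use Tethne or its API; the records and the function coauthor_edges are hypothetical, and each paper is reduced to a list of author names. Every pair of authors on the same paper gets an edge, weighted by the number of papers they share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical bibliographic records: one list of author names per paper.
papers = [
    ["Smith J", "Jones A", "Lee K"],
    ["Smith J", "Jones A"],
    ["Lee K", "Garcia M"],
]


def coauthor_edges(papers):
    """Count how many papers each pair of authors has written together."""
    edges = Counter()
    for authors in papers:
        # Sort and de-duplicate so each pair has a single canonical order.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)] += 1
    return edges


edges = coauthor_edges(papers)
for (a, b), weight in sorted(edges.items()):
    print(a, "--", b, "weight:", weight)
```

Real bibliographic data needs more care than this, especially in deciding when two differently spelled names refer to the same author.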

If you run into trouble with Tethne, don't despair. Report issues here: https://github.com/diging/tethne/issues?state=open (be sure to include as much information as you can, e.g. what were you trying to do? What did you click last? Was there an error message?)

Week 7: Networks from Bibliographic Data

This week we will dive a bit deeper into network analysis, focusing on what we can learn from modeling bibliographic data as networks. Rather than assigning readings, we've listed a bunch of literature related to some of the techniques and concepts we'll explore. Feel free to browse that literature, and let us know if you'd like further suggestions for reading.
