Using "Gene Knock-Out" Techniques to Test Cultural Evolution

Project Leader: Na Zhang

The recent boom in online social media, data sources, and archives opens the door to studying the evolution of culture and language from a technical perspective. For example, the Google Books Ngram Viewer provides a collection of 500 billion words that appeared in books published in six languages between 1500 and 2008, and it has become a valuable tool for researchers exploring a wide array of topics, such as historical epidemiology and changes in collective memory (Michel 2011). Digitized records of human culture are a rich source for quantitative researchers, and numerous projects can be built on them. Yet we find that most work to date either addresses a particular historical phenomenon or focuses on the frequency dynamics of certain words or phrases. The first approach loses generality, because the more time has passed, the less significant a historical event is to contemporary people. The second approach fails to grasp the core feature that makes cultural systems interesting to so many: the system of human language and culture is a complicated network of mutually interdependent components, or “clusters” of concepts, ideas, and phenomena, organized by the collective and accumulated work of highly developed neural network systems, namely human brains. Because most current quantitative approaches do not focus on the hierarchy and interdependence among these components, we propose a framework through which the evolution of culture can be investigated from a systemic perspective.

The question we would like to address is: can we establish a framework for locating the “functional units” of a given collection of historical literature over a certain period of time? Here, a functional unit of culture and language can be thought of as analogous to a functional unit of genes in a genome. Just as regulatory genes control the downstream expression of gene batteries, a systemic view of culture and language assumes the existence of a core group of “topics” that significantly influence the patterns of the vocabulary of interest.