Data Mining and Informatics Team

The Data Mining and Informatics Team (Data Team) at the Laubichler Lab builds innovative data systems that capture, in unprecedented detail, the processes that drive important scientific innovation. Combining expertise in data wrangling, network science, and advanced statistical modeling, we push at the boundaries between the life sciences, medicine, clinical research, data science, and digital humanities.

In the past year, the Data Team has collected, cleaned, and wrangled over 80 gigabytes of data. This data represents a large diachronic cross-section of multiple institutional, social, and knowledge domains. For instance, to understand the emergence of the microbiome concept, we have gathered the full text and accompanying metadata of every scientific paper ever published containing the word "microbiome," along with every US-funded microbiome research project. Likewise, to better understand how scientific fields are influenced by language and social systems, we analyzed and mapped how the content of more than 50 evolution journals changed from 1900 to 2015. These complete and carefully curated datasets allow us to approach questions about scientific innovation that have never before been answerable: How do scientific innovations spread from obscure corners of science into the mainstream? What hidden (or not so hidden) variables influence the likelihood of funding for innovative science?

In addition to data collection and cleaning, the Data Team has been honing its data analysis and communication skills through Data Competitions. By analyzing real-world datasets and presenting their findings to other students and staff, students get hands-on experience, feedback, and mentoring.

Contact Kenneth Aiello