Large Datasets and You: A Field Guide

Publication information:

Blackwell, Matthew, and Maya Sen. 2012. “Large Datasets and You: A Field Guide.” The Political Methodologist 20 (1): 2–5.

Abstract

The last five years have seen an explosion in the amount of data available to social scientists. Although a blessing, these extremely large data sources can cause problems for political scientists working with standard statistical software, which is poorly suited to analyzing large datasets. In this essay, we describe a few approaches to handling extremely large datasets with the R programming language, both at the command line before starting R and within R itself. We show that handling large datasets comes down to either (1) choosing tools that can shrink the problem or (2) fine-tuning R to handle massive data files.