Back in 2014 the city of New York put online a dataset with yellow cab rides comprising a full year of data. Back then I remember struggling quite a bit with managing the sheer volume of the dataset involved, trying out various alternatives for reading in the full dataset.