What to do with vast amounts of information

One of the things I have been thinking about as I prepare to start my project is how best to organise the vast amounts of data I will need to use. Some of this data will come "ready-made", already packaged and organised by various commercial and public vendors of data. Here I am thinking about company accounts and various other financial and corporate data. Some of this information will have to be gleaned from old and contemporary newspapers and magazines, a great deal of which won't even be digitised (or in English). Some of it will have to come from old-school graft in the archives (and since I happily and unabashedly fetishise archives, this will be FUN). Some will come from the vast online ocean of data regularly emitted by the agencies of the US Government. And then there will be ethnographic materials, long, open-ended, unstructured interviews, fieldwork, and site visits…

I am not yet sure how I am going to organise all of this. I assume I will need to create a relational database of some sort. But which software should I use? And once it is built, how do I make sure I am not swallowed up by the data, either by its sheer volume or by the strange and perverse attraction of numbers?

The blog Sapping Attention has long been a home for thinking through the problems of digital humanities. I really like this blog, and its ruminations, for three reasons. First, it deals with whaling ships. As you have probably figured out, I am now obsessing a bit over ships and, given my slightly hysterical and quasi-religious love of Moby Dick, I have a complete (and wholly non-instrumental) fascination with whaling ships. And Sapping Attention deals with whaling data! Woohoo! Second, the author is at once very technically proficient and confident enough not to share the slight inferiority complex that so many social scientists seem to have towards their more mathematically rigorous colleagues in the natural sciences. Third, the author is a historian who thinks about sources and the making of sources. He writes:

We need to rejuvenate three traditional practices: first, a source criticism that explains what’s in the data; second, a hermeneutics that lets us read data into a meaningful form; and third, situated argumentation that ties the data in to live questions in their field.

For one thing, how can you not love a digital humanist whose admonition faintly echoes this wonderful passage from Michel-Rolph Trouillot's Silencing the Past:

Silences enter the process of historical production at four crucial moments: the moment of fact creation (the making of sources); the moment of fact assembly (the making of archives); the moment of fact retrieval (the making of narratives); and the moment of retrospective significance (the making of history in the first instance) (p. 26).