This is a follow-up to the blog post Digital Data in Environmental Archaeology 1: Preservation.
This is a short account of the reasons why I think that environmental archaeology data should be stored and disseminated as open data (i.e. data that is freely available in accessible formats, usually digital, under licences that allow it to be re-used). I’ve provided an outline of some of the methods that I have used below.
Research is a process that builds on the results of the past, and in the case of environmental archaeology it can often be a useful process to incorporate results from many different sites into one larger dataset, and to analyse this to see if new patterns and insights emerge. To move the study of environmental archaeology forward, I think it is important to ensure that results are stored and disseminated in a way that allows other researchers to re-use data.
Making data accessible
If a researcher wants to re-use archaeobotanical data from one of my reports, no doubt they could re-type all the information that is available in printed formats or in PDFs. But it would be much better if the data was made available digitally. Much of the raw data in environmental archaeology (certainly in archaeobotany) is prepared in spreadsheets and I have spreadsheets that date back to 1998. How long will I be able to access these using more modern software packages? And is it realistic to expect me to convert and update the files each time there is a new iteration of spreadsheet software?
Fortunately, many software packages have some built-in backwards compatibility. The best way to ensure that the data in my spreadsheets (and in databases) remains readable into the future is actually to convert it into a very old format, a .csv file. Comma Separated Value files (.csv) provide a very simple means of structuring data: plain text, with one row per line and commas between the values. CSV is a de facto standard for saving tabular data and it is supported by a huge number of applications. This means that if you save your tabular data as a .csv file, most programs will be able to access the data (and the more accessible your data, the more likely it is to be preserved into the future).
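To show how little structure a .csv file needs, here is a minimal sketch in Python using only the standard library's csv module. The sample names, taxa and counts are invented purely for illustration, and the function names are my own. It writes a small table as CSV text and reads it back again.

```python
import csv
import io

# Hypothetical archaeobotanical counts, invented purely for illustration.
ROWS = [
    {"sample": "S1", "taxon": "Hordeum vulgare", "count": 12},
    {"sample": "S1", "taxon": "Triticum dicoccum", "count": 3},
    {"sample": "S2", "taxon": "Corylus avellana, shell", "count": 7},
]

FIELDS = ["sample", "taxon", "count"]


def to_csv_text(rows):
    """Serialise a list of dicts as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def from_csv_text(text):
    """Parse CSV text back into a list of dicts (every value comes back as a string)."""
    return list(csv.DictReader(io.StringIO(text, newline="")))


csv_text = to_csv_text(ROWS)
print(csv_text)  # plain text: a header line, then one line per row

restored = from_csv_text(csv_text)
```

Note that a value containing a comma is wrapped in double quotes in the file, and that numbers are read back as text. CSV stores values only, with no data types or formatting attached.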
For more details on .csv formats, see http://data.okfn.org/doc/csv
How to convert your spreadsheet to a .csv file
The easiest way to save your data in .csv format is to open your preferred spreadsheet application, click on “Save as” and scroll down the list of options until you find .csv. This file should contain all your basic data, organised simply and clearly (leave pie-charts out). It should be kept as the preservation copy of your data.
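Once the file is saved, it can be worth checking that the export really is a simple table. The following is a rough sketch rather than a definitive check, and it assumes the file was saved as UTF-8 with the column names in the first row. The function name and the file name in the usage comment are hypothetical. It reports any row whose number of columns differs from the header.

```python
import csv


def find_ragged_rows(path, encoding="utf-8"):
    """Return (line_number, column_count) for each row that does not match the header width."""
    problems = []
    with open(path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header is None:
            # Empty file: nothing to compare against.
            return problems
        for row in reader:
            if len(row) != len(header):
                # reader.line_num is the line of the file the row ended on.
                problems.append((reader.line_num, len(row)))
    return problems


# Usage with a hypothetical file name:
#     for line_number, width in find_ragged_rows("samples.csv"):
#         print(f"line {line_number}: {width} columns")
```

An empty list means every row has the same number of columns as the header. Blank lines and stray notes typed underneath the table will show up as problems, which is usually what you want to catch before archiving the file.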
N.B. Preserving text files is different. Save your report as a .pdf, as this is a relatively stable and supported format. For added accessibility it is a good idea to save text as .txt files (go to “Save as” and select the .txt option). This will preserve the text but won’t preserve any added graphs and images, and it won’t preserve formatting.
Licensing your data so that it is available for re-use
Open data is distributed so that it can be re-used. This usually means publishing your data under an open licence, such as one of the Creative Commons licences. These licences work alongside copyright, allowing you to give permission in advance for people to re-use your material, and allowing you to stipulate the conditions under which this re-use can take place. Creative Commons offer several different ways for you to share your material, from a completely open option that waives your rights (CC0) to more restrictive licences that stipulate that the material must be attributed to you as its original creator (CC BY).
For more details about Creative Commons licences, see http://creativecommons.org/licenses/.