Harvard Referencing: Data

Introduction to datasets

A dataset (or data file) is the published findings/observations from a research project. It may be raw or analysed and can come in many forms:

  • questionnaires
  • recordings
  • transcripts
  • field notes
  • biological material
  • test results
  • laboratory notes
  • digital files
  • photos
  • log files
  • other primary or secondary data and materials

Data can be published in a repository, on a website or as supplementary material alongside a publication (such as a journal article). Not all research data is electronic; data can also be physical. However, for you to cite a data file or dataset, it must at least be discoverable through either a DOI or a URL.

Many repositories show how you should cite the data. This can be a useful reference, but you should always follow the Harvard examples given in this guide.

Data terminology

Data file - Otherwise known as a data item, this is a single record of data such as an Excel sheet or log book.

Dataset - Data is most commonly published as a dataset, that is, a collection of related sets of data. Keep in mind that some datasets are dynamic and change over time. Always try to cite the specific version of the dataset that you used.

DOI - Stands for Digital Object Identifier and is the alphanumeric value (or digital identifier) specifically assigned to a data file, dataset or associated output (grey literature, workflow etc.). DOIs are persistent identifiers (meaning they will always point to the location of the data/dataset), so they should always be used in your citation if available.

Basic format to reference published data

The basics of a Reference List entry for a dataset or data file:

  • Author or authors. The surname is followed by first initials.
  • Year.
  • Title, in italics.
  • Description (electronic dataset or data file).
  • Publisher Name (i.e. database, repository).
  • DOI, or
  • Date viewed and URL <in angled brackets>.

Include the name of the author or authors in dataset references where possible (an author may be a corporate body or organisation responsible for creating, producing or publishing the dataset).

  • Where there is no identifiable author or authoring body, use the title of the dataset.
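The element order above can be sketched as a small Python helper. This is only an illustration, not an official tool: the function names, parameters and error message are hypothetical, authors are assumed to be supplied already in "Surname, Initials" form, and plain strings cannot show the italics required for the title.

```python
from typing import Optional, Sequence


def format_authors(authors: Sequence[str]) -> str:
    """Join 'Surname, Initials' strings, placing '&' before the last author."""
    if len(authors) == 1:
        return authors[0]
    return ", ".join(authors[:-1]) + " & " + authors[-1]


def harvard_dataset_reference(
    title: str,
    year: int,
    publisher: str,
    authors: Sequence[str] = (),
    description: str = "electronic dataset",
    doi: Optional[str] = None,
    url: Optional[str] = None,
    date_viewed: Optional[str] = None,
) -> str:
    """Build a Reference List entry in the element order listed in this guide.

    Hypothetical helper: the title should be italicised when the entry is
    placed in a document, which a plain string cannot represent.
    """
    if authors:
        # Author(s) and year come first, followed by the title.
        parts = [f"{format_authors(authors)} {year}", title]
    else:
        # No identifiable author: the title takes the author position.
        parts = [f"{title} {year}"]

    parts += [description, publisher]

    if doi:
        # A DOI is persistent, so it is preferred over a URL.
        parts.append(f"doi:{doi}")
    elif url and date_viewed:
        parts += [f"viewed {date_viewed}", f"<{url}>"]
    else:
        raise ValueError("A DOI, or a URL with the date viewed, is required.")

    return ", ".join(parts) + "."


print(
    harvard_dataset_reference(
        title="Fluker posts: community based environmental monitoring",
        year=2017,
        publisher="Fluker Post Research Project",
        authors=["Fluker, M"],
        url="http://www.flukerpost.com",
        date_viewed="12 June 2018",
    )
)
```

Passing a DOI takes priority over a URL in this sketch, reflecting the advice that a DOI should always be used in the citation if one is available.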

Raw data from a dataset should not be quoted directly but should instead be summarised.

Referencing a dataset: Examples

Dataset (DOI)

  • In-text example: (Woods et al. 2018)
  • Reference List example: Woods, C, Fernee, C, Browne, M, Zakrzewski, S & Dickinson, A 2018, The potential of statistical shape modelling for geometric morphometric analysis of human teeth in archaeological research, electronic dataset, University of Southampton Institutional Repository, doi:10.5258/SOTON/404043.

Dataset (URL)

  • In-text example: (Fluker 2017)
  • Reference List example: Fluker, M 2017, Fluker posts: community based environmental monitoring, electronic dataset, Fluker Post Research Project, viewed 12 June 2018, <http://www.flukerpost.com>.

Handling data with integrity to avoid plagiarism

How to handle your data:

  • Plan, collect, store, use/manage
  • Analyse/visualise, interpret results, share

Types of data plagiarism

  • falsification of data
  • copying of others’ data
  • paraphrasing without acknowledgement
  • not crediting authorship
  • accidental plagiarism