This page explains what metadata and controlled vocabularies are and why it's important to use them. They provide a consistent way to describe data (location, time, place name, subject) and, more importantly, they improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data, e.g. gait analysis, medical conditions or places.
Why is good data description necessary? Because it enables others to understand and use your data. Those others may be researchers you collaborate with on a research project, or anyone who finds your data after you share it in a data repository.
Data description (known as metadata) is essential for finding and reusing research data. Data is only as valuable as the metadata which describes and connects it. So it's worth selecting an appropriate metadata standard or schema, and whenever possible you should also use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe data.
Metadata is structured information about a resource that describes characteristics such as content, quality, format, location and contact information. Creating metadata to describe research data is very similar to the process for descriptive cataloguing of library resources.
Metadata schemas are sets of metadata elements (or fields) for describing a particular type of information resource. Numerous metadata schemas exist for describing research data across different disciplines.
1. Read the following short ANDS introduction to Metadata to understand what metadata is and why it is the lifeblood of research data sharing!
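To make the idea of a schema as a set of elements more concrete, here is a minimal sketch in Python. It assumes a hypothetical schema whose required element names (title, creator, date, subject, format, contact) loosely echo the characteristics listed above; they are illustrative only and are not the element set of any published standard. The example record is likewise invented. The function reports which required elements a record leaves missing or blank.

```python
# Hypothetical schema: required element names chosen for illustration only.
# Real metadata schemas define their own elements, formats and rules.
REQUIRED_ELEMENTS = {"title", "creator", "date", "subject", "format", "contact"}


def missing_elements(record: dict[str, str]) -> list[str]:
    """Return the required elements that are absent or blank, sorted by name."""
    return sorted(
        name
        for name in REQUIRED_ELEMENTS
        if not record.get(name, "").strip()
    )


# An invented record: "subject" is blank and "contact" is absent.
example_record = {
    "title": "Gait analysis of older adults walking on uneven ground",
    "creator": "A. Researcher",
    "date": "2016-05-11",
    "subject": "",
    "format": "text/csv",
}

print(missing_elements(example_record))  # ['contact', 'subject']
```

A check like this only tests that fields are filled in; judging whether their content is accurate and useful still needs a person.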
2. Let’s have a look at the metadata contained in a PubMed record. Do you think the following record would be considered to have ‘good quality’ metadata? Hint: consider both the type and the quality of the information provided. What metadata does this record include to help discovery and reuse of the data?
3. Explore the UK Digital Curation Centre’s Directory of Disciplinary Metadata. You might find a schema that is applicable to your research!
Consider: if metadata is the lifeblood of data discoverability and reuse, why is it so often neglected, or only thinly provided, when data is published?
If you have time: sadly, it’s not hard to find examples of low-quality metadata describing research data. Read the short two-page article Avoiding Data Dumpsters - Toward Equitable and Useful Data Sharing (N Engl J Med. 2016 May 11. [Epub ahead of print]) on the power of good-quality, schema-compliant metadata.
In addition to selecting a metadata standard or schema, whenever possible you should also use a controlled vocabulary. A controlled vocabulary provides a consistent way to describe aspects of data such as location, time, place name and subject.
But what is a controlled vocabulary?
A vocabulary sets out the common language a discipline has agreed to use to refer to concepts of interest in that discipline. Read more about this in the ANDS Introduction to Vocabularies and Research Data.
The main benefit of using controlled vocabularies is that they significantly improve data discovery. They make data more shareable with researchers in the same discipline because everyone is ‘talking the same language’ when searching for specific data, e.g. plants, animals, medical conditions or places.
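Why ‘talking the same language’ helps discovery can be sketched in a few lines of Python. This is a hypothetical illustration, not how any particular repository works: a tiny invented vocabulary maps variant free-text terms to one preferred term, so a search for ‘MI’ finds datasets tagged either ‘Heart attack’ or ‘myocardial infarction’. Real controlled vocabularies are much larger and usually give each concept an identifier as well as a preferred label.

```python
# Hypothetical mini-vocabulary, invented for illustration: each lower-cased
# variant term maps to a single preferred term.
PREFERRED_TERMS = {
    "heart attack": "myocardial infarction",
    "mi": "myocardial infarction",
    "myocardial infarction": "myocardial infarction",
    "walking analysis": "gait analysis",
    "gait analysis": "gait analysis",
}


def normalise(term: str) -> str:
    """Map a free-text term to its preferred term; unknown terms pass through lower-cased."""
    key = term.strip().lower()
    return PREFERRED_TERMS.get(key, key)


def search(records: dict[str, list[str]], query: str) -> list[str]:
    """Return IDs of records whose normalised keywords include the normalised query."""
    wanted = normalise(query)
    return [
        record_id
        for record_id, keywords in records.items()
        if wanted in {normalise(k) for k in keywords}
    ]


# Invented dataset records, each described with free-text keywords.
datasets = {
    "ds1": ["Heart attack", "cohort study"],
    "ds2": ["myocardial infarction"],
    "ds3": ["Walking analysis"],
}

print(search(datasets, "MI"))  # ['ds1', 'ds2']
```

Without the vocabulary, a search for ‘MI’ would match none of these records, even though two of them describe exactly that condition.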
Activity 1. Start by browsing Controlling your Language: a Directory of Metadata Vocabularies from JISC in the UK. Make sure you scroll down to section 5, Conclusion; it's worth a read.
Activity 2. Go to BARTOC, the Basel Register of Thesauri, Ontologies and Classifications, and search for ‘sports’.
Are there any vocabularies applicable to sports researchers in your discipline area?
How do you think we could encourage people to use controlled vocabularies in their data descriptions?