Glossary


Controlled vocabulary A defined set of words used to describe specified objects and/or events. Often this vocabulary will be expressed in a defined form (for example, XML; see below).
Data model The product of a design process that aims to identify and organize the data into logical units, with specified relationships, for use in a database.
Database One or more large structured sets of persistent data that are usually associated with software to update and query the data.
Data warehouse A database constructed to support efficient querying of the data it contains, possibly constructed from many primary data sources.
File format The structure of a computer document that specifies how the information is organized.
Free text Text-based data expressed in a natural language (for example, English or Russian). The meaning of the text is not accessible to a computer, although the computer can find exact matches to words and phrases.
Metadata The "data about the data". Descriptive and contextual information about the acquired data.
Ontology A controlled vocabulary that defines terms and their relationships for a specific knowledge domain. Although the terms "ontology" and "data model" are sometimes used interchangeably, an ontology emphasizes the vocabulary used for the entities and their relationships, whereas a data model emphasizes the structure of the data.
Protocol A set of rules governing data exchange between computers.
Training set A defined set of data, usually used as input to a machine learning algorithm, from which the algorithm calculates a set of decision rules that enable automated recognition of different types of structures or events (that is, a set of data points selected to fit model parameters).
XML Extensible Markup Language (XML; http://www.xml.org) is a structured text language, or "markup language", designed for specifying and describing complex sets of data. XML is readable by both humans and software, making it an ideal tool for specifying data models and ontologies. Similar in syntax to HTML, XML is a language for semantic markup, whereas HTML is a language for display markup.
Web service A computational resource that can be accessed via the World Wide Web, typically using a web browser or another local client.

from J. R. Swedlow, S. E. Lewis, I. G. Goldberg, "Modelling Data Across Labs, Genomes, Space, and Time," Nature Cell Biology 8, 1190-1194 (2006).
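
As an illustration of how several of the glossary terms fit together, the following minimal sketch expresses a small controlled vocabulary in XML, adds one relationship between terms (the step that moves it toward an ontology), and checks free-text strings against it by exact match. It is not taken from the cited article: the element names, attribute names, and microscopy terms are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: the <vocabulary>/<term> elements and the
# id/name/parent attributes are illustrative assumptions, not a standard.
VOCAB_XML = """
<vocabulary domain="imaging">
  <term id="T1" name="light microscopy"/>
  <term id="T2" name="confocal microscopy" parent="T1"/>
  <term id="T3" name="electron microscopy"/>
</vocabulary>
"""


def load_vocabulary(xml_text):
    """Return a dict mapping each term name to its parent's name (or None)."""
    root = ET.fromstring(xml_text)
    terms = list(root.iter("term"))
    names_by_id = {t.get("id"): t.get("name") for t in terms}
    # A term without a "parent" attribute maps to None.
    return {t.get("name"): names_by_id.get(t.get("parent")) for t in terms}


def is_valid_term(vocab, text):
    """Exact-match lookup: the only check possible without understanding meaning."""
    return text in vocab


vocab = load_vocabulary(VOCAB_XML)

for candidate in ("confocal microscopy", "confocal", "electron microscopy"):
    status = "in vocabulary" if is_valid_term(vocab, candidate) else "not in vocabulary"
    print(f"{candidate!r}: {status}; parent = {vocab.get(candidate)}")
```

Only exact matches succeed, so the free-text string "confocal" is rejected even though a human reader would recognize it. The parent attribute is what lets software relate "confocal microscopy" to "light microscopy" without interpreting the words themselves.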