AI and data – a taxonomy

The Open Data Institute, a non-profit seeking to promote trust in data, has produced a taxonomy of the data involved in developing, using and monitoring foundation AI models and systems (here). The report is described as being "a response to the way that the data used to train models is often described as if a static, singular blob, and to demonstrate the many types of data needed to build, use and monitor AI systems safely and effectively."
It covers terms such as:
Whilst the taxonomy is focussed on foundation AI models and systems, the researchers suspect that much of it will apply to smaller foundation models, too.
The taxonomy is a useful addition to a growing body of work seeking to improve discussion about AI systems, such as NIST's terminology of adversarial machine learning attacks and mitigations, as well as definitions (and their explanations) contained in proposed and enacted legislation (see our blog for our latest glossary on AI terms as used in proposed and enacted laws and regulations).
If you would like to discuss how current or future regulations impact what you do with AI, please contact Tom Whittaker, Brian Wong, Lucy Pegler, David Varney, or Martin Cook.
For the latest on AI law and regulation, see our blog and newsletter.