AI and data – a taxonomy

The Open Data Institute, a non-profit seeking to promote trust in data, has produced a taxonomy of the data involved in developing, using and monitoring foundation AI models and systems (here). The report is described as being "a response to the way that the data used to train models is often described as if a static, singular blob, and to demonstrate the many types of data needed to build, use and monitor AI systems safely and effectively."
It covers terms such as:
Whilst the taxonomy is focussed on foundation AI models and systems, the researchers suspect that much of it will apply to smaller foundation models, too.
The taxonomy is a useful addition to a growing body of work seeking to improve discussion about AI systems, such as NIST's terminology of adversarial machine learning attacks and mitigations, as well as definitions (and their explanations) contained in proposed and enacted legislation (see our blog for our latest glossary on AI terms as used in proposed and enacted laws and regulations).
If you would like to discuss how current or future regulations impact what you do with AI, please contact Tom Whittaker, Brian Wong, Lucy Pegler, David Varney, or Martin Cook.
For the latest on AI law and regulation, see our blog and newsletter.