AI testing and certification

Written by Richard Seidl | 03/18/2025

Artificial intelligence (AI) is becoming increasingly important in our technology landscape. For AI to deliver good and reliable results, regular appraisals of the systems are required. The AI assessment matrix is a systematic concept for evaluating AI systems. Both safety and innovation criteria play a central role here. The testing of AI systems poses a number of challenges and is still dependent on human expertise. In addition, social responsibility and ethical aspects such as fairness and data privacy are of great importance. These topics form the basis for a well-founded examination of the requirements and standards in AI development.

Podcast Episode: AI testing and certification

In this episode, I talk to Christoph Poetsch from TÜV AI.Lab about the testing and certification of artificial intelligence (AI). Christoph introduces the AI Assessment Matrix, a systematic concept for evaluating AI systems. We discuss the importance of safety and innovation criteria, the challenges of AI testing and the need for human expertise in the testing process. Together, we also address the social responsibility and ethical aspects of AI.

"Our credo is to keep the innovation, but on the other hand to protect against the negative consequences of the technologies." - Christoph Poetsch

Dr. Christoph Poetsch is Senior Advisor AI Ethics and Quality at TÜV AI.Lab, an independent joint venture of key players in the global testing and certification industry. TÜV AI.Lab paves the way for trustworthy AI by developing compliance criteria and testing procedures for AI systems. Dr. Christoph Poetsch's core tasks at the AI.Lab are in the areas of AI quality, AI ethics and systematic groundwork. He holds a doctorate in philosophy from the University of Heidelberg and has completed research and visiting fellowships at Yale-NUS College Singapore, the University of Notre Dame and the École Pratique des Hautes Études in Paris.


Highlights of the Episode

  • Testing and certification of artificial intelligence (AI)
  • Structured approach to the evaluation of AI systems
  • Important criteria for the testing of AI systems
  • Inclusion of human competencies in the testing process
  • Ethical issues such as fairness and non-discrimination

How can AI be tested? Insights into AI testing and certification

Why do we need an appraisal of AI systems?

Artificial intelligence (AI) is being used in more and more areas of our lives, so it is essential to ensure that AI systems are reliable, safe and ethical. The testing and certification of AI systems is therefore gaining in importance.

Important aspects in AI testing:

  • Safety assurance: Testing AI systems helps to identify potential risks and vulnerabilities in order to ensure safe and user-friendly applications. But AI itself can also help to make testing more efficient, for example by automating software tests with AI (see the blog articles on Test automation with AI and on how AI can help with Test design).
  • Ethical standards: AI must follow certain ethical guidelines and social values. Therefore, an appraisal for ethical standards is important.
  • Building trust: AI must be trustworthy. Recognized certifications create trust among users and stakeholders.

Testing AI systems is particularly important in light of the complex challenges associated with the use of AI. At a time when decisions are increasingly influenced by algorithms, it is important that these very systems are transparent and comprehensible. The importance of AI testing goes beyond technical appraisal. Testing AI also involves social and ethical dimensions that are important for the well-being of society.

Testing of and with AI

It has already become clear in the first section that testing AI has several dimensions. In principle, the appraisal of AI can be subdivided into testing of and with AI. In the appraisal of AI, we also have to deal with the question: Is there a fair AI that acts according to our values? After all, fairness and justice are crucial for the responsible use of AI systems.
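To make the fairness question more tangible: one common, if simplified, entry point is a statistical group-fairness metric such as demographic parity. The following Python sketch is purely illustrative; the metric choice, the example data and the loan-approval scenario are assumptions, not part of any specific certification methodology.

```python
from collections import defaultdict

def demographic_parity_gap(predictions, groups):
    """Difference between the highest and lowest positive-prediction
    rate across groups (0.0 means perfectly balanced rates)."""
    totals, positives = defaultdict(int), defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    rates = [positives[g] / totals[g] for g in totals]
    return max(rates) - min(rates)

# Hypothetical example: a model approving (1) or rejecting (0) loans
# for applicants from two groups.
preds  = [1, 0, 1, 1, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(f"Demographic parity gap: {demographic_parity_gap(preds, groups):.2f}")
# Group A rate 0.75 vs. group B rate 0.25 -> gap 0.50
```

A large gap does not automatically mean the system is unfair, but it is exactly the kind of measurable signal an appraisal can flag for closer ethical review.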

The TÜV AI.Lab

Christoph Poetsch works at TÜV AI.Lab, an institution that deals with the testing and certification of artificial intelligence. The Lab has developed an AI assessment matrix that enables a systematic approach to AI testing and certification. This matrix is crucial to ensure that AI systems are not only innovative but also comply with applicable safety standards.

AI assessment matrix

The developed AI assessment matrix comprises two dimensions:

  1. The test dimension (X-axis): Various forms of testing such as product testing and documentation reviews are considered here.
  2. The test areas (Y-axis): This dimension focuses on relevant lifecycle areas of AI systems.

The TÜV AI.Lab acts as a mediator between companies and regulatory authorities and has set itself the goal of creating a framework that both enables technical progress and takes social values into account. The safety and robustness of AI systems are particularly important when it comes to the responsible use of artificial intelligence.

Dimensions of the test matrix

If you want to appraise AI systems, you need a structured approach to ensure a comprehensive analysis and to avoid the negative consequences that inadequate test techniques can entail. To meet this need, TÜV AI.Lab has included two dimensions in its matrix: the test dimension and the test areas.

The test dimension: This axis maps the various forms of testing, such as

  • Product testing
  • Documentation review
  • Process and personnel tests

This dimension aims to evaluate the performance and safety of the AI system in various aspects. A detailed test description for AI capabilities can provide valuable insights here.

The test areas: This axis focuses on relevant lifecycle areas of AI. It enables a targeted consideration of the phases in which tests should be carried out in order to verify the robustness and effectiveness of the system. These include

  • Data sets and their availability
  • System architecture and design decisions
  • Robustness against unforeseen influences

By combining these two dimensions, a structured framework is created that enables a systematic approach to testing AI systems. The test dimensions on the X-axis interact with the test areas on the Y-axis, resulting in a multidimensional view. This approach helps to systematically record and evaluate all relevant criteria.
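To illustrate how these two axes can be combined in practice, here is a minimal Python sketch of such a matrix. All names and the example check are invented for illustration and do not reproduce the actual TÜV AI.Lab matrix.

```python
# Hypothetical sketch: X-axis = forms of testing, Y-axis = lifecycle areas.
TEST_DIMENSIONS = ["product_test", "documentation_review", "process_audit"]
TEST_AREAS = ["data_sets", "system_architecture", "robustness"]

# Each (area, dimension) cell of the matrix collects concrete checks.
matrix = {(area, dim): [] for area in TEST_AREAS for dim in TEST_DIMENSIONS}

# Example: register one check in a single cell.
matrix[("data_sets", "documentation_review")].append(
    "Verify that training data sources and licenses are documented."
)

# Walking all cells yields a systematic, gap-free test plan.
for (area, dim), checks in matrix.items():
    for check in checks:
        print(f"[{area} x {dim}] {check}")
```

The benefit of the grid form is completeness: every combination of test form and lifecycle area is an explicit cell, so nothing is skipped by accident.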

Testing of AI systems

AI systems are tested in three dimensions:

1. Own testing

This dimension comprises concrete tests that are carried out directly on the AI systems. The functionality of the system is appraised under realistic conditions. The aim is to ensure that the system works as expected and that no unexpected errors occur.
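As a minimal sketch of what such a direct functional test could look like, assuming a system with a `predict` method and a small set of realistic input/expected-output cases (the interface, the stub model and the threshold are all assumptions for illustration):

```python
class EchoModel:
    """Trivial stand-in for a real AI system under test."""
    def predict(self, inputs):
        return inputs  # simply 'predicts' its own input

def test_model_on_realistic_cases(model, cases, min_accuracy=0.95):
    """Run the model on realistic held-out cases and fail the test
    if the observed accuracy drops below the required threshold."""
    correct = sum(
        1 for inputs, expected in cases if model.predict(inputs) == expected
    )
    accuracy = correct / len(cases)
    assert accuracy >= min_accuracy, (
        f"accuracy {accuracy:.2%} below required {min_accuracy:.2%}"
    )
    return accuracy

cases = [("a", "a"), ("b", "b"), ("c", "c")]
print(test_model_on_realistic_cases(EchoModel(), cases, min_accuracy=0.9))
```

In practice, the cases would be drawn from representative operating conditions rather than a toy stub, and the threshold would follow from the system's requirements.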

2. Documentation tests

This dimension focuses on the appraisal of the existing documentation. It analyzes whether the required information about the AI system is properly documented. Comprehensive documentation is crucial for the traceability and transparency of the tested systems.
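A documentation review can be partially automated with a simple completeness check. The required sections below are an assumed example list, not a normative catalog from any standard:

```python
# Assumed example: sections a technical file might be required to contain.
REQUIRED_SECTIONS = {
    "intended_purpose",
    "training_data_description",
    "performance_metrics",
    "risk_assessment",
    "human_oversight_measures",
}

def missing_documentation(provided_sections):
    """Return the required sections the submitted documentation lacks."""
    return REQUIRED_SECTIONS - set(provided_sections)

missing = missing_documentation(["intended_purpose", "performance_metrics"])
print("Missing sections:", sorted(missing))
```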

3. Process and personnel audits

This dimension looks at the processes used in the development and operation of the AI system. This includes the qualifications of the personnel responsible for the management and maintenance of the systems. Adherence to standards and best practices in these processes contributes to safety and robustness.

These three dimensions provide a structured framework for evaluating AI applications and ensure that both technical and organizational aspects are taken into account.

Risk management with the AI Risk Navigator

The AI Risk Navigator is a free risk classification tool that helps companies to identify and assess the potential risks associated with their AI systems, and thus supports a systematic approach to risk management.

By using the AI Risk Navigator, companies can:

  • Precisely classify and prioritize risks
  • Develop a better understanding of regulatory requirements
  • Improve cooperation with regulatory authorities

The AI Risk Navigator is not only a risk assessment tool, but also a strategic asset for companies operating in a complex landscape of AI technologies.
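For orientation: the EU AI Act distinguishes risk tiers such as unacceptable, high, limited and minimal risk. The sketch below is a strongly simplified assumption of how a classification tool might triage a use case along these lines; it does not reproduce the actual logic of the AI Risk Navigator.

```python
from enum import Enum

class RiskTier(Enum):
    UNACCEPTABLE = "unacceptable"
    HIGH = "high"
    LIMITED = "limited"
    MINIMAL = "minimal"

def classify_risk(use_case: dict) -> RiskTier:
    """Strongly simplified triage loosely following the EU AI Act tiers."""
    if use_case.get("social_scoring"):
        return RiskTier.UNACCEPTABLE   # prohibited practices
    if use_case.get("safety_component") or use_case.get("affects_legal_rights"):
        return RiskTier.HIGH           # conformity assessment required
    if use_case.get("interacts_with_humans"):
        return RiskTier.LIMITED        # e.g. transparency obligations
    return RiskTier.MINIMAL

print(classify_risk({"interacts_with_humans": True}))  # RiskTier.LIMITED
```

Even in this toy form, the value of such a triage is that it forces teams to answer the regulatory questions explicitly before deployment.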

Core elements of AI regulation

There are key core requirements of AI regulation that are important for the development and implementation of safe and efficient AI systems. Several aspects need to be considered in this context:

  • Performance: The efficiency and effectiveness of an AI system must be guaranteed to meet the requirements of the users.
  • Security: Protection against potential security risks is essential. This includes both physical and digital threats that could affect the system and its users.
  • Robustness: AI systems should be designed to function stably under a variety of conditions. Robustness refers to the ability of a system to cope with unpredictable influences and still deliver reliable results.
  • Cybersecurity: Given the increasing interconnectedness of technologies, protection against cyber-attacks is a basic requirement. Robust security protocols must be integrated to prevent data misuse and other security-related incidents.

These core requirements affect not only the AI system in question, but also other systems in its environment. The interactions between different AI applications and their impact on people and the environment must be continuously assessed. This creates a holistic approach to AI testing and certification that considers both technological and ethical dimensions.

Development of standards and certifications for AI systems

Global responsibility plays a central role in the increasing use of AI systems. Several key areas are considered in the certification of AI systems:

  • Humanity: AI systems must meet ethical standards that promote the well-being of humanity and minimize negative impacts on society.
  • Natural ecosystems: The development and deployment of AI technologies should be environmentally friendly to conserve resources.
  • Supply chain responsibility: Companies are required to consider the entire supply chain. This concerns both the origin of the materials and the social conditions under which they are produced.

These factors are crucial for AI certification processes. After all, only a responsible approach can ensure that new technologies not only bring economic benefits, but also make a positive contribution to society and the environment. Through interdisciplinary exchange between different stakeholders, standards can be developed that integrate both technical and ethical aspects.

As AI systems are increasingly being integrated into various areas of life and have an impact there, it is important to take a holistic view of the development.

Technical challenges in the testing of AI systems

The testing of AI systems brings with it a variety of technical challenges. Central to this is robustness testing, which aims to ensure that systems function stably and reliably under different conditions.

Important points in this context are

  • System architecture: The appraisal of the architecture of an AI system, in which both the structure and the interactions between individual components are analyzed.
  • Robustness tests: These tests examine how well a system responds to unforeseen inputs. The aim is to identify weak points that could lead to malfunctions (a minimal sketch of such a check follows this list).
  • Data availability: Accessibility to training data sets is essential for conducting effective testing. Without high quality data, it becomes difficult to simulate real scenarios and identify potential problems.
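A minimal sketch of such a robustness check, assuming a numeric input vector and a `predict` function (both placeholders for a real system under test; the noise level and trial count are arbitrary example values):

```python
import random

def robustness_check(predict, inputs, noise=0.01, trials=100):
    """Perturb each numeric input slightly and report how often the
    prediction stays the same (1.0 = fully stable under this noise)."""
    baseline = predict(inputs)
    stable = 0
    for _ in range(trials):
        perturbed = [x + random.uniform(-noise, noise) for x in inputs]
        if predict(perturbed) == baseline:
            stable += 1
    return stable / trials

def toy_predict(xs):
    """Toy threshold model standing in for a real AI system."""
    return int(sum(xs) > 1.0)

print(robustness_check(toy_predict, [0.4, 0.7]))  # far from boundary: stable
print(robustness_check(toy_predict, [0.5, 0.5]))  # at the boundary: flips often
```

Inputs near decision boundaries are where such checks typically expose instability, which is why realistic, boundary-adjacent test data matters.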

Comprehensive testing requires not only technical skills, but also an understanding of ethical issues and the impact of AI systems on users and society. A structured approach to these tests is important to ensure the long-term safety and effectiveness of AI applications.

Conclusion: The future of appraisal and certification of AI systems

The appraisal and certification of AI systems will become even more relevant in the years ahead. The following developments will shape this field:

  • Technological progress: New technologies require the continuous development of testing standards and certification processes; as AI evolves, further adjustments will be necessary.
  • Regulatory requirements: A growing awareness of ethical standards and safety aspects is leading to stricter regulatory requirements. Companies must be prepared for the fact that testing procedures for AI are becoming increasingly important.
  • Multidisciplinary approaches: The involvement of experts from different disciplines promotes a holistic view of AI systems. This diversity of perspectives will become even more important in the future.
  • Ethical considerations: Aspects such as fairness, transparency and accountability are central to AI systems. These issues will have a significant impact on future testing standards.

It is clear that the testing and certification of AI systems is not just a technical process, but also implies a social responsibility. The coming years will show how successful the integration of innovation, safety and ethical standards will be.

Frequently Asked Questions

What is the importance of appraisal and certification of AI?

AI needs to be tested to ensure the safety and performance of AI systems. The testing and certification of AI helps to validate the quality of the technologies and create trust among users.

Why should AI systems be systematically appraised?

A systematic testing approach is necessary to capture the complexity of AI systems. By introducing a two-dimensional AI assessment matrix, test dimensions and test areas can be clearly defined.

What is the AI assessment matrix and what does it contain?

The AI assessment matrix consists of two axes: the X-axis represents different test dimensions, while the Y-axis represents specific test areas that are important for a comprehensive assessment of AI systems.

What is the AI Risk Navigator and what is it for?

The AI Risk Navigator is a free risk classification tool that helps companies identify and assess potential risks of their AI systems and supports a systematic approach to risk management.

What are the technical challenges of testing AI systems?

One of the biggest technical challenges is robustness testing. This requires a thorough analysis of the system architecture as well as extensive testing to ensure that the AI system functions reliably under different conditions.