
Test description for AI capabilities

Written by Richard Seidl | 02/18/2025

The regulation of artificial intelligence (AI) requires clear norms and standards to ensure safety and performance. The standardization of high-risk systems, in which certification bodies play a central role, is particularly challenging. Other important aspects are the documentation requirements and cybersecurity measures for high-performance AI systems. In view of the ambitious timetables up to 2025, the development and implementation of these standards is becoming increasingly important. A look into the future shows what further challenges and opportunities can be expected in the regulation of AI.

Podcast episode on testing AI capabilities

In this episode of the podcast, I talk to Taras Holoyad from the Federal Network Agency about the regulation of artificial intelligence (AI). Taras explains how norms and standards are developed for AI systems to ensure their safety and performance. He emphasizes the challenges of standardization, especially for high-risk systems, and the role of certification bodies. The discussion also covers documentation requirements and cybersecurity measures for high-performance AI systems. We also explore the urgency of developing and implementing these standards by 2025 and take a look into the future.

“In principle, artificial intelligence is unfortunately not comparable to the intelligence level of a human being, but rather a very elaborately designed algorithmic system.” - Taras Holoyad

After studying electrical engineering at TU Braunschweig, Taras Holoyad first worked on the design calculation of electrical machines for road vehicles before moving into the standardization of artificial intelligence at the Federal Network Agency. In his day-to-day work, Taras develops AI regulation strategies, is project leader for the international AI classification standard ISO/IEC 42102, and serves as Vice Chair at the European standardization body ETSI TC “Methods for Testing and Specification”. One goal of his teams is to establish a uniform understanding of which systems fall under the term “artificial intelligence” and to explain the associated quality criteria and testing processes. Together with colleagues from research, industry and the public sector, Taras is also developing a package insert for AI systems, a label for AI products and a glossary on artificial intelligence, so that AI systems can be evaluated in an accessible way across society.

Highlights of the Episode

  • Regulation of artificial intelligence (AI)
  • Standardization of AI systems
  • Challenges in the standardization of AI
  • Development of quality criteria for AI methods
  • Impact of standards on companies that develop AI

Test criteria for AI capabilities

Artificial intelligence (AI) plays a crucial role in software development. It makes it possible to carry out complex data analyses, recognize patterns and make automated decisions. The integration of AI into software applications improves not only efficiency but also the user experience.

In the podcast with Taras Holoyad, the importance of the test description for AI capabilities is discussed in detail. The focus is on the following points:

  • The fundamental challenges of creating test descriptions for AI systems
  • The need for a standardized approach to ensure the quality and safety of AI applications
  • Insights into the current regulatory framework, in particular through the role of the Federal Network Agency

Taras Holoyad, an expert in the field of AI regulation, shares valuable information on how companies can test and describe their AI capabilities. This is particularly important in light of new regulatory requirements and standards in the field of artificial intelligence.

The role of the Federal Network Agency in AI regulation

The Federal Network Agency plays a crucial role in AI regulation in Germany. Its tasks cover various areas that are important for ensuring standards and market surveillance:

Tasks of the Federal Network Agency in the field of AI

  • Responsible for the regulation of telecommunications, postal services, railroads, electricity and gas.
  • Development of standards for the use of AI systems, particularly with regard to high-risk systems.

Market surveillance of high-risk systems

  • Monitoring the conformity and safety of products available on the market.
  • Collaboration with various market participants to evaluate and review risks.

Cooperation with certification bodies and independent experts

  • Cooperation with independent experts to create reliable certifications.
  • Supporting certification bodies in complying with the required standards.

Through these measures, the Federal Network Agency ensures that AI applications are safe and reliable. This promotes fundamental trust in technologies that are increasingly influencing our everyday lives.

Standardization of test criteria for AI systems according to ISO/IEC 42102

The standardization of test criteria for AI systems is a decisive step towards ensuring quality and reliability in software development. The international standard ISO/IEC 42102 plays a central role here. This standard defines two dimensions that are relevant for the test description of AI capabilities: methods and capabilities.

Dimension 1: Methods

The first dimension covers both the classic and the modern approaches that an AI test description has to account for.

Classic AI approaches

  • Symbolic AI: This method is based on logical rules and knowledge representations. It enables the understanding of complex relationships through formal logic.
  • Optimization techniques: These techniques aim to maximize the performance of a model by adjusting parameters and observing evaluation metrics.

Modern AI approaches

  • Machine learning (ML): Algorithms that learn from data and recognize patterns. These include supervised, unsupervised and reinforcement learning methods.
  • Hybrid approaches: A combination of symbolic AI and machine learning that allows the advantages of both approaches to be exploited.

These different methods require specific strategies for carrying out and documenting tests. Choosing the right methodology has a significant impact on the effectiveness of the test.
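To make this concrete, the following sketch contrasts a symbolic and a learned solution to the same toy task. It is purely illustrative: the task, rules and data are hypothetical and are not drawn from ISO/IEC 42102.

```python
# Illustrative contrast: the same task (e-mail filtering) solved once with
# explicit rules (symbolic AI) and once with frequencies estimated from
# labeled data (machine learning). All rules and data are hypothetical.
from collections import Counter

def symbolic_filter(subject: str) -> bool:
    """Symbolic AI: an explicit, human-readable rule base that a tester
    can verify rule by rule."""
    banned = {"lottery", "winner", "prize"}
    return any(word in subject.lower().split() for word in banned)

def train_statistical_filter(samples):
    """Machine learning (sketch): learn word frequencies from labeled
    examples instead of hand-writing rules."""
    spam_words, ham_words = Counter(), Counter()
    for subject, is_spam in samples:
        (spam_words if is_spam else ham_words).update(subject.lower().split())

    def classify(subject: str) -> bool:
        words = subject.lower().split()
        return sum(spam_words[w] for w in words) > sum(ham_words[w] for w in words)

    return classify

data = [("win the lottery now", True), ("meeting agenda attached", False)]
learned = train_statistical_filter(data)
print(symbolic_filter("lottery winner announced"), learned("free lottery tickets"))
```

The symbolic filter can be verified exhaustively against its rule base; the learned filter can only be evaluated statistically against representative test data. This is exactly the difference a test description has to capture.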

Relevance of optimization methods

Optimization methods are particularly important for testing, as they help to find the best parameters for an algorithm and thus ensure that models not only work correctly, but are also efficient.

An example of an optimization method is gradient descent, which is used in many machine learning algorithms. This method minimizes the error of a prediction by gradually adjusting the weights of a model.
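The following minimal sketch shows the idea for a one-parameter linear model; it is a didactic toy, not a production training loop.

```python
# Gradient descent on the one-parameter model y = w * x: the weight is
# repeatedly adjusted against the gradient of the mean squared error.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]  # roughly y = 2 * x

w = 0.0              # initial weight
learning_rate = 0.01

for step in range(200):
    # d/dw (1/n) * sum((w*x - y)^2) = (1/n) * sum(2 * (w*x - y) * x)
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= learning_rate * grad  # step against the gradient

print(f"fitted weight: {w:.3f}")  # converges toward roughly 2.0
```

In real machine learning frameworks the same loop runs over millions of parameters, but the principle, and with it the requirement that results be repeatable, stays the same.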

The application of these methods in a standardized format ensures repeatability and consistency in the test results. In this context, the importance of a structured test description is also emphasized. With clearly defined test criteria, testers can ensure that all aspects of an AI capability are adequately evaluated.

The standardization of test criteria according to ISO/IEC 42102 thus not only promotes comparability between different test scenarios, but also creates a basis for the responsible development of AI systems in accordance with regulatory requirements.

Dimension 2: Capabilities - perception, knowledge processing and action in the test description for AI systems

The capabilities of AI systems are diverse and can be divided into different categories. These capabilities include:

  • Perception: The ability of AI to capture data from the environment, for example by analyzing images or sounds. Classic AI systems often use simple algorithms for pattern recognition, while modern approaches use more complex neural networks.
  • Knowledge processing: This is the ability to store, retrieve and process information. Symbolic AI uses logical rules to represent knowledge, while hybrid approaches combine classical and modern methods.
  • Action capabilities: This refers to the ability of AI to make decisions and execute actions based on learned knowledge. Robots and autonomous systems are examples of applications of this capability.

The international standard ISO/IEC 42102 plays a crucial role in the standardization of test criteria for these capabilities. Standardization is particularly important in order to develop consistent and traceable test descriptions for AI capabilities; this allows companies to ensure that their AI systems meet the required quality standards. In the area of perception, for example, algorithms recognize image content using techniques such as image classification or object detection and generate new data based on the learned patterns.
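As a thought experiment, such a two-dimensional classification could be captured in machine-readable form. The following sketch uses the terminology of this article; the field names and enum values are illustrative assumptions, not the actual taxonomy of ISO/IEC 42102.

```python
# Hypothetical record that files an AI system along the two dimensions
# discussed above: the method used and the capability exercised.
# The values mirror this article, not the standard's official taxonomy.
from dataclasses import dataclass
from enum import Enum

class Method(Enum):
    SYMBOLIC_AI = "symbolic AI"
    OPTIMIZATION = "optimization technique"
    MACHINE_LEARNING = "machine learning"
    HYBRID = "hybrid approach"

class Capability(Enum):
    PERCEPTION = "perception"
    KNOWLEDGE_PROCESSING = "knowledge processing"
    ACTION = "action"

@dataclass
class SystemClassification:
    system_name: str
    method: Method
    capability: Capability

# Example: an image classifier is a machine learning method that
# exercises the perception capability.
print(SystemClassification("traffic-sign-recognizer",
                           Method.MACHINE_LEARNING,
                           Capability.PERCEPTION))
```

Such a record would let a test description state, at a glance, which test strategies apply.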

Quality criteria and measurable metrics for the evaluation of AI systems in software testing

Quality assurance is a central aspect in the development of AI systems. Specific quality criteria play a decisive role in the evaluation of these systems. The most important learning methods in AI testing are:

  • Supervised learning: This involves training models with labeled data. Quality assurance measures focus on the accuracy of the predictions.
  • Unsupervised learning: In this case, the system recognizes patterns in unlabeled data. The robustness of the model fit is crucial here.
  • Reinforcement learning: AI agents learn by interacting with their environment. Information security becomes an important criterion to ensure that agents do not make harmful decisions.

An example of a specific metric is the confidence score, which evaluates the reliability of predictions in image processing systems. A high confidence score indicates that the model considers its decision highly likely to be correct. To measure the different quality criteria, the following aspects should be considered:

  • Correctness: How accurate are the results?
  • Robustness: How resilient is the system to interference or unexpected data?
  • Information security: How does the system protect against unauthorized access or tampering?

These criteria form the basis for an effective test description and help to evaluate and optimize the performance of AI systems.
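To make the criteria just listed tangible, here is a minimal sketch of how a confidence score can be derived from raw model outputs, plus a simple robustness probe. The “model” is a deliberately trivial stand-in, not a real vision system.

```python
# Confidence score via softmax and a naive robustness probe.
import math
import random

def softmax(logits):
    """Turn raw scores into probabilities; the maximum probability is a
    common (if imperfect) confidence score."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def toy_model(pixels):
    """Stand-in for an image classifier: returns raw scores (logits)
    for three classes from a trivial sum of the input."""
    s = sum(pixels)
    return [1.5 * s, 0.8 * s, 0.2 * s]

pixels = [0.8, 0.6, 0.9, 0.7]
probs = softmax(toy_model(pixels))
predicted = probs.index(max(probs))
print(f"predicted class {predicted}, confidence {max(probs):.2f}")

# Robustness: does the prediction survive small input perturbations?
noisy_probs = softmax(toy_model([p + random.gauss(0, 0.01) for p in pixels]))
stable = predicted == noisy_probs.index(max(noisy_probs))
print(f"prediction stable under noise: {stable}")
```

Information security, the third criterion, cannot be probed this simply; it typically calls for dedicated adversarial and access-control testing.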

Challenges in implementing the AI Act in relation to high-risk systems and the role of certification bodies in software testing of AI capabilities

The AI Act establishes a significant regulatory framework for organizations that develop or deploy artificial intelligence. The implications are profound, particularly for organizations implementing high-risk systems. These systems are classified as safety-relevant and are subject to strict requirements to ensure security and transparency.

Requirements for high-risk systems

  • Safety components: AI systems that act as safety components of a product are subject to special requirements.
  • Documentation requirements: Companies must create comprehensive technical documentation to provide evidence of compliance with the standards; a sketch of what such a record might look like follows below.
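The following record structure is purely illustrative. The binding content and format of the technical documentation are defined by the AI Act and its harmonized standards, not by this sketch; every field name here is an assumption.

```python
# Hypothetical machine-readable test record for a high-risk system's
# audit trail. Field names are illustrative, not prescribed by the AI Act.
import json
from dataclasses import dataclass, asdict

@dataclass
class TestRecord:
    system_id: str
    test_name: str
    quality_criterion: str   # e.g. correctness, robustness, security
    metric: str              # e.g. accuracy, confidence score
    result: float
    threshold: float

    @property
    def passed(self) -> bool:
        return self.result >= self.threshold

record = TestRecord(
    system_id="vision-module-v2",
    test_name="object recognition on the validation set",
    quality_criterion="correctness",
    metric="accuracy",
    result=0.973,
    threshold=0.95,
)
# Serialize for an audit trail that a certification body could review.
print(json.dumps(asdict(record) | {"passed": record.passed}, indent=2))
```

Keeping such records in a consistent, machine-readable form makes conformity assessments easier to prepare and to audit.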

Role of certification bodies

Certification bodies play a crucial role in the implementation of the AI Act. They are responsible for:

  1. Testing the conformity of high-risk systems with the specified requirements.
  2. Issuing the CE markings that enable market access.

These bodies must be evaluated by independent experts to ensure that they have the necessary expertise. The pressure on these institutions is high, as not only are timely audits required, but they also need to adapt quickly to rapidly changing technologies.

Future outlook on the development and standardization of test methods for artificial intelligence in software testing

The future of AI testing methods will be heavily influenced by advances in regulation and standardization. The importance of standardization cannot be overstated, especially when it comes to developing test descriptions for AI capabilities. Future developments could include the following aspects:

  • Extension of standards: International standards such as ISO/IEC 42102 will be further refined to address more specific requirements for high-risk systems.
  • Integration of quality criteria: The focus on quality and safety in AI systems will increase. Metrics such as the confidence score for vision systems will be standardized to enable objective evaluation.
  • Collaboration between stakeholders: Close collaboration between regulators, certification bodies and companies will be required to find workable solutions.

The continuous development of artificial intelligence testing methods will be crucial to ensure that software developments meet new regulatory requirements. In the future, a dynamic environment could emerge in which adjustments to standards can be implemented quickly in order to keep pace with technological advances.