AI has a lot to offer us, and it takes imagination to use it: Where do we apply it? What should it do? How should it work? Regardless of the area of application, the quality requirements are high, and conventional test methods do not get us very far here. Results are sometimes not reproducible and therefore unpredictable, so quality has to be defined differently before testing can even begin.
“What we had completely overlooked was that there was a timestamp from the camera on all the photos. And then the AI learned the timestamp - and nothing else” - Nils Röttger, Gerhard Runze
Nils Röttger has more than 15 years of experience in quality assurance. He was already involved in software testing during his studies at the University of Göttingen and has been working at imbus AG in Möhrendorf since 2008, currently as a senior consultant and project manager for mobile testing and AI testing. Through his many conference presentations and as an author of books and specialist articles, he constantly engages with current topics in testing.
Dr. Gerhard Runze holds a doctorate in electrical engineering from the Friedrich-Alexander University of Erlangen-Nuremberg and worked in the telecommunications industry from 1999 to 2015 in various roles, including as a developer and test team leader. Since 2015, he has been working at imbus AG as a senior consultant for software quality, specializing in embedded software, agile testing and AI, and is also active as a trainer for ISTQB® training courses. He is co-author of the “German Standardization Roadmap AI”, has contributed to the ISTQB® Certified Tester AI Testing curriculum and will publish a companion book on this topic in 2023.
Today we are talking about the quality assurance of AI systems, its challenges and its methods. Gerhard Runze and Nils Röttger share their personal experiences and give an insight into the complexity of testing AI. One thing is certain: testers need to change their mindset here, because conventional testing methods won’t get them very far.
Today I’m talking to Nils Röttger and Gerhard Runze about the exciting topic of quality assurance for artificial intelligence. Both are not only experts in their field, but also authors of the book ‘Basiswissen KI testen’. Their thoughts on quality in AI and how to test it effectively open up new perspectives and show that testing AI is far more than just a technical challenge.
The conversation began with a fundamental question: What does quality mean for an AI system? Nils and Gerhard pointed out that, alongside classic quality characteristics such as functionality and performance, aspects such as autonomy and ethical considerations must now also be taken into account. They emphasized in particular that the functional correctness of an AI is often a statistical quantity, which is a paradigm shift for many testers.
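As a minimal illustration of this statistical view, the following Python sketch evaluates a classifier against a labeled test set and passes or fails it based on an accuracy threshold rather than on individual expected outputs. The `model.predict` interface, the test set format and the 0.95 threshold are assumptions for the example, not details from the episode.

```python
# Sketch: functional quality as a statistical quantity rather than a per-case oracle.
# Assumes a hypothetical `model.predict(x)` interface and a list of (input, expected) pairs.
import math

def evaluate_accuracy(model, test_set, threshold=0.95):
    """Return (accuracy, 95% confidence half-width, verdict) for a classifier."""
    correct = sum(1 for x, expected in test_set if model.predict(x) == expected)
    n = len(test_set)
    accuracy = correct / n
    # Normal-approximation confidence interval for a proportion
    half_width = 1.96 * math.sqrt(accuracy * (1 - accuracy) / n)
    verdict = "PASS" if accuracy - half_width >= threshold else "FAIL"
    return accuracy, half_width, verdict
```

The verdict is deliberately conservative: the lower bound of the confidence interval, not the point estimate, has to clear the threshold.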
One of the biggest challenges in AI testing is the reproducibility of test results. As Nils explains, many decisions during the training of a neural network are made randomly, for example weight initialization or the shuffling of training data, which can make results difficult to reproduce. Nevertheless, it is crucial to keep the test procedure itself reproducible. This realization underlines how much testers need a new way of thinking when dealing with AI.
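One common countermeasure, sketched below under the assumption of a PyTorch-based training pipeline, is to pin all sources of randomness before training or running a test. These are standard seeding calls; bit-exact reproducibility can still depend on hardware, library versions and the operations used.

```python
# Sketch: pinning the sources of randomness so a training run or test can be repeated.
import os
import random
import numpy as np
import torch

def make_training_reproducible(seed: int = 42) -> None:
    random.seed(seed)                      # Python's built-in RNG
    np.random.seed(seed)                   # NumPy RNG (data shuffling, augmentation)
    torch.manual_seed(seed)                # PyTorch CPU and CUDA RNGs
    torch.use_deterministic_algorithms(True)            # fail loudly on non-deterministic ops
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"   # required by some CUDA ops for determinism
```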
Gerhard shared a particularly striking example: in a heating control project, an AI mistakenly learned the timestamp the camera had burned into the photos instead of the desired settings, a classic case of ‘shit in, shit out’. This aha moment showed how essential a deep understanding of the data and of an AI’s learning process is for successful testing.
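A simple way to catch this kind of failure after the fact is a confounder check: mask out the suspicious image region and see whether the model’s accuracy collapses. The sketch below assumes a hypothetical `model.predict` interface and an assumed pixel region for the burned-in timestamp; it illustrates the idea rather than the actual project setup.

```python
# Sketch of a "confounder check" inspired by the timestamp anecdote: blank out the
# region where the camera burns in its timestamp and compare accuracies.
from PIL import Image, ImageDraw

TIMESTAMP_BOX = (10, 10, 220, 40)  # assumed pixel region of the burned-in timestamp

def blank_timestamp(image: Image.Image) -> Image.Image:
    masked = image.copy()
    ImageDraw.Draw(masked).rectangle(TIMESTAMP_BOX, fill="black")
    return masked

def accuracy(model, samples):
    return sum(model.predict(img) == label for img, label in samples) / len(samples)

def confounder_check(model, samples, max_drop=0.05):
    """Flag the model if removing the timestamp changes its behaviour noticeably."""
    baseline = accuracy(model, samples)
    masked = accuracy(model, [(blank_timestamp(img), label) for img, label in samples])
    assert baseline - masked <= max_drop, (
        f"Accuracy dropped from {baseline:.2f} to {masked:.2f} once the timestamp was "
        "blanked out; the model may have learned the timestamp instead of the content."
    )
```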
The conversation then turned to specific methods for testing AI systems. Various approaches were discussed, from metamorphic testing to pairwise testing and A/B testing. These methods allow testers to address the unique challenges of AI testing and offer an exciting outlook on the future of software quality assurance.
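Metamorphic testing in particular lends itself to a short illustration: instead of requiring a known correct answer for every input, we check a relation that must hold between related inputs. The sketch below assumes an image classifier behind a hypothetical `classify` function and uses the relation “a slight brightness change must not flip the decision”; the concrete relation is an example, not one discussed in the episode.

```python
# Sketch of a metamorphic test: check a relation between related inputs
# rather than an expected output per input.
from PIL import Image, ImageEnhance

def metamorphic_brightness_test(classify, image_paths, factor=1.1):
    violations = []
    for path in image_paths:
        original = Image.open(path)
        brightened = ImageEnhance.Brightness(original).enhance(factor)
        if classify(original) != classify(brightened):
            violations.append(path)
    # A few violations may be tolerable for a statistical quality target;
    # report them instead of hard-failing on the first one.
    return violations
```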
Finally, Nils and Gerhard pointed out the importance of standardization in the field of AI testing. Projects such as the German Standardization Roadmap AI show the need for clear guidelines and ensure that quality assurance in the world of artificial intelligence does not become a game of chance.