
Software Testing of and with AI

Artificial intelligence (AI) and software testing are two important topics in today's software and system development. Combining them, whether by testing AI systems or by using AI for testing, offers the opportunity for enormous synergies.

Although artificial intelligence has been the subject of research for decades, it has enjoyed a very media-effective triumphal march in recent years. All tasks seem solvable, all human intelligence unnecessary, the possible consequences controllable. There are many convincing demonstrations. One example is AlphaGo, an AI-controlled player that beat Lee Sedol, one of the world's best Go players. Incidentally, Go is much more complex than chess, and game sequences are therefore much harder to predict. Other AI applications can accurately recognize the content of images, which opens up a wide range of applications, from the early diagnosis of dangerous diseases to the monitoring of public spaces. But is it really that simple?

Hypes

Of course, the results presented seem very convincing. But is the path the AI takes to reach a result always really intuitive? Studies have shown, for example, that some images were classified as horse images not because of the horses actually depicted, but because of the patch of forest that also appears in many horse pictures, or because of the signature of a photographer who often takes pictures of horses. In this way, some prematurely lauded examples disenchanted the miracle of AI. It brings to mind the horse "Clever Hans", which only appeared to be able to count and was in fact reacting to unconscious cues from its handler.

Failures

In addition, failures such as the fatal accident involving an Uber vehicle were exploited by the media, so that autonomous vehicles were soon considered a danger. It is then easy to overlook the fact that in Brandenburg alone, an average of two to three people are killed in road accidents every week. Even a less than perfect AI could offer advantages here. But there are other issues behind this. Accordingly, this technology is sometimes exaggeratedly praised and sometimes condemned before the connections are clear.

So I see exaggeration here, in both directions. Despite all the hype, AI has a lot of potential, even in safety-critical applications. The prerequisite, of course, is that this technology can be properly quality-assured.

A number of questions can be asked in this context. In the following, I will touch on a few of them across various sub-topics as an introduction. As mentioned, science has been addressing the deeper questions on this topic for decades.

Evaluation of the AI

First of all, statistics play a major role here; they are used for the internal evaluation of situations, images, and so on. For binary classifiers, the confusion matrix can be used to compare prediction and reality: What is predicted correctly? Where and how is the AI wrong?
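To make this concrete, here is a minimal sketch of how such a confusion matrix can be computed for a binary classifier. The labels and predictions are hypothetical example data, not from any real system:

```python
# Minimal sketch: building a confusion matrix for a binary classifier.
# `truth` and `preds` below are hypothetical example data.

def confusion_matrix(ground_truth, predictions):
    """Count true/false positives and negatives (1 = positive class)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(ground_truth, predictions):
        if predicted and actual:
            tp += 1       # correctly predicted positive
        elif predicted and not actual:
            fp += 1       # predicted positive, but actually negative
        elif not predicted and actual:
            fn += 1       # missed an actual positive
        else:
            tn += 1       # correctly predicted negative
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

# Example: 1 = "tumor detected", 0 = "no tumor"
truth = [1, 1, 0, 0, 1, 0]
preds = [1, 0, 0, 1, 1, 0]
print(confusion_matrix(truth, preds))  # {'tp': 2, 'fp': 1, 'tn': 2, 'fn': 1}
```

The four counts answer exactly the two questions above: the diagonal (tp, tn) shows what is predicted correctly, the off-diagonal (fp, fn) shows where and how the AI is wrong.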

There are various metrics for evaluating these results, for example the harmonic mean of precision and sensitivity (recall), also known as the F1 score. In any case, it is clear that the weight attached to each kind of error varies by domain. For example, incorrectly diagnosing a tumor that does not actually exist (a false positive) is comparatively harmless, whereas failing to detect a tumor that does exist (a false negative) can be decisive for the patient's life expectancy.
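As a small sketch, the F1 score can be derived directly from the confusion-matrix counts; the numbers used here are invented for illustration:

```python
# Minimal sketch: precision, recall (sensitivity), and the F1 score
# computed from confusion-matrix counts. The counts are example values.

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)   # of all alarms, how many were real?
    recall = tp / (tp + fn)      # of all real cases, how many were found?
    return 2 * precision * recall / (precision + recall)

# Example: 90 tumors found, 10 false alarms, 30 tumors missed
print(round(f1_score(tp=90, fp=10, fn=30), 3))  # 0.818
```

Note that the F1 score weights false positives and false negatives equally; in a domain like tumor diagnosis, where a miss is far worse than a false alarm, a recall-weighted variant may be the more appropriate choice.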

Test-Know-how

In addition, experienced testers naturally ask another question: which of the quality assurance tools they have known for many years can be used here?

  • Do white-box test procedures even make sense here, or are they more like the still controversial tests used to measure human intelligence?
  • Does it make sense to divide testing into different test levels, as we know from the V-model? For complex systems that embed one or more AI-based algorithms, this makes perfect sense. But does it also make sense for machine learning models with many intermediate layers? This leads in the direction of explainability of the implementation.
  • What do we actually pay attention to during testing? Is it just a matter of the algorithm producing better results than its predecessor, or do we subdivide more precisely, into functional and non-functional tests? What about IT security? Even minimal changes to the design of traffic signs can have an impact, for instance if an autonomous vehicle interprets the "30" on a km/h sign as "80" and wants to drive through the city at that speed. The effects of inconsistent situations, such as a stop sign on the highway, can be just as disastrous.
  • The question also arises as to when a self-learning system should actually be allowed to learn. Permanently, while in use? If so, a commuter's self-driving vehicle could soon be overtrained for the peculiarities of the daily route, and everything else would be "forgotten". Or should the AI only be allowed to learn during servicing or development? What restrictions apply, depending on the application domain?

AI for the test

On the other hand, we testers are of course tempted by another idea: using the seemingly unlimited possibilities of artificial intelligence for software testing itself. There are interesting developments in this area, too. One application that stands out is performance testing. Here, an AI can detect anomalies in system behavior and system load depending on the input data. These observations can then be used to push the system ever closer to its load limit, or beyond it.
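As a much simpler stand-in for such AI-based anomaly detection, even a plain statistical outlier check conveys the idea: measurements that deviate strongly from the usual distribution are flagged for investigation. The response times below are invented example data:

```python
# Minimal sketch: flagging response-time anomalies with a z-score,
# a simple statistical stand-in for the AI-based anomaly detection
# described above. The measurements are hypothetical (milliseconds).
import statistics

def find_anomalies(samples, threshold=2.0):
    """Return samples deviating more than `threshold` standard deviations."""
    mean = statistics.mean(samples)
    stdev = statistics.stdev(samples)
    return [s for s in samples if abs(s - mean) / stdev > threshold]

response_times_ms = [101, 99, 103, 98, 102, 100, 97, 350]
print(find_anomalies(response_times_ms))  # [350]
```

An AI-based detector generalizes this idea: instead of a fixed threshold on one metric, it learns the normal joint behavior of many metrics under varying input data and flags deviations from that learned model.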

Detecting similarities and commonalities is useful in many other areas: in error messages, test specifications, and test-object log files, or for generating test data from data format descriptions and test sequences from code analysis. Another exciting topic is using an AI as a test oracle. This raises a further question: can an AI that serves as a test oracle also be used as a system under test, and can it perform even better than the original? The question of limits also arises: which decisions can and should we leave to an AI? Some people are reminded of the trolley problem, which is already unsolvable for humans, or at least difficult to justify in most cases: if a fatal accident is unavoidable and you can still influence the outcome, who gets to live and who has to die?
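The similarity-based grouping of error messages mentioned above can be sketched with a plain textual similarity measure; the log lines are made up, and a real system would use a learned similarity model rather than `difflib`:

```python
# Minimal sketch: grouping similar error messages by textual similarity,
# a stand-in for the AI-based similarity search described above.
# The log lines are invented example data.
from difflib import SequenceMatcher

def group_similar(messages, threshold=0.8):
    """Greedily cluster messages whose similarity exceeds `threshold`."""
    groups = []
    for msg in messages:
        for group in groups:
            # Compare against the first message of each existing group.
            if SequenceMatcher(None, msg, group[0]).ratio() >= threshold:
                group.append(msg)
                break
        else:
            groups.append([msg])  # no similar group found: start a new one
    return groups

logs = [
    "Timeout while connecting to host db-1",
    "Timeout while connecting to host db-2",
    "NullPointerException in ReportService",
]
print(len(group_similar(logs)))  # 2: the timeout messages form one group
```

Such grouping reduces thousands of raw log lines to a handful of distinct failure patterns, which is exactly where learned similarity measures can outperform simple string matching.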

These and other thoughts are an introduction to this highly interesting topic, which is also economically significant and has many exciting years ahead of it.
