Artificial intelligence (AI) and software testing are two important topics in today’s software and system development. Using them together, or one on top of the other, offers enormous potential for synergy.
Although artificial intelligence has been the subject of research for decades, it has made a highly publicized triumphal march in recent years. All tasks seem solvable, all human intelligence dispensable, the possible consequences controllable. There are many convincing demonstrations. One example is AlphaGo, an AI-controlled player that beat Lee Sedol, one of the world’s best Go players. Incidentally, Go is far more complex than chess, and its game sequences are therefore much harder to predict. Other AI applications can recognize the content of images with high accuracy. This opens up a wide range of applications, from the early diagnosis of dangerous diseases to the surveillance of public spaces. But is it really that simple?
Of course, the results presented seem very convincing. But is the path the AI takes to reach a result always as intuitive as it appears? Studies have shown, for example, that some images were not classified as horse pictures based on the horses actually depicted. Instead, they were classified based on the patch of forest that appears in many horse pictures, or on the signature of a photographer who frequently photographs horses. In this way, a few prematurely celebrated examples disenchanted the miracle of AI. It recalls the horse “Kluger Hans” (Clever Hans), which only appeared to be able to count.
In addition, failures such as the fatal accident involving an Uber test vehicle received intense media coverage, so that autonomous vehicles were soon considered a danger. It is then easy to overlook the fact that in Brandenburg alone, an average of two to three people are killed in road accidents every week. Even a less-than-perfect AI could offer advantages here. But other issues lie behind this as well. Accordingly, this technology is sometimes exaggeratedly praised and sometimes condemned before the facts are clear.
So I see exaggeration here in both directions, for better and for worse. Despite all the hype, AI has great potential, even in safety-critical applications. The prerequisite, of course, is that this technology can be adequately assured.
A number of questions arise in this context. In the following, I will touch on a few of them across various sub-topics as an introduction. As mentioned, science has been investigating the deeper questions on this topic for decades.
First of all, statistics play a major role here; they are used for the internal evaluation of situations, images, and so on. For binary classifiers, the confusion matrix compares prediction with reality: What is predicted correctly? Where, and in which way, is the AI wrong?
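The four cells of such a confusion matrix can be counted with a few lines of code. This is a minimal sketch; the label convention (1 = positive, 0 = negative) and the example data are purely illustrative.

```python
# Confusion matrix for a binary classifier: compare predictions to ground truth.
# Convention here: 1 = positive class (e.g. "tumor present"), 0 = negative.

def confusion_matrix(actual, predicted):
    """Return the counts (tp, fp, fn, tn) for binary labels."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # hit
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false alarm
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # miss
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # correct rejection
    return tp, fp, fn, tn

actual    = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
tp, fp, fn, tn = confusion_matrix(actual, predicted)
print(tp, fp, fn, tn)  # 3 hits, 1 false alarm, 1 miss, 3 correct rejections
```

From these four counts, all the common evaluation metrics can be derived.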
There are various metrics for evaluating these results, for example the harmonic mean of precision and recall (sensitivity), known as the F1 score. In any case, the weight attached to each kind of error varies by domain. For example, incorrectly diagnosing a tumor that does not actually exist (a false positive) is unpleasant but comparatively harmless. Failing to detect a tumor that does exist (a false negative), however, can be decisive for the patient’s life expectancy.
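As a sketch of how the F1 score reacts to the two kinds of error, the following computes precision, recall, and their harmonic mean from confusion-matrix counts; the numbers are invented for illustration.

```python
# Precision, recall (sensitivity), and their harmonic mean, the F1 score,
# computed from confusion-matrix counts. Example values are illustrative.

def f1_score(tp, fp, fn):
    precision = tp / (tp + fp)  # of all positive predictions, how many were right?
    recall = tp / (tp + fn)     # of all actual positives, how many were found?
    return 2 * precision * recall / (precision + recall)

# A classifier with balanced errors vs. one that misses many tumors (high fn):
print(round(f1_score(tp=8, fp=2, fn=2), 3))  # 0.8
print(round(f1_score(tp=4, fp=2, fn=6), 3))  # 0.5 -- low recall drags F1 down
```

Note that the F1 score still weights both error types symmetrically; in domains like tumor diagnosis, a recall-weighted variant may be more appropriate.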
In addition, the experienced tester naturally asks another question: Which of the quality assurance tools they have known for many years can be used here?
On the other hand, we testers are of course tempted by another idea: using the possibilities of artificial intelligence for software testing itself. There are interesting developments in this area as well. One application that stands out is performance testing. Here, an AI can detect anomalies in system behavior and system load as a function of the input data. These observations can then be used to push the system ever closer to its load limit, or beyond it.
Finding similarities and commonalities is useful in many other areas as well: error messages, test specifications, log files of the test object, the generation of test data from data format descriptions, or test sequences derived from code analysis. Another exciting topic is the use of an AI as a test oracle. This raises a further question: Can an AI that serves as a test oracle also be used as the system under test? And can it do the job better than the original?

The question of limits also arises: Which decisions can and should we leave to an AI? Some people are reminded of the trolley problem, which is already unsolvable for humans, or at least difficult to justify in most cases: if a fatal accident is unavoidable and you can still influence the outcome, who gets to live and who has to die?
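Returning to the idea of finding similarities: one very simple way to group error messages is to measure how many words two messages share (Jaccard similarity of their token sets). This is only a sketch of the principle; practical approaches often use embeddings or learned models, and the example messages are invented.

```python
# Sketch: grouping similar error messages by word overlap (Jaccard similarity).

def jaccard(a: str, b: str) -> float:
    """Similarity of two messages as |intersection| / |union| of their word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

m1 = "connection to database host refused"
m2 = "connection to database host timed out"
m3 = "null pointer exception in parser"
print(round(jaccard(m1, m2), 2))  # 0.57 -- high overlap, likely the same fault
print(round(jaccard(m1, m3), 2))  # 0.0  -- unrelated messages
```

The same idea carries over to test specifications or log files: clustering near-duplicates helps testers spot redundant cases and recurring failure patterns.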
These and other thoughts serve as an introduction to this highly interesting topic. It is economically significant, and many exciting years lie ahead of it.