Security Tests with Static Analysis
Analytical quality assurance offers a cost- and resource-saving way of checking software artifacts - such as requirements, UML diagrams, source code and test cases - against predefined rules and of measuring them using criteria such as quantity, complexity and quality. By using analytical quality assurance, errors can be found and rectified at an early stage.
The following five approaches can be used to ensure the quality of software:
The psychological approach includes all measures to influence people to work better - measures such as rewards, punishments, promotions and appeals to work ethics. The constructive approach includes all measures to improve working conditions, such as the provision of better work equipment (tools, languages, techniques, processes and so on) and of workplaces that are as modern as possible (premises, communication infrastructure, tools, social networks and the like).
The analytical approach includes all measures to control the software artifacts produced by the software workers - checking and measuring the documents, models, source code texts and test cases - with the aim of identifying quality deficiencies and deviations from the target and drawing the attention of the authors of the artifacts to them.
The empirical approach includes all test measures such as unit testing, integration testing, system testing and acceptance testing. Here, the behavior of the completed software is observed in a test environment under controlled conditions in order to detect deviations from the target behavior.
The subsequent approach includes all measures that serve to subsequently improve the quality of the software artifacts after they have been completed - measures such as re-engineering, refactoring and re-implementation. In contrast to the elimination of individual defects and errors by the software developer during development, this involves targeted actions relating to an entire system or subsystem with the aim of improving the quality of that system.
It is assumed that the degree of quality improvement can be measured, and in this article we will focus on the third, analytical approach. The other four approaches are equally important and should be pursued in parallel, but each of these approaches contains enough material for a book of its own.
Before we go into the analytical measures, it is first necessary to distinguish between the two terms “defect” and “error”. A defect is the violation of a quality rule, a deviation from the norm. The rule could be that code modules must not be nested more than three times or that each input parameter should be checked for plausibility.
A defect will not necessarily affect the behavior of a software product. There are defects that can very well lead to an error, e.g. if exception handling or safety checks are missing in the code or if use cases are not fully specified. However, most types of defects will only affect the maintenance and further development of the software, e.g. if the code modules are too tightly coupled or if arbitrary data names are used. Software artifacts are subject to the laws of software evolution: in order to retain their value and remain useful, they must be constantly developed further. This further development incurs costs. If you want to minimize these costs, you must ensure that the software becomes and remains capable of further development. From this follow the two goals of analytical quality assurance: to uncover defects that could later lead to errors, and to keep the software maintainable and capable of further development.
Errors are - in contrast to defects - problems with the behavior of the software. This means that the software behaves differently than expected. Either the software aborts or it produces incorrect results. Such misbehavior can only be detected when the software is executed. This is why testing is called the empirical approach to quality assurance. The software is executed in order to examine its behavior empirically. This always involves a comparison. The actual behavior is compared with the target behavior and any deviation is initially regarded as an error.
Errors are caused by inadequacies in the code or in the design - so-called defects. However, it is also possible that the target specification or expectation is incorrect. In this case, it is not the code that is faulty, but the specification or expectation. In any case, there is a discrepancy between the target and the actual and this needs to be eliminated. The purpose of empirical quality assurance is to uncover such discrepancies, also with regard to the performance and resilience of the software. Testing is unavoidable, but is a topic in itself and will not be dealt with further here.
In theory, analytical quality assurance is a broad, almost limitless field. In practice, it can be reduced to a few automatic checks. Software generally consists of four object types on four semantic levels: requirements documents, design models, source code and test cases.
Requirements documents and source code are both texts. Behind the UML models are structured data in the form of XML documents in which the design elements and their relationships are recorded. The UML diagrams are mapped from the XML texts and vice versa. The test cases are also converted into texts, namely into test scripts that are interpreted or compiled in order to generate test data and validate test results. Initially, however, the test cases are usually stored in tables with rows and columns in which the test case attributes are recorded.
Analytical quality assurance is about checking and measuring these four types of text. The various checking and measuring activities are summarized here under the generic term software audit. In the past, they were carried out manually under names such as “design reviews”, “code inspections” and “test audits”. The author Sneed himself was responsible for inspecting structograms and CHILL programs in the Siemens EWSD project (Elektronisches Wählsystem Digital, a digital electronic switching system) at the end of the 1970s. It took him about half a day to check just one module. The code inspection alone took over a year. Today - in the age of agile development - nobody can afford that anymore. Analytical quality assurance, like empirical quality assurance and testing, must be automated. The objects to be checked and measured are retrieved from the configuration management system and fed directly to the appropriate tool.
Requirements documents are checked and measured using a tool for the automatic analysis of natural language. The texts must first be marked up with either XML tags or keywords. Keywords are recommended because they make the document easier for the end user to write; it is best to include these words or tags in the text right from the start, where they also serve as signposts. Examples of such keywords in German:
| Type | German keyword | English meaning |
| --- | --- | --- |
| ACT | Akteur | Actor |
| CASE | Use-Case | Use case |
| INPT | Eingabe | Input |
| MASK | GUI | GUI / screen mask |
| OBJT | GO | Business object (Geschäftsobjekt) |
| OUTP | Ausgabe | Output |
| PATH | Hauptpfad | Main path |
| POST | Nachbedingung | Postcondition |
| PRE | Vorbedingung | Precondition |
| PROC | Verarbeitet | Processes |
| REFS | Implementiert | Implements |
| USES | Benutzt | Uses |
| REQU | FUNC-REQ | Functional requirement |
| REQU | NF-REQ | Non-functional requirement |
| RULE | GR | Business rule (Geschäftsregel) |
| TRIG | Auslöser | Trigger |
These keywords - keywords in context - are used to identify the text elements. In addition to the individual requirements, business rules, business objects, actors, interfaces and use cases are also identified. In a semi-formal requirements text, the project-wide elements such as business objects and business rules should be documented first, followed by the project-specific requirements, both functional and non-functional, and then by the system actors, system interfaces and interface patterns. Only at the end come the use cases as a bridge to the system design. The use cases can refer to the requirements that they fulfill, the business rules that they implement, the objects that they process, the interfaces that they serve and the actors that initiate them.
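To illustrate how such keyword markup can be exploited, the following Python sketch scans a requirements text and groups the tagged elements by type. The line format "ID (name): text" and the identifier prefixes are merely assumptions based on the calendar example that follows; this is not the actual SoftAudit implementation.

```python
import re
from collections import defaultdict

# Assumed identifier prefixes, taken from the calendar example below; the real
# keyword set would be configured per project (see the keyword table above).
ELEMENT = re.compile(r"^(FUNC-REQ|NF-REQ|GR|GO|UC)-(\d+)\b(.*)$")

def extract_elements(text):
    """Group all keyword-tagged elements of a requirements text by type."""
    elements = defaultdict(dict)
    for line in text.splitlines():
        match = ELEMENT.match(line.strip())
        if match:
            kind, number, rest = match.groups()
            elements[kind][f"{kind}-{number}"] = rest.strip(" :.-")
    return elements

if __name__ == "__main__":
    sample = """\
FUNC-REQ-02 (Mehrsprachigkeit): Der Wochentag kann in Deutsch, Franzoesisch oder Italienisch zurueckgegeben werden.
GO-01: Kalender
GR-02 (Kalenderanfang): Der Kalender geht davon aus, dass der 1.1.1901 ein Dienstag war.
"""
    for kind, items in extract_elements(sample).items():
        print(f"{kind}: {len(items)} element(s) - {', '.join(items)}")
```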
The following example shows a simple requirements specification for a calendar function.

Functional requirements

FUNC-REQ-02 (multilingualism): The day of the week can be returned in either German, French or Italian.
FUNC-REQ-03 (alignment): The text of the day of the week can be aligned to the left or right.
FUNC-REQ-04 (error handling): If the date cannot be converted into a day of the week, question marks “??????” should appear in the response field.

Non-functional requirements

NF-REQ-01 (response time): The response time for customer queries should be <= 1 second and the response time for customer orders <= 3 seconds.
NF-REQ-02 (load capacity): The service must be able to process at least 500 orders per hour without loss of performance.
NF-REQ-03 (availability): The service must be available at least 95 percent of the time, 24 hours a day, seven days a week.

Business objects

GO-01: Calendar
GO-02: Dates
GO-03: Days of the week

Business rules

GR-01 (implausible date handling): If a date is implausible, the day of the week is filled with “?”.
GR-02 (calendar starting date): The calendar assumes that January 1, 1901 was a Tuesday.
GR-03 (language codes): The following language codes apply in the company: German = “1”, French = “2”, Italian = “3”, English = “4”.

Use case

The use case that fulfills these requirements - “Wochentagermittlung” (weekday determination) - is specified in the table further below. The text analysis tool checks whether all mandatory properties and references of such a specification are present and whether they are consistent, for example whether every requirement, business rule and business object referenced by a use case is actually defined.
The text analyzer also checks whether all requirement sentences are formulated correctly. Chris Rupp and her colleagues at Sophist GmbH have defined rules and checks for how requirement sentences should be formulated - the so-called “Sophist rules”. For example, each requirement should be clearly identifiable, assignable and interpretable, and each requirement sentence should have a subject, object and predicate and avoid nominalizations (i.e. the compression of entire processes into a single noun). Representative rule violations include, among others, nominalizations and requirement sentences without a recognizable subject or predicate.
The SoftAudit tool can check compliance with such rules sentence by sentence. Finally, the analyzer can count the text elements and produce requirements metrics. Among other things, the requirements, business rules, object references, use cases and processing steps within the use cases are counted. Function points, data points and use case points are also counted. The aim is to measure the size, complexity and quality of the requirements documents.
| Attribute | Description |
| --- | --- |
| Designation | Wochentagermittlung (weekday determination) |
| Fulfills | FUNC-REQ-01, FUNC-REQ-02, FUNC-REQ-03, FUNC-REQ-04 |
| Implements | GR-01, GR-02, GR-03, GR-04, GR-05, GR-06, GR-07 |
| Functions | FUNK-01 |
| Receives | REQUEST-01 |
| Sends | RESPONSE-01 |
| Processes | GO-01, GO-02, GO-03 |
| Trigger | Nachricht_vom_Client (message from the client) |
| Actors | ClientProgramm |
| Preconditions | The client must be authorized. The date must be valid. |
| Postcondition | If fulfilled, the day of the week in German, French or Italian. If not fulfilled, day of the week = “???????”. |
| Main path | 1) Client sends a message with the date. 2) Service checks the date. 3) If the date is valid, the service looks up the day of the week in the weekday table. 4) If the language code is 1, the service fetches the German day name. 5) If the language code is 2, the service fetches the French day name. 6) If the language code is 3, the service fetches the Italian day name. 7) Service returns the selected day of the week. |
| Secondary path | 8) Service returns “??????”. |
| Exceptions | The service rejects the request if the client is not authorized. |
| Inherits | Standard date function |
| Uses | Date check |
| Extends | Client calendar |
| Comments | This service only applies to dates from 1900 onwards. |
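How the reference check described above might look in code can be sketched as follows. The data structures and relation names are assumptions made for this illustration; in SoftAudit the references would be taken from the marked-up text itself.

```python
# Identifiers are taken from the calendar example above; the dictionary layout
# is an assumption of this sketch, not the SoftAudit data model.
DEFINED = {
    "FUNC-REQ-01", "FUNC-REQ-02", "FUNC-REQ-03", "FUNC-REQ-04",
    "NF-REQ-01", "NF-REQ-02", "NF-REQ-03",
    "GO-01", "GO-02", "GO-03",
    "GR-01", "GR-02", "GR-03",
}

USE_CASE = {
    "name": "Wochentagermittlung",
    "fulfills": ["FUNC-REQ-01", "FUNC-REQ-02", "FUNC-REQ-03", "FUNC-REQ-04"],
    "implements": ["GR-01", "GR-02", "GR-03"],
    "processes": ["GO-01", "GO-02", "GO-03"],
}

def check_references(use_case, defined):
    """Return one deficiency message per reference to an undefined element."""
    deficiencies = []
    for relation in ("fulfills", "implements", "processes"):
        for ref in use_case.get(relation, []):
            if ref not in defined:
                deficiencies.append(
                    f"use case {use_case['name']}: '{relation}' references "
                    f"undefined element {ref}")
    return deficiencies

print(check_references(USE_CASE, DEFINED) or "no dangling references")
```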
To check and measure the design model, the SoftAudit tool uses an XML parser that processes the XML documents behind the UML diagrams. When parsing these documents, the model types such as classes, objects, methods, attributes, parameters, activities and use cases are recognized and counted. At the same time, the relationship types such as association, inheritance, usage and inclusion are recognized and evaluated. The model complexity results from the ratio of the model relationships to the model elements. The more relationships there are - relative to the number of elements - the greater the complexity of the model:
Model complexity = 1 - (model elements / model relationships)
The model size is derived from the number of model elements. The model quality is the ratio of the actual to the target. For example, each class should only have a limited number of dependencies on other classes (degree of coupling) and a limited number of attributes relative to the number of methods (degree of cohesion). Each class should have at least one attribute and two methods. The sequence diagrams may only contain references to classes that are already defined in a class diagram. In the activity diagrams, only steps of a specified use case may appear, which in turn are defined as methods in a class diagram. In this way, the consistency of the model is checked.
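The following sketch illustrates these model measurements with the formula given above and two of the class rules just mentioned. The thresholds and the in-memory representation of the model are assumptions; a real tool would obtain these counts from the parsed XML documents.

```python
from dataclasses import dataclass, field

@dataclass
class UmlClass:
    name: str
    attributes: list = field(default_factory=list)
    methods: list = field(default_factory=list)
    dependencies: list = field(default_factory=list)  # classes this class uses

MAX_DEPENDENCIES = 5  # assumed coupling limit; the real limit is project-specific

def model_complexity(n_elements, n_relationships):
    """Model complexity = 1 - (model elements / model relationships)."""
    if n_relationships == 0:
        return 0.0
    return max(0.0, 1.0 - n_elements / n_relationships)

def check_class(cls):
    """Apply two class rules: minimum content and limited coupling."""
    deficiencies = []
    if len(cls.attributes) < 1 or len(cls.methods) < 2:
        deficiencies.append(f"{cls.name}: should have at least one attribute and two methods")
    if len(cls.dependencies) > MAX_DEPENDENCIES:
        deficiencies.append(f"{cls.name}: too tightly coupled "
                            f"({len(cls.dependencies)} dependencies)")
    return deficiencies

classes = [
    UmlClass("Kalender", ["startDatum"], ["wochentag", "pruefeDatum"], ["Datum"]),
    UmlClass("Datum", [], ["istGueltig"], []),
]
relationships = sum(len(c.dependencies) for c in classes) + 3  # plus 3 associations
print("model complexity:", round(model_complexity(len(classes), relationships), 3))
for c in classes:
    for message in check_class(c):
        print("deficiency:", message)
```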
This is also where the first cross-check across semantic levels takes place. It is possible to check whether all business objects in the requirements documentation also appear as objects in the object model and whether all use cases specified in the requirements document appear in the use case diagrams. The consistency between the requirements document and the design model can therefore be checked here. Typical deficiencies in the design model are classes without attributes or methods, references to classes that are not defined in any class diagram, use cases that do not appear in any use case or activity diagram, and business objects from the requirements that are missing from the object model.
The analysis of the UML model in turn produces two reports: a deficiency report of the discrepancies between target and actual, and a measurement report with the design metrics (size, complexity and quality).
+-------------------------------------------------------------+
| R E Q U I R E M E N T   S I Z E   M E T R I C S             |
+-------------------------------------------------------------+
| Number of Function-Points              ======>    188       |
| Number of Data-Points                  ======>    961       |
| Number of Object-Points                ======>    726       |
| Number of Use Case Points              ======>    108       |
| Number of Test Case Points             ======>    255       |
+-------------------------------------------------------------+
+-------------------------------------------------------------+
| R E Q U I R E M E N T   C O M P L E X I T Y   M E T R I C S |
+-------------------------------------------------------------+
| Data Density                           ======>  0.236       |
| Functional Density                     ======>  0.100       |
| State Density                          ======>  0.386       |
| Conditional Density                    ======>  0.402       |
| Referential Density                    ======>  0.768       |
| Test Case Density                      ======>  0.440       |
| Overall Requirement Complexity Rating  ======>  0.388       |
+-------------------------------------------------------------+
+-------------------------------------------------------------+
| R E Q U I R E M E N T   Q U A L I T Y   M E T R I C S       |
+-------------------------------------------------------------+
| Degree of Completeness                 ======>  0.966       |
| Degree of Consistency                  ======>  0.874       |
| Degree of Stability                    ======>  0.896       |
| Degree of Changeability                ======>  0.678       |
| Degree of Testability                  ======>  0.364       |
| Degree of Conformity                   ======>  0.941       |
| Overall Requirement Quality Rating     ======>  0.786       |
+-------------------------------------------------------------+
The source code is what is checked and measured the most. A number of tools already exist for this purpose, such as “Software Sonar”, “FxCop” (.NET) and “PMD” (Java), which check no fewer than 180 different coding rules. The “SoftAudit” tool also checks source code in twelve different languages, from IBM Assembler to Java and PHP. Some rules for the code depend on the respective programming language. Other rules apply to a specific class of languages; for object-oriented languages, for example, class attributes should not be declared public. Finally, there are rules that are universally valid and apply to all languages. Such rules serve a threefold purpose: they promote the modularity and reusability of the code, its testability, and its comprehensibility.
Experienced developers will always question individual rules - and that’s a good thing. They provide an impetus to think about the existing rules and question whether they are up to date and make sense. In the end, however, a development team must agree on which rules should apply to them.
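As a taste of what such rule checking involves, here is a deliberately crude Python sketch for two of the rules mentioned in this article: the limit of three nesting levels and the object-oriented rule that attributes should not be public. The regular expression and the brace counting are simplifications chosen for this sketch; real analyzers such as PMD parse the code properly.

```python
import re

MAX_NESTING = 3  # rule from above: code should not be nested more than three times

# Crude pattern for "public <type> <name>;" field declarations in Java-like code.
PUBLIC_FIELD = re.compile(r"\s*public\s+\w+\s+\w+\s*(=[^;]*)?;")

def check_source(lines):
    """Brace-based rule check; comments and string literals are not handled."""
    deficiencies = []
    depth = 0
    for number, line in enumerate(lines, start=1):
        if PUBLIC_FIELD.match(line):
            deficiencies.append((number, "public attribute violates encapsulation"))
        for char in line:
            if char == "{":
                depth += 1
                # class body and method body count as levels 1 and 2 here
                if depth > MAX_NESTING + 2:
                    deficiencies.append((number, f"nesting deeper than {MAX_NESTING}"))
            elif char == "}":
                depth -= 1
    return deficiencies

sample = """\
public class Kalender {
    public int sprachCode;
    public String wochentag(int tag) {
        if (tag > 0) {
            if (tag < 8) {
                if (sprachCode == 1) {
                    if (tag == 3) { return "Mittwoch"; }
                }
            }
        }
        return "??????";
    }
}
""".splitlines()

for number, message in check_source(sample):
    print(f"line {number}: {message}")
```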
+--------------------------------------------------------+
| D E S I G N C O M P L E X I T Y M E T R I C S |
+--------------------------------------------------------+
| Class Interaction Complexity ======> 0.953 |
| Class Hierarchical Complexity ======> 0.166 |
| Class Data Complexity ======> 0.633 |
| Class Functional Complexity ======> 0.500 |
| State Complexity ======> 0.692 |
| State Transition Complexity ======> 0.420 |
| Activity Complexity ======> 0.580 |
| Usecase Complexity ======> 0.562 |
| Actor Interaction Complexity ======> 0.238 |
| Overall Design Complexity ======> 0.500 |
| Average Design Complexity ======> 0.524 |
+--------------------------------------------------------+
+--------------------------------------------------------+
| D E S I G N Q U A L I T Y M E T R I C S |
+--------------------------------------------------------+
| Class Coupling ======> 0.443 |
| Class Cohesion ======> 0.323 |
| Design Modularity ======> 0.705 |
| Design Portability ======> 0.347 |
| Design Reusability ======> 0.161 |
| Design Testability ======> 0.534 |
| Design Conformance ======> 0.773 |
| Design Completeness ======> 0.250 |
| Design Consistency ======> 0.563 |
| Design Compliance ======> 0.420 |
| Average Design Quality ======> 0.451 |
+--------------------------------------------------------+
In addition to such elementary statement-level checks, there are also rules for the code modules as a whole: rules that limit the size of the code modules, the number of encapsulated data attributes and the number of external relationships. These rules promote the modularity and reusability of the code. Other rules of this type limit the number of parameters in an interface and the width of database views; such rules promote testability. Rules for commenting and indenting nested lines of code, as well as for limiting line lengths, serve to improve comprehensibility.
Coding rules are therefore not there to nag developers, but to make it easier for developers to work together, just as traffic rules are there to organize road traffic. By statically checking the code and detecting rule violations, many potential problems are avoided. Above all, it prevents technical debt from spiraling out of control. In his book “Clean Code”, Robert C. Martin emphasizes the importance of static code analyzers for maintaining code quality.
Measuring the code goes hand in hand with checking it. Statements, statement types, data, data references, objects and code blocks are counted, and metrics for the size, complexity and quality of the code are calculated from these counts.
The degree of code quality should of course be as high as possible on the scale of 0 to 1, while the degree of complexity should be as low as possible. These measured values are important points of reference for evaluating the software as a whole and provide guidance for improvement measures. When the quality of the code is to be improved through refactoring measures, the quality before and after the improvement must be compared, and this requires a quantification of quality.
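The article does not disclose SoftAudit's actual formulas, but the principle of such a quantification can be sketched: each quality degree on the 0-to-1 scale is a ratio of fulfilled to checked targets, and the degrees can be averaged and compared before and after a clean-up. The metric names, weights and counts below are invented for illustration only.

```python
def degree(fulfilled, checked):
    """Quality degree = fulfilled targets / checked targets, bounded to 0..1."""
    return 0.0 if checked == 0 else min(1.0, fulfilled / checked)

def weighted_average(degrees, weights):
    """Weighted average quality over all measured degrees."""
    total = sum(weights[name] for name in degrees)
    return sum(degrees[name] * weights[name] for name in degrees) / total

# Invented counts: e.g. 53 of 200 modules satisfied the modularity rules before
# refactoring, 100 of 200 afterwards.
before = {"modularity": degree(53, 200), "testability": degree(61, 200)}
after  = {"modularity": degree(100, 200), "testability": degree(110, 200)}
weights = {"modularity": 1.0, "testability": 1.0}

for name in before:
    print(f"{name}: {before[name]:.3f} -> {after[name]:.3f}")
print(f"weighted average quality: {weighted_average(before, weights):.3f} -> "
      f"{weighted_average(after, weights):.3f}")
```

A before/after comparison of this kind is shown in the following table.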
| Quality Metrics | Before refurbishment | After refurbishment |
| --- | --- | --- |
| Degree of Modularity | 0.265 | 0.501 |
| Degree of Portability | 0.862 | 0.842 |
| Degree of Reusability | 0.960 | 0.960 |
| Degree of Testability | 0.304 | 0.548 |
| Degree of Convertibility | 0.120 | 0.242 |
| Degree of Flexibility | 0.880 | 0.880 |
| Degree of Conformity | 0.148 | 0.563 |
| Degree of Maintainability | 0.426 | 0.494 |
| Weighted Average Quality | 0.495 | 0.628 |
For a long time, testware was neglected, even though test cases are essential for every software system. It never occurred to anyone to check or even measure it. Testware was never regarded as part of the actual software product; it is only delivered to customers in exceptional cases. But the same argument would apply to the requirements specification and the design model. It has now been recognized that their quality must also be assured - and this applies to the testware as well.
Testware refers to the test procedures, test cases and test scripts that are used to test the software system and its components. Test scripts are like program code: they have a formal syntax and can be parsed like code, and there are also rules for how they should be structured. The problem here, as with the requirements, is that there is no binding language standard - every test tool manufacturer offers its own scripting language. Nevertheless, anyone who wants to can write a tool to check and measure test scripts, and for some script types there are ready-made analysis tools, for example for web service test scripts like the following:
if (operation = "getWeekDay");
   if (response = "getWeekDay1Response");
      assert out.$ResponseTime < "1100";
      if (object = "return");
         assert out.P1_TT = "16";
         assert out.P1_MM = "10";
         assert out.P1_CE = "18";
         assert out.P1_JJ = {81:89};
         assert out.LANG_CODE = {1:3};
         assert out.DIRECTION = "L";
         assert out.DAY_NAME = "Mittwoch";
         assert out.RETURN_CODE = {1:3};
      endObject;
   endResponse;
endOperation;
SoftAudit assumes that test cases are stored in an Excel spreadsheet or a relational database table in which each column contains a specific attribute of the test case. It is up to the user to define the names and types of the test case attributes and to record them in a parameter list. The test tool uses this parameter list to evaluate the test case table. It then checks whether all mandatory attributes have been specified, whether the test cases have been classified according to objective, purpose and type, whether the test cases have been automated and whether the test cases have already been executed. Because the test cases are related to requirements or change requests, it is also checked whether the referenced requirements actually exist. If so, a link is created between the test cases and the requirements. By inverting those links, it is determined which test cases belong to which requirements or use cases and business rules. There should be no requirement or use case without a test case. All links between test cases and use cases as well as between test cases and requirements are documented by the tool.
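A minimal sketch of this test case audit might look as follows. The column names and the semicolon-separated format are assumptions; in SoftAudit they would come from the user-defined parameter list. The coverage figure at the end corresponds to the requirement test coverage rate in the report below.

```python
import csv
import io

# Assumed test case table; in practice this would be an exported Excel sheet
# or database table with user-defined columns.
TESTCASE_TABLE = """\
TestCaseId;Requirement;Automated;Executed
TC-01;FUNC-REQ-01;yes;yes
TC-02;FUNC-REQ-02;yes;no
TC-03;FUNC-REQ-99;no;no
"""
REQUIREMENTS = {"FUNC-REQ-01", "FUNC-REQ-02", "FUNC-REQ-03", "FUNC-REQ-04"}

def audit_testcases(table, requirements):
    """Check test case references and compute the requirement test coverage."""
    tested, dangling = set(), []
    for row in csv.DictReader(io.StringIO(table), delimiter=";"):
        reference = row["Requirement"]
        if reference in requirements:
            tested.add(reference)
        else:
            dangling.append((row["TestCaseId"], reference))
    untested = requirements - tested
    coverage = len(tested) / len(requirements)
    return untested, dangling, coverage

untested, dangling, coverage = audit_testcases(TESTCASE_TABLE, REQUIREMENTS)
print("requirements without test case:", sorted(untested))
print("test cases referencing unknown requirements:", dangling)
print(f"requirement test coverage rate: {coverage:.3f}")
```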
+----------------------------------------------------------+
| S Y S T E M C O N S I S T E N C Y M E T R I C R E P O R T |
+----------------------------------------------------------+
| LANGUAGE: GERMAN/UML/C++/TCS | DATE: 22.06.13 |
| SYSTEM: STORAGE | PAGE: 3 of 19 |
+----------------------------------------------------------+
| E N T I T Y C O U N T S |
| Number of System TestCases ======> 356 |
| Number of Requirements to be tested ======> 151 |
| Number of Requirements with TestCases ======> 124 |
| Number of Code Components to be tested ======> 229 |
| Number of Code Components with TestCases ======> 80 |
+----------------------------------------------------------+
| R E L A T I O N C O U N T S |
| Number of TestCase/Requirement Relations ======> 699 |
| Number of TestCase/Component Relations ======> 10086 |
+----------------------------------------------------------+
| D E F I C I E N C Y C O U N T S |
| Number of Requirements with no TestCase ======> 27 |
| Number of Code Components with no TestCase ======> 149 |
+----------------------------------------------------------+
| C O V E R A G E M E T R I C S |
| Requirement Test Coverage Rate ======> 0.821 |
| Code Component Test Coverage Rate ======> 0.349 |
+----------------------------------------------------------+
For the overall evaluation of a software product, it is necessary to summarize the test and measurement results of all sub-products. A whole is only as good as the sum of its individual parts plus all the relationships between the individual parts.
Software systems are more than just code. They consist of the requirements documents, the design models, the code sections and the test cases. These layers must be complete and consistent. Therefore, in addition to checking individual artifacts, their consistency must also be checked. It must be possible to trace the code modules back to the overlying architecture and requirements description. This is done either via common identifiers, via comments that refer to requirement elements, or via the test cases. Test cases link the code modules to the requirements. Each requirement has a test case that confirms the fulfillment of that requirement, and this fulfillment takes place in the code. The quality of the overall system depends on the visibility of all internal and external dependencies. Hence the effort to make those relationships visible.
The consistency check of the software begins with the requirements documentation. Each functional requirement must be fulfilled by a use case and confirmed by at least one test case. Each use case must in turn be implemented by one or more code modules. Each code module or class must be assigned to at least one use case. Static analysis alone makes it possible to recognize and register these links. The missing links can then be identified as quality defects. Typical consistency defects are requirements without a use case or test case, use cases that are not implemented by any code module, and code modules that cannot be assigned to any use case.
The responsible tester or quality inspector receives a report on the missing relationships in the system and can refer the developers to them without having to test.
Completeness and consistency make up one side of static software quality assurance. It is realized by comparing the measurement results. The other side is conformity with the general and product-specific quality requirements. These requirements are expressed not only in the requirements documents, but also in the numerous publications on the subject of software development, which document 50 years of experience in the construction of software systems.
The painful experience with numerous bad solutions shows what a good solution should look like. The rules for a good solution are formulated in metrics that set a kind of target benchmark for the software. A metric is nothing more than a quantified empirical value (e.g. software modules should only be loosely coupled so that each one can be developed further without influencing the others, or software requirements should be formulated so precisely that no misunderstandings can arise). Tom McCabe’s much-cited cyclomatic number is ultimately just a rule of thumb for limiting process complexity. The same applies to other measures such as inheritance depth and class size. The implementation of such empirical values can best be evaluated using quantified measures.
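McCabe's cyclomatic number is a good example of such a quantified rule of thumb: count the binary decisions in a module and add one. The following sketch approximates it by counting decision keywords in the source text, which is cruder than the control-flow-graph definition but enough to flag overly complex modules; the threshold of 10 is the commonly cited limit, not a SoftAudit setting.

```python
import re

# Decision points in C-/Java-like source; counting keywords is only an
# approximation of the control-flow-graph definition of v(G).
DECISIONS = re.compile(r"\b(if|while|for|case|catch)\b|&&|\|\|")
THRESHOLD = 10  # commonly cited upper bound per module

def cyclomatic_number(source):
    """v(G) = number of binary decisions + 1."""
    return sum(1 for _ in DECISIONS.finditer(source)) + 1

module = """
String wochentag(int tag, int sprache) {
    if (tag < 1 || tag > 7) { return "??????"; }
    if (sprache == 1) { return deutsch[tag]; }
    if (sprache == 2) { return franzoesisch[tag]; }
    return italienisch[tag];
}
"""
v = cyclomatic_number(module)
print(f"v(G) = {v}", "(acceptable)" if v <= THRESHOLD else "(too complex)")
```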
The aim should be to bring together the measured values from the analysis of all software artifact types in a central metrics database and compare them there. Such a metrics database requires several import interfaces to record the measured values from the various analysis tools and a graphical user interface to present users with various views of the figures. The aim is to express the quality of the software in figures. The metrics database system by Sneed has already proven itself for this purpose in several evaluation projects. In these projects, software systems with up to three million code statements and 40,000 function points were checked and measured in order to draw conclusions for the further development of those systems.
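Conceptually, such a metrics database only needs to accept (artifact type, metric, value) records from the individual analyzers and make them comparable. The following SQLite sketch shows the idea; the schema is an assumption, and the sample values are taken from the reports shown earlier in this article.

```python
import sqlite3

# Assumed import format: each analyzer delivers (artifact, metric, value) rows.
rows = [
    ("requirements", "quality",  0.786),
    ("design",       "quality",  0.451),
    ("code",         "quality",  0.628),
    ("testware",     "coverage", 0.821),
]

con = sqlite3.connect(":memory:")  # a file-based database would be used in practice
con.execute("CREATE TABLE metrics (artifact TEXT, metric TEXT, value REAL)")
con.executemany("INSERT INTO metrics VALUES (?, ?, ?)", rows)

for artifact, value in con.execute(
        "SELECT artifact, value FROM metrics WHERE metric = 'quality'"):
    print(f"{artifact:13s} quality = {value:.3f}")
```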
Many decision-makers in IT are skeptical about the topic of “static software analysis”. They do not recognize the connection between the quality of system design and the quality of system behaviour. As a result, they focus their attention and resources on what they do understand, namely empirical quality assurance - testing according to the motto “the proof of the pudding is in the eating”. What they overlook is that the majority of problems with software behavior can be traced back to its design. Design flaws often lead to behavioral errors. It is much cheaper to uncover such defects through static analysis than in an elaborate test. Paul Duvall emphasizes this in his book on “Continuous Integration”.
The other goal of analytical quality assurance - ensuring further development - is often completely ignored. Software managers are too busy with their daily problems to take a look at the future.
In doing so, they overlook the fact that the causes of their day-to-day problems largely lie in the short-sightedness of the past. If they invested more in ensuring system quality, they would have fewer problems with the maintenance and further development of their systems later on.
The means for analytical quality assurance are now so advanced that the costs are hardly significant. They are minimal compared to the benefits that such analyses bring. It is therefore to be expected that analytical quality assurance will become increasingly important in the future. Although it will never completely replace testing, it can uncover many quality defects that never come to light during testing.