Metamorphic Testing for C++ Software Libraries

by Andrei Lascu

The aim of the project is to provide a generalized infrastructure which generates randomly fuzzed inputs and then performs metamorphic testing on these inputs for a specific C++ software library under test. The test generation for the library under test is guided by two configuration files, in YAML format, that have to be provided by the user: one of them declares how the input data is fuzzed, while the other declares metamorphic relations, as well as generation rules for special objects (e.g. empty sets, universe sets).

At the moment, the infrastructure is split into three main components:

api_fuzzer is the component which, based on the corresponding configuration file, emits API calls required to build some random input data; the validity and quality of the input data is primarily driven by the configuration file provided by the user
set_meta_tester is the component which, given an input variable produced by the api_fuzzer, will produce a number of metamorphic variants using metamorphic relations provided in the corresponding configuration file
test_emitter is the entry-point, which links the two other components together and additionally synthesizes the other required parts of the output test file (e.g. includes, main function definition)

In addition to the infrastructure itself, we provide configuration files for input fuzzing and metamorphic testing of isl and PPL.

The project is publicly available at its Github repo.

Background

In this section, we provide a brief overview of metamorphic testing and the main targeted library, isl, along with some references for further reading.

Metamorphic Testing

In testing, one of the biggest issues to overcome is what is known as the oracle problem. That is, for a given test input, an oracle is required which is able to determine what the correct output is. For some small problems, the oracle can be a human tracing the program instructions. However, a golden oracle, one that is able to tell us to output for all test cases in all systems, is often unavailable even for moderately-complex systems.

Metamorphic testing is one technique which is able to overcome this issue. The main idea behind metamorphic testing is using equivalence relations (known as metamorphic relations) in order to generate more test inputs, increasing overall coverage of the test suite. The metamorphic relations are usually manually defined by an expert of the system under test, although some work exists which attempts to synthesize such relations via machine learning.

isl

The integer set library (isl) is a software library for Presburger arithmetic. It offers implementations for various arithmetic objects, such as sets, maps or piece-wise affine expressions, as well as a rich set of operations over them (e.g. convex hull for sets, arithmetic operations over piece-wise affine expressions). The rich set of mathematical operations was what inspired us to start with metamorphic testing for isl, as there are numerous metamorphic relations to be defined using these operations.

One application of isl is in polyhedral compilation, which involves reasoning about run-time execution of programs and possible rewrites to make better use of existing resources. For example, one could rewrite array accesses in a loop to make better use of memory locality, or merge two loops into one by reasoning about correlations between memory accesses. isl is used to model iteration spaces, data dependencies, memory access, execution schedules, all of which are either analysed or optimised in the context of polyhedral scheduling.

Infrastructure discussion

In this section, we discuss how the fuzzing and metamorphic testing phases of the test generation are performed. These are the main two components of the full test generation, with the remainder essentially being gluing the two together and adding the rest of the requirements to build a valid C++ program.

Fuzzing

The fuzzing process involves generating a sequence of API calls of the library under test in order to generate one object which is to be taken as input to the metamorphic testing process. The method of generating such object is up to the user, and of course there are various ways of generating the same object time, some better than others.

The generation process is described by the user via a YAML configuration file. Currently, the user must provide the following:

the headers to be included
inputs to the fuzzer, with corresponding types and ranges; these inputs will be randomly generated during the fuzzing process
object types in the library the fuzzer should know of; these don’t have to be all the types defined by the library, but just those types required to generate the final object
- some types can be declared as singleton_types, meaning that the fuzzer should only instantiate one object of that type in the generation process
- also, the fuzzer is aware of two primitive types at the moment, namely unsigned ints and strings
functions to be used in the generation process, with the following information:
- if the function is a member function, then the member_type field should contain the name of the type the function is a member of
- if the function returns a value, then the return_type field should be the type of the return object
- param_types should be a list of type names that the function takes as arguments
constructors for object types; these are special functions which do not require explicit member_type or return_type definitions
a sequence of operations (at the moment, this means function calls) which should yield the fuzzed object; at the moment, there are two types of operations:
- func declares that a function call should be generated; the user can specify the following:
  - the func field represents the function name
  - the return field specifies which variable the result should be returned to
  - the target field specifies what variable the function should be applied to
  - the func_params field specifies the parameters to the function call and should be used if at least one of the parameters should have a special value
- for represents a for loop comprised of function calls; the user can define the counter field, which should be a range declaring the start and end values of the iteration counter. Note that these values must both be integers. In addition, the user can also define the fields for the underlying function call, as described above.

In addition to the above, the fuzzer is able to interpret special values in the code. These are:

<string=N> is interpreted as a string with value N
<new=N> is interpreted as a new object of type N (i.e. ensure that a new object is created rather than reusing an existing one)
<var=N> is interpreted as a variable with name N; the type is derived from the function declaration
<loop_counter> can only be used in a for type operation and represents the current value of the loop counter
<input=N> is interpreted as the value of input N, where N must be the name of one of the defined inputs

The fuzzer reads in one-by-one the sequence of operations used to define how to generate the end object. When it calls a function, all the required objects (e.g. the target object to call the function on, the result object and the function parameters) are either chosen from the existing objects, or some are generated on the spot. The generation is also done by randomly choosing a suitable function (i.e. one that returns an object of that type), thus it could be a constructor or another function call. This leads to a chain reaction of multiple objects being created across various function calls by asking the fuzzer to only execute one function.

For our testing, we have prepared two fuzzing strategies for isl sets. The first one creates piece-wise affine expressions via arithmetic operations and then uses those expressions as constraints over integer sets. The issue with this approach is that, as we increase the number of constraints to generate, they quickly become incompatible, leading to fuzzing sets which evaluate to false (the rate of false sets generated are roughly 55% over 10k tests). Nevertheless, this was the main approach used and it was successful in finding bugs. Recently, after discussion with the developers of isl, we came up with another approach for generating sets, based on points. We define a number of points in the space and then use those to create a set. The issue at the moment is that, since the points are randomly generated from existing functions and the constructor does not explicitly take values for the dimensions of the points, but instead sets all dimensions to 0, most of the generated points are set to 0 in our sets. We have a plan to guide the generation process to make minimal use of the constructor, but that is not implemented at this point.

Metamorphic relations

The metamorphic relations are similarly described in a YAML configuration file. They are defined by using actual library code, as you would normally write a program with. For example, our isl metamorphic relations file defines a list of union relations thus:

union:

%m = %1.unite(%2)
%m = $1.complement().intersect(%2.complement()).complement()

This means that both both these sequences of calls are equivalent to performing the union of two sets. In the notation above, %1 and %2 both represent input objects, which mean isl sets for this particular configuration file. Furthermore, all the function calls used (e.g. unite, complement, intersect) are functions that exist in isl and can be applied to sets. Finally, %m is a notation representing the current metamorphic variable for the test and it means that we must store the result in the variable (i.e. the input objects are not modified in-place).

Further, one might observe that the second relation is an application of a DeMorgan law translated into isl function calls. This shows that we can use knowledge from other fields to generate rich metamorphic relations.

The following table describes the sort of metamorphic relations we have defined for isl and currently employ in our testing:

Relation type	Metamorphic relation
Identity	`%1.unite(%1)`
	`%1.unite(%e)`
	`%1.intersect(%1)`
	`%1.intersect(%u)`
	`%1.subtract(%e)`
	`%u.subtract(%1.complement())`
	`%1.intersect(%1.unite(%2))`
	`%1.unite(%1.intersect(%2))`
	`%1.complement().complement()`
	`%1.coalesce()`
	`%1.detect_equalities()`
Complement	`%1.complement()`
	`%u.subtract(%1)`
Subtract	`%1.subtract(%2)`
	`%1.intersect(%2.complement())`
Union	`%1.unite(%2)`
	`%1.complement().intersect(%2.complement()).complement()`
Intersect	`%1.intersect(%2)`
	`%1.complement().unite(%2.complement()).complement()`

Note that in addition to the modifier for input objects (i.e. %1 and %2), there are two more modifiers (%u and %e), which represent special sets. We also have a list of generators, where we define how to obtain such sets. In this instance, %u is meant to be the universe set, while %e is meant to be the empty set. The generator relations are:

Generation object	Relation
Universe	`%1.unite(%1.complement())`
	`%e.complement()`
Empty	`%1.intersect(%1.complement())`
	`%1.subtract(%1)`
	`%1.subtract(%u)`
	`$1.intersect(%e)`
	`%u.complement()`

As can be observed, these relations can also be chained together, leading to increasingly complex test cases being produced.

Our metamorphic test generation comprises of:

using the existing relation types, create a relation chain, up to a given limit; a relation chain represents the sequence of relation types to apply (e.g. Identity-Subtract-Identity)
For each relation in the relation chain, select a corresponding metamorphic relation; we call the result an instantiated relation chain
Replace all the modifiers (i.e. the variables starting with %) with corresponding variable names or generator relations; this yields a metamorphic variant
Add the metamorphic check, which is similarly defined in the metamorphic configuration file as a relation
Repeat the above three steps up to a given number of times; we cache each instantiated relation chain to ensure that we do not create the same metamorphic variant multiple times. If we happen to create the same instantiated relation chain, we try again, selecting other metamorphic relations. There is a hard limit to the number of times we attempt this, after which we simply give up on trying to generate further metamorphic variants.

The full metamorphic relations file can be seen here.

Preliminary results

isl

At the moment, we have reported three bugs to the developers of isl via their Google groups page. Two of these bugs have been confirmed and fixed by the developers, with a third one reported and awaiting confirmation. Further, we have run numerous tests (around 70k total) and have three further test cases which emit an assertion failure. One of the three has the same message as the most recently reported and unconfirmed bug, while the other two share the error message. This puts our total of found bugs at 4 at the least.

The reported bugs and respective commit fixes, if applicable:

Issue in isl_set_is_subset regarding identical integer divisions: report commit
Issue in isl_coalesce regarding shifted facets before wrapping: report commit
Assertion failure in isl_tab.c with message “Assertion i >= 0 failed: report

Future work

At this point, there are two major changes to be done to the project: adding an automated reducer to the generated tests and interleaving the input fuzzer with the metamorphic tester.

Of course, there are also some straightforward future extensions, such as adding more libraries to target in order to evaluate whether or not the current design of the configuration files is sufficient or many code improvements that could be made. However, we are still focusing on adding more features to the project.

Reducer

Currently, there are three approaches under consideration regarding how to do automatic test-case reduction:

Record some meta-data about how the metamorphic relations are applied to produce the test, such that they can be systematically undone later. This would involve modifying the infrastrcture appropriately and building the custom reducer, which would be a significant undertaking based on prior experience.
Take the CReduce approach, where the source code of the test is taken and random instructions are removed. The code is then checked for well-definedness via third party tools, then compiled and run again, looking for specific behaviours to manifest (e.g. stderr contains some specific error message string, the program crashes with a specific return code)
Another approach under consideration is using the reduction capabilities of Hypothesis. Broadly, this would involve replacing every call to a random number generator into Hypothesis, defining what kind of behaviour it should look for in the reduced programs and letting it do all the heavy lifting.

Interleaving fuzzing and metamorphic testing

This is something work has begun on, but has yet to finish. Currently, the infrastructure first produces some object via the fuzzer, emitting all the API calls required for this, then creates all the metamorphic variants on this single object. We want to interleave the two, believing that this will allow us to build richer and more complex tests.

The idea is that along the generation process, we would be able to emit metamorphic tests for either intermediate values of the fuzzed object or for other compatible temporary objects, in addition to the metamorphic tests emitted one the final object.

To illustrate, the current implementation is presented in:

current implementation of API fuzzer - Metamorphic tester interaction .

Note that the entirety of the input object is constructed as per the rules defined in the fuzzer before any metamorphic testing is done.

The proposed change is illustrated in:

proposed change for API fuzzer - Metamorphic tester interaction

At random points throughout the fuzzer process, we can emit some metamorphic tests. The input variable to these tests, while it has to be of the type specified in the metamorphic testing file, could be any variable in scope of that type, not just the eventual generated variable with an intermediate value.

Other libraries to target

In order to determine the limitations of our tool, the best approach is to try testing various libraries across multiple domains and fix any assumptions made or add extra features. Currently, in the near future, we plan on adding support for the Z3 C++ API and the Omega+ library (another set integer library similar to isl). Further along, we plan on targeting a imaging or video library.