by Andrei Lascu
The aim of the project is to provide a generalized infrastructure which generates randomly fuzzed inputs and then performs metamorphic testing on these inputs for a specific C++ software library under test. The test generation for the library under test is guided by two configuration files, in YAML format, that have to be provided by the user: one of them declares how the input data is fuzzed, while the other declares metamorphic relations, as well as generation rules for special objects (e.g. empty sets, universe sets).
At the moment, the infrastructure is split into three main components:
api_fuzzeris the component which, based on the corresponding configuration file, emits API calls required to build some random input data; the validity and quality of the input data is primarily driven by the configuration file provided by the user
set_meta_testeris the component which, given an input variable produced by the
api_fuzzer, will produce a number of metamorphic variants using metamorphic relations provided in the corresponding configuration file
test_emitteris the entry-point, which links the two other components together and additionally synthesizes the other required parts of the output test file (e.g. includes, main function definition)
The project is publicly available at its Github repo.
In this section, we provide a brief overview of metamorphic testing and the main targeted library, isl, along with some references for further reading.
In testing, one of the biggest issues to overcome is what is known as the oracle problem. That is, for a given test input, an oracle is required which is able to determine what the correct output is. For some small problems, the oracle can be a human tracing the program instructions. However, a golden oracle, one that is able to tell us to output for all test cases in all systems, is often unavailable even for moderately-complex systems.
Metamorphic testing is one technique which is able to overcome this issue. The main idea behind metamorphic testing is using equivalence relations (known as metamorphic relations) in order to generate more test inputs, increasing overall coverage of the test suite. The metamorphic relations are usually manually defined by an expert of the system under test, although some work exists which attempts to synthesize such relations via machine learning.
The integer set library (isl) is a software library for Presburger arithmetic. It offers implementations for various arithmetic objects, such as sets, maps or piece-wise affine expressions, as well as a rich set of operations over them (e.g. convex hull for sets, arithmetic operations over piece-wise affine expressions). The rich set of mathematical operations was what inspired us to start with metamorphic testing for isl, as there are numerous metamorphic relations to be defined using these operations.
One application of isl is in polyhedral compilation, which involves reasoning about run-time execution of programs and possible rewrites to make better use of existing resources. For example, one could rewrite array accesses in a loop to make better use of memory locality, or merge two loops into one by reasoning about correlations between memory accesses. isl is used to model iteration spaces, data dependencies, memory access, execution schedules, all of which are either analysed or optimised in the context of polyhedral scheduling.
In this section, we discuss how the fuzzing and metamorphic testing phases of the test generation are performed. These are the main two components of the full test generation, with the remainder essentially being gluing the two together and adding the rest of the requirements to build a valid C++ program.
The fuzzing process involves generating a sequence of API calls of the library under test in order to generate one object which is to be taken as input to the metamorphic testing process. The method of generating such object is up to the user, and of course there are various ways of generating the same object time, some better than others.
The generation process is described by the user via a YAML configuration file. Currently, the user must provide the following:
singleton_types, meaning that the fuzzer should only instantiate one object of that type in the generation process
unsigned ints and
member_typefield should contain the name of the type the function is a member of
return_typefield should be the type of the return object
param_typesshould be a list of type names that the function takes as arguments
funcdeclares that a function call should be generated; the user can specify the following:
funcfield represents the function name
returnfield specifies which variable the result should be returned to
targetfield specifies what variable the function should be applied to
func_paramsfield specifies the parameters to the function call and should be used if at least one of the parameters should have a special value
forrepresents a for loop comprised of function calls; the user can define the
counterfield, which should be a range declaring the start and end values of the iteration counter. Note that these values must both be integers. In addition, the user can also define the fields for the underlying function call, as described above.
In addition to the above, the fuzzer is able to interpret special values in the code. These are:
<string=N>is interpreted as a string with value N
<new=N>is interpreted as a new object of type N (i.e. ensure that a new object is created rather than reusing an existing one)
<var=N>is interpreted as a variable with name N; the type is derived from the function declaration
<loop_counter>can only be used in a
fortype operation and represents the current value of the loop counter
<input=N>is interpreted as the value of input N, where N must be the name of one of the defined inputs
The fuzzer reads in one-by-one the sequence of operations used to define how to generate the end object. When it calls a function, all the required objects (e.g. the target object to call the function on, the result object and the function parameters) are either chosen from the existing objects, or some are generated on the spot. The generation is also done by randomly choosing a suitable function (i.e. one that returns an object of that type), thus it could be a constructor or another function call. This leads to a chain reaction of multiple objects being created across various function calls by asking the fuzzer to only execute one function.
For our testing, we have prepared two fuzzing strategies for isl sets. The
first one creates piece-wise affine expressions via
arithmetic operations and then uses those expressions as constraints over
integer sets. The issue with this approach is that, as we increase the number of
constraints to generate, they quickly become incompatible, leading to fuzzing
sets which evaluate to
false (the rate of false sets generated are roughly 55%
over 10k tests). Nevertheless, this was the main approach used and it was
successful in finding bugs. Recently, after discussion with the developers of
isl, we came up with another approach for
generating sets, based on points. We define a number of points in the space and
then use those to create a set. The issue at the moment is that, since the
points are randomly generated from existing functions and the constructor does
not explicitly take values for the dimensions of the points, but instead sets
all dimensions to 0, most of the generated points are set to 0 in our sets. We
have a plan to guide the generation process to make minimal use of the
constructor, but that is not implemented at this point.
The metamorphic relations are similarly described in a YAML configuration file. They are defined by using actual library code, as you would normally write a program with. For example, our isl metamorphic relations file defines a list of union relations thus:
%m = %1.unite(%2)
%m = $1.complement().intersect(%2.complement()).complement()
This means that both both these sequences of calls are equivalent to performing
the union of two sets. In the notation above,
%2 both represent
input objects, which mean isl sets for this particular configuration file.
Furthermore, all the function calls used (e.g.
intersect) are functions that exist in isl and can be applied to sets.
%m is a notation representing the current metamorphic variable for
the test and it means that we must store the result in the variable (i.e. the
input objects are not modified in-place).
Further, one might observe that the second relation is an application of a DeMorgan law translated into isl function calls. This shows that we can use knowledge from other fields to generate rich metamorphic relations.
The following table describes the sort of metamorphic relations we have defined for isl and currently employ in our testing:
|Relation type||Metamorphic relation|
Note that in addition to the modifier for input objects (i.e.
%2), there are two more modifiers (
%e), which represent special sets. We also have a list of generators, where we define how to obtain such sets. In this instance,
%u is meant to be the universe set, while
%e is meant to be the empty set. The generator relations are:
As can be observed, these relations can also be chained together, leading to increasingly complex test cases being produced.
Our metamorphic test generation comprises of:
%) with corresponding variable names or generator relations; this yields a metamorphic variant
The full metamorphic relations file can be seen here.
At the moment, we have reported three bugs to the developers of isl via their Google groups page. Two of these bugs have been confirmed and fixed by the developers, with a third one reported and awaiting confirmation. Further, we have run numerous tests (around 70k total) and have three further test cases which emit an assertion failure. One of the three has the same message as the most recently reported and unconfirmed bug, while the other two share the error message. This puts our total of found bugs at 4 at the least.
The reported bugs and respective commit fixes, if applicable:
isl_set_is_subsetregarding identical integer divisions: report commit
isl_coalesceregarding shifted facets before wrapping: report commit
isl_tab.cwith message “Assertion
i >= 0failed: report
At this point, there are two major changes to be done to the project: adding an automated reducer to the generated tests and interleaving the input fuzzer with the metamorphic tester.
Of course, there are also some straightforward future extensions, such as adding more libraries to target in order to evaluate whether or not the current design of the configuration files is sufficient or many code improvements that could be made. However, we are still focusing on adding more features to the project.
Currently, there are three approaches under consideration regarding how to do automatic test-case reduction:
stderrcontains some specific error message string, the program crashes with a specific return code)
This is something work has begun on, but has yet to finish. Currently, the infrastructure first produces some object via the fuzzer, emitting all the API calls required for this, then creates all the metamorphic variants on this single object. We want to interleave the two, believing that this will allow us to build richer and more complex tests.
The idea is that along the generation process, we would be able to emit metamorphic tests for either intermediate values of the fuzzed object or for other compatible temporary objects, in addition to the metamorphic tests emitted one the final object.
To illustrate, the current implementation is presented in:
Note that the entirety of the input object is constructed as per the rules defined in the fuzzer before any metamorphic testing is done.
The proposed change is illustrated in:
At random points throughout the fuzzer process, we can emit some metamorphic tests. The input variable to these tests, while it has to be of the type specified in the metamorphic testing file, could be any variable in scope of that type, not just the eventual generated variable with an intermediate value.
In order to determine the limitations of our tool, the best approach is to try testing various libraries across multiple domains and fix any assumptions made or add extra features. Currently, in the near future, we plan on adding support for the Z3 C++ API and the Omega+ library (another set integer library similar to isl). Further along, we plan on targeting a imaging or video library.