by Andrei Lascu
The aim of the project is to provide a generalized infrastructure which generates randomly fuzzed inputs and then performs metamorphic testing on these inputs for a specific C++ software library under test. The test generation for the library under test is guided by two configuration files, in YAML format, that have to be provided by the user: one of them declares how the input data is fuzzed, while the other declares metamorphic relations, as well as generation rules for special objects (e.g. empty sets, universe sets).
At the moment, the infrastructure is split into three main components:
api_fuzzer
is the component which, based on the corresponding configuration
file, emits API calls required to build some random input data; the validity
and quality of the input data is primarily driven by the configuration file
provided by the userset_meta_tester
is the component which, given an input variable produced by
the api_fuzzer
, will produce a number of metamorphic variants using
metamorphic relations provided in the corresponding configuration filetest_emitter
is the entry-point, which links the two other components
together and additionally synthesizes the other required parts of the output
test file (e.g. includes, main function definition)In addition to the infrastructure itself, we provide configuration files for
input fuzzing and metamorphic testing of isl
and PPL
.
The project is publicly available at its Github repo.
In this section, we provide a brief overview of metamorphic testing and the main targeted library, isl, along with some references for further reading.
In testing, one of the biggest issues to overcome is what is known as the oracle problem. That is, for a given test input, an oracle is required which is able to determine what the correct output is. For some small problems, the oracle can be a human tracing the program instructions. However, a golden oracle, one that is able to tell us to output for all test cases in all systems, is often unavailable even for moderately-complex systems.
Metamorphic testing is one technique which is able to overcome this issue. The main idea behind metamorphic testing is using equivalence relations (known as metamorphic relations) in order to generate more test inputs, increasing overall coverage of the test suite. The metamorphic relations are usually manually defined by an expert of the system under test, although some work exists which attempts to synthesize such relations via machine learning.
The integer set library (isl) is a software library for Presburger arithmetic. It offers implementations for various arithmetic objects, such as sets, maps or piece-wise affine expressions, as well as a rich set of operations over them (e.g. convex hull for sets, arithmetic operations over piece-wise affine expressions). The rich set of mathematical operations was what inspired us to start with metamorphic testing for isl, as there are numerous metamorphic relations to be defined using these operations.
One application of isl is in polyhedral compilation, which involves reasoning about run-time execution of programs and possible rewrites to make better use of existing resources. For example, one could rewrite array accesses in a loop to make better use of memory locality, or merge two loops into one by reasoning about correlations between memory accesses. isl is used to model iteration spaces, data dependencies, memory access, execution schedules, all of which are either analysed or optimised in the context of polyhedral scheduling.
In this section, we discuss how the fuzzing and metamorphic testing phases of the test generation are performed. These are the main two components of the full test generation, with the remainder essentially being gluing the two together and adding the rest of the requirements to build a valid C++ program.
The fuzzing process involves generating a sequence of API calls of the library under test in order to generate one object which is to be taken as input to the metamorphic testing process. The method of generating such object is up to the user, and of course there are various ways of generating the same object time, some better than others.
The generation process is described by the user via a YAML configuration file. Currently, the user must provide the following:
singleton_types
, meaning that the fuzzer
should only instantiate one object of that type in the generation processunsigned int
s and string
smember_type
field should
contain the name of the type the function is a member ofreturn_type
field should be
the type of the return objectparam_types
should be a list of type names that the function takes as
argumentsmember_type
or return_type
definitionsfunc
declares that a function call should be generated; the user can
specify the following:
func
field represents the function namereturn
field specifies which variable the result should be
returned totarget
field specifies what variable the function should be
applied tofunc_params
field specifies the parameters to the function call
and should be used if at least one of the parameters should have a
special valuefor
represents a for loop comprised of function calls; the user can
define the counter
field, which should be a range declaring the start
and end values of the iteration counter. Note that these values must both
be integers. In addition, the user can also define the fields for the
underlying function call, as described above.In addition to the above, the fuzzer is able to interpret special values in the code. These are:
<string=N>
is interpreted as a string with value N<new=N>
is interpreted as a new object of type N (i.e. ensure that a new
object is created rather than reusing an existing one)<var=N>
is interpreted as a variable with name N; the type is derived from
the function declaration<loop_counter>
can only be used in a for
type operation and represents
the current value of the loop counter<input=N>
is interpreted as the value of input N, where N must be the name
of one of the defined inputsThe fuzzer reads in one-by-one the sequence of operations used to define how to generate the end object. When it calls a function, all the required objects (e.g. the target object to call the function on, the result object and the function parameters) are either chosen from the existing objects, or some are generated on the spot. The generation is also done by randomly choosing a suitable function (i.e. one that returns an object of that type), thus it could be a constructor or another function call. This leads to a chain reaction of multiple objects being created across various function calls by asking the fuzzer to only execute one function.
For our testing, we have prepared two fuzzing strategies for isl sets. The
first one creates piece-wise affine expressions via
arithmetic operations and then uses those expressions as constraints over
integer sets. The issue with this approach is that, as we increase the number of
constraints to generate, they quickly become incompatible, leading to fuzzing
sets which evaluate to false
(the rate of false sets generated are roughly 55%
over 10k tests). Nevertheless, this was the main approach used and it was
successful in finding bugs. Recently, after discussion with the developers of
isl, we came up with another approach for
generating sets, based on points. We define a number of points in the space and
then use those to create a set. The issue at the moment is that, since the
points are randomly generated from existing functions and the constructor does
not explicitly take values for the dimensions of the points, but instead sets
all dimensions to 0, most of the generated points are set to 0 in our sets. We
have a plan to guide the generation process to make minimal use of the
constructor, but that is not implemented at this point.
The metamorphic relations are similarly described in a YAML configuration file. They are defined by using actual library code, as you would normally write a program with. For example, our isl metamorphic relations file defines a list of union relations thus:
union:
%m = %1.unite(%2)
%m = $1.complement().intersect(%2.complement()).complement()
This means that both both these sequences of calls are equivalent to performing
the union of two sets. In the notation above, %1
and %2
both represent
input objects, which mean isl sets for this particular configuration file.
Furthermore, all the function calls used (e.g. unite
, complement
,
intersect
) are functions that exist in isl and can be applied to sets.
Finally, %m
is a notation representing the current metamorphic variable for
the test and it means that we must store the result in the variable (i.e. the
input objects are not modified in-place).
Further, one might observe that the second relation is an application of a DeMorgan law translated into isl function calls. This shows that we can use knowledge from other fields to generate rich metamorphic relations.
The following table describes the sort of metamorphic relations we have defined for isl and currently employ in our testing:
Relation type | Metamorphic relation |
---|---|
Identity | %1.unite(%1) |
%1.unite(%e) |
|
%1.intersect(%1) |
|
%1.intersect(%u) |
|
%1.subtract(%e) |
|
%u.subtract(%1.complement()) |
|
%1.intersect(%1.unite(%2)) |
|
%1.unite(%1.intersect(%2)) |
|
%1.complement().complement() |
|
%1.coalesce() |
|
%1.detect_equalities() |
|
Complement | %1.complement() |
%u.subtract(%1) |
|
Subtract | %1.subtract(%2) |
%1.intersect(%2.complement()) |
|
Union | %1.unite(%2) |
%1.complement().intersect(%2.complement()).complement() |
|
Intersect | %1.intersect(%2) |
%1.complement().unite(%2.complement()).complement() |
Note that in addition to the modifier for input objects (i.e. %1
and %2
), there are two more modifiers (%u
and %e
), which represent special sets. We also have a list of generators, where we define how to obtain such sets. In this instance, %u
is meant to be the universe set, while %e
is meant to be the empty set. The generator relations are:
Generation object | Relation |
---|---|
Universe | %1.unite(%1.complement()) |
%e.complement() |
|
Empty | %1.intersect(%1.complement()) |
%1.subtract(%1) |
|
%1.subtract(%u) |
|
$1.intersect(%e) |
|
%u.complement() |
As can be observed, these relations can also be chained together, leading to increasingly complex test cases being produced.
Our metamorphic test generation comprises of:
%
) with
corresponding variable names or generator relations; this yields a
metamorphic variantThe full metamorphic relations file can be seen here.
At the moment, we have reported three bugs to the developers of isl via their Google groups page. Two of these bugs have been confirmed and fixed by the developers, with a third one reported and awaiting confirmation. Further, we have run numerous tests (around 70k total) and have three further test cases which emit an assertion failure. One of the three has the same message as the most recently reported and unconfirmed bug, while the other two share the error message. This puts our total of found bugs at 4 at the least.
The reported bugs and respective commit fixes, if applicable:
isl_set_is_subset
regarding identical integer divisions:
report
commitisl_coalesce
regarding shifted facets before wrapping:
report
commitisl_tab.c
with message “Assertion i >= 0
failed:
reportAt this point, there are two major changes to be done to the project: adding an automated reducer to the generated tests and interleaving the input fuzzer with the metamorphic tester.
Of course, there are also some straightforward future extensions, such as adding more libraries to target in order to evaluate whether or not the current design of the configuration files is sufficient or many code improvements that could be made. However, we are still focusing on adding more features to the project.
Currently, there are three approaches under consideration regarding how to do automatic test-case reduction:
stderr
contains some specific error message
string, the program crashes with a specific return code)This is something work has begun on, but has yet to finish. Currently, the infrastructure first produces some object via the fuzzer, emitting all the API calls required for this, then creates all the metamorphic variants on this single object. We want to interleave the two, believing that this will allow us to build richer and more complex tests.
The idea is that along the generation process, we would be able to emit metamorphic tests for either intermediate values of the fuzzed object or for other compatible temporary objects, in addition to the metamorphic tests emitted one the final object.
To illustrate, the current implementation is presented in:
.
Note that the entirety of the input object is constructed as per the rules defined in the fuzzer before any metamorphic testing is done.
The proposed change is illustrated in:
At random points throughout the fuzzer process, we can emit some metamorphic tests. The input variable to these tests, while it has to be of the type specified in the metamorphic testing file, could be any variable in scope of that type, not just the eventual generated variable with an intermediate value.
In order to determine the limitations of our tool, the best approach is to try testing various libraries across multiple domains and fix any assumptions made or add extra features. Currently, in the near future, we plan on adding support for the Z3 C++ API and the Omega+ library (another set integer library similar to isl). Further along, we plan on targeting a imaging or video library.