Detect Data Races with TeSSLa
Data races occur in multi threaded programs when two or more threads access the same memory location concurrently, and at least one of the accesses is for writing, and the threads are not using any exclusive locks to control their accesses to that memory. C program race.c is an example of a race prone program.
Here we have two threads, both calling the functions that should count up to 100. Since there are no synchronization mechanisms between threads on the shared variable
x a data race is present. For example, let us assume that the current value of
x is 5. One thread may read the current value of
x, but before this thread increments
x, the other thread may also read the same value. Then, both threads may store 6 in
x. Such situation happens any time both threads read the same value. So, the final value of
x may vary in each run.
If the file traces.log contains the trace of one execution of the C code, with the TeSSLa specification given bellow, we can do two things. First, we can issue an event in the
dataRace stream when we detect that two threads are reading the shared variable and at least one of them is writing. Second, we can issue an event in
badInterleave stream when we detect that the threads have indeed interleaved in such way that the value of the shared variable is corrupted. An event in dataRace is issued regardless of whether threads interleave correctly or not in the current execution.
TeSSLa provides a command line tool and a web interface which we can use to check the trace and find the data race and interleave errors.
All the above C program, the trace generated by the program and the TeSSLa specification are stored in our COEMS open data portal as dataset Data Race Trace Example 1 . With the web interface of the TeSSla tool , the TeSSLa specification, the trace data and the corresponding C program will be automatically downloaded from the data portal, and you can start to run TeSSLa to detect data race yourself. It is also possible to generate the trace data by running the C code in the tool. In addition, there are APIs for directly checking the TeSSLa specification inside the data portal, and obtain the result immediately.
The Specification Language TeSSLa
TeSSLa can express a lot of different properties like real time properties, reasoning about sequential executions orders, calculating and comparing statistics and so on. From the beginning on, TeSSLa was built in a way that the generated monitors can be synthesized and executed in hardware (more concrete: FPGAs) such that we are able to still process even huge amounts of data online by exploiting the high parallelism of the hardware. This led to a very specific set of basic functions, but these functions can be efficiently rebuild in hardware.
Furthermore, the goal was that industry software engineers are able to understand and write TeSSLa specifications easily. Hence, TeSSLa is equipped with a strong macro system such that we are able to define a huge standard library on top of the basic functions so software engineers can use TeSSLa. It is easy to define new functions based on existing ones if needed. With this standard library it is also feasible to use TeSSLa for creating software monitors, which might also reason about objects, because the very specific basic operators defined for efficiently creating hardware monitors are not needed to write specifications (neither for hardware nor for software monitors). Of course, for complete freedom you might want to use them, but for most of the general use cases, it is not necessary.
Of course, some features are still missing in TeSSLa that are desirable for a specification language. High level data domains would be a possible addition to TeSSLa. Currently, streams can only have simple data domains as types like integers, rational numbers, strings and so on. But data domains like sets, queues or maps would be an interesting addition. Such data domains in TeSSLa would enable us to do quantification in specifications and remember an arbitrary amount of values for later usage. But the addition of something like this needs a distinction between hardware and software monitors. In software, those data types can just be used. But for hardware monitors it is necessary to not allow general sets, queues and maps but restricted versions of those to allow quantification and memorization of values to some extend such that the specification can still be synthesized efficiently on hardware.
More information on TeSSLa can be found here:
- Rapidly Adjustable Non-Intrusive Online Monitoring for Multi-core Systems, SBMF 2017
- TeSSLa: Runtime Verification of Non-synchronized Real-Time Streams (under review)