Description
Limitation
Each thread maintains a stack of OTF2 region definitions for the regions it visits, so threads assume that they will always record leave events for those regions in LIFO order. Any callback that corresponds to entering or leaving a region invokes trace_event_enter or trace_event_leave.
Signatures:
void trace_event_enter(trace_location_def_t *self, trace_region_def_t *region);
void trace_event_leave(trace_location_def_t *self);
In trace_event_enter:
/* Push region onto location's region stack */
stack_push(self->rgn_stack, (data_item_t) {.ptr = region});
In trace_event_leave:
/* For the region-end event, the region was previously pushed onto the
location's region stack so should now be at the top (as long as regions
are correctly nested) */
trace_region_def_t *region = NULL;
stack_pop(self->rgn_stack, (data_item_t*) &region);
Problem
This presents a problem because threads can switch between partially-complete tasks. For example, consider thread x executing the untied task p, which enters a task-scheduling region: x records a region-enter event, pushes the region definition onto its stack and suspends the task. If thread y then resumes and completes p, it records the leave event for the task-scheduling region which x previously entered, but pops the region definition from its own stack rather than from x's. The region-leave event is therefore not recorded by the thread that recorded the region-enter event, nor against the correct region definition, and both threads appear to have entered a different number of regions than they left.
A similar error is possible with tied tasks, in which region-leave and region-enter events could become unmatched in the trace. A thread will eventually record region-leave events for all region-enter events (since it must eventually complete all the tasks it started) but the task scheduling means the order of these events is not fixed. I suspect a workaround is possible for tied tasks during post-processing by breaking up event sequences at task-switch events and then stitching each event back together with its sub-sequences in the correct order.
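The untied-task scenario described above can be reproduced with a minimal sketch of per-thread region stacks. Everything here is a hypothetical stand-in rather than Otter's actual code: region_t, location_t, enter, leave and the region names are invented for illustration, and the sketch assumes both threads first enter an enclosing parallel region.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stand-ins for Otter's region and location types. */
typedef struct { const char *name; } region_t;
typedef struct {
    const region_t *stack[MAX_DEPTH]; /* per-thread region stack */
    size_t depth;
    int enters, leaves;               /* events recorded by this thread */
} location_t;

const region_t PARALLEL = {"parallel"};
const region_t TASKWAIT = {"taskwait"};

static void enter(location_t *loc, const region_t *r) {
    assert(loc->depth < MAX_DEPTH);
    loc->stack[loc->depth++] = r;
    loc->enters++;
}

/* Pops whatever is on top of this thread's stack; NULL if it is empty. */
static const region_t *leave(location_t *loc) {
    loc->leaves++;
    return loc->depth > 0 ? loc->stack[--loc->depth] : NULL;
}

typedef struct {
    const region_t *y_leave_taskwait; /* region y's first leave was attributed to */
    const region_t *x_leave_parallel; /* region x's leave was attributed to */
    const region_t *y_leave_parallel; /* region y's second leave was attributed to */
    int x_balance, y_balance;         /* enters minus leaves per thread */
} outcome_t;

outcome_t run_untied_scenario(void) {
    location_t x = {0}, y = {0};
    outcome_t out;
    enter(&x, &PARALLEL);
    enter(&y, &PARALLEL);
    enter(&x, &TASKWAIT);             /* p, running on x, enters and suspends */
    out.y_leave_taskwait = leave(&y); /* y resumes p and leaves the region */
    out.x_leave_parallel = leave(&x); /* both threads leave the parallel region */
    out.y_leave_parallel = leave(&y);
    out.x_balance = x.enters - x.leaves;
    out.y_balance = y.enters - y.leaves;
    return out;
}
```

In this sketch y's first leave is attributed to the parallel region, x's leave of the parallel region is attributed to the task-scheduling region, and y's final leave finds an empty stack, so x ends with one more enter than leave and y with one more leave than enter.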
Possible Fixes
As this limitation is due to a low-level design decision, I think it will need a fairly significant rewrite of Otter. Ideas include:
- Have tasks, rather than threads, maintain a stack of the regions encountered. This should be possible as there is always a task being executed (an implicit task if not an explicit one), so threads can just query the current task's stack to record events against the correct region definition.
- Have all regions represented by singleton definitions except for those which can be given persistent definitions (parallel & task regions only AFAIK) - I don't like this idea as it might look strange in the trace if a program only appears to have exactly 1 instance of each region...
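The first idea (per-task region stacks) could look roughly like the sketch below. This is only an illustration under stated assumptions: task_t, thread_t, task_region_enter, task_region_leave and task_switch are hypothetical names, not part of Otter, and the sketch assumes each thread can always look up the task it is currently executing.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stand-ins; none of these names exist in Otter. */
typedef struct { const char *name; } region_t;
typedef struct {
    const region_t *stack[MAX_DEPTH]; /* the region stack lives in the task */
    size_t depth;
} task_t;
typedef struct {
    task_t *current_task;             /* implicit or explicit task being run */
} thread_t;

const region_t TASKWAIT = {"taskwait"};

/* Push onto the stack of whichever task the thread is executing. */
void task_region_enter(thread_t *thr, const region_t *r) {
    task_t *t = thr->current_task;
    assert(t->depth < MAX_DEPTH);
    t->stack[t->depth++] = r;
}

/* Pop from the current task's stack; NULL if that stack is empty. */
const region_t *task_region_leave(thread_t *thr) {
    task_t *t = thr->current_task;
    return t->depth > 0 ? t->stack[--t->depth] : NULL;
}

/* Called at a task-switch event to update the thread's current task. */
void task_switch(thread_t *thr, task_t *next) {
    thr->current_task = next;
}

/* Replays the untied-task scenario: x enters a region inside p and
   suspends p, then y resumes p and leaves the region. Returns 1 if the
   leave is attributed to the region that was entered. */
int leave_matches_enter(void) {
    task_t implicit_x = {0}, implicit_y = {0}, p = {0};
    thread_t x = {&implicit_x}, y = {&implicit_y};
    task_switch(&x, &p);
    task_region_enter(&x, &TASKWAIT);
    task_switch(&x, &implicit_x);     /* p is suspended on x */
    task_switch(&y, &p);              /* p is resumed on y */
    return task_region_leave(&y) == &TASKWAIT;
}
```

Because the stack travels with the task, it does not matter which thread records the leave event: the region definition popped is the one pushed when the task entered the region.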