Threads assume they will record leave events in LIFO order (can be violated for tasks) #12

### Limitation

Threads assume that they will always record leave events for the regions they visit in LIFO order, because each thread maintains a stack of OTF2 region definitions for the regions it visits. Any callback that corresponds to entering or leaving a region invokes `trace_event_enter` or `trace_event_leave` respectively.

Signatures:

```c
void trace_event_enter(trace_location_def_t *self, trace_region_def_t *region);
void trace_event_leave(trace_location_def_t *self);
```

In `trace_event_enter`:

```c
/* Push region onto location's region stack */
stack_push(self->rgn_stack, (data_item_t) {.ptr = region});
```

In `trace_event_leave`:

```c
/* For the region-end event, the region was previously pushed onto the 
   location's region stack so should now be at the top (as long as regions
   are correctly nested) */
trace_region_def_t *region = NULL;
stack_pop(self->rgn_stack, (data_item_t*) &region);
```

### Problem

This is a problem because threads can switch between partially complete tasks. For example, consider thread `x` executing the untied task `p`, which enters a task-scheduling region. Thread `x` records a region-enter event, pushes the region definition onto its stack and suspends the task. If thread `y` then resumes and completes `p`, it records the leave event for the task-scheduling region which `x` previously entered. The region-leave event is therefore not recorded by the thread that recorded the region-enter event, nor against the correct region definition (`y` pops whatever is on top of its own stack), and both threads will appear to have entered a different number of regions than they left.
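To make the scenario concrete, here is a minimal stand-alone sketch of it. The types and function names (`region_t`, `thread_t`, `thread_enter`, `thread_leave`, `simulate_untied_switch`) are hypothetical stand-ins, not Otter's real definitions; they only mimic the push in `trace_event_enter` and the pop in `trace_event_leave`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stand-ins for Otter's types (assumption, not the real API). */
typedef struct { const char *name; } region_t;

typedef struct {
    region_t *stack[MAX_DEPTH]; /* per-thread region stack */
    int depth;
    int enters;                 /* region-enter events recorded */
    int leaves;                 /* region-leave events recorded */
} thread_t;

/* Mimics trace_event_enter: record the event and push the region. */
static void thread_enter(thread_t *t, region_t *r)
{
    assert(t->depth < MAX_DEPTH);
    t->stack[t->depth++] = r;
    t->enters++;
}

/* Mimics trace_event_leave: record the event against whatever is on top of
   this thread's own stack (NULL if the stack is empty). */
static region_t *thread_leave(thread_t *t)
{
    region_t *r = (t->depth > 0) ? t->stack[--t->depth] : NULL;
    t->leaves++;
    return r;
}

/* Thread x enters the task-scheduling region on behalf of untied task p and
   suspends p; thread y then resumes p and leaves that region. Returns the
   region that y's leave event was actually recorded against. */
static region_t *simulate_untied_switch(thread_t *x, thread_t *y,
                                        region_t *task_sched)
{
    thread_enter(x, task_sched);
    return thread_leave(y);
}
```

If `y` was already inside some other region when it resumed `p`, the leave event is attributed to that region, `x` is left with a stale entry on its stack, and `y` later pops an empty stack when it leaves its own region.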

A similar error is possible with tied tasks, where region-enter and region-leave events could become unmatched in the trace. A thread will eventually record a region-leave event for every region-enter event (since it must eventually complete all the tasks it started), but task scheduling means the order of these events is not fixed. I suspect a workaround is possible for tied tasks during post-processing by breaking up event sequences at task-switch events and then stitching each event back together with its sub-sequences in the correct order.
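One possible reading of that post-processing idea, as a rough sketch only: split a single thread's event stream at task-switch events, attribute each enter/leave event to the task that was current at the time, and emit the events grouped per task. Everything here is an assumption for illustration: the `event_t` layout, the idea that task-switch events carry the id of the task being switched to, and the region ids attached to leave events (used only so nesting can be checked).

```c
#include <stddef.h>

typedef enum { EV_ENTER, EV_LEAVE, EV_TASK_SWITCH } ev_kind_t;

/* Hypothetical event record: `id` is a region id for enter/leave events and
   the id of the task being switched to for task-switch events. */
typedef struct { ev_kind_t kind; int id; } event_t;

#define MAX_TASKS 16
#define MAX_NEST  64

/* Copy the enter/leave events of `in` to `out`, grouped by owning task (tasks
   in order of first appearance), keeping their order within each task.
   Returns the number of events written, or 0 if more than MAX_TASKS tasks
   appear in the stream. */
static size_t stitch_by_task(const event_t *in, size_t n, int initial_task,
                             event_t *out)
{
    int tasks[MAX_TASKS];
    size_t ntasks = 0, written = 0;

    /* Pass 1: list the tasks in order of first appearance. */
    tasks[ntasks++] = initial_task;
    for (size_t i = 0; i < n; i++) {
        if (in[i].kind != EV_TASK_SWITCH) continue;
        size_t k = 0;
        while (k < ntasks && tasks[k] != in[i].id) k++;
        if (k == ntasks) {
            if (ntasks == MAX_TASKS) return 0;
            tasks[ntasks++] = in[i].id;
        }
    }

    /* Pass 2: for each task, replay the stream and emit the events that were
       recorded while that task was current. */
    for (size_t k = 0; k < ntasks; k++) {
        int current = initial_task;
        for (size_t i = 0; i < n; i++) {
            if (in[i].kind == EV_TASK_SWITCH) current = in[i].id;
            else if (current == tasks[k]) out[written++] = in[i];
        }
    }
    return written;
}

/* Returns 1 if every leave matches the most recent unmatched enter and no
   region is left open; task-switch events are ignored. */
static int is_well_nested(const event_t *ev, size_t n)
{
    int stack[MAX_NEST];
    size_t depth = 0;
    for (size_t i = 0; i < n; i++) {
        if (ev[i].kind == EV_ENTER) {
            if (depth == MAX_NEST) return 0;
            stack[depth++] = ev[i].id;
        } else if (ev[i].kind == EV_LEAVE) {
            if (depth == 0 || stack[--depth] != ev[i].id) return 0;
        }
    }
    return depth == 0;
}
```

With two tied tasks interleaved on one thread, the raw stream fails the nesting check while the stitched stream passes it. Whether this matches what a real trace needs depends on details (such as what a task-switch event records) that this sketch simply assumes.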

### Possible Fixes

As this limitation is due to a low-level design decision, I think it will need a fairly significant rewrite of Otter. Ideas include:

- Have tasks, rather than threads, maintain a stack of the regions encountered. This should be possible because there is always a task being executed (an implicit task if not an explicit one), so a thread can query the current task's stack to record events against the correct region definition.
- Have all regions represented by singleton definitions, except for those which can be given persistent definitions (parallel & task regions only, AFAIK). I don't like this idea, as it might look strange in the trace if a program appears to have exactly 1 instance of each region.
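The first idea could look roughly like the sketch below. It is not a proposal for the actual data layout: `task_t`, `thread_t`, `current_task`, `task_switch`, `thread_enter` and `thread_leave` are hypothetical names, and the fixed-size array stands in for whatever stack type Otter would really use.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEPTH 8

/* Hypothetical stand-in for a region definition. */
typedef struct { const char *name; } region_t;

/* Hypothetical task record: the region stack lives here, not in the thread. */
typedef struct {
    region_t *stack[MAX_DEPTH];
    int depth;
} task_t;

typedef struct {
    task_t *current_task; /* implicit or explicit task the thread is running */
} thread_t;

/* Called when a thread starts or resumes a task. */
static void task_switch(thread_t *t, task_t *task)
{
    t->current_task = task;
}

/* Region-enter: push onto the stack of the task currently being executed. */
static void thread_enter(thread_t *t, region_t *r)
{
    task_t *task = t->current_task;
    assert(task != NULL && task->depth < MAX_DEPTH);
    task->stack[task->depth++] = r;
}

/* Region-leave: pop from the current task's stack, so the event is recorded
   against the region that this task entered, whichever thread entered it. */
static region_t *thread_leave(thread_t *t)
{
    task_t *task = t->current_task;
    assert(task != NULL && task->depth > 0);
    return task->stack[--task->depth];
}
```

In the untied-task scenario, `x` pushes the task-scheduling region onto `p`'s stack, and when `y` later switches to `p` and leaves, it pops that same region. The enter and leave events are still recorded by different threads, so this only fixes which region definition the leave event refers to.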
