Issue: Unapplied changes lost after conflict and subsequent successful sync
Description:
When a db-sync pull operation results in conflicts, and these conflicts are not resolved before a subsequent successful sync occurs, the unapplied changes from the initial conflicted sync are no longer tracked by db-sync and appear to be lost.
Steps to Reproduce:
1. Perform a db-sync pull operation where changes in the Mergin Maps project conflict with changes in the local database's 'modified' schema. Observe that conflicts are reported.
2. Retry the db-sync operation. Conflicts are still reported, indicating that the previous conflicts were not fully resolved or applied.
3. Introduce new, non-conflicting changes to the Mergin Maps project.
4. Perform another db-sync pull operation. This sync completes successfully, and the database is updated with the new changes from step 3.
5. Observe that the unapplied changes from step 1 (the initial conflicted sync) are no longer present in the 'modified' schema and are not considered pending changes by db-sync.
Expected Behavior:
db-sync should retain and continue to track unapplied changes resulting from conflicts, prompting the user or providing a mechanism to resolve and apply them, even after subsequent successful sync operations.
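A minimal sketch of the expected behavior, assuming unapplied changes could be identified per table and primary key: a small ledger persisted to JSON that only forgets an entry once that change has been applied. `UnappliedChanges`, its methods and the file layout are hypothetical and not an existing db-sync feature; the point is only that the record lives outside the 'base' schema, so advancing 'base' cannot erase it.

```python
import json
import os
from typing import Dict, Tuple

# (table name, primary key as text); a hypothetical way to address a change
Key = Tuple[str, str]


class UnappliedChanges:
    """Hypothetical persistent record of incoming changes that failed to apply.

    Entries are added when applying a change fails and removed only once the
    change has been applied, so a later sync that updates 'base' keeps them.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.entries: Dict[Key, dict] = {}
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                for item in json.load(f):
                    self.entries[(item["table"], item["pk"])] = item["change"]

    def record(self, table: str, pk: str, change: dict) -> None:
        """Remember a change that could not be applied to 'modified'."""
        self.entries[(table, pk)] = change
        self._save()

    def resolve(self, table: str, pk: str) -> None:
        """Forget a change once it has been applied successfully."""
        self.entries.pop((table, pk), None)
        self._save()

    def pending(self) -> Dict[Key, dict]:
        return dict(self.entries)

    def _save(self) -> None:
        items = [{"table": t, "pk": pk, "change": c} for (t, pk), c in self.entries.items()]
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(items, f)
```

Because the ledger is reloaded from disk on every run, a retry or a later successful sync still sees the entry until `resolve` is called for it.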
Observed Behavior:
Unapplied changes from a conflicted sync are lost after a subsequent successful sync that updates the 'base' schema.
Possible Cause:
Based on the analysis of the db-sync code (`dbsync.py`) and the behavior of geodiff, the issue likely stems from how conflicts are handled during the rebase process and from the subsequent state management.
The `pull` function in `dbsync.py` uses `_geodiff_rebase` when local database changes exist (`tmp_base2our` is not empty) and there are incoming changes from Mergin Maps (`tmp_base2their`).
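The branching described above can be summarized as a small decision function. This is a minimal sketch, not db-sync's code: `choose_pull_strategy` and `PullStrategy` are hypothetical names, and returning "nothing to do" when there are no incoming changes is an assumption about how `pull` behaves in that case.

```python
from enum import Enum


class PullStrategy(Enum):
    NOTHING_TO_DO = "nothing to do"
    APPLY = "apply incoming changes to base and modified"
    REBASE = "rebase modified, then apply incoming changes to base"


def choose_pull_strategy(has_local_changes: bool, has_incoming_changes: bool) -> PullStrategy:
    """Hypothetical helper mirroring the branch described above.

    has_local_changes stands for a non-empty tmp_base2our,
    has_incoming_changes for a non-empty tmp_base2their.
    """
    if not has_incoming_changes:
        return PullStrategy.NOTHING_TO_DO
    if has_local_changes:
        return PullStrategy.REBASE
    return PullStrategy.APPLY
```

Only the `REBASE` path produces a conflict file, which is why the problem described here appears only when the local database has its own edits.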
Relevant code snippet from `dbsync.py`:

    if not needs_rebase:
        logging.debug("Applying new version [no rebase]")
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.base, tmp_base2their, ignored_tables)
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.modified, tmp_base2their, ignored_tables)
    else:
        logging.debug("Applying new version [WITH rebase]")
        tmp_conflicts = os.path.join(tmp_dir, f"{project_name}-dbsync-pull-conflicts")
        _geodiff_rebase(
            conn_cfg.driver,
            conn_cfg.conn_info,
            conn_cfg.base,
            conn_cfg.modified,
            tmp_base2their,
            tmp_conflicts,
            ignored_tables,
        )
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.base, tmp_base2their, ignored_tables)

The `_geodiff_rebase` function generates a conflict file (`tmp_conflicts`), but the current db-sync logic does not appear to explicitly read or process this file to re-attempt applying the conflicted changes.
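One way to keep the conflict information from being discarded is to copy the conflict file to a persistent location so that it outlives the temporary directory. The sketch below makes no assumption about the file's contents; `keep_conflict_file`, `archive_dir`, `server_version` and the `conflicts-<version>` naming are hypothetical and not part of db-sync.

```python
import os
import shutil
from typing import Optional


def keep_conflict_file(tmp_conflicts: str, archive_dir: str, server_version: str) -> Optional[str]:
    """Copy a non-empty conflict file into archive_dir.

    Returns the path of the saved copy, or None if there was nothing to keep
    (file missing or empty).
    """
    if not os.path.isfile(tmp_conflicts) or os.path.getsize(tmp_conflicts) == 0:
        return None
    os.makedirs(archive_dir, exist_ok=True)
    # One file per server version, so later pulls do not overwrite earlier conflicts
    target = os.path.join(archive_dir, f"conflicts-{server_version}")
    shutil.copyfile(tmp_conflicts, target)
    return target
```

A non-`None` return value could also serve as a signal that the pull should not be treated as fully applied.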
When a conflict occurs during `rebase-db`, some changes might not be applied to the 'modified' schema. Subsequent db-sync retries might hit the same conflicts if the state has not changed. However, if a new sync (Sync 2) brings changes that do not conflict with the current state of the 'modified' schema, `rebase-db` might succeed for these new changes. Crucially, the `_geodiff_apply_changeset` call after the rebase applies the `tmp_base2their` changeset (the Mergin Maps changes relative to the base as it was before the rebase) to the 'base' schema. This moves the 'base' schema to the state after Sync 2, whether or not every change in that changeset also reached the 'modified' schema.
The unapplied changes from Sync 1, if they were not integrated into the 'modified' schema during the initial conflicted rebase, are now compared against a new 'base' schema. Because this reference point has moved, db-sync can no longer identify these unapplied changes as pending, and they effectively drop out of the synchronization process.
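The state change described above can be reproduced with a toy model in which each table is a dict mapping primary keys to values and a changeset is a dict of new values. This is an illustration only, not geodiff's algorithm: `apply_changes`, `changed_rows` and the `failing` set (standing in for rows whose change could not be applied to 'modified') are hypothetical. Row 1 plays the Sync 1 change that failed; row 2 plays the new Sync 2 change.

```python
from typing import AbstractSet, Dict

# Toy model: a table maps primary key -> row value
Table = Dict[int, str]


def apply_changes(table: Table, changes: Table, failing: AbstractSet[int] = frozenset()) -> Table:
    """Return a copy of table with changes applied, skipping keys in failing."""
    result = dict(table)
    for pk, value in changes.items():
        if pk not in failing:
            result[pk] = value
    return result


def changed_rows(old: Table, new: Table) -> Table:
    """Rows of new that are missing from or differ in old (deletes ignored for brevity)."""
    return {pk: value for pk, value in new.items() if old.get(pk) != value}


server = {1: "a2", 2: "b2"}          # Mergin Maps project after Sync 2
base = {1: "a", 2: "b"}              # last synced state in the 'base' schema
modified = {1: "a", 2: "b", 3: "x"}  # 'modified' schema with a local insert

incoming = changed_rows(base, server)                      # {1: "a2", 2: "b2"}
modified = apply_changes(modified, incoming, failing={1})  # row 1 cannot be applied
base = apply_changes(base, incoming)                       # base receives the whole changeset

pending = changed_rows(base, server)      # {}: nothing looks pending any more
missing = changed_rows(modified, server)  # {1: "a2"}: never reached 'modified'
```

After the run, `pending` is empty because 'base' already matches the project, while `missing` still shows that row 1 never reached 'modified'; nothing in the remaining state points back to it.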
Why is this important? We will never be able to map GeoPackage types to PostgreSQL types perfectly, so these errors will appear from time to time, especially when PostgreSQL is used as the init source. It is therefore important that we don't lose conflicted changesets. Sometimes these issues can be resolved with minimal tweaking on the PostgreSQL side, without another initial sync.