Unsuccessful diffs forgotten after conflicts are resolved #152

Open
@dracic


Issue: Unapplied changes lost after conflict and subsequent successful sync

Description:

When a db-sync pull operation results in conflicts, and these conflicts are not resolved before a subsequent successful sync occurs, the unapplied changes from the initial conflicted sync appear to be lost or forgotten by db-sync.

Steps to Reproduce:

  1. Perform a db-sync pull operation where changes in the Mergin Maps project conflict with changes in the local database's 'modified' schema. Observe that conflicts are reported.
  2. Retry the db-sync operation. Conflicts are still reported, indicating that the earlier conflicts were not resolved or applied.
  3. Introduce new, non-conflicting changes to the Mergin Maps project.
  4. Perform another db-sync pull operation. This sync completes successfully, and the database is updated with the new changes from step 3.
  5. Observe that the unapplied changes from step 1 (the initial conflicted sync) are no longer present in the 'modified' schema and are not considered pending changes by db-sync.

Expected Behavior:

db-sync should retain and continue to track unapplied changes resulting from conflicts, prompting the user or providing a mechanism to resolve and apply them, even after subsequent successful sync operations.

Observed Behavior:

Unapplied changes from a conflicted sync are lost after a subsequent successful sync that updates the 'base' schema.

Possible Cause:

Based on the analysis of the db-sync code (dbsync.py) and the behavior of geodiff, the issue likely stems from how conflicts are handled during the rebase process and the subsequent state management.

The pull function in dbsync.py uses _geodiff_rebase when local database changes exist (tmp_base2our is not empty) and there are incoming changes from Mergin Maps (tmp_base2their).

Relevant code snippet from dbsync.py:

    if not needs_rebase:
        logging.debug("Applying new version [no rebase]")
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.base, tmp_base2their, ignored_tables)
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.modified, tmp_base2their, ignored_tables)
    else:
        logging.debug("Applying new version [WITH rebase]")
        tmp_conflicts = os.path.join(tmp_dir, f"{project_name}-dbsync-pull-conflicts")
        _geodiff_rebase(
            conn_cfg.driver,
            conn_cfg.conn_info,
            conn_cfg.base,
            conn_cfg.modified,
            tmp_base2their,
            tmp_conflicts,
            ignored_tables,
        )
        _geodiff_apply_changeset(conn_cfg.driver, conn_cfg.conn_info, conn_cfg.base, tmp_base2their, ignored_tables)

The _geodiff_rebase function generates a conflict file (tmp_conflicts), but the current db-sync logic does not appear to explicitly read or process this file to re-attempt applying the conflicted changes.
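One way to keep the conflict information from disappearing with the temporary directory is to copy the conflict file somewhere persistent right after the rebase. The following is only a minimal sketch, not existing db-sync code: the helper name `preserve_conflict_file` and the `archive_dir` parameter are hypothetical, and it assumes that a missing or empty conflict file means no conflicts were recorded.

```python
import os
import shutil
import time


def preserve_conflict_file(tmp_conflicts, archive_dir, project_name):
    """Copy a non-empty conflict file out of the temp dir.

    Hypothetical helper, not part of db-sync. Returns the path of the
    archived copy, or None when there is nothing to keep.
    """
    # Assumption: no file, or an empty file, means no conflicts were recorded.
    if not os.path.isfile(tmp_conflicts) or os.path.getsize(tmp_conflicts) == 0:
        return None

    os.makedirs(archive_dir, exist_ok=True)
    # A nanosecond timestamp keeps files from successive syncs apart.
    target = os.path.join(archive_dir, f"{project_name}-conflicts-{time.time_ns()}")
    shutil.copyfile(tmp_conflicts, target)
    return target
```

Called after `_geodiff_rebase` with the same `tmp_conflicts` path, this would at least leave a record of each conflicted sync that a later successful sync cannot overwrite. Re-applying the conflicted changes would still need separate logic.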

When a conflict occurs during rebase-db in the initial sync (Sync 1), some changes might not be applied to the 'modified' schema. Subsequent db-sync retries might hit the same conflicts if the state has not changed. However, if a new sync (Sync 2) brings changes that do not conflict with the current state of the 'modified' schema, the rebase-db might succeed for these new changes. Crucially, the _geodiff_apply_changeset call after the rebase applies the tmp_base2their changeset (representing the Mergin Maps changes relative to the original base before the rebase) to the 'base' schema. This updates the 'base' schema to the state after Sync 2.

The unapplied changes from Sync 1, if not successfully integrated into the 'modified' schema during the initial conflicted rebase, are now compared against a new 'base' schema. This change in the reference point can cause db-sync to no longer correctly identify these unapplied changes as pending, effectively losing them from the synchronization process.

Why is this important? We will never be able to map GeoPackage types to PostgreSQL types perfectly, and these errors will appear from time to time, especially when PostgreSQL is used as the init source, so it is important that we don't lose conflicted changesets. Sometimes these issues can be resolved with minimal tweaking on the PostgreSQL side, without another initial sync.
