Skip to content

Test failures under GCC13&14 #233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
armintoepfer opened this issue Apr 1, 2025 · 13 comments · Fixed by #237
Closed

Test failures under GCC13&14 #233

armintoepfer opened this issue Apr 1, 2025 · 13 comments · Fixed by #237
Labels

Comments

@armintoepfer
Copy link

Thank you for your work on edlib. We've been using it over many years.

Today, I've tried compiling and testing edlib with GCC 13 and 14, but hit a wall. There seems to be a buffer overflow.

Reproduce:

meson setup build_fail --werror --buildtype debugoptimized -Db_sanitize=address,undefined -Db_lundef=true
ninja -C build_fail test

Error message:

1/3 hello           OK              0.04s
2/3 aligner         OK              0.04s
3/3 runTests        FAIL            9.14s   killed by signal 6 SIGABRT
>>> MESON_TEST_ITERATION=1 MALLOC_PERTURB_=145 UBSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 MSAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1:print_stacktrace=1 ASAN_OPTIONS=halt_on_error=1:abort_on_error=1:print_summary=1 /git/edlib/build_fail/runTests
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
stderr:
../edlib/src/edlib.cpp:659:17: runtime error: load of address 0x50300003daf0 with insufficient space for an object of type 'int'
0x50300003daf0: note: pointer points here
 00 70 41 18  0a 00 00 00 be be be be  00 00 00 00 00 00 00 00  03 02 00 00 20 00 00 00  01 0b 00 00
              ^
    #0 0x42831d in myersCalcEditDistanceSemiGlobal ../edlib/src/edlib.cpp:659
    #1 0x42af77 in edlibAlign ../edlib/src/edlib.cpp:201
    #2 0x4039e8 in test12() ../test/runTests.cpp:435
    #3 0x404c0e in runTests() ../test/runTests.cpp:582
    #4 0x402b4f in main ../test/runTests.cpp:69
    #5 0x7ffffe7915cf in __libc_start_call_main (/lib64/libc.so.6+0x295cf) (BuildId: d78a44ae94f1d320342e0ff6c2315b2b589063f8)
    #6 0x7ffffe79167f in __libc_start_main@GLIBC_2.2.5 (/lib64/libc.so.6+0x2967f) (BuildId: d78a44ae94f1d320342e0ff6c2315b2b589063f8)
    #7 0x402c18  (/git/edlib/build_fail/runTests+0x402c18)

SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../edlib/src/edlib.cpp:659:17 in
―――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
@SoapZA
Copy link
Contributor

SoapZA commented Apr 1, 2025

I've done some very hacky debugging and found a buffer underflow:

@@ -591,7 +591,7 @@ static int myersCalcEditDistanceSemiGlobal(const Word* const Peq, const int W,
             bl->score = (bl - 1)->score - hout + WORD_SIZE +
                         calculateBlock(bl->P, bl->M, *Peq_c, hout, bl->P, bl->M);
         } else {
-            while (lastBlock >= firstBlock && bl->score >= k + WORD_SIZE) {
+            while (lastBlock > firstBlock && bl->score >= k + WORD_SIZE) {
                 lastBlock--;
                 bl--;
                 Peq_c--;
@@ -603,7 +603,7 @@ static int myersCalcEditDistanceSemiGlobal(const Word* const Peq, const int W,
         //
         // Reduce the band by decreasing last block if possible.
         if (c % STRONG_REDUCE_NUM == 0) {
-            while (lastBlock >= 0 && lastBlock >= firstBlock && allBlockCellsLarger(*bl, k)) {
+            while (lastBlock > 0 && lastBlock > firstBlock && allBlockCellsLarger(*bl, k)) {
                 lastBlock--;
                 bl--;
                 Peq_c--;
-- 

with these changes the sanitizer warnings go away.

@Martinsos
Copy link
Owner

Hey all! I haven't had much time to work on edlib recently as I am occupied with other projects, so I am a bit out of it, but I am happy to help get this resolved. If you want to create a PR and help me understand exactly what it does, I would be happy to merge it.

Some questions for start:

  • Interesting that this started happening with newer GCCs, what do you think is the reason? What had to change to cause that?
  • I recently had CI run for macos 14 + gcc 13 (https://github.com/Martinsos/edlib/actions/runs/13738943369/job/38425917199), and also ubuntu 24 + gcc 14 (https://github.com/Martinsos/edlib/actions/runs/13738943369/job/38425917182) -> how is it that there was no failure there? Should we update the CI to do some more thorough checks, if it missed this?
  • Hm, I see what you did there @SoapZA . I can't remember off the bat why there is >= there and not >, but I am quite sure I put it for a reason, so I would be careful with changing it without having a deeper understanding of what this change might affect. How much have you observed this piece of code, do you have any insights to share?

@armintoepfer
Copy link
Author

I'll try getting your CI testing a debugoptimized build with UB- and ASAN. #234

@SoapZA
Copy link
Contributor

SoapZA commented Apr 1, 2025

@Martinsos what's the rationale for all the "naked"/raw pointers? Using a std::vector + libstdc++'s debug mode makes it trivial to find the line that causes the issue.

@Martinsos
Copy link
Owner

@Martinsos what's the rationale for all the "naked"/raw pointers? Using a std::vector + libstdc++'s debug mode makes it trivial to find the line that causes the issue.

Hah no good rationale: I wrote this code more than 10 years ago on my last year of masters, I wouldn't say I was writing bad code, I was paying a lot of attention to keeping it clean, but my C/C++ knowledge wasn't at a very high level. Neither is it today, last years I am mostly coding in Haskell and Javascript/Typescript.

Which is what makes it even a bit hard for me to track the conversation in this very github issue, as I am out of the loop on he professional C++ dev in the last X years. But I can probably catch up stuff quickly if provided with some extra context, and am happy to support improvements!

@Martinsos
Copy link
Owner

Btw at one point I did create an issue to use smart pointers: #84 -> but at that point I probably had a better idea what I wanted to do then I have now, I would need some reminding.

@blawrence-ont
Copy link
Contributor

I took a look at this previously and I believe this is a GCC bug because:

  1. The issue isn't reported if optimisations are disabled.
  2. The issue doesn't reproduce with clang's UBSan.
  3. ASAN (both clang and GCC) doesn't report an invalid load, even if Block has its padding filled.
  4. The issue isn't reported if blocks is changed to a std::vector<Block>.
  5. Printing the address of bl before the error is reported shows that it's at the same address as blocks (and maxNumBlocks was non-0).

At the time I didn't have time to try and make a minimized repro case, but if you're hitting it for 2 versions of GCC then it might be worth trying to make one.

FWIW we've ran (part of) this library through sanitizers as part of dorado for the past 2 years and haven't caught anything, though obviously that doesn't mean that there isn't a problem hiding in an edge case!

@SoapZA
Copy link
Contributor

SoapZA commented Apr 7, 2025

Use the following patch:

From cd2525706b3f16999c8694c4ab42d9a3ca86d8dc Mon Sep 17 00:00:00 2001
From: David Seifert <[email protected]>
Date: Mon, 7 Apr 2025 16:01:42 +0200
Subject: [PATCH] Find the bug using libstdc++'s debug mode

---
 edlib/src/edlib.cpp | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/edlib/src/edlib.cpp b/edlib/src/edlib.cpp
index 3cae48c..0d5f56f 100644
--- a/edlib/src/edlib.cpp
+++ b/edlib/src/edlib.cpp
@@ -557,9 +557,7 @@ static int myersCalcEditDistanceSemiGlobal(
     // lastBlock is 0-based index of last block in Ukkonen band.
     int firstBlock = 0;
     int lastBlock = min(ceilDiv(k + 1, WORD_SIZE), maxNumBlocks) - 1; // y in Myers
-    Block *bl; // Current block
-
-    Block* blocks = new Block[maxNumBlocks];
+    std::vector<Block> blocks(maxNumBlocks);
 
     // For HW, solution will never be larger then queryLength.
     if (mode == EDLIB_MODE_HW) {
@@ -571,7 +569,7 @@ static int myersCalcEditDistanceSemiGlobal(
     const int STRONG_REDUCE_NUM = 2048;
 
     // Initialize P, M and score
-    bl = blocks;
+    auto bl = blocks.begin(); // Current block
     for (int b = 0; b <= lastBlock; b++) {
         bl->score = (b + 1) * WORD_SIZE;
         bl->P = static_cast<Word>(-1); // All 1s
@@ -588,7 +586,7 @@ static int myersCalcEditDistanceSemiGlobal(
 
         //----------------------- Calculate column -------------------------//
         int hout = startHout;
-        bl = blocks + firstBlock;
+        bl = blocks.begin() + firstBlock;
         Peq_c += firstBlock;
         for (int b = firstBlock; b <= lastBlock; b++) {
             hout = calculateBlock(bl->P, bl->M, *Peq_c, hout, bl->P, bl->M);
@@ -649,7 +647,6 @@ static int myersCalcEditDistanceSemiGlobal(
                 *numPositions_ = static_cast<int>(positions.size());
                 copy(positions.begin(), positions.end(), *positions_);
             }
-            delete[] blocks;
             return EDLIB_STATUS_OK;
         }
         //------------------------------------------------------------------//
@@ -699,7 +696,6 @@ static int myersCalcEditDistanceSemiGlobal(
         copy(positions.begin(), positions.end(), *positions_);
     }
 
-    delete[] blocks;
     return EDLIB_STATUS_OK;
 }
 
-- 
2.49.0

compile the codebase using meson setup --buildtype debug -Dcpp_debugstl=true build . and you'll get with ninja test:

/usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/debug/safe_iterator.h:895:
In function:
    gnu_debug::_Safe_iterator<_Iterator, _Sequence, 
    std::random_access_iterator_tag> gnu_debug::_Safe_iterator<_Iterator, 
    _Sequence, std::random_access_iterator_tag>::operator--(int) [with 
    _Iterator = gnu_cxx::normal_iterator<Block*, std::vector<Block, 
    std::allocator<Block> > >; _Sequence = std::debug::vector<Block>]

Error: attempt to decrement a dereferenceable (start-of-sequence) iterator.

Objects involved in the operation:
    iterator "this" @ 0x7ffe226cb9f0 {
      type = gnu_cxx::normal_iterator<Block*, std::vector<Block, std::allocator<Block> > > (mutable iterator);
      state = dereferenceable (start-of-sequence);
      references sequence with type 'std::debug::vector<Block, std::allocator<Block> >' @ 0x7ffe226cba20
    }

with the stacktrace showing

#0  0x00007ffff7aa26cc in ?? () from /usr/lib64/libc.so.6
#1  0x00007ffff7a4a426 in raise () from /usr/lib64/libc.so.6
#2  0x00007ffff7a3234b in abort () from /usr/lib64/libc.so.6
#3  0x00007ffff7ca3ca1 in __gnu_debug::_Error_formatter::_M_error() const () from /usr/lib/gcc/x86_64-pc-linux-gnu/15/libstdc++.so.6
#4  0x0000555555563bad in __gnu_debug::_Safe_iterator<__gnu_cxx::__normal_iterator<Block*, std::__cxx1998::vector<Block, std::allocator<Block> > >, std::__debug::vector<Block, std::allocator<Block> >, std::random_access_iterator_tag>::operator-- (this=0x7ffffffecf90)
    at /usr/lib/gcc/x86_64-pc-linux-gnu/14/include/g++-v14/debug/safe_iterator.h:895
#5  0x000055555555f524 in myersCalcEditDistanceSemiGlobal (Peq=0x55555558dfb0, W=8, maxNumBlocks=1, queryLength=56, target=0x5555555a69a3 "\020\002\a\v\006\001\n\017\006", targetLength=4037, k=38, mode=EDLIB_MODE_SHW, bestScore_=0x7ffffffed14c, positions_=0x7ffffffed190, 
    numPositions_=0x7ffffffed150) at ../edlib/src/edlib.cpp:609
#6  0x000055555555e200 in edlibAlign (queryOriginal=0x555555593800 "\001\003\005\005\374\370\003\006\373\373\t\002\002\373\377\370\006\370\377\375\002\004\371\370\370", <incomplete sequence \375>, queryLength=56, 
    targetOriginal=0x55555559fad0 "\a\005\006\a\371\a\t\371", <incomplete sequence \367>, targetLength=7336, config=...) at ../edlib/src/edlib.cpp:248
#7  0x00005555555571b2 in runRandomTests (numTests=100, mode=EDLIB_MODE_HW, findAlignment=true) at ../test/runTests.cpp:115
#8  0x0000555555556dd4 in main (argc=1, argv=0x7fffffffd4d8) at ../test/runTests.cpp:45

The problem really is the "undershoot" in

            while (lastBlock >= firstBlock && bl->score >= k + WORD_SIZE) {
                lastBlock--; bl--; Peq_c--;
            }

which is UB. Logically, that whole loop makes no sense, since if you ignore the second term, after the loop bl will always be before firstBlock, hence my question above.

@blawrence-ont
Copy link
Contributor

@SoapZA Good catch. However note the increment that follows:

edlib/edlib/src/edlib.cpp

Lines 624 to 630 in 1e8f7ee

// For HW, even if all cells are > k, there still may be solution in next
// column because starting conditions at upper boundary are 0.
// That means that first block is always candidate for solution,
// and we can never end calculation before last column.
if (mode == EDLIB_MODE_HW && lastBlock == -1) {
lastBlock++; bl++; Peq_c++;
}

For all intents and purposes this undoes the out-of-bounds decrement. But from a C++ language lawyer side this is indeed UB since it's no longer pointing into the addressable area (or the past the end iterator) that blocks is pointing to.

I guess a "fix" would be to pad blocks to hold an additional element so that when indexing back by 1 (even if only temporarily) it's safe to do so, ie something like:

Block* blocks_with_padding = new Block[1 + maxNumBlocks];
Block* blocks = blocks_with_padding + 1;

Obviously a neater fix would be preferred, but that explains why GCC (and only GCC) is complaining about it - recent versions have improved object bounds checks (fortification stuff).

I wonder if using rend() or similar would allow for a before-the-start iterator? Hmm, actually now I'm confused why rend() isn't UB too.

@SoapZA
Copy link
Contributor

SoapZA commented Apr 7, 2025

rend(): https://devblogs.microsoft.com/oldnewthing/20211112-00/?p=105908 reverse iterators always point to +1 of the logical element.

@SoapZA
Copy link
Contributor

SoapZA commented Apr 7, 2025

other idea: what about going away from pointers and just using indices? those can obviously be negative and are independent of GCC's object range checks, and they make indirection through operator[] easier to debug.

@SoapZA
Copy link
Contributor

SoapZA commented Apr 7, 2025

The following code doesn't trigger for me anymore:

From 1a191ade4229dd73f7a5061c10347e9d145b67dd Mon Sep 17 00:00:00 2001
From: David Seifert <[email protected]>
Date: Mon, 7 Apr 2025 16:01:42 +0200
Subject: [PATCH] Use logical indices over pointers

---
 edlib/src/edlib.cpp | 37 ++++++++++++++++---------------------
 1 file changed, 16 insertions(+), 21 deletions(-)

diff --git a/edlib/src/edlib.cpp b/edlib/src/edlib.cpp
index 3cae48c..ce2b56d 100644
--- a/edlib/src/edlib.cpp
+++ b/edlib/src/edlib.cpp
@@ -557,9 +557,7 @@ static int myersCalcEditDistanceSemiGlobal(
     // lastBlock is 0-based index of last block in Ukkonen band.
     int firstBlock = 0;
     int lastBlock = min(ceilDiv(k + 1, WORD_SIZE), maxNumBlocks) - 1; // y in Myers
-    Block *bl; // Current block
-
-    Block* blocks = new Block[maxNumBlocks];
+    std::vector<Block> blocks(maxNumBlocks);
 
     // For HW, solution will never be larger then queryLength.
     if (mode == EDLIB_MODE_HW) {
@@ -571,43 +569,42 @@ static int myersCalcEditDistanceSemiGlobal(
     const int STRONG_REDUCE_NUM = 2048;
 
     // Initialize P, M and score
-    bl = blocks;
     for (int b = 0; b <= lastBlock; b++) {
-        bl->score = (b + 1) * WORD_SIZE;
-        bl->P = static_cast<Word>(-1); // All 1s
-        bl->M = static_cast<Word>(0);
-        bl++;
+        blocks[b].score = (b + 1) * WORD_SIZE;
+        blocks[b].P = static_cast<Word>(-1); // All 1s
+        blocks[b].M = static_cast<Word>(0);
     }
 
     int bestScore = -1;
     vector<int> positions; // TODO: Maybe put this on heap?
     const int startHout = mode == EDLIB_MODE_HW ? 0 : 1; // If 0 then gap before query is not penalized;
     const unsigned char* targetChar = target;
+    int bl = 0;
     for (int c = 0; c < targetLength; c++) { // for each column
         const Word* Peq_c = Peq + (*targetChar) * maxNumBlocks;
 
         //----------------------- Calculate column -------------------------//
         int hout = startHout;
-        bl = blocks + firstBlock;
+        bl = firstBlock;
         Peq_c += firstBlock;
         for (int b = firstBlock; b <= lastBlock; b++) {
-            hout = calculateBlock(bl->P, bl->M, *Peq_c, hout, bl->P, bl->M);
-            bl->score += hout;
+            hout = calculateBlock(blocks[bl].P, blocks[bl].M, *Peq_c, hout, blocks[bl].P, blocks[bl].M);
+            blocks[bl].score += hout;
             bl++; Peq_c++;
         }
         bl--; Peq_c--;
         //------------------------------------------------------------------//
 
         //---------- Adjust number of blocks according to Ukkonen ----------//
-        if ((lastBlock < maxNumBlocks - 1) && (bl->score - hout <= k) // bl is pointing to last block
+        if ((lastBlock < maxNumBlocks - 1) && (blocks[bl].score - hout <= k) // bl is pointing to last block
             && ((*(Peq_c + 1) & WORD_1) || hout < 0)) { // Peq_c is pointing to last block
             // If score of left block is not too big, calculate one more block
             lastBlock++; bl++; Peq_c++;
-            bl->P = static_cast<Word>(-1); // All 1s
-            bl->M = static_cast<Word>(0);
-            bl->score = (bl - 1)->score - hout + WORD_SIZE + calculateBlock(bl->P, bl->M, *Peq_c, hout, bl->P, bl->M);
+            blocks[bl].P = static_cast<Word>(-1); // All 1s
+            blocks[bl].M = static_cast<Word>(0);
+            blocks[bl].score = blocks[bl - 1].score - hout + WORD_SIZE + calculateBlock(blocks[bl].P, blocks[bl].M, *Peq_c, hout, blocks[bl].P, blocks[bl].M);
         } else {
-            while (lastBlock >= firstBlock && bl->score >= k + WORD_SIZE) {
+            while (lastBlock >= firstBlock && blocks[bl].score >= k + WORD_SIZE) {
                 lastBlock--; bl--; Peq_c--;
             }
         }
@@ -617,7 +614,7 @@ static int myersCalcEditDistanceSemiGlobal(
         //
         // Reduce the band by decreasing last block if possible.
         if (c % STRONG_REDUCE_NUM == 0) {
-            while (lastBlock >= 0 && lastBlock >= firstBlock && allBlockCellsLarger(*bl, k)) {
+            while (lastBlock >= 0 && lastBlock >= firstBlock && allBlockCellsLarger(blocks[bl], k)) {
                 lastBlock--; bl--; Peq_c--;
             }
         }
@@ -649,14 +646,13 @@ static int myersCalcEditDistanceSemiGlobal(
                 *numPositions_ = static_cast<int>(positions.size());
                 copy(positions.begin(), positions.end(), *positions_);
             }
-            delete[] blocks;
             return EDLIB_STATUS_OK;
         }
         //------------------------------------------------------------------//
 
         //------------------------- Update best score ----------------------//
         if (lastBlock == maxNumBlocks - 1) {
-            int colScore = bl->score;
+            int colScore = blocks[bl].score;
             if (colScore <= k) { // Scores > k dont have correct values (so we cannot use them), but are certainly > k.
                 // NOTE: Score that I find in column c is actually score from column c-W
                 if (bestScore == -1 || colScore <= bestScore) {
@@ -679,7 +675,7 @@ static int myersCalcEditDistanceSemiGlobal(
 
     // Obtain results for last W columns from last column.
     if (lastBlock == maxNumBlocks - 1) {
-        vector<int> blockScores = getBlockCellValues(*bl);
+        vector<int> blockScores = getBlockCellValues(blocks[bl]);
         for (int i = 0; i < W; i++) {
             int colScore = blockScores[i + 1];
             if (colScore <= k && (bestScore == -1 || colScore <= bestScore)) {
@@ -699,7 +695,6 @@ static int myersCalcEditDistanceSemiGlobal(
         copy(positions.begin(), positions.end(), *positions_);
     }
 
-    delete[] blocks;
     return EDLIB_STATUS_OK;
 }

SoapZA added a commit to SoapZA/edlib that referenced this issue Apr 14, 2025
* Forming a pointer "one before" the start of a range is UB:
  https://devblogs.microsoft.com/oldnewthing/20211112-00/?p=105908
* GCC 13's UBSAN detects this now:
  https://gcc.gnu.org/cgit/gcc/commit?id=28896b38fabce8
* By using a (signed) index, we can avoid forming invalid pointers.

Fixes Martinsos#233
@Martinsos
Copy link
Owner

Martinsos commented May 11, 2025

Thanks @armintoepfer @blawrence-ont and @SoapZA: thank you all for this discussion, I learned quite a bit, and I believe this is fixed now on master -> I merged the PR from @SoapZA !

A couple of thoughts:

  • While firstBlock and lastBlock were/are indices, I defined bl as a pointer, as you noticed. I think I remember why -> because I found it more readable to do bl-> in multiple places, then blocks[bl].. I agree it is probably not worth it, but it must have felt nicer/sturdier to me at the time when I was writing this code. I however wasn't aware that having a pointer point to location before the array is UB.
  • It is good that we, as a fix now, didn't just change lastBlock >= firstBlock ... lastBlock-- to lastBlock > firstBlock ... lastBlock-- (as proposed at the start of this discussion) because there is a reason why there is >= there: we do want lastBlock to possibly become firstBlock - 1 because that indicates that the ukkonen band doesn't exist any more, there is code later // If band stops to exist finish if (lastBlock < firstBlock) { that would never trigger. Maybe that doesn't even have big impact on performance in most cases or even ever, but it would be an omission nevertheless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants