Race conditions in the Qserv worker

Description

The following crash was observed in Qserv worker code:

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fedece2c455 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib64/libstdc++.so.6
(gdb) where
#0  0x00007fedece2c455 in std::_Rb_tree_insert_and_rebalance(bool, std::_Rb_tree_node_base*, std::_Rb_tree_node_base*, std::_Rb_tree_node_base&) () from /lib64/libstdc++.so.6
#1  0x00007fede919a01a in std::_Rb_tree<int, std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent>, std::_Select1st<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> >, std::less<int>, std::allocator<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> > >::_M_insert_node (this=this@entry=0x7fe92c001408, __x=__x@entry=0x0, __p=<optimized out>, __z=__z@entry=0x7fe92c44c890)
    at /usr/include/c++/11/bits/stl_tree.h:2337
#2  0x00007fede919b189 in std::_Rb_tree<int, std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent>, std::_Select1st<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> >, std::less<int>, std::allocator<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> > >::_M_emplace_hint_unique<std::piecewise_construct_t const&, std::tuple<int const&>, std::tuple<> >(std::_Rb_tree_const_iterator<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> >, std::piecewise_construct_t const&, std::tuple<int const&>&&, std::tuple<>&&) (this=this@entry=0x7fe92c001408, __pos=...)
    at /usr/include/c++/11/bits/stl_tree.h:2438
#3  0x00007fede9193955 in std::map<int, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent, std::less<int>, std::allocator<std::pair<int const, lsst::qserv::wpublish::QueriesAndChunks::ChunkTimePercent> > >::operator[] (__k=@0x7feddd2c499c: 162811, this=0x7fe92c001408)
    at /usr/include/c++/11/bits/stl_tree.h:350
#4  lsst::qserv::wpublish::QueriesAndChunks::_calcScanTableSums[abi:cxx11]() (this=this@entry=0x2458dc0)
    at /home/gapon/code/qserv/src/wpublish/QueriesAndChunks.cc:455
#5  0x00007fede9195643 in lsst::qserv::wpublish::QueriesAndChunks::examineAll (this=0x2458dc0)
    at /home/gapon/code/qserv/src/wpublish/QueriesAndChunks.cc:295

Further inspection of the code and its dependencies revealed a potential race condition: an iterator into a map is used outside the lock that protects the map.
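The general pattern behind this kind of problem can be illustrated with a minimal sketch. It is not the Qserv code: the names ChunkTime, ChunkTimes, add, get and erase are hypothetical, chosen only for illustration. The sketch assumes a std::map guarded by a single mutex and shows one common way to avoid the hazard, which is to do every map access (including operator[], which may insert a node and rebalance the tree, as in frames #0 to #3 above) while the lock is held, and to hand callers a copy of the entry rather than an iterator or reference that would outlive the lock.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <optional>

// Hypothetical stand-in for per-chunk statistics; not the Qserv type.
struct ChunkTime {
    double seconds = 0.0;
    int samples = 0;
};

// Hypothetical container: a map protected by one mutex.
class ChunkTimes {
public:
    // operator[] may insert a node and rebalance the tree, so it must
    // run while the mutex is held, like every other access to the map.
    void add(int chunkId, double seconds) {
        assert(seconds >= 0.0);
        std::lock_guard<std::mutex> lock(_mtx);
        ChunkTime& ct = _times[chunkId];
        ct.seconds += seconds;
        ct.samples += 1;
    }

    // Returns a copy of the entry instead of an iterator or reference,
    // so the caller never touches the map after the lock is released.
    std::optional<ChunkTime> get(int chunkId) const {
        std::lock_guard<std::mutex> lock(_mtx);
        auto it = _times.find(chunkId);
        if (it == _times.end()) return std::nullopt;
        return it->second;  // copied while the lock is still held
    }

    // Erasing invalidates iterators to the erased element, which is why
    // no iterator is ever allowed to escape the locked region.
    void erase(int chunkId) {
        std::lock_guard<std::mutex> lock(_mtx);
        _times.erase(chunkId);
    }

private:
    mutable std::mutex _mtx;
    std::map<int, ChunkTime> _times;
};
```

With this shape, any number of threads can call add, get and erase concurrently. The cost is one copy of the entry per lookup, which is usually acceptable for small value types; for larger values, storing std::shared_ptr in the map is a common alternative.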

Activity

John Gates April 1, 2025 at 3:08 PM

Looks good.

Igor Gaponenko April 1, 2025 at 1:03 AM

Done

Details

Reviewers: John Gates
RubinTeam: Data Access and Database

Created April 1, 2025 at 12:38 AM
Updated April 2, 2025 at 2:45 AM
Resolved April 1, 2025 at 11:38 PM