From 1ae1c1ff3f344c6c1cec8f4b78c725e8e4b9a134 Mon Sep 17 00:00:00 2001 From: Alan Frindell Date: Fri, 15 May 2015 10:42:48 -0700 Subject: [PATCH] Fix race issue in EventBase::runInEventBaseThreadAndWait (t6779663) Summary: Huge thanks to @afrind for debugging this issue with me and found the root cause. As per the comment from @afrind for the diff D1823407, there is a tricky race issue. The main thread could have left and reset the condition_variable from its stack but the EventBase thread tries to access it afterwards due to race and could be blocked indefinitely. This caused the server-side IO threads not able to pick up the incoming connections for the proxygen case. The fix is to use a simpler struct barrier and get a hold of the shared_ptr instead of the same object in a safer way. Test Plan: The original issue reproes very easily in HDFS XDC encryption case. Servers easily enter into bad state and we got high volume of timeouts from the client. With the fix, this does not happen anymore after the fix being deployed at 11:15PM. Here is the Scuba log before and after the fix: https://fburl.com/109969805 And here is the correspond Scuba diagram for successful calls in the same test: https://fburl.com/109971729 The throughput improved a lot after the fix. Reviewed By: davejwatson@fb.com Subscribers: doug, folly-diffs@, yfeldblum, chalfant, afrind, brettp, dougw, fma FB internal diff: D2071646 Signature: t1:2071646:1431709609:10fb033536f9e4fb428dea8ba68f6a9a051616c0 --- folly/io/async/EventBase.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/folly/io/async/EventBase.cpp b/folly/io/async/EventBase.cpp index 1c3ad397..daf4049b 100644 --- a/folly/io/async/EventBase.cpp +++ b/folly/io/async/EventBase.cpp @@ -599,8 +599,10 @@ bool EventBase::runInEventBaseThreadAndWait(const Cob& fn) { SCOPE_EXIT { std::unique_lock l(m); ready = true; - l.unlock(); cv.notify_one(); + // We cannot release the lock before notify_one, because a spurious + // wakeup in the waiting thread may lead to cv and m going out of scope + // prematurely. }; fn(); }); -- 2.34.1