一:背景
1.讲故事
这篇文章起源于昨天的一位朋友发给我的dump文件,说它的程序出现了卡死,看了下程序的主线程栈,居然又碰到了 OnUserPreferenceChanged
导致的挂死问题,真的是经典中的经典,线程栈如下:
0:000:x86> !clrstack OS Thread Id: 0x4eb688 (0) Child SP IP Call Site 002fed38 0000002b [HelperMethodFrame_1OBJ: 002fed38] System.Threading.WaitHandle.WaitOneNative(System.Runtime.InteropServices.SafeHandle, UInt32, Boolean, Boolean) 002fee1c 5cddad21 System.Threading.WaitHandle.InternalWaitOne(System.Runtime.InteropServices.SafeHandle, Int64, Boolean, Boolean) 002fee34 5cddace8 System.Threading.WaitHandle.WaitOne(Int32, Boolean) 002fee48 538d876c System.Windows.Forms.Control.WaitForWaitHandle(System.Threading.WaitHandle) 002fee88 53c5214a System.Windows.Forms.Control.MarshaledInvoke(System.Windows.Forms.Control, System.Delegate, System.Object[], Boolean) 002fee8c 538dab4b [InlinedCallFrame: 002fee8c] 002fef14 538dab4b System.Windows.Forms.Control.Invoke(System.Delegate, System.Object[]) 002fef48 53b03bc6 System.Windows.Forms.WindowsFormsSynchronizationContext.Send(System.Threading.SendOrPostCallback, System.Object) 002fef60 5c774708 Microsoft.Win32.SystemEvents+SystemEventInvokeInfo.Invoke(Boolean, System.Object[]) 002fef94 5c6616ec Microsoft.Win32.SystemEvents.RaiseEvent(Boolean, System.Object, System.Object[]) 002fefe8 5c660cd4 Microsoft.Win32.SystemEvents.OnUserPreferenceChanged(Int32, IntPtr, IntPtr) 002ff008 5c882c98 Microsoft.Win32.SystemEvents.WindowProc(IntPtr, Int32, IntPtr, IntPtr) ...
说实话,这种dump从去年看到今年,应该不下五次了,都看烦了,其形成原因是:
- 未在主线程中生成用户控件,导致用 WindowsFormsSynchronizationContext.Send 跨线程封送时,对方无法响应请求进而挂死
虽然知道原因,但有一个非常大的遗憾就是在 dump 中找不到到底是哪一个控件,只能笼统的告诉朋友,让其洞察下代码是哪里用了工作线程创建了 用户控件, 有些朋友根据这个信息成功的找到,也有朋友因为各种原因没有找到,比较遗憾。
为了不让这些朋友的遗憾延续下去,这一篇做一个系统归纳,希望能助这些朋友一臂之力。
二:解决方案
1. 背景
这个问题的形成详情,我在去年的一篇文章为:记一次 .NET 某新能源汽车锂电池检测程序 UI挂死分析 中已经做过分享,因为 dump 中找不到问题的 Control,所以也留下了一些遗憾,这一篇就做个补充。
2. 问题突破点分析
熟悉 WinForm 底层的朋友应该知道,一旦在 工作线程 上创建了 Control 控件,框架会自动给这个线程配备一个 WindowsFormsSynchronizationContext
和其底层的 MarshalingControl
,这个是有源码支撑的,大家可以找下 Control 的构造函数,简化后的源码如下:
public class Control : Component { internal Control(bool autoInstallSyncContext) { //*** if (autoInstallSyncContext) { WindowsFormsSynchronizationContext.InstallIfNeeded(); } } } public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable { private Control controlToSendTo; private WeakReference destinationThreadRef; public WindowsFormsSynchronizationContext() { DestinationThread = Thread.CurrentThread; Application.ThreadContext threadContext = Application.ThreadContext.FromCurrent(); if (threadContext != null) { controlToSendTo = threadContext.MarshalingControl; } } internal static void InstallIfNeeded() { try { SynchronizationContext synchronizationContext = AsyncOperationManager.SynchronizationContext; if (synchronizationContext == null || synchronizationContext.GetType() == typeof(SynchronizationContext)) { AsyncOperationManager.SynchronizationContext = new WindowsFormsSynchronizationContext(); } } finally { inSyncContextInstallation = false; } } } public sealed class WindowsFormsSynchronizationContext : SynchronizationContext, IDisposable { public WindowsFormsSynchronizationContext() { DestinationThread = Thread.CurrentThread; Application.ThreadContext threadContext = Application.ThreadContext.FromCurrent(); if (threadContext != null) { controlToSendTo = threadContext.MarshalingControl; } } } internal sealed class ThreadContext { internal Control MarshalingControl { get { lock (this) { if (marshalingControl == null) { marshalingControl = new MarshalingControl(); } return marshalingControl; } } } }
这段代码可以挖到下面两点信息。
- 一旦 Control 创建在工作线程上,那这个线程就会安装一个 WindowsFormsSynchronizationContext 变量,比如此时就存在两个对象了。
0:000:x86> !dso OS Thread Id: 0x4eb688 (0) ESP/REG Object Name 002FEC40 025a0fb0 System.Windows.Forms.WindowsFormsSynchronizationContext ... 002FEF44 0260992c System.Object[] (System.Object[]) 002FEF48 02d69164 System.Windows.Forms.WindowsFormsSynchronizationContext ...
工作线程ID
会记录在内部的 destinationThreadRef 字段中,我们试探下02d69164
。
0:000:x86> !do 02d69164 Name: System.Windows.Forms.WindowsFormsSynchronizationContext Fields: MT Field Offset Type VT Attr Value Name ... 533c2204 4002522 8 ...ows.Forms.Control 0 instance 02d69218 controlToSendTo 5cef92d0 4002523 c System.WeakReference 0 instance 02d69178 destinationThreadRef 0:000:x86> !DumpObj /d 02d69178 Name: System.WeakReference MethodTable: 5cef92d0 EEClass: 5cabf0cc Size: 12(0xc) bytes File: C:WindowsMicrosoft.NetassemblyGAC_32mscorlibv4.0_4.0.0.0__b77a5c561934e089mscorlib.dll Fields: MT Field Offset Type VT Attr Value Name 5cee2bdc 400070a 4 System.IntPtr 1 instance 111828 m_handle 0:000:x86> !do poi(111828) Name: System.Threading.Thread Fields: MT Field Offset Type VT Attr Value Name 5cee1638 40018ca 28 System.Int32 1 instance 9 m_ManagedThreadId ...
从上面的输出中可以看到,9号线程
曾经创建了不该创建的 Control,所以找出这个 Control 就是解决问题的关键,这也是最难的。
3. 如何找到问题 Control
以我目前的技术实力,从 dump 中确实找不到,但我可以运行时监测,突破点就是一旦这个 Control 在工作线程中创建,底层会安排一个 WindowsFormsSynchronizationContext 以及 MarshalingControl 对象,我们拦截他们的生成构造就好了。
为了方便讲述,先上一段测试代码,在 backgroundWorker1_DoWork
方法中创建一个 Button 控件。
namespace WindowsFormsApp1 { public partial class Form1 : Form { public Form1() { InitializeComponent(); } private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e) { Button btn = new Button(); } private void Form1_Load(object sender, EventArgs e) { } private void button1_Click(object sender, EventArgs e) { backgroundWorker1.RunWorkerAsync(); } } }
接下来在 MarshalingControl
的构造函数上下一个bp断点来自动化记录,观察 new Button 的时候是否命中。
0:007> !name2ee System_Windows_Forms_ni System.Windows.Forms.Application+MarshalingControl..ctor Module: 5b9b1000 Assembly: System.Windows.Forms.dll Token: 0600554a MethodDesc: 5b9fe594 Name: System.Windows.Forms.Application+MarshalingControl..ctor() JITTED Code Address: 5bb5d1a4 0:007> bp 5bb5d1a4 "!clrstack; gc" 0:007> g OS Thread Id: 0x249c (9) Child SP IP Call Site 067ff2f0 5bb5d1a4 System.Windows.Forms.Application+MarshalingControl..ctor() 067ff2f4 5bb70224 System.Windows.Forms.Application+ThreadContext.get_MarshalingControl() 067ff324 5bb6fe5d System.Windows.Forms.WindowsFormsSynchronizationContext..ctor() 067ff338 5bb6fd4d System.Windows.Forms.WindowsFormsSynchronizationContext.InstallIfNeeded() 067ff364 5bb6e9a0 System.Windows.Forms.Control..ctor(Boolean) 067ff41c 5bbcd5cc System.Windows.Forms.ButtonBase..ctor() 067ff428 5bbcd531 System.Windows.Forms.Button..ctor() 067ff434 02342500 WindowsFormsApp1.Form1.backgroundWorker1_DoWork(System.Object, System.ComponentModel.DoWorkEventArgs) 067ff488 630ee649 System.ComponentModel.BackgroundWorker.OnDoWork(System.ComponentModel.DoWorkEventArgs) [f:ddNDPfxsrccompmodsystemcomponentmodelBackgroundWorker.cs @ 107] 067ff49c 630ee55d System.ComponentModel.BackgroundWorker.WorkerThreadStart(System.Object) [f:ddNDPfxsrccompmodsystemcomponentmodelBackgroundWorker.cs @ 245] 067ff6a0 7c69f036 [HelperMethodFrame_PROTECTOBJ: 067ff6a0] System.Runtime.Remoting.Messaging.StackBuilderSink._PrivateProcessMessage(IntPtr, System.Object[], System.Object, System.Object[] ByRef) 067ff95c 6197c82c System.Runtime.Remoting.Messaging.StackBuilderSink.AsyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage, System.Runtime.Remoting.Messaging.IMessageSink) 067ff9b0 61978274 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.DoAsyncCall() [f:ddndpclrsrcBCLsystemruntimeremotingremotingproxy.cs @ 760] 067ff9bc 61978238 System.Runtime.Remoting.Proxies.AgileAsyncWorkerItem.ThreadPoolCallBack(System.Object) [f:ddndpclrsrcBCLsystemruntimeremotingremotingproxy.cs @ 753] 067ff9c0 6104e7b4 System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(System.Object) [f:ddndpclrsrcBCLsystemthreadingthreadpool.cs @ 1274] 067ff9c8 61078604 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:ddndpclrsrcBCLsystemthreadingexecutioncontext.cs @ 980] 067ffa34 61078537 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean) [f:ddndpclrsrcBCLsystemthreadingexecutioncontext.cs @ 928] 067ffa48 6104f445 System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() [f:ddndpclrsrcBCLsystemthreadingthreadpool.cs @ 1252] 067ffa5c 6104eb7d System.Threading.ThreadPoolWorkQueue.Dispatch() [f:ddndpclrsrcBCLsystemthreadingthreadpool.cs @ 820] 067ffaac 6104e9db System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() [f:ddndpclrsrcBCLsystemthreadingthreadpool.cs @ 1161] 067ffccc 7c69f036 [DebuggerU2MCatchHandlerFrame: 067ffccc]
从线程栈可以清晰的追踪到原来是 backgroundWorker1_DoWork
下的 Button
创建的,这就是问题的根源。。。
三:总结
在我一百多dump的分析旅程中,这个问题真的太高频了,补充此篇真心希望能帮助这些朋友在焦虑中找到问题Control, 一毫之善,与人方便。