Notes on debugging a PyQt5 core dump

1. First, configure the system to allow core dump files

Step 1: enable core dump generation

ulimit -c unlimited
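
Note that ulimit -c only affects the current shell and the processes started from it. If the program is launched some other way, the same limit can be raised from inside the Python process with the standard resource module. A minimal sketch, where the function name enable_core_dumps is only an illustration:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    Returns the resulting (soft, hard) pair. An unprivileged process cannot
    go above the hard limit, so that is the most this sketch tries to do.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() at the very top of the main script, before any Qt object is created, should have the same effect as running ulimit -c unlimited in the launching shell, provided the hard limit is unlimited.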

Step 2: set the core dump file location

vi /etc/sysctl.conf

Modify (or add) the following two variables

kernel.core_pattern = /var/core/core_%e_%p

kernel.core_uses_pid = 0

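This puts core files in /var/core/ (the directory must already exist and be writable). %e stands for the program name and %p for the process ID. Since the pattern already contains %p, kernel.core_uses_pid can stay at 0.

To have the core file written to the crashing process's current working directory instead, leave out the directory prefix:

kernel.core_pattern = core_%e_%p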
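
To see what file name a given pattern will produce, the two specifiers used here can be expanded by hand. A minimal sketch; expand_core_pattern is a hypothetical helper, it handles only %e and %p and ignores every other specifier the kernel supports:

```python
def expand_core_pattern(pattern, exe_name, pid):
    """Expand only the %e and %p specifiers used in this post."""
    return pattern.replace("%e", exe_name).replace("%p", str(pid))


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the pattern the running kernel is currently using."""
    with open(path) as f:
        return f.read().strip()
```

For example, expand_core_pattern("/var/core/core_%e_%p", "python2", 12345) gives "/var/core/core_python2_12345". The PID is made up, and the name "python2" is an assumption: for a Python script, %e is normally the interpreter, not the script.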

Step 3: apply the changes

sysctl -p /etc/sysctl.conf

2. Preparation

Install gdb and python2.7-dbg:

sudo apt-get install gdb python2.7-dbg

Set /proc/sys/kernel/yama/ptrace_scope:

echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
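
As far as I know, this setting mainly matters when attaching gdb to a live process (gdb -p); opening a core file does not go through ptrace. A small sketch for checking the current value, where both function names are illustrative and the descriptions paraphrase the Yama documentation:

```python
PTRACE_SCOPE_MEANINGS = {
    0: "classic: a process may attach to any process of the same user",
    1: "restricted: only descendants may be attached to by default",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled until reboot",
}


def describe_ptrace_scope(value):
    """Map a ptrace_scope value to a short description."""
    return PTRACE_SCOPE_MEANINGS.get(int(value), "unknown")


def read_ptrace_scope(path="/proc/sys/kernel/yama/ptrace_scope"):
    """Read the current value; the file only exists when Yama is enabled."""
    with open(path) as f:
        return int(f.read().strip())
```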

3. Debugging a Python core file with gdb

Open the core file in gdb and debug it

$ gdb python core
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...Reading symbols from /usr/lib/debug/.build-id/04/9b3068eb18127661de41257e012a54934fb0ee.debug...done.
done.


[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `python2 mantra_hmi_pro.py'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f04fdf36428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
54	../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f047f7fe700 (LWP 31079))]

Available Python-related commands

Type py and press Tab to list the available commands:

(gdb) py
py-bt               py-down             py-locals           py-up               python-interactive
py-bt-full          py-list             py-print            python

Use help cmd to see the description of each command:

(gdb) help py-bt
Display the current python frame and all the frames within its call stack (if any)
(gdb) py-list 
 258                        else:
 259                            self.window.label_3.setText("Joint3 ( %.3f deg )" % float(curr_joints[2] / pi * 180.0))
 260                    if curr_joints[3] < 0:
 261                        if float(curr_joints[3] / pi * 180.0) <= -100:
 262                            self.window.label_4.setText("Joint4 (%.2f deg )" % float(curr_joints[3] / pi * 180.0))
>263                        else:
 264                            self.window.label_4.setText("Joint4 (%.3f deg )" % float(curr_joints[3] / pi * 180.0))
 265                    else:
 266                        if float(curr_joints[3] / pi * 180.0) >= 100:
 267                            self.window.label_4.setText("Joint4 ( %.2f deg )" % float(curr_joints[3] / pi * 180.0))
 268                        else:
(gdb) py-bt
Traceback (most recent call first):
  File "mantra_hmi_pro.py", line 264, in run
    self.window.label_4.setText("Joint4 (%.3f deg )" % float(curr_joints[3] / pi * 180.0))

This shows that the program died at line 264, in the call to self.window.label_4.setText.

(gdb) bt
#0  0x00007f04fdf36428 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007f04fdf3802a in __GI_abort () at abort.c:89
#2  0x00007f04fdf787ea in __libc_message (do_abort=do_abort@entry=2, 
    fmt=fmt@entry=0x7f04fe091ed8 "*** Error in `%s': %s: 0x%s ***\n")
    at ../sysdeps/posix/libc_fatal.c:175
#3  0x00007f04fdf83651 in malloc_printerr (ar_ptr=0x7f047f7fd400, ptr=0x7f0470003bdf, 
    str=0x7f04fe0922e0 "malloc(): memory corruption (fast)", action=3) at malloc.c:5006
#4  _int_malloc (av=av@entry=0x7f0470000020, bytes=bytes@entry=68) at malloc.c:3386
#5  0x00007f04fdf85184 in __GI___libc_malloc (bytes=68) at malloc.c:2913
#6  0x00007f04e254ee28 in QArrayData::allocate(unsigned long, unsigned long, unsigned long, QFlags<QArrayData::AllocationOption>) () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#7  0x00007f04e25dba63 in QString::QString(int, Qt::Initialization) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#8  0x00007f04e278c5c7 in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#9  0x00007f04e25e2182 in QString::fromUtf8_helper(char const*, int) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#10 0x00007f04e25e21f4 in QString::fromAscii_helper(char const*, int) ()
   from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
#11 0x00007f04e2b93523 in ?? ()
   from /usr/lib/python2.7/dist-packages/PyQt5/QtCore.x86_64-linux-gnu.so
#12 0x00007f04df802c3c in ?? () from /usr/lib/python2.7/dist-packages/sip.x86_64-linux-gnu.so
#13 0x00007f04df804ab0 in ?? () from /usr/lib/python2.7/dist-packages/sip.x86_64-linux-gnu.so
#14 0x00007f04df8052a3 in ?? () from /usr/lib/python2.7/dist-packages/sip.x86_64-linux-gnu.so
#15 0x00007f04df805510 in ?? () from /usr/lib/python2.7/dist-packages/sip.x86_64-linux-gnu.so
#16 0x00007f04db03d8fc in ?? ()
   from /usr/lib/python2.7/dist-packages/PyQt5/QtWidgets.x86_64-linux-gnu.so
---Type <return> to continue, or q <return> to quit---
#17 0x00000000004bc9ba in call_function (oparg=<optimized out>, pp_stack=0x7f047f7fd920)
    at ../Python/ceval.c:4350
#18 PyEval_EvalFrameEx () at ../Python/ceval.c:2987
#19 0x00000000004ba036 in PyEval_EvalCodeEx () at ../Python/ceval.c:3582
#20 0x00000000004d5909 in function_call.lto_priv () at ../Objects/funcobject.c:523
#21 0x00000000004eec9e in PyObject_Call (kw=0x0, 
    arg=(<WindowThread(window=<MyWindow(fp=<file at remote 0x7f04da327d20>, xyz_step=<float at remote 0x237e5a0>, group='arm', mCmButton_1=<QPushButton at remote 0x7f04c40ada68>, label_18=<QLabel at remote 0x7f04c40aaa68>, label_step_1=<QLabel at remote 0x7f04d9f0bd60>, comboBox=<QComboBox at remote 0x7f04c40aa0e8>, label_14=<QLabel at remote 0x7f04c40ad8a0>, label_3=<QLabel at remote 0x7f04c40aa8a0>, label_12=<QLabel at remote 0x7f04c40ad640>, label_13=<QLabel at remote 0x7f04c40ad808>, label_10=<QLabel at remote 0x7f04c40ad770>, label_11=<QLabel at remote 0x7f04c40ad478>, setHomeButton=<QPushButton at remote 0x7f04d9f0b8a0>, label_5=<QLabel at remote 0x7f04d9f0bf28>, label_6=<QLabel at remote 0x7f04c40aa3e0>, label_7=<QLabel at remote 0x7f04d9f0bc30>, label_1=<QLabel at remote 0x7f04c40aa938>, label_2=<QLabel at remote 0x7f04c40aa808>, horizontalSlider=<QSlider at remote 0x7f04d9f0b770>, joint_step=<float at remote 0x237e588>, label_8=<QLabel at remote 0x7f04d9f0b808>, label_9=<QLabel at remote 0x7f04c40ad348>, pus...(truncated), func=<function at remote 0x7f04d8aee758>) at ../Objects/abstract.c:2546
#22 instancemethod_call.lto_priv () at ../Objects/classobject.c:2602
#23 0x00000000004a5a9e in PyObject_Call () at ../Objects/abstract.c:2546
#24 0x00000000004c6380 in PyEval_CallObjectWithKeywords () at ../Python/ceval.c:4219
#25 0x00007f04df801e84 in ?? () from /usr/lib/python2.7/dist-packages/sip.x86_64-linux-gnu.so
#26 0x00007f04e2afd700 in ?? ()
   from /usr/lib/python2.7/dist-packages/PyQt5/QtCore.x86_64-linux-gnu.so
#27 0x00007f04e2bbf1b3 in ?? ()
   from /usr/lib/python2.7/dist-packages/PyQt5/QtCore.x86_64-linux-gnu.so
#28 0x00007f04e254d7be in ?? () from /usr/lib/x86_64-linux-gnu/libQt5Core.so.5
---Type <return> to continue, or q <return> to quit---
#29 0x00007f04fe2d26ba in start_thread (arg=0x7f047f7fe700) at pthread_create.c:333
#30 0x00007f04fe00841d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

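The call stack shows that the program aborted inside malloc, called from QArrayData::allocate. glibc detected a corrupted heap ("malloc(): memory corruption (fast)") and called abort(). The allocation was triggered while Qt was building a QString from the string passed in by Python (QString::fromUtf8_helper, QString::QString). Note that malloc only detects the damage here; the heap itself may have been corrupted earlier, somewhere else.

The code passes a Python string to setText, which is legal in PyQt5. Why passing this string ends in a core dump is not clear from the stack alone.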

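4. Fixing the problem

The debugging above located the code where the core dump occurs: PyQt5's label.setText(). But this function merely writes a label's string in the UI, with no out-of-bounds array access or anything similar. The final error shows up in malloc, and it is malloc(): memory corruption, i.e. the heap was corrupted.

No direct fix presented itself at first. label.setText() did not look like a synchronization problem, since only one thread ever calls it. However, Qt's widget classes are not thread-safe and are meant to be used only from the main (GUI) thread, so a call from any other thread can race with the main thread's own work on the same widget.

Here is the key point: the thread that calls label.setText() is the UI refresh thread, a separate thread used to update the data shown in the UI. Any fix has to start from this thread. First, look at how it was written: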

import time

import rospy
from PyQt5 import QtCore


class WindowThread(QtCore.QThread):
    def __init__(self, window_):
        super(WindowThread, self).__init__()
        self.window = window_

    def run(self):
        global movej_rad_deg_flag
        r = rospy.Rate(2)  # 2 Hz
        time.sleep(1)  # sleep one second to wait for UI initialization

        while not rospy.is_shutdown():
            # print(curr_joints)
            # refresh the joint angle display
            if movej_rad_deg_flag == 0:
                if curr_joints[0] < 0:
                    # setText() is called here from the worker thread
                    self.window.label_1.setText("Joint1 (%.3f rad )" % float(curr_joints[0]))
            ...
            ...

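This thread does work most of the time. It inherits from QtCore.QThread and receives the main window instance, and the key is that hand-over: self.window = window_. This looked like the likely cause of the error when a label's text is updated. Testing showed, however, that self.window inside the thread and the window_ passed in have the same id, i.e. they are the same object at the same memory address. That is expected, since Python assignment only binds a reference. What matters is that run() then calls the widget's methods from a thread other than the GUI thread.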
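
That the two ids match while run() still executes on another thread can be reproduced without Qt. A minimal sketch using the standard threading module; FakeWindow and Worker are hypothetical stand-ins for the main window and the refresh thread:

```python
import threading


class FakeWindow(object):
    """Hypothetical stand-in for the main window; no Qt involved."""


class Worker(threading.Thread):
    def __init__(self, window_):
        super(Worker, self).__init__()
        self.window = window_  # binds a reference, no copy is made
        self.run_thread_id = None

    def run(self):
        # Record which thread actually executes run().
        self.run_thread_id = threading.current_thread().ident


def demo():
    window = FakeWindow()
    worker = Worker(window)
    worker.start()
    worker.join()
    same_object = worker.window is window
    other_thread = worker.run_thread_id != threading.current_thread().ident
    return same_object, other_thread
```

demo() returns (True, True): the stored window is the very same object, yet anything run() does with it happens on a different thread than the one that created it.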

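Either way, writing the UI update thread like this is clearly a problem. Following the blog post "PyQt5多线程刷新界面防假死" (refreshing the UI from multiple threads in PyQt5 without freezing), the UI update thread was rewritten: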

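import rospy
from PyQt5 import QtCore
from PyQt5.QtCore import pyqtSignal


class UpdateThread(QtCore.QThread):
    """
    UI refresh thread: uses a signal to notify the refresh function
    implemented in the main window class.

    This is important: passing the main window instance into the
    refresh thread instance does not work.
    """
    update_signal = pyqtSignal()

    def __init__(self):
        super(UpdateThread, self).__init__()

    def __del__(self):
        self.wait()

    def run(self):
        r = rospy.Rate(5)  # 5 Hz
        while not rospy.is_shutdown():
            # Only emit here; no widget is touched in this thread.
            self.update_signal.emit()
            r.sleep()

    def stop(self):
        # terminate() kills the thread abruptly; letting run() return is safer.
        self.terminate()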

Main window class:

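# Ui_Form is the class generated from the .ui file.
class MyWindow(QtWidgets.QWidget, Ui_Form):
    def __init__(self):
        super(MyWindow, self).__init__()
        self.setupUi(self)  # build the widgets (standard for pyuic5-generated classes)

        # Start the UI refresh thread
        self.update_thread = UpdateThread()  # create the thread
        self.update_thread.update_signal.connect(self.update)  # connect the signal
        self.update_thread.start()

    # Note: this name shadows QWidget.update(); a distinct name would be safer.
    def update(self):
        self.label1.setText("test")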

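In other words, the main window object creates the UI update thread object, and the thread uses a signal to tell the main window's update() function to refresh the UI. Because the window lives in the main thread, the signal is delivered through the event queue and update() runs in the main thread, so the main window object no longer has to be passed into the thread object. Sure enough, this solved the problem!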
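
The same pattern can also carry the text to display, so that the worker thread only prepares data. Below is a minimal sketch, assuming PyQt5 is installed. It uses QCoreApplication so that it runs without a display. Worker, Receiver, text_ready and on_text are illustrative names, and Receiver stands in for the main window.

```python
import sys
import threading

from PyQt5 import QtCore


class Worker(QtCore.QThread):
    # Hypothetical signal carrying the already formatted label text.
    text_ready = QtCore.pyqtSignal(str)

    def run(self):
        # Runs in the worker thread: only emit, never touch widgets here.
        for angle in (0.5, 1.0, 1.5):
            self.text_ready.emit("Joint4 (%.3f rad)" % angle)


class Receiver(QtCore.QObject):
    """Stands in for the main window; it lives in the main thread."""

    def __init__(self):
        super(Receiver, self).__init__()
        self.texts = []
        self.slot_thread_ids = set()

    @QtCore.pyqtSlot(str)
    def on_text(self, text):
        # In a real window this is where label.setText(text) would go.
        self.texts.append(text)
        self.slot_thread_ids.add(threading.current_thread().ident)


def demo():
    app = QtCore.QCoreApplication.instance() or QtCore.QCoreApplication(sys.argv)
    main_thread_id = threading.current_thread().ident
    receiver = Receiver()
    worker = Worker()
    worker.text_ready.connect(receiver.on_text)  # queued across threads
    worker.finished.connect(app.quit)            # leave the event loop when done
    worker.start()
    app.exec_()
    worker.wait()
    return receiver.texts, receiver.slot_thread_ids == {main_thread_id}
```

demo() returns the three formatted strings together with a flag that is True when every slot call ran on the thread that called demo(). In the real application the slot would be a method of MyWindow and would call setText() on the labels.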
