What to do when a program crashes while programming with pointers (memory overflow)

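The program crashed. Now what?
2012-04-22
Tags: programming, leisure, workplace
Copyright notice: This is an original work. Reposting is permitted, provided the original source, the author information, and this notice are indicated with a hyperlink. Otherwise legal action will be pursued. http://xzpeter.blog.51cto.com/783279/329052
While debugging over the past couple of days, I put together a short summary of the memory-related problems and pitfalls I ran into in C programming.
1. What is a memory overflow?
Take a buffer overflow on the stack as an example. Non-static local variables declared inside a function are stored on the stack. For instance: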

    #include <string.h>

    void fn(void)
    {
        char a[100];        /* a lives on the stack: 100 bytes */
        char *p = a;
        bzero(p, 1000);     /* BUG: zeroes 1000 bytes, 900 past the end of a */
    }

    int main(int argc, char *argv[])
    {
        fn();
        return 0;
    }

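Here the array a is stored on the stack. When a stack buffer overflows, the most common casualty is the function's return address, so the function would jump to a bad code address when it returns. With GCC's stack protector in place, the damage is detected at function return and the program aborts with "stack smashing detected...":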

    peter@ubuntu-910:~/codes/testspace$ ./testspace
    *** stack smashing detected ***: <unknown> terminated
    ======= Backtrace: =========
    /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x48)[0x2f7008]
    /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x0)[0x2f6fc0]
    [0x80484b2]
    [0x0]
    ======= Memory map: ========
    00215000-00216000 r-xp 00000000 00:00 0          [vdso]
    00216000-00354000 r-xp 00000000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00354000-00355000 ---p 0013e000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00355000-00357000 r--p 0013e000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00357000-00358000 rw-p 00140000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00358000-0035b000 rw-p 00000000 00:00 0
    00c38000-00c4d000 r-xp 00000000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00c4d000-00c4e000 r--p 00014000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00c4e000-00c4f000 rw-p 00015000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00c4f000-00c51000 rw-p 00000000 00:00 0
    00cfc000-00d18000 r-xp 00000000 08:07 4652       /lib/libgcc_s.so.1
    00d18000-00d19000 r--p 0001b000 08:07 4652       /lib/libgcc_s.so.1
    00d19000-00d1a000 rw-p 0001c000 08:07 4652       /lib/libgcc_s.so.1
    00f63000-00f7e000 r-xp 00000000 08:07 5168       /lib/ld-2.10.1.so
    00f7e000-00f7f000 r--p 0001a000 08:07 5168       /lib/ld-2.10.1.so
    00f7f000-00f80000 rw-p 0001b000 08:07 5168       /lib/ld-2.10.1.so
    08048000-08049000 r-xp 00000000 08:08 264941     /home/peter/codes/testspace/testspace
    08049000-0804a000 r--p 00000000 08:08 264941     /home/peter/codes/testspace/testspace
    0804a000-0804b000 rw-p 00001000 08:08 264941     /home/peter/codes/testspace/testspace
    08a74000-08a95000 rw-p 00000000 00:00 0          [heap]
    b785e000-b7860000 rw-p 00000000 00:00 0
    b7874000-b7876000 rw-p 00000000 00:00 0
    bffad000-bffc2000 rw-p 00000000 00:00 0          [stack]
    Aborted

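This kind of problem is actually fairly easy. On Linux at least, when the program crashes the system often prints a backtrace and a memory map, and the backtrace can be very effective at showing which function the overflow occurred in. If the function sits deep in the call chain (as in our case), or the system prints no backtrace and simply reports a segmentation fault, run the program under gdb and use the backtrace command: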

    peter@ubuntu-910:~/codes/testspace$ gdb
    GNU gdb (GDB) 7.0-ubuntu
    Copyright (C) 2009 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "i486-linux-gnu".
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    (gdb) file testspace
    Reading symbols from /home/peter/codes/testspace/testspace...done.
    (gdb) r
    Starting program: /home/peter/codes/testspace/testspace
    [Thread debugging using libthread_db enabled]
    *** stack smashing detected ***: <unknown> terminated
    ======= Backtrace: =========
    /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x48)[0x228008]
    /lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x0)[0x227fc0]
    [0x80484b2]
    [0x0]
    ======= Memory map: ========
    00110000-0012b000 r-xp 00000000 08:07 5168       /lib/ld-2.10.1.so
    0012b000-0012c000 r--p 0001a000 08:07 5168       /lib/ld-2.10.1.so
    0012c000-0012d000 rw-p 0001b000 08:07 5168       /lib/ld-2.10.1.so
    0012d000-0012e000 r-xp 00000000 00:00 0          [vdso]
    0012e000-00143000 r-xp 00000000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00143000-00144000 r--p 00014000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00144000-00145000 rw-p 00015000 08:07 5220       /lib/tls/i686/cmov/libpthread-2.10.1.so
    00145000-00147000 rw-p 00000000 00:00 0
    00147000-00285000 r-xp 00000000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00285000-00286000 ---p 0013e000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00286000-00288000 r--p 0013e000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00288000-00289000 rw-p 00140000 08:07 5206       /lib/tls/i686/cmov/libc-2.10.1.so
    00289000-0028c000 rw-p 00000000 00:00 0
    0028c000-002a8000 r-xp 00000000 08:07 4652       /lib/libgcc_s.so.1
    002a8000-002a9000 r--p 0001b000 08:07 4652       /lib/libgcc_s.so.1
    002a9000-002aa000 rw-p 0001c000 08:07 4652       /lib/libgcc_s.so.1
    08048000-08049000 r-xp 00000000 08:08 264941     /home/peter/codes/testspace/testspace
    08049000-0804a000 r--p 00000000 08:08 264941     /home/peter/codes/testspace/testspace
    0804a000-0804b000 rw-p 00001000 08:08 264941     /home/peter/codes/testspace/testspace
    0804b000-0806c000 rw-p 00000000 00:00 0          [heap]
    b7fe8000-b7fea000 rw-p 00000000 00:00 0
    b7ffe000-b8000000 rw-p 00000000 00:00 0
    bffeb000-c0000000 rw-p 00000000 00:00 0          [stack]

    Program received signal SIGABRT, Aborted.
    0x0012d422 in __kernel_vsyscall ()
    (gdb) bt
    #0 0x0012d422 in __kernel_vsyscall ()
    #1 0x001714d1 in raise () from /lib/tls/i686/cmov/libc.so.6
    #2 0x00174932 in abort () from /lib/tls/i686/cmov/libc.so.6
    #3 0x001a7fc5 in ?? () from /lib/tls/i686/cmov/libc.so.6
    #4 0x00228008 in __fortify_fail () from /lib/tls/i686/cmov/libc.so.6
    #5 0x00227fc0 in __stack_chk_fail () from /lib/tls/i686/cmov/libc.so.6
    #6 0x080484b2 in fn () at test.c:8
    #7 0x00000000 in ?? ()

Here we can see:

    #6 0x080484b2 in fn () at test.c:8

which pinpoints the problem.
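Often, when a memory overflow is not severe, it does not terminate the program outright. Instead we run into very strange behavior while debugging, such as a variable, on the heap or on the stack, turning into garbage for no apparent reason. That is very likely caused by a misused pointer. One way to debug it is to load the program in gdb and set a watch on the variable that gets corrupted. When the variable is modified, the program stops, and we can see exactly which statement modified it.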
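2. Memory leaks
A memory leak is caused by memory allocated on the heap that is never released, that is, a malloc() that is not followed by a timely free(). Existing tools such as Valgrind (http://valgrind.org) can help with debugging here. See the documentation on its home page for usage.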
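3. Buffers: make them a bit bigger when you can
Many memory overflows happen because a buffer is not big enough. So when allocating a buffer, always leave some headroom: do not allocate just whatever size first comes to mind, but think about what the memory will be used for and estimate an upper bound. When you cannot estimate it, err on the large side.
Of course, you cannot enlarge buffers carelessly either, or other problems appear. For example, if too much space is requested on the stack, the program hits a segmentation fault as soon as it touches the variable.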
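4. snprintf is better than sprintf, so strncpy must be better than strcpy?!
Experienced seniors always say: "Young comrade, don't use sprintf() casually, use snprintf(), so you are protected if the output overflows!" We find that although this means writing one more argument, the program really is safer than before. Why not?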
Then we came across strncpy() and were delighted at first sight: another one with an n! We tried it right away:

    #include <stdio.h>
    #include <string.h>

    void fn(void)
    {
        char a[10];                 /* room for only 10 bytes */
        strncpy(a, "hello", 100);   /* BUG: n is 100, far more than a can hold */
    }

    int main(int argc, char *argv[])
    {
        fn();
        return 0;
    }

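Great, the program crashed.
The attentive reader spotted it long ago: the length 100 is obviously wrong. But some may wonder why 10 bytes are not enough to hold "hello". A look at the man page explains it: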

    STRCPY(3)                                              Linux Programmer's Manual                                              STRCPY(3)

    NAME
           strcpy, strncpy - copy a string

    SYNOPSIS
           #include <string.h>

           char *strcpy(char *dest, const char *src);

           char *strncpy(char *dest, const char *src, size_t n);

    DESCRIPTION
           The strcpy() function copies the string pointed to by src, including the terminating null byte ('\0'), to the buffer pointed to
           by dest. The strings may not overlap, and the destination string dest must be large enough to receive the copy.

           The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the
           first n bytes of src, the string placed in dest will not be null terminated.

           If the length of src is less than n, strncpy() pads the remainder of dest with null bytes.

           A simple implementation of strncpy() might be:

               char*
               strncpy(char *dest, const char *src, size_t n){
                   size_t i;

                   for (i = 0 ; i < n && src[i] != '\0' ; i++)
                       dest[i] = src[i];
                   for ( ; i < n ; i++)
                       dest[i] = '\0';

                   return dest;
               }

The key is the last sentence:

    "If the length of src is less than n, strncpy() pads the remainder of dest
    with null bytes. "

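In other words, strncpy() does not merely cap the copy at n bytes; it also fills the rest of dest, up to n bytes, with 0x00. It therefore always writes exactly n bytes, and with n = 100 it runs far past the end of the 10-byte array. snprintf() does no such thing. So remember:
snprintf() is always safer than sprintf(), but strncpy() is not necessarily safer than strcpy().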
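In short, bugs can never be avoided entirely. Especially when something weird shows up, learn to calmly analyze what caused the problem. These problems almost always come from mistakes in our own code, not from ghosts. Have confidence in your ability to solve problems!
That is how programming goes: use it well and it gets smoother and smoother; use it badly and you will never know what killed you.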

Part 2. Solving the error "stack Aborted": *** glibc detected *** free(): invalid pointer
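I recently wrote an SNMP program that implements the SNMP get operation. After compiling and running it, I got the error in the title. A search online showed that many people hit the same error. The usual cause is that, after malloc() allocates memory, later operations change where the returned pointer points. For example: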

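    char *sp = (char *)malloc(10 * sizeof(int));

    while (*sp != '\0') {
        c = *sp++;      /* moves sp away from the address malloc() returned */
        fputc(c, fp);
    }
    free(sp);           /* BUG: sp no longer points to the start of the block */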

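After operating on sp like this, sp points somewhere else, and at run time the free invalid pointer error appears.

The fix is simple (perhaps a little naive; if any experts know a better way, please share).

After the malloc for sp, declare a pointer char *p; to save where sp points, as follows:

    char *p;

    p = sp;

Then, after the while loop finishes and before free, assign p back to sp:

    sp = p;

After that, free works without problems.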
