Virtual Memory, Explained Simply

Exploring Virtual Memory Through the /proc Filesystem

Hello everyone!

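This article shows how to use the /proc filesystem to find the virtual memory address of a string inside a running process, and how to change that string by overwriting the contents at that address. Without further ado, let's get started.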

Virtual Memory

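Virtual memory is a memory-management technique implemented jointly by a computer's hardware and software. It maps the memory addresses a program uses (virtual addresses) to physical addresses in the computer's memory. Virtual memory frees applications from the tedious job of managing memory themselves, and the isolation it provides between processes improves security. A process sees its virtual address space as contiguous; the operating system's memory-management subsystem controls it and, using paging, assigns real physical memory to virtual pages when a page fault occurs. On 64-bit machines the virtual address space is far larger than the installed physical memory, so a process can use more memory than is physically available.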
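The paging step described above starts by splitting each virtual address into a virtual page number (which the page tables map to a physical frame) and an offset within that page. The sketch below shows only that split, assuming 4 KiB pages (the usual x86-64 Linux default); the function names are illustrative, and the page-table lookup itself, done by the MMU and the kernel, is not modelled here.

```python
# Sketch only: assumes 4 KiB pages, the usual default on x86-64 Linux.
PAGE_SHIFT = 12                    # 2**12 = 4096 bytes per page
PAGE_SIZE = 1 << PAGE_SHIFT
OFFSET_MASK = PAGE_SIZE - 1        # low 12 bits select the byte inside a page


def split_address(vaddr):
    """Split a virtual address into (virtual page number, offset in page)."""
    return vaddr >> PAGE_SHIFT, vaddr & OFFSET_MASK


def join_address(vpn, offset):
    """Rebuild the virtual address from its page number and offset."""
    return (vpn << PAGE_SHIFT) | offset


if __name__ == "__main__":
    vpn, offset = split_address(0x21DC010)   # heap address printed later in this article
    print(f"page {vpn:#x}, offset {offset:#x}")   # page 0x21dc, offset 0x10
```

Only the page number goes through translation; the offset is carried over unchanged into the physical address.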

Before digging deeper into virtual memory, a few key points:

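  • Every process has its own virtual memory

  • The size of the virtual memory depends on the system architecture

  • Different operating systems manage virtual memory differently, but most of them lay out a process's virtual memory as shown below: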

[Figure: virtual memory layout of a process. Image source: Holberton School]

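The figure is not a complete memory map: the high addresses also hold kernel space and more, but that is outside the scope of this article. The figure shows that the highest addresses hold the command-line arguments and environment variables, followed by the stack, the heap, and the executable itself. The stack grows downward and the heap grows upward. Heap memory is obtained with malloc and is where dynamically allocated memory lives.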
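On Linux, the ordering described above can be checked on a live process through /proc/[pid]/maps, which this article examines further on. A minimal sketch, assuming Linux and reading the Python interpreter's own /proc/self/maps; pseudo_region_starts is a hypothetical helper name:

```python
def pseudo_region_starts(maps_path="/proc/self/maps"):
    """Return {'[heap]': start, '[stack]': start, ...} for bracketed pseudo-paths."""
    starts = {}
    with open(maps_path) as maps:
        for line in maps:
            # address perms offset dev inode [pathname]; at most 5 splits
            fields = line.rstrip("\n").split(maxsplit=5)
            if len(fields) == 6 and fields[5].startswith("["):
                start = int(fields[0].split("-")[0], 16)
                # maps is sorted by address, so the first hit is the lowest one
                starts.setdefault(fields[5], start)
    return starts


if __name__ == "__main__":
    for name, start in sorted(pseudo_region_starts().items(), key=lambda kv: kv[1]):
        print(f"{start:#018x}  {name}")
```

On a typical x86-64 system the [heap] entry prints near the top of the list and [stack] near the bottom, matching the figure.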

Let's start exploring virtual memory with a simple C program.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>


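/**
 * main - uses strdup to create a copy of a string. strdup allocates the
 * new space internally with malloc and returns its address; the caller
 * is responsible for releasing that space with free
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
 */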
int main(void)
{
    char *s;


    s = strdup("test_memory");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    printf("%p\n", (void *)s);
    return (EXIT_SUCCESS);
}


Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
Output: 0x88f010

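My machine is 64-bit, so a process's virtual addresses can in principle range from 0x0 up to 0xffffffffffffffff (on x86-64 Linux, user space in practice ends at 0x7fffffffffff, with kernel space above it). 0x88f010 is tiny compared with either upper bound, so we can infer that the copied string, which lives on the heap, sits near the low end of the address space. The /proc filesystem lets us verify this.
Listing /proc shows many files; here we focus on /proc/[pid]/mem and /proc/[pid]/maps.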
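To put the 0x0 to 0xffffffffffffffff range in perspective, the sketch below computes the size of an address space from its width in bits. The 47-bit figure for x86-64 user space assumes 4-level paging; the helper names are illustrative only.

```python
import struct


def address_space_bytes(bits):
    """Number of distinct byte addresses with a `bits`-wide virtual address."""
    return 1 << bits


def human(n):
    """Format a byte count with binary units (exact for powers of two)."""
    for unit in ("B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"):
        if n < 1024:
            return f"{n} {unit}"
        n //= 1024
    return f"{n} ZiB"


if __name__ == "__main__":
    pointer_bits = struct.calcsize("P") * 8   # pointer width of this Python build
    print("pointer width:", pointer_bits, "bits")
    print("full 64-bit range:", human(address_space_bytes(64)))
    # Assumption: x86-64 with 4-level paging, user space = lower 47 bits
    print("x86-64 user space:", human(address_space_bytes(47)))
    print("0x88f010 is", 0x88F010, "bytes above address 0")
```

Against 128 TiB of user space, an address about 8.5 MiB above zero is clearly at the very bottom.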

mem & maps

man proc

/proc/[pid]/mem
       This file can be used to access the pages of a process's memory
       through open(2), read(2), and lseek(2).

/proc/[pid]/maps
       A file containing the currently mapped memory regions and their
       access permissions. See mmap(2) for some further information about
       memory mappings.

       The format of the file is:

       address           perms offset  dev   inode       pathname
       00400000-00452000 r-xp 00000000 08:02 173521      /usr/bin/dbus-daemon
       00651000-00652000 r--p 00051000 08:02 173521      /usr/bin/dbus-daemon
       00652000-00655000 rw-p 00052000 08:02 173521      /usr/bin/dbus-daemon
       00e03000-00e24000 rw-p 00000000 00:00 0           [heap]
       00e24000-011f7000 rw-p 00000000 00:00 0           [heap]
       ...
       35b1800000-35b1820000 r-xp 00000000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a20000-35b1a21000 rw-p 00020000 08:02 135522  /usr/lib64/ld-2.15.so
       35b1a21000-35b1a22000 rw-p 00000000 00:00 0
       35b1c00000-35b1dac000 r-xp 00000000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1dac000-35b1fac000 ---p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870  /usr/lib64/libc-2.15.so
       35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870  /usr/lib64/libc-2.15.so
       ...
       7f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0    [stack:986]
       ...
       7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0   [stack]
       7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0   [vdso]

       The address field is the address space in the process that the
       mapping occupies. The perms field is a set of permissions:

           r = read
           w = write
           x = execute
           s = shared
           p = private (copy on write)

       The offset field is the offset into the file/whatever; dev is the
       device (major:minor); inode is the inode on that device. 0
       indicates that no inode is associated with the memory region, as
       would be the case with BSS (uninitialized data).

       The pathname field will usually be the file that is backing the
       mapping. For ELF files, you can easily coordinate with the offset
       field by looking at the Offset field in the ELF program headers
       (readelf -l).

       There are additional helpful pseudo-paths:

           [stack]
                  The initial process's (also known as the main thread's)
                  stack.

           [stack:<tid>] (since Linux 3.4)
                  A thread's stack (where the <tid> is a thread ID). It
                  corresponds to the /proc/[pid]/task/[tid]/ path.

           [vdso] The virtual dynamically linked shared object.

           [heap] The process's heap.

       If the pathname field is blank, this is an anonymous mapping as
       obtained via the mmap(2) function. There is no easy way to
       coordinate this back to a process's source, short of running it
       through gdb(1), strace(1), or similar.

       Under Linux 2.0 there is no field giving pathname.
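A maps line can be parsed into the fields the man page describes. The minimal sketch below is one way to do it; unlike the line.split(' ') used in the script later in this article, it splits on runs of whitespace at most five times, so the padding between fields and any spaces inside a pathname do not shift the fields. The Mapping class and parse_maps_line are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    start: int      # first address of the region
    end: int        # one past the last address (the range is half-open)
    perms: str      # e.g. "rw-p"
    offset: int     # offset into the backing file
    dev: str        # "major:minor"
    inode: int      # 0 when no inode is associated (e.g. anonymous memory)
    pathname: str   # "" for anonymous mappings


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a Mapping."""
    # At most 5 splits: the 6th field (pathname) may contain spaces or be absent.
    fields = line.rstrip("\n").split(maxsplit=5)
    addr, perms, offset, dev, inode = fields[:5]
    start, end = (int(part, 16) for part in addr.split("-"))
    pathname = fields[5] if len(fields) == 6 else ""
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


if __name__ == "__main__":
    m = parse_maps_line("00e03000-00e24000 rw-p 00000000 00:00 0           [heap]\n")
    print(m)
    print("readable and writable:", m.perms.startswith("rw"))
```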

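The mem file gives access to, and allows modifying, all of a process's memory pages, while maps lists the memory regions currently mapped into the process together with their addresses, access permissions, offsets and so on. maps shows that the heap sits at low addresses and the stack at high addresses. It also shows that the heap's permissions include rw, i.e. the heap is writable. So we can use the heap's address range to locate a string like the one from the previous example and change it by writing to the corresponding address in the mem file. Because the first program exits immediately, here is a version that keeps printing its string forever: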

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>


/**
 * main - uses strdup to create a new string, loops forever-ever
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise never returns
 */
int main(void)
{
     char *s;
     unsigned long int i;


     s = strdup("test_memory");
     if (s == NULL)
     {
          fprintf(stderr, "Can't allocate mem with malloc\n");
          return (EXIT_FAILURE);
     }
     i = 0;
     while (s)
     {
          printf("[%lu] %s (%p)\n", i, s, (void *)s);
          sleep(1);
          i++;
     }
     return (EXIT_SUCCESS);
}
Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop
Output:
[0] test_memory (0x21dc010)
[1] test_memory (0x21dc010)
[2] test_memory (0x21dc010)
[3] test_memory (0x21dc010)
[4] test_memory (0x21dc010)
[5] test_memory (0x21dc010)
[6] test_memory (0x21dc010)
...
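Before writing anything, it helps to see that /proc/[pid]/mem is read with an ordinary seek to the virtual address followed by a read. The sketch below assumes Linux and reads the interpreter's own memory through /proc/self/mem, with a ctypes buffer standing in for the strdup'ed string; read_process_memory is a hypothetical helper. Reading another process such as loop normally requires root or a permissive Yama ptrace_scope setting.

```python
import ctypes


def read_process_memory(pid, address, size):
    """Read `size` bytes at virtual `address` of process `pid` via /proc/<pid>/mem."""
    # buffering=0: issue exactly one read(2) for the requested range, with no
    # read-ahead that could run past the end of the mapping.
    with open(f"/proc/{pid}/mem", "rb", buffering=0) as mem:
        mem.seek(address)
        return mem.read(size)


if __name__ == "__main__":
    # Stand-in for the strdup'ed string: a NUL-terminated buffer in this process.
    buf = ctypes.create_string_buffer(b"test_memory")
    addr = ctypes.addressof(buf)
    print(hex(addr), read_process_memory("self", addr, ctypes.sizeof(buf)))
```

For the loop process in this run, the equivalent call as root would be read_process_memory(2542, 0x21dc010, 12); the PID and address change on every run.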

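Now we can write a script that uses the /proc filesystem to find where the string lives and overwrite it; the program's output will change accordingly.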

First, find the process ID (PID) of the running loop process:

ps aux | grep ./loop | grep -v grep
zjucad    2542  0.0  0.0   4352   636 pts/3    S+   12:28   0:00 ./loop
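ps | grep works; an alternative is to scan /proc directly, since each /proc/[pid]/comm holds that process's command name (truncated to 15 characters). A minimal sketch, assuming Linux; find_pids is a hypothetical helper, and the standard pgrep -x loop does the same job from the shell.

```python
import os


def find_pids(name):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():          # skip non-process entries such as meminfo
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().rstrip("\n")
        except OSError:                  # the process may have exited meanwhile
            continue
        if comm == name:
            pids.append(int(entry))
    return sorted(pids)


if __name__ == "__main__":
    print(find_pids("loop"))
```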

2542 is the PID of loop. Running cat /proc/2542/maps gives:

00400000-00401000 r-xp 00000000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
00600000-00601000 r--p 00000000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
00601000-00602000 rw-p 00001000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
021dc000-021fd000 rw-p 00000000 00:00 0                                  [heap]
7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324                    /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0
7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310                    /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0                          [stack]
7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0                          [vvar]
7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
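The check done by eye in the next paragraph (which mapping contains 0x21dc010, and at what offset?) takes only a few lines to automate. A sketch using a few lines copied from the maps output above; locate is a hypothetical helper, and maps ranges are treated as half-open [start, end).

```python
SAMPLE_MAPS = """\
00400000-00401000 r-xp 00000000 08:01 811716                             /home/zjucad/wangzhiqiang/loop
021dc000-021fd000 rw-p 00000000 00:00 0                                  [heap]
7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0                          [stack]
"""


def locate(maps_text, address):
    """Return (pathname, perms, offset_in_region) of the mapping holding address, or None."""
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        if start <= address < end:          # end itself is outside the mapping
            name = fields[5] if len(fields) == 6 else "(anonymous)"
            return name, fields[1], address - start
    return None


if __name__ == "__main__":
    print(locate(SAMPLE_MAPS, 0x21DC010))
```

On a live system, pass the text of /proc/2542/maps (or whichever PID applies) instead of SAMPLE_MAPS.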

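The heap spans 021dc000-021fd000 and is readable and writable, and 021dc000 <= 0x21dc010 < 021fd000, which confirms that the string lives on the heap, at offset 0x10 from the start of the heap (why it is 0x10 will be covered later). We can now write new contents at address 0x21dc010 through the mem file, and the printed string changes with it. The following Python script does exactly that.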

#!/usr/bin/env python3
'''            
Locates and replaces the first occurrence of a string in the heap
of a process    


Usage: ./read_write_heap.py PID search_string replace_by_string
Where:          
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''


import sys


def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)


# check usage  
if len(sys.argv) != 4:
    print_usage_and_exit()


# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if write_string == "":
    print_usage_and_exit()


# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))


# try opening the maps file
try:
    maps_file = open('/proc/{}/maps'.format(pid), 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)


for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")


    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))


    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        sys.exit(0)


    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2: # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        sys.exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))


    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        sys.exit(1)


    # read heap  
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)


    # find string
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except ValueError:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        sys.exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))


    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))


    # close files
    maps_file.close()
    mem_file.close()


    # there is only one heap in our example
    break

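Run the Python script. It is run with sudo here because accessing another process's mem file requires ptrace-level permission over that process: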

zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello
[*] maps: /proc/2542/maps
[*] mem: /proc/2542/mem
[*] Found [heap]:
        pathname = [heap]
        addresses = 021dc000-021fd000
        permisions = rw-p
        offset = 00000000
        inode = 0
        Addr start [21dc000] | end [21fd000]
[*] Found 'test_memory' at 10
[*] Writing 'test_hello' at 21dc010

At the same time, the string printed by loop changes:

[633] test_memory (0x21dc010)
[634] test_memory (0x21dc010)
[635] test_memory (0x21dc010)
[636] test_memory (0x21dc010)
[637] test_memory (0x21dc010)
[638] test_memory (0x21dc010)
[639] test_memory (0x21dc010)
[640] test_helloy (0x21dc010)
[641] test_helloy (0x21dc010)
[642] test_helloy (0x21dc010)
[643] test_helloy (0x21dc010)
[644] test_helloy (0x21dc010)
[645] test_helloy (0x21dc010)

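The experiment worked! Note that loop now prints test_helloy rather than test_hello: the script wrote only the 10 bytes of test_hello over the 11 characters of test_memory and did not write a terminating NUL byte, so the final y of the original string is still in place.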
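The read, search and write steps of the script can be folded into one function and tried safely on the interpreter's own memory. The sketch below assumes Linux with a writable /proc/self/mem (root is not needed for one's own process); replace_in_range and demo are hypothetical names, and a ctypes buffer stands in for strdup's 12-byte copy. Appending a NUL byte to the replacement, a change to what the original script does, is what turns test_helloy into test_hello.

```python
import ctypes


def replace_in_range(pid, start, end, old, new):
    """Overwrite the first `old` found in [start, end) of process `pid` with `new`.

    Returns the absolute address written, or None if `old` is not found.
    """
    # Assume `old` is a C string followed by its NUL; writing beyond that could
    # clobber malloc metadata or neighbouring data, so refuse.
    if len(new) > len(old) + 1:
        raise ValueError("replacement does not fit in the original string")
    with open(f"/proc/{pid}/mem", "rb+", buffering=0) as mem:
        mem.seek(start)
        region = mem.read(end - start)
        i = region.find(old)
        if i < 0:
            return None
        mem.seek(start + i)
        mem.write(new)
        return start + i


def demo(new):
    buf = ctypes.create_string_buffer(b"test_memory")   # 11 chars + NUL, like strdup's copy
    start = ctypes.addressof(buf)
    replace_in_range("self", start, start + ctypes.sizeof(buf), b"test_memory", new)
    return buf.value        # bytes before the first NUL, i.e. what printf("%s") shows


if __name__ == "__main__":
    print(demo(b"test_hello"))       # b'test_helloy'  (what the script above does)
    print(demo(b"test_hello\x00"))   # b'test_hello'   (terminator written too)
```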

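This article is based on Julien Barbier's 2017 post on the Holberton School blog, distilled through my own understanding, and I have run all of the code myself. If you have questions, leave a comment below and I will reply when I see it. Sharing and discussion are welcome.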

Original article:

https://blog.holbertonschool.com/hack-the-virtual-memory-c-strings-proc/

If you enjoyed this article, follow the WeChat official account 程序員小灰 for more.
