A Lightweight Container Program

Yang's program: https://github.com/Pro-YY/jail

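Main process:

  1. Parses the command-line arguments with argp_parse.
  2. As root, sets up a cgroup (limits the resources of a group of processes) and rlimits (limit the resources of a single process or user).
  3. Calls clone() with CLONE_NEW* flags to create the child process.
  4. Configures the network.
  5. Writes to an eventfd to notify the child process, which then continues.
  6. Registers signal and timeout events with epoll, then starts the event loop, which reads and handles those events.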

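Child process:

  1. Obtains its arguments inside the new namespaces, including the command to execute.
  2. Sets the hostname.
  3. Blocks on a read of the eventfd and continues once the read returns.
  4. Configures the network, mounts the rootfs, restricts system calls with seccomp, and restricts capabilities with prctl.
  5. Calls execve to run the command.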
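
The parent/child handshake described in the two lists can be sketched as follows. This is a minimal illustration, not the code from jail: an ordinary pipe stands in for the eventfd so that it runs on any POSIX Python, fork() stands in for clone() with CLONE_NEW* flags, and the function name run_handshake and the log strings are made up for the example.

```python
import os

def run_handshake():
    """Parent finishes its setup, then releases a child blocked on a pipe."""
    go_r, go_w = os.pipe()    # parent -> child "go" signal (stand-in for eventfd)
    log_r, log_w = os.pipe()  # child -> parent, only so we can observe the order

    pid = os.fork()           # stand-in for clone(CLONE_NEW* ...)
    if pid == 0:
        # Child: block until the parent signals, then continue.
        try:
            os.close(go_w)
            os.close(log_r)
            os.read(go_r, 8)  # blocks here until the parent writes
            os.write(log_w, b"child: released, would now mount rootfs and execve")
        finally:
            os._exit(0)       # never fall back into the caller's code

    # Parent: do the setup that must happen before the child proceeds.
    os.close(go_r)
    os.close(log_w)
    events = ["parent: network configured"]      # placeholder for real setup work
    os.write(go_w, (1).to_bytes(8, "little"))    # an eventfd write is an 8-byte counter
    events.append(os.read(log_r, 1024).decode()) # wait for the child's message
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(log_r)
    return events

if __name__ == "__main__":
    for line in run_handshake():
        print(line)
```

The point of the blocking read is ordering: the child cannot reach its own network and mount setup until the parent has finished the host-side work.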

Usage

$ ./jail --help
Usage: jail [OPTION...] [<program> [<argument>...]]
Jail, a pretty sandbox to run program.

  -b, --base=STRING          Mount base dir, default to '/tmp'
  -d, --detach               Detach process as deaemon
  -e, --env=STRING           Environment variables
      --ip=ADDRESS           Assign ip address, within 172.17.0.1/16
  -n, --name=STRING          Jail name, default to random string
  -r, --root=STRING          Rootfs, default to '/'
  -t, --timeout=SECONDS      Running timeout
  -v, --verbose              Make the operation more talkative
  -w, --writable             Make rootfs writable mount
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Report bugs to <brookeyang@vip.qq.com>.

Run a command in the container and exit when it finishes:

$ sudo ./jail /bin/ls
bin   data  etc   imgcreate_linux_install_0.1.23  initrd.img.old  lib64       media  opt   root  sbin  srv  tmp  var	  vmlinuz.old
boot  dev   home  initrd.img			  lib		  lost+found  mnt    proc  run	 snap  sys  usr  vmlinuz

Run an interactive command in the container:

$ sudo ./jail /usr/local/node/bin/node
>
Error: Could not open history file.
REPL session history will not be persisted.
>
>

Namespaces

User namespace

Host:

$ ps -ef | grep jail
root     20743 13977  0 20:03 pts/0    00:00:00 sudo ./jail /bin/sh -n myJail

$ cat /proc/20743/uid_map
         0          0 4294967295
$ cat /proc/20743/gid_map
         0          0 4294967295

Container:

# id
uid=0(root) gid=0(root) groups=0(root)

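Note that 20743 is the PID of the process on the host, not a uid. For the process inspected here, the map 0 0 4294967295 is the identity mapping: uid 0 (root) inside corresponds to uid 0 on the host.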
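
Each line of uid_map or gid_map has three columns: the first ID inside the namespace, the first ID outside it, and the length of the range. The sketch below parses such a file and translates an inside ID to the outside ID; the helper names parse_id_map and to_outside are made up for illustration, and the second map in the usage lines is a hypothetical example, not output from jail.

```python
def parse_id_map(text):
    """Parse the contents of /proc/<pid>/uid_map (or gid_map) into tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries

def to_outside(entries, inside_id):
    """Translate an ID inside the namespace to the ID outside, or None if unmapped."""
    for inside, outside, length in entries:
        if inside <= inside_id < inside + length:
            return outside + (inside_id - inside)
    return None

if __name__ == "__main__":
    identity = parse_id_map("         0          0 4294967295\n")  # from the text
    print(to_outside(identity, 0))

    shifted = parse_id_map("0 100000 65536\n")  # hypothetical unprivileged mapping
    print(to_outside(shifted, 0), to_outside(shifted, 70000))
```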

UTS namespace

The hostname is changed inside the new UTS namespace:

$ sudo ./jail /bin/sh -n jail
# hostname
jail

PID namespace

On the host:

$ ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 Nov01 ?        00:00:09 /sbin/init
root         2     0  0 Nov01 ?        00:00:00 [kthreadd]
root         4     2  0 Nov01 ?        00:00:00 [kworker/0:0H]
root         6     2  0 Nov01 ?        00:00:00 [mm_percpu_wq]
root         7     2  0 Nov01 ?        00:00:01 [ksoftirqd/0]
root         8     2  0 Nov01 ?        00:00:27 [rcu_sched]
...
root     20743 13977  0 20:03 pts/0    00:00:00 sudo ./jail /bin/sh -n myJail
...

Inside the container there are only two processes:

# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 15:02 pts/1    00:00:00 /bin/sh
root        15     1  0 15:06 pts/1    00:00:00 ps -ef
# ls /proc
1	   cmdline    dma	   interrupts  key-users    loadavg  mounts	   schedstat  swaps	     tty		zoneinfo
12	   consoles   driver	   iomem       keys	    locks    mtrr	   scsi       sys	     uptime
acpi	   cpuinfo    execdomains  ioports     kmsg	    mdstat   net	   self       sysrq-trigger  version
buddyinfo  crypto     fb	   irq	       kpagecgroup  meminfo  pagetypeinfo  slabinfo   sysvipc	     version_signature
bus	   devices    filesystems  kallsyms    kpagecount   misc     partitions    softirqs   thread-self    vmallocinfo
cgroups    diskstats  fs	   kcore       kpageflags   modules  sched_debug   stat       timer_list     vmstat

Network namespace

Host:

...

jail0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.255.254  netmask 255.255.0.0  broadcast 172.17.255.255
        ether e6:14:c9:b9:27:a0  txqueuelen 1000  (Ethernet)
        RX packets 216  bytes 14220 (14.2 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
        
veth-myJail: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        ether e6:14:c9:b9:27:a0  txqueuelen 1000  (Ethernet)
        RX packets 13  bytes 1006 (1.0 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 0  bytes 0 (0.0 B)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Container:

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 172.17.0.1  netmask 255.255.0.0  broadcast 0.0.0.0
        inet6 fe80::860:a7ff:fea2:10f9  prefixlen 64  scopeid 0x20<link>
        ether 0a:60:a7:a2:10:f9  txqueuelen 1000  (Ethernet)
        RX packets 0  bytes 0 (0.0 B)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 13  bytes 1006 (1.0 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
...

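Container-to-host communication

A bridge, jail0, is added on the host with the address 172.17.255.254/16.
A veth pair is created; a packet sent into one end comes out of the other.
One end, veth-myJail, is attached to the jail0 bridge, and the other end, eth0, is moved into the container.
Inside the container, eth0 is given the address 172.17.0.1.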
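
For the container and the bridge to talk directly, the container address has to sit in the bridge's subnet without colliding with the bridge itself. The following sketch checks that with the standard ipaddress module; the function name is_valid_container_ip is an assumption for illustration, and this is not necessarily how jail validates its --ip option.

```python
import ipaddress

# Address of the jail0 bridge, taken from the text above.
BRIDGE = ipaddress.ip_interface("172.17.255.254/16")

def is_valid_container_ip(addr, bridge=BRIDGE):
    """True if addr is a usable host address on the bridge's subnet."""
    ip = ipaddress.ip_address(addr)
    net = bridge.network
    return (
        ip in net
        and ip != bridge.ip              # already used by jail0
        and ip != net.network_address    # 172.17.0.0
        and ip != net.broadcast_address  # 172.17.255.255
    )

if __name__ == "__main__":
    print(BRIDGE.network)
    for candidate in ("172.17.0.1", "172.17.255.254", "172.18.0.1"):
        print(candidate, is_valid_container_ip(candidate))
```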

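Container-to-internet communication

Reaching the internet from inside the container requires SNAT.
First set net.ipv4.ip_forward = 1 so that the host forwards packets and acts as a router.
Then add an iptables rule on the host: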

iptables -t nat -A POSTROUTING -s 172.17.0.0/16 -j MASQUERADE

Mount namespace

On the host:

$ cat /proc/20743/mountinfo | sed 's/ - .*//'
23 29 0:22 / /sys rw,nosuid,nodev,noexec,relatime shared:7
24 29 0:4 / /proc rw,nosuid,nodev,noexec,relatime shared:13
25 29 0:6 / /dev rw,nosuid,relatime shared:2
26 25 0:23 / /dev/pts rw,nosuid,noexec,relatime shared:3
27 29 0:24 / /run rw,nosuid,noexec,relatime shared:5
29 0 252:1 / / rw,relatime shared:1
30 23 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime shared:8
31 25 0:26 / /dev/shm rw,nosuid,nodev shared:4
32 27 0:27 / /run/lock rw,nosuid,nodev,noexec,relatime shared:6
33 23 0:28 / /sys/fs/cgroup ro,nosuid,nodev,noexec shared:9
34 33 0:29 / /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime shared:10
35 33 0:30 / /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime shared:11
36 23 0:31 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime shared:12
37 33 0:32 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime shared:14
38 33 0:33 / /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime shared:15
39 33 0:34 / /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime shared:16
40 33 0:35 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime shared:17
41 33 0:36 / /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime shared:18
42 33 0:37 / /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime shared:19
43 33 0:38 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime shared:20
44 33 0:39 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime shared:21
45 33 0:40 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime shared:22
46 33 0:41 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime shared:23
47 33 0:42 / /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime shared:24
48 24 0:43 / /proc/sys/fs/binfmt_misc rw,relatime shared:25
49 25 0:19 / /dev/mqueue rw,relatime shared:26
51 23 0:8 / /sys/kernel/debug rw,relatime shared:27
50 25 0:44 / /dev/hugepages rw,relatime shared:28
52 23 0:20 / /sys/kernel/config rw,relatime shared:29
53 23 0:45 / /sys/fs/fuse/connections rw,relatime shared:30
262 29 0:48 / /var/lib/lxcfs rw,nosuid,nodev,relatime shared:144
276 27 0:24 /netns /run/netns rw,nosuid,noexec,relatime shared:5
283 276 0:3 net:[4026532213] /run/netns/netns1 rw shared:155
284 27 0:3 net:[4026532213] /run/netns/netns1 rw shared:155
269 27 0:49 / /run/user/500 rw,nosuid,nodev,relatime shared:148

In the container:

# cat /proc/self/mountinfo | sed 's/ - .*//'
351 313 252:1 / / ro,relatime master:1
352 351 0:6 / /dev rw,nosuid,relatime master:2
353 352 0:23 / /dev/pts rw,nosuid,noexec,relatime master:3
354 352 0:26 / /dev/shm rw,nosuid,nodev master:4
355 352 0:19 / /dev/mqueue rw,relatime master:26
356 352 0:44 / /dev/hugepages rw,relatime master:28
357 351 0:24 / /run rw,nosuid,noexec,relatime master:5
358 357 0:27 / /run/lock rw,nosuid,nodev,noexec,relatime master:6
359 357 0:24 /netns /run/netns rw,nosuid,noexec,relatime master:5
360 359 0:3 net:[4026532213] /run/netns/netns1 rw master:155
361 357 0:3 net:[4026532213] /run/netns/netns1 rw master:155
362 357 0:49 / /run/user/500 rw,nosuid,nodev,relatime master:148
363 351 0:22 / /sys rw,nosuid,nodev,noexec,relatime master:7
364 363 0:7 / /sys/kernel/security rw,nosuid,nodev,noexec,relatime master:8
365 363 0:28 / /sys/fs/cgroup ro,nosuid,nodev,noexec master:9
366 365 0:29 /../../.. /sys/fs/cgroup/unified rw,nosuid,nodev,noexec,relatime master:10
367 365 0:30 /../../.. /sys/fs/cgroup/systemd rw,nosuid,nodev,noexec,relatime master:11
368 365 0:32 / /sys/fs/cgroup/cpuset rw,nosuid,nodev,noexec,relatime master:14
369 365 0:33 /.. /sys/fs/cgroup/memory rw,nosuid,nodev,noexec,relatime master:15
370 365 0:34 /.. /sys/fs/cgroup/blkio rw,nosuid,nodev,noexec,relatime master:16
371 365 0:35 / /sys/fs/cgroup/perf_event rw,nosuid,nodev,noexec,relatime master:17
372 365 0:36 /.. /sys/fs/cgroup/cpu,cpuacct rw,nosuid,nodev,noexec,relatime master:18
373 365 0:37 /../../.. /sys/fs/cgroup/pids rw,nosuid,nodev,noexec,relatime master:19
374 365 0:38 / /sys/fs/cgroup/rdma rw,nosuid,nodev,noexec,relatime master:20
375 365 0:39 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime master:21
376 365 0:40 / /sys/fs/cgroup/net_cls,net_prio rw,nosuid,nodev,noexec,relatime master:22
377 365 0:41 / /sys/fs/cgroup/hugetlb rw,nosuid,nodev,noexec,relatime master:23
378 365 0:42 /.. /sys/fs/cgroup/devices rw,nosuid,nodev,noexec,relatime master:24
379 363 0:31 / /sys/fs/pstore rw,nosuid,nodev,noexec,relatime master:12
380 363 0:8 / /sys/kernel/debug rw,relatime master:27
381 363 0:20 / /sys/kernel/config rw,relatime master:29
382 363 0:45 / /sys/fs/fuse/connections rw,relatime master:30
383 351 0:4 / /proc rw,nosuid,nodev,noexec,relatime master:13
384 383 0:43 / /proc/sys/fs/binfmt_misc rw,relatime master:25
385 351 0:48 / /var/lib/lxcfs rw,nosuid,nodev,relatime master:144
386 383 0:51 / /proc ro,relatime
314 351 0:52 / /tmp rw,relatime
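
The two listings differ mainly in the mount options and the propagation field: the host's root is rw with shared:1, while the container's root is ro with master:1, meaning it receives mount events from the host's peer group but does not send any back. A small parser makes these fields easy to compare. This is an illustrative sketch; parse_mountinfo_line is a made-up helper, not part of jail.

```python
def parse_mountinfo_line(line):
    """Parse one /proc/<pid>/mountinfo line, with or without the ' - ...' tail."""
    head = line.split(" - ", 1)[0].split()
    mount_id, parent_id, device, root, mount_point, options = head[:6]
    # Optional fields such as "shared:7" or "master:1" describe mount propagation.
    propagation = {}
    for field in head[6:]:
        tag, _, value = field.partition(":")
        propagation[tag] = int(value) if value else None
    return {
        "mount_id": int(mount_id),
        "parent_id": int(parent_id),
        "device": device,
        "root": root,
        "mount_point": mount_point,
        "options": options.split(","),
        "propagation": propagation,
    }

if __name__ == "__main__":
    # Root mount lines copied from the host and container listings.
    host_root = parse_mountinfo_line("29 0 252:1 / / rw,relatime shared:1")
    jail_root = parse_mountinfo_line("351 313 252:1 / / ro,relatime master:1")
    print(host_root["options"], host_root["propagation"])
    print(jail_root["options"], jail_root["propagation"])
```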

cgroup

Memory:

$ ls -l /sys/fs/cgroup/memory
...
drwx------  2 root root 0 Nov 10 01:20 myJail
...

$ sudo cat /sys/fs/cgroup/memory/myJail/memory.limit_in_bytes
49999872

$ sudo cat /sys/fs/cgroup/memory/myJail/memory.kmem.limit_in_bytes
49999872
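
The value 49999872 is a multiple of 4096. One plausible reading, which the source does not confirm, is that the limit was requested as 50,000,000 bytes and the kernel rounded it down to a whole number of pages. The sketch below only does that arithmetic; the 4096-byte page size and the 50,000,000-byte request are assumptions.

```python
PAGE_SIZE = 4096  # assumption: the common x86-64 page size

def round_down_to_page(nbytes, page_size=PAGE_SIZE):
    """Round a byte count down to a whole number of pages."""
    return nbytes // page_size * page_size

if __name__ == "__main__":
    requested = 50_000_000  # hypothetical requested limit
    print(round_down_to_page(requested))
```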

CPU:

$ ls -l /sys/fs/cgroup/cpu
lrwxrwxrwx 1 root root 11 Nov  1 14:50 /sys/fs/cgroup/cpu -> cpu,cpuacct

$ ls -l /sys/fs/cgroup/cpu,cpuacct
...
drwx------  2 root root 0 Nov  9 22:01 myJail
...

$ sudo cat /sys/fs/cgroup/cpu,cpuacct/myJail/cpu.cfs_period_us
1000000

$ sudo cat /sys/fs/cgroup/cpu,cpuacct/myJail/cpu.cfs_quota_us
1000000
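
With the CFS bandwidth controller, a group may run for cpu.cfs_quota_us microseconds in every cpu.cfs_period_us microseconds, so the ratio of the two is the number of CPUs the group can use. Here both are 1000000, which works out to one full CPU. A minimal sketch of that calculation, with cpu_limit as a made-up helper name:

```python
def cpu_limit(quota_us, period_us):
    """Return the CPU allowance as a number of CPUs, or None if unlimited."""
    if quota_us < 0:  # cpu.cfs_quota_us reads -1 when no limit is set
        return None
    return quota_us / period_us

if __name__ == "__main__":
    # Values read from the myJail cgroup above.
    print(cpu_limit(1000000, 1000000))
```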