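Container Filesystem Implementation

The core of container technology is built on Linux Namespaces and Cgroups. A container's filesystem view is isolated with the Mount Namespace.

From the user's point of view, a container's filesystem looks no different from the host's, but the implementation behind it is different. Running df -h inside a container shows that the container's root directory is of type overlay, unlike the ext4 or xfs typically seen on the host: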

root@rke2-cilium-01:~# df -h /
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/ubuntu--vg-ubuntu--lv 97G 42G 56G 43% /

root@rke2-cilium-01:~# kubectl exec -it nginx-56f7986d7-rqsr9 -- df -h /
Filesystem Size Used Avail Use% Mounted on
overlay 97G 70G 28G 72% /
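The same check can be scripted. The sketch below looks up the filesystem type of a mount point in a mounts table; fstype_of is a hypothetical helper name, and it assumes the /proc/self/mounts column layout (device, mount point, type, options).

```shell
# fstype_of MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type recorded for MOUNTPOINT in a mounts table.
# MOUNTS_FILE defaults to /proc/self/mounts. The last matching entry wins,
# because a later mount shadows an earlier one on the same mount point.
fstype_of() {
    awk -v mp="$1" '$2 == mp { t = $3 } END { if (t != "") print t }' \
        "${2:-/proc/self/mounts}"
}
```

Calling fstype_of / inside a container would typically print overlay, while on a host it prints whatever the root filesystem uses.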

UnionFS

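UnionFS (union filesystem) is a filesystem that merges multiple directories (or filesystems) into a single unified view.

The user sees one filesystem, while underneath several filesystems are stacked on top of each other.

It is typically used to combine read-only and writable layers, for example:

  • Read-only layer (base image)
  • Writable layer (user changes)

Typical use: Docker's image layering is implemented with a UnionFS-style filesystem such as OverlayFS.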

OverlayFS

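There are many UnionFS implementations. Docker supports OverlayFS, fuse-overlayfs, Device Mapper, Btrfs, ZFS, VFS, and AUFS (deprecated); the overlay shown by df -h earlier refers to OverlayFS.

OverlayFS was merged into the Linux mainline kernel in version 3.18, and since then it has become the default container filesystem on most Linux distributions.

OverlayFS works with four directories:

  • lowerdir: the read-only layer. It cannot be modified, and multiple lower directories can be specified.
  • upperdir: the read-write layer, where changes made in the container are stored.
  • merged (mergeddir): the directory finally presented to the user.
  • workdir: a working directory that OverlayFS uses for temporary files in order to keep file operations atomic. It must be on the same filesystem as upperdir, and its contents are cleared at mount time.

OverlayFS union-mounts the lower layers and the upper layer onto the merged directory, so that merged contains all files and directories from both.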
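The kernel documentation asks for workdir to be an empty directory on the same filesystem as upperdir. A minimal pre-mount check could look like the sketch below; the function names are made up for illustration, it assumes GNU stat, and it assumes both paths exist.

```shell
# same_fs DIR_A DIR_B
# Succeed when both paths report the same device number (GNU stat %d).
same_fs() {
    [ "$(stat -c %d "$1")" = "$(stat -c %d "$2")" ]
}

# is_empty_dir DIR
# Succeed when DIR exists and has no entries, hidden ones included.
is_empty_dir() {
    [ -d "$1" ] && [ -z "$(ls -A "$1")" ]
}

# check_overlay_dirs UPPER WORK
# Print "ok" when the pair looks usable for an overlay mount, else a reason.
check_overlay_dirs() {
    if ! same_fs "$1" "$2"; then
        echo "upperdir and workdir are on different filesystems"
    elif ! is_empty_dir "$2"; then
        echo "workdir is not empty"
    else
        echo "ok"
    fi
}
```

Comparing device numbers is only a heuristic. It is meant as a quick sanity check before running mount, not as a replacement for the kernel's own validation.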

OverlayFS mount operation

Create a directory for each role:

root@test-1:~# mkdir -pv /root/test/{lower_1,lower_2,upper,work,merged}

Create files in the lower and upper directories:

root@test-1:~# echo "this is lower_1 file" > /root/test/lower_1/lower_1_file.txt
root@test-1:~# echo "this is lower_2 file" > /root/test/lower_2/lower_2_file.txt
root@test-1:~# echo "this is upper file" > /root/test/upper/upper_file.txt

root@test-1:~# tree test
test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
├── upper
│   └── upper_file.txt
└── work

5 directories, 3 files

Use mount to union-mount the lower layers and the upper layer onto the merged directory:

# mount -t overlay overlay -o lowerdir=<lowerdir1-dir>:<lowerdir2-dir>:<lowerdir3-dir>,upperdir=<upper-dir>,workdir=<work-dir> <merged-dir>
root@test-1:~# mount -t overlay overlay -o lowerdir=/root/test/lower_1:/root/test/lower_2,upperdir=/root/test/upper,workdir=/root/test/work /root/test/merged
root@test-1:~# mount | grep overlay
overlay on /root/test/merged type overlay (rw,relatime,lowerdir=/root/test/lower_1:/root/test/lower_2,upperdir=/root/test/upper,workdir=/root/test/work)
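The option string printed by mount can be taken apart with standard tools. This is a small sketch with hypothetical helper names; it assumes none of the paths contain a comma or a colon.

```shell
# overlay_opt KEY OPTIONS
# Extract the value of KEY (lowerdir, upperdir or workdir) from a
# comma-separated overlay option string such as the one printed by mount.
overlay_opt() {
    printf '%s\n' "$2" | tr ',' '\n' | sed -n "s/^$1=//p"
}

# overlay_lowers OPTIONS
# Print one lower directory per line. The leftmost entry of lowerdir is the
# topmost lower layer, so the output is ordered top to bottom.
overlay_lowers() {
    overlay_opt lowerdir "$1" | tr ':' '\n'
}
```

For the mount above, overlay_lowers lists /root/test/lower_1 before /root/test/lower_2, which is also the order in which OverlayFS gives them priority.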

As a result, the files from both the lower and upper layers are visible in the merged directory:

root@test-1:~# ls -l /root/test/merged
total 12
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_1_file.txt
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_2_file.txt
-rw-r--r-- 1 root root 19 Jun 3 21:54 upper_file.txt

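Once OverlayFS is mounted, all operations on the filesystem must go through the merged directory. Users must not modify files or directories in the underlying lower and upper directories, directly or indirectly, otherwise the behavior is unpredictable.

OverlayFS delete operation

When deleting a file or directory in the merged directory, there are three cases.

Deleting a file that comes from the upper layer

The file is also deleted from upperdir: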

root@test-1:~# ls -l /root/test/merged
total 12
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_1_file.txt
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_2_file.txt
-rw-r--r-- 1 root root 19 Jun 3 21:54 upper_file.txt
root@test-1:~# rm /root/test/merged/upper_file.txt
root@test-1:~# ls -l /root/test/upper/
total 0

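Deleting a file that comes from a lower layer

The file is not deleted from lowerdir. Instead, a whiteout file is created in the upper layer to mark the file as deleted, which makes it disappear from the merged view: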

root@test-1:~# ls -l /root/test/merged
total 8
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_1_file.txt
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_2_file.txt
root@test-1:~# rm /root/test/merged/lower_1_file.txt

root@test-1:~# ls -l /root/test/merged
total 4
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_2_file.txt

root@test-1:~# ls -l /root/test/lower_1/
total 4
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_1_file.txt

root@test-1:~# ls -l /root/test/upper/
total 0
c--------- 2 root root 0, 0 Jun 3 22:05 lower_1_file.txt

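A whiteout file is created when the user deletes a file; it masks the file of the same name in the lower layers. The whiteout itself is not visible in the merged directory, so the user no longer sees the deleted file or directory.

A whiteout file is a character device whose major and minor device numbers are both 0 (it can be created with the mknod command). When the user lists a parent directory in the merged layer with ls (which reads directory entries via readdir), OverlayFS automatically filters out the whiteout itself as well as the lower-layer files and directories of the same name. This hides the file and makes it appear deleted to the user.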
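A whiteout can be recognised from user space by checking for a character device with device numbers 0 and 0. The helpers below are a sketch with made-up names; they assume GNU stat, whose %t and %T formats print the major and minor numbers in hex.

```shell
# is_whiteout PATH
# Succeed when PATH is a character device whose major and minor numbers
# are both 0, which is how OverlayFS marks a deleted lower-layer entry.
is_whiteout() {
    [ -c "$1" ] && [ "$(stat -c '%t:%T' "$1")" = "0:0" ]
}

# list_whiteouts DIR
# Print the names of whiteout entries directly inside DIR (e.g. an upperdir).
list_whiteouts() {
    for entry in "$1"/* "$1"/.[!.]*; do
        if is_whiteout "$entry"; then
            basename "$entry"
        fi
    done
}
```

Run against /root/test/upper from the transcript above, list_whiteouts would be expected to print lower_1_file.txt. The helpers only read metadata, so they do not conflict with the rule against modifying the upper directory while it is mounted.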

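Deleting a file in the upper layer that overrides a lower-layer file

In this case OverlayFS must both delete the file or directory in the upper layer's filesystem and create a whiteout file in the upper layer, so that removing the upper-layer file does not expose the lower-layer file: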

root@test-1:~# mkdir -pv /root/test_2/{lower_1,lower_2,upper,work,merged}

root@test-1:~# echo "just a test in lower_1" > /root/test_2/lower_1/test.txt
root@test-1:~# echo "just a test in lower_2" > /root/test_2/lower_2/test.txt
root@test-1:~# echo "just a test in upper" > /root/test_2/upper/test.txt

root@test-1:~# tree /root/test_2
/root/test_2
├── lower_1
│   └── test.txt
├── lower_2
│   └── test.txt
├── merged
├── upper
│   └── test.txt
└── work

5 directories, 3 files

root@test-1:~# mount -t overlay overlay -o lowerdir=/root/test_2/lower_1:/root/test_2/lower_2,upperdir=/root/test_2/upper,workdir=/root/test_2/work /root/test_2/merged
root@test-1:~# mount | grep overlay
overlay on /root/test_2/merged type overlay (rw,relatime,lowerdir=/root/test_2/lower_1:/root/test_2/lower_2,upperdir=/root/test_2/upper,workdir=/root/test_2/work)

root@test-1:~# cat /root/test_2/merged/test.txt
just a test in upper

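Perform the deletion. What actually gets removed is the file of that name in the upper layer. Afterwards the files in both lower layers still exist, and the upper layer contains only a whiteout entry with the same name: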

root@test-1:~# ls -l /root/test_2/merged/
total 4
-rw-r--r-- 1 root root 21 Jun 3 22:12 test.txt

root@test-1:~# rm /root/test_2/merged/test.txt

root@test-1:~# ls -l /root/test_2/merged/
total 0

root@test-1:~# ls -l /root/test_2/lower_1/
total 4
-rw-r--r-- 1 root root 23 Jun 3 22:12 test.txt

root@test-1:~# ls -l /root/test_2/lower_2/
total 4
-rw-r--r-- 1 root root 23 Jun 3 22:12 test.txt

root@test-1:~# ls -l /root/test_2/upper/
total 0
c--------- 2 root root 0, 0 Jun 3 22:15 test.txt
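The lookup order behind these results can be imitated in user space. The sketch below is a simplified model for regular files only (real OverlayFS also merges directories); resolve_layer is a hypothetical name, and GNU stat is assumed.

```shell
# resolve_layer NAME LAYER...
# Walk the layers from top to bottom (upperdir first, then the lowerdir
# entries left to right) and report which one supplies NAME in the merged
# view. A whiteout (character device 0:0) hides NAME in all layers below it.
resolve_layer() {
    name=$1
    shift
    for layer in "$@"; do
        path="$layer/$name"
        if [ -c "$path" ] && [ "$(stat -c '%t:%T' "$path")" = "0:0" ]; then
            echo "deleted (whiteout in $layer)"
            return 0
        elif [ -e "$path" ]; then
            echo "$layer"
            return 0
        fi
    done
    echo "not found"
}
```

With the test_2 layout, resolve_layer test.txt /root/test_2/upper /root/test_2/lower_1 /root/test_2/lower_2 would point at the upper directory before the rm and report a whiteout after it.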

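OverlayFS create operation

There are two cases when creating a file in OverlayFS.

The new file has no corresponding file or directory in any lower layer

The file or directory is created directly in the corresponding directory of the upper layer: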

root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   └── lower_2_file.txt
├── upper
│   └── lower_1_file.txt
└── work
    └── work
        └── #3

6 directories, 5 files

root@test-1:~# echo "this is new file 1" > /root/test/merged/new_file_1.txt
root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   ├── lower_2_file.txt
│   └── new_file_1.txt
├── upper
│   ├── lower_1_file.txt
│   └── new_file_1.txt
└── work
    └── work
        └── #3

6 directories, 7 files

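Creating a file whose name already exists in a lower layer and has a whiteout in the upper layer

In this case OverlayFS removes the whiteout from the upper layer and creates a new file with the same name, so the lower_1_file.txt seen in the merged directory is the upper layer's lower_1_file.txt: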

root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   ├── lower_2_file.txt
│   └── new_file_1.txt
├── upper
│   ├── lower_1_file.txt
│   └── new_file_1.txt
└── work
    └── work
        └── #3

6 directories, 7 files

root@test-1:~# echo "this is lower_1 file in merged" > /root/test/merged/lower_1_file.txt

root@test-1:~# ls -l /root/test/merged/
total 12
-rw-r--r-- 1 root root 31 Jun 3 22:27 lower_1_file.txt
-rw-r--r-- 1 root root 21 Jun 3 21:54 lower_2_file.txt
-rw-r--r-- 1 root root 19 Jun 3 22:20 new_file_1.txt

root@test-1:~# cat /root/test/merged/lower_1_file.txt
this is lower_1 file in merged

root@test-1:~# cat /root/test/lower_1/lower_1_file.txt
this is lower_1 file

root@test-1:~# cat /root/test/upper/lower_1_file.txt
this is lower_1 file in merged

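OverlayFS modify operation

There are two cases when modifying a file in OverlayFS.

Modifying a file that comes from the upper layer

When a file that comes from the upper layer is modified in the merged directory, the file in the upper layer is modified as well: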

root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   ├── lower_1_file.txt
│   ├── lower_2_file.txt
│   └── new_file_1.txt
├── upper
│   ├── lower_1_file.txt
│   └── new_file_1.txt
└── work
    └── work
        └── #3

6 directories, 8 files

root@test-1:~# cat /root/test/merged/new_file_1.txt
this is new file 1
root@test-1:~# cat /root/test/upper/new_file_1.txt
this is new file 1

root@test-1:~# echo "change file context in new_file_1.txt" >> /root/test/merged/new_file_1.txt

root@test-1:~# cat /root/test/merged/new_file_1.txt
this is new file 1
change file context in new_file_1.txt
root@test-1:~# cat /root/test/upper/new_file_1.txt
this is new file 1
change file context in new_file_1.txt

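Modifying a file that comes from a lower layer

If the user modifies a file that comes from a lower layer, OverlayFS first copies it to the upper layer, because lower layers are read-only, and then applies the change to that copy. This is OverlayFS's copy-on-write behavior, known as copy-up.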

root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   ├── lower_1_file.txt
│   ├── lower_2_file.txt
│   └── new_file_1.txt
├── upper
│   ├── lower_1_file.txt
│   └── new_file_1.txt
└── work
    └── work
        └── #3

6 directories, 8 files

root@test-1:~# cat /root/test/merged/lower_2_file.txt
this is lower_2 file
root@test-1:~# cat /root/test/lower_2/lower_2_file.txt
this is lower_2 file

root@test-1:~# echo "change file context in lower_2_file.txt" >> /root/test/merged/lower_2_file.txt

root@test-1:~# cat /root/test/merged/lower_2_file.txt
this is lower_2 file
change file context in lower_2_file.txt
root@test-1:~# cat /root/test/lower_2/lower_2_file.txt
this is lower_2 file

root@test-1:~# tree /root/test
/root/test
├── lower_1
│   └── lower_1_file.txt
├── lower_2
│   └── lower_2_file.txt
├── merged
│   ├── lower_1_file.txt
│   ├── lower_2_file.txt
│   └── new_file_1.txt
├── upper
│   ├── lower_1_file.txt
│   ├── lower_2_file.txt
│   └── new_file_1.txt
└── work
    └── work
        └── #3

6 directories, 9 files

root@test-1:~# cat /root/test/upper/lower_2_file.txt
this is lower_2 file
change file context in lower_2_file.txt
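The copy-up step can be imitated on plain directories to make the order of operations explicit. This is only a user-space illustration with a hypothetical function name; the kernel performs the real copy-up inside the overlay mount.

```shell
# append_with_copy_up UPPER LOWER NAME TEXT
# Imitate copy-up on ordinary directories (no overlay mount involved):
# if NAME exists only in LOWER, copy it into UPPER first, then append TEXT
# to the UPPER copy. LOWER is never written to.
append_with_copy_up() {
    upper=$1 lower=$2 name=$3 text=$4
    if [ ! -e "$upper/$name" ] && [ -e "$lower/$name" ]; then
        cp -p "$lower/$name" "$upper/$name"   # the "copy-up" step
    fi
    printf '%s\n' "$text" >> "$upper/$name"
}
```

After one call the lower copy still holds its original single line, while the upper copy holds the original line plus the appended one, matching the transcript above.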

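Containers using OverlayFS

The system mount command shows how Docker mounts image files through OverlayFS. A container image can be split into multiple layers, and each layer corresponds to one lower directory in OverlayFS.

mount accepts multiple lower directories, which is what makes multi-layer images possible. After the container starts, any modification to the image's files is stored in the upper layer: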

root@docker-rancher:~# mount | grep overlay
overlay on /var/lib/docker/overlay2/cd127ce0c9234a83bfa6a8af489523ac5b4d0f75193376e12c4fb7f7ee9fac60/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/GJSFUGAX44YTMZ5OEJU4EXI3V3:/var/lib/docker/overlay2/l/32VGIMNNHRGYLLLQMKQ6YSLCA5:/var/lib/docker/overlay2/l/NVATVK5SZLFEWVJDHMSOTH6E2S:/var/lib/docker/overlay2/l/LI5UP7MHJZPMFRX5HO6DBIJEXW:/var/lib/docker/overlay2/l/S6ZRHMIZB2OGWJDCWMG2SWD7OS:/var/lib/docker/overlay2/l/EL5MTVHUJTMMZVEVE6DWVKFCWN,upperdir=/var/lib/docker/overlay2/cd127ce0c9234a83bfa6a8af489523ac5b4d0f75193376e12c4fb7f7ee9fac60/diff,workdir=/var/lib/docker/overlay2/cd127ce0c9234a83bfa6a8af489523ac5b4d0f75193376e12c4fb7f7ee9fac60/work)
overlay on /var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/WKVVQYWDX6XM3WQHFJZ3HEVXHA:/var/lib/docker/overlay2/l/2CX3IUCN65XNPMTWYAKXDY36YY:/var/lib/docker/overlay2/l/73UR25NJGJRBHGV4XW76QS23NU:/var/lib/docker/overlay2/l/U5X27QU3RFE72DW4MKLW5B5AA2:/var/lib/docker/overlay2/l/U3T6GTKZI3BVEYC76NT3UB4QKM:/var/lib/docker/overlay2/l/XF5V6O44CJFWHWGXCBVZNDWD3J:/var/lib/docker/overlay2/l/MSBV2RGM5OXHK2ZXI3KDKGQESW:/var/lib/docker/overlay2/l/JLAAA7DI2PCNETKIP3QWNIKJ5V:/var/lib/docker/overlay2/l/RR7DLQW2OR3KJ4ROSWHGKUO7ZJ:/var/lib/docker/overlay2/l/ZE6CWQSI5BLQ4EDMNRZYZXXXUR:/var/lib/docker/overlay2/l/WZHHR5UGHAYDPPM32EXW7LDU7O:/var/lib/docker/overlay2/l/IDOVDLLKLV3O52ISM7OFDTKMZR:/var/lib/docker/overlay2/l/QS5CLJTQ2J2KD4DRBZ2D7SNJH3:/var/lib/docker/overlay2/l/YPMNY7WF5O525QRRKR3CDLVQ2O:/var/lib/docker/overlay2/l/NUBYELS36USEBZQ2YPUEHRJQM4:/var/lib/docker/overlay2/l/QXH2NMXKWNDUZ5OWFCAFXD2IJB:/var/lib/docker/overlay2/l/GLNEY32NVXKYJR7VXTOQJKGOVR:/var/lib/docker/overlay2/l/NLSYPVG4Y6DFRAGFWPRU674NNY:/var/lib/docker/overlay2/l/PDHSJV7EF3QZT6L7M4JIB2XMPU,upperdir=/var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/diff,workdir=/var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/work)

Overlay2

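Docker has shipped two storage drivers based on OverlayFS, overlay and overlay2. With early Docker versions, the overlay driver could run into a "too many links" error after some time, so recent versions use overlay2 by default.

Image structure

Use docker inspect to view the details of an image; the Data field contains the OverlayFS layout: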

root@docker-rancher:~# docker inspect harbor.warnerchen.com/library/python:3.10.17-alpine | jq '.[0].GraphDriver'
{
  "Data": {
    "LowerDir": "/var/lib/docker/overlay2/2195a75779888833dba571d39ce769c18cd0deb7691ab133b8bdc994b1eefed5/diff:/var/lib/docker/overlay2/6f5fbe2ac28dc120fcda8306d5c51abfbdd9cde0152a0cf2894aa5cb8de5ad75/diff:/var/lib/docker/overlay2/7a4baa9c0f88400665780444686214aabde0c626da08ecd7b3f8805c2c7c063f/diff",
    "MergedDir": "/var/lib/docker/overlay2/b6834f9d8f65721c250691c691db10da2d92314bd24cda5a31b1db71d4248ff3/merged",
    "UpperDir": "/var/lib/docker/overlay2/b6834f9d8f65721c250691c691db10da2d92314bd24cda5a31b1db71d4248ff3/diff",
    "WorkDir": "/var/lib/docker/overlay2/b6834f9d8f65721c250691c691db10da2d92314bd24cda5a31b1db71d4248ff3/work"
  },
  "Name": "overlay2"
}

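/var/lib/docker/overlay2 holds the lower, upper, merged, and work directories of all images and containers. It also contains an l directory with symbolic links named by short identifiers that point to those directories; the short identifiers keep the mount command's arguments below the page-size limit: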

root@docker-rancher:~# ls /var/lib/docker/overlay2/l -l
total 0
lrwxrwxrwx 1 root root 72 Feb 7 20:54 245UGIT5O7BRVNYUCJRVOZU5NL -> ../38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7/diff
lrwxrwxrwx 1 root root 72 Dec 26 16:25 24DGEJWBI2GKVF6PQUI45IKZ2O -> ../9f1229bbb00f7dba2fea670cef286aa13498c94454df42b5b67af4cd3a54f9da/diff
lrwxrwxrwx 1 root root 72 Dec 25 16:32 24P6O5MZCXGY2QX5KXEXAAQO37 -> ../6bf5c7cb636eb0345849ace23fde6935a07ee5d7bc1bb7160e3058882079dcf8/diff
lrwxrwxrwx 1 root root 72 Dec 4 15:27 26WQOKXKRSNS4JW4YRJZMTMU4Y -> ../c824524f1baf30988dc66b504ac6b21d34dec52e68aecded117fdc9f2ca7d4c5/diff
lrwxrwxrwx 1 root root 72 Mar 3 09:35 2C2EPWTZN3ZFMKJPDSJ46L2I7Y -> ../a848cd02f9af6a522db2729795fbc81be9a72d098dba7ba8a3b4ce9adeadc211/diff
...

Directories other than l contain the following:

root@docker-rancher:~# ls /var/lib/docker/overlay2/38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7
committed diff link lower work

The diff directory holds the content of this layer:

root@docker-rancher:~# ls -l /var/lib/docker/overlay2/38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7/diff/
total 0
drwxr-xr-x 3 root root 17 Dec 22 2023 var

link holds the name of this layer's short-identifier symlink:

root@docker-rancher:~# cat /var/lib/docker/overlay2/38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7/link
245UGIT5O7BRVNYUCJRVOZU5NL

root@docker-rancher:~# ls /var/lib/docker/overlay2/l/245UGIT5O7BRVNYUCJRVOZU5NL -l
lrwxrwxrwx 1 root root 72 Feb 7 20:54 /var/lib/docker/overlay2/l/245UGIT5O7BRVNYUCJRVOZU5NL -> ../38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7/diff

lower lists the short-identifier symlinks of the parent layers. The example below has 12 parent layers:

root@docker-rancher:~# cat /var/lib/docker/overlay2/38928d95f9c344b85fc2d4be2d0cfc621389da81a585b89091be1063a5d82ca7/lower
l/UKTWL45I3NR64VRKLUJCZ6OUND:l/EYJ65BNGSIN4I3BWDYEBWJZLFV:l/DUUWKXY3ZCN4W72WBQFAM5F4LL:l/PGY5775MXVJ4YZGCGZ72X55YHJ:l/54QXKMILAIUG4GNDHYLUNZA3VP:l/OLTHSV5LEWJQI2LZYVXBUJGXDR:l/I7TSGZLYNBFVCCFTPHLANTNRFA:l/YXK7SXSPTKGSJEZIRI4RKRWTCQ:l/EDEFPTNDR52PSECMFQVUK5KMTV:l/5Z6S55QOXLI6H6P3ARC5LLUH6G:l/3L54QDH7VRLPBOSWQRFYNG3SLT:l/M2VE6ZFUJIYBOIQ4IZAC2VB4YZ
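Counting the parents by eye is error-prone, so a short sketch for splitting a lower file may help. The helper names are hypothetical, and the file format (colon-separated l/<short-id> entries) is taken from the output above.

```shell
# lower_entries LOWER_FILE
# Print the short identifiers listed in an overlay2 "lower" file, one per line.
lower_entries() {
    tr ':' '\n' < "$1" | sed 's#^l/##'
}

# lower_count LOWER_FILE
# Print how many parent layers the file references.
lower_count() {
    lower_entries "$1" | grep -c .
}
```

Each identifier printed by lower_entries can be resolved with ls -l /var/lib/docker/overlay2/l/<short-id>, as the transcripts in this section do by hand.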

If the current layer is the bottom layer, that is, it has no parent layer, there is no lower file:

root@docker-rancher:~# ls -l /var/lib/docker/overlay2/l/M2VE6ZFUJIYBOIQ4IZAC2VB4YZ
lrwxrwxrwx 1 root root 72 Dec 17 13:27 /var/lib/docker/overlay2/l/M2VE6ZFUJIYBOIQ4IZAC2VB4YZ -> ../d2817390a98897582488ca4e0d9be2cffa8fb9ef5ce6a3a6df1566967c10b7e7/diff

root@docker-rancher:~# ls /var/lib/docker/overlay2/d2817390a98897582488ca4e0d9be2cffa8fb9ef5ce6a3a6df1566967c10b7e7
committed diff link

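The work directory is where OverlayFS keeps temporary files while it works; it is emptied once the work is done.

committed is an empty file that marks this layer as having been committed as an image layer.

Image metadata

Image metadata is stored in /var/lib/docker/image/<storage_driver>/imagedb/content/sha256/, in files named after the image ID: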

root@docker-rancher:~# ls -l /var/lib/docker/image/overlay2/imagedb/content/sha256/
total 1052
-rw------- 1 root root 6786 May 23 14:22 02a306e6b8c083a472a0e16bd5864cddf8c613f5c650f02f08d90cd3d396a616
...

Because the files are named after the image ID, docker inspect can be used with that ID to view the image details:

root@docker-rancher:~# docker inspect 02a306e6b8c083a472a0e16bd5864cddf8c613f5c650f02f08d90cd3d396a616
[
    {
        "Id": "sha256:02a306e6b8c083a472a0e16bd5864cddf8c613f5c650f02f08d90cd3d396a616",
        "RepoTags": [
            "harbor.warnerchen.com/rancher/shell:v0.1.25"
        ],
        ...
    }
]

These files store the image details in JSON format:

cat /var/lib/docker/image/overlay2/imagedb/content/sha256/02a306e6b8c083a472a0e16bd5864cddf8c613f5c650f02f08d90cd3d396a616 | jq
{
  "architecture": "amd64",
  "author": "SUSE LINUX GmbH (https://www.suse.com/)",
  "config": {
    "User": "1000",
    "Env": [
      "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
    ],
    "Cmd": [
      "welcome"
    ],
  ...
}

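The most important field is diff_ids. Each ID corresponds to one layer, and the order matters: read from top to bottom, the list runs from the bottom layer to the top layer: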

root@docker-rancher:~# cat /var/lib/docker/image/overlay2/imagedb/content/sha256/02a306e6b8c083a472a0e16bd5864cddf8c613f5c650f02f08d90cd3d396a616 | jq .rootfs
{
  "type": "layers",
  "diff_ids": [
    "sha256:22088c8157a5dbf54df5a93e343e2370583c5500e39e8cd781fd248ce5c823a1",
    "sha256:f332fc93ca03fe20ab1e3df14803a6f2521cdd944858ef0571c19c970976311d",
    "sha256:6ef290e55fa55ec1f3188f2f865e64762d5ca7f536aedfd4998fe6b0dc12e7ac",
    "sha256:c2db1864a7160e63cabcb5159b307c4c3c55e7d9bb8b0d24737f25ec54009f81",
    "sha256:ff508a238556a6cdb932ad2a50286f15e3d17d01a75e1bbfe7b47f722268f29f",
    "sha256:c43d97986e9aaebb70bfbaa93cdfe8274171737dac5a9b59f38c843b7aee3b3a",
    "sha256:e9d66ec1cf5d038b7ea11c922ee12d9a0e94767f9a8e171c01a1501de3221cd8",
    "sha256:5f70bf18a086007016e948b04aed3b82103a36bea41755b6cddfaf10ace3c6ef"
  ]
}

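Layer metadata

There are two kinds of layer, roLayer and mountedLayer. roLayer describes a read-only image layer, and mountedLayer describes a read-write container layer.

roLayer

A roLayer mainly stores the layer's checksum (diffID), the parent layer's chainID, the cacheID under which the storage driver keeps the layer's files, and the layer's size. This metadata is saved under /var/lib/docker/image/<storage_driver>/layerdb/sha256/<chainID>/: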

root@docker-rancher:~# ls -l /var/lib/docker/image/overlay2/layerdb/sha256/
total 0
drwx------ 2 root root 85 Dec 26 16:25 0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876

root@docker-rancher:~# tree /var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876
/var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876
├── cache-id
├── diff
├── parent
├── size
└── tar-split.json.gz

0 directories, 5 files

cache-id records the directory name that the layer actually uses under the overlay2 driver, i.e. the subdirectory seen under /var/lib/docker/overlay2/.

# Look up the cacheID for chainID 0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876
root@docker-rancher:~# cat /var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876/cache-id
49f9f9a21790e50efa1344a4a23fec2833bee6697845aacc061ccdb8c79fadaa

# Use the cacheID to find the directory that holds the layer data
root@docker-rancher:~# ls -l /var/lib/docker/overlay2/49f9f9a21790e50efa1344a4a23fec2833bee6697845aacc061ccdb8c79fadaa
total 8
-rw------- 1 root root 0 Dec 26 16:25 committed
drwxr-xr-x 5 root root 40 Dec 26 16:25 diff
-rw-r--r-- 1 root root 26 Dec 26 16:25 link
-rw-r--r-- 1 root root 173 Dec 26 16:25 lower
drwx------ 2 root root 6 Dec 26 16:25 work
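The two manual steps above can be folded into one lookup. The sketch below uses a hypothetical function name and assumes the overlay2 layout shown in this article, with the Docker root directory passed in explicitly (usually /var/lib/docker).

```shell
# layer_data_dir DOCKER_ROOT CHAIN_ID
# Map a chainID (hex digest without the "sha256:" prefix) to the overlay2
# directory that holds the layer's files, by reading its cache-id file.
layer_data_dir() {
    root=$1 chain=$2
    cache_id=$(cat "$root/image/overlay2/layerdb/sha256/$chain/cache-id") || return 1
    printf '%s\n' "$root/overlay2/$cache_id"
}
```

For example, layer_data_dir /var/lib/docker 0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876 would print the directory listed in the transcript above. Reading these files normally requires root.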

The diff file stores the diffID, which matches an entry in diff_ids in the image metadata:

root@docker-rancher:~# cat /var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876/diff
sha256:93252ffc4a24def2bc530132baa5ea8bd84790ebab0695802a6bbdb509d03a80

The parent file stores the parent's chainID:

root@docker-rancher:~# cat /var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876/parent
sha256:6e731416af892c6d4cb775f708eac8487fe3d04a817b22b17f364fe558c5349a

The size file stores the size of the layer in bytes:

# 770450043 bytes ~= 735 MiB
root@docker-rancher:~# cat /var/lib/docker/image/overlay2/layerdb/sha256/0039114f0040d66de95932fc6860e7bb10b00ba7795dacd4cd539ed9d80d9876/size
770450043

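tar-split.json.gz is a gzip-compressed JSON file that holds the tar-split metadata recorded when the layer's tar archive was unpacked. It lets Docker reassemble the layer's original tar stream when the image is exported, for example with docker save.

Understanding the three IDs:

  • cacheID: the name of the directory that holds the layer's data, i.e. a directory name under /var/lib/docker/overlay2.
  • diffID: the digest of the layer's content (its uncompressed tar archive). It is used for content verification and for computing chainIDs, and it is listed in the rootfs.diff_ids field of the image config files under /var/lib/docker/image/overlay2/imagedb/content/sha256.
  • chainID: in Docker, a hash that uniquely identifies an image layer together with everything beneath it. It is computed from the diffIDs of the current layer and all of its parent layers. For the bottom layer, the chainID is identical to the diffID.

Suppose there are three layers: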

Layer 1: diffID = A
Layer 2: diffID = B
Layer 3: diffID = C

Then the chainIDs of the three layers are:

chainID of Layer 1 = A
chainID of Layer 2 = sha256(chainID of Layer 1 + " " + B)
chainID of Layer 3 = sha256(chainID of Layer 2 + " " + C)
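The chain can be computed with sha256sum. The sketch below follows the rule in the OCI image specification: the bottom layer's chainID is its diffID unchanged, and each later chainID hashes the previous chainID and the next diffID joined by a single space, both written with their sha256: prefix. chain_ids is a hypothetical helper name.

```shell
# chain_ids DIFF_ID...
# Print the chainID of every layer, bottom layer first:
#   chainID(1) = diffID(1)
#   chainID(n) = "sha256:" + sha256(chainID(n-1) + " " + diffID(n))
# Each DIFF_ID is expected in the "sha256:<hex>" form found in diff_ids.
chain_ids() {
    chain=""
    for diff in "$@"; do
        if [ -z "$chain" ]; then
            chain=$diff
        else
            chain="sha256:$(printf '%s %s' "$chain" "$diff" | sha256sum | cut -d' ' -f1)"
        fi
        printf '%s\n' "$chain"
    done
}
```

Feeding an image's diff_ids list to chain_ids in order should produce values whose hex part shows up as directory names under layerdb/sha256, which is a convenient way to cross-check the metadata on a real host.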

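The figure below shows the relationship between image metadata, read-only image layer metadata, and the actual image data:

mountedLayer

mountedLayer describes the read-write container layer, that is, it applies to containers that have already been started. Its files are stored under /var/lib/docker/image/overlay2/layerdb/mounts.

This directory contains one subdirectory per container, named after the container ID, and each holds three files: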

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
fb59b89245b8 harbor.warnerchen.com/rancher/rancher:v2.11.1 "entrypoint.sh" 11 days ago Up 6 days 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp rancher
dbac81c2435f harbor.warnerchen.com/dreamacro/clash:v1.18.0 "/clash" 3 weeks ago Up 6 days 0.0.0.0:7890-7891->7890-7891/tcp, :::7890-7891->7890-7891/tcp, 0.0.0.0:9090->9090/tcp, :::9090->9090/tcp clash-paofucloud

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts# docker ps | awk '{ print $1 }'
CONTAINER
fb59b89245b8
dbac81c2435f

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts# ls -l
total 0
drwxr-xr-x 2 root root 51 May 10 22:36 dbac81c2435f5b95efbb9dacd4f12f79350a975c1fba2b6f98e9f6ddc17df3e9
drwxr-xr-x 2 root root 51 May 23 16:32 fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts# ls fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde
init-id mount-id parent

  • mount-id is a directory name under /var/lib/docker/overlay2; that directory holds the container layer's data.
  • init-id is the mount-id with an -init suffix appended; it is also a directory name under /var/lib/docker/overlay2.
  • parent is the chainID of the top layer of the container's image.

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts/fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde# cat mount-id
2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts/fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde#
root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts/fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde# cat init-id
2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf-init

root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts/fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde#
root@docker-rancher:/var/lib/docker/image/overlay2/layerdb/mounts/fb59b89245b8860278909104d0d92399f579553220833692c4e1cdb3055e2fde# ls /var/lib/docker/overlay2 | grep 2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf
2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf
2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf-init
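The same lookup can be wrapped in a helper that goes from a container ID to its layer directories. This is a sketch with a hypothetical function name; it assumes the overlay2 layout described here and expects the full 64-character container ID, since that is how the directories under mounts are named.

```shell
# container_layer_dirs DOCKER_ROOT CONTAINER_ID
# Print the overlay2 directories of a container's init layer and read-write
# layer, using the init-id and mount-id files under layerdb/mounts.
container_layer_dirs() {
    meta="$1/image/overlay2/layerdb/mounts/$2"
    init_id=$(cat "$meta/init-id") || return 1
    mount_id=$(cat "$meta/mount-id") || return 1
    printf 'init:  %s\n' "$1/overlay2/$init_id"
    printf 'mount: %s\n' "$1/overlay2/$mount_id"
}
```

The full ID can be obtained with docker ps --no-trunc, and the Docker root is usually /var/lib/docker.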

Looking at the structure of the -init directory shows files such as hosts and resolv.conf:

root@docker-rancher:/var/lib/docker/overlay2# tree 2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf-init
2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf-init
├── committed
├── diff
│   ├── dev
│   │   ├── console
│   │   ├── pts
│   │   └── shm
│   └── etc
│       ├── hostname
│       ├── hosts
│       ├── mtab -> /proc/mounts
│       └── resolv.conf
├── link
├── lower
└── work
    └── work

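This is because when a container starts, the user may want to modify these files or directories, but they belong to the image layers and must not be modified there. A separate init layer is therefore mounted at startup, and the changes are made in the init layer's files instead. These changes apply only to the current container. When docker commit creates an image, the init layer, the image layers, and the container's read-write layer (upperdir) are treated separately: only the latter two are committed, and the init layer is not.

Next, look at the contents of the container layer: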

root@docker-rancher:/var/lib/docker/overlay2# ls -l 2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf
total 8
drwxr-xr-x 9 root root 84 May 23 16:32 diff
-rw-r--r-- 1 root root 26 May 23 16:32 link
-rw-r--r-- 1 root root 550 May 23 16:32 lower
drwxr-xr-x 1 root root 84 May 23 16:32 merged
drwx------ 3 root root 18 May 28 14:43 work

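There is an additional merged directory, which is the final directory produced by the OverlayFS union mount. It is mounted from the image layers and the init layer as lower layers, with the container layer's diff directory as the upper layer.

The corresponding mount information is also visible with mount: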

root@docker-rancher:/var/lib/docker/overlay2# mount | grep overlay
overlay on /var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/merged type overlay (rw,relatime,lowerdir=/var/lib/docker/overlay2/l/WKVVQYWDX6XM3WQHFJZ3HEVXHA:/var/lib/docker/overlay2/l/2CX3IUCN65XNPMTWYAKXDY36YY:/var/lib/docker/overlay2/l/73UR25NJGJRBHGV4XW76QS23NU:/var/lib/docker/overlay2/l/U5X27QU3RFE72DW4MKLW5B5AA2:/var/lib/docker/overlay2/l/U3T6GTKZI3BVEYC76NT3UB4QKM:/var/lib/docker/overlay2/l/XF5V6O44CJFWHWGXCBVZNDWD3J:/var/lib/docker/overlay2/l/MSBV2RGM5OXHK2ZXI3KDKGQESW:/var/lib/docker/overlay2/l/JLAAA7DI2PCNETKIP3QWNIKJ5V:/var/lib/docker/overlay2/l/RR7DLQW2OR3KJ4ROSWHGKUO7ZJ:/var/lib/docker/overlay2/l/ZE6CWQSI5BLQ4EDMNRZYZXXXUR:/var/lib/docker/overlay2/l/WZHHR5UGHAYDPPM32EXW7LDU7O:/var/lib/docker/overlay2/l/IDOVDLLKLV3O52ISM7OFDTKMZR:/var/lib/docker/overlay2/l/QS5CLJTQ2J2KD4DRBZ2D7SNJH3:/var/lib/docker/overlay2/l/YPMNY7WF5O525QRRKR3CDLVQ2O:/var/lib/docker/overlay2/l/NUBYELS36USEBZQ2YPUEHRJQM4:/var/lib/docker/overlay2/l/QXH2NMXKWNDUZ5OWFCAFXD2IJB:/var/lib/docker/overlay2/l/GLNEY32NVXKYJR7VXTOQJKGOVR:/var/lib/docker/overlay2/l/NLSYPVG4Y6DFRAGFWPRU674NNY:/var/lib/docker/overlay2/l/PDHSJV7EF3QZT6L7M4JIB2XMPU,upperdir=/var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/diff,workdir=/var/lib/docker/overlay2/2230388d2e79adfb7d077ef2efc96e2a12bf0da12a1ecdf5a4f17184f7b3dbbf/work)

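The figure below shows the relationship between the container layer's mount metadata, read-only image layer metadata, and the container's actual data: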

Author

Warner Chen

Posted on

2025-06-03

Updated on

2025-06-04
