arm-linux head.S Source Code Analysis

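This is the first file that runs in ARM Linux. Its code is a fairly self-contained wrapper whose job is to decompress the Linux kernel and then jump the PC to the first instruction of the kernel (vmlinux).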

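The bootloader passes three parameters to Linux, of which Linux uses the second and the third. The second parameter (in r1) is the architecture ID, and the third (in r2) is the address of the tag list. The architecture ID identifies the ARM chip to Linux and must be unique. The tag list is the list of parameters that the bootloader hands to Linux (see "booting arm linux.pdf" for a detailed explanation).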

// Program entry point

              .section ".start", #alloc, #execinstr

/*

 * sort out different calling conventions

 */

              .align

start:

              .type       start,#function

              .rept       8       // repeat the next instruction 8 times, leaving room for the interrupt vector table

              mov r0, r0          // effectively a nop

              .endr

 

              b     1f

              .word       0x016f2818              @ Magic numbers to help the loader

              .word       start               @ absolute load/run zImage address

              .word       _edata                   @ zImage end address

1:            mov r7, r1                  @ save architecture ID

              mov r8, r2                  @ save atags pointer

 

#ifndef __ARM_ARCH_2__

              /*

               * Booting from Angel - need to enter SVC mode and disable

               * FIQs/IRQs (numeric definitions from angel arm.h source).

               * We only do this if we were in user mode on entry.

               */

              mrs r2, cpsr        @ get current mode

              tst   r2, #3                  @ not user?

              bne       not_angel

              mov r0, #0x17            @ angel_SWIreason_EnterSVC

              swi       0x123456              @ angel_SWI_ARM

not_angel:

              mrs r2, cpsr        @ turn off interrupts to

              orr  r2, r2, #0xc0              @ prevent angel from running

              msr       cpsr_c, r2

#else

              teqp       pc, #0x0c000003         @ turn off interrupts

#endif

 

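The code must run in SVC mode. If it was entered in user mode, which is the case when booting from the Angel debug monitor, it issues the Angel SWI to switch to SVC mode; otherwise the SWI is skipped. (I have not worked with this SWI myself.) It then disables IRQ and FIQ.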
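The two CPSR tests can be modelled in a few lines of C. This is only a sketch of the bit arithmetic, not kernel code: the function and macro names are made up for illustration, and the mode values (0x10 for user, 0x13 for SVC) are the standard ARM mode encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPSR_I 0x80u /* IRQ disable bit */
#define CPSR_F 0x40u /* FIQ disable bit */

/* Mirrors "tst r2, #3 ; bne not_angel": the SWI is issued only when
 * the two low mode bits are both zero, which is true for user mode. */
bool needs_angel_swi(uint32_t cpsr)
{
    return (cpsr & 3u) == 0;
}

/* Mirrors "orr r2, r2, #0xc0": set both interrupt-disable bits. */
uint32_t mask_interrupts(uint32_t cpsr)
{
    return cpsr | CPSR_I | CPSR_F;
}
```

With these definitions, a CPSR of 0x10 (user mode) takes the SWI path and 0x13 (SVC mode) skips it.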

 

              /*

               * Note that some cache flushing and other stuff may

               * be needed here - is there an Angel SWI call for this?

               */

 

              /*

               * some architecture specific code can be inserted

               * by the linker here, but it should preserve r7, r8, and r9.

               */

 

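Read in the address table. Because our code can execute at any address, that is, it is position-independent code (PIC), we need to add an offset. The meaning of each table entry is given further below.

The initial values in the GOT are assigned by the linker, which does not know at what address the code will eventually run. If the current run address differs from the addresses in the table, the GOT has to be fixed up as well.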

              .text

              adr  r0, LC0

              ldmia       r0, {r1, r2, r3, r4, r5, r6, ip, sp}

              subs r0, r0, r1             @ calculate the delta offset

 

                                          @ if delta is zero, we are

              beq       not_relocated           @ running at the address we

                                          @ were linked at.

 

              /*

               * We're running at a different address.  We need to fix

               * up various pointers:

               *   r5 - zImage base address

               *   r6 - GOT start

               *   ip - GOT end

               */

              add r5, r5, r0

              add r6, r6, r0

              add  ip, ip, r0

 

              /*

               * If we're running fully PIC === CONFIG_ZBOOT_ROM = n,

               * we need to fix up pointers into the BSS region.

               *   r2 - BSS start

               *   r3 - BSS end

               *   sp - stack pointer

               */

              add r2, r2, r0

              add r3, r3, r0

              add       sp, sp, r0

 

Fix up the GOT (global offset table) according to the current run address.

              /*

               * Relocate all entries in the GOT table.

               */

1:            ldr  r1, [r6, #0]          @ relocate entries in the GOT

              add r1, r1, r0             @ table.  This fixes up the

              str  r1, [r6], #4          @ C references.

              cmp r6, ip

              blo       1b
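The delta calculation and the GOT fix-up loop can be sketched in C as follows. The function names and the sample addresses in the usage note are hypothetical; the arithmetic follows the `subs r0, r0, r1` and `ldr/add/str` instructions above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* delta = run-time address of LC0 minus its link-time address.
 * Unsigned arithmetic wraps, so running below the link address
 * still produces a delta that relocates correctly. */
uint32_t load_delta(uint32_t lc0_runtime, uint32_t lc0_linked)
{
    return lc0_runtime - lc0_linked;
}

/* Add delta to every GOT entry. The assembly tests the bound after
 * processing an entry; this sketch uses a plain for loop instead. */
void relocate_got(uint32_t *got, size_t entries, uint32_t delta)
{
    for (size_t i = 0; i < entries; i++)
        got[i] += delta;
}
```

For example, if LC0 was linked at 0x00000100 but is found at 0x30008100, the delta is 0x30008000 and a GOT entry of 0x1000 becomes 0x30009000.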

 

Clear the BSS segment; every ARM program has to do this.

 

not_relocated:       mov r0, #0

1:            str  r0, [r2], #4          @ clear bss

              str  r0, [r2], #4

              str  r0, [r2], #4

              str  r0, [r2], #4

              cmp r2, r3

              blo       1b
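A C model of the loop makes one detail easier to see: the bound is checked only after each group of four stores, so the region is cleared in 16-byte steps. The name `clear_bss` is an illustrative assumption, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Zero [start, end) four words at a time. Like the assembly, the
 * comparison happens after the stores, so up to three words past
 * `end` are written when the length is not a multiple of 16 bytes. */
void clear_bss(uint32_t *start, uint32_t *end)
{
    uint32_t *p = start;

    assert(start < end);
    do {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
    } while (p < end);
}
```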

 

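As the comment below says, the C environment is now set up. Next we turn on the cache and the MMU. Why do that, when this is only a decompressor? For speed. Then why enable the MMU, and with nothing more than a flat mapping? Again for speed. Without the MMU only the I-cache can be enabled, because without the MMU there is no way to manage memory attributes, and the I/O region must never go through the D-cache.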

 

              /*

               * The C runtime environment should now be setup

               * sufficiently.  Turn the cache on, set up some

               * pointers, and start decompressing.

               */

              bl       cache_on

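Should we step into it? Readers who only care about the overall flow just need to know that it turns on the cache. Stepping in is fun, though, which is probably why people keep studying Linux line by line despite its size. On the other hand, an overall grasp of the kernel matters more; otherwise you end up like the blind men and the elephant. Also, anyone who wants to master ARM can read every assembly file in Linux, because the kernel exercises a fairly complete set of ARM features.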

 

              mov r1, sp                  @ malloc space above stack

              add r2, sp, #0x10000       @ 64k max

 

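These addresses are tricky to understand, but the document "About TEXTADDR, ZTEXTADDR, PAGE_OFFSET etc..." explains them clearly. The code below makes sure that the decompression target does not overlap the program that is currently running. The code above reserved 64KB to serve as a data buffer during decompression.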

/*

 * Check to see if we will overwrite ourselves.

 *   r4 = final kernel address (the final physical address the kernel executes at)

 *   r5 = start of this image (the start address of this program)

 *   r2 = end of malloc space (and therefore this image)

 * We basically want:

 *   r4 >= r2 -> OK

 *   r4 + image length <= r5 -> OK

 */

              cmp r4, r2

              bhs       wont_overwrite

              add r0, r4, #4096*1024       @ 4MB largest kernel size

              cmp r0, r5

              bls       wont_overwrite
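The two comparisons amount to a simple interval test. The sketch below restates them in C; the function name, parameter names and the addresses in the usage note are illustrative assumptions, while the 4MB limit comes from the `#4096*1024` constant above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_KERNEL_SIZE (4096u * 1024u) /* "4MB largest kernel size" */

/* Returns true when decompressing straight to the final address
 * cannot clobber the running image or its malloc area. */
bool wont_overwrite(uint32_t final_addr,   /* r4 */
                    uint32_t image_start,  /* r5 */
                    uint32_t malloc_end)   /* r2 */
{
    if (final_addr >= malloc_end)          /* cmp r4, r2 ; bhs */
        return true;
    /* cmp r0, r5 ; bls, with r0 = r4 + 4MB */
    return final_addr + MAX_KERNEL_SIZE <= image_start;
}
```

With a final address of 0x30008000, an image loaded at 0x30800000 passes the test (0x30408000 is below it), whereas an image loaded at 0x30100000 does not and takes the relocating path.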

 

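If there is not enough room, the kernel has to be decompressed to the address right after the buffer. decompress_kernel is called to do the decompression; it is implemented in C and is architecture-independent.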

 

              mov r5, r2                  @ decompress after malloc space

              mov r0, r5

              mov r3, r7

              bl       decompress_kernel

 

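After decompression, because there was not enough room, the kernel is not at its correct address and has to be moved there by copying. The copy may overwrite the code that is currently running, so any code that may still execute must first be moved somewhere safe; here that is the area right after the decompressed kernel.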

 

              add r0, r0, #127

              bic  r0, r0, #127         @ align the kernel length

/*

 * r0     = decompressed kernel length

 * r1-r3  = unused

 * r4     = kernel execution address

 * r5     = decompressed kernel start

 * r6     = processor ID

 * r7     = architecture ID

 * r8     = atags pointer

 * r9-r14 = corrupted

 */

              add r1, r5, r0             @ end of decompressed kernel

              adr  r2, reloc_start

              ldr  r3, LC1

              add r3, r2, r3

1:            ldmia       r2!, {r9 - r14}        @ copy relocation code

              stmia       r1!, {r9 - r14}

              ldmia       r2!, {r9 - r14}

              stmia       r1!, {r9 - r14}

              cmp r2, r3

              blo       1b

 

              bl       cache_clean_flush       // code has been moved, so the cache must be cleaned and flushed first

              add       pc, r5, r0        @ call relocation code

 

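decompress_kernel takes four parameters: the address to decompress the kernel to, the start of the buffer, the end of the buffer, and the architecture ID (r7 copied into r3). It returns the length of the decompressed kernel.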

 

/*

 * We're not in danger of overwriting ourselves.  Do this the simple way.

 *

 * r4     = kernel execution address

 * r7     = architecture ID

 */

wont_overwrite:       mov r0, r4

              mov r3, r7

              bl       decompress_kernel

              b       call_kernel

 

When there is no risk of overwriting, things are simple: decompress the kernel directly and jump to its start address. call_kernel is analysed further below.

 

              .type       LC0, #object

LC0:              .word       LC0               @ r1

              .word       __bss_start              @ r2

              .word       _end                     @ r3

              .word       zreladdr          @ r4

              .word       _start                    @ r5

              .word       _got_start              @ r6

              .word       _got_end        @ ip

              .word       user_stack+4096            @ sp

LC1:              .word       reloc_end - reloc_start

              .size       LC0, . - LC0

 

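This is the address table mentioned earlier; it holds the addresses of several symbols. LC0 is defined right here. zreladdr is defined in the Makefile in the current directory. The other symbols are defined in the linker script (lds).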

 

Now let's look at the cache and MMU code. It shows how the Linux experts use assembly to identify the various ARM processors so that one image stays generic.

/*

 * Turn on the cache.  We need to setup some page tables so that we

 * can have both the I and D caches on.

 *

 * We place the page tables 16k down from the kernel execution address,

 * and we hope that nothing else is using it.  If we're using it, we

 * will go pop!

 *

 * On entry,

 *  r4 = kernel execution address

 *  r6 = processor ID

 *  r7 = architecture number

 *  r8 = atags pointer

 *  r9 = run-time address of "start"  (???)

 * On exit,

 *  r1, r2, r3, r9, r10, r12 corrupted

 * This routine must preserve:

 *  r4, r5, r6, r7, r8

 */

              .align       5

cache_on:       mov r3, #8                     @ cache_on function

              b       call_cache_fn

 

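This part involves many operations on the MMU, cache, write buffer and TLB, as well as coprocessor programming. I won't go into the programming details; you can work through them line by line against the ARM manuals. As for why it is done this way, it is not hard to understand once you know how these parts work (the book 《ARM嵌入式系統開發》 has a fairly good explanation). Because there is a lot of time-consuming copying and decompression here, turning on the cache is worthwhile. Using the data cache requires the MMU to be configured. For simplicity a single-level mapping is built, and it is a 1:1 mapping in which physical and virtual addresses are identical.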

 

__setup_mmu:       sub  r3, r4, #16384           @ Page directory size

              bic  r3, r3, #0xff        @ Align the pointer

              bic  r3, r3, #0x3f00

/*

 * Initialise the page tables, turning on the cacheable and bufferable

 * bits for the RAM area only.

 */

              mov r0, r3

              mov r9, r0, lsr #18

              mov r9, r9, lsl #18              @ start of RAM

              add       r10, r9, #0x10000000       @ a reasonable RAM size

              mov r1, #0x12

              orr  r1, r1, #3 << 10

              add r2, r3, #16384

1:            cmp r1, r9                  @ if virt > start of RAM

              orrhs       r1, r1, #0x0c            @ set cacheable, bufferable

              cmp r1, r10                @ if virt > end of RAM

              bichs       r1, r1, #0x0c            @ clear cacheable, bufferable

              str  r1, [r0], #4          @ 1:1 mapping

              add r1, r1, #1048576

              teq  r0, r2

              bne       1b
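The table-building loop is easier to follow as C. This is a sketch that models only the arithmetic: `page_table_base` and `setup_flat_map` are hypothetical names, the table is an ordinary array rather than real physical memory, and the kernel address used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES   4096u      /* 4096 sections of 1MB cover 4GB */
#define SECTION_SIZE 0x100000u
#define CB_BITS      0x0cu      /* cacheable + bufferable */

/* 16KB below the kernel execution address, aligned down to 16KB. */
uint32_t page_table_base(uint32_t kernel_exec)
{
    return (kernel_exec - 16384u) & ~0x3fffu;
}

/* Fill a flat 1:1 section table; only the assumed 256MB RAM window
 * starting at the table address rounded down to 256KB is cached. */
void setup_flat_map(uint32_t table[PT_ENTRIES], uint32_t table_phys)
{
    uint32_t ram_start = (table_phys >> 18) << 18;
    uint32_t ram_end = ram_start + 0x10000000u;
    uint32_t desc = 0x12u | (3u << 10);   /* section, AP = 3 */

    for (uint32_t i = 0; i < PT_ENTRIES; i++) {
        if (desc >= ram_start)
            desc |= CB_BITS;
        if (desc >= ram_end)
            desc &= ~CB_BITS;
        table[i] = desc;                  /* entry i maps address i << 20 */
        desc += SECTION_SIZE;
    }
}
```

For a kernel execution address of 0x30008000 the table sits at 0x30004000, entries 0x300 through 0x3ff carry the cacheable and bufferable bits (for example 0x30000c1e), and every other entry ends in 0xc12.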

 

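As the comment below explains, if we are running from flash, another 2MB is mapped. If we are actually running from RAM this does no harm; it merely repeats work that has already been done.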

 

/*

 * If ever we are running from Flash, then we surely want the cache

 * to be enabled also for our execution instance...  We map 2MB of it

 * so there is no map overlap problem for up to 1 MB compressed kernel.

 * If the execution is in RAM then we would only be duplicating the above.

 */

              mov r1, #0x1e

              orr  r1, r1, #3 << 10

              mov r2, pc, lsr #20

              orr  r1, r1, r2, lsl #20

              add r0, r3, r2, lsl #2

              str  r1, [r0], #4

              add r1, r1, #1048576

              str  r1, [r0]

              mov       pc, lr
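The extra mapping for the running code follows the same descriptor arithmetic. The sketch below uses a hypothetical function name and takes the program counter as a plain argument; it only shows which two table slots are overwritten and with what values.

```c
#include <assert.h>
#include <stdint.h>

#define PT_ENTRIES 4096u

/* Map the 1MB section containing pc, plus the following one, as
 * cacheable and bufferable (0x1e = 0x12 | 0x0c), AP = 3. */
void map_current_code(uint32_t *table, uint32_t pc)
{
    uint32_t idx = pc >> 20;                     /* mov r2, pc, lsr #20 */
    uint32_t desc = (0x1eu | (3u << 10)) | (idx << 20);

    assert(idx + 1u < PT_ENTRIES);               /* stay inside the table */
    table[idx] = desc;                           /* str r1, [r0], #4 */
    table[idx + 1u] = desc + 0x100000u;          /* second megabyte */
}
```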

 

__armv4_cache_on:

              mov       r12, lr

              bl       __setup_mmu

              mov r0, #0

              mcr       p15, 0, r0, c7, c10, 4       @ drain write buffer

              mcr       p15, 0, r0, c8, c7, 0 @ flush I,D TLBs

              mrc       p15, 0, r0, c1, c0, 0 @ read control reg

              orr  r0, r0, #0x5000           @ I-cache enable, RR cache replacement

              orr  r0, r0, #0x0030

              bl       __common_cache_on

              mov r0, #0

              mcr       p15, 0, r0, c8, c7, 0 @ flush I,D TLBs

              mov       pc, r12

 

__common_cache_on:

#ifndef DEBUG

              orr  r0, r0, #0x000d           @ Write buffer, mmu

#endif

              mov r1, #-1

              mcr       p15, 0, r3, c2, c0, 0 @ load page table pointer

              mcr       p15, 0, r1, c3, c0, 0 @ load domain access control

              mcr       p15, 0, r0, c1, c0, 0 @ load control register

              mov       pc, lr
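The constants ORed into the control register can be decoded bit by bit. The macro names below are illustrative, and the bit meanings are quoted from memory of the ARMv4 CP15 c1 layout, so check them against the manual for your core. Note that 0x000d also sets bit 2, the data cache enable, even though the source comment mentions only the write buffer and the MMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR_M  (1u << 0)   /* MMU enable */
#define CR_C  (1u << 2)   /* data cache enable */
#define CR_W  (1u << 3)   /* write buffer enable */
#define CR_I  (1u << 12)  /* instruction cache enable */
#define CR_RR (1u << 14)  /* round-robin cache replacement */

/* Value written to the control register by __armv4_cache_on and
 * __common_cache_on, starting from the value read back in r0.
 * `debug` models the "#ifndef DEBUG" guard. */
uint32_t armv4_control_value(uint32_t old, bool debug)
{
    uint32_t v = old | 0x5000u | 0x0030u;

    if (!debug)
        v |= 0x000du;
    return v;
}
```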

 

/*

 * All code following this line is relocatable.  It is relocated by

 * the above code to the end of the decompressed kernel image and

 * executed there.  During this time, we have no stacks.

 *

 * r0     = decompressed kernel length

 * r1-r3  = unused

 * r4     = kernel execution address

 * r5     = decompressed kernel start

 * r6     = processor ID

 * r7     = architecture ID

 * r8     = atags pointer

 * r9-r14 = corrupted

 */

 

The following code is what has to be relocated when there is not enough room for decompression; the reason was explained above.

 

              .align       5

reloc_start:       add  r9, r5, r0

              debug_reloc_start

              mov r1, r4

1:

              .rept 4

              ldmia       r5!, {r0, r2, r3, r10 - r14}       @ relocate kernel

              stmia       r1!, {r0, r2, r3, r10 - r14}

              .endr

 

              cmp r5, r9

              blo       1b

              debug_reloc_end
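Each pass of this loop moves 4 x 8 registers x 4 bytes = 128 bytes, which is why the kernel length was rounded up to a multiple of 128 earlier with `add r0, r0, #127` and `bic r0, r0, #127`. A C sketch of both steps, with illustrative function names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK 128u /* bytes moved per loop iteration */

/* Round the decompressed length up to a whole number of chunks. */
uint32_t align_kernel_length(uint32_t len)
{
    return (len + 127u) & ~127u;
}

/* Copy the kernel down to its execution address, lowest chunk first.
 * A forward copy is safe when dst lies below src, even if the two
 * regions overlap; memmove keeps each chunk well defined in C. */
void relocate_kernel(uint8_t *dst, const uint8_t *src, uint32_t aligned_len)
{
    const uint8_t *end = src + aligned_len;

    assert(aligned_len != 0 && aligned_len % CHUNK == 0);
    do {
        memmove(dst, src, CHUNK);
        dst += CHUNK;
        src += CHUNK;
    } while (src < end);
}
```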

 

This is the last function; by now all the real work is done. It turns off the cache and jumps to the real kernel entry point.

 

call_kernel:     bl       cache_clean_flush

              bl       cache_off

              mov r0, #0                  @ must be zero

              mov r1, r7                  @ restore architecture number

              mov r2, r8                  @ restore atags pointer

              mov       pc, r4                    @ call kernel
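The register contract at the jump (r0 = 0, r1 = architecture ID, r2 = atags pointer) matches how the first three integer arguments of a C function are passed on ARM, so it can be pictured as a function call. The sketch below is a host-side model only: `fake_kernel` stands in for the real entry point, and the values in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Signature the kernel entry point effectively has at this jump. */
typedef void (*kernel_entry_t)(uint32_t zero, uint32_t arch_id, uint32_t atags);

struct boot_regs { uint32_t r0, r1, r2; };
static struct boot_regs seen; /* what the fake kernel received */

/* Stand-in for the real kernel: just records its arguments. */
static void fake_kernel(uint32_t r0, uint32_t r1, uint32_t r2)
{
    seen = (struct boot_regs){ r0, r1, r2 };
}

/* Model of call_kernel; on hardware cache_clean_flush and cache_off
 * run first, and the jump never returns. */
void call_kernel(kernel_entry_t entry, uint32_t arch_id, uint32_t atags)
{
    entry(0, arch_id, atags);
}
```

Calling `call_kernel(fake_kernel, 193, 0x30000100)` leaves 0, 193 and 0x30000100 in the recorded r0, r1 and r2.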

 

/*

 * Here follow the relocatable cache support functions for the

 * various processors.  This is a generic hook for locating an

 * entry and jumping to an instruction at the specified offset

 * from the start of the block.  Please note this is all position

 * independent code.

 *

 *  r1  = corrupted

 *  r2  = corrupted

 *  r3  = block offset

 *  r6  = corrupted

 *  r12 = corrupted

 */

 

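The function below walks the proc_types table to identify the current processor and then jumps to the matching routine according to the offset in r3. It reads register c0 of coprocessor CP15; if anything is unclear, consult the relevant ARM manuals.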

 

call_cache_fn:       adr   r12, proc_types

              mrc       p15, 0, r6, c0, c0     @ get processor ID

1:            ldr  r1, [r12, #0]        @ get value

              ldr  r2, [r12, #4]        @ get mask

              eor  r1, r1, r6             @ (real ^ match)

              tst   r1, r2                  @       & mask

              addeq       pc, r12, r3              @ call cache function

              add       r12, r12, #4*5

              b       1b
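The lookup can be sketched in C using the match and mask pairs from the proc_types table. The struct and function names are assumptions made for illustration, and the three branch slots of each entry are reduced to a descriptive string. In the assembly each entry is five words, so r3 is 8 for the "on" slot and 12 for the "off" slot.

```c
#include <assert.h>
#include <stdint.h>

struct proc_type {
    uint32_t match;
    uint32_t mask;
    const char *name;
};

static const struct proc_type proc_types[] = {
    { 0x41560600u, 0xffffffe0u, "ARM6/610" },
    { 0x00000000u, 0x0000f000u, "old ARM ID" },
    { 0x41007000u, 0xfff8fe00u, "ARM7/710" },
    { 0x41807200u, 0xffffff00u, "ARM720T" },
    { 0x00007000u, 0x0000f000u, "ARM7 IDs" },
    { 0x4401a100u, 0xffffffe0u, "sa110 / sa1100" },
    { 0x6901b110u, 0xfffffff0u, "sa1110" },
    { 0x00020000u, 0x000f0000u, "ARMv4T" },
    { 0x00050000u, 0x000f0000u, "ARMv5TE" },
    { 0x00060000u, 0x000f0000u, "ARMv5TEJ" },
    { 0x00070000u, 0x000f0000u, "ARMv6" },
    { 0u, 0u, "unrecognised type" },   /* mask 0 always matches */
};

/* First entry with ((id ^ match) & mask) == 0. The final catch-all
 * entry guarantees that the loop terminates. */
const struct proc_type *identify_cpu(uint32_t id)
{
    const struct proc_type *p = proc_types;

    while (((id ^ p->match) & p->mask) != 0)
        p++;
    return p;
}
```

For instance, an ID of 0x41129200 (used here as an example of an ARM9-style value) fails every specific entry and is caught by the ARMv4T architecture entry, because bits 19:16 are 2.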

 

/*

 * Table for cache operations.  This is basically:

 *   - CPU ID match

 *   - CPU ID mask

 *   - 'cache on' method instruction

 *   - 'cache off' method instruction

 *   - 'cache flush' method instruction

 *

 * We match an entry using: ((real_id ^ match) & mask) == 0

 *

 * Writethrough caches generally only need 'on' and 'off'

 * methods.  Writeback caches _must_ have the flush method

 * defined.

 */

              .type       proc_types,#object

proc_types:

              .word       0x41560600            @ ARM6/610

              .word       0xffffffe0

              b       __arm6_cache_off   @ works, but slow

              b       __arm6_cache_off

              mov       pc, lr

@           b       __arm6_cache_on           @ untested

@           b       __arm6_cache_off

@           b       __armv3_cache_flush

 

              .word       0x00000000            @ old ARM ID

              .word       0x0000f000

              mov       pc, lr

              mov       pc, lr

              mov       pc, lr

 

              .word       0x41007000            @ ARM7/710

              .word       0xfff8fe00

              b       __arm7_cache_off

              b       __arm7_cache_off

              mov       pc, lr

 

              .word       0x41807200            @ ARM720T (writethrough)

              .word       0xffffff00

              b       __armv4_cache_on

              b       __armv4_cache_off

              mov       pc, lr

 

              .word       0x00007000            @ ARM7 IDs

              .word       0x0000f000

              mov       pc, lr

              mov       pc, lr

              mov       pc, lr

 

              @ Everything from here on will be the new ID system.

 

              .word       0x4401a100            @ sa110 / sa1100

              .word       0xffffffe0

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv4_cache_flush

 

              .word       0x6901b110            @ sa1110

              .word       0xfffffff0

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv4_cache_flush

 

              @ These match on the architecture ID

 

              .word       0x00020000            @ ARMv4T

              .word       0x000f0000

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv4_cache_flush

 

              .word       0x00050000            @ ARMv5TE

              .word       0x000f0000

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv4_cache_flush

 

              .word       0x00060000            @ ARMv5TEJ

              .word       0x000f0000

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv4_cache_flush

 

              .word       0x00070000            @ ARMv6

              .word       0x000f0000

              b       __armv4_cache_on

              b       __armv4_cache_off

              b       __armv6_cache_flush

 

              .word       0                   @ unrecognised type

              .word       0

              mov       pc, lr

              mov       pc, lr

              mov       pc, lr

 

              .size       proc_types, . - proc_types

 

/*

 * Turn off the Cache and MMU.  ARMv3 does not support

 * reading the control register, but ARMv4 does.

 *

 * On entry,  r6 = processor ID

 * On exit,   r0, r1, r2, r3, r12 corrupted

 * This routine must preserve: r4, r6, r7

 */

              .align       5

cache_off:       mov r3, #12                     @ cache_off function

              b       call_cache_fn

 

// (code omitted)

 

4KB is reserved here for the stack.

 

reloc_end:

 

              .align

              .section ".stack", "w"

user_stack:       .space       4096
