Run a C program bare metal on an ARM Cortex M3
Edit 2018-08-17: Add section “Files”
Contents
- Introduction
- Files
- Hardware equipment
- Software tools
- C program
- ARM assembler code
- ARM Cortex M3 memory map
- ARM Cortex M3 boot sequence
- Stack pointer
- C prerequisites
- Building
- Running
Introduction
The goal of this article is to run a C program bare metal on an ARM Cortex M3. We will go through the assembler code generated from a small program written in C and come up with the prerequisites that must be in place in order for it to run.
Files
The following files are used in this article:
I use the following makefile to compile and link:
Hardware equipment
- An ARM-USB-OCD-H JTAG adapter from Olimex
- An STM32-H103 development board with an ARM Cortex M3 (STM32F103RBT6)
Software tools
- The following tools, which are part of GNU Binutils:
  - GNU Linker (ld) for linking
  - GNU objcopy for converting from elf format to binary format
  - GNU objdump for inspecting the output from GNU GCC and GNU Linker

The cross compiler versions of these tools can be obtained as Ubuntu 18.04 packages.
ARM also maintains a GNU Embedded Toolchain for Arm which is available for download at https://developer.arm.com/open-source/gnu-toolchain/gnu-rm/downloads.
I use the Ubuntu package variants in this article with the following versions:
C program
Below is our C program. It defines variables a and b and stores their sum in a variable named sum.
ARM assembler code
We compile the program using the -S command line option:
The -S option makes gcc stop after compilation and output the corresponding assembler code instead of an object file; see section 3.2 in the gcc documentation:
The full assembler code listing of test_program.s is shown below. We will divide it into parts and analyze each one.
ARM assembly attributes
The first part defines ARM assembly attributes which do not correspond to any specific line of code.
ARM specific directives:
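- The .cpu cortex-m3 directive sets the target processor. Valid names are the same as for the -mcpu command line option. See GNU Assembler (as) ARM command line options.
- The .eabi_attribute <tag>, <value> directive sets the EABI object attribute tag to value. These build attributes are recorded in the object file and describe how it was built (for example which ABI conventions the code assumes), so that the linker can check that the object files it combines are compatible. A list of valid tags is available in the GNU Assembler (gas) documentation.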
General directives:
- The .file directive tells the GNU Assembler (as) that we are about to start a new file. See section 7.32 in the GNU Assembler (gas) documentation.
Variable a
C code
Assembler code
- The .section directive puts variable a in the .rodata section since it is declared using const.
- The .align <alignment> directive pads the location counter to an absolute alignment storage boundary. For ARM the alignment argument specifies “the number of low-order zero bits the location counter must have after advancement”. .align 2 advances the location counter until it is a multiple of 4.
- The .type directive tells GNU Assembler (as) whether a is a function symbol or an object symbol. In this case a is an object symbol. See GNU Assembler (as) .type directive.
- The .size <name>, <expression> directive sets the size of symbol <name>. The size is specified by <expression>. See GNU Assembler (as) .size directive.
- Label a “represents the current value of the active location counter”. See GNU Assembler (as) Labels.
- The .word directive stores the value 7 at the current location.
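The effect of .align 2 can be modelled in a few lines of C. This is a sketch of the arithmetic only. It assumes the ARM meaning of the argument, which is the number of low-order zero bits. The function name align_location_counter is made up for illustration and is not part of any tool.

```c
#include <stdint.h>

/* Model of ".align n" on ARM: advance the location counter until its
   n low-order bits are all zero, i.e. to the next multiple of 2^n.
   A location that is already aligned is left unchanged. */
uint32_t align_location_counter(uint32_t location, unsigned n)
{
    uint32_t mask = (1u << n) - 1u;   /* the n low-order bits */
    return (location + mask) & ~mask;
}
```

For example, with n = 2 a location of 5 is advanced to 8, while 8 stays at 8.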
Variable b
C code
Assembler code
- Variable b is put in the .data section instead of the .rodata section, where variable a is, since b is not declared const. The .data directive “tells as to assemble the following statements onto the end of the data section”. See GNU Assembler (as) .data directive.
- The remaining directives for b work the same way as the ones used for a.
Variable sum
C code
Assembler code
- The sum variable is put in the .bss section since it is uninitialized.
- The .space <size>, <fill> directive fills <size> bytes with the value specified by <fill>. 0 is assumed if <fill> is omitted. See GNU Assembler (as) .space directive.
Start of main() function
C code
Assembler code
- The .text <subsection> directive “tells as to assemble the following statements onto the end of the text subsection numbered <subsection>”. Subsection zero is used by default if none is specified. See GNU Assembler (as) .text directive.
- The .align <alignment> directive pads the location counter to an absolute alignment storage boundary. For ARM the alignment argument specifies “the number of low-order zero bits the location counter must have after advancement”. .align 2 advances the location counter until it is a multiple of 4. See GNU Assembler (as) .align directive.
- The .global directive makes the symbol visible to the linker, so that other object files that our file is linked with can refer to it. See GNU Assembler (as) .global directive.
- The .syntax unified directive selects the unified assembler syntax, in which ARM and Thumb instructions are written the same way. unified is the more modern option; the alternative is divided. See GNU Assembler (as) ARM directives.
- The .thumb directive is identical to .code 16, which means that we will use the Thumb instruction set and not the ARM instruction set. The Cortex M3 only executes Thumb instructions. See section 3 in STM32F10xxx Programming Manual.
- The .thumb_func directive “specifies that the following symbol is the name of a Thumb encoded function. This information is necessary in order to allow the assembler and linker to generate correct code for interworking between Arm and Thumb instructions and should be used even if interworking is not going to be performed. The presence of this directive also implies .thumb”. See GNU Assembler (as) ARM Directives.
- The .fpu softvfp directive sets the floating-point unit. Valid names are the same as for the -mfpu command line option. softvfp indicates that no hardware floating-point unit is used, which matches the Cortex M3 since it has no FPU. See GNU Assembler (as) ARM command line options.
- The .type directive tells GNU Assembler (as) whether main is a function symbol or an object symbol. In this case main is a function symbol. See GNU Assembler (as) .type directive.
- main: is a label and “represents the current value of the active location counter, and is, for example, a suitable instruction operand”. See GNU Assembler (as) Section 5.1 Label.
- Lines starting with @ are comments.
- push {r7} pushes register r7 onto the stack. See section 3.4.7 in STM32F10xxx Programming Manual.
- add r7, sp, #0 adds zero to the value in register sp and stores the result in register r7. In other words, it copies the address stored in sp into r7.
The so-called frame pointer is stored in register r7; GCC uses r7 as the frame pointer in Thumb code. The frame pointer keeps track of where to restore the stack pointer when returning from a function. See Call stack at Wikipedia. So at the beginning of main() we store the current value of the frame pointer on the stack. After that we store the current value of the stack pointer in register r7. This value in r7 is what sp will be restored to when returning from the main() function.
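The toy C model below makes the prologue and epilogue concrete. It covers the registers and stack touched by push {r7}, add r7, sp, #0, mov sp, r7 and pop {r7}.

It is a sketch for illustration, not an emulator:
- struct cpu, main_prologue and main_epilogue are hypothetical names.
- The stack is a small array standing in for SRAM.
- The starting values of sp and r7 used in the usage note are assumptions.

```c
#include <stdint.h>

#define STACK_WORDS 16u

/* Toy model of the registers and stack memory touched by main()'s
   prologue and epilogue. mem[0] stands for address mem_base. */
struct cpu {
    uint32_t sp;                /* stack pointer */
    uint32_t r7;                /* frame pointer */
    uint32_t mem_base;          /* address represented by mem[0] */
    uint32_t mem[STACK_WORDS];  /* small slice of SRAM used as stack */
};

/* Full descending push: decrement sp first, then store. */
static void push_word(struct cpu *c, uint32_t value)
{
    c->sp -= 4u;
    c->mem[(c->sp - c->mem_base) / 4u] = value;
}

/* Pop: load the word at sp, then increment sp. */
static uint32_t pop_word(struct cpu *c)
{
    uint32_t value = c->mem[(c->sp - c->mem_base) / 4u];
    c->sp += 4u;
    return value;
}

/* push {r7}; add r7, sp, #0 */
void main_prologue(struct cpu *c)
{
    push_word(c, c->r7);  /* save the caller's frame pointer */
    c->r7 = c->sp + 0u;   /* new frame pointer = current sp */
}

/* mov sp, r7; pop {r7} */
void main_epilogue(struct cpu *c)
{
    c->sp = c->r7;        /* discard anything pushed after the prologue */
    c->r7 = pop_word(c);  /* restore the caller's frame pointer */
}
```

Starting from sp = 0x20005000 and r7 = 0x1234, the prologue leaves both sp and r7 at 0x20004FFC. The epilogue then restores sp to 0x20005000 and r7 to 0x1234, even if sp was moved further down in between.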
Variable assignment in main() function
C code
Assembler code
- movs r2, #7 copies the value 7 into register r2. This is the value of a: since a is declared const with a known value, the compiler has used the constant directly instead of loading a from memory. See section 3.5.6 in STM32F10xxx Programming Manual.
- ldr r3, .L2 loads the memory address of variable b into register r3. See section 3.4.5 in STM32F10xxx Programming Manual. .L2 is defined at the end of the assembler code:

    .L2:
        .word   b
        .word   sum
        .size   main, .-main
        .ident  "GCC: (15:6.3.1+svn253039-1build1) 6.3.1 20170620"

- ldr r3, [r3] loads register r3 with the current value of variable b. See section 3.4.2 in STM32F10xxx Programming Manual.
- add r3, r3, r2 adds the value of register r3 (variable b) and r2 (variable a) and stores the result in register r3. See section 3.5.1 in STM32F10xxx Programming Manual.
- ldr r2, .L2+4 loads the memory address of variable sum, i.e. the second word after .L2, into register r2.
- str r3, [r2] stores the value in register r3 at the memory address of the sum variable. See section 3.4.4 in STM32F10xxx Programming Manual.
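Written out as C, the instruction sequence above corresponds roughly to the following sketch. Each statement is annotated with the instruction it mirrors.

The following are assumptions made for illustration:
- the function name sum_a_b,
- the array L2 standing in for the literal pool,
- the use of int32_t for the variables.

b is left uninitialized here so that its value can be set before the function is called.

```c
#include <stdint.h>

/* Host stand-ins for the variables of test_program.c (types assumed). */
static const int32_t a = 7;
static int32_t b;
static int32_t sum;

/* Stand-in for the literal pool at .L2: the addresses of b and sum. */
static int32_t *const L2[2] = { &b, &sum };

/* C model of the instructions between main()'s prologue and epilogue. */
void sum_a_b(void)
{
    int32_t r2, r3;
    int32_t *addr;

    r2 = a;         /* movs r2, #7    : const a folded into an immediate */
    addr = L2[0];   /* ldr r3, .L2    : address of b from the pool       */
    r3 = *addr;     /* ldr r3, [r3]   : value of b                       */
    r3 = r3 + r2;   /* add r3, r3, r2 : b + a                            */
    addr = L2[1];   /* ldr r2, .L2+4  : address of sum (second word)     */
    *addr = r3;     /* str r3, [r2]   : sum = b + a                      */
}
```

Setting b to 8 and calling sum_a_b() leaves 15 in sum.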
End of main() function
C code
Assembler code
- mov sp, r7: at the end of the main() function we copy the memory address stored in the frame pointer (register r7) back to the stack pointer.
- pop {r7} pops a value from the stack and stores it in register r7. See section 3.4.7 in STM32F10xxx Programming Manual. This means that we restore the old frame pointer from the stack into the frame pointer r7.
- bx lr: finally we return to the caller by branching to the address held in the link register lr, i.e. we restore the Program Counter PC from lr. The bx instruction does NOT write the address of the next instruction to the link register. See section 3.8.5 in STM32F10xxx Programming Manual.
Labels for memory locations of static variables
Immediately following the end of main() are the .L3 and .L2 labels.
Assembler code
- The .L3 label is accompanied by an .align 2 directive, which advances the location counter to a multiple of 4. The .L3 label itself seems to be unused though.
- The .L2 label marks a list of memory addresses pointing to the locations of the global variables b and sum (a so-called literal pool). The .L2 label is used as an operand to the ldr instructions as shown above.
ARM Cortex M3 memory map
The memory map of STM32F103RBT6 is shown in section 2.2 of the STM32F10xxx Programming Manual:
- Address range 0x0000 0000 to 0x1FFF FFFF is read-only during run time. It is suitable for storing code and immutable data.
- Address range 0x2000 0000 to 0x3FFF FFFF is SRAM (static random-access memory), i.e. read-and-write. It is suitable for storing mutable data.
The ARM Cortex M3 (STM32F103RBT6) that I use in this article has 20 Kbytes of SRAM. See section 2.1 in the STM32F103xB Datasheet. 20 Kbytes equals 20 * 1024 = 20480 (0x5000) bytes. This means that the address of the last 32-bit word in SRAM is 0x2000 0000 + 0x5000 - 0x4 = 0x2000 4FFC.
ARM Cortex M3 boot sequence
The vector table of the STM32F103RBT6 is defined in section 2.3.4 in the STM32F10xxx Programming Manual. It defines the start addresses of the exception handlers.

One such exception is system reset, whose handler start address is read from 0x0000 0004. This means that after reset the Cortex M3 starts executing code at the address stored at address 0x0000 0004.

The first 7 entries in the vector table are shown below. The least-significant bit of each handler start address must be 1, which indicates that the exception handlers are implemented with Thumb code.
The program is stored as a raw binary file and flashed onto the flash memory of the STM32F103RBT6 starting at address 0x0000 0000. The SRAM memory region, i.e. 0x2000 0000 to 0x2000 4FFC, will contain random data when the CPU starts to execute. Any mutable data that the C code assumes is available in a read/write region must be put there by the startup code.
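The rule about the least-significant bit can be expressed as a small check over a vector table image. This is a hypothetical helper, not something the toolchain provides.

It makes two choices:
- It skips entry 0, since that is the initial stack pointer rather than a handler address.
- As an assumption, it also skips entries that are 0, which is how unused or reserved entries are commonly left.

The handler addresses in the usage example are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if every non-zero handler entry (index 1 and up) has
   bit 0 set. Entry 0 is the initial stack pointer and is not checked. */
bool handlers_are_thumb(const uint32_t *table, unsigned count)
{
    for (unsigned i = 1; i < count; i++) {
        if (table[i] != 0u && (table[i] & 1u) == 0u) {
            return false;  /* bit 0 clear: not a valid Thumb address */
        }
    }
    return true;
}

/* Bit 0 only selects Thumb state; the handler's first instruction is
   at the entry value with bit 0 cleared. */
uint32_t handler_code_address(uint32_t entry)
{
    return entry & ~1u;
}
```

A table {0x20005000, 0x00000009, 0x0000000D} passes the check. A table whose reset entry is 0x00000008 does not.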
Stack pointer
The first entry in the vector table shown above sets the initial value of the stack pointer. The stack pointer is used by the push and pop instructions at the start and end of the main() function as shown above.

The stack of the ARM Cortex M3 (STM32F103RBT6) is full descending according to section 2.1.2 in the STM32F10xxx Programming Manual: “This means the stack pointer indicates the last stacked item on the stack memory. When the processor pushes a new item onto the stack, it decrements the stack pointer and then writes the item to the new memory location.”

We can set the initial stack pointer address to 0x2000 5000 (see ARM Cortex M3 memory map). The stack pointer will then be decremented to 0x2000 4FFC on the first push encountered in the code, and the data will be stored there. 0x2000 5000 itself lies just past the end of SRAM, but nothing is ever written to it, since the stack pointer is decremented before each write.
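The address arithmetic behind the initial stack pointer can be checked with a short C sketch. The constants come from the memory map section above: SRAM starts at 0x2000 0000 and is 20 Kbytes.

word_in_sram and nth_push_address are hypothetical helpers. They assume a full descending stack that only pushes 32-bit words.

```c
#include <stdbool.h>
#include <stdint.h>

#define SRAM_BASE  0x20000000u
#define SRAM_BYTES (20u * 1024u)             /* 20 Kbytes = 0x5000 */
#define STACK_TOP  (SRAM_BASE + SRAM_BYTES)  /* 0x20005000 */

/* True if a 32-bit word at addr lies entirely inside SRAM. */
bool word_in_sram(uint32_t addr)
{
    return addr >= SRAM_BASE && addr <= SRAM_BASE + SRAM_BYTES - 4u;
}

/* Address written by the n-th push (n = 1 for the first push) when the
   stack starts at STACK_TOP: each push decrements by 4 before writing. */
uint32_t nth_push_address(uint32_t n)
{
    return STACK_TOP - 4u * n;
}
```

STACK_TOP itself fails word_in_sram, but the first push writes to 0x20004FFC, which passes. The 5120th push reaches 0x20000000, the first word of SRAM. One more push would leave SRAM.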
C prerequisites
We can now list a number of prerequisites that must be in place in order to execute the assembler code generated from test_program.c.
- We need to provide a vector table starting at memory address 0x0000 0000. The vector table must at the bare minimum contain an initial Stack Pointer (SP) value and the address at which to start executing code upon system reset. The stack pointer is used by the push and pop instructions.
- Make the immutable data in the .rodata section available in the read-only memory, i.e. address range 0x0000 0000 to 0x1FFF FFFF. Variable a is located in the .rodata section.
- Make the mutable data in the .data section available in the read/write memory, i.e. address range 0x2000 0000 to 0x2000 4FFC. Variable b is located in the .data section.
- Make the .bss section available in the read/write memory too. Also make sure all memory in the .bss section is initialized to zero. The sum variable is located in the .bss section. See BSS in C at Wikipedia.
We want the STM32F103RBT6 memory to look like the image shown below when the assembler code generated from test_program.c
starts to execute.
We must write a linker script and some C startup code in order to achieve this.
Linker script
The linker script will tell the linker on what memory locations to put different sections of the code. See chapter 3 about Linker Scripts in the GNU Linker documentation.
We can print the sections available in test_program.o using objdump.
We have all the sections that we want in our final executable except vectors. The various .debug* sections will be available in the final .elf file for use by the GDB debugger, but they will not be included in the “raw” binary copied onto the target.
We can write some C code to define a vector table.
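As an illustration, a vector table in C could look like the sketch below. It is not the listing from this article's startup.c.

Notes on the sketch:
- The names vector_table and STACK_TOP are assumptions.
- The placeholder startup() body only exists so that the sketch compiles on its own.
- __attribute__((section(...), used)) is a GCC extension. Here it places the table in an input section named vectors.

When the sketch is built for the Cortex M3 as Thumb code, the toolchain sets bit 0 of the address of startup. That satisfies the Thumb requirement described earlier.

```c
#include <stdint.h>

#define STACK_TOP 0x20005000u  /* one past the end of the 20 Kbyte SRAM */

void startup(void);            /* reset handler, defined below */

/* Entry 0: initial stack pointer. Entry 1: reset handler.
   Placed in the input section "vectors" for the linker script. */
__attribute__((section("vectors"), used))
void (*const vector_table[])(void) = {
    (void (*)(void))(uintptr_t)STACK_TOP,  /* initial stack pointer */
    startup,                               /* reset handler */
};

/* Placeholder: the real startup() copies .data, zeroes .bss and
   calls main(). */
void startup(void)
{
    for (;;) {
    }
}
```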
We can now start working on the linker script by adding the vectors and .text sections to it. They should be put at the beginning of the read-only memory as shown in the picture above. The way to do this is to first set the location counter, symbolized by a dot (.), to 0x0. This is followed by an instruction to the linker to create the final .text output section from the vectors and .text sections of the input files.
The .rodata section containing the variable a will also be put in the read-only memory region directly after the .text section.
Next up is the .data section. This section is special compared to the other sections since we want it to be present in both the read-only and the read-write parts of the memory.
It should be available in its load time position in the flash memory at system reset. We can put it right after the .rodata section. We create a symbol named _DATA_ROM_START which points to this memory location.
However, the linker script syntax does not allow us to insert the .data section into its position on flash right away. Instead we move the location counter to the start of the SRAM (read/write) memory region. The memory address at the start of this region is stored in symbol _DATA_RAM_START.
We now place the .data section, which contains variable b. The memory address at the end of the .data section is stored in symbol _DATA_RAM_END.
So far we have only defined the run time position of the .data section. We also need to define the load time address of the .data section. The linker script syntax for this is the AT keyword. See section 3.6.8.2 in GNU Linker (ld) manual.
The data belonging to the .data section will not automatically be in its run time position after system reset. We must write startup code that copies it from its load time position in the flash memory to its run time position in the SRAM memory region.
The .bss section is put right after the .data section. We store its start and end addresses in the symbols _BSS_START and _BSS_END.
The full linker script is shown below.
C startup code
The startup code begins by declaring a number of symbols defined in the linker script.
After that we define a vector table as already shown above. We also have to forward declare the startup function defined below, since we need to reference it in the vector table.
We now write a function named startup() with the following responsibilities:
- Copy data belonging to the .data section from its load time position on flash (ROM) to its run time position in SRAM.
- Initialize data in the .bss section to zeros.
- Call the main() function defined in test_program.c.
We need to make a forward declaration of the main function since we reference it at the end of the startup function.
The full C startup code is shown below.
Building
We need to perform a couple of steps to build test_program.c together with startup.c using our own linker script.
Compile the C files into object files using gcc. We disable optimization with the -O0 flag (section 3.10 in the gcc documentation) so that the step command in the GDB debugger works as expected. We use the -g flag to produce debugging information, see section 3.9 in the gcc documentation:
Link the object files according to the rules in our linker script named stm32.ld using GNU linker ld.
Use objcopy to convert the .elf file from the linker into a “raw” binary. The “raw” binary is what we will run on the target, while we will feed the .elf file into the GDB debugger since it contains debugging information.
We can inspect the sections in test_program.elf with objdump as we did with test_program.o above.
Each section has a virtual memory address (VMA) and a load memory address (LMA). Section 3.1 in the GNU Linker documentation explains VMA and LMA:
Every loadable or allocatable output section has two addresses. The first is the VMA, or virtual memory address. This is the address the section will have when the output file is run. The second is the LMA, or load memory address. This is the address at which the section will be loaded. In most cases the two addresses will be the same. An example of when they might be different is when a data section is loaded into ROM, and then copied into RAM when the program starts up (this technique is often used to initialize global variables in a ROM based system). In this case the ROM address would be the LMA, and the RAM address would be the VMA.
The .data section has different VMA and LMA, as expected. The VMA is 0x20000000, i.e. the beginning of the SRAM. The LMA is 0x9c, which is right after the .rodata section in ROM.
The sections also have flags associated with them, e.g. CODE, READONLY and DATA. Some of the flags are self-explanatory, e.g. the .text section contains executable CODE, whereas others are harder to understand.
Figure 4-11 of Chapter 4 in System V Application Binary Interface explains a subset of these flags although using different names.
There is also a relevant conversation on stack overflow:
CODE means that the section contains executable code; it is indicated by the SHF_EXECINSTR flag in the section header
DATA means that the section is not executable but is writable, indicated by the presence of the SHF_WRITE flag
READONLY means that the section is neither executable nor writable and should be placed in read-only memory pages
ALLOC means that the section occupies memory, e.g. memory pages are actually allocated to hold the section content when a process is created, indicated by the SHF_ALLOC flag. Some sections, e.g. those containing debug information, are not read into memory during normal program execution and are not marked as ALLOC to save memory.
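The relation between the ELF section header flags and the labels printed by objdump can be summarized as a small C function. This is a simplified model of how GNU Binutils (BFD) appears to derive them, so treat it as an approximation.

classify_section, struct objdump_flags and the ELF_-prefixed constants are hypothetical names. The numeric values match the ELF specification.

Under this model, READONLY simply means "not writable". This is why .text can be reported as both READONLY and CODE.

```c
#include <stdbool.h>
#include <stdint.h>

/* ELF values (same numbers as in the System V ABI / <elf.h>). */
#define ELF_SHT_PROGBITS  1u
#define ELF_SHT_NOBITS    8u
#define ELF_SHF_WRITE     0x1u
#define ELF_SHF_ALLOC     0x2u
#define ELF_SHF_EXECINSTR 0x4u

struct objdump_flags {
    bool alloc, load, readonly, code, data;
};

/* Simplified model: derive objdump-style labels from sh_type/sh_flags. */
struct objdump_flags classify_section(uint32_t sh_type, uint32_t sh_flags)
{
    struct objdump_flags f = {0};
    f.alloc    = (sh_flags & ELF_SHF_ALLOC) != 0u;
    f.load     = f.alloc && sh_type != ELF_SHT_NOBITS;  /* .bss has no file contents */
    f.readonly = (sh_flags & ELF_SHF_WRITE) == 0u;
    f.code     = (sh_flags & ELF_SHF_EXECINSTR) != 0u;
    f.data     = !f.code && f.load;                     /* loadable, not code */
    return f;
}
```

Under this model:
- .text (ALLOC and EXECINSTR) comes out as READONLY and CODE.
- .rodata comes out as READONLY and DATA.
- .data comes out as DATA but not READONLY.
- .bss comes out as ALLOC only.
- Debug sections, which have no flags, are neither ALLOC nor DATA.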
We can inspect the symbols with corresponding memory addresses in test_program.elf using arm-none-eabi-nm:
We will use the memory addresses of a, b and sum to verify that our program has run correctly.
Running
OpenOCD server
Start the openocd server in one command window. See the previous post Using OpenOCD to flash ARM Cortex M3.
Flashing
We flash test_program.bin onto the ARM Cortex M3 using OpenOCD.
Connect to the openocd server using telnet in another command window
Halt execution of target in case it is running
Erase content on flash
Flash test_program.bin
Run program but halt directly so that we can control the execution via the debugger (gdb)
Debugging
Run gdb using our test program and connect to the openocd server on port 3333. We use the GDB TUI (Text User Interface) as described in Use GDB on an ARM assembly program.
Display register values in GDB
Set a break point at the beginning of the main() function in test_program.c.
Inspect the values of a, b and sum before executing sum = a + b.
Execute sum = a + b using the GDB step command (section 5.2 in the GDB manual) and inspect the sum variable again.
The sum variable now equals 0x0F (15), which is correct.