This page is a brief look at how an executable image is produced from an assembly language program. It is Linux-specific (strictly speaking, ELF-specific), and the example is 32-bit x86 code. We assume the NASM assembler, but the ideas carry over to other toolchains.
Here's an example program for our study. What do you think it will do?
; ----------------------------------------------------------------------------
; hello.asm
;
; Writes "Hello, World" to the console using only system calls.
;
; System calls used:
;   4: write(fileid, bufferAddress, numberOfBytes)
;   1: exit(returnCode)
;
; Assembler: NASM
; OS: Linux
; Other libraries: None
; Assemble with "nasm -felf hello.asm"
; Link: ld hello.o
; (The symbol _start is the default entry point for ld)
; ----------------------------------------------------------------------------

        global  _start

        section .text
_start:

        ; write(1, message, 13)
        mov     eax, 4                  ; system call 4 is write
        mov     ebx, 1                  ; file handle 1 is stdout
        mov     ecx, message            ; address of string to output
        mov     edx, 13                 ; number of bytes
        int     80h

        ; exit(0)
        mov     eax, 1                  ; system call 1 is exit
        mov     ebx, 0                  ; we want return code 0
        int     80h
message:
        db      "Hello, World", 0aH, 0
You can produce a listing file with
nasm -felf -lhello.lst hello.asm
1 ; ----------------------------------------------------------------------------
2 ; hello.asm
3 ;
4 ; Writes "Hello, World" to the console using only system calls.
5 ;
6 ; System calls used:
7 ; 4: write(fileid, bufferAddress, numberOfBytes)
8 ; 1: exit(returnCode)
9 ;
10 ; Assembler: NASM
11 ; OS: Linux
12 ; Other libraries: None
13 ; Assemble with "nasm -felf hello.asm"
14 ; Link: ld hello.o
15 ; (The symbol _start is the default entry point for ld)
16 ; ----------------------------------------------------------------------------
17
18 global _start
19
20 section .text
21 _start:
22
23 ; write(1, message, 13)
24 00000000 B804000000 mov eax, 4 ; system call 4 is write
25 00000005 BB01000000 mov ebx, 1 ; file handle 1 is stdout
26 0000000A B9[22000000] mov ecx, message ; address of string to output
27 0000000F BA0D000000 mov edx, 13 ; number of bytes
28 00000014 CD80 int 80h
29
30 ; exit(0)
31 00000016 B801000000 mov eax, 1 ; system call 1 is exit
32 0000001B BB00000000 mov ebx, 0 ; we want return code 0
33 00000020 CD80 int 80h
34 message:
35 00000022 48656C6C6F2C20576F- db "Hello, World", 0aH, 0
36 0000002B 726C640A00
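The middle column of the listing holds the machine code the assembler generated. As a rough sketch of how to read it: each of the 5-byte mov instructions is one opcode byte (0xB8 plus a register number) followed by a 32-bit little-endian immediate. The helper name decode_mov_imm32 is made up for this illustration, and the function covers only this one instruction form. The square brackets around 22000000 in the listing mark a value that still has to be relocated, so the brackets are left out of the input here.

```python
import struct

# Register numbers used by the "mov r32, imm32" encoding (opcode 0xB8 + register).
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_mov_imm32(code: bytes) -> str:
    """Decode one 5-byte 'mov r32, imm32' instruction (hypothetical helper)."""
    if len(code) != 5 or not 0xB8 <= code[0] <= 0xBF:
        raise ValueError("not a 5-byte mov r32, imm32 instruction")
    # The 32-bit immediate follows the opcode in little-endian byte order.
    (immediate,) = struct.unpack("<I", code[1:5])
    return f"mov {REGISTERS[code[0] - 0xB8]}, {immediate:#x}"

# Byte strings copied from listing lines 24 to 27.
for hex_bytes in ["B804000000", "BB01000000", "B922000000", "BA0D000000"]:
    print(hex_bytes, "->", decode_mov_imm32(bytes.fromhex(hex_bytes)))
```

The third line decodes to mov ecx, 0x22: at this stage the operand is just the offset of message within the section (0x22 in the listing), not a real address.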
Executing
nasm -felf hello.asm
produces the object file hello.o which is 640 bytes in size. Here it is:
00000000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00  ELF
00000010: 01 00 03 00 01 00 00 00 00 00 00 00 00 00 00 00
00000020: 40 00 00 00 00 00 00 00 34 00 00 00 00 00 28 00  @ 4 (
00000030: 07 00 03 00 00 00 00 00 00 00 00 00 00 00 00 00
00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060: 00 00 00 00 00 00 00 00 01 00 00 00 01 00 00 00
00000070: 06 00 00 00 00 00 00 00 60 01 00 00 30 00 00 00  ` 0
00000080: 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00
00000090: 07 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00
000000A0: 90 01 00 00 1C 00 00 00 00 00 00 00 00 00 00 00
000000B0: 01 00 00 00 00 00 00 00 10 00 00 00 03 00 00 00
000000C0: 00 00 00 00 00 00 00 00 B0 01 00 00 34 00 00 00  4
000000D0: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
000000E0: 1A 00 00 00 02 00 00 00 00 00 00 00 00 00 00 00
000000F0: F0 01 00 00 60 00 00 00 05 00 00 00 05 00 00 00  `
00000100: 04 00 00 00 10 00 00 00 22 00 00 00 03 00 00 00  "
00000110: 00 00 00 00 00 00 00 00 50 02 00 00 1A 00 00 00  P
00000120: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00000130: 2A 00 00 00 09 00 00 00 00 00 00 00 00 00 00 00  *
00000140: 70 02 00 00 08 00 00 00 04 00 00 00 01 00 00 00  p
00000150: 04 00 00 00 08 00 00 00 00 00 00 00 00 00 00 00
00000160: B8 04 00 00 00 BB 01 00 00 00 B9 22 00 00 00 BA  "
00000170: 0D 00 00 00 CD 80 B8 01 00 00 00 BB 00 00 00 00
00000180: CD 80 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 0A 00  Hello, World
00000190: 00 54 68 65 20 4E 65 74 77 69 64 65 20 41 73 73  The Netwide Ass
000001A0: 65 6D 62 6C 65 72 20 30 2E 39 38 00 00 00 00 00  embler 0.98
000001B0: 00 2E 74 65 78 74 00 2E 63 6F 6D 6D 65 6E 74 00  .text .comment
000001C0: 2E 73 68 73 74 72 74 61 62 00 2E 73 79 6D 74 61  .shstrtab .symta
000001D0: 62 00 2E 73 74 72 74 61 62 00 2E 72 65 6C 2E 74  b .strtab .rel.t
000001E0: 65 78 74 00 00 00 00 00 00 00 00 00 00 00 00 00  ext
000001F0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000200: 01 00 00 00 00 00 00 00 00 00 00 00 04 00 F1 FF
00000210: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 F1 FF
00000220: 00 00 00 00 00 00 00 00 00 00 00 00 03 00 01 00
00000230: 12 00 00 00 22 00 00 00 00 00 00 00 00 00 01 00  "
00000240: 0B 00 00 00 00 00 00 00 00 00 00 00 10 00 01 00
00000250: 00 68 65 6C 6C 6F 2E 61 73 6D 00 5F 73 74 61 72  hello.asm _star
00000260: 74 00 6D 65 73 73 61 67 65 00 00 00 00 00 00 00  t message
00000270: 0B 00 00 00 01 03 00 00 00 00 00 00 00 00 00 00
It is a great idea to pick up a copy of the ELF specification and use it to figure out what each byte in this file means. Once you pay your dues and study the file format with a hand analysis, you can use the objdump utility to get information about the file.
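To give a flavor of that hand analysis, here is a minimal sketch that decodes the first 52 bytes of hello.o (the file header) with Python's struct module. The field names follow the Elf32_Ehdr structure in the ELF specification; parse_elf32_header is a hypothetical helper that assumes a 32-bit little-endian file, which is what the bytes at offsets 4 and 5 indicate here.

```python
import struct

# First 52 bytes of hello.o, copied from the dump above (the ELF32 file header).
HEADER = bytes.fromhex(
    "7F454C46010101000000000000000000"
    "01000300010000000000000000000000"
    "40000000000000003400000000002800"
    "07000300"
)

# Field names of Elf32_Ehdr that follow the 16-byte e_ident array.
FIELDS = ("e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
          "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx").split()

def parse_elf32_header(data: bytes) -> dict:
    """Return the Elf32_Ehdr fields after e_ident (32-bit little-endian only)."""
    if data[:4] != b"\x7fELF":
        raise ValueError("missing ELF magic number")
    if data[4] != 1 or data[5] != 1:
        raise ValueError("this sketch handles only 32-bit little-endian files")
    values = struct.unpack_from("<HHIIIIIHHHHHH", data, 16)
    return dict(zip(FIELDS, values))

header = parse_elf32_header(HEADER)
for name, value in header.items():
    print(f"{name:12} {value:#x}")
```

Among other things this reports e_type 0x1 (a relocatable file), e_machine 0x3 (Intel 386), no program headers, and seven 40-byte section headers starting at file offset 0x40.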
Object files are not run directly, since in general they need to be linked with other object files to form complete programs. (If this were not the case, you could never have pre-compiled libraries sitting on your system, and you would have to build everything from source every time.) On a 64-bit host, GNU ld needs the extra option -m elf_i386 to accept this 32-bit object file. We can "link" the hello.o file above with
ld hello.o
and get the file a.out, which is 753 bytes in size:
00000000: 7F 45 4C 46 01 01 01 00 00 00 00 00 00 00 00 00  ELF
00000010: 02 00 03 00 01 00 00 00 80 80 04 08 34 00 00 00  4
00000020: F8 00 00 00 00 00 00 00 34 00 20 00 01 00 28 00  4 (
00000030: 06 00 03 00 01 00 00 00 00 00 00 00 00 80 04 08
00000040: 00 80 04 08 B0 00 00 00 B0 00 00 00 05 00 00 00
00000050: 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000080: B8 04 00 00 00 BB 01 00 00 00 B9 A2 80 04 08 BA
00000090: 0D 00 00 00 CD 80 B8 01 00 00 00 BB 00 00 00 00
000000A0: CD 80 48 65 6C 6C 6F 2C 20 57 6F 72 6C 64 0A 00  Hello, World
000000B0: 00 54 68 65 20 4E 65 74 77 69 64 65 20 41 73 73  The Netwide Ass
000000C0: 65 6D 62 6C 65 72 20 30 2E 39 38 00 00 2E 73 79  embler 0.98 .sy
000000D0: 6D 74 61 62 00 2E 73 74 72 74 61 62 00 2E 73 68  mtab .strtab .sh
000000E0: 73 74 72 74 61 62 00 2E 74 65 78 74 00 2E 63 6F  strtab .text .co
000000F0: 6D 6D 65 6E 74 00 00 00 00 00 00 00 00 00 00 00  mment
00000100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000110: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00000120: 1B 00 00 00 01 00 00 00 06 00 00 00 80 80 04 08
00000130: 80 00 00 00 30 00 00 00 00 00 00 00 00 00 00 00  0
00000140: 10 00 00 00 00 00 00 00 21 00 00 00 01 00 00 00  !
00000150: 00 00 00 00 00 00 00 00 B0 00 00 00 1C 00 00 00
00000160: 00 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00
00000170: 11 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
00000180: CC 00 00 00 2A 00 00 00 00 00 00 00 00 00 00 00  *
00000190: 01 00 00 00 00 00 00 00 01 00 00 00 02 00 00 00
000001A0: 00 00 00 00 00 00 00 00 E8 01 00 00 D0 00 00 00
000001B0: 05 00 00 00 08 00 00 00 04 00 00 00 10 00 00 00
000001C0: 09 00 00 00 03 00 00 00 00 00 00 00 00 00 00 00
000001D0: B8 02 00 00 39 00 00 00 00 00 00 00 00 00 00 00  9
000001E0: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
000001F0: 00 00 00 00 00 00 00 00 00 00 00 00 80 80 04 08
00000200: 00 00 00 00 03 00 01 00 00 00 00 00 00 00 00 00
00000210: 00 00 00 00 03 00 02 00 00 00 00 00 00 00 00 00
00000220: 00 00 00 00 03 00 03 00 00 00 00 00 00 00 00 00
00000230: 00 00 00 00 03 00 04 00 00 00 00 00 00 00 00 00
00000240: 00 00 00 00 03 00 05 00 01 00 00 00 00 00 00 00
00000250: 00 00 00 00 04 00 F1 FF 0B 00 00 00 A2 80 04 08
00000260: 00 00 00 00 00 00 01 00 13 00 00 00 B0 80 04 08
00000270: 00 00 00 00 11 00 F1 FF 1A 00 00 00 80 80 04 08
00000280: 00 00 00 00 10 00 01 00 21 00 00 00 B0 90 04 08
00000290: 00 00 00 00 11 00 F1 FF 2D 00 00 00 B0 90 04 08
000002A0: 00 00 00 00 11 00 F1 FF 34 00 00 00 B0 90 04 08  4
000002B0: 00 00 00 00 11 00 F1 FF 00 68 65 6C 6C 6F 2E 61  hello.a
000002C0: 73 6D 00 6D 65 73 73 61 67 65 00 5F 65 74 65 78  sm message _etex
000002D0: 74 00 5F 73 74 61 72 74 00 5F 5F 62 73 73 5F 73  t _start __bss_s
000002E0: 74 61 72 74 00 5F 65 64 61 74 61 00 5F 65 6E 64  tart _edata _end
000002F0: 00
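Comparing the two dumps, the only change inside the code itself is in the mov ecx instruction: B9 22 00 00 00 in hello.o became B9 A2 80 04 08 in a.out. Here is a rough sketch of that patching step, assuming the R_386_32 rule from the i386 ELF supplement (symbol value plus the addend already stored in the code). The helper apply_rel is hypothetical and is not how ld is actually structured; the input bytes are copied from the hello.o dump, and the .text address is the one recorded in the a.out section header.

```python
import struct

R_386_32 = 1  # relocation type: result = S (symbol value) + A (addend)

# The single entry of .rel.text, copied from offset 0x270 of hello.o.
REL_ENTRY = bytes.fromhex("0B00000001030000")
# The .text bytes as they appear in hello.o (offset 0x160, 0x30 bytes).
TEXT = bytes.fromhex(
    "B804000000BB01000000B922000000BA"
    "0D000000CD80B801000000BB00000000"
    "CD8048656C6C6F2C20576F726C640A00"
)
# Address the linker assigned to .text, per the section header in a.out.
TEXT_ADDRESS = 0x08048080

def apply_rel(text: bytes, entry: bytes, symbol_value: int) -> bytes:
    """Apply one Elf32_Rel entry of type R_386_32 to a copy of text."""
    r_offset, r_info = struct.unpack("<II", entry)
    if r_info & 0xFF != R_386_32:
        raise ValueError("this sketch handles only R_386_32")
    patched = bytearray(text)
    # The addend A is the value already stored at the patch site.
    (addend,) = struct.unpack_from("<I", patched, r_offset)
    struct.pack_into("<I", patched, r_offset, (symbol_value + addend) & 0xFFFFFFFF)
    return bytes(patched)

# Symbol 3 (r_info >> 8) is the section symbol for .text, so S is TEXT_ADDRESS.
linked = apply_rel(TEXT, REL_ENTRY, TEXT_ADDRESS)
print(linked[0x0A:0x0F].hex(" ").upper())  # prints B9 A2 80 04 08
```

The relocation entry says "patch the four bytes at offset 0x0B of .text", which is exactly the operand of the mov at offset 0x0A in the listing.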
The executable is also in ELF format, so study it if you get the chance. Now there's one final step. The executable file has to be loaded into memory before it can execute. The loading is done by the operating system. When I ran the program once, the code for _start got loaded at address 0x8048080, and this is what I saw in memory when asking gdb to disassemble:
(gdb) disassemble _start
Dump of assembler code for function _start:
0x8048080 <_start>: mov $0x4,%eax
0x8048085 <_start+5>: mov $0x1,%ebx
0x804808a <_start+10>: mov $0x80480a2,%ecx
0x804808f <_start+15>: mov $0xd,%edx
0x8048094 <_start+20>: int $0x80
0x8048096 <_start+22>: mov $0x1,%eax
0x804809b <_start+27>: mov $0x0,%ebx
0x80480a0 <_start+32>: int $0x80
End of assembler dump.
To see the data:
(gdb) x /16xb message
0x80480a2 <message>: 0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x57
0x80480aa <message+8>: 0x6f 0x72 0x6c 0x64 0x0a 0x00 0xff 0xff
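The addresses gdb reports are not arbitrary: they follow from the one program header in a.out, which tells the loader to map the file starting at virtual address 0x08048000. A small sketch of that mapping, with the 32 header bytes copied from offset 0x34 of the a.out dump; file_offset_to_address is a hypothetical helper, and it assumes a single PT_LOAD segment as in this file.

```python
import struct

# The single program header (Elf32_Phdr) of a.out, copied from file offset 0x34.
PROGRAM_HEADER = bytes.fromhex(
    "01000000" "00000000" "00800408" "00800408"
    "B0000000" "B0000000" "05000000" "00100000"
)
PT_LOAD = 1  # segment type for "load this part of the file into memory"

def file_offset_to_address(phdr: bytes, offset: int) -> int:
    """Map a file offset to the virtual address where the loader places it."""
    (p_type, p_offset, p_vaddr, _p_paddr,
     p_filesz, _p_memsz, _p_flags, _p_align) = struct.unpack("<8I", phdr)
    if p_type != PT_LOAD:
        raise ValueError("not a loadable segment")
    if not p_offset <= offset < p_offset + p_filesz:
        raise ValueError("offset is outside this segment")
    return p_vaddr + (offset - p_offset)

print(hex(file_offset_to_address(PROGRAM_HEADER, 0x80)))  # code for _start
print(hex(file_offset_to_address(PROGRAM_HEADER, 0xA2)))  # the message bytes
```

The two results, 0x8048080 and 0x80480a2, match the addresses in the gdb session: the code sits at file offset 0x80 and the message at file offset 0xA2 in the a.out dump.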
There's more in memory, but that should give you an idea. The next thing to try is a more complicated program with several sections.