From Assembly Language To Executable

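This page takes a brief look at how an executable image is produced from an assembly language program. The discussion is Linux-specific or, more precisely, ELF-specific. It also assumes the NASM assembler, but the ideas carry over to other assemblers and object formats. The example is 32-bit x86 code, so on a 64-bit Linux system the linker usually has to be told to produce a 32-bit image (for example, with ld -m elf_i386).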

The assembly language file

Here's an example program for our study. What do you think it will do?

; ----------------------------------------------------------------------------
; hello.asm
;
; Writes "Hello, World" to the console using only system calls.
;
; System calls used:
;   4: write(fileid, bufferAddress, numberOfBytes)
;   1: exit(returnCode)
;
; Assembler: NASM
; OS: Linux
; Other libraries: None
; Assemble with "nasm -felf hello.asm"
; Link: ld hello.o
; (The symbol _start is the default entry point for ld)
; ----------------------------------------------------------------------------

	global	_start

	section	.text
_start:

	; write(1, message, 13)
	mov	eax, 4			; system call 4 is write
	mov	ebx, 1			; file handle 1 is stdout
	mov	ecx, message		; address of string to output
	mov	edx, 13			; number of bytes
	int	80h

	; exit(0)
	mov	eax, 1			; system call 1 is exit
	mov	ebx, 0			; we want return code 0
	int	80h
message:
	db	"Hello, World", 0aH, 0

The listing file

You can produce a listing file with

nasm -felf -lhello.lst hello.asm

which writes the following to hello.lst:

     1                                  ; ----------------------------------------------------------------------------
     2                                  ; hello.asm
     3                                  ;
     4                                  ; Writes "Hello, World" to the console using only system calls.
     5                                  ;
     6                                  ; System calls used:
     7                                  ;   4: write(fileid, bufferAddress, numberOfBytes)
     8                                  ;   1: exit(returnCode)
     9                                  ;
    10                                  ; Assembler: NASM
    11                                  ; OS: Linux
    12                                  ; Other libraries: None
    13                                  ; Assemble with "nasm -felf hello.asm"
    14                                  ; Link: ld hello.o
    15                                  ; (The symbol _start is the default entry point for ld)
    16                                  ; ----------------------------------------------------------------------------
    17
    18                                  	global	_start
    19
    20                                  	section	.text
    21                                  _start:
    22
    23                                  	; write(1, message, 13)
    24 00000000 B804000000              	mov	eax, 4			; system call 4 is write
    25 00000005 BB01000000              	mov	ebx, 1			; file handle 1 is stdout
    26 0000000A B9[22000000]            	mov	ecx, message		; address of string to output
    27 0000000F BA0D000000              	mov	edx, 13			; number of bytes
    28 00000014 CD80                    	int	80h
    29
    30                                  	; exit(0)
    31 00000016 B801000000              	mov	eax, 1			; system call 1 is exit
    32 0000001B BB00000000              	mov	ebx, 0			; we want return code 0
    33 00000020 CD80                    	int	80h
    34                                  message:
    35 00000022 48656C6C6F2C20576F-     	db	"Hello, World", 0aH, 0
    36 0000002B 726C640A00
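
The hex column in the listing is the machine code itself. As a rough illustration (not a real disassembler), the Python sketch below decodes only the two encodings this program uses: an opcode in the range B8 to BF followed by a 32-bit little-endian immediate (mov r32, imm32), and CD followed by one byte (int imm8). The names decode, REGS, and TEXT are made up for this example. The bytes are copied from the listing, with the bracketed [22000000] written as plain 22 00 00 00; the brackets in the listing mark a value that still needs relocation.

```python
import struct

# Register numbers as encoded in the low three bits of the B8+r opcode.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode(code):
    """Decode only 'mov r32, imm32' (B8+r) and 'int imm8' (CD ib)."""
    out = []
    pos = 0
    while pos < len(code):
        op = code[pos]
        if 0xB8 <= op <= 0xBF:
            # Opcode byte, then a 4-byte little-endian immediate.
            (imm,) = struct.unpack_from("<I", code, pos + 1)
            out.append((pos, f"mov {REGS[op - 0xB8]}, 0x{imm:x}"))
            pos += 5
        elif op == 0xCD:
            # Opcode byte, then a 1-byte interrupt number.
            out.append((pos, f"int 0x{code[pos + 1]:x}"))
            pos += 2
        else:
            raise ValueError(f"unhandled opcode {op:#04x} at offset {pos:#x}")
    return out


# Instruction bytes from listing lines 24-33 (offsets 0x00-0x21).
TEXT = bytes.fromhex(
    "B804000000 BB01000000 B922000000 BA0D000000 CD80"
    "B801000000 BB00000000 CD80"
)

for offset, text in decode(TEXT):
    print(f"{offset:08X}  {text}")
```

The offsets printed in the first column should line up with the address column of the listing, and the instructions occupy exactly 0x22 bytes, which is why message lands at offset 0x22.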

The object file

Executing

nasm -felf hello.asm

produces the object file hello.o, which is 640 bytes in size. Here is a hex dump of it:

00000000: 7F 45 4C 46  01 01 01 00  00 00 00 00  00 00 00 00     ELF
00000010: 01 00 03 00  01 00 00 00  00 00 00 00  00 00 00 00
00000020: 40 00 00 00  00 00 00 00  34 00 00 00  00 00 28 00    @       4     (
00000030: 07 00 03 00  00 00 00 00  00 00 00 00  00 00 00 00
00000040: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000050: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000060: 00 00 00 00  00 00 00 00  01 00 00 00  01 00 00 00
00000070: 06 00 00 00  00 00 00 00  60 01 00 00  30 00 00 00            `   0
00000080: 00 00 00 00  00 00 00 00  10 00 00 00  00 00 00 00
00000090: 07 00 00 00  01 00 00 00  00 00 00 00  00 00 00 00
000000A0: 90 01 00 00  1C 00 00 00  00 00 00 00  00 00 00 00
000000B0: 01 00 00 00  00 00 00 00  10 00 00 00  03 00 00 00
000000C0: 00 00 00 00  00 00 00 00  B0 01 00 00  34 00 00 00                4
000000D0: 00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00
000000E0: 1A 00 00 00  02 00 00 00  00 00 00 00  00 00 00 00
000000F0: F0 01 00 00  60 00 00 00  05 00 00 00  05 00 00 00        `
00000100: 04 00 00 00  10 00 00 00  22 00 00 00  03 00 00 00            "
00000110: 00 00 00 00  00 00 00 00  50 02 00 00  1A 00 00 00            P
00000120: 00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00
00000130: 2A 00 00 00  09 00 00 00  00 00 00 00  00 00 00 00    *
00000140: 70 02 00 00  08 00 00 00  04 00 00 00  01 00 00 00    p
00000150: 04 00 00 00  08 00 00 00  00 00 00 00  00 00 00 00
00000160: B8 04 00 00  00 BB 01 00  00 00 B9 22  00 00 00 BA               "
00000170: 0D 00 00 00  CD 80 B8 01  00 00 00 BB  00 00 00 00
00000180: CD 80 48 65  6C 6C 6F 2C  20 57 6F 72  6C 64 0A 00      Hello, World
00000190: 00 54 68 65  20 4E 65 74  77 69 64 65  20 41 73 73     The Netwide Ass
000001A0: 65 6D 62 6C  65 72 20 30  2E 39 38 00  00 00 00 00    embler 0.98
000001B0: 00 2E 74 65  78 74 00 2E  63 6F 6D 6D  65 6E 74 00     .text .comment
000001C0: 2E 73 68 73  74 72 74 61  62 00 2E 73  79 6D 74 61    .shstrtab .symta
000001D0: 62 00 2E 73  74 72 74 61  62 00 2E 72  65 6C 2E 74    b .strtab .rel.t
000001E0: 65 78 74 00  00 00 00 00  00 00 00 00  00 00 00 00    ext
000001F0: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000200: 01 00 00 00  00 00 00 00  00 00 00 00  04 00 F1 FF
00000210: 00 00 00 00  00 00 00 00  00 00 00 00  03 00 F1 FF
00000220: 00 00 00 00  00 00 00 00  00 00 00 00  03 00 01 00
00000230: 12 00 00 00  22 00 00 00  00 00 00 00  00 00 01 00        "
00000240: 0B 00 00 00  00 00 00 00  00 00 00 00  10 00 01 00
00000250: 00 68 65 6C  6C 6F 2E 61  73 6D 00 5F  73 74 61 72     hello.asm _star
00000260: 74 00 6D 65  73 73 61 67  65 00 00 00  00 00 00 00    t message
00000270: 0B 00 00 00  01 03 00 00  00 00 00 00  00 00 00 00

It is a great idea to pick up a copy of the ELF specification and use it to figure out what each byte in this file means. Once you have paid your dues by analyzing the file format by hand, you can use the objdump utility to get the same information about the file.
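
As a starting point for that hand analysis, here is a small Python sketch that unpacks the first 52 bytes of the dump above as a 32-bit ELF header (Elf32_Ehdr in the specification). The field names follow the specification; the helper parse_ehdr and the constants HELLO_O_HEADER and FIELDS are names invented for this example, and the bytes are copied from the dump. It handles only 32-bit little-endian files.

```python
import struct

# First 52 bytes (0x34) of hello.o, copied from the dump above.
HELLO_O_HEADER = bytes.fromhex(
    "7F454C46 01010100 00000000 00000000"
    "01000300 01000000 00000000 00000000"
    "40000000 00000000 34000000 00002800"
    "07000300"
)

# Elf32_Ehdr fields that follow the 16-byte e_ident array.
FIELDS = (
    "e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
    "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
    "e_shnum", "e_shstrndx",
)


def parse_ehdr(data):
    """Return the Elf32_Ehdr fields of a 32-bit little-endian ELF file."""
    ident = data[:16]
    if ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] != 1 or ident[5] != 1:
        raise ValueError("this sketch handles only 32-bit little-endian ELF")
    # Two halfwords, five words, six halfwords: 36 bytes after e_ident.
    values = struct.unpack_from("<HHIIIIIHHHHHH", data, 16)
    return dict(zip(FIELDS, values))


hdr = parse_ehdr(HELLO_O_HEADER)
for name in FIELDS:
    print(f"{name:12} 0x{hdr[name]:x}")
```

Here e_type 1 means a relocatable file and e_machine 3 means Intel 80386. There are no program headers, and the seven 40-byte section headers start at offset 0x40, so the section header table ends at 0x158, just before the .text contents at 0x160.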

The executable file

Object files are not run directly, since in general they need to be linked with other object files to form a complete program. (If this were not the case, you could never have precompiled libraries sitting on your system and would have to build everything from source every time.) We can link the hello.o file above with

ld hello.o

and get the file a.out, which is 753 bytes in size:

00000000: 7F 45 4C 46  01 01 01 00  00 00 00 00  00 00 00 00     ELF
00000010: 02 00 03 00  01 00 00 00  80 80 04 08  34 00 00 00                4
00000020: F8 00 00 00  00 00 00 00  34 00 20 00  01 00 28 00            4     (
00000030: 06 00 03 00  01 00 00 00  00 00 00 00  00 80 04 08
00000040: 00 80 04 08  B0 00 00 00  B0 00 00 00  05 00 00 00
00000050: 00 10 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000060: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000070: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000080: B8 04 00 00  00 BB 01 00  00 00 B9 A2  80 04 08 BA
00000090: 0D 00 00 00  CD 80 B8 01  00 00 00 BB  00 00 00 00
000000A0: CD 80 48 65  6C 6C 6F 2C  20 57 6F 72  6C 64 0A 00      Hello, World
000000B0: 00 54 68 65  20 4E 65 74  77 69 64 65  20 41 73 73     The Netwide Ass
000000C0: 65 6D 62 6C  65 72 20 30  2E 39 38 00  00 2E 73 79    embler 0.98  .sy
000000D0: 6D 74 61 62  00 2E 73 74  72 74 61 62  00 2E 73 68    mtab .strtab .sh
000000E0: 73 74 72 74  61 62 00 2E  74 65 78 74  00 2E 63 6F    strtab .text .co
000000F0: 6D 6D 65 6E  74 00 00 00  00 00 00 00  00 00 00 00    mment
00000100: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000110: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
00000120: 1B 00 00 00  01 00 00 00  06 00 00 00  80 80 04 08
00000130: 80 00 00 00  30 00 00 00  00 00 00 00  00 00 00 00        0
00000140: 10 00 00 00  00 00 00 00  21 00 00 00  01 00 00 00            !
00000150: 00 00 00 00  00 00 00 00  B0 00 00 00  1C 00 00 00
00000160: 00 00 00 00  00 00 00 00  01 00 00 00  00 00 00 00
00000170: 11 00 00 00  03 00 00 00  00 00 00 00  00 00 00 00
00000180: CC 00 00 00  2A 00 00 00  00 00 00 00  00 00 00 00        *
00000190: 01 00 00 00  00 00 00 00  01 00 00 00  02 00 00 00
000001A0: 00 00 00 00  00 00 00 00  E8 01 00 00  D0 00 00 00
000001B0: 05 00 00 00  08 00 00 00  04 00 00 00  10 00 00 00
000001C0: 09 00 00 00  03 00 00 00  00 00 00 00  00 00 00 00
000001D0: B8 02 00 00  39 00 00 00  00 00 00 00  00 00 00 00        9
000001E0: 01 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00
000001F0: 00 00 00 00  00 00 00 00  00 00 00 00  80 80 04 08
00000200: 00 00 00 00  03 00 01 00  00 00 00 00  00 00 00 00
00000210: 00 00 00 00  03 00 02 00  00 00 00 00  00 00 00 00
00000220: 00 00 00 00  03 00 03 00  00 00 00 00  00 00 00 00
00000230: 00 00 00 00  03 00 04 00  00 00 00 00  00 00 00 00
00000240: 00 00 00 00  03 00 05 00  01 00 00 00  00 00 00 00
00000250: 00 00 00 00  04 00 F1 FF  0B 00 00 00  A2 80 04 08
00000260: 00 00 00 00  00 00 01 00  13 00 00 00  B0 80 04 08
00000270: 00 00 00 00  11 00 F1 FF  1A 00 00 00  80 80 04 08
00000280: 00 00 00 00  10 00 01 00  21 00 00 00  B0 90 04 08
00000290: 00 00 00 00  11 00 F1 FF  2D 00 00 00  B0 90 04 08
000002A0: 00 00 00 00  11 00 F1 FF  34 00 00 00  B0 90 04 08            4
000002B0: 00 00 00 00  11 00 F1 FF  00 68 65 6C  6C 6F 2E 61             hello.a
000002C0: 73 6D 00 6D  65 73 73 61  67 65 00 5F  65 74 65 78    sm message _etex
000002D0: 74 00 5F 73  74 61 72 74  00 5F 5F 62  73 73 5F 73    t _start __bss_s
000002E0: 74 61 72 74  00 5F 65 64  61 74 61 00  5F 65 6E 64    tart _edata _end
000002F0: 00
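
Comparing the two dumps shows what the linker did to the code. In hello.o the operand of mov ecx, message is 22 00 00 00 (the offset of message within .text), while in a.out it is A2 80 04 08. The eight bytes at offset 0x270 of hello.o are the single .rel.text entry that asks for this patch. The Python sketch below redoes the arithmetic under some assumptions: apply_rel32 and the constant names are invented for this example, the bytes are copied from the dumps, and only the R_386_32 relocation type (value 1) is handled.

```python
import struct

R_386_32 = 1  # relocation type: stored value = symbol address + addend

# .text contents of hello.o (file offsets 0x160-0x18F in the first dump).
OBJ_TEXT = bytes.fromhex(
    "B804000000 BB01000000 B922000000 BA0D000000 CD80"
    "B801000000 BB00000000 CD80"
    "48656C6C6F2C20576F726C640A00"
)

# The one .rel.text entry (file offset 0x270): r_offset, then r_info.
REL_ENTRY = bytes.fromhex("0B000000 01030000")

# Address ld gave to .text, taken from e_entry in the a.out header
# (bytes 80 80 04 08 at file offset 0x18).
TEXT_ADDR = 0x08048080


def apply_rel32(section, rel_entry, symbol_addr):
    """Apply one Elf32_Rel entry of type R_386_32 to a copy of section."""
    r_offset, r_info = struct.unpack("<II", rel_entry)
    r_sym, r_type = r_info >> 8, r_info & 0xFF
    if r_type != R_386_32:
        raise ValueError("this sketch handles only R_386_32")
    # With Elf32_Rel the addend is stored in the section itself.
    (addend,) = struct.unpack_from("<I", section, r_offset)
    patched = bytearray(section)
    struct.pack_into("<I", patched, r_offset,
                     (symbol_addr + addend) & 0xFFFFFFFF)
    return bytes(patched), r_sym


linked, sym_index = apply_rel32(OBJ_TEXT, REL_ENTRY, TEXT_ADDR)
print(linked[0x0A:0x0F].hex(" ").upper())
print("symbol index:", sym_index)
```

The relocation names symbol 3, which in hello.o's symbol table is the section symbol for .text, so the patched value is the address of .text plus the 0x22 already stored in the instruction.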

The executable is also in ELF format, so study it too if you get the chance. One final step remains: the executable file has to be loaded into memory before it can run, and that loading is done by the operating system. When I ran the program, the code for _start was loaded at address 0x8048080, and this is what I saw in memory when I asked gdb to disassemble it:

(gdb) disassemble _start
Dump of assembler code for function _start:
0x8048080 <_start>:     mov    $0x4,%eax
0x8048085 <_start+5>:   mov    $0x1,%ebx
0x804808a <_start+10>:  mov    $0x80480a2,%ecx
0x804808f <_start+15>:  mov    $0xd,%edx
0x8048094 <_start+20>:  int    $0x80
0x8048096 <_start+22>:  mov    $0x1,%eax
0x804809b <_start+27>:  mov    $0x0,%ebx
0x80480a0 <_start+32>:  int    $0x80
End of assembler dump.
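
The address 0x8048080 is not arbitrary: it follows from the single program header in a.out, which sits right after the ELF header at file offset 0x34. A rough Python sketch of the lookup follows. The names parse_phdr and file_offset_to_vaddr are made up here, the bytes are copied from the executable dump, and only the one loadable segment in this file is considered.

```python
import struct

# Program header of a.out: 32 bytes starting at file offset 0x34.
PHDR = bytes.fromhex(
    "01000000 00000000 00800408 00800408"
    "B0000000 B0000000 05000000 00100000"
)

# Elf32_Phdr is eight 32-bit words in this order.
PH_FIELDS = ("p_type", "p_offset", "p_vaddr", "p_paddr",
             "p_filesz", "p_memsz", "p_flags", "p_align")


def parse_phdr(data):
    """Return the fields of one little-endian Elf32_Phdr."""
    return dict(zip(PH_FIELDS, struct.unpack("<8I", data)))


def file_offset_to_vaddr(phdr, offset):
    """Map a file offset inside the segment to its virtual address."""
    start = phdr["p_offset"]
    if not start <= offset < start + phdr["p_filesz"]:
        raise ValueError("offset is not inside this segment")
    return phdr["p_vaddr"] + (offset - start)


ph = parse_phdr(PHDR)
print(hex(file_offset_to_vaddr(ph, 0x80)))  # first byte of _start
print(hex(file_offset_to_vaddr(ph, 0xA2)))  # first byte of message
```

The segment has p_type 1 (loadable) and p_flags 5 (readable and executable). It maps file offset 0 to 0x8048000, so the code at file offset 0x80 ends up at 0x8048080 and message, at file offset 0xA2, ends up at 0x80480a2.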

To see the data:

(gdb) x /16xb message
0x80480a2 <message>:     0x48 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x57
0x80480aa <message+8>:   0x6f 0x72 0x6c 0x64 0x0a 0x00 0xff 0xff

There's more in memory, but that should give you the idea. A good next step is to repeat the exercise with a more complicated program that has several sections.