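A processor implementing the IA-32 architecture can execute in one of three modes: protected mode, real-address mode, or system management mode.
A processor implementing the Intel 64 architecture has the three above modes plus IA-32e mode, which has two submodes: compatibility mode and 64-bit mode.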
Application programmers only care about the following registers (those in purple only exist in 64-bit processors):

Application programmers can remain oblivious of the rest of the registers:
See the Intel 64/IA-32 SDM Volume 1, Chapter 5 for a nice overview of all of the processor instructions and Volume 2 for complete information.
Also check out the super handy instruction reference at siyobik.info.
The following table shows most of the available instructions, using the instruction names as specified in the Intel syntax. Not every processor supports every instruction, of course.
The vertical bar means OR, the square brackets mean OPTIONAL, and parentheses are used for grouping. For example:
SH(L|R)[D] stands for SHL, SHR, SHLD, SHRD.
PUSH[A[D]] stands for PUSH, PUSHA, PUSHAD.
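The shorthand can be expanded mechanically. The following is a minimal Python sketch, not part of the original notes; the function name `expand` is hypothetical, and it assumes patterns use only the three constructs described above (vertical bar, square brackets, parentheses).

```python
def expand(pattern):
    """Expand shorthand such as SH(L|R)[D] into a list of mnemonics."""
    pos = 0  # current position in the pattern

    def parse_alternatives():
        # alternatives := sequence ('|' sequence)*
        nonlocal pos
        results = parse_sequence()
        while pos < len(pattern) and pattern[pos] == "|":
            pos += 1  # skip '|'
            results += parse_sequence()
        return results

    def parse_sequence():
        # sequence := (letter | '(' alternatives ')' | '[' alternatives ']')*
        nonlocal pos
        results = [""]
        while pos < len(pattern) and pattern[pos] not in "|)]":
            ch = pattern[pos]
            pos += 1
            if ch == "(":
                choices = parse_alternatives()
                pos += 1  # skip ')'
            elif ch == "[":
                # optional group: the empty string is one of the choices
                choices = [""] + parse_alternatives()
                pos += 1  # skip ']'
            else:
                choices = [ch]
            results = [r + c for r in results for c in choices]
        return results

    return parse_alternatives()


print(expand("SH(L|R)[D]"))
print(expand("PUSH[A[D]]"))
```

The first call yields the four shift mnemonics (in a different order than listed above), and the second yields PUSH, PUSHA, PUSHAD.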
INTEGER: MOV CMOV[N]((L|G|A|B)[E]|E|Z|S|C|O|P) XCHG BSWAP XADD CMPXCHG[8B] PUSH[A[D]] POP[A[D]] IN OUT CBW CWDE CWD CDQ MOVSX MOVZX ADD ADC SUB SBB [I]MUL [I]DIV INC DEC NEG CMP DAA DAS AAA AAS AAM AAD AND OR XOR NOT SH(L|R)[D] SA(L|R) RO(L|R) RC(L|R) BT[S|R|C] BS(F|R) SET[N]((L|G|A|B)[E]|E|Z|S|C|O|P) TEST JMP J[N]((L|G|A|B)[E]|E|Z|S|C|O|P) J[E]CXZ LOOP[[N](Z|E)] CALL RET INT[O] IRET ENTER LEAVE BOUND MOVS[B|W|D] CMPS[B|W|D] SCAS[B|W|D] LODS[B|W|D] STOS[B|W|D] INS[B|W|D] OUTS[B|W|D] REP[[N](Z|E)] STC CLC CMC STD CLD STI CLI LAHF SAHF PUSHF[D] POPF[D] LDS LES LFS LGS LSS LEA NOP UD2 XLAT[B] CPUID

FPU: F[I]LD F[I]ST[P] FBLD FBSTP FXCH FCMOV[N](E|B|BE|U) FADD[P] FIADD FSUB[R][P] FISUB[R] FMUL[P] FIMUL FDIV[R][P] FIDIV[R] FPREM[1] FABS FCHS FRNDINT FSCALE FSQRT FXTRACT F[U]COM[P][P] FICOM[P] F[U]COMI[P] FTST FXAM FSIN FCOS FSINCOS FPTAN FPATAN F2XM1 FYL2X FYL2XP1 FLD1 FLDZ FLDPI FLDL2E FLDLN2 FLDL2T FLDLG2 FINCSTP FDECSTP FFREE F[N]INIT F[N]CLEX F[N]STCW FLDCW F[N]STENV FLDENV F[N]SAVE FRSTOR F[N]STSW FWAIT WAIT FNOP FXSAVE FXRSTOR

SSE: MOV(A|U)PS MOV(H|HL|L|LH)PS MOVSS MOVMSKPS ADD(P|S)S SUB(P|S)S MUL(P|S)S DIV(P|S)S RCP(P|S)S SQRT(P|S)S RSQRT(P|S)S MAX(P|S)S MIN(P|S)S CMP(P|S)S [U]COMISS ANDPS ANDNPS ORPS XORPS SHUFPS UNPCK(H|L)PS CVTPI2PS CVT[T]PS2PI CVTSI2SS CVT[T]SS2SI PAVG(B|W) PEXTRW PINSRW P(MIN|MAX)(UB|SW) PMOVMSKB PMULHUW PSADBW PSHUFW LDMXCSR STMXCSR MASKMOVQ MOVNT(Q|PS) PREFETCHT(0|1|2) PREFETCHNTA SFENCE

SSE2: MOV(A|U)PD MOV(H|L)PD MOVSD MOVMSKPD ADD(P|S)D SUB(P|S)D MUL(P|S)D DIV(P|S)D SQRT(P|S)D MAX(P|S)D MIN(P|S)D CMP(P|S)D [U]COMISD ANDPD ANDNPD ORPD XORPD SHUFPD UNPCK(H|L)PD CVT(PI|DQ)2PD CVT[T]PD2(PI|DQ) CVTSI2SD CVT[T]SD2SI CVTPS2PD CVTPD2PS CVTDQ2PS CVT[T]PS2DQ CVTSS2SD CVTSD2SS MOVDQ(A|U) MOVQ2DQ MOVDQ2Q PUNPCK(H|L)QDQ PADDQ PSUBQ PMULUDQ PSHUF(LW|HW|D) PS(L|R)LDQ MASKMOVDQU MOVNT(PD|DQ|I) CLFLUSH LFENCE MFENCE PAUSE

SYSTEM: LGDT SGDT LLDT SLDT LTR STR LIDT SIDT LMSW SMSW CLTS ARPL LAR LSL VERR VERW INVD WBINVD INVLPG LOCK HLT RSM RDMSR WRMSR RDPMC RDTSC SYSENTER SYSEXIT

MMX: MOVD MOVQ PACKSS(WB|DW) PACKUSWB PUNPCK(H|L)(BW|WD|DQ) PADD(B|W|D) PADD(S|US)(B|W) PSUB(B|W|D) PSUB(S|US)(B|W) PMUL(H|L)W PMADDWD PCMP(EQ|GT)(B|W|D) PAND PANDN POR PXOR PS(L|R)L(W|D|Q) PSRA(W|D) EMMS

SSE3: FISTTP LDDQU ADDSUBP(S|D) HADDP(S|D) HSUBP(S|D) MOVS(H|L)DUP MOVDDUP MONITOR MWAIT

SSE4: PMUL(LD|DQ) DPP(D|S) MOVNTDQA BLEND[V](PD|PS) PBLEND(VB|W) PMIN(UW|UD|SB|SD) PMAX(UW|UD|SB|SD) ROUND(P|S)(S|D) EXTRACTPS INSERTPS PINSR(B|D|Q) PEXTR(B|W|D|Q) PMOV(S|Z)X(BW|BD|WD|BQ|WQ|DQ) MPSADBW PHMINPOSUW PTEST PCMPEQQ PACKUSDW PCMP(E|I)STR(I|M) PCMPGTQ CRC32 POPCNT

64-BIT MODE: CDQE CMPSQ CMPXCHG16B LODSQ MOVSQ MOVZX STOSQ SWAPGS SYSCALL SYSRET

VIRTUAL MACHINE: VMPTRLD VMPTRST VMCLEAR VMREAD VMWRITE VMCALL VMLAUNCH VMRESUME VMXOFF VMXON INVEPT INVVPID

SSSE3: (Not done yet)

AESNI: AESDEC[LAST] AESENC[LAST] AESIMC AESKEYGENASSIST PCLMULQDQ
In protected mode, applications can choose a flat or segmented memory model (see the SDM Volume 1, Chapter 3 for details); in real mode only a 16-bit segmented model is available. Most programmers will only use protected mode and a flat-memory model, so that's all we'll discuss here.
A memory reference has four parts and is often written as
[SELECTOR : BASE + INDEX * SCALE + OFFSET]
The selector is one of the six segment registers; the base is
one of the eight general purpose registers; the index is any of
the general purpose registers except ESP; the scale is 1, 2, 4,
or 8; and the offset is any 32-bit number.
(Example: [fs:ecx+esi*8+93221].) The minimal
reference consists of only a base register or only an offset;
a scale can only appear if there is an index present.
In AT&T syntax, the same memory reference is written like this:
selector:offset(base,index,scale)
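To make the arithmetic behind a memory reference concrete, here is a small Python sketch of the address computation. The function name `effective_address` and the register values are hypothetical. The sketch also assumes a flat memory model in which the segment contributes a base of 0, so the selector is left out.

```python
def effective_address(base=0, index=0, scale=1, offset=0):
    """Compute BASE + INDEX * SCALE + OFFSET, wrapped to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (base + index * scale + offset) & 0xFFFFFFFF


# [ecx + esi*8 + 93221] with made-up register contents
# ecx = 0x1000 and esi = 3 (illustrative values only)
address = effective_address(base=0x1000, index=3, scale=8, offset=93221)
print(address)  # 97341
```

Every part has a default, which mirrors the rule that a reference may consist of only a base or only an offset.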
The data types are
| Type name | Number of bits | Bit indices |
|---|---|---|
| Byte | 8 | 7..0 |
| Word | 16 | 15..0 |
| Doubleword | 32 | 31..0 |
| Quadword | 64 | 63..0 |
| Double quadword | 128 | 127..0 |
The IA-32 is little endian, meaning the least significant bytes come first in memory. For example:
0 12
1 31 byte @ 9 = 1F
2 CB word @ B = FE06
3 74 word @ 6 = 230B
4 67 word @ 1 = CB31
5 45 dword @ A = 7AFE0636
6 0B qword @ 6 = 7AFE06361FA4230B
7 23 word @ 2 = 74CB
8 A4 qword @ 3 = 361FA4230B456774
9 1F dword @ 9 = FE06361F
A 36
B 06
C FE
D 7A
E 12
Note that if you draw memory with the lowest bytes at the bottom, then it is easier to read these values!
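The values in the memory dump above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `load` is made up, and `int.from_bytes` with `"little"` stands in for what the processor does when it loads a multi-byte value.

```python
# The fifteen bytes from the dump above, addresses 0 through E
memory = bytes.fromhex("12 31 CB 74 67 45 0B 23 A4 1F 36 06 FE 7A 12")


def load(address, size):
    """Read `size` bytes starting at `address` as a little-endian integer."""
    return int.from_bytes(memory[address:address + size], "little")


print(f"{load(0x9, 1):02X}")   # byte  @ 9 -> 1F
print(f"{load(0xB, 2):04X}")   # word  @ B -> FE06
print(f"{load(0xA, 4):08X}")   # dword @ A -> 7AFE0636
print(f"{load(0x6, 8):016X}")  # qword @ 6 -> 7AFE06361FA4230B
```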
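Many instructions cause the flags register to be updated. For example, if you execute an add instruction and the sum is too big to fit into the destination register, the Carry flag is set when the operands are treated as unsigned, and the Overflow flag is set when they are treated as signed.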
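As a rough illustration of how an add updates the flags, the sketch below models a 32-bit ADD in Python and derives four of the status flags. The function name `add32` is hypothetical, and the model covers only CF, ZF, SF, and OF. CF reports that the unsigned result did not fit, while OF reports that the signed result did not fit.

```python
MASK = 0xFFFFFFFF  # 32-bit register width


def add32(a, b):
    """Model a 32-bit ADD; return (result, flags)."""
    a &= MASK
    b &= MASK
    full = a + b
    result = full & MASK
    flags = {
        "CF": full > MASK,          # carry out of bit 31 (unsigned overflow)
        "ZF": result == 0,          # result is zero
        "SF": bool(result >> 31),   # sign bit of the result
        # signed overflow: operands share a sign, result has the other sign
        "OF": bool((~(a ^ b) & (a ^ result)) >> 31 & 1),
    }
    return result, flags


print(add32(0x7FFFFFFF, 1))  # largest positive value plus one: OF set, CF clear
print(add32(0xFFFFFFFF, 1))  # wraps to zero: CF and ZF set, OF clear
```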
3 3 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0 9 8 7 6 5 4 3 2 1 0
+---------------------------------------------------------------+
| | | | | | | | | | |I|V|V|A|V|R| |N| I |O|D|I|T|S|Z| |A| |P| |C|
| | | | | | | | | | |D|I|I|C|M|F| |T| P |F|F|F|F|F|F| |F| |F| |F|
| | | | | | | | | | | |P|F| | | | | | L | | | | | | | | | | | | |
+---------------------------------------------------------------+
The flags are described in Section 3.4.3 of Volume 1 of the SDM. To determine how each instruction affects the flags, see Appendix A of Volume 1 of the SDM.
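The bit positions in the diagram above translate directly into shifts and masks. The following Python sketch decodes an EFLAGS value into named flags; the names `FLAG_BITS` and `decode_eflags` are made up for this example, and the sample value 0x246 is only an illustration.

```python
# Bit position of each single-bit flag, read off the diagram
FLAG_BITS = {
    "CF": 0, "PF": 2, "AF": 4, "ZF": 6, "SF": 7, "TF": 8, "IF": 9,
    "DF": 10, "OF": 11, "NT": 14, "RF": 16, "VM": 17, "AC": 18,
    "VIF": 19, "VIP": 20, "ID": 21,
}


def decode_eflags(value):
    """Return a dict of flag name -> bool, plus the two-bit IOPL field."""
    flags = {name: bool(value >> bit & 1) for name, bit in FLAG_BITS.items()}
    flags["IOPL"] = value >> 12 & 3  # bits 13..12
    return flags


decoded = decode_eflags(0x246)
print([name for name in FLAG_BITS if decoded[name]])  # ['PF', 'ZF', 'IF']
```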
Sometimes an exception occurs while an instruction is executing. There are three types of exceptions: faults, traps, and aborts.
When exceptions occur, the processor will start executing code in an exception handler associated with that exception. The predefined exceptions are:
| Vector | Mnemonic | Name | Type | Source |
|---|---|---|---|---|
| 0 | #DE | Divide Error | fault | DIV or IDIV instruction |
| 1 | #DB | Debug | fault or trap | ... |
| 3 | #BP | Breakpoint | trap | INT 3 instruction |
| 4 | #OF | Overflow | trap | INTO instruction executed when overflow flag in EFLAGS is set |
| 5 | #BR | Bound Range Exceeded | fault | BOUND instruction |
| 6 | #UD | Undefined Opcode | fault | UD2 instruction, or attempt to execute an opcode that doesn't correspond to any instruction |
| 7 | #NM | Device Not Available | fault | FPU instruction or WAIT instruction executed when no FPU or FPU coprocessor is available |
| 8 | #DF | Double Fault | abort | Exception raised while the processor was trying to invoke the handler for a prior exception |
| 10 | #TS | Invalid TSS | fault | Task switch or implicit TSS access |
| 11 | #NP | Segment Not Present | fault | Loading segment registers or accessing system segments |
| 12 | #SS | Stack Segment Fault | fault | Stack operations and SS register loads |
| 13 | #GP | General Protection Fault | fault | Any memory reference and other protection checks |
| 14 | #PF | Page Fault | fault | Any memory reference |
| 16 | #MF | FPU Math Fault | fault | Any FPU instruction. Causes: #IS (FPU stack overflow or underflow), #IA (invalid arithmetic operation), #Z (divide by zero), #D (source operand is a denormal number), #O (overflow in result), #U (underflow in result), #P (inexact result) |
| 17 | #AC | Alignment Check | fault | Any data reference in memory |
| 18 | #MC | Machine Check | abort | Internal error or bus error |
| 19 | #XF | SIMD Math Fault | fault | Any SIMD instruction. Causes: #I (invalid arithmetic operation or source operand), #Z (divide by zero), #D (source operand is a denormal number), #O (overflow in result), #U (underflow in result), #P (inexact result) |
The Intel 64 and IA-32 Architectures Software Developer's Manual (SDM) contains vast amounts of important information and is required reading for all assembly language programmers. The manual is split into several volumes, all of which are available from Intel's website. Highlights from Volumes 1 and 2: