Course: Architecture 1001: x86-64 Assembly


Introduction and such

uh… Imma skip that. Sorry Xeno! :P

Refresher

C types

Quote

So a “char” is a single byte, a “short” is two bytes, etc.

The term “word” originally referred to Intel’s native 16 bit data size when x86 was a 16 bit architecture. Thus when it expanded to 32 bit, that size was referred to as a “double word”. This “double word” terminology was adopted as the “DWORD” data type name in Windows programming. Likewise for 64 bit, “QWORD” is often used on Windows.

Bin to Hex to Dec and such

Decimal (base 10)    Binary (base 2)    Hexadecimal (aka “Hex”) (base 16)
00                   0000b              0x00
01                   0001b              0x01
02                   0010b              0x02
03                   0011b              0x03
04                   0100b              0x04
05                   0101b              0x05
06                   0110b              0x06
07                   0111b              0x07
08                   1000b              0x08
09                   1001b              0x09
10                   1010b              0x0A
11                   1011b              0x0B
12                   1100b              0x0C
13                   1101b              0x0D
14                   1110b              0x0E
15                   1111b              0x0F
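
A quick way to double check rows of this table is to let C do the conversion. This is only a small sketch; the helper names `parse_binary` and `print_row` are made up for these notes and are not from the course.

```c
#include <stdio.h>

/* Hypothetical helper (not from the course): parse a string of '0'/'1'
 * characters, most significant bit first, into an unsigned value. */
unsigned parse_binary(const char *bits) {
    unsigned value = 0;
    for (; *bits == '0' || *bits == '1'; bits++) {
        value = value * 2 + (unsigned)(*bits - '0');  /* shift left, add the new bit */
    }
    return value;
}

/* Print one row in the same order as the table: decimal, binary, hex. */
void print_row(const char *bits) {
    unsigned v = parse_binary(bits);
    printf("%02u  %sb  0x%02X\n", v, bits, v);
}
```

For example, `print_row("1011")` prints `11  1011b  0x0B`.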

Computer Registers

Registers

  • are small memory storage areas built into the processor (still volatile)
  • x86-64 has 16 “general purpose” registers
    • RAX: stores function return values
    • RBX: base pointer to the data section
    • RCX: counter for strings and loop operations
    • RDX: I/O pointer
    • Note: Xeno marked the registers that were used in this class GREEN and the registers that were not RED. There is no different color format in Obsidian, so I’m just gonna bold the registers that were used
    • RSP: stack pointer (top)
    • RBP: stack frame base pointer
    • RSI: string operations, source index pointer
    • RDI: string operations, destination index pointer
    • RIP: instruction pointer, points to next instruction for execution

  • On x86, registers are 32 bits wide
  • On x64, registers are 64 bits wide

First Instruction - No-Operation (NOP)

  • no registers, no value, no nothing
  • it really does nothing lmao
  • used to pad or align bytes, or to kill time
  • attackers use it to make exploits more reliable
  • can be 1 to 9 bytes long (multi-byte NOPs)
  • NOP = 0x90

The stack

  • LIFO data structure (push to the top, pop off the top)
  • a conceptual area of main memory (RAM)
  • the stack grows toward lower memory address and the heap grows toward higher memory address
  • RSP points to the top of the stack, the lowest address which is being used (wtf)
  • what can you find on the stack?
    • “return addresses”: when a function calls another function, call pushes the address of the next instruction onto the stack; when the callee is done, ret pops that address back off into RIP, so execution continues in the function that originally called it
    • local variables
    • sometimes used to pass arguments between functions
    • save space for registers, so functions can share registers without smashing each other’s values
    • save space for registers when the compiler has to juggle too many in a function
    • dynamically allocated memory via the use of alloca()

Example:

#include <stdio.h>
 
int bar(int y) {
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
 
int foo(int x) {
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
 
int main() {
	int c = foo(7);
	printf("main passed %d", c);
}

New instructions - Push & Pop

Push

  • push pushes a quadword onto the stack
  • push automatically decrements the stack pointer RSP by 8
  • in 64-bit execution mode, the operand can be
    • a value in a 64 bit register
    • a 64-bit value from memory, given in r/mX form

r/mX

It’s actually a made-up term by Xeno for the r/m8, r/m16, r/m32 or r/m64 notation in the Intel manual

  • it’s a way to specify either a register or a memory value that is 8, 16, 32 or 64 bits long
  • a value inside square brackets [ ] is usually treated as a memory address, and the instruction fetches the value at that address (kinda like dereferencing a pointer in C)
  • takes 4 forms:
    • register: rbx
    • memory, base only: [rbx]
    • memory, base + index * scale: [rbx+rcx*X] (X = 1, 2, 4, 8)
    • memory, base + index * scale + displacement: [rbx+rcx*X+Y]
      • this has natural applicability to multi dimensional array indexing, array of structs, etc
  • when he (Xeno) says an instruction supports access to memory, he means memory in r/mX form, and it could be any of the 3 memory forms above.
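
To see why the base + index * scale + displacement form fits arrays of structs so well, here is a small C sketch. The struct `point_t` and both function names are invented for illustration; the element is 8 bytes so the scale is one of the legal values (1, 2, 4, 8), and the displacement is the offset of the field inside the element.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte element, so the scale is one of the legal 1/2/4/8 values. */
typedef struct {
    int32_t x;  /* offset 0 */
    int32_t y;  /* offset 4 */
} point_t;

/* base + index*scale + displacement, computed by hand like [rbx+rcx*8+4] */
uintptr_t address_by_hand(const point_t *base, size_t index) {
    return (uintptr_t)base + index * sizeof(point_t) + offsetof(point_t, y);
}

/* The same address, as the compiler computes it for &base[index].y */
uintptr_t address_by_compiler(const point_t *base, size_t index) {
    return (uintptr_t)&base[index].y;
}
```

Both functions return the same address for any valid index, which is the arithmetic a single memory operand can do in one go.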

Address writing convention

  • Xeno writes 64-bit numbers like this: 0x12345678`12345678
  • the convention comes from WinDbg

Pop

  • mostly the same as push, but instead of pushing a value onto the stack, it pops a value off the stack
  • increments RSP by 0x8
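
The push and pop rules can be sketched as a toy model in C. Everything here (`toy_stack_t`, `toy_push`, `toy_pop`) is invented for these notes, RSP is modelled as a byte offset into a small array, and there are no bounds checks.

```c
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy model: a tiny "memory" of 8-byte slots and an RSP that is a byte
 * offset into it. All names here are invented for this sketch. */
typedef struct {
    uint64_t mem[STACK_SLOTS];
    uint64_t rsp;  /* byte offset of the current top of the stack */
} toy_stack_t;

void toy_init(toy_stack_t *s) {
    s->rsp = STACK_SLOTS * 8;  /* empty stack: RSP starts at the highest address */
}

/* push: decrement RSP by 8 first, then store the value at the new top */
void toy_push(toy_stack_t *s, uint64_t value) {
    s->rsp -= 8;
    s->mem[s->rsp / 8] = value;
}

/* pop: load the value at the top, then increment RSP by 8 */
uint64_t toy_pop(toy_stack_t *s) {
    uint64_t value = s->mem[s->rsp / 8];
    s->rsp += 8;
    return value;
}
```

Pushing 0x11 and then 0x22 moves RSP down by 16 bytes in total, and the first pop gives back 0x22 (last in, first out).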

Notes on 32 bits

from the slides

  • If you are executing in 32-bit mode, push/pop will add/remove values 32 bits at a time, rather than 64 bits, and thus they decrement/ increment ESP by 4 rather than 8 at a time
  • Likewise, if you’re in 16-bit mode, they push/pop 16-bit values, and decrement/increment SP by 2 at a time

The RSP game

~~Oh god, I hate playing game while having some sort of score to show how much of an idiot I am ~~

I’m fine, I’m fine. I’m cool. let’s do this.

Level 1: Canonical orientation, rbp at top, rsp at bottom

HIGH ADDRESSES  
================  
b1a570ce11 <- RBP  
================  
d00dad  
================  
501ace  
================  
f1eece  
================  
0b501e7e <- RSP  
================  
LOW ADDRESSES

What is the offset to 0b501e7e ?
(Enter answer in the form of "rsp{+,-}0x??" or "rbp{+,-}0x??", where ?? must always be 2 digits, e.g. rsp-0x00 or rbp+0x08)

Uh.. uuhhhhhhh… rsp-0x00…?

y-yay…

Level 2:

HIGH ADDRESSES  
================  
b1ade <- RSP  
================  
decea5ed  
================  
ba11ad  
================  
10ca1e  
================  
0b501e7e  
================  
d0771e  
================  
badd00d  
================  
ca11ab1e <- RBP  
================  
LOW ADDRESSES

What is the offset to d0771e ?
(Enter answer in the form of "rsp{+,-}0x??" or "rbp{+,-}0x??", where ?? must always be 2 digits, e.g. rsp-0x00 or rbp+0x08)

rbp+0x10, I think.
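
The counting behind that answer can be sketched in C: every slot is 8 bytes, so the offset is just the number of slots between the register and the target, times 8. The slot numbering (slot 0 is the lowest address) and both function names are my own assumptions, not part of the game.

```c
#include <stdio.h>

/* Slots are numbered from the LOWEST address (slot 0) upward; each slot is
 * 8 bytes. Returns the signed byte offset from the register's slot to the
 * target slot. */
long slot_offset(int target_slot, int register_slot) {
    return (long)(target_slot - register_slot) * 8;
}

/* Format the answer the way the game asks for it, e.g. "rbp+0x10". */
void format_answer(char *out, size_t size, const char *reg, long offset) {
    unsigned long magnitude =
        offset < 0 ? (unsigned long)(-offset) : (unsigned long)offset;
    snprintf(out, size, "%s%c0x%02lx", reg, offset < 0 ? '-' : '+', magnitude);
}
```

In Level 2, ca11ab1e (RBP) is slot 0, d0771e is slot 2 and b1ade (RSP) is slot 7, so `slot_offset(2, 0)` is 0x10 (rbp+0x10) and `slot_offset(2, 7)` is -0x28 (rsp-0x28).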

Okay I’m not gonna copy and paste the rest here since it’s a randomized game and it’s pretty fun to do it yourself. Go play them!

Calling Function

CallASubroutine1.c

int func(){
	return 0xbeef;
}
 
int main(){
	func();
	return 0xf00d;
}

Still CallASubroutine1.c, but in asm

func:
0000000140001000  mov  eax,0BEEFh
0000000140001005  ret
 
main:
0000000140001010  sub  rsp,28h
0000000140001014  call func (0140001000h)
0000000140001019  mov  eax,0F00Dh
000000014000101E  add  rsp, 28h
0000000140001022  ret

call - Call procedure

  • call’s job is to transfer control to a different function, e.g. when one function calls another
  • first it pushes the address of the next instruction onto the stack
  • then it changes rip to the address given in the instruction
  • the address of the function being called can be specified in multiple ways
    • absolute address
    • relative address (relative to the address of the instruction right after the call, even though disassemblers usually just show the resolved target lmao)
    • not the focus here, just go along with it for now

ret - Return from procedure

  • two forms
    • Pop the top of the stack into rip, which increments rsp by 0x8 (aka pop stuff off the stack and throw it in rip). In this form, the instruction is just written as ret
    • Pop the top of the stack into rip and also add a constant number of bytes to rsp. In this form, the instruction is written as “ret 0x8”, or “ret 0x20”, etc
      • this form shows up in Windows API code
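
Here is a rough C model of what call and ret do to rip and rsp, using the addresses from the CallASubroutine1 disassembly above. The `toy_cpu_t` type and the function names are invented for this sketch, and rsp is modelled as a byte offset into a small array rather than a real address.

```c
#include <stdint.h>

typedef struct {
    uint64_t rip;
    uint64_t rsp;       /* byte offset into stack[] */
    uint64_t stack[8];  /* toy stack made of 8-byte slots */
} toy_cpu_t;

/* call: push the address of the next instruction, then jump to the target */
void toy_call(toy_cpu_t *cpu, uint64_t next_instruction, uint64_t target) {
    cpu->rsp -= 8;
    cpu->stack[cpu->rsp / 8] = next_instruction;
    cpu->rip = target;
}

/* ret: pop the saved return address into rip.
 * extra_bytes models the "ret 0x8" style form; pass 0 for a plain ret. */
void toy_ret(toy_cpu_t *cpu, uint64_t extra_bytes) {
    cpu->rip = cpu->stack[cpu->rsp / 8];
    cpu->rsp += 8 + extra_bytes;
}
```

With rsp starting at 64, `toy_call(&cpu, 0x140001019, 0x140001000)` leaves rip at func and the return address 0x140001019 on top of the stack; `toy_ret(&cpu, 0)` then puts rip back at 0x140001019 and rsp back at 64.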

There are 2 syntaxes for writing instruction operands:

  • Intel: Destination <- Source(s)
    • Windows. Think algebra or C: y = 2x + 1;
    • mov rbp, rsp
    • add rsp, 0x14 ; (rsp = rsp + 0x14)
  • AT&T: Source(s) -> Destination
    • *nix/GNU. Think elementary school: 1 + 2 = 3
    • mov %rsp, %rbp
    • add $0x14,%rsp
    • So registers get a % prefix and immediate values get a $
  • Xeno uses Intel syntax in this course, so there’s that

mov - Move

  • Can move:
    • register to register
    • memory to register, register to memory
    • immediate to register, immediate to memory
  • but never memory to memory
  • memory as in the r/mX form

Form                     Examples
immediate to memory      mov [rbx], imm32
                         mov [rbx+rcx*X], imm32
immediate to register    mov rbx, imm64
register to register     mov rbx, rax
register to memory       mov [rbx], rax
                         mov [rbx+rcx*X], rax
                         mov [rbx+rcx*X+Y], rax
memory to register       mov rax, [rbx]
                         mov rax, [rbx+rcx*X]
                         mov rax, [rbx+rcx*X+Y]

add & sub - Add and Subtract

  • destination can be register or memory
  • source can be register or memory or immediate
  • again, source and destination cannot both be memory
  • add rsp, 8 - (rsp = rsp + 8)
  • sub rax, [rbx*2] - (rax = rax - memorypointedtoby(rbx*2))

Simple stack diagram 2

#include <stdio.h>
int bar(int y){
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
int foo(int x){
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
int main(){
	int c = foo(7);
	printf("main passed %d", c);
}

More or less the stack looks like this.

SingleLocalVariable.c

int func(){
  int i = 0x5ca1ab1e;
  return i;
}
int main(){
  return func();
}
Memory Address       Value
00000000`0014FE08    return address = 00000001`40001349
00000000`0014FE00    undef
00000000`0014FDF8    undef
00000000`0014FDF0    undef
00000000`0014FDE8    undef
00000000`0014FDE0    undef
00000000`0014FDD8    undef
00000000`0014FDD0    return address = 00000001`40001029
00000000`0014FDC8    undef
00000000`0014FDC0    undef`5CA1AB1E
  • In Intel Syntax, for r/mX memory descriptions, it will use things like qword ptr, dword ptr, or word ptr to indicate the size of the data being operated on (8, 4, and 2 bytes respectively)
mov qword ptr [rsp+10h],rax
mov dword ptr [rsp],5CA1AB1Eh
mov word ptr [rsp],ax

So… what do we know so far?

  • Local variables lead to an allocation of space on the stack, within the function where the variable is scoped to
  • In VS there is an over-allocation of space for local variables
  • 0x18 reserved for only 0x4 (int) worth of data

Why is VS over-allocating space for a single local variable?

  • According to the “Stack usage” reference , “The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed) …”
int func3() {
	int i = 0x7a11;
	return i;
}
int func2(){
	int j = 0x7a1e;
	return func3();
}
int func(){
	return func2();
}
int main(){
	return func();
}

From this point on I’ll only include the instructions and what’s worth noting, go read the slides and take the course yourself >:(

IMUL - Signed Multiply

  • It has 13 forms lmao, spread across 5 groups

Group 1 - Single Operand
IMUL r/m8                  AX = AL * r/m8
IMUL r/m16                 DX:AX = AX * r/m16
IMUL r/m32                 EDX:EAX = EAX * r/m32
IMUL r/m64                 RDX:RAX = RAX * r/m64

Group 2 - Two Operand
IMUL r16, r/m16            r16 = r16 * r/m16
IMUL r32, r/m32            r32 = r32 * r/m32
IMUL r64, r/m64            r64 = r64 * r/m64

Group 3 - Three Operand, 8 Bit Immediate
IMUL r16, r/m16, imm8      r16 = r/m16 * sign-extended imm8
IMUL r32, r/m32, imm8      r32 = r/m32 * sign-extended imm8
IMUL r64, r/m64, imm8      r64 = r/m64 * sign-extended imm8

Group 4 - Three Operand, 16 Bit Immediate
IMUL r16, r/m16, imm16     r16 = r/m16 * imm16

Group 5 - Three Operand, 32 Bit Immediate
IMUL r32, r/m32, imm32     r32 = r/m32 * imm32
IMUL r64, r/m64, imm32     r64 = r/m64 * sign-extended imm32

Example 1

IMUL r/m8: AX = AL * r/m8

Registers before imul:

r12    0x84
rax    0x609966C1A977E177

Registers after imul:

r12    0x84
rax    0x609966C1A977C65C
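
The numbers in this example can be checked with a few lines of C. This is a model of the single operand 8 bit form only, with invented function names: AL (0x77 = 119) times the low byte of r12 (0x84 = -124 as a signed byte) is -14756, which is 0xC65C as a 16 bit two's complement value, and only AX changes.

```c
#include <stdint.h>

/* Interpret a byte as a signed 8-bit value without relying on
 * implementation-defined casts. */
static int sign_extend8(uint8_t v) {
    return (v & 0x80) ? (int)v - 256 : (int)v;
}

/* Model of "imul r/m8": AX = AL * r/m8 (signed).
 * Bits 16..63 of RAX are left untouched. */
uint64_t imul_r8(uint64_t rax, uint8_t operand) {
    int product = sign_extend8((uint8_t)(rax & 0xFF)) * sign_extend8(operand);
    uint16_t ax = (uint16_t)product;  /* keep the low 16 bits (two's complement) */
    return (rax & ~(uint64_t)0xFFFF) | ax;
}
```

`imul_r8(0x609966C1A977E177, 0x84)` gives 0x609966C1A977C65C, matching the "after" register value above.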

MOVZX - Move with zero extend

  • Used to move small values (from smaller types) into larger registers (holding larger types)
  • The destination is always a register and the source is an 8 or 16 bit register or memory value, so unlike normal MOV there are only r->r and m->r forms (no immediates, no memory destination)
  • “Zero extend” means the CPU unconditionally fills the high order bits of the larger register with zeros
  • “Sign extend” means the CPU fills the high order bits of the destination larger register with whatever the sign bit is set to on the small value

MOVSXD - Move with sign extended XD

  • MOVSX technically only sign extends from 8 or 16 bit values
  • If you want to sign extend a 32 bit value to 64 bits, you need to use MOVSXD
  • There’s no MOVZXD: writing a 32 bit register (e.g. mov ebx, eax) already zeroes the upper 32 bits of the 64 bit register

MOVZX/MOVSX - examples

mov eax, 0xF00DFACE
mov ebx, eax
	;rbx = 0x00000000`F00DFACE (a 32 bit write zeroes the upper half; movzx only takes 8 or 16 bit sources)
movsxd rbx, eax
	;rbx = 0xFFFFFFFF`F00DFACE

Because the sign bit (most significant bit) of 0xF00DFACE is 1
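
The same two results can be reproduced in C. This is a sketch with invented function names; the sign extension is done by hand with a mask so it does not depend on how a compiler converts out-of-range values to signed types.

```c
#include <stdint.h>

/* Zero extend: the upper 32 bits are always 0. */
uint64_t zero_extend32(uint32_t v) {
    return (uint64_t)v;
}

/* Sign extend: the upper 32 bits are copies of bit 31 of the small value. */
uint64_t sign_extend32(uint32_t v) {
    uint64_t wide = v;
    if (v & 0x80000000u) {
        wide |= 0xFFFFFFFF00000000ULL;
    }
    return wide;
}
```

For 0xF00DFACE these give 0x00000000F00DFACE and 0xFFFFFFFFF00DFACE. A value with the top bit clear, like 0x7FFFFFFF, comes out the same from both.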

ArrayLocalVariable.c

short main() {
    short a;
    int b[6];
    long long c;
    a = 0xbabe;
    c = 0xba1b0ab1edb100d;
    b[1] = a;
    b[4] = b[1] + c;
    return b[4];
}
Memory Address       Value
00000000`0014FE08    return address = 00000001`40001379
00000000`0014FE00    16-byte-stack-alignment padding
00000000`0014FDFC    undef (alignment padding)
00000000`0014FDF8    undef (alignment padding)
00000000`0014FDF4    b[5] = undef
00000000`0014FDF0    b[4] = 1edacacb
00000000`0014FDEC    b[3] = undef
00000000`0014FDE8    b[2] = undef
00000000`0014FDE4    b[1] = ffffbabe
00000000`0014FDE0    b[0] = undef
00000000`0014FDDC    c (MSBs) = 0ba1b0ab
00000000`0014FDD8    c (LSBs) = 1edb100d
00000000`0014FDD4    undef (alignment padding)
00000000`0014FDD0    2 MSBs = undef, a = babe (2 bytes)

StructLocalVariable.c

typedef struct mystruct{
	short a;
	int b[6];
	long long c;
} mystruct_t;
short main(){
	mystruct_t foo;
	foo.a = 0xbabe;
	foo.c = 0xba1b0ab1edb100d;
	foo.b[1] = foo.a;
	foo.b[4] = foo.b[1] + foo.c;
	return foo.b[4];
}
main:
0000000140001000 sub     rsp,38h
0000000140001004 mov     eax,0FFFFBABEh
0000000140001009 mov     word ptr [rsp],ax
000000014000100D mov     rax,0BA1B0AB1EDB100Dh
0000000140001017 mov     qword ptr [rsp+1Ah],rax
000000014000101C mov     eax,4
0000000140001021 imul    rax,rax,1
0000000140001025 movsx   ecx,word ptr [rsp]
0000000140001029 mov     dword ptr [rsp+rax+2],ecx
000000014000102D mov     eax,4
0000000140001032 imul    rax,rax,1
0000000140001036 movsxd  rax,dword ptr [rsp+rax+2]
000000014000103B add     rax,qword ptr [rsp+1Ah]
0000000140001040 mov     ecx,4
0000000140001045 imul    rcx,rcx,4
0000000140001049 mov     dword ptr [rsp+rcx+2],eax
000000014000104D mov     eax,4
0000000140001052 imul    rax,rax,4
0000000140001056 movzx   eax,word ptr [rsp+rax+2]
000000014000105B add     rsp,38h
000000014000105F ret
Memory Address       Value
00000000`0014FE08    00000001`40001379
00000000`0014FE00    16-byte-stack-alignment padding
00000000`0014FDFC    undef (16 byte alignment padding)
00000000`0014FDF8    undef (16 byte alignment padding)
00000000`0014FDF4    undef (16 byte alignment padding)
00000000`0014FDF0    undef (16 byte alignment padding), c 2 MSBs = 0ba1
00000000`0014FDEC    c 4 middle bytes = b0ab1edb
00000000`0014FDE8    c 2 LSBs = 100d, b[5] 2 MSBs = undef
00000000`0014FDE4    b[5] 2 LSBs = undef, b[4] 2 MSBs = 1eda
00000000`0014FDE0    b[4] 2 LSBs = cacb, b[3] 2 MSBs = undef
00000000`0014FDDC    b[3] 2 LSBs = undef, b[2] 2 MSBs = undef
00000000`0014FDD8    b[2] 2 LSBs = undef, b[1] 2 MSBs = ffff
00000000`0014FDD4    b[1] 2 LSBs = babe, b[0] 2 MSBs = undef
00000000`0014FDD0    b[0] 2 LSBs = undef, a = babe (2 bytes)
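
The disassembly reaches b through [rsp+rax+2] and c through [rsp+1Ah], so b starts 2 bytes into the struct and c starts 0x1A bytes in. That looks like a byte-packed struct (something like #pragma pack(1)), although the C source as copied into these notes does not show it, so treat the packing as my assumption. A small sketch to compare the offsets, with invented type and function names, and assuming 2 byte short, 4 byte int and 8 byte long long:

```c
#include <stddef.h>

#pragma pack(push, 1)   /* assumption: byte packing, to match the layout above */
typedef struct {
    short a;
    int b[6];
    long long c;
} packed_struct_t;
#pragma pack(pop)

typedef struct {        /* same members with the compiler's default alignment */
    short a;
    int b[6];
    long long c;
} natural_struct_t;

size_t packed_offset_b(void)  { return offsetof(packed_struct_t, b); }
size_t packed_offset_c(void)  { return offsetof(packed_struct_t, c); }
size_t natural_offset_b(void) { return offsetof(natural_struct_t, b); }
```

With packing, b sits at offset 2 and c at offset 26 (0x1A), which is what the table shows. Without it, compilers typically insert 2 bytes of padding after a so that b starts at offset 4, like in ArrayLocalVariable.c.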

TooManyParameter.c

#define uint64 unsigned long long
 
int func(uint64 a, uint64 b, uint64 c, uint64 d, uint64 e){
    int i = a + b - c + d - e;
    return i;
}
 
int main(){
    return func(0x11, 0x22, 0x33, 0x44, 0x55);
}
Memory Address       Value
00000000`0014FE08    return address = 0000000140001399
00000000`0014FE00    16-byte-stack-alignment padding
00000000`0014FDF8    undef
00000000`0014FDF0    arg5 = 0x55
00000000`0014FDE8    arg4 = r9 = 0x44
00000000`0014FDE0    arg3 = r8 = 0x33
00000000`0014FDD8    arg2 = rdx = 0x22
00000000`0014FDD0    arg1 = rcx = 0x11
00000000`0014FDC8    return address = 0000000140001078
00000000`0014FDC0    16-byte-stack-alignment padding
00000000`0014FDB8    16-byte-stack-alignment padding
00000000`0014FDB0    i = undef`ffffffef
Compare this to the example Pass1Parameter.c.

The stack slots holding arg1 to arg4 are what they call the Shadow Store

Shadow Store

  • “The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.”
  • “The caller must always allocate sufficient space to store four register parameters, even if the callee doesn’t take that many parameters.”
  • “Any parameters beyond the first four must be stored on the stack after the shadow store before the call.”
  • Source: https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160
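
Putting the quotes together with the TooManyParameter.c stack table: seen from the callee right after the call, [rsp] holds the return address, the four shadow store slots sit directly above it, and the fifth argument comes right after them. A tiny sketch of that arithmetic (the function name is made up, and it only covers integer arguments):

```c
#include <stdint.h>

/* Address of the stack slot for 1-based integer argument n, as seen by the
 * callee right after the call (so [rsp] holds the return address).
 * Slots 1..4 are the shadow store; slots 5 and up hold stack-passed arguments. */
uint64_t ms_x64_arg_slot(uint64_t rsp_at_entry, unsigned n) {
    return rsp_at_entry + 8ull * n;
}
```

With the return address at 0x14FDC8 as in the table, argument 1 lands at 0x14FDD0 and argument 5 at 0x14FDF0.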

CallASubroutine.c

int func(){
	return 0xbeef;
}
 
int main(){
	func();
	return 0xf00d;
}

Calling Conventions

Caller-save registers

  • Also called “volatile” registers by MS. I.e. the caller should assume they will be changed by the callee
  • Registers “belong” to the Callee.
  • So the caller is in charge of saving the value before a call to a subroutine, and restoring the value after the call returns
  • VisualStudio: RAX, RCX, RDX, R8, R9, R10, R11
  • GCC: RAX, RDI, RSI, RDX, RCX, R8, R9, R10, R11

Callee-save registers

  • Also called “non-volatile” registers by MS. I.e. the caller should assume they will not be changed by the callee
  • Registers “belong” to the caller
  • If the callee needs to use more registers than are saved by the caller, the callee is responsible for making sure the values are stored/restored, so it doesn’t break things for the caller
  • VisualStudio: RBX, RBP, RDI, RSI, R12-R15
  • GCC: RBX, RBP, R12-R15

Balance

  • Both caller and callee are responsible for balancing any register saves they perform (add to the stack), with restores (removal from the stack)
  • Caller will typically save registers right before the call and restore right after the call
  • Callee will typically save registers at the beginning of the function and restore at the end of the function

Parameters

int func(int a, int b, int c, int d, int e){
	int i = a+b-c+d-e;
	return i;
}
int main(){
	return func(0x11,0x22,0x33,0x44,0x55);
}

Microsoft x64 ABI

  • First 4 parameters (from left to right) are put into RCX, RDX, R8, R9 respectively
    • int a RCX
    • int b RDX
    • int c R8
    • int d R9
    • int e pushed onto stack
  • Remaining parameters “pushed” onto the stack so that the left-most parameter is at the lowest address
  • Typically mov is used instead of push

System V “x86-64” ABI (GCC et al.)

  • First 6 parameters (from left to right) are put into RDI, RSI, RDX, RCX, R8, R9 respectively
    • int a RDI
    • int b RSI
    • int c RDX
    • int d RCX
    • int e R8
  • Remaining parameters “pushed” onto the stack so that the left-most parameter is at the lowest address
  • Typically mov is used instead of push
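
The two register orders side by side, as a small lookup sketch. The function names are invented, the index is 0-based, only integer/pointer parameters are covered, and NULL stands for "this one goes on the stack".

```c
#include <stddef.h>

/* Microsoft x64 ABI: the first 4 integer parameters go in registers. */
const char *ms_x64_int_arg_reg(unsigned index) {
    static const char *regs[] = { "RCX", "RDX", "R8", "R9" };
    return index < 4 ? regs[index] : NULL;  /* NULL: passed on the stack */
}

/* System V x86-64 ABI: the first 6 integer parameters go in registers. */
const char *sysv_int_arg_reg(unsigned index) {
    static const char *regs[] = { "RDI", "RSI", "RDX", "RCX", "R8", "R9" };
    return index < 6 ? regs[index] : NULL;  /* NULL: passed on the stack */
}
```

For the five parameter func above, index 4 (int e) is NULL under the Microsoft ABI but "R8" under System V.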

32-bit Stack Calling Conventions

In 32 bit code, there are many more calling conventions in use

  • cdecl (default for most C code)
    • Caller cleans up the stack
  • stdcall (for Windows’ Win32 APIs)
    • Callee cleans up the stack
  • Function parameters are pushed onto stack from right to left
    • leftmost parameter (the first function parameter) ends up at the lowest address
  • Both cdecl and stdcall conventions perform explicit stack frame linkage
    • Each time you enter a new function, the old ebp (“stack frame base pointer” by Intel convention) gets pushed onto the stack, and the new esp (top-of-the stack pointer, which now points at the copy of ebp), gets moved into ebp
    • Function parameters tend to get referenced as ebp+offset
    • Local variables tend to get referenced as ebp-offset
      • Although GCC sometimes references them as esp+offset
  • M$ also supports other conventions such as fastcall, which passes the first 2 params in ECX and EDX, and then the rest on the stack
    • It’s like a more limited version of the x86-64 calling convention
    • No use of stack frames (since that’s extra overhead, which would make it less fast!)

Example:

#include <stdio.h>
int bar(int y){
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
int foo(int x){
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
int main(){
	int c = foo(7);
	printf("main passed %d", c);
}

LEA - Load Effective Address

  • Uses the mX form (really just “m” in the manual) but is the exception to the rule that the square brackets [] syntax means dereference!
  • Frequently used with pointer arithmetic, sometimes for just arithmetic in general

Example:

rbx = 0x2, rdx = 0x1000
lea rax, [rdx+rbx*8+5]
rax = 0x1015, not the value at 0x1015
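
In C terms, lea is just the address arithmetic without the memory access. A one function sketch (the name `lea` and its parameters are only for illustration):

```c
#include <stdint.h>

/* lea dst, [base + index*scale + disp]: pure arithmetic, no memory access.
 * scale is expected to be 1, 2, 4 or 8. */
uint64_t lea(uint64_t base, uint64_t index, uint64_t scale, int64_t disp) {
    return base + index * scale + (uint64_t)disp;
}
```

`lea(0x1000, 0x2, 8, 5)` returns 0x1015, the example above. The "just arithmetic in general" use looks like `lea(x, x, 4, 0)`, which is x * 5 without a multiply instruction.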

Control Flow

Two forms of control flow:

  • Conditional - go somewhere if a condition is met. Think “if”s, switches, loops
  • Unconditional - always go somewhere. Function calls, goto, exceptions, interrupts

JMP - Jump

  • Unconditionally change RIP to given address
  • Ways to specify the address:
    • Short, relative (RIP = RIP of next instruction + 1 byte sign-extended-to-64-bits displacement)
      • Frequently used in small loops
      • Some disassemblers will indicate this with a mnemonic by writing it as “jmp short”
      • jmp -2 == infinite loop for short relative jmp :)
      • “jmp 0000000140001012” doesn’t have the number 0000000140001012 anywhere in it, it’s really “jmp 0x0C bytes forward”
    • Near, relative (RIP = RIP of next instruction + 4 byte sign-extended-to-64-bits displacement)
    • Near, absolute indirect (address calculated with r/m64)
    • Far, absolute indirect - We’ll discuss in future class
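
The relative forms are easy to check by hand or in C. In this sketch the function names are mine, and I am assuming the usual encodings: a 2 byte instruction for the short form (opcode plus 1 displacement byte) and a 5 byte instruction for the near form (opcode plus 4 displacement bytes).

```c
#include <stdint.h>

/* Short jmp: opcode + 1 displacement byte, so the instruction is 2 bytes long. */
uint64_t short_jmp_target(uint64_t instruction_address, int8_t disp8) {
    uint64_t next_rip = instruction_address + 2;
    return next_rip + (uint64_t)(int64_t)disp8;  /* sign-extend the displacement */
}

/* Near jmp: opcode + 4 displacement bytes, so the instruction is 5 bytes long. */
uint64_t near_jmp_target(uint64_t instruction_address, int32_t disp32) {
    uint64_t next_rip = instruction_address + 5;
    return next_rip + (uint64_t)(int64_t)disp32;
}
```

A short jmp with displacement -2 lands back on itself (the infinite loop). For a disassembler to show “jmp 0000000140001012” with a displacement of 0x0C, the jmp itself would have to sit at 0x140001004; that address is my back-calculation, not something from the slides.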

jcc - Jump If Condition Is Met

  • If a condition is true, the jump is taken. Otherwise it proceeds to the next instruction
  • There are more than 4 pages of conditional jump types! Luckily a bunch of them are synonyms for each other.
  • JNE == JNZ (Jump if not equal, Jump if not zero, both check if the Zero Flag (ZF) == 0)

Architecture - RFLAGS

RFLAGS register holds many single bit flags. Will only ask you to remember the following for now:

  • Zero Flag (ZF) - Set if the result of some instruction is zero; cleared otherwise.
  • Sign Flag (SF) - Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)

Some Notable Jcc Instructions

  • JZ/JE: if ZF == 1
  • JNZ/JNE: if ZF == 0
  • JLE/JNG : if ZF == 1 or SF != OF
  • JGE/JNL : if SF == OF
  • JBE/JNA: if CF == 1 or ZF == 1
  • JB: if CF == 1

Mnemonic translations

  • B = below, unsigned notion
  • A = above, unsigned notion
  • N = Not (like “Not less than:” JNL)
  • G = greater than, signed notion
  • L = less than, signed notion
  • E = Equal (same as Z, zero flag set)

CMP - Compare Two Operands

  • “The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction.”
  • What’s the difference from just doing SUB? Difference is that with SUB the result has to be stored somewhere. With CMP the result is computed, the flags are set, but the result is discarded
  • Modifies CF, OF, SF, ZF, AF, and PF

Xeno’s miniguide

cmp dword ptr [rsp+4], eax
jne 0000000140001033
 
cmp dword ptr [rsp+4], eax
jle 0000000140001043
 
cmp dword ptr [rsp+4], eax
jae 0000000140001043

Inference…

  • There are different conditions for unsigned (above) vs. signed (greater than)… Which leads to different assembly instructions for unsigned (JA) vs. signed (JG) comparisons…
  • Which implies the compiler emits different code depending on whether the programmer declared variables as unsigned vs. signed…
  • Which a reverse engineer / decompiler can use to infer whether variables are likely unsigned or signed
  • It turns out that for instructions that set status flags (e.g. arithmetic operations), the hardware just does the operation and sets flags as if the operands were both unsigned and signed
  • Basically the hardware doesn’t know or care about whether the humans are currently interpreting the bits as signed or unsigned. That’s the compiler’s problem to sort out.
  • The compiler must emit instructions which treat the bits as signed or unsigned based on what’s specified in the high level language
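
To make the "hardware sets flags for both interpretations at once" point concrete, here is a sketch of a 32 bit cmp in C together with a few of the Jcc conditions from the list above. The type and function names are invented, and the JAE condition (CF == 0) is not in my list above but follows the same pattern as JB.

```c
#include <stdint.h>

typedef struct { int zf, sf, cf, of; } flags_t;

/* Flags that "cmp a, b" (32-bit) would produce: computes a - b, keeps only flags. */
flags_t cmp32(uint32_t a, uint32_t b) {
    uint32_t r = a - b;
    flags_t f;
    f.zf = (r == 0);
    f.sf = (int)(r >> 31);
    f.cf = (a < b);                            /* unsigned borrow */
    f.of = (int)(((a ^ b) & (a ^ r)) >> 31);   /* signed overflow */
    return f;
}

int jne_taken(flags_t f) { return f.zf == 0; }
int jle_taken(flags_t f) { return f.zf == 1 || f.sf != f.of; }  /* signed <= */
int jae_taken(flags_t f) { return f.cf == 0; }                  /* unsigned >= */
int jbe_taken(flags_t f) { return f.cf == 1 || f.zf == 1; }     /* unsigned <= */
```

For `cmp32(0xFFFFFFFF, 1)` both jle and jae are taken: read as signed, -1 is less than 1, and read as unsigned, 0xFFFFFFFF is above 1. The same flags serve both readings; which jump the compiler emits is what tells you the type.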

AND - Bitwise AND

  • C binary operator & (not &&, that’s logical AND)
  • Destination operand can be r/mX or register
  • Source operand can be r/mX or register or immediate (No source and destination as r/mXs)

OR - Bitwise OR

  • C binary operator | (not ||, that’s logical OR)
  • Destination operand can be r/mX or register
  • Source operand can be r/mX or register or immediate (No source and destination as r/mXs)

XOR - Bitwise Exclusive OR

  • C binary operator ”^“
  • Destination operand can be r/mX or register
  • Source operand can be r/mX or register or immediate (No source and destination as r/mXs)
  • FYI XOR is commonly used to zero a register, by XORing it with itself, because it’s faster than a MOV

NOT - One’s Complement Negation

  • C unary operator “~” (not !, that’s logical NOT)
  • Single source/destination operand can be r/mX

INC/DEC - Increment / decrement

  • Single source/destination operand can be r/mX
  • Increase or decrease the value by 1
  • When optimized, compilers will tend to favor not using inc/dec, as directed by the Intel optimization guide. So their presence may be indicative of hand-written, or un-optimized code
  • Modifies OF, SF, ZF, AF, and PF flags (CF is left unchanged)

TEST - Logical Compare

  • “Computes the bit-wise AND of first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result.”
  • Like CMP - sets flags, and throws away the result

SAR - Shift Arithmetic Right

  • Can be explicitly used with the C “>>” operator, if operands are signed
  • First operand (source and destination) is an r/mX
  • Second operand is either cl (lowest byte of rcx), or a 1 byte immediate. The 2nd operand is the number of places to shift
  • It divides the register by 2 for each place the value is shifted. More efficient than a divide instruction
  • Each bit shifted off the right side is placed in CF

SAL - Shift Arithmetic Left

  • Actually behaves exactly the same as SHL!
  • First operand (source and destination) is an r/mX
  • Second operand is either cl (lowest byte of rcx), or a 1 byte immediate. The 2nd operand is the number of places to shift
  • It multiplies the register by 2 for each place the value is shifted. More efficient than a multiply instruction
  • Each bit shifted off the left side is placed in CF
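
A sketch of the shifts on 32 bit values in C, with SHR next to SAR for contrast. The function names are invented, the shift count must be between 1 and 31 in this model, and the arithmetic right shift is built by hand because right shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* SHR: logical shift right, zeros come in from the left. */
uint32_t shr32(uint32_t v, unsigned count) {
    return v >> count;
}

/* SAR: arithmetic shift right, copies of the sign bit come in from the left. */
uint32_t sar32(uint32_t v, unsigned count) {
    uint32_t shifted = v >> count;
    if (v & 0x80000000u) {
        shifted |= ~(0xFFFFFFFFu >> count);  /* fill the vacated top bits with 1s */
    }
    return shifted;
}

/* SAL/SHL: shift left, zeros come in from the right. */
uint32_t sal32(uint32_t v, unsigned count) {
    return v << count;
}

/* CF after a right shift: the last bit that fell off the right side. */
int right_shift_cf(uint32_t v, unsigned count) {
    return (int)((v >> (count - 1)) & 1u);
}
```

`sar32(0xFFFFFFF8, 1)` is 0xFFFFFFFC (-8 becomes -4), while `shr32` on the same input gives 0x7FFFFFFC. One catch with "divides by 2": `sar32(0xFFFFFFFF, 1)` stays 0xFFFFFFFF (-1), whereas -1 / 2 in C is 0, so SAR rounds toward negative infinity.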

DIV - Unsigned Divide

  • Three forms:
    • Unsigned divide ax by r/m8, al = quotient, ah = remainder
    • Unsigned divide edx:eax by r/m32, eax = quotient, edx = remainder
    • Unsigned divide rdx:rax by r/m64, rax = quotient, rdx = remainder
  • If dividend is 32/64bits, edx/rdx will just be set to 0 by the compiler before the instruction (as occurred in the MulDivExample.c code)
  • If the divisor is 0, a divide by zero exception is raised.

IDIV - Signed Divide

  • If you were to then change MulDivExample to signed, you would see the IDIV instruction appear
  • Three forms
    • Signed divide ax by r/m8, al = quotient, ah = remainder
    • Signed divide edx:eax by r/m32, eax = quotient, edx = remainder
    • Signed divide rdx:rax by r/m64, rax = quotient, rdx = remainder
  • If dividend is 32/64bits, edx/rdx will be filled with copies of the sign bit of eax/rax before the instruction (sign extension, e.g. via cdq/cqo), rather than set to 0 as with DIV
  • If the divisor is 0, a divide by zero exception is raised.
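
The smallest of the unsigned forms (divide ax by r/m8) is easy to model in C. The function name and the return code convention are invented here; besides the divisor being 0, the CPU also raises the divide error when the quotient does not fit in the destination, which this sketch reports the same way.

```c
#include <stdint.h>

/* Model of "div r/m8": AX / divisor -> AL = quotient, AH = remainder.
 * Returns 0 and writes the new AX on success; returns -1 where the CPU
 * would raise a divide error (divisor is 0, or the quotient does not fit in AL). */
int div8(uint16_t ax, uint8_t divisor, uint16_t *ax_out) {
    if (divisor == 0) {
        return -1;
    }
    uint16_t quotient = (uint16_t)(ax / divisor);
    uint16_t remainder = (uint16_t)(ax % divisor);
    if (quotient > 0xFF) {
        return -1;
    }
    *ax_out = (uint16_t)((remainder << 8) | quotient);  /* AH:AL */
    return 0;
}
```

Dividing 100 by 7 gives AL = 0x0E (14) and AH = 0x02, so the new AX is 0x020E.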

I refuse to understand REP STOS >:V

LEAVE - Exit a function

  • It’s literally just the same thing as the two instructions you’d typically expect to see right before you return from a function that is using stack frames:
    • mov rsp, rbp
    • pop rbp

Intel vs. AT&T Syntax 2

  • Intel Syntax:
    • Preferred on Windows. Think algebra or C: y = 2x + 1;
    • Destination <- Source(s) (right to left)
    • mov rbp, rsp
    • add rsp, 0x14 ; (rsp = rsp + 0x14)
  • AT&T Syntax:
    • Preferred on *nix/GNU. Think elementary school: 1 + 2 = 3
    • Source(s) -> Destination (left to right)
    • mov %rsp, %rbp
    • add $0x14,%rsp
    • Registers get a % prefix and immediates get a $
  • Intel indicates size with things like mov qword ptr [rax], rbx, but it’s not in the actual mnemonic of the instruction
  • ”In AT&T syntax the size of memory operands is determined from the last character of the instruction mnemonic. …only when there’s no other way to disambiguate an instruction”- Source: https://sourceware.org/binutils/docs/as/i386_002dVariations.html#i386_002dVariations
    • movb - operates on bytes
    • movw - operates on word (2 bytes)
    • movl - operates on “long” (dword) (4 bytes)
    • movq - operates on “quad word” (qword) (8 bytes)
  • Some mnemonics have been more or less renamed, to better conform to the b/w/l/q mnemonic naming conventions for lengths
    • “cwde” -> “cwtl” (convert (sign extend) word to long)
    • “movsx” -> “movsbw”
  • Xeno seems to hate it lol
  • r/mX
    • Intel: [base + index*scale + disp]
    • AT&T: disp(base, index, scale)

Read The Fun Manuals

  • http://www.intel.com/products/processor/manuals/
  • Vol.1 is a summary of life, the universe, and everything about x86
  • Vol. 2a-d explains all the instructions
  • Vol. 3a-d are all the gory details for all the extra stuff they’ve added in over the years (MultiMedia eXtensions - MMX, Virtual Machine eXtensions - VMX, virtual memory, 16/64 bit modes, system management mode, etc)
  • Reminder to use the pre-downloaded Nov 2020 version of the manual which we’ve been using as the standardized reference throughout this class, so we’re all looking at the same information
  • We’re primarily looking at Vol. 2 in this class

  • Opcode Column
    • Represents the literal byte value(s) which correspond to the given instruction
    • In this case, if you were to see a 0x24 followed by a byte, or 0x25 followed by 4 bytes, you would know they were specific forms of the AND instruction
    • If it was 0x25, how would you know whether it should be followed by 2 bytes (imm16) or 4 bytes (imm32)?
    • The length of the operand depends on if the processor is in 16-bit, 32-bit, or 64-bit mode
    • Each mode has a default operand size (i.e. the size of the value)
    • For 64-bit mode, the default operand size is 32-bits for most instructions and the default address size is 64-bits
    • This means the default interpretation will usually be the ones with the r/m32, r32, imm32, or in this case a specific register like EAX, unless explicitly overridden with special instruction prefixes
  • Instruction Column
    • The human-readable mnemonic which is used to represent the instruction.
    • This will frequently contain special encodings such as the “r/mX format” which I’ve previously discussed
  • Operand Encoding Column: Indicates what forms the operands can take

  • 64bit Column: Whether or not the opcode is valid in 64 bit mode

  • Compatibility/Legacy Mode Column: Whether or not the opcode is valid in 32/16 bit code

  • Description Column

    • Simple description of the action performed by the instruction
    • Typically this just conveys the flavor of the instruction, but the majority of the details are in the main description text
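
The Opcode Column question (is 0x25 followed by 2 or 4 bytes?) can be sketched as a tiny length decoder for the AND-with-accumulator forms in 64-bit mode. This is only an illustration with an invented function name; it recognizes a single optional prefix, either the 0x66 operand-size override or a REX.W prefix (0x48), and nothing else.

```c
#include <stddef.h>
#include <stdint.h>

/* Total length in bytes of an "AND AL/AX/EAX/RAX, imm" instruction in
 * 64-bit mode, or 0 if the bytes are not one of the forms handled here.
 * Only a single optional 0x66 or REX.W (0x48) prefix is recognized. */
size_t and_accumulator_imm_length(const uint8_t *bytes, size_t available) {
    size_t pos = 0;
    size_t imm_size = 4;                /* default operand size: 32 bits */
    if (available == 0) {
        return 0;
    }
    if (bytes[0] == 0x66) {             /* operand-size override: imm16 */
        imm_size = 2;
        pos = 1;
    } else if (bytes[0] == 0x48) {      /* REX.W: 64-bit operand, imm32 sign-extended */
        pos = 1;
    }
    if (pos >= available) {
        return 0;
    }
    if (bytes[pos] == 0x24) {
        return pos + 1 + 1;             /* AND AL, imm8 */
    }
    if (bytes[pos] == 0x25) {
        return pos + 1 + imm_size;      /* AND AX/EAX/RAX, imm16/imm32 */
    }
    return 0;
}
```

So a bare 0x25 is 5 bytes in total (opcode plus imm32), 0x66 0x25 is 4 bytes (prefix, opcode, imm16), and 0x24 is always 2 bytes.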

Bomb Lab

Go play them! Source: https://gitlab.com/opensecuritytraining/arch1001_x86-64_asm_code_for_class/-/tree/master

(I may or may not write a walkthrough of the bomb lab in the future)

Conclusion

  • Special thanks to Xeno Kovah for making this course and many more courses open, as well as the resources, codes, and the relatively free license for me to be able to write these dumb notes
  • Of course these things I wrote are not endorsed by him.