Course: Architecture 1001: x86-64 Assembly

Slides and subtitles: https://gitlab.com/opensecuritytraining/arch1001_x86-64_asm_slides_and_subtitles
Source code: https://gitlab.com/opensecuritytraining/arch1001_x86-64_asm_code_for_class

Introduction and such

uh… Imma skip that. Sorry Xeno! :P

Refresher

C types

Quote

So a “char” is a single byte, a “short” is two bytes, etc.

The term “word” originally referred to Intel’s native 16 bit data size when x86 was a 16 bit architecture. Thus when it expanded to 32 bit, that size was referred to as a “double word”. This “double word” terminology was adopted as the “DWORD” data type name in Windows programming. Likewise for 64 bit, “QWORD” is often used on Windows.

Bin to Hex to Dec and such

Decimal (base 10)	Binary (base 2)	Hexadecimal (aka “Hex”) (base 16)
00	0000b	0x00
01	0001b	0x01
02	0010b	0x02
03	0011b	0x03
04	0100b	0x04
05	0101b	0x05
06	0110b	0x06
07	0111b	0x07
08	1000b	0x08
09	1001b	0x09
10	1010b	0x0A
11	1011b	0x0B
12	1100b	0x0C
13	1101b	0x0D
14	1110b	0x0E
15	1111b	0x0F

Computer Registers

Registers

are small memory storage areas built into the processor (still volatile)
x86-64 has 16 “general purpose” registers
- RAX: stores function return values
- RBX: base pointer to the data section
- RCX: counter for strings and loop operations
- RDX: I/O pointer
- Note: Xeno marked registers that were used in this class in GREEN and registers that were not in RED. There is no color formatting in Obsidian, so I’m just gonna bold the registers that were used
- RSP: stack pointer (top)
- RBP: stack frame base pointer
- RSI: string operations, source index pointer
- RDI: string operations, destination index pointer
- RIP: instruction pointer, points to next instruction for execution

On x86 (32-bit), registers are 32 bits wide
On x64, registers are 64 bits wide

First Instruction - No-Operation (NOP)

no registers, no value, no nothing
it really does nothing lmao
used to pad or align bytes, or to kill time
attackers use it to make exploits more reliable
can be 1 to 9 bytes long (multi-byte NOPs)
the 1-byte NOP = 0x90

The stack

LIFO data structure (push to the top, pop off the top)
a conceptual area of main memory (RAM)
the stack grows toward lower memory address and the heap grows toward higher memory address
RSP points to the top of the stack, the lowest address which is being used (wtf)
what can you find on the stack?
- “return addresses”: when a function calls another function, the address of the instruction right after the call gets pushed onto the stack; when the callee is done, that address is popped back into RIP so execution continues in the function that originally called it
- local variables
- sometimes used to pass arguments between functions
- save space for registers, so functions can share registers without smashing each other’s values
- save space for registers when the compiler has to juggle too many in a function
- dynamically allocated memory via the use of alloca()

Example:

#include <stdio.h>
 
int bar(int y) {
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
 
int foo(int x) {
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
 
int main() {
	int c = foo(7);
	printf("main passed %d", c);
}

New instructions - Push & Pop

Push

push pushes quadword onto the stack
push automatically decrements the stack pointer RSP by 8
in 64 bits execution mode, operand can be
- a value in a 64 bit register
- a 64-bit value from memory, given in r/mX form

`r/mX`

It’s actually a made-up term by Xeno for the r/m8, r/m16, r/m32 or r/m64 in the Intel Manual

it’s a way to specify either a register or a memory value in either 8, 16, 32 or 64 bits long
a value inside a square bracket [ ] is usually treated as a memory address, and to fetch the value from that address (kinda like dereferencing a pointer in C)
take 4 forms:
- register: rbx
- memory, base only: [rbx]
- memory, base + index * scale: [rbx+rcx*X] (X = 1, 2, 4, 8)
- memory, base + index * scale + displacement: [rbx+rcx*X+Y]
  - this has natural applicability to multi dimensional array indexing, array of structs, etc
when he (Xeno) says an instruction supports access to memory, he means memory in r/mX form, and it could be any of the memory forms above.

Address writing convention

Xeno writes 64 bits numbers like this 0x12345678`12345678
it’s from WinDbg

Pop

mostly the same attributes with push but instead of pushing value onto the stack, it pops value off the stack
increment RSP by 0x8


Notes on 32 bits

from the slides

If you are executing in 32-bit mode, push/pop will add/remove values 32 bits at a time, rather than 64 bits, and thus they decrement/increment ESP by 4 rather than 8 at a time

Likewise, if you’re in 16-bit mode, they push/pop 16-bit values, and decrement/increment SP by 2 at a time

The RSP game

~~Oh god, I hate playing game while having some sort of score to show how much of an idiot I am ~~

I’m fine, I’m fine. I’m cool. let’s do this.

Level 1: Canonical orientation, rbp at top, rsp at bottom

HIGH ADDRESSES  
================  
b1a570ce11 <- RBP  
================  
d00dad  
================  
501ace  
================  
f1eece  
================  
0b501e7e <- RSP  
================  
LOW ADDRESSES

What is the offset to 0b501e7e ?
(Enter answer in the form of "rsp{+,-}0x??" or "rbp{+,-}0x??", where ?? must always be 2 digits, e.g. rsp-0x00 or rbp+0x08)

Uh.. uuhhhhhhh… rsp-0x00…?

y-yay…

Level 2:

HIGH ADDRESSES  
================  
b1ade <- RSP  
================  
decea5ed  
================  
ba11ad  
================  
10ca1e  
================  
0b501e7e  
================  
d0771e  
================  
badd00d  
================  
ca11ab1e <- RBP  
================  
LOW ADDRESSES

What is the offset to d0771e ?
(Enter answer in the form of "rsp{+,-}0x??" or "rbp{+,-}0x??", where ?? must always be 2 digits, e.g. rsp-0x00 or rbp+0x08)

rbp+0x10, I think.

Okay I’m not gonna copy and paste the rest here since it’s a randomized game and it’s pretty fun to do it yourself. Go play them!

Calling Function

CallASubroutine1.c

int func(){
	return 0xbeef;
}
 
int main(){
	func();
	return 0xf00d;
}

Still CallASubroutine1.c, but in asm

func:
0000000140001000  mov  eax,0BEEFh
0000000140001005  ret
 
main:
0000000140001010  sub  rsp,28h
0000000140001014  call func (0140001000h)
0000000140001019  mov  eax,0F00Dh
000000014000101E  add  rsp, 28h
0000000140001022  ret

`call` - Call procedure

call’s job is to transfer control to a different function, for example when one function calls another
first it pushes the address of the next instruction onto the stack
then it changes rip to the address given in the instruction
the address of the function being called can be specified in multiple ways
- absolute address
- relative address (relative to the end of the call instruction, i.e. the address of the next instruction)
- not the focus here, just go along for now

`ret` - Return from procedure

two forms
- Pop the top of the stack into rip and increment rsp by 0x08 (aka pop stuff off the stack and throw it in rip). In this form, the instruction is just written as ret
- Pop the top of the stack into rip and also add a constant number of bytes to rsp. In this form, the instruction is written as “ret 0x8”, or “ret 0x20”, etc
  - this form shows up in callee-cleans-up conventions such as stdcall (Win32 APIs)

There are 2 ways to write instruction operands:

Intel: Destination ← Source(s)
- Windows. Think algebra or C: y = 2x + 1;
- mov rbp, rsp
- add rsp, 0x14 ; (rsp = rsp + 0x14)
AT&T: Source(s) → Destination
- *nix/GNU. Think elementary school: 1 + 2 = 3
- mov %rsp, %rbp
- add $0x14,%rsp
- So registers get a % prefix and immediate values get a $
Xeno uses Intel syntax in this course, so there’s that

`mov` - Move

Can move:
- register to register
- memory to register, register to memory
- immediate to register, immediate to memory
but never memory to memory
memory as in [r/mX] form

Form	Examples
immediate to register	`mov rbx, imm64`
immediate to memory	`mov [rbx], imm32`, `mov [rbx+rcx*X], imm32`
register to register	`mov rbx, rax`
register to memory	`mov [rbx], rax`, `mov [rbx+rcx*X], rax`, `mov [rbx+rcx*X+Y], rax`
memory to register	`mov rax, [rbx]`, `mov rax, [rbx+rcx*X]`, `mov rax, [rbx+rcx*X+Y]`

`add` & `sub`- Adds and Subtracts

destination can be register or memory
source can be register or memory or immediate
again, source and destination can’t both be memory
add rsp, 8 → (rsp = rsp + 8)
sub rax, [rbx*2] → (rax = rax - memorypointedtoby(rbx*2))

Simple stack diagram 2

#include <stdio.h>
int bar(int y){
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
int foo(int x){
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
int main(){
	int c = foo(7);
	printf("main passed %d", c);
}

More or less the stack looks like this.

SingleLocalVariable.c

int func(){
  int i = 0x5ca1ab1e;
  return i;
}
int main(){
  return func();
}

Memory Address	Value
00000000`0014FE08	00000001`40001349
00000000`0014FE00	undef
00000000`0014FDF8	undef
00000000`0014FDF0	undef
00000000`0014FDE8	undef
00000000`0014FDE0	undef
00000000`0014FDD8	undef
00000000`0014FDD0	00000001`40001029
00000000`0014FDC8	undef
00000000`0014FDC0	undef`5CA1AB1E

In Intel Syntax, for r/mX memory descriptions, it will use things like qword ptr, dword ptr, or word ptr to indicate the size of the data being operated on (8, 4, and 2 bytes respectively)

mov qword ptr [rsp+10h],rax
mov dword ptr [rsp],5CA1AB1Eh
mov word ptr [rsp],ax

So… what do we know so far?

Local variables lead to an allocation of space on the stack, within the function where the variable is scoped to
In VS there is an over-allocation of space for local variables
0x18 reserved for only 0x4 (int) worth of data

Why is VS over-allocating space for a single local variable?

According to the “Stack usage” reference, “The stack will always be maintained 16-byte aligned, except within the prolog (for example, after the return address is pushed) …”

int func3() {
	int i = 0x7a11;
	return i;
}
int func2(){
	int j = 0x7a1e;
	return func3();
}
int func(){
	return func2();
}
int main(){
	return func();
}

From this point on I’ll only include the instructions and what’s worth noting, go read the slides and take the course yourself >:(

`IMUL` - Signed Multiply

It has 13 forms lmao, spread across 5 groups

Group 1 - Single Operand
IMUL r/m8	AX = AL * r/m8
IMUL r/m16	DX:AX = AX * r/m16
IMUL r/m32	EDX:EAX = EAX * r/m32
IMUL r/m64	RDX:RAX = RAX * r/m64
Group 2 - Two Operand
IMUL r16, r/m16	r16 = r16 * r/m16
IMUL r32, r/m32	r32 = r32 * r/m32
IMUL r64, r/m64	r64 = r64 * r/m64
Group 3 - Three Operand, 8 Bit Immediate
IMUL r16, r/m16, imm8	r16 = r/m16 * sign-extended imm8
IMUL r32, r/m32, imm8	r32 = r/m32 * sign-extended imm8
IMUL r64, r/m64, imm8	r64 = r/m64 * sign-extended imm8
Group 4 - Three Operand, 16 Bit Immediate
IMUL r16, r/m16, imm16	r16 = r/m16 * imm16
Group 5 - Three Operand, 32 Bit Immediate
IMUL r32, r/m32, imm32	r32 = r/m32 * imm32
IMUL r64, r/m64, imm32	r64 = r/m64 * sign-extended imm32

Example 1

IMUL r/m8	AX = AL * r/m8

Before:
r12	`0x84`
rax	0x609966C1A977E177

After:
r12	`0x84`
rax	0x609966C1A977C65C

`MOVZX` - Move with zero extended

Used to move small values (from smaller types) into larger registers (holding larger types)
Destination is always a register; source can be a register or memory (so only the r->r and m->r forms, no immediates)
“Zero extend” means the CPU unconditionally fills the high order bits of the larger register with zeros
“Sign extend” means the CPU fills the high order bits of the destination larger register with whatever the sign bit is set to on the small value

`MOVSXD` - Move with sign extended XD

MOVSX technically only sign extends from 8 or 16 bit values
If you want to sign extend a 32 bit value to 64 bits, you need to use MOVSXD
There’s no MOVZXD: writing a 32-bit register already zeroes the upper 32 bits of the 64-bit register, so a plain mov does the job

MOVZX/MOVSX - examples

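mov eax, 0xF00DFACE
movzx rbx, eax
	;rbx = 0x00000000`F00DFACE
	;(conceptual: real assemblers reject a 32-bit movzx source, a plain "mov ebx, eax" gives the same zero-extended result)
movsxd rbx, eax
	;rbx = 0xFFFFFFFF`F00DFACE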

Because the sign bit (most significant bit) of 0xF00DFACE is 1

ArrayLocalVariable.c

short main() {
    short a;
    int b[6];
    long long c;
    a = 0xbabe;
    c = 0xba1b0ab1edb100d;
    b[1] = a;
    b[4] = b[1] + c;
    return b[4];
}

Memory Address	Value
00000000`0014FE08	return address = `00000001'40001379`
00000000`0014FE00	16-byte-stack-alignment padding
00000000`0014FDFC	undef (alignment padding)
00000000`0014FDF8	undef (alignment padding)
00000000`0014FDF4	`b[5] = undef`
00000000`0014FDF0	`b[4] = 1edacacb`
00000000`0014FDEC	`b[3] = undef`
00000000`0014FDE8	`b[2] = undef`
00000000`0014FDE4	`b[1] = ffffbabe`
00000000`0014FDE0	`b[0] = undef`
00000000`0014FDDC	`c (MSBs) = 0ba1b0ab`
00000000`0014FDD8	`c (LSBs) = 1edb100d`
00000000`0014FDD4	`undef (alignment padding)`
00000000`0014FDD0	`a = undef babe (2 bytes)`

StructLocalVariable.c

typedef struct mystruct{
	short a;
	int b[6];
	long long c;
} mystruct_t;
short main(){
	mystruct_t foo;
	foo.a = 0xbabe;
	foo.c = 0xba1b0ab1edb100d;
	foo.b[1] = foo.a;
	foo.b[4] = foo.b[1] + foo.c;
	return foo.b[4];
}

main:
0000000140001000 sub     rsp,38h
0000000140001004 mov     eax,0FFFFBABEh
0000000140001009 mov     word ptr [rsp],ax
000000014000100D mov     rax,0BA1B0AB1EDB100Dh
0000000140001017 mov     qword ptr [rsp+1Ah],rax
000000014000101C mov     eax,4
0000000140001021 imul    rax,rax,1
0000000140001025 movsx   ecx,word ptr [rsp]
0000000140001029 mov     dword ptr [rsp+rax+2],ecx
000000014000102D mov     eax,4
0000000140001032 imul    rax,rax,1
0000000140001036 movsxd  rax,dword ptr [rsp+rax+2]
000000014000103B add     rax,qword ptr [rsp+1Ah]
0000000140001040 mov     ecx,4
0000000140001045 imul    rcx,rcx,4
0000000140001049 mov     dword ptr [rsp+rcx+2],eax
000000014000104D mov     eax,4
0000000140001052 imul    rax,rax,4
0000000140001056 movzx   eax,word ptr [rsp+rax+2]
000000014000105B add     rsp,38h
000000014000105F ret

Memory Address	Value
00000000`0014FE08	00000001`40001379
00000000`0014FE00	16-byte-stack-alignment padding
00000000`0014FDFC	undef (16 byte alignment padding)
00000000`0014FDF8	undef (16 byte alignment padding)
00000000`0014FDF4	undef (16 byte alignment padding)
00000000`0014FDF0	undef (16 byte alignment padding), c 2 MSBs = 0ba1
00000000`0014FDEC	c 4 middle bytes = b0ab1edb
00000000`0014FDE8	c 2 LSBs = 100d, b[5] 2 MSBs = undef
00000000`0014FDE4	b[5] 2 LSBs = undef, b[4] 2 MSBs = 1eda
00000000`0014FDE0	b[4] 2 LSBs = cacb, b[3] 2 MSBs = undef
00000000`0014FDDC	b[3] 2 LSBs = undef, b[2] 2 MSBs = undef
00000000`0014FDD8	b[2] 2 LSBs = undef, b[1] 2 MSBs = ffff
00000000`0014FDD4	b[1] 2 LSBs = babe, b[0] 2 MSBs = undef
00000000`0014FDD0	b[0] 2 LSBs = undef, a = babe (2 bytes)

TooManyParameter.c

#define uint64 unsigned long long
 
int func(uint64 a, uint64 b, uint64 c, uint64 d, uint64 e){
    int i = a + b - c + d - e;
    return i;
}
 
int main(){
    return func(0x11, 0x22, 0x33, 0x44, 0x55);
}

Memory Address	Value
00000000`0014FE08	return address = `0000000140001399`
00000000`0014FE00	16-byte-stack-alignment padding
00000000`0014FDF8	undef
00000000`0014FDF0	`arg5 = 0x55`
00000000`0014FDE8	`arg4 = r9 = 0x44`
00000000`0014FDE0	`arg3 = r8 = 0x33`
00000000`0014FDD8	`arg2 = rdx = 0x22`
00000000`0014FDD0	`arg1 = rcx = 0x11`
00000000`0014FDC8	return address = `0000000140001078`
00000000`0014FDC0	16-byte-stack-alignment padding
00000000`0014FDB8	16-byte-stack-alignment padding
00000000`0014FDB0	`i = undef'ffffffef`
Compare this to the Pass1Parameter.c example.

This is what they call the shadow store

Shadow Store

“The x64 Application Binary Interface (ABI) uses a four-register fast-call calling convention by default. Space is allocated on the call stack as a shadow store for callees to save those registers.”
“The caller must always allocate sufficient space to store four register parameters, even if the callee doesn’t take that many parameters.”
“Any parameters beyond the first four must be stored on the stack after the shadow store before the call.”
Source: https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=msvc-160

CallASubroutine.c

int func(){
	return 0xbeef;
}
 
int main(){
	func();
	return 0xf00d;
}

Calling Conventions

We’re just talking about the MS “x64” ABI and System V “x86-64” ABI in this section
More examples at http://en.wikipedia.org/wiki/X86_calling_conventions

Caller-save registers

Also called “volatile” registers by MS. I.e. the caller should assume they will be changed by the callee
Registers “belong” to the Callee.
So the caller is in charge of saving the value before a call to a subroutine, and restoring the value after the call returns
VisualStudio: RAX, RCX, RDX, R8, R9, R10, R11
GCC: RAX, RDI, RSI, RDX, RCX, R8, R9, R10, R11

Callee-save registers

Also called “non-volatile” registers by MS. I.e. the caller should assume they will not be changed by the callee
Registers “belong” to the caller
If the callee needs to use more registers than are saved by the caller, the callee is responsible for making sure the values are stored/restored, so it doesn’t break things for the caller
VisualStudio: RBX, RBP, RDI, RSI, R12-R15
GCC: RBX, RBP, R12-R15

Balance

Both caller and callee are responsible for balancing any register saves they perform (add to the stack), with restores (removal from the stack)
Caller will typically save registers right before the call and restore right after the call
Callee will typically save registers at the beginning of the function and restore at the end of the function

Parameters

int func(int a, int b, int c, int d, int e){
	int i = a+b-c+d-e;
	return i;
}
int main(){
	return func(0x11,0x22,0x33,0x44,0x55);
}

Microsoft `x64` ABI

First 4 parameters (from left to right) are put into RCX, RDX, R8, R9 respectively
- int a ⇒ RCX
- int b ⇒ RDX
- int c ⇒ R8
- int d ⇒ R9
- int e ⇒ pushed onto stack
Remaining parameters “pushed” onto the stack so that the left-most parameter is at the lowest address
Typically mov is used instead of push

System V “x86-64” ABI (GCC et al.)

First 6 parameters (from left to right) are put into RDI, RSI, RDX, RCX, R8, R9 respectively
- int a ⇒ RDI
- int b ⇒ RSI
- int c ⇒ RDX
- int d ⇒ RCX
- int e ⇒ R8
Remaining parameters “pushed” onto the stack so that the left-most parameter is at the lowest address
Typically mov is used instead of push

32-bit Stack Calling Conventions

In 32 bit code, there are many more calling conventions in use

cdecl (default for most C code)
- Caller cleans up the stack
stdcall (for Windows’ Win32 APIs)
- Callee cleans up the stack
Function parameters are pushed onto stack from right to left
- leftmost parameter (the first function parameter) ends up at the lowest address
Both cdecl and stdcall conventions perform explicit stack frame linkage
Each time you enter a new function, the old ebp (“stack frame base pointer” by Intel convention) gets pushed onto the stack, and the new esp (top-of-the stack pointer, which now points at the copy of ebp), gets moved into ebp
Function parameters tend to get referenced as ebp+offset
Local variables tend to get referenced as ebp-offset
Although GCC sometimes references as esp+offset
M$ also supports other conventions such as fastcall which passes first 2 params in ECX, EDX, and then the rest on the stack
It’s like a more limited version of the x86-64 calling convention
No use of stack frames (since that’s extra overhead, which would make it less fast!)

Example:

#include <stdio.h>
int bar(int y){
	int a = 3*y;
	printf("bar returned %d", a);
	return a;
}
int foo(int x){
	int b = 5*x;
	printf("foo passed %d", b);
	return bar(b);
}
int main(){
	int c = foo(7);
	printf("main passed %d", c);
}

`LEA` - Load Effective Address

Uses the mX form (really just “m” in the manual) but is the exception to the rule that the square brackets [] syntax means dereference!
Frequently used with pointer arithmetic, sometimes for just arithmetic in general

Example:

rbx = 0x2, rdx = 0x1000
lea rax, [rdx+rbx*8+5]
rax = 0x1015, not the value at 0x1015

Control Flow

Two forms of control flow:
- Conditional - go somewhere if a condition is met. Think “if”s, switches, loops
- Unconditional - always go somewhere. Function calls, goto, exceptions, interrupts

JMP - Jump

Unconditionally change RIP to given address
Ways to specify the address:
- Short, relative (RIP = RIP of next instruction + 1 byte sign-extended-to-64-bits displacement)
  - Frequently used in small loops
  - Some disassemblers will indicate this with a mnemonic by writing it as “jmp short”
  - jmp -2 == infinite loop for short relative jmp :)
  - “jmp 0000000140001012” doesn’t have the number 0000000140001012 anywhere in it, it’s really “jmp 0x0C bytes forward”
- Near, relative (RIP = RIP of next instruction + 4 byte sign-extended-to-64-bits displacement)
- Near, absolute indirect (address calculated with r/m64)
- Far, absolute indirect - We’ll discuss in a future class

`jcc` - Jump If Condition Is Met

If a condition is true, the jump is taken. Otherwise it proceeds to the next instruction
There are more than 4 pages of conditional jump types! Luckily a bunch of them are synonyms for each other.
JNE == JNZ (Jump if not equal, Jump if not zero, both check if the Zero Flag (ZF) == 0)

Architecture - RFLAGS

RFLAGS register holds many single bit flags. Will only ask you to remember the following for now:

Zero Flag (ZF) - Set if the result of some instruction is zero; cleared otherwise.
Sign Flag (SF) - Set equal to the most-significant bit of the result, which is the sign bit of a signed integer. (0 indicates a positive value and 1 indicates a negative value.)

Some Notable Jcc Instructions

JZ/JE: if ZF == 1
JNZ/JNE: if ZF == 0
JLE/JNG : if ZF == 1 or SF != OF
JGE/JNL : if SF == OF
JBE/JNA: if CF == 1 or ZF == 1
JB: if CF == 1

Mnemonic translations

B = below, unsigned notion
A = above, unsigned notion
N = Not (like “Not less than:” JNL)
G = greater than, signed notion
L = less than, signed notion
E = Equal (same as Z, zero flag set)

CMP - Compare Two Operands

“The comparison is performed by subtracting the second operand from the first operand and then setting the status flags in the same manner as the SUB instruction.”
What’s the difference from just doing SUB? Difference is that with SUB the result has to be stored somewhere. With CMP the result is computed, the flags are set, but the result is discarded
Modifies CF, OF, SF, ZF, AF, and PF

Xeno’s miniguide

cmp dword ptr [rsp+4] , eax
jne 0000000140001033
 
cmp dword ptr [rsp+4] , eax
jle 0000000140001043
 
cmp dword ptr [rsp+4] , eax
jae 0000000140001043

Inference…

There are different conditions for unsigned (above) vs. signed (greater than)… Which leads to different assembly instructions for unsigned (JA) vs. signed (JG) comparisons…
Which implies the compiler emits different code depending on whether the programmer declared variables as unsigned vs. signed…
Which a reverse engineer / decompiler can use to infer whether variables are likely unsigned or signed
It turns out that for instructions that set status flags (e.g. arithmetic operations), the hardware just does the operation and sets flags as if the operands were both unsigned and signed
Basically the hardware doesn’t know or care about whether the humans are currently interpreting the bits as signed or unsigned. That’s the compiler’s problem to sort out.
The compiler must emit instructions which treat the bits as signed or unsigned based on what’s specified in the high level language

AND - Bitwise AND

C binary operator & (not &&, that’s logical AND)
Destination operand can be r/mX or register
Source operand can be r/mX or register or immediate (source and destination can’t both be memory)

OR - Bitwise OR

C binary operator | (not ||, that’s logical OR)
Destination operand can be r/mX or register
Source operand can be r/mX or register or immediate (source and destination can’t both be memory)

XOR - Bitwise Exclusive OR

C binary operator “^”
Destination operand can be r/mX or register
Source operand can be r/mX or register or immediate (source and destination can’t both be memory)
FYI XOR is commonly used to zero a register, by XORing it with itself, because it’s faster than a MOV

NOT - One’s Complement Negation

C unary operator “~” (not !, that’s logical NOT)
Single source/destination operand can be r/mX

INC/DEC - Increment / decrement

Single source/destination operand can be r/mX
Increase or decrease the value by 1
When optimized, compilers will tend to favor not using inc/dec, as directed by the Intel optimization guide. So their presence may be indicative of hand-written, or un-optimized code
Modifies OF, SF, ZF, AF, and PF flags (CF is not affected)

TEST - Logical Compare

“Computes the bit-wise AND of first operand (source 1 operand) and the second operand (source 2 operand) and sets the SF, ZF, and PF status flags according to the result.”
Like CMP - sets flags, and throws away the result

SAR - Shift Arithmetic Right

Can be explicitly used with the C “>>” operator, if operands are signed
First operand (source and destination) is an r/mX
Second operand is either cl (lowest byte of ecx), or a 1 byte immediate. The 2nd operand is the number of places to shift
It divides the register by 2 for each place the value is shifted. More efficient than a divide instruction
Each bit shifted off the right side is placed in CF

SAL - Shift Arithmetic Left

Actually behaves exactly the same as SHL!
First operand (source and destination) is an r/mX
Second operand is either cl (lowest byte of rcx), or a 1 byte immediate. The 2nd operand is the number of places to shift
It multiplies the register by 2 for each place the value is shifted. More efficient than a multiply instruction
Each bit shifted off the left side is placed in CF

DIV - Unsigned Divide

Three forms:
- Unsigned divide ax by r/m8, al = quotient, ah = remainder
- Unsigned divide edx:eax by r/m32, eax = quotient, edx = remainder
- Unsigned divide rdx:rax by r/m64, rax = quotient, rdx = remainder
If dividend is 32/64bits, edx/rdx will just be set to 0 by the compiler before the instruction (as occurred in the MulDivExample.c code)
If the divisor is 0, a divide by zero exception is raised.

IDIV - Signed Divide

If you were to then change MulDivExample to signed, you would see the IDIV instruction appear
Three forms
- Signed divide ax by r/m8, al = quotient, ah = remainder
- Signed divide edx:eax by r/m32, eax = quotient, edx = remainder
- Signed divide rdx:rax by r/m64, rax = quotient, rdx = remainder
If dividend is 32/64bits, edx/rdx will be filled with copies of the sign bit of eax/rax before the instruction (compilers typically emit cdq/cqo for this), rather than zeroed as with DIV
If the divisor is 0, a divide by zero exception is raised.

I refuse to understand REP STOS >:V

LEAVE - Exit a function

It’s literally just the same thing as the two instructions you’d typically expect to see right before you return from a function that uses stack frames:
- mov rsp, rbp
- pop rbp

Intel vs. AT&T Syntax 2

Intel Syntax:
- Preferred on Windows. Think algebra or C: y = 2x + 1;
- Destination ← Source(s) (right to left)
- mov rbp, rsp
- add rsp, 0x14 ; (rsp = rsp + 0x14)
AT&T Syntax:
- Preferred on *nix/GNU. Think elementary school: 1 + 2 = 3
- Source(s) → Destination (left to right)
- mov %rsp, %rbp
- add $0x14,%rsp
- Registers get a % prefix and immediates get a $
Intel indicates size with things like mov qword ptr [rax], rbx, but it’s not in the actual mnemonic of the instruction
“In AT&T syntax the size of memory operands is determined from the last character of the instruction mnemonic. …only when there’s no other way to disambiguate an instruction” - Source: https://sourceware.org/binutils/docs/as/i386_002dVariations.html#i386_002dVariations
- movb - operates on bytes
- movw - operates on word (2 bytes)
- movl - operates on “long” (dword) (4 bytes)
- movq - operates on “quad word” (qword) (8 bytes)
Some mnemonics have been more or less renamed, to better conform to the b/w/l/q mnemonic naming conventions for lengths
- “cwde” → “cwtl” (convert (sign extend) word to long)
- “movsx” → “movsbw”
Xeno seems to hate it lol
r/mX
- Intel: [base + index*scale + disp]
- AT&T: disp(base, index, scale)

Read The Fun Manuals

http://www.intel.com/products/processor/manuals/
Vol.1 is a summary of life, the universe, and everything about x86
Vol. 2a-d explains all the instructions
Vol. 3a-d are all the gory details for all the extra stuff they’ve added in over the years (MultiMedia eXtensions - MMX, Virtual Machine eXtensions - VMX, virtual memory, 16/64 bit modes, system management mode, etc)
Reminder to use the pre-downloaded Nov 2020 version of the manual which we’ve been using as the standardized reference throughout this class, so we’re all looking at the same information
We’re primarily looking at Vol. 2 in this class

Opcode Column
- Represents the literal byte value(s) which correspond to the given instruction
- In this case, if you were to see a 0x24 followed by a byte, or 0x25 followed by 4 bytes, you would know they were specific forms of the AND instruction
- If it was 0x25, how would you know whether it should be followed by 2 bytes (imm16) or 4 bytes (imm32)?
- The length of the operand depends on if the processor is in 16-bit, 32-bit, or 64-bit mode
- Each mode has a default operand size (i.e. the size of the value)
- For 64-bit mode, the default operand size is 32-bits for most instructions and the default address size is 64-bits
- This means the default interpretation will usually be the ones with the r/m32, r32, imm32, or in this case a specific register like EAX, unless explicitly overridden with special instruction prefixes
Instruction Column
- The human-readable mnemonic which is used to represent the instruction.
- This will frequently contain special encodings such as the “r/mX format” which I’ve previously discussed
Operand Encoding Column: Indicates what forms the operands can take

64bit Column: Whether or not the opcode is valid in 64 bit mode
Compatibility/Legacy Mode Column: Whether or not the opcode is valid in 32/16 bit code
Description Column
- Simple description of the action performed by the instruction
- Typically this just conveys the flavor of the instruction, but the majority of the details are in the main description text

Bomb Lab

Go play them! Source: https://gitlab.com/opensecuritytraining/arch1001_x86-64_asm_code_for_class/-/tree/master

(I may or may not write a walkthrough of the bomb lab in the future)

Conclusion

Special thanks to Xeno Kovah for making this course and many more courses open, as well as the resources, codes, and the relatively free license for me to be able to write these dumb notes
Of course these things I wrote are not endorsed by him.
