Assembly Crash Course
———–ASU CSE 365: Introduction to Cybersecurity
Assembly Crash Course: Computer Architecture
①all roads lead to cpu
- Source code (Python/Java/JavaScript) -> interpreter or JIT -> CPU
- Source code (C/C++/Rust) -> compiler -> CPU
②logic gates
AND gate (&), OR gate (≥1), XOR gate (=1), NOT gate (1)
③CU: control unit
Assembly Crash Course: Assembly
① nouns: Data
- data we directly give it as part of the instruction
- data that is close at hand(register)
- data in storage(memory)
② verbs: operations
add :add some data together
sub :subtract some data
mul :multiply some data
div :divide some data
mov :move some data into or out of storage
cmp :compare two pieces of data with each other
test :test some other properties of data
③every architecture has its own variant: x86(√), arm, ppc, mips, risc-v, pdp-11
history: 8085->8086->80186->80286->80386–x86
intel syntax(√) and AT&T syntax
Assembly Crash Course: Data
hexadecimal(base 16), decimal(base 10),octal(base 8), binary(base 2), a binary digit is called a bit
①Expressing text
ASCII (American Standard Code for Information Interchange): specifies how to encode the English alphabet and common symbols in 7 bits.
below: the top is the first hex digit, the left is the second hex digit
Uppercase(Lowercase) letters: 0x40(0x60) + LETTER_INDEX_IN_HEX
Digit representations: 0x30 + DIGIT
lower than 0x20(SPACE) are “control characters”: 0x09(tab), 0x0a(newline), 0x07(bell)
ASCII has evolved into UTF-8, used on 98% of the web. UTF-8 keeps the ASCII codes as single bytes and encodes every other character in 2 to 4 bytes.
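The layout rules above can be checked against Python's built-in character tables. This is a minimal sketch; the helper names `upper`, `lower` and `digit` are made up for illustration.

```python
# Check the ASCII layout rules from the notes with chr()/ord().

def upper(index: int) -> str:
    """Uppercase letter for a 1-based alphabet index (A = 1)."""
    return chr(0x40 + index)

def lower(index: int) -> str:
    """Lowercase letter for a 1-based alphabet index (a = 1)."""
    return chr(0x60 + index)

def digit(d: int) -> str:
    """Character for a decimal digit 0..9."""
    return chr(0x30 + d)

print(upper(1), lower(26), digit(7))  # A z 7
# Control characters sit below 0x20 (space).
print(ord("\t"), ord("\n"), ord("\a"))  # 9 10 7
# A non-ASCII character takes more than one byte in UTF-8.
print("é".encode("utf-8").hex())  # c3a9
```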
② Grouping bits into bytes
IBM invented the 8-bit EBCDIC encoding in 1963 for use on its terminals. ASCII (1963) replaced it, but the 8-bit byte stuck.
③ Grouping bytes into words
most modern architectures are 64-bit
Nibble: 4bits
Byte: 8 bits
word: 2 bytes, 16bits
Double word(dword): 4 bytes, 32 bits
Quad word(qword): 8 bytes, 64 bits
thinking: what happens if we add 1 to 0xffffffffffffffff?
integer overflow: 1 + 0xffffffffffffffff = 0x10000000000000000, which needs 65 bits
the extra bit does not fit in a 64-bit register: the CPU puts it in the carry flag, and the result wraps around to 0
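Python integers never overflow, so the 64-bit wraparound has to be modelled with a mask. The sketch below is only an illustration of the arithmetic; `add64` is a hypothetical helper, not a real CPU interface.

```python
MASK64 = (1 << 64) - 1  # sixty-four 1 bits

def add64(a: int, b: int) -> tuple[int, int]:
    """Return (result, carry) as a 64-bit add would produce them."""
    total = a + b
    return total & MASK64, total >> 64  # low 64 bits, and the 65th bit

result, carry = add64(0xffffffffffffffff, 1)
print(hex(result), carry)  # 0x0 1
```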
④Expressing negative numbers(-1)
- sign bit(leftmost bit): 0b00000011 == 3 and 0b10000011 == -3
drawback1: 0b00000000 = 0 = 0b10000000
drawback2: arithmetic operations have to be signedness-aware
(unsigned) 0b00000000 - 1 = 0 - 1 = 255 == 0b11111111
(signed) 0b00000000 - 1 = 0 - 1 = -1 == 0b10000001
- two’s complement
- 0 == 0b00000000
- negative numbers are represented as the large positive numbers that they wrap around to (with 8 bits, -n is stored as 256 - n)
- 0 - 1 == 0b11111111 == 255(unsigned) == -1(signed)
- -1 - 1 == 0b11111110 == 254 == -2
- the leftmost bit still acts as the sign bit; the smallest expressible 8-bit number is 0b10000000 = -128
- signed range: -128 -> 127, unsigned range: 0 -> 255
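To make the two readings of the same bit pattern concrete, here is a small sketch that converts between them. The function names and the default width of 8 bits are assumptions chosen to match the examples above.

```python
def to_signed(value: int, bits: int = 8) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):      # sign bit set: value is negative
        return value - (1 << bits)
    return value

def to_unsigned(value: int, bits: int = 8) -> int:
    """Bit pattern (as an unsigned number) that stores value in `bits` bits."""
    return value & ((1 << bits) - 1)

print(to_signed(0b11111111), to_unsigned(-1))  # -1 255
print(to_signed(0b10000000))                   # -128
```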
⑤anatomy of a word
Assembly Crash Course: Registers
The CPU needs rapid access to data, which it gets via the Register File: a fast, temporary store for data.
“general purpose” registers
- 8085: a, c, d, b, e, h, l
- 8086: ax, cx, dx, bx, sp, bp, si, di
- x86: eax, ecx, edx, ebx, esp, ebp, esi, edi
- amd64: rax, rcx, rdx, rbx, rsp, rbp, rsi, rdi, r8, r9, r10, r11, r12, r13, r14, r15
- arm: r0, r1, r2, r3, r4, r5, r6, r7, r8, r9, r10, r11, r12, r13, r14
address of the next instruction
- eip(x86), rip(amd64), r15(arm)
①partial accesses on amd64
data specified directly in the instruction is called an Immediate Value
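starting from rax = 0xffffffffffffffff:
mov ax, 0x539 sets rax to 0xffffffffffff0539 (a 16-bit write leaves the upper 48 bits untouched)
mov eax, 0x539 sets rax to 0x0000000000000539 (a 32-bit write zeroes the upper 32 bits)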
|
|
|
|
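The partial-access behaviour on amd64 can be modelled with bit masks. This is a sketch under the assumption that rax starts as all ones; `write_ax` and `write_eax` are hypothetical helper names.

```python
MASK64 = (1 << 64) - 1

def write_ax(rax: int, value: int) -> int:
    """16-bit write: the upper 48 bits of rax are preserved."""
    return (rax & ~0xffff & MASK64) | (value & 0xffff)

def write_eax(rax: int, value: int) -> int:
    """32-bit write: the upper 32 bits of rax are zeroed."""
    return value & 0xffffffff

rax = 0xffffffffffffffff
print(hex(write_ax(rax, 0x539)))   # 0xffffffffffff0539
print(hex(write_eax(rax, 0x539)))  # 0x539
```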
②extending data
mov eax, -1
eax is now 0xffffffff(both 4294967295 and -1)
rax is now 0x00000000ffffffff(only 4294967295 )
operate on that -1 in 64-bit land
mov eax, -1
movsx rax, eax
—> do a sign-extending move, preserving the two’s complement value(copies the top bit to the rest of the register)
eax is now 0xffffffff(both 4294967295 and -1)
rax is now 0xffffffffffffffff(both 18446744073709551615 and -1)
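The difference between the plain 32-bit move and the sign-extending move can be sketched as follows. `zero_extend` and `sign_extend` are illustrative names, not instructions.

```python
def zero_extend(value: int, from_bits: int) -> int:
    """Keep the low from_bits bits; everything above becomes 0."""
    return value & ((1 << from_bits) - 1)

def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Copy the top bit of the from_bits-wide value into all higher bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):  # top bit set: fill the upper bits with 1s
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value

eax = 0xffffffff  # -1 as a 32-bit value
print(hex(zero_extend(eax, 32)))      # 0xffffffff
print(hex(sign_extend(eax, 32, 64)))  # 0xffffffffffffffff
```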
③register arithmetic
for most arithmetic instructions, the first operand is also where the result is stored (add rax, rbx computes rax = rax + rbx)
④special registers
- rip can’t be directly read from or written to; it contains the memory address of the next instruction to be executed (Instruction Pointer)
- be careful with rsp; it contains the address of a region of memory used to store temporary data (Stack Pointer)
Assembly Crash Course: Memory
Registers: expensive, so limited in number
System memory: a place to store lots of data with fast access
①Process Perspective
Memory <-> Registers + Disk + Network + Video Card
Process memory is addressed linearly
From: 0x10000 (for security reasons)
To: 0x7fffffffffff (for architecture / OS purposes)
Regions, from low addresses (0x10000) to high addresses (0x7fffffffffff):
- Program Binary Code
- Dynamically Allocated Memory (managed by libraries)
- [Dynamically Mapped Memory (requested by process)]
- Library Code
- Process Stack
- OS Helper Regions
②Stack - temporary data storage
registers and immediates can be pushed onto the stack: push rax, push 0xaabbccdd (even on 64-bit x86, only 32-bit immediates can be pushed)
values can be popped back off of the stack (into a register): pop rax
[CPU knows: the stack address is stored in rsp] (top stack address < bottom stack address)
- push decreases rsp by 8
- pop increases rsp by 8
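The push and pop rules can be simulated with a byte array and an rsp counter. This is a toy model, not how a process stack is actually set up: the `Stack` class, its starting address and its size are all assumptions for illustration.

```python
import struct

class Stack:
    """Toy downward-growing stack of 8-byte slots."""

    def __init__(self, top: int = 0x7ffffffff000, size: int = 0x1000):
        self.base = top - size       # lowest address backed by memory
        self.mem = bytearray(size)
        self.rsp = top               # an empty stack starts at the top

    def push(self, value: int) -> None:
        self.rsp -= 8                # rsp moves down first
        off = self.rsp - self.base
        self.mem[off:off + 8] = struct.pack("<Q", value & (2**64 - 1))

    def pop(self) -> int:
        off = self.rsp - self.base
        (value,) = struct.unpack("<Q", self.mem[off:off + 8])
        self.rsp += 8                # rsp moves back up afterwards
        return value

s = Stack()
rdi, rsi = 1, 2
s.push(rdi)
s.push(rsi)
rdi = s.pop()  # last in, first out: rdi receives the old rsi
rsi = s.pop()
print(rdi, rsi)  # 2 1
```

Pushing two registers and popping them in the same order swaps them, which is the trick used when a swap has to be done with push and pop only.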
③accessing memory(between register and memory)
load the 64-bit value stored at memory address 0x12345 into rbx:
|
|
store the 64-bit value in rbx into memory at address 0x133337
|
|
push rcx :
|
|
Each addressed memory location contains one byte: an 8-byte write at address 0x133337 writes to addresses 0x133337 through 0x13333e
④Memory Endianness
multi-byte values are stored backwards (least significant byte first) in little endian
⑤address calculation
get the calculated address with Load Effective Address(lea)
|
|
limits: reg+reg*(2 or 4 or 8)+value
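The address form that lea accepts can be written out as plain arithmetic. A minimal sketch, with `lea` here being a hypothetical Python function that mirrors the reg + reg*scale + value limit:

```python
MASK64 = (1 << 64) - 1

def lea(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Compute base + index*scale + disp without touching memory."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64

# Address of element 3 in an array of qwords that starts 0x10 bytes after 0x1000.
print(hex(lea(0x1000, index=3, scale=8, disp=0x10)))  # 0x1028
```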
⑥RIP-Relative Addressing
|
|
also can write immediate values (must specify the size)
|
|
Assembly Crash Course: Control Flow
①CPU execute instructions
jmp: the short form is the opcode byte eb followed by a 1-byte offset, e.g. eb 04 skips the next 4 bytes
②Conditional Jumps
conditional jumps check conditions stored in the “flags” register: rflags
- Carry Flag: was the 65th bit 1?
- Zero Flag: was the result 0?
- Overflow Flag: did the result “wrap” between positive and negative?
- Sign Flag: was the result’s sign bit set (was it negative)?
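As a rough illustration of how these four flags come out of a comparison, the sketch below models `cmp a, b` as the 64-bit subtraction a - b. The function name `cmp_flags` and the dictionary result are assumptions; for a subtraction the carry flag reports a borrow rather than a 65th bit.

```python
MASK64 = (1 << 64) - 1
SIGN = 1 << 63

def cmp_flags(a: int, b: int) -> dict:
    """Flags after computing a - b on 64-bit values (the result is discarded)."""
    a &= MASK64
    b &= MASK64
    result = (a - b) & MASK64
    return {
        "CF": a < b,                # unsigned borrow
        "ZF": result == 0,          # operands were equal
        "SF": bool(result & SIGN),  # top bit of the result
        # signed overflow: operands had different signs and the
        # result's sign differs from the first operand's sign
        "OF": bool((a ^ b) & (a ^ result) & SIGN),
    }

print(cmp_flags(5, 5)["ZF"])  # True
print(cmp_flags(1, 2)["CF"])  # True
```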
|
|
③Looping: for, while
|
|
④Function calls
call pushes rip and jumps away
ret pops rip and jumps to it
c:
|
|
assembly:
|
|
⑤Calling Conventions
callee and caller functions must agree on argument passing
- Linux x86: push arguments (in reverse order), then call (which pushes the return address), return value in eax
- Linux amd64: rdi, rsi, rdx, rcx, r8, r9, return value in rax
- Linux arm: r0, r1, r2, r3, return value in r0
Registers are shared between functions, so calling conventions should agree on what registers are protected.
Linux amd64: rbx, rbp, r12, r13, r14, r15 are “callee-saved”(the function you call keeps their values safe on the stack)
Assembly Crash Course: System Calls
①System calls—->jumps to the Operating System
syscall: system call number in rax, arguments in rdi, rsi, rdx, r10, r8, r9, return value in rax
n = read(0, buf, 100); reads up to 100 bytes from stdin into a buffer on the stack
|
|
write(1, buf, n)
|
|
examples
②"String" Arguments
a string is a bunch of contiguous bytes in memory, followed by a 0 byte
build a file path for open() on the stack and open the /flag file; the file descriptor number is returned in rax
rsp | rsp+1 | rsp+2 | rsp+3 | rsp+4 | rsp+5 |
---|---|---|---|---|---|
2f(/) | 66(f) | 6c(l) | 61(a) | 67(g) | 00(\0) |
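The byte layout in the table can be reproduced in a few lines. The sketch also shows the single 64-bit little-endian value that would place the same bytes at rsp if it were written in one store; padding the path to 8 bytes with zeros is an assumption made for the illustration.

```python
path = b"/flag\x00"       # the string plus its terminating 0 byte
print(path.hex(" "))      # 2f 66 6c 61 67 00

# Pad to 8 bytes and read the bytes as one little-endian qword.
as_qword = int.from_bytes(path.ljust(8, b"\x00"), "little")
print(hex(as_qword))      # 0x67616c662f
```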
|
|
open() has an argument flag to determine how the file will be opened.
- O_RDONLY(read-only), O_WRONLY(write-only), O_RDWR(read/write)
③Quitting the Program
|
|
Assembly Crash Course: Building Programs
①from assembly to binary
|
|
.intel_syntax tells the assembler that we are using Intel assembly syntax, noprefix tells it that we will not prefix all register names with “%”
|
|
gcc -nostdlib -o x x.s
②running the program
|
|
③reading assembly
|
|
④extracting the binary code
gcc builds assembly into a full ELF program. We can extract just our binary code using :
|
|
⑤Debugging(debugger: gdb)
debuggers use special debug instructions, and the debugged program is interrupted and we can inspect its state.
|
|
GDB / strace(figure out how program is interacting with OS) / Rappel / Documentation of x86
embryoasm
———–send/craft/assemble/pipe raw bytes over stdin to program
registers
level1: mov—->* rdi = 0x11
|
|
shell: to get the flag
|
|
something to mention:
|
|
level2: add—>* add 0x11 to rdi
|
|
level3: function—>rax:f(x) = mx + b
m = rdi, x = rsi, b = rdx
|
|
level4: divide
|
|
level5: modulo
|
|
level6: lower register
independent access to lower register bytes
64bits | 32bits | 16bits | 8bits |
---|---|---|---|
rax | eax | ax | ah al |
rdi | edi | di | dil |
only use the ‘mov’ to compute:
- rax = rdi modulo 256, where 256 = 2^8 = 0b1,0000,0000 (keep the low 8 bits)
- rbx = rsi modulo 65536, where 65536 = 2^16 = 0b1,0000,0000,0000,0000 (keep the low 16 bits)
|
|
If B is a power of 2, A % B can be simplified to A & (B-1). A can be any non-negative number, B = 2^0, 2^1, 2^2, ..., 2^N (if B is 256, then B-1 is 255, which is 1111,1111 in binary)
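The power-of-two shortcut is easy to verify by brute force. A minimal sketch; `mod_pow2` is a made-up helper that rejects any B that is not a power of 2.

```python
def mod_pow2(a: int, b: int) -> int:
    """a % b for non-negative a and b a power of 2, using only a bitwise AND."""
    if b <= 0 or b & (b - 1):  # a power of 2 has exactly one bit set
        raise ValueError("b must be a power of 2")
    return a & (b - 1)

print(mod_pow2(0x1234, 256))        # 52, i.e. 0x34
print(hex(mod_pow2(0xabcdef, 65536)))  # 0xcdef
```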
level7: shl, shr -> the vacated bits on the other side are filled with 0
shift (shown for an 8-bit value): rax=10001010; after the instruction shl rax, 1, rax=00010100
a register has 64 bits = 8 * 8 bits
shl reg1, x <=> shift reg1 left by x
shr reg1, x <=> shift reg1 right by x
task: rdi = | B7 | B6 | B5 | B4 | B3 | B2 | B1 | B0 |, set rax to the value of B4 (rdi=0x77665544332211)
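One way to isolate B4 with shifts alone is to push the bytes above it out of the register and then slide it down to the bottom. The sketch below models that on 64-bit values; `shl`, `shr` and `byte_b4` are hypothetical Python helpers, and this is one possible approach rather than the required solution.

```python
MASK64 = (1 << 64) - 1

def shl(reg: int, x: int) -> int:
    """Shift left inside a 64-bit register; bits pushed past bit 63 are lost."""
    return (reg << x) & MASK64

def shr(reg: int, x: int) -> int:
    """Logical shift right; zeros enter from the left."""
    return (reg & MASK64) >> x

def byte_b4(rdi: int) -> int:
    # B7, B6, B5 occupy the top 24 bits: shift them out, then bring B4 down.
    return shr(shl(rdi, 24), 56)

print(hex(byte_b4(0x77665544332211)))  # 0x55
```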
|
|
level8: and, or, xor, not -> bitwise logic
rax = rdi AND rsi
|
|
level9: and,or,xor
|
|
tip: parity is decided by the least significant bit: 0 means even, 1 means odd
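The parity tip translates directly into bitwise operations. A small sketch, assuming the goal is a value that is 1 for even inputs and 0 for odd ones; `is_even` is an illustrative name.

```python
def is_even(x: int) -> int:
    """1 if x is even, 0 if x is odd, using only and/xor."""
    lowest_bit = x & 1     # 0 for even, 1 for odd
    return lowest_bit ^ 1  # flip it so that even gives 1

print(is_even(4), is_even(7))  # 1 0
```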
|
|
memory
level10: AddressOperation
mov rax, [some_address] <=> Moves the thing at ‘some_address’ into rax
task: load the value at [xxx] into rax, then add yyy to the value stored at [xxx]
|
|
level11: byte,word,dword,qword
memory size:
- Quad Word = 8 Bytes = 64 bits rax 0x1234567812345678
- Double Word = 4 bytes = 32 bits eax 0x12345678
- Word = 2 bytes = 16 bits ax 0x1234
- Byte = 1 byte = 8 bits ah, al 0x12
perform:
1. Set rax to the byte at 0x404000
2. Set rbx to the word at 0x404000
3. Set rcx to the double word at 0x404000
4. Set rdx to the quad word at 0x404000
|
|
level12:
Little Endian : values are stored in reverse order of how we represent them
[0x1330] = 0x00000000deadc0de
[0x1330] = 0xde 0xc0 0xad 0xde 0x00 0x00 0x00 0x00 <——–actually in memory
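The byte order shown above can be reproduced with Python's integer-to-bytes conversion. `store_qword` is a hypothetical helper that returns the 8 bytes as they would sit in memory, lowest address first.

```python
def store_qword(value: int) -> bytes:
    """Bytes of a 64-bit value in little-endian memory order."""
    return value.to_bytes(8, "little")

in_memory = store_qword(0x00000000deadc0de)
print(in_memory.hex(" "))  # de c0 ad de 00 00 00 00

# Reading the bytes back with the same byte order recovers the value.
print(hex(int.from_bytes(in_memory, "little")))  # 0xdeadc0de
```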
Register indirect addressing : Perform—-> set [rdi] = 0xaaa , can’t directly use like this
|
|
level13:
|
|
relative addressing ——-> perform:
- Load two consecutive quad words from the address stored in rdi, get a, b.
- Get the sum of a, b.
- Store the sum at the address in rsi.
|
|
stack
level14:
stack: a last in, first out (LIFO) memory structure; values are pushed onto it and popped off of it.
perform: “Subtract rdi from the top value on the stack” means TopValue in stack - rdi
|
|
level15: exchange -> swap rdi and rsi using only push and pop
|
|
level16: rsp—->rsp points to the top of the stack, can use the [rsp] to access the value at the memory address in rsp.
perform: calculate average of 4 consecutive qwords on the stack, and store it to the top of the stack
|
|
control flow manipulation: directly or indirectly control the register “RIP”
level17: jumps
- unconditional jumps and conditional jumps
- Relative jumps and Absolute jumps and Indirect jumps
relative jump: the target must sit at a fixed offset from the jmp, so we pad the code in between with nop instructions, like
|
|
perform:
- Make the first instruction in your code a jmp
- Make that jmp a relative jump to 0x51 bytes from its current position
- At 0x51 write the following code:
- Place the top value on the stack into register rdi
- jmp to the absolute address 0x403000
we should use .rept count ... .endr: repeat the sequence of lines between the .rept directive and the next .endr directive count times.
|
|
level18: conditional jumps -> build an if-else using jne, je and cmp
if [x] is 0x7f454c46:
    y = [x+4] + [x+8] + [x+12]
else if [x] is 0x00005A4D:
    y = [x+4] - [x+8] - [x+12]
else:
    y = [x+4] * [x+8] * [x+12]
where: x = rdi, y = rax. Assume each dereferenced value is a signed dword.
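Before writing the assembly it can help to have a reference model of the expected result. The sketch below reads four consecutive little-endian signed dwords from a byte buffer standing in for the memory at x; the name `level18` and the buffer argument are assumptions, and overflow of the 32-bit results is not modelled.

```python
import struct

def level18(memory: bytes) -> int:
    """memory[0:16] holds [x], [x+4], [x+8], [x+12] as signed dwords."""
    tag, a, b, c = struct.unpack("<4i", memory[:16])
    if tag == 0x7f454c46:
        return a + b + c
    elif tag == 0x00005A4D:
        return a - b - c
    else:
        return a * b * c

print(level18(struct.pack("<4i", 0x7f454c46, 1, 2, 3)))  # 6
```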
|
|
The done label is the most important thing to add because of the order of execution: without a jump to it, a finished branch falls through into the next one.
- ZF, the Zero Flag: set to 1 when a cmp finds its operands equal, 0 otherwise.
level19
switch(number):
    0: jmp do_thing_0
    1: jmp do_thing_1
    2: jmp do_thing_2
    default: jmp do_default_thing
reduced else-if
using jump table: A jump table is a contiguous section of memory that holds addresses of places to jump
jump table could look like:
[0x1337] = address of do_thing_0
[0x1337+0x8] = address of do_thing_1
[0x1337+0x10] = address of do_thing_2
[0x1337+0x18] = address of do_default_thing
implement:
if rdi is 0:
    jmp 0x403040
else if rdi is 1:
    jmp 0x4030f7
else if rdi is 2:
    jmp 0x4031f1
else if rdi is 3:
    jmp 0x4032b9
else:
    jmp 0x40337c
an example jump table:
[0x4041df] = 0x403040
[0x4041e7] = 0x4030f7
[0x4041ef] = 0x4031f1
[0x4041f7] = 0x4032b9
[0x4041ff] = 0x40337c
constraints:
- assume rdi will NOT be negative
- use no more than 1 cmp instruction
- use no more than 3 jumps (of any variant)
- we will provide you with the number to ‘switch’ on in rdi.
- we will provide you with a jump table base address in rsi.
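The dispatch logic behind a jump table can be sketched with a dictionary standing in for memory. The addresses are the ones from the example table above; `switch_target` and the `memory` dictionary are hypothetical, and the function returns the address that an indirect jmp would transfer to.

```python
TABLE_BASE = 0x4041df
TARGETS = [0x403040, 0x4030f7, 0x4031f1, 0x4032b9, 0x40337c]

# Each table entry is a qword, so entries are 8 bytes apart.
memory = {TABLE_BASE + 8 * i: target for i, target in enumerate(TARGETS)}

def switch_target(rdi: int, rsi: int) -> int:
    """Address to jump to for case number rdi with the table based at rsi."""
    index = rdi if rdi <= 3 else 4   # the one comparison: clamp to the default slot
    return memory[rsi + index * 8]   # models jmp [rsi + index*8]

print(hex(switch_target(2, TABLE_BASE)))   # 0x4031f1
print(hex(switch_target(99, TABLE_BASE)))  # 0x40337c
```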
|
|
level20: for-loop —–>iterate for a number of times
perform: compute the average of n consecutive quad words
|
|
- [0x404128:0x404310] = {n qwords} -> 8 bytes each
- rdi = 0x404128
- rsi = 61
|
|
ps: jle jumps if less than or equal (signed ≤)
level21: while-loop —–>iterate until meet a condition
|
|
Count the consecutive non-zero bytes in a contiguous region of memory, where:
- rdi = memory address of the 1st byte
- rax = number of consecutive non-zero bytes
- if rdi = 0, then set rax = 0
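The while-loop task can be prototyped before it is written in assembly. In the sketch below a bytes object mapped at a made-up base address stands in for process memory; `count_nonzero` and its parameters are assumptions, and the region is assumed to contain a zero byte.

```python
def count_nonzero(memory: bytes, base: int, rdi: int) -> int:
    """Count non-zero bytes starting at address rdi (memory starts at base)."""
    if rdi == 0:        # null pointer: nothing to count
        return 0
    rax = 0
    while memory[rdi - base + rax] != 0:  # stop at the first zero byte
        rax += 1
    return rax

region = b"ABC\x00D"
print(count_nonzero(region, 0x1000, 0x1000))  # 3
```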
Example
|
|
|
|
functions
level22: function -> a callable segment of code that does not destroy control flow. Use the “call” and “ret” instructions
ip control, utilize the stack to save things, call other functions provided
The “call” instruction pushes the memory address of the next instruction onto the stack and then jumps to the value stored in the first argument.
|
|
- call pushes 0x102a, the address of the next instruction, onto the stack
- call jumps to 0x400000, the value stored in rax
- ret pops the top value off of the stack and jumps to it (0x102a)
implement the following logic:
|
|
example
|
|
|
|
level23
rbp: Stack Base Pointer
example of constructing some list
|
|
we should implement:
|
|
Constraints:
- You must put the “counting list” on the stack
- You must restore the stack like in a normal function
- You cannot modify the data at src_addr
below: the push 0
I consulted the reference. There is already data on the top of the stack; it is not empty. So I pushed a 0 onto the top of the stack to take advantage of it
|
|