A solid understanding of the basics is always worthwhile when getting to grips with areas such as exploit development. Knowing how a program operates 'under the hood' makes this considerably easier to handle. In this post I'll go over the use of registers, the stack, and the heap, starting with a very basic overview.
In the Intel x86 and x86_64 architectures there are a number of 'registers'. A register is similar to a variable in that it stores information, except that it can only hold a single value at a time. There is also only a fixed number of registers available, whereas a program or script can declare as many variables as it needs.
There are 8 general purpose registers, 6 segment registers, 1 flag register, and a single instruction pointer register within x86.
An overview of the general purpose registers is as follows:
- Accumulator: This is used for input/output from function calls and general arithmetic.
- Base: This is used for storing a pointer to data or general storage.
- Counter: Often used as a loop counter or for shift instructions.
- Data: This is often used for storing variables within a function or general input/output operations.
- Destination Index: Generally used as a pointer for destination data in function calls.
- Source Index: Generally used as a pointer for source data in function calls.
- Base Pointer: This is either used as a general register, or is used to point to the base of the stack.
- Stack Pointer: This is used to point to the top of the stack.
The general purpose registers above can also be referenced and manipulated in smaller segments. For example, the Accumulator register when referenced as a 32-bit register on x86 is named EAX, but the lower half of the register is referenced as AX (a 16-bit register).
The table below should demonstrate this a little more clearly:
In the table above, the 8-bit registers refer to either the least significant byte (LSB) or the most significant byte (MSB) of the 16-bit register. For example, the LSB of the 16-bit Accumulator register AX is referenced via AL, whilst the MSB would be AH.
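As a rough sketch of how these names overlap, the snippet below models the Accumulator as a plain Python integer and uses bit masks to pick out AX, AH and AL. The helper names (`ax`, `ah`, `al`, `set_al`) are made up for illustration and do not come from any library.

```python
# Toy model of how EAX, AX, AH and AL overlap. A plain integer stands in for
# the register; the helper names are illustrative only, not a real API.

def ax(eax: int) -> int:
    """Lower 16 bits of EAX."""
    return eax & 0xFFFF

def al(eax: int) -> int:
    """Least significant byte of AX."""
    return eax & 0xFF

def ah(eax: int) -> int:
    """Most significant byte of AX (bits 8-15 of EAX)."""
    return (eax >> 8) & 0xFF

def set_al(eax: int, value: int) -> int:
    """Write AL; the other 24 bits of EAX are left untouched."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)

eax = 0x11223344
print(hex(ax(eax)))            # 0x3344
print(hex(ah(eax)))            # 0x33
print(hex(al(eax)))            # 0x44
print(hex(set_al(eax, 0xFF)))  # 0x112233ff
```

The point to take away is that AL, AH and AX are not separate storage: writing to one of them changes part of EAX.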
The Instruction Pointer is a special register that stores the address of the next instruction to be executed. For 32-bit x86 this is known as EIP (Extended Instruction Pointer), whilst in 64-bit this is RIP.
Registers are manipulated in Assembly through a number of different instructions. There are far too many instructions across the different architectures to list them here, but some of the most common examples are below.
| Example Instruction | Example Opcode |
|---|---|
| ADD EAX,EAX | 01 C0 |
| ADD EBX,EAX | 01 C3 |
| XOR EAX,EAX | 31 C0 |
| MOV EBX,EAX | 89 C3 |
| MOV EAX,0x11223344 | B8 44 33 22 11 |
| SUB EAX,0x2A | 83 E8 2A |
| JMP ESP | FF E4 |
| JMP EBX | FF E3 |
| JMP EDX | FF E2 |
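To show where some of those opcode bytes come from, here is a minimal Python sketch that rebuilds a few rows of the table. It only covers the register-to-register, MOV-immediate and JMP-register forms listed above, and the function names are my own for illustration rather than part of any assembler.

```python
import struct

# Register numbers used by the x86 instruction encoding.
REG = {"EAX": 0, "ECX": 1, "EDX": 2, "EBX": 3,
       "ESP": 4, "EBP": 5, "ESI": 6, "EDI": 7}

# Opcode bytes for the "op r/m32, r32" forms that appear in the table.
REG_REG_OPCODE = {"ADD": 0x01, "XOR": 0x31, "MOV": 0x89}

def modrm(reg_field: int, rm: int) -> int:
    """ModR/M byte for register-direct addressing (mod = 0b11)."""
    return 0xC0 | (reg_field << 3) | rm

def encode_reg_reg(op: str, dst: str, src: str) -> bytes:
    """Encode e.g. ADD EBX,EAX: destination goes in r/m, source in reg."""
    return bytes([REG_REG_OPCODE[op], modrm(REG[src], REG[dst])])

def encode_mov_imm32(dst: str, value: int) -> bytes:
    """MOV r32,imm32 is 0xB8 + register number, then the value little-endian."""
    return bytes([0xB8 + REG[dst]]) + struct.pack("<I", value)

def encode_jmp_reg(target: str) -> bytes:
    """JMP r32 is opcode 0xFF with 4 in the ModR/M reg field."""
    return bytes([0xFF, modrm(4, REG[target])])

def show(code: bytes) -> str:
    """Format bytes the same way as the table."""
    return " ".join(f"{b:02X}" for b in code)

print(show(encode_reg_reg("ADD", "EBX", "EAX")))  # 01 C3
print(show(encode_mov_imm32("EAX", 0x11223344)))  # B8 44 33 22 11
print(show(encode_jmp_reg("ESP")))                # FF E4
```

Note how the immediate 0x11223344 appears reversed in the opcode: x86 stores multi-byte values in little-endian order.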
If I wanted to create a basic loop that functioned with a counter, this could be done with the following instructions:

    XOR EAX,EAX          # Zero out EAX by using it as both destination and source
    loop_start:
    INC EAX              # Add 1 to the EAX value (0 before the first pass)
    CMP EAX,0x11111111   # Compare the value of EAX to a specified value
    JLE loop_start       # If EAX is less than or equal to the value, jump back to INC EAX
    ...
This is of course a very brief example. Some reference material for the x86 instruction listings can be found here.
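For anyone more comfortable reading a high-level language, the counter loop above behaves roughly like the Python sketch below. It is an approximation under a couple of assumptions: it uses a small limit so that it finishes quickly, it ignores 32-bit wrap-around, and `count_up` is a name I have made up for the example.

```python
def count_up(limit: int) -> int:
    """Mirror the XOR / INC / CMP / JLE sequence and return the final EAX value."""
    eax = 0                # XOR EAX,EAX
    while True:
        eax += 1           # INC EAX
        if eax <= limit:   # CMP EAX,limit then JLE (taken while EAX <= limit)
            continue       # jump back to INC EAX
        break              # fall through to whatever follows the loop
    return eax

print(count_up(5))  # 6
```

The loop only exits once EAX is greater than the limit, which is why the final value is one higher than the limit itself.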
The stack is a data structure within memory that is referenced via the Stack Pointer register (e.g. RSP). Reading from and writing to the stack is fast. Data is always written to the top of the stack and read back from the top, so the stack follows a last-in, first-out (LIFO) order. Writing data to the stack is a 'push' and reading data from it is a 'pop' (i.e. I 'push' data onto the stack, and I 'pop' data off of it).
The stack grows downwards towards the heap, i.e. from a higher memory address to a lower memory address. When data is 'popped' from the stack it is effectively removed from the stack and stored in a register.
See the following diagram for a very basic example of how this looks:
When a function call has populated the stack with data and that function then exits, all of the data that it pushed onto the stack is freed. This is why the stack is used for function calls and local variables.
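The push and pop behaviour described above can be sketched with a small toy model. This is not how a real stack is implemented; `ToyStack`, the 4-byte slot size and the starting address 0x1000 are all assumptions chosen purely for illustration.

```python
class ToyStack:
    """Toy model of a downward-growing stack with 4-byte slots (32-bit style)."""

    def __init__(self, top_address: int = 0x1000):
        self.esp = top_address   # stack pointer; starts at the highest address
        self.memory = {}         # address -> value

    def push(self, value: int) -> None:
        self.esp -= 4            # growing the stack lowers the stack pointer
        self.memory[self.esp] = value

    def pop(self) -> int:
        value = self.memory[self.esp]  # read the value at the top of the stack
        self.esp += 4            # shrinking the stack raises the stack pointer
        return value             # old data stays in memory, but is now off the stack

stack = ToyStack()
stack.push(0xAAAA)
stack.push(0xBBBB)
print(hex(stack.esp))    # 0xff8
print(hex(stack.pop()))  # 0xbbbb
print(hex(stack.pop()))  # 0xaaaa
print(hex(stack.esp))    # 0x1000
```

The last value pushed is the first value popped, and the stack pointer ends up back where it started once everything has been popped. Cleaning up after a function exit works in the same spirit: the stack pointer is moved back, rather than each value being erased.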
Conversely, whilst stack data is managed automatically via the CPU, the heap is not. The heap can also be considerably larger than the stack, and memory on it must be allocated manually within a program. That memory also has to be freed manually, again unlike the stack.
Memory on the heap is accessed via pointers, unlike the stack which can be accessed directly. Also unlike the stack, where each thread in an application has its own stack, there is typically only one heap per application.
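To make the manual allocate/free cycle a little more concrete, here is a toy heap in Python. It is a deliberately simplified sketch: `ToyHeap`, its `malloc` and `free` methods and the base address 0x8000 are hypothetical names and values for this example, and a real allocator is far more involved (for instance, it reuses freed space).

```python
class ToyHeap:
    """Toy heap: blocks are requested and released explicitly, and reached via a pointer."""

    def __init__(self, base_address: int = 0x8000):
        self.next_free = base_address
        self.blocks = {}   # pointer -> bytearray holding that block's data

    def malloc(self, size: int) -> int:
        pointer = self.next_free
        self.blocks[pointer] = bytearray(size)
        self.next_free += size    # simple bump allocation; freed space is never reused
        return pointer

    def free(self, pointer: int) -> None:
        del self.blocks[pointer]  # raises KeyError on a double free or bad pointer

heap = ToyHeap()
p = heap.malloc(16)               # allocation is an explicit request
heap.blocks[p][0:5] = b"hello"    # data is only reachable through the pointer
print(bytes(heap.blocks[p][:5]))  # b'hello'
heap.free(p)                      # nothing releases the block until this call
print(p in heap.blocks)           # False
```

If `free` is never called the block simply stays allocated, which is the toy equivalent of a memory leak.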