A Beginner's Guide to x86 Assembly, Part 1 of 2
This was originally submitted to Reddit here.
Writing bare assembly is rarely necessary these days, but I definitely recommend it for anyone interested in programming. Not only does it offer a different perspective compared to higher-level languages, but it may prove to be useful when debugging code in other languages.
In this two-part series we will be implementing a Reverse Polish notation (RPN) calculator in bare x86 assembly from the ground up. When we’re done we’ll be able to use it like this:
The complete code from the end of Part 1 can be found here. Additionally, if you feel like taking a peek, the complete code from the end of this series can be found here. It is heavily commented and these two may serve as a sufficient learning resource for those of you with some knowledge of assembly already.
In Part 1 we will begin with a basic “Hello world!” program to ensure your setup is working properly. We will then explain system calls, the call stack, stack frames, and the x86 calling convention, and finish by writing some basic functions in x86 assembly for practice. In Part 2 we will begin our RPN calculator from scratch.
This article series is aimed at people who have some experience programming in C and have some basic knowledge of computer architecture (such as what a CPU register is). Since we will be using Linux you will also need to know how to use the Linux command line.
As stated before, we will be using Linux (either 64-bit or 32-bit). The code in this article series will not work on Windows or Mac OS X.
You simply need the GNU linker ld from binutils, which is pre-installed on most distros, and the NASM assembler. On Ubuntu and Debian you can install both with sudo apt-get install binutils nasm.
I would also recommend you keep an ASCII table handy.
To ensure your setup is ready to begin, save the following code into a file called
The comments should explain the general structure, but you are probably a bit confused about how it works. If you are confused about the instructions or the registers used, you can reference University of Virginia’s Guide to x86 Assembly for a list of registers and common instructions. Once we discuss system calls this should hopefully make even more sense.
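As a minimal sketch of such a program, assuming NASM syntax and the 32-bit Linux int 0x80 system call interface, something like the following should work. The file name hello.asm, the labels msg and len, and the build commands in the header comment are assumptions chosen for illustration, not necessarily the ones used in the rest of this series.

```nasm
; hello.asm (hypothetical file name)
; Assumed build commands, on 32-bit or 64-bit Linux:
;   nasm -f elf32 hello.asm -o hello.o
;   ld -m elf_i386 hello.o -o hello

section .data
    msg db "Hello world!", 0x0a     ; the string to print, followed by a newline (ASCII 0x0a)
    len equ $ - msg                 ; length of the string in bytes, computed by the assembler

section .text
global _start                       ; make the entry point visible to the linker

_start:
    mov eax, 0x04                   ; system call number for write()
    mov ebx, 1                      ; file descriptor 1 = standard output
    mov ecx, msg                    ; pointer to the first character of the string
    mov edx, len                    ; number of bytes to write
    int 0x80                        ; ask the kernel to perform the system call

    mov eax, 0x01                   ; system call number for exit()
    mov ebx, 0                      ; exit status 0 = no error
    int 0x80
```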
To assemble the assembly file into an object file, then to link the object file into an executable, run:
When run you should see:
This part is optional, but to easily assemble and link in the future, we can use Make. Save the following into a file called Makefile in the same directory as your assembly source file:
Then to assemble and link, instead of following the above instructions, you can simply run make.
System calls are used to request the operating system to perform an action for us. System calls are set up by storing the system call number in register eax, followed by its arguments in ebx, ecx, and edx, in that order (if applicable). On 32-bit Linux the call is then triggered with the int 0x80 instruction. If a system call does not use a particular argument, then that register can be set to anything. For example, if a system call only takes 1 argument, then the values in ecx and edx will be ignored.
In this article, we will only be using the following two system calls1: write(), which writes a string to a file or stream (in our case, to standard out and standard error), and exit(), to exit the program:
- exit (0x01): Exits the program. Arguments:
  - error code - set to 0 to indicate the program ended without errors, and use any other number (such as 1) to instead indicate an error occurred
- write (0x04): Writes a string to a file or stream. Arguments:
  - fd - the number of an open file descriptor for the file to write to. In our case, we will use 1 to write to standard output and 2 for standard error output.
  - string - a pointer to the first character of the string
  - length - the length of the string in bytes
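To make the argument order concrete, here is a small sketch that uses both system calls: it writes a message to standard error (fd 2) and then exits with error code 1. It assumes the 32-bit Linux int 0x80 interface; the labels err_msg and err_len and the message text are made up for this example.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 syscalls.asm -o syscalls.o
;   ld -m elf_i386 syscalls.o -o syscalls

section .data
    err_msg db "Something went wrong", 0x0a     ; illustrative message ending in a newline
    err_len equ $ - err_msg                     ; its length in bytes

section .text
global _start

_start:
    ; write(fd = 2, string = err_msg, length = err_len)
    mov eax, 0x04           ; system call number goes in eax
    mov ebx, 2              ; 1st argument: fd 2 = standard error
    mov ecx, err_msg        ; 2nd argument: pointer to the first character
    mov edx, err_len        ; 3rd argument: length in bytes
    int 0x80

    ; exit(error code = 1)
    mov eax, 0x01           ; system call number goes in eax
    mov ebx, 1              ; 1st argument: non-zero means an error occurred
    int 0x80                ; ecx and edx are ignored by exit()
```

Running the resulting program and then `echo $?` in the shell should print 1, the error code passed to exit().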
The call stack
The call stack is a data structure that stores information about each function call. Each function call has its own section in the stack called a “frame,” which stores some information about the current function call, such as the local variables of that function and the return address (where the program should jump to once the function is done executing).
One confusing thing that I will note immediately is that the stack grows downwards in memory. When you add something to the top of the stack, it will be inserted at a memory address lower than the previous thing in the stack. In other words, as the stack grows, the memory address of the top of the stack decreases. To prevent confusion, I will only bring this up where it matters, namely when we are working with the memory addresses of items on the stack.
The push instruction will insert some data onto the top of the stack, and pop will remove data from the top of the stack. For example, push eax will allocate more space on the top of the stack and move the value in eax to that space, and pop eax will move whatever data is at the top of the stack into eax and deallocate that space from the stack.
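As a quick illustration of the last-in, first-out behaviour, the sketch below pushes two registers and pops them back in the opposite order, which swaps their values. The numbers 40 and 2 are arbitrary, and the result is reported through the exit status only to keep the example short.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 pushpop.asm -o pushpop.o
;   ld -m elf_i386 pushpop.o -o pushpop

section .text
global _start

_start:
    mov eax, 40
    mov ebx, 2

    push eax                ; stack, top first: 40
    push ebx                ; stack, top first: 2, 40
    pop eax                 ; eax = 2  (the value pushed last comes off first)
    pop ebx                 ; ebx = 40

    sub ebx, eax            ; ebx = 40 - 2 = 38
    mov eax, 0x01           ; exit(), with ebx as the exit status
    int 0x80
```

After running it, `echo $?` should print 38.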
The esp register’s purpose is to point to the top of the stack. Any data above esp is considered not on the stack; it is garbage data. Performing a push instruction to add data to the top of the stack (or a pop to remove data) will move esp. You can also manipulate esp directly if you know what you’re doing.
The ebp register is similar, except it always points somewhere in the middle of the current stack frame, directly before the local variables of the current function (we’ll talk more about this later). However, calling another function does not move ebp automatically; we must do that manually each time.
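To show what moving esp by hand looks like, here is a sketch that performs one push manually, reads both stack items in place relative to esp, and then discards them by adding to esp. The values 35 and 7 are arbitrary choices for this example.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 esp.asm -o esp.o
;   ld -m elf_i386 esp.o -o esp

section .text
global _start

_start:
    ; Roughly what "push dword 35" does, written out by hand:
    sub esp, 4              ; the stack grows downwards, so allocating means subtracting
    mov dword [esp], 35     ; store the value at the new top of the stack

    push dword 7            ; an ordinary push; esp moves down by another 4 bytes

    ; The top item is at [esp]; the item pushed before it is 4 bytes higher.
    mov ebx, [esp]          ; ebx = 7, read without removing it
    add ebx, [esp+4]        ; ebx = 7 + 35 = 42

    add esp, 8              ; discard both 4-byte items without popping them

    mov eax, 0x01           ; exit(), with ebx as the exit status
    int 0x80
```

After running it, `echo $?` should print 42.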
The x86 calling convention
x86 has no built-in notion of a function like higher-level languages do. The x86 call instruction is essentially just a jmp (a goto) to another place in memory. In order to use subroutines like we use functions in other languages (which can take arguments and return data back), we must follow a calling convention2. That will also ensure that a subroutine’s registers will not be messed up when calling another function.
Before calling the function, the caller must:
- Save the caller-saved registers by pushing them onto the stack. Some registers are able to be modified by the called function, and to ensure you do not lose the data in those registers, the caller must save them in memory before the call by pushing them onto the stack. These registers are eax, ecx, and edx. If you were not using some or all of those registers then you do not need to save them.
- Push the function’s arguments onto the stack in reverse order (pushing the last argument first and the first argument last). This order ensures that the called function will have its arguments in the correct order when popping them from the stack.
The function will store its result in eax if applicable. Immediately after the call, the caller must:
- Remove the function’s arguments from the stack. This is typically done by simply adding the total size of the arguments in bytes to esp. Don’t forget that the stack grows downward, so to remove from the stack you must add.
- Restore the caller-saved registers by popping them from the stack in reverse order. No other registers will have been modified by the called function.
The following example sums up the above caller rules. In this example, assume I have a function called _subtract which takes two integer (4-byte) arguments and returns the first argument minus the second argument. In my subroutine called _mysubroutine, I call _subtract with two arguments:
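A minimal sketch of the caller side might look like this. The argument values 10 and 2, the values placed in ecx and edx, and the surrounding _start code are assumptions added so the example can run on its own; a bare-bones _subtract is included only so that there is something to call.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 caller.asm -o caller.o
;   ld -m elf_i386 caller.o -o caller

section .text
global _start

_start:
    call _mysubroutine
    mov ebx, eax            ; use the returned value as the exit status
    mov eax, 0x01           ; exit()
    int 0x80

_mysubroutine:
    mov ecx, 100            ; pretend ecx and edx hold values we still need
    mov edx, 200            ; (illustrative values only)

    push ecx                ; save the caller-saved registers we are using
    push edx
    push dword 2            ; push the arguments in reverse order: second argument first
    push dword 10           ; first argument last
    call _subtract          ; result comes back in eax: 10 - 2 = 8
    add esp, 8              ; remove the two 4-byte arguments from the stack
    pop edx                 ; restore the saved registers in reverse order
    pop ecx
    ret

_subtract:                  ; bare-bones callee, present only so the sketch is complete
    push ebp
    mov ebp, esp
    mov eax, [ebp+8]        ; first argument
    sub eax, [ebp+12]       ; minus the second argument
    pop ebp
    ret
```

After running it, `echo $?` should print 8.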
In order to be called, a subroutine must:
- Save the previous frame’s base pointer ebp by pushing it onto the stack.
- Adjust ebp, which currently points to the previous frame, to point to the current frame (the current value of esp).
- Allocate more space on the stack for local variables, if necessary, by moving the stack pointer esp. Since the stack grows downwards, that means you should subtract from esp.
- Save the callee-saved registers by pushing them onto the stack. These are: ebx, edi, and esi. You do not have to save any registers you are not planning on modifying.
You may notice a return address in each stack frame in those diagrams. Those are inserted into the stack automatically by call. The ret instruction pops the address on the top of the stack and jumps to that location. We don’t have to use it for anything; I only include it to show why the function’s local variables are 4 bytes above ebp but the function’s arguments are 8 bytes below ebp.
You may also notice in the last diagram above that the local variables of a function always begin 4 bytes above ebp at the address ebp-4 (you subtract because addresses go down as you move up the stack) and the arguments of a function always begin 8 bytes below ebp at the address ebp+8 (you add to move down the stack). If you follow the callee rules, this will always be the case for any function.
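The offsets can be seen in a small sketch like the one below, where a function reads its two arguments at ebp+8 and ebp+12 and keeps an intermediate result in a local variable at ebp-4. The function name _sum_twice and the argument values are hypothetical and exist only for this illustration.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 frame.asm -o frame.o
;   ld -m elf_i386 frame.o -o frame

section .text
global _start

_start:
    push dword 5            ; second argument
    push dword 16           ; first argument
    call _sum_twice         ; eax = (16 + 5) * 2 = 42
    add esp, 8              ; remove the arguments
    mov ebx, eax            ; exit status = returned value
    mov eax, 0x01           ; exit()
    int 0x80

; Frame layout inside _sum_twice (higher addresses first):
;   [ebp+12]  second argument
;   [ebp+8]   first argument
;   [ebp+4]   return address, pushed by call
;   [ebp]     previous frame's ebp
;   [ebp-4]   our one local variable
_sum_twice:
    push ebp                ; save the previous frame's base pointer
    mov ebp, esp            ; ebp now marks the current frame
    sub esp, 4              ; allocate 4 bytes for one local variable

    mov eax, [ebp+8]        ; first argument
    add eax, [ebp+12]       ; plus second argument
    mov [ebp-4], eax        ; local = a + b
    mov eax, [ebp-4]
    add eax, [ebp-4]        ; eax = local + local

    mov esp, ebp            ; deallocate the local variable
    pop ebp                 ; restore the previous frame's base pointer
    ret
```

After running it, `echo $?` should print 42.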
Once your function is done executing and you wish to return, you should first set eax to the return value of your function if necessary. Additionally you must:
- Restore the callee-saved registers by popping them from the stack in reverse order.
- Deallocate the space on the stack you allocated in step 3 above for local variables, if applicable. This can be done simply by setting esp to ebp. This is safe to do even if you didn’t allocate any space in the first place.
- Restore the previous frame’s base pointer ebp by popping it from the stack.
- Return with ret.
Now we’ll implement our _subtract function from our example above:
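One way _subtract could be written while following the callee rules step by step is sketched below. The _start code and the argument values 10 and 2 are assumptions added so the function can be assembled and run on its own.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 subtract.asm -o subtract.o
;   ld -m elf_i386 subtract.o -o subtract

section .text
global _start

_start:
    push dword 2            ; second argument
    push dword 10           ; first argument
    call _subtract          ; eax = 10 - 2 = 8
    add esp, 8              ; remove the arguments
    mov ebx, eax            ; exit status = returned value
    mov eax, 0x01           ; exit()
    int 0x80

_subtract:
    push ebp                ; save the previous frame's base pointer
    mov ebp, esp            ; point ebp at the current frame
    ; No local variables are needed, so nothing is subtracted from esp here.
    ; We do not modify ebx, edi, or esi, so none of them are saved.

    mov eax, [ebp+8]        ; first argument
    sub eax, [ebp+12]       ; minus second argument; the result stays in eax

    ; No callee-saved registers to restore.
    mov esp, ebp            ; deallocate locals (safe even though we allocated none)
    pop ebp                 ; restore the previous frame's base pointer
    ret                     ; pop the return address and jump back to the caller
```

After running it, `echo $?` should print 8.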
Enter and Leave
You may notice in the example above that a function will always start the same: push ebp, mov ebp, esp, and allocating space for local variables. x86 has a handy instruction to accomplish this for us: enter a, b, where a is the number of bytes you’d like to allocate for local variables, and b is the “nesting level” which we will always leave at 0. Additionally, a function always ends with mov esp, ebp followed by pop ebp3. This can also be replaced with a single instruction: leave. Using these, our example above becomes:
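A sketch of _subtract rewritten with enter and leave follows. As before, the _start code and the argument values 7 and 3 are assumptions added only to make the example self-contained.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 enterleave.asm -o enterleave.o
;   ld -m elf_i386 enterleave.o -o enterleave

section .text
global _start

_start:
    push dword 3            ; second argument
    push dword 7            ; first argument
    call _subtract          ; eax = 7 - 3 = 4
    add esp, 8              ; remove the arguments
    mov ebx, eax            ; exit status = returned value
    mov eax, 0x01           ; exit()
    int 0x80

_subtract:
    enter 0, 0              ; same as: push ebp / mov ebp, esp / sub esp, 0
    mov eax, [ebp+8]        ; first argument
    sub eax, [ebp+12]       ; minus second argument
    leave                   ; same as: mov esp, ebp / pop ebp
    ret
```

After running it, `echo $?` should print 4.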
Writing some basic functions
Now that we understand the calling convention, we can begin writing some subroutines. It would be pretty handy to generalize the code that prints “Hello world!” so that it can print any string we’d like: a _print_msg function. For that function we will need a _strlen function to count the length of the string. That function might look like this in C:
In other words, starting at the beginning of the string, we add 1 to the return value for every character we see that is not the string-terminating null character 0. Once that null character is seen, we return. In assembly it is also pretty simple using our example _subtract function above as a base:
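A possible _strlen along those lines is sketched here. The local labels .loop and .done, the msg label, and the _start code that reports the length through the exit status are choices made for this example.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 strlen.asm -o strlen.o
;   ld -m elf_i386 strlen.o -o strlen

section .data
    msg db "Hello world!", 0        ; null-terminated string, 12 characters long

section .text
global _start

_start:
    push msg                ; the only argument: pointer to the string
    call _strlen            ; eax = 12
    add esp, 4              ; remove the argument
    mov ebx, eax            ; exit status = length of the string
    mov eax, 0x01           ; exit()
    int 0x80

_strlen:
    push ebp
    mov ebp, esp
    mov ecx, [ebp+8]        ; ecx = pointer to the first character (ecx is caller-saved)
    mov eax, 0              ; eax = length counted so far, also our return value
.loop:
    cmp byte [ecx+eax], 0   ; is the current character the terminating null?
    je .done                ; if so, we are finished
    inc eax                 ; otherwise count it
    jmp .loop               ; and look at the next character
.done:
    mov esp, ebp
    pop ebp
    ret
```

After running it, `echo $?` should print 12.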
That wasn’t too bad, right? Writing the code in C beforehand may help you a lot because much of it can be directly converted to assembly. Now we can use this function in our _print_msg function, which will require everything we’ve learned so far:
And let’s see the fruit of our hard work by using that function in our complete “Hello world!” program:
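Putting the pieces together, a complete program might look like the sketch below. It assumes _print_msg takes a single argument, a pointer to a null-terminated string, and writes it to standard output; that signature, the msg label, and the local labels are assumptions made for this example.

```nasm
; Assumed build commands (file name is hypothetical):
;   nasm -f elf32 hello2.asm -o hello2.o
;   ld -m elf_i386 hello2.o -o hello2

section .data
    msg db "Hello world!", 0x0a, 0  ; string, newline, terminating null

section .text
global _start

_start:
    push msg                ; argument: pointer to the string
    call _print_msg
    add esp, 4              ; remove the argument
    mov eax, 0x01           ; exit()
    mov ebx, 0              ; exit status 0 = no error
    int 0x80

_print_msg:
    push ebp
    mov ebp, esp
    push ebx                ; ebx is callee-saved and we overwrite it below

    push dword [ebp+8]      ; pass our own argument on to _strlen
    call _strlen            ; eax = length of the string
    add esp, 4              ; remove _strlen's argument

    mov edx, eax            ; write() 3rd argument: length in bytes
    mov ecx, [ebp+8]        ; write() 2nd argument: pointer to the string
    mov ebx, 1              ; write() 1st argument: fd 1 = standard output
    mov eax, 0x04           ; system call number for write()
    int 0x80

    pop ebx                 ; restore the callee-saved register
    mov esp, ebp
    pop ebp
    ret

_strlen:
    push ebp
    mov ebp, esp
    mov ecx, [ebp+8]        ; pointer to the first character
    mov eax, 0              ; length counted so far
.loop:
    cmp byte [ecx+eax], 0   ; stop at the terminating null
    je .done
    inc eax
    jmp .loop
.done:
    mov esp, ebp
    pop ebp
    ret
```

Note that _print_msg saves ebx because write() needs it for the file descriptor, while ecx and edx need no saving inside the callee since they are caller-saved.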
That is the end of Part 1. Believe it or not, we have covered all the main x86 topics you will need to write basic x86 programs! In the next article we will apply this knowledge to write our RPN calculator. Now that we have all the introductory material and theory out of the way, Part 2 will be focusing entirely on the code. The functions we write will be much longer and will even have to use some local variables as well.
- University of Virginia’s Guide to x86 Assembly – Goes more into depth on many of the topics covered here, including more information on each of the most common x86 instructions. A great reference for the most common x86 instructions as well.
- The Art of Picking Intel Registers – While most of the x86 registers are general-purpose, many of the registers have a historical meaning. Following those conventions can improve code readability, and as an interesting side benefit, will even slightly optimize the size of your binaries.
- NASM: Intel x86 Instruction Reference - a full reference to all the obscure x86 instructions.
There are multiple calling conventions, but we will be using CDECL, the most popular calling convention for x86 in use by C compilers and assembly programmers.
This is especially important, for example, for C functions with a variable number of arguments, since the first argument will be the number of arguments to pop.