Programs written in a high level language are generally compiled into machine code (a few, such as Prolog, are interpreted). Assembly language, unlike a high level language, deals directly with the hardware of a computer and is assembled into machine code. An assembly language programmer, unlike a high level language programmer, must be familiar with the hardware features of the computer and must deal with a different set of concepts (registers, memory addresses, status flags). High level languages are portable: the same code can be re-compiled to run on different computers. Assembly language is specific to one type of processor. Compared with assembly language, high level languages are easier to read, use and maintain.
Despite the difficulties of low level programming, it is still sometimes necessary to write in assembly language, because it is the best tool for certain jobs. Assembly language is often used where code must be as fast and compact as possible, and for code that controls pieces of hardware such as network, sound and graphics cards (device drivers). One benefit of studying assembly language is that it gives an insight into computer design and how a computer works.
There are four basic types of instruction: Transfer, Arithmetic, Logical, and Test and Branch.
Transfer: MOVE, LOAD, STORE
These instructions move data around the computer, typically between the CPU and memory (and memory-mapped devices).
Arithmetic: ADD, SUBTRACT, INCREMENT, DECREMENT, SHIFT, ROTATE
Arithmetic operations may set the status flags C (carry), N (negative), V (overflow) and Z (zero).
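Overflow (V) means that a signed result has become too big to fit, so the sign bit (bit 7 in 8 bits) comes out wrong. When adding, this happens when the carry into bit 7 differs from the carry out of bit 7; for example, bit 6 may give 1+1=0 with 1 carried into bit 7, changing the sign bit, while nothing is carried out of bit 7.
Carry (C) is set when there is a carry out of bit 7, the most significant bit, e.g. when bit 7 gives 1+1=0 carry 1.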
Carry and overflow are complicated by the sign conventions. In unsigned binary (no negative values) overflow does not matter because there is no sign bit to worry about. In signed binary (1's complement, 2's complement, sign and magnitude) overflow is a problem because it changes the sign bit and therefore the value of the result. For example:
64 + 64:
0100 0000
0100 0000
----------
1000 0000
The result is 128 in unsigned binary (where we don't care about overflow) but -128 in 2's complement where negative values are allowed.
Adding 128 and 128:
1000 0000
1000 0000
----------
0000 0000 carry 1
The carry is effectively bit 8, worth 2^8 = 256, so the answer (256) is correct only if the carry is taken into account. The largest number that can be stored in 8 bits without complements is, of course, 255 or 2^8 - 1.
Adding -64 and -65:
1100 0000
1011 1111
----------
0111 1111
The answer is +127, which is clearly wrong: it should be -129. But -129 is beyond the range of 8 bits (-128 to +127), so this sum has changed the sign bit (overflow) and produced a carry out of bit 7. The CPU signals this by setting the V flag; it is up to the program, or the language's run-time system, to test the flag and respond, e.g. with an error message such as 'Too big at line nn' or 'Integer Overflow at line nn'.
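To check the three sums above, here is a minimal Python sketch that adds two 8-bit patterns and derives the four flags. It uses the standard rule for two's complement overflow (V is set when both operands have the same sign bit but the result's sign bit differs); the names `add8` and `to_signed` are illustrative, not taken from any assembler or CPU manual.

```python
def add8(a: int, b: int) -> tuple:
    """Add two 8-bit patterns (0-255); return the 8-bit result and the C, V, N, Z flags."""
    total = a + b
    result = total & 0xFF                      # keep bits 0-7 only
    carry = 1 if total > 0xFF else 0           # carry out of bit 7 (into "bit 8")
    sign_a, sign_b, sign_r = a >> 7, b >> 7, result >> 7
    # Overflow: operands share a sign bit but the result's sign bit is different.
    overflow = 1 if sign_a == sign_b and sign_r != sign_a else 0
    negative = sign_r                          # N is a copy of bit 7 of the result
    zero = 1 if result == 0 else 0
    return result, {"C": carry, "V": overflow, "N": negative, "Z": zero}


def to_signed(byte: int) -> int:
    """Read an 8-bit pattern as a two's complement number (-128..127)."""
    return byte - 256 if byte & 0x80 else byte


# 64 + 64, 128 + 128 and (-64) + (-65), each held as an 8-bit pattern.
for a, b in [(64, 64), (128, 128), (-64 & 0xFF, -65 & 0xFF)]:
    r, flags = add8(a, b)
    print(f"{a:08b} + {b:08b} = {r:08b}  unsigned={r} signed={to_signed(r)} {flags}")
```

All three sums set V; only the last two also set C, which is why carry and overflow have to be tracked separately.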
Shift instructions move bits along to the left or the right. A logical shift always moves a 0 into the vacated bit, so a logical shift right puts 0 into bit 7 (the sign bit) whether or not the sign needs to be preserved. An arithmetic shift right copies the existing sign bit back into bit 7, so the sign of a 2's complement value is preserved.
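The difference between the two right shifts shows up on a negative number. In this Python sketch (the function names are illustrative) -6 is held as the 8-bit pattern 1111 1010.

```python
def logical_shift_right(byte: int) -> int:
    """Shift an 8-bit pattern right by one place; a 0 always enters bit 7."""
    return (byte & 0xFF) >> 1


def arithmetic_shift_right(byte: int) -> int:
    """Shift right by one place, copying the old sign bit back into bit 7."""
    byte &= 0xFF
    return (byte >> 1) | (byte & 0x80)


minus_six = -6 & 0xFF                                  # 11111010
print(f"{logical_shift_right(minus_six):08b}")         # prints 01111101 (+125): sign lost
print(f"{arithmetic_shift_right(minus_six):08b}")      # prints 11111101 (-3): sign kept
```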
Logical: AND, OR, NOT, XOR
These operations follow the rules for Boolean logic.
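In practice AND, OR and XOR are usually applied with a 'mask' to clear, set or flip chosen bits. A short Python illustration on one byte (the value and the masks are arbitrary examples):

```python
value = 0b1011_0110

low_nibble = value & 0b0000_1111   # AND clears bits: keep only bits 0-3
with_bit0 = value | 0b0000_0001    # OR sets bits: force bit 0 to 1
toggled = value ^ 0b1111_0000      # XOR flips bits: invert bits 4-7
inverted = ~value & 0xFF           # NOT flips every bit; mask back to 8 bits

for name, v in [("AND", low_nibble), ("OR", with_bit0), ("XOR", toggled), ("NOT", inverted)]:
    print(f"{name}: {v:08b}")
```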
Test and branch: BRANCH_EQUAL_ZERO, BRANCH_NON_ZERO, BRANCH_CARRY_SET, etc.
These instructions use the status register to see whether a particular flag is set. The CMP (compare) instruction compares two values by subtracting one from the other and setting the flags, without storing the result. If the values are the same the result is zero and the Z flag is set; otherwise Z is cleared. For 12-10=2 the Z, N and V flags are all clear, but for 10-12=-2 the N flag is set. The carry flag depends on the processor: some (e.g. x86) set C when the subtraction needs a borrow, as in 10-12, while others (e.g. ARM, 6502) set C when no borrow is needed, as in 12-10.
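A Python sketch of what CMP does to the flags for the two subtractions above. It is parameterised by the carry convention, since processors differ (x86 sets C on a borrow; ARM and the 6502 set C when there is no borrow); `cmp8` is an illustrative name, not a real instruction or library API.

```python
def cmp8(a: int, b: int, carry_means_borrow: bool) -> dict:
    """Compare two 8-bit values as CMP does: work out a - b, keep only the flags."""
    diff = (a - b) & 0xFF                    # CMP discards this result
    borrow = 1 if a < b else 0               # unsigned a - b needed a borrow
    flags = {"Z": int(diff == 0), "N": diff >> 7}
    # Convention differs between processors: C = borrow, or C = NOT borrow.
    flags["C"] = borrow if carry_means_borrow else 1 - borrow
    # V (signed overflow): a and b differ in sign and the result's sign differs from a's.
    flags["V"] = int(((a ^ b) & (a ^ diff) & 0x80) != 0)
    return flags


for a, b in [(12, 10), (10, 12)]:
    print(f"{a}-{b}: borrow convention {cmp8(a, b, True)}, "
          f"no-borrow convention {cmp8(a, b, False)}")
```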
Unconditional branch instructions are usually called jumps: jump to location nnnn.
Run through the examples on the introductory page. For AS level you should understand the fetch-execute cycle and at least the LOAD, ADD and STORE instructions.
The 'instruction' part of an instruction (the opcode) is usually a fixed length on a simple processor, e.g. 8 bits or 16 bits. The length of the operand (data) varies according to what it is: byte, word, string, address, etc.
Some instructions have no address e.g. Halt, Clear Carry. These are known as zero address instructions.
Some instructions have one 'address', e.g. ADD X. These are known as one address instructions.
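Some instructions have two addresses, e.g. MOVE R1, R2, which names both a source and a destination. Note that a single address may itself need two bytes or words, as in LDA MEMLOC on a processor with 16-bit addresses.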
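On the 6502, for example, every opcode is one byte and the operand takes zero, one or two further bytes depending on the addressing mode, with 16-bit addresses stored low byte first. The Python sketch below encodes three real 6502 instructions; the `encode` helper and its mode names are illustrative, and only these three opcodes are covered.

```python
OPCODES = {
    ("CLC", "implied"): 0x18,     # Clear Carry: zero-address, no operand bytes
    ("LDA", "immediate"): 0xA9,   # LDA #nn: one operand byte (the value itself)
    ("LDA", "absolute"): 0xAD,    # LDA nnnn: two operand bytes (a 16-bit address)
}


def encode(mnemonic: str, mode: str, operand: int = 0) -> bytes:
    """Assemble one of the three 6502 instruction forms above into bytes."""
    opcode = OPCODES[(mnemonic, mode)]
    if mode == "implied":
        return bytes([opcode])
    if mode == "immediate":
        return bytes([opcode, operand & 0xFF])
    # absolute: 16-bit address stored low byte first (little-endian)
    return bytes([opcode, operand & 0xFF, (operand >> 8) & 0xFF])


for args in [("CLC", "implied"), ("LDA", "immediate", 23), ("LDA", "absolute", 0x1234)]:
    print(args, encode(*args).hex(" "))
```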
Immediate addressing: the operand is the value itself and follows the opcode, e.g. LDA #23.
Direct addressing: the operand is an address, and the actual value to be processed is in memory at that address, e.g. LDA MEMLOC.
Indirect addressing: the operand is the address of an address in memory, e.g. LDA (MEMLOC). MEMLOC is not the address of the data but the address where you will find the real address of the data. (Imagine you have an address where you will collect a parcel, but when you get there you find a note giving another address where the parcel really is.)
Indexed addressing: the address following the opcode is computed by adding an index that can vary to a base address that doesn't, e.g. LDA MEMLOC,X.
This is typically used to access data in adjacent memory locations, starting at the base address and proceeding through a given number of locations using the index. (Imagine you have to deliver some parcels along a street where the houses are not numbered but the parcels are in order. Starting at the first house - the base address - you take each parcel in turn and deliver it to the base address plus one, the base address plus two, and so on.)
Indirect indexed addressing is unlikely to be in the exam, but for completeness it is a combination of indirect and indexed addressing, e.g. LDA (MEMLOC),X.
The base address is the address of an address; the effective address is that indirect address plus the index value. Imagine you have a bag of parcels... (that's enough parcels!)
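The addressing modes differ only in how the location of the data (the effective address) is worked out. A Python sketch over a toy 256-byte memory makes the difference concrete; for simplicity an address fits in one byte here (a real processor would typically use two), and the memory contents, MEMLOC = 0x20 and X = 3 are arbitrary choices.

```python
memory = [0] * 256          # toy memory: one-byte addresses for simplicity
MEMLOC, X = 0x20, 3         # base address and index register value (arbitrary)

memory[0x20] = 0x40         # MEMLOC holds 0x40 (a value, or a pointer)
memory[0x23] = 7            # MEMLOC + X
memory[0x40] = 99           # where the pointer stored at MEMLOC leads
memory[0x43] = 55           # that pointer + X

immediate = 23                                  # LDA #23: operand is the data itself
direct = memory[MEMLOC]                         # LDA MEMLOC: data at MEMLOC -> 0x40
indirect = memory[memory[MEMLOC]]               # LDA (MEMLOC): MEMLOC holds data's address -> 99
indexed = memory[MEMLOC + X]                    # LDA MEMLOC,X: base + index -> 7
indirect_indexed = memory[memory[MEMLOC] + X]   # LDA (MEMLOC),X: pointer + index -> 55

print(immediate, direct, indirect, indexed, indirect_indexed)
```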
There are four steps to creating and running an assembly language program: writing the source code, assembling it, linking it and loading it into memory to run.
An assembler translates assembly language into machine code, but this in itself does not produce a working program. Most operating systems load a program into the next free block of memory, so the assembler cannot know in advance which actual addresses to use for the start and finish of the program, its subroutines, its data, etc. The program must therefore be relocatable, capable of being placed anywhere in memory. The assembler's output must be passed through a linker/loader that translates the relocatable addresses into absolute addresses, which will probably be different each time the program is run. The program may include external modules or sub-programs, and the linker must also join these into a single executable program. Sub-programs may be organised to form a library, and the linker may be set up to include specified library programs automatically. When all addresses have been resolved, the program can be loaded into memory and executed.
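The relocation step can be sketched in Python. The sketch assumes a hypothetical object format: the program is assembled as if it started at address 0, and the assembler also emits a list of the positions that hold addresses. The loader adds the actual load address to each of those positions and leaves ordinary data alone. Real object formats, and the joining of external modules, are more involved than this.

```python
def relocate(code: list, relocations: list, load_address: int) -> list:
    """Turn relative addresses into absolute ones by adding the load address.

    code        -- words of a program assembled as if it started at address 0
    relocations -- positions in `code` whose word is an address, not plain data
    """
    fixed = list(code)
    for pos in relocations:
        fixed[pos] += load_address
    return fixed


# Hypothetical object code as opcode/operand pairs; the opcode numbers are made up.
# Words 1 and 3 are addresses (both refer to the data word at offset 5).
code = [0x10, 5, 0x20, 5, 0xFF, 42]
relocs = [1, 3]

print(relocate(code, relocs, 0x400))     # loaded at 0x400: the addresses become 0x405
print(relocate(code, relocs, 0x9000))    # loaded elsewhere: same code, different addresses
```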
During the assembly process the assembler does the following:
Even in assembly language, routines are available for common operations such as reading a character from the keyboard; otherwise these would have to be programmed from scratch on every computer. Under MS-DOS, for example, these operating system routines are reached through a software interrupt: the program puts a function number in a register and then issues an int instruction, and the CPU looks up the address of the interrupt handling routine in a table of interrupt vectors and runs it. For example:
mov ax, 100h
int 21h
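The first line puts 100h into the ax register (the 'h' marks a hexadecimal value), so its high byte, ah, holds 01h. int 21h then calls the MS-DOS services routine, which uses the value in ah to choose a function: function 01h reads a character from the keyboard, echoes it to the screen and returns it in al.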
Where an action is repeated, it can be moved to its own named block of code (a subroutine) so that it can be called from other points in the program. The call instruction in assembly language is like a call to a procedure or function in a high level language; unlike a plain 'goto', it remembers where to come back to. When a subroutine is called, the current contents of the program counter (the address of the instruction after the call) are placed on a stack, and the address of the subroutine is then loaded into the program counter so that the next instruction is fetched from there. The end of a subroutine is marked with a 'return' ('ret' or 'RTS') instruction, which takes the saved address off the stack and puts it back into the program counter, so that execution continues after the original call.
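The following Python sketch traces a hypothetical mini-CPU that knows only CALL, RET, NOP and HALT. It shows why the return address goes on a stack: the same subroutine at address 10 is called from two different places, and each RET resumes after the right CALL.

```python
from typing import Dict, List, Optional, Tuple


def trace_calls(program: Dict[int, Tuple[str, Optional[int]]], start: int) -> List[int]:
    """Run CALL/RET/NOP/HALT instructions and return the addresses visited in order."""
    pc, stack, trace = start, [], []
    while True:
        trace.append(pc)
        op, target = program[pc]
        pc += 1                    # PC now points at the next instruction
        if op == "CALL":
            stack.append(pc)       # save the return address
            pc = target            # jump to the subroutine
        elif op == "RET":
            pc = stack.pop()       # resume just after the matching CALL
        elif op == "HALT":
            return trace
        # NOP: do nothing


program = {
    0: ("CALL", 10),   # first call to the subroutine
    1: ("CALL", 10),   # second call, from a different place
    2: ("HALT", None),
    10: ("NOP", None), # the subroutine body
    11: ("RET", None),
}
print(trace_calls(program, 0))
```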
In TOM a block of code can be placed in memory without a name and a jump instruction issued to force the program counter to move to that address. A return instruction is needed to end a subroutine.
In high level languages the flow of control in a program is implemented with if or case statements, as in if x > 10 then... In assembly language and machine code conditional statements are implemented with evaluation of a condition followed by a jump or transfer of control to another point in the program.
Conditions are recorded in the status register, which uses single bits, called flags, to record the effect of each machine code instruction on its result. These flags include the carry (C), negative (N), overflow (V) and zero (Z) flags.
The contents of two locations or registers can be compared with the cmp (compare) instruction, which subtracts one operand from the other and sets the flags (including the zero flag) according to the result, without storing it. This can be followed by a conditional jump (branch) instruction that transfers control to a block of code at a certain location; jmp on its own is an unconditional jump. (A location can be defined symbolically with a label, and the assembler can be left to supply the addresses for the various jumps.) The cmp instruction might typically be followed by je, that is, jump to the address specified if the operands were equal. Other jump instructions test a single flag in the status register, for example jz (jump if the zero flag is set, i.e. the last result was zero) and jnz (jump if the zero flag is clear). In fact je and jz are two names for the same instruction, as are jne and jnz.
Where numbers may be signed, different conditional jumps are needed after a comparison, because the same bit pattern stands for different values: jg (jump if greater) treats the operands as signed numbers, whereas ja (jump if above) treats them as unsigned numbers.
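The need for separate signed and unsigned jumps shows up as soon as the top bit is set. In this Python sketch the pattern 1111 1110 is 254 read as unsigned but -2 read as signed, so "is it bigger than 2?" has two different answers. On x86, ja actually tests C = 0 and Z = 0 after cmp, and jg tests Z = 0 and N = V; the sketch compares the values directly rather than modelling the flags.

```python
def as_signed(byte: int) -> int:
    """Read an 8-bit pattern as two's complement (-128..127)."""
    return byte - 256 if byte >= 128 else byte


a, b = 0b1111_1110, 0b0000_0010         # 254 and 2 unsigned; -2 and +2 signed

above = a > b                            # the question ja asks: compare as unsigned
greater = as_signed(a) > as_signed(b)    # the question jg asks: compare as signed
print(f"unsigned: {a} > {b} is {above}")                         # ja would jump
print(f"signed: {as_signed(a)} > {as_signed(b)} is {greater}")   # jg would not
```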
A stack is an area of memory that has been set aside for temporary storage of data. One use of the stack is to save the state of the CPU when a subroutine is called. On many processors the call instruction saves the return address (the program counter) on the stack automatically, but the other registers that the subroutine will change must usually be saved by the programmer with push and restored afterwards with pop. When a subroutine is called, the program counter changes and many of the other registers change too as new instructions and operands are loaded. When the subroutine finishes, the values on the stack are used to restore the registers to what they held just before the call. Because a stack is last in, first out, values must be popped in the reverse of the order in which they were pushed. The push and pop instructions are also often used to hold temporary variables.
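A minimal Python sketch of saving and restoring two registers with push and pop, using a list as the stack (the register names A and X are just illustrations). The point is the reverse order of the pops.

```python
registers = {"A": 5, "X": 17}
stack = []

# Save the registers before the subroutine changes them: push A, then push X.
stack.append(registers["A"])
stack.append(registers["X"])

# The subroutine reuses both registers for its own work.
registers["A"], registers["X"] = 0, 99

# Restore in REVERSE order: the last value pushed is the first one popped.
registers["X"] = stack.pop()
registers["A"] = stack.pop()
print(registers, stack)
```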
Write the assembly language code for:
A <- 0
Repeat
A <- A + 1
Until A = 99
|       | MOVE R1, #0 | Initialise A |
| .loop | ADD R1, #1  | Increment A |
|       | CMP R1, #99 | Check whether A = 99 |
|       | BNE loop    | Repeat loop if result not zero |
|       | RTS         | Return |
Another example:
If X = 99
Then X <- 1
Else X <- X + 1
EndIf
|        | MOVE R1, #45 | Test value for X |
|        | CMP R1, #99  | Is X 99? |
|        | BNE else     | If not, branch to the Else part |
|        | MOVE R1, #1  | Then part: X <- 1 |
|        | JMP endif    | Unconditional jump past the Else part |
| .else  | ADD R1, #1   | Else part: X <- X + 1 |
| .endif | RTS          | Return |
A simple loop:
FOR I <- 1 TO 10 DO
A <- A + 1
|       | MOVE R1, #1 | Counter I starts at 1 |
|       | MOVE R2, #0 | Initialise A to 0 |
| .loop | ADD R2, #1  | Add 1 to A (loop body) |
|       | ADD R1, #1  | Increment counter I |
|       | CMP R1, #11 | Has I gone past 10? |
|       | BNE loop    | Repeat loop if result not zero |
|       | RTS         | Return |
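A loop like this is easy to get off by one, so it is worth tracing. Here is a hypothetical mini-interpreter in Python for just the instructions used in these tables (MOVE, ADD, CMP, BEQ, BNE, JMP, RTS), with the destination register first as in the tables and each label written on its own line. It models only the Z flag. It runs two versions of the FOR loop: one that increments the counter before the body and compares with 10, and one that runs the body first and compares with 11.

```python
def run_program(source: list) -> dict:
    """Run a tiny subset of the tables' assembly and return the final registers.

    Hypothetical syntax: '.name' on its own line labels the next instruction;
    MOVE Rd, src | ADD Rd, src | CMP Rd, src | BEQ/BNE/JMP name | RTS,
    where src is a register or an immediate value written '#n'.
    """
    labels, code = {}, []
    for line in source:
        line = line.strip()
        if line.startswith("."):
            labels[line[1:]] = len(code)      # label marks the next instruction
        else:
            code.append(line)

    regs = {}
    zero = False                              # only the Z flag is modelled
    pc = 0

    def value(operand: str) -> int:
        return int(operand[1:]) if operand.startswith("#") else regs[operand]

    while True:
        op, _, rest = code[pc].partition(" ")
        args = [a.strip() for a in rest.split(",")] if rest else []
        pc += 1                               # default: fall through to the next line
        if op == "MOVE":
            regs[args[0]] = value(args[1])
        elif op == "ADD":
            regs[args[0]] += value(args[1])
        elif op == "CMP":
            zero = regs[args[0]] - value(args[1]) == 0
        elif (op == "BEQ" and zero) or (op == "BNE" and not zero) or op == "JMP":
            pc = labels[args[0]]
        elif op == "RTS":
            return regs


counter_first = ["MOVE R1, #1", "MOVE R2, #0", ".loop", "ADD R1, #1",
                 "ADD R2, #1", "CMP R1, #10", "BNE loop", "RTS"]
body_first = ["MOVE R1, #1", "MOVE R2, #0", ".loop", "ADD R2, #1",
              "ADD R1, #1", "CMP R1, #11", "BNE loop", "RTS"]

print(run_program(counter_first))   # A (R2) reaches only 9: the body ran nine times
print(run_program(body_first))      # A (R2) reaches 10: the body ran ten times
```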
Now try examples from past papers.
Count <- 0
Repeat
X <- X + X
Count <- Count + 1
Until X >= 999
|       | MOVE #0, r0  | Initialise counter |
|       | MOVE 110, r1 | Copy contents of X (address 110) to r1 |
| .loop | ADD #1, r0   | Increment counter |
|       | MOVE 110, r2 | Copy contents of X to r2 |
|       | ADD r2, r1   | Add X to itself |
|       | MOVE r1, 110 | Store result in X (doubles value) |
|       | CMP #999, r1 | Is it >= 999? |
|       | BCC loop     | If X < 999 (carry clear), loop again |
|       | RTS          | Finish |
| 110   | 15           | Storage for variable X |
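A line-by-line Python transliteration can confirm what the program leaves behind for the test value 15. It assumes the carry convention in which C is set when a subtraction needs no borrow (as on ARM or the 6502), which is the reading under which BCC repeats the loop while X < 999. It also assumes registers and memory wide enough to hold 1920.

```python
memory = {110: 15}          # address 110 holds X; 15 is the test value from the table

r0 = 0                      # MOVE #0, r0     Count <- 0
r1 = memory[110]            # MOVE 110, r1    copy X into r1
while True:
    r0 += 1                 # ADD #1, r0      Count <- Count + 1
    r2 = memory[110]        # MOVE 110, r2    copy X into r2
    r1 += r2                # ADD r2, r1      r1 <- X + X
    memory[110] = r1        # MOVE r1, 110    store the doubled X
    # CMP #999, r1 works out r1 - 999. Assumed convention: carry SET means no
    # borrow (r1 >= 999), so carry CLEAR means r1 < 999.
    carry = r1 >= 999
    if carry:               # BCC loop: branch back only while carry is clear
        break

print(f"Count = {r0}, X = {memory[110]}")   # prints Count = 7, X = 1920
```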