Physical memory

    The first computers had no memory abstraction whatsoever: accessing memory meant accessing physical memory at the requested location. Running two programs at the same time then became a challenge, because one program would almost certainly overwrite a memory location the other program used, leading to crashes. One way to overcome this is static relocation, but it is slow and tricky to implement. The operating system could still run multiple programs concurrently even without memory abstractions by swapping: saving the entire contents of memory to a disk file and then bringing in another program, so that only one program has access to memory at a given time. Another problem to take into consideration is that if a user program can access every memory location it can, intentionally or by accident, overwrite the memory of the operating system unless there is some protection mechanism.

    Address space

    So how do we solve protection and relocation of memory? By introducing a new abstraction called the address space. An address space is the set of addresses that a process can use to access memory, and it is unique to that process unless processes deliberately share (parts of) their address spaces. A simple way to map each process' address space onto physical memory is to use two special registers, base and limit. The base register holds the physical address at which a program was loaded and the limit register holds the length of that program. If two programs are loaded into memory at the same time, the second program's base is set just past the end of the first program. Whenever an executing program references memory, the hardware automatically adds the base value to the address before using it; if the resulting address falls outside the program's region (above base plus limit or below base) a fault is generated. The obvious disadvantage is that every memory reference requires this extra check and addition.
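    As a small illustration, the C sketch below mimics what the hardware does for each memory reference under the base/limit scheme. The register values and the order of the check (offset against limit, then adding the base) are assumptions made for the example, not a description of any particular machine.

        /* Minimal sketch of base/limit address translation; the values
         * below are hypothetical, not tied to any real hardware. */
        #include <stdio.h>
        #include <stdlib.h>

        #define BASE  16384u   /* physical address where the program was loaded */
        #define LIMIT 8192u    /* length of the program in bytes */

        /* Translate a program-relative address the way the hardware would:
         * check it against the limit, then add the base register. */
        unsigned translate(unsigned addr)
        {
            if (addr >= LIMIT) {            /* out of bounds: fault */
                fprintf(stderr, "fault: address %u exceeds limit %u\n", addr, LIMIT);
                exit(EXIT_FAILURE);
            }
            return addr + BASE;             /* extra addition on every memory reference */
        }

        int main(void)
        {
            printf("%u\n", translate(100));   /* prints 16484 */
            printf("%u\n", translate(9000));  /* faults: 9000 >= LIMIT */
            return 0;
        }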

    Swapping

    In practice the total memory needed by all processes running concurrently often exceeds the amount of RAM. Two approaches are used to deal with this: swapping and virtual memory. Swapping brings each process into memory in its entirety, runs it for some time, and then puts it back on the disk. Virtual memory, on the other hand, allows a process to run even when only part of it is in memory. So how much memory should be allocated for a process when swapping? If every process is created with a fixed size, the operating system knows how much memory is needed and allocates exactly that amount. But if a process contains dynamic elements that let its data segment grow, the problem becomes much more complicated. If a memory hole, created by swapping different processes in and out, happens to lie next to the process, it can simply be allocated to the process so it can grow. If the process cannot grow in place, it has to be moved to a hole that is big enough, or other processes have to be swapped out to create enough space. If the swap area is also full, a process that cannot grow is suspended until enough space becomes available, or killed. It can therefore be a good idea to allocate some extra memory for each process to reduce the overhead of swapping and moving processes around, but only the memory actually in use should be swapped to disk, not the extra allocation.

    When memory is assigned dynamically the operating system must keep track of it, either with bitmaps or with free lists. With a bitmap, memory is divided into allocation units and each unit corresponds to one bit: 0 if the unit is free, 1 if it is in use. There is a tradeoff in how big these allocation units should be. Alternatively, the operating system can maintain a linked list of allocated and free memory segments. When the list is sorted by address, several algorithms exist for allocating memory. The simplest algorithm is first fit: it scans the list until it finds a hole big enough. A variation is next fit, which remembers where it found the previous suitable hole and starts the next search from that point instead of from the beginning; simulations have shown, however, that next fit performs slightly worse than first fit. Yet another approach is best fit, which searches the whole list and takes the smallest hole that is adequate. An alternative is worst fit, which takes the largest available hole, but it does not perform well either. Quick fit keeps separate lists for commonly requested sizes. All of these algorithms share the disadvantage that when a process is swapped out or terminated, it can be fairly expensive to check whether the newly freed memory can be merged with adjacent holes.
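    The following C sketch shows first fit over such an address-sorted list of holes; the struct layout and function name are invented for the illustration, and a real allocator would also unlink holes that shrink to zero length.

        /* Sketch of first fit over a linked list of free holes, sorted by address. */
        #include <stddef.h>

        struct hole {
            size_t start;        /* start address of the hole */
            size_t length;       /* size of the hole in bytes */
            struct hole *next;   /* next hole, in address order */
        };

        /* Return the start address of the first hole big enough for `size`,
         * carving the allocation off its front; return (size_t)-1 on failure. */
        size_t first_fit(struct hole *list, size_t size)
        {
            for (struct hole *h = list; h != NULL; h = h->next) {
                if (h->length >= size) {
                    size_t addr = h->start;
                    h->start  += size;
                    h->length -= size;
                    return addr;
                }
            }
            return (size_t)-1;   /* no hole large enough */
        }

    Next fit would additionally remember the node where this search stopped and begin the following search there instead of at the head of the list.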

    Segmentation

    A segment is an independent address space, used among other things to manage growing and shrinking data structures. It also simplifies linking: if each procedure occupies its own segment, the start address is the same for all procedures (0), and only the segment number is needed to call a procedure. If one procedure changes size, no other procedure needs to change, because start addresses stay the same. Without this technique, procedures may be packed tightly next to each other, and changing the size of one procedure shifts the whole layout. A segment consists of a linear sequence of addresses from 0 up to some maximum, which may change during execution. To specify an address, the program must supply a segment number and an address within that segment. Segmentation also makes it easier to share data and procedures between processes.
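    A minimal sketch of how a (segment, offset) pair could be checked and translated through a segment table; the table contents, segment roles, and limits below are invented example values.

        /* Sketch of segment-based address translation with made-up values. */
        #include <stdio.h>

        struct segment { unsigned base; unsigned limit; };

        /* Hypothetical segment table: segment 0 = code, 1 = data, 2 = stack. */
        static const struct segment seg_table[] = {
            { 0x0000, 0x4000 },   /* code  */
            { 0x8000, 0x2000 },   /* data  */
            { 0xC000, 0x1000 },   /* stack */
        };

        int translate(unsigned seg, unsigned offset, unsigned *phys)
        {
            if (seg >= 3 || offset >= seg_table[seg].limit)
                return -1;                         /* invalid segment or offset */
            *phys = seg_table[seg].base + offset;  /* offsets always start at 0 */
            return 0;
        }

        int main(void)
        {
            unsigned p;
            if (translate(1, 0x10, &p) == 0)
                printf("data segment offset 0x10 -> physical 0x%X\n", p);
            return 0;
        }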

    Virtual memory

    The idea behind virtual memory is that each program has its own address space, broken up into pieces called pages. Each page is a contiguous range of addresses, and pages are mapped onto physical memory. Not all pages need to be in physical memory at the same time: when a referenced address is on a page that is present, the mapping is done automatically by the hardware, and when it is on a page that is not in physical memory, the operating system is alerted and must bring the page in.

    Paging

    The addresses a program generates form its virtual address space. In computers without virtual memory these addresses are sent directly to the memory bus without any alteration, so they address physical memory directly. In computers with virtual memory, addresses are instead sent to an MMU (memory management unit) that maps them to the correct physical addresses. A page is a fixed-size unit that the virtual address space is divided into, while page frames are fixed-size units of the physical address space, usually of the same size as pages. With 64KB of virtual memory and 32KB of physical memory we cannot keep the whole virtual address space in memory; with a page size of 4KB we get 16 virtual pages and 8 page frames. When addresses that are currently on disk are needed, the whole page containing them is swapped in.

    A 4KB page covers 4096 addresses. When address 0 is accessed, the MMU sees that it lies on page 0 (addresses 0-4095), and, depending on the mapping, that page may be mapped to page frame 1, transforming the access into the physical address range 4096-8191 (assuming pages and frames have the same size). The relation between virtual pages and page frames is given by the page table. Because the example above has only 8 page frames but 16 pages, 8 of the pages are unmapped at any given time. A present/absent bit records whether a virtual page is currently mapped into physical memory. When an unmapped page is referenced, the MMU notices this and causes the CPU to trap to the operating system; this is a page fault. The operating system then evicts a little-used page from its frame (marking it as unmapped) and loads the referenced page into that frame (marking it as mapped). A virtual address is split into a virtual page number and an offset; with 16 virtual pages, four bits are needed for the page number. A typical page table entry contains a page frame number together with present/absent, caching, referenced, modified and protection bits. The referenced bit plays an important role in many page replacement algorithms.
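    The sketch below replays the 16-page/8-frame example in C with an assumed page table; a real MMU does this in hardware, and a real page table entry carries more bits than just the frame number.

        /* Sketch of the 16-page / 8-frame example: 64 KB virtual, 32 KB
         * physical, 4 KB pages. Page table contents are illustrative. */
        #include <stdio.h>

        #define PAGE_SIZE   4096u
        #define NUM_PAGES   16
        #define NOT_PRESENT 0xFF

        /* page_table[vpn] holds a frame number, or NOT_PRESENT if the
         * page is not mapped (present/absent bit clear). */
        static unsigned char page_table[NUM_PAGES] = {
            1, NOT_PRESENT, 3, 0, 4, NOT_PRESENT, 6, NOT_PRESENT,
            NOT_PRESENT, 5, NOT_PRESENT, 7, NOT_PRESENT, NOT_PRESENT, 2, NOT_PRESENT,
        };

        int translate(unsigned vaddr, unsigned *paddr)
        {
            unsigned vpn    = vaddr / PAGE_SIZE;   /* virtual page number (top 4 bits) */
            unsigned offset = vaddr % PAGE_SIZE;   /* offset within the page (low 12 bits) */
            if (page_table[vpn] == NOT_PRESENT)
                return -1;                         /* page fault: OS must load the page */
            *paddr = page_table[vpn] * PAGE_SIZE + offset;
            return 0;
        }

        int main(void)
        {
            unsigned p;
            if (translate(0, &p) == 0)
                printf("virtual 0 -> physical %u\n", p);   /* page 0 -> frame 1 -> 4096 */
            return 0;
        }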

    Performance

    Paging can become a large bottleneck: the more bits used for page numbers and offsets, the larger the page table. For example a 32-bit address space with 4KB pages has about one million pages. To speed things up, a TLB may be used: a small piece of hardware that maps virtual addresses to physical addresses without going through the page table. This usually works well because most programs tend to reference only a small portion of their pages heavily, while the rest are barely used at all. The TLB usually has between 8 and 256 entries, each containing the same fields as a page table entry plus the virtual page number (which the page table itself does not need, since it is implicit in the index). When a virtual address arrives at the MMU for translation, it is compared against every TLB entry simultaneously; if a match is found, no page table access is needed. On a miss, a page table lookup is done and the result replaces one of the TLB entries. TLBs managed in software turn out to give surprisingly acceptable performance, provided the TLB is reasonably large. With software TLB management, a soft miss is when the page is not in the TLB but is in memory, while a hard miss is when the page is neither in the TLB nor in memory.
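    A rough sketch of the TLB-then-page-table lookup path, assuming a tiny software-managed TLB with round-robin replacement; the entry layout, sizes, and names are illustrative, and the page table walk is left as a caller-supplied function.

        /* Sketch of consulting a small TLB before the page table. */
        #define TLB_ENTRIES 8

        struct tlb_entry {
            unsigned vpn;        /* virtual page number being cached */
            unsigned frame;      /* corresponding page frame */
            int      valid;      /* entry in use? */
        };

        static struct tlb_entry tlb[TLB_ENTRIES];
        static unsigned next_victim;   /* trivial round-robin replacement */

        /* Return the frame for vpn, consulting the TLB first and falling
         * back to a (hypothetical) page table walk on a miss. */
        unsigned lookup(unsigned vpn, unsigned (*walk_page_table)(unsigned))
        {
            for (int i = 0; i < TLB_ENTRIES; i++)          /* hardware checks these in parallel */
                if (tlb[i].valid && tlb[i].vpn == vpn)
                    return tlb[i].frame;                   /* TLB hit: no page table access */

            unsigned frame = walk_page_table(vpn);         /* miss: full page table lookup */
            tlb[next_victim] = (struct tlb_entry){ vpn, frame, 1 };
            next_victim = (next_victim + 1) % TLB_ENTRIES; /* evict one entry */
            return frame;
        }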

    To deal with large virtual address spaces one can use a multilevel page table, in which the entries of a top-level page table point to second-level page tables, so that only the parts of the table that are actually in use need to be kept in memory.
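    A sketch of a two-level lookup under an assumed 10/10/12 split of a 32-bit address (1024-entry tables and 4KB pages); using 0 to mean "not present" and storing bare frame numbers are simplifications for the illustration.

        /* Sketch of a two-level page table lookup for a 32-bit address. */
        #include <stdint.h>

        #define PT_ENTRIES 1024
        #define PAGE_SIZE  4096u

        /* top_level[i] points to a second-level table, or is NULL if that
         * 4 MB region of the address space has no pages mapped. */
        static uint32_t *top_level[PT_ENTRIES];

        int translate(uint32_t vaddr, uint32_t *paddr)
        {
            uint32_t pt1    = (vaddr >> 22) & 0x3FF;   /* index into top-level table */
            uint32_t pt2    = (vaddr >> 12) & 0x3FF;   /* index into second-level table */
            uint32_t offset =  vaddr        & 0xFFF;   /* offset within the page */

            uint32_t *second = top_level[pt1];
            if (second == 0 || second[pt2] == 0)
                return -1;                             /* page fault */
            *paddr = second[pt2] * PAGE_SIZE + offset; /* entry holds the frame number */
            return 0;
        }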

    Replacement algorithms