
CSC Notes Chapter 9 Virtual Memory

Chapter Objectives

  • To describe the benefits of a virtual memory system.
  • To explain the concepts of demand paging, page-replacement algorithms, and allocation of page frames.
  • To discuss the principles of the working-set model.

Background

Normally, it is unnecessary to load the whole program into memory:

Even when the whole program is eventually needed, loading it only partially could still have the following advantages:

Virtual memory allows files and memory to be shared by two or more processes through page sharing. This leads to the following benefits:

Demand Paging

Definition: Load pages only as they are needed instead of the entire program.

A lazy swapper never swaps a page into memory unless that page will be needed. Since we are now viewing a process as a sequence of pages, rather than as one large contiguous address space, use of the term swapper is technically incorrect. A swapper manipulates entire processes, whereas a pager is concerned with the individual pages of a process. We thus use pager, rather than swapper, in connection with demand paging.

Basic Concepts

The procedure for handling a page fault (a sketch in code follows the list):

  1. Check an internal table (usually kept with the process control block) for this process to determine whether the reference was a valid or an invalid memory access.
  2. If the reference was invalid, we terminate the process. If it was valid but we have not yet brought in that page, we now page it in.
  3. We find a free frame (by taking one from the free-frame list, for example).
  4. We schedule a disk operation to read the desired page into the newly allocated frame.
  5. When the disk read is complete, we modify the internal table kept with the process and the page table to indicate that the page is now in memory.
  6. We restart the instruction that was interrupted by the trap. The process can now access the page as though it had always been in memory.
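
A minimal user-space sketch of these steps in C. The table sizes, names, and the idea of "reading from disk" into an array are all illustrative; a real handler lives in the kernel, and replacement (covered later) is omitted here:

    #include <stdbool.h>

    #define NPAGES  8
    #define NFRAMES 4

    typedef struct { bool valid; bool present; int frame; } pte_t;

    static pte_t page_table[NPAGES];   /* per-process table (step 1) */
    static int   frames[NFRAMES];      /* which page each physical frame holds */
    static int   next_free = 0;        /* head of the free-frame list */

    /* Returns false if the reference was invalid (the process would be
       terminated). Assumes a free frame exists -- no replacement yet. */
    bool handle_page_fault(int page) {
        if (!page_table[page].valid)       /* steps 1-2: invalid access */
            return false;
        int f = next_free++;               /* step 3: take a free frame */
        frames[f] = page;                  /* step 4: read the page in from disk */
        page_table[page].present = true;   /* step 5: update the tables */
        page_table[page].frame = f;
        return true;                       /* step 6: restart the instruction */
    }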

Pure demand paging: never bring a page into memory until it is required.

Performance of Demand Paging

The performance of demand paging depends heavily on the page-fault rate: if we cannot keep the page-fault rate very low, performance degrades dramatically.
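
Concretely, with page-fault probability p, the standard model is:

    effective access time = (1 - p) × memory-access time + p × page-fault service time

With a 200 ns memory-access time and an 8 ms average page-fault service time, this is 200 + 7,999,800 × p ns. Even p = 0.001 gives about 8.2 microseconds, a slowdown by a factor of roughly 40.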

Copy-on-Write

With copy-on-write, the child process initially refers to the same address space as its parent, with the shared pages marked copy-on-write. When either process modifies a shared page, the kernel creates a copy of that page for the writing process.
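
A small C program makes the effect visible: after fork(), parent and child share physical pages until one of them writes, at which point the kernel copies the page, so each process keeps its own value. The page copying itself is invisible to the program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        int x = 42;                  /* shared copy-on-write after fork() */
        pid_t pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }
        if (pid == 0) {              /* child: this write triggers the page copy */
            x = 99;
            printf("child:  x = %d\n", x);   /* prints 99 */
            return 0;
        }
        wait(NULL);
        printf("parent: x = %d\n", x);       /* still prints 42 */
        return 0;
    }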

Page Replacement

Page replacement is the solution to over-allocation of memory: the situation in which we need to bring a page in but there is no free frame in memory.

Basic Page Replacement

The page-fault service routine with page replacement:

  1. Find the location of the desired page on the disk.
  2. Find a free frame:
    1. If there is a free frame, use it.
    2. If there is no free frame, use a page-replacement algorithm to select a victim frame.
    3. Write the victim frame to the disk; change the page and frame tables accordingly.
  3. Read the desired page into the newly freed frame; change the page and frame tables.
  4. Restart the user process.

Since this scheme requires two page transfers (one page out, one page in), it may double the effective access time. We can reduce this overhead by using a modify bit (or dirty bit):

  1. Each page or frame has a modify bit associated with it in the hardware.
  2. The modify bit for a page is set by the hardware whenever any word or byte in the page is written into, indicating that the page has been modified
  3. When we select a page for replacement, we examine its modify bit.
  4. If the bit is set, we must write the page to the disk.
  5. Otherwise, we can simply discard the page, since its copy on disk is still current (see the sketch below).
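
A compilable sketch of this check; the frame bookkeeping and the disk "write" are illustrative placeholders:

    #include <stdbool.h>
    #include <stdio.h>

    typedef struct { int page; bool dirty; } frame_t;

    /* Free a victim frame, writing it back only if it was modified. */
    void evict(frame_t *victim) {
        if (victim->dirty) {                 /* modified: must write back */
            printf("writing page %d to disk\n", victim->page);
            victim->dirty = false;
        }
        /* else: the copy on disk is still current -- just discard */
        victim->page = -1;                   /* frame is now free */
    }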

The two major problems in implementing demand paging:

  1. frame-allocation algorithm
  2. page-replacement algorithm

FIFO Page Replacement

Replace the page that has been in memory the longest: pages are evicted in the order in which they were brought in.

Optimal Page Replacement

Replace the page that will not be used for the longest period of time.
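
OPT requires future knowledge, so it cannot be implemented, but it is easy to simulate over a recorded reference string. A minimal sketch of the victim choice:

    /* Among resident pages, pick the frame whose page is next used
       farthest in the future (or never again). Illustrative sketch. */
    int opt_victim(const int refs[], int nrefs, int now,
                   const int frames[], int nframes) {
        int victim = 0, farthest = -1;
        for (int i = 0; i < nframes; i++) {
            int next = nrefs;                    /* assume "never used again" */
            for (int j = now + 1; j < nrefs; j++)
                if (refs[j] == frames[i]) { next = j; break; }
            if (next > farthest) { farthest = next; victim = i; }
        }
        return victim;                           /* index of frame to replace */
    }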

LRU Page Replacement

Use the recent past as an approximation of the near future: replace the page that has not been used for the longest period of time. This is the least-recently-used (LRU) algorithm.

Two ways to implement:

  1. Counters: associate with each page-table entry a time-of-use field and add to the CPU a logical clock or counter. The clock is incremented for every memory reference. Whenever a reference to a page is made, the contents of the clock register are copied to the time-of-use field in the page-table entry for that page.
  2. Stack: keep a stack of page numbers. Whenever a page is referenced, it is removed from the stack and put on the top.
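
A minimal sketch of the counter variant (the stack variant would instead keep a doubly linked list and move each referenced page to the top):

    /* Hypothetical sketch: a logical clock stamps every reference;
       the LRU victim is the frame with the oldest stamp. */
    typedef struct { int page; unsigned long last_used; } frame_t;

    static unsigned long logical_clock = 0;

    void reference(frame_t *f) { f->last_used = ++logical_clock; }

    int lru_victim(const frame_t frames[], int nframes) {
        int victim = 0;
        for (int i = 1; i < nframes; i++)
            if (frames[i].last_used < frames[victim].last_used)
                victim = i;
        return victim;
    }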

LRU-Approximation Page Replacement

Additional-Reference-Bits Algorithm

  1. Keep an 8-bit history byte for each page. At regular intervals, a timer interrupt shifts each history byte right by one bit.
  2. The page's hardware reference bit is copied into the high-order bit of its history byte and then cleared.
  3. The page with the smallest history value (interpreted as an unsigned integer) is the least recently used (a sketch follows).
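
A sketch of the shift step in C; the 8-bit history byte per page is the textbook scheme, while the array names are illustrative:

    #include <stdint.h>

    /* Run at each timer interrupt: shift each page's history right and
       feed the hardware reference bit into the high-order position. */
    void age_pages(uint8_t history[], uint8_t ref_bit[], int npages) {
        for (int i = 0; i < npages; i++) {
            history[i] = (uint8_t)((history[i] >> 1) | (ref_bit[i] << 7));
            ref_bit[i] = 0;              /* clear for the next interval */
        }
    }
    /* A page with history 11000100 has been used more recently than one
       with 01110111; the page with the smallest value is the LRU page. */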

Second-Chance Algorithm

  1. The basic algorithm of second-chance replacement is the FIFO replacement algorithm.
  2. When a page has been selected, we inspect its reference bit.
  3. If the value is 0, we replace the page.
  4. If it is 1, we give the page a second chance by clearing its reference bit and moving on to the next FIFO page.
  5. Thus, when a page gets a second chance, it will not be replaced until all the other pages have been replaced or given second chances (a clock-style sketch follows).
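
A minimal sketch of the usual clock implementation, where a circular pointer sweeps the frames (names are illustrative):

    /* Advance the clock hand until a frame with reference bit 0 is found,
       clearing bits (granting second chances) along the way. */
    int second_chance_victim(int ref[], int nframes, int *hand) {
        for (;;) {
            if (ref[*hand] == 0) {           /* no second chance left */
                int victim = *hand;
                *hand = (*hand + 1) % nframes;
                return victim;
            }
            ref[*hand] = 0;                  /* second chance granted */
            *hand = (*hand + 1) % nframes;
        }
    }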

Enhanced Second-Chance Algorithm

Combine the reference bit and the modify bit to reduce I/O requests:

  1. (0, 0) neither recently used nor modified -- best page to replace.
  2. (0, 1) not recently used but modified -- not quite as good, because the page will need to be written out before replacement.
  3. (1, 0) recently used but clean -- probably will be used again soon.
  4. (1, 1) recently used and modified -- probably will be used again soon, and the page will need to be written out to disk before it can be replaced (a sketch of the class scan follows).
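
A simplified sketch: scan the circular queue once per class and take the first frame of the lowest nonempty class. (The full algorithm may also clear reference bits between passes; that refinement is omitted here.)

    /* Class 0 = (0,0), 1 = (0,1), 2 = (1,0), 3 = (1,1). */
    int page_class(int ref, int dirty) { return (ref << 1) | dirty; }

    int enhanced_victim(const int ref[], const int dirty[],
                        int nframes, int hand) {
        for (int cls = 0; cls <= 3; cls++)
            for (int i = 0; i < nframes; i++) {
                int f = (hand + i) % nframes;
                if (page_class(ref[f], dirty[f]) == cls)
                    return f;               /* first frame in lowest class */
            }
        return hand;                        /* unreachable: some class is nonempty */
    }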

Counting-Based Page Replacement

Keep a counter of the number of references made to each page. LFU replaces the page with the smallest count; MFU replaces the page with the largest count, on the argument that the page with the smallest count was probably just brought in and has yet to be used.

Page-Buffering Algorithms

Keep a pool of free frames, and write modified pages out to disk whenever the paging device is idle; a page selected for replacement is then more likely to be clean and can simply be discarded.

Applications and Page Replacement

Because some applications (databases, for example) can manage their I/O more efficiently than the operating system can for certain workloads, some operating systems give special programs the ability to use a disk partition as a large sequential array of logical blocks, without any file-system data structures.

Allocation of Frames

Minimum Number of Frames

Each process needs some minimum number of frames, defined by the computer architecture: enough to hold every page that a single instruction can reference, since an instruction that faults partway through must be restarted.

Allocation Algorithms

  1. equal allocation: split m frames among n processes equally, keeping the leftover as a free-frame buffer pool.
  2. proportional allocation: allocate available memory to each process according to its size (see the worked example below).
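
For proportional allocation, with s_i the size of process p_i, S the sum of all s_i, and m frames available, process p_i gets a_i = (s_i / S) × m frames. For example, with m = 62 frames and two processes of 10 and 127 pages: a_1 = 10/137 × 62 ≈ 4 frames and a_2 = 127/137 × 62 ≈ 57 frames.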

Global versus Local Allocation

With multiple processes competing for frames, we can classify page-replacement algorithms into two broad categories: global replacement allows a process to select a replacement frame from the set of all frames, even one currently allocated to another process; local replacement requires that each process select only from its own allocated frames.

One problem with global replacement is that a process cannot control its own page-fault rate.

Local replacement might hinder a process, however, by not making available to it other, less used pages of memory. Thus, global replacement generally results in greater system throughput and is therefore the more common method.

Non-Uniform Memory Access

The system boards are interconnected in various ways, ranging from system buses to high-speed network connections like InfiniBand. As you might expect, the CPUs on a particular board can access the memory on that board with less delay than they can access memory on other boards in the system. Systems in which memory access times vary significantly are known collectively as non-uniform memory access (NUMA) systems, and without exception, they are slower than systems in which memory and CPUs are located on the same motherboard.

Thrashing

Cause of Thrashing

Low CPU utilization leads the scheduler to raise the degree of multiprogramming; the new processes take frames from the existing ones, causing more page faults, which lowers CPU utilization further. This feedback loop is thrashing.

A local replacement algorithm (or priority replacement algorithm) limits the damage: if one process starts thrashing, it cannot steal frames from another process and cause the latter to thrash as well.

To prevent thrashing, we must provide a process with as many frames as it needs. We need the locality model of process execution.

Locality model: a locality is a set of pages that are actively used together. As a process executes, it moves from one locality to another, and localities may overlap.

How it works:

  1. Suppose we allocate enough frames to a process to accommodate its current locality.
  2. It will fault for the pages in its locality until all these pages are in memory; then, it will not fault again until it changes localities.
  3. If we do not allocate enough frames to accommodate the size of the current locality, the process will thrash, since it cannot keep in memory all the pages that it is actively using.

Working-Set Model

What is working set:

  1. Working-set model is based on the assumption of locality.
  2. This model uses a parameter, delta, to define the working-set window.
  3. The idea is to examine the most recent delta page references.
  4. The set of pages in the most recent delta page references is the working set.
    • If a page is in active use, it will be in the working set.
    • If a page is no longer being used, it will drop from the working set delta time units after its last reference.

Thus, the working set is an approximation of the program's locality.
The accuracy of the working set depends on the selection of delta:

  1. If delta is too small, it will not encompass the entire locality.
  2. If delta is too large, it may overlap several localities (a sketch of the computation follows).
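
A minimal sketch of computing the working-set size over a recorded reference string; DELTA and MAXPAGE are illustrative choices:

    #define DELTA   10      /* working-set window, in references */
    #define MAXPAGE 128     /* page numbers assumed to be < MAXPAGE */

    /* Working-set size at time t: distinct pages among the last DELTA refs. */
    int working_set_size(const int refs[], int t) {
        int seen[MAXPAGE] = {0};
        int wss = 0;
        int start = (t + 1 > DELTA) ? t - DELTA + 1 : 0;
        for (int i = start; i <= t; i++)
            if (!seen[refs[i]]) { seen[refs[i]] = 1; wss++; }
        return wss;
    }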

How does it work:

  1. OS monitors the working set of each process and allocates to that working set enough frames to provide it with its working-set size.
  2. If there are enough extra frames, another process can be initiated.
  3. If the sum of the working-set sizes increases, exceeding the total number of available frames, the OS selects a process to suspend.
  4. The suspended process's pages are written out (swapped out), and its frames are reallocated to other processes.
  5. The suspended process can be restarted later.

This strategy prevents thrashing while keeping the degree of multiprogramming as high as possible. Thus it optimizes CPU utilization.

It is hard to keep track of the working set exactly, because the working-set window moves with every reference.
We can approximate the working-set model with a fixed-interval timer interrupt and a reference bit: at each timer interrupt, we copy each page's reference bit into its history bits and clear it; when a page fault occurs, the pages whose reference or history bits are set (a value larger than 00, for example) are considered to be in the working set.

Page-Fault Frequency

Page-fault frequency (PFF) is a more direct approach to handling thrashing: establish upper and lower bounds on each process's desired page-fault rate. If a process's actual rate exceeds the upper bound, allocate it another frame; if the rate falls below the lower bound, remove a frame from it.

Memory-Mapped Files

Memory-mapping: instead of using the standard system calls open(), read(), and write() to access files, we can use virtual memory techniques to treat file I/O as routine memory accesses.

Basic Mechanism

Memory-mapping a file is accomplished by mapping a disk block to a page (or pages) in memory.

  1. Initial access to the file proceeds through ordinary demand paging, resulting in a page fault;
  2. a page-sized portion of the file is read from the file system into a physical page;
  3. subsequent reads and writes to the file are handled as routine memory accesses;
  4. this practice simplifies file access and usage by allowing the system to manipulate files through memory rather than incurring the overhead of the read() and write() system calls (see the example below).
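
A small POSIX example in C; mmap() is the real system call here, and each page of the mapping is demand-paged in from the file on first touch:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char *argv[]) {
        if (argc != 2) { fprintf(stderr, "usage: %s file\n", argv[0]); exit(1); }

        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) { perror("open"); exit(1); }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); exit(1); }

        /* Map the whole file; the reads below are ordinary memory accesses. */
        char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED) { perror("mmap"); exit(1); }

        for (off_t i = 0; i < st.st_size; i++)   /* no read() calls needed */
            putchar(data[i]);

        munmap(data, st.st_size);
        close(fd);
        return 0;
    }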

Allocating Kernel Memory

Kernel memory is often allocated from a free-memory pool different from the list(discussed earlier) used to satisfy ordinary user-mode processes. There are two primary reasons for this:

  1. The kernel requests memory for data structures of varying sizes, some of which are less than a page in size; satisfying these requests with whole pages would waste memory through fragmentation.
  2. Pages allocated to user-mode processes do not necessarily have to be in contiguous physical memory. However, certain hardware devices interact directly with physical memory -- without the benefit of the virtual memory interface -- and consequently may require memory residing in physically contiguous pages.

Buddy System

The buddy system allocates memory from a fixed-size segment consisting of physically contiguous pages. Memory is allocated from this segment using a power-of-2 allocator, which satisfies requests in units sized as a power of 2 (4 KB, 8 KB, 16 KB, and so forth). A request in units not appropriately sized is rounded up to the next highest power of 2.

The allocator repeatedly splits a chunk into two equal halves (buddies) until it obtains a block just large enough to satisfy the request.

Drawback: up to 50% internal fragmentation.
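
A sketch of the sizing rule only; a real buddy allocator also keeps per-order free lists and coalesces adjacent buddies on free:

    #include <stdio.h>

    /* Round a request up to the next power of 2, as the buddy system does. */
    static size_t round_up_pow2(size_t n) {
        size_t p = 1;
        while (p < n)
            p <<= 1;
        return p;
    }

    int main(void) {
        size_t request = 21 * 1024;              /* a 21 KB request... */
        size_t block   = round_up_pow2(request); /* ...gets a 32 KB block */
        printf("%zu -> %zu bytes (%.0f%% wasted)\n", request, block,
               100.0 * (double)(block - request) / (double)block);
        return 0;
    }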

Slab Allocation

How it works:

  1. The slab allocator first attempts to satisfy the request with a free object in a partial slab.
  2. If none exists, a free object is assigned from an empty slab.
  3. If no empty slabs are available, a new slab is allocated from contiguous physical pages and assigned to a cache; memory for the object is allocated from this slab (see the sketch below).
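
A heavily simplified, compilable sketch of that lookup order. The structures are illustrative placeholders, not the Linux slab API, and freeing objects back to a slab is omitted:

    #include <stdlib.h>

    typedef struct slab {
        struct slab *next;
        int in_use, capacity;
        char *objects;                   /* objects pre-created in this slab */
    } slab_t;

    typedef struct {
        size_t object_size;
        slab_t *partial;                 /* slabs with some objects in use */
        slab_t *empty;                   /* slabs with all objects free */
    } cache_t;

    static slab_t *new_slab(cache_t *c) {        /* stands in for fresh pages */
        slab_t *s = malloc(sizeof *s);
        s->capacity = 4096 / (int)c->object_size;
        s->in_use = 0;
        s->objects = malloc(4096);
        s->next = NULL;
        return s;
    }

    void *cache_alloc(cache_t *c) {
        slab_t *s = c->partial;                  /* step 1: partial slab */
        if (!s && (s = c->empty) != NULL)        /* step 2: empty slab */
            c->empty = s->next;
        if (!s)
            s = new_slab(c);                     /* step 3: brand-new slab */
        if (s != c->partial) {                   /* adopted slab is now partial */
            s->next = c->partial;
            c->partial = s;
        }
        void *obj = s->objects + s->in_use * c->object_size;
        if (++s->in_use == s->capacity)          /* became full: off the list */
            c->partial = s->next;
        return obj;
    }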

Benefits:

  1. No memory is wasted due to fragmentation, because each slab is divided into chunks exactly the size of the objects it holds.
  2. Memory requests can be satisfied quickly because objects are created in advance and thus can be quickly allocated from the cache.

Other Considerations for Paging Systems

Prepaging

Prepaging is an attempt to prevent the high level of initial paging that occurs when a process starts or is resumed: bring into memory at one time all the pages that will be needed.

Page Size

TLB Reach

Inverted Page Tables

Program Structure

I/O Interlock