@node Reference Guide @appendix Reference Guide This chapter is a reference for the Pintos code. The reference guide does not cover all of the code in Pintos, but it does cover those pieces that students most often find troublesome. You may find that you want to read each part of the reference guide as you work on the task where it becomes important. We recommend using ``tags'' to follow along with references to function and variable names (@pxref{Tags}). @menu * Pintos Loading:: * Threads:: * Synchronization:: * Interrupt Handling:: * Memory Allocation:: * Virtual Addresses:: * Page Table:: * Hash Table:: @end menu @node Pintos Loading @section Loading This section covers the Pintos loader and basic kernel initialization. @menu * Pintos Loader:: * Low-Level Kernel Initialization:: * High-Level Kernel Initialization:: * Physical Memory Map:: @end menu @node Pintos Loader @subsection The Loader The first part of Pintos that runs is the loader, in @file{threads/loader.S}. The PC BIOS loads the loader into memory. The loader, in turn, is responsible for finding the kernel on disk, loading it into memory, and then jumping to its start. It's not important to understand exactly how the loader works, but if you're interested, read on. You should probably read along with the loader's source. You should also understand the basics of the 80@var{x}86 architecture as described by chapter 3, ``Basic Execution Environment,'' of @bibref{IA32-v1}. The PC BIOS loads the loader from the first sector of the first hard disk, called the @dfn{master boot record} (MBR). PC conventions reserve 64 bytes of the MBR for the partition table, and Pintos uses about 128 additional bytes for kernel command-line arguments. This leaves a little over 300 bytes for the loader's own code. This is a severe restriction that means, practically speaking, the loader must be written in assembly language. The Pintos loader and kernel don't have to be on the same disk, nor is the kernel required to be in any particular location on a given disk. The loader's first job, then, is to find the kernel by reading the partition table on each hard disk, looking for a bootable partition of the type used for a Pintos kernel. When the loader finds a bootable kernel partition, it reads the partition's contents into memory at physical address @w{128 kB}. The kernel is at the beginning of the partition, which might be larger than necessary due to partition boundary alignment conventions, so the loader reads no more than @w{512 kB} (and the Pintos build process will refuse to produce kernels larger than that). Reading more data than this would cross into the region from @w{640 kB} to @w{1 MB} that the PC architecture reserves for hardware and the BIOS, and a standard PC BIOS does not provide any means to load the kernel above @w{1 MB}. The loader's final job is to extract the entry point from the loaded kernel image and transfer control to it. The entry point is not at a predictable location, but the kernel's ELF header contains a pointer to it. The loader extracts the pointer and jumps to the location it points to. The Pintos kernel command line is stored in the boot loader. The @command{pintos} program actually modifies a copy of the boot loader on disk each time it runs the kernel, inserting whatever command-line arguments the user supplies to the kernel, and then the kernel at boot time reads those arguments out of the boot loader in memory. This is not an elegant solution, but it is simple and effective. @node Low-Level Kernel Initialization @subsection Low-Level Kernel Initialization The loader's last action is to transfer control to the kernel's entry point, which is @func{start} in @file{threads/start.S}. The job of this code is to switch the CPU from legacy 16-bit ``real mode'' into the 32-bit ``protected mode'' used by all modern 80@var{x}86 operating systems. The startup code's first task is actually to obtain the machine's memory size, by asking the BIOS for the PC's memory size. The simplest BIOS function to do this can only detect up to 64 MB of RAM, so that's the practical limit that Pintos can support. The function stores the memory size, in pages, in global variable @code{init_ram_pages}. The first part of CPU initialization is to enable the A20 line, that is, the CPU's address line numbered 20. For historical reasons, PCs boot with this address line fixed at 0, which means that attempts to access memory beyond the first 1 MB (2 raised to the 20th power) will fail. Pintos wants to access more memory than this, so we have to enable it. Next, the loader creates a basic page table. This page table maps the 64 MB at the base of virtual memory (starting at virtual address 0) directly to the identical physical addresses. It also maps the same physical memory starting at virtual address @code{LOADER_PHYS_BASE}, which defaults to @t{0xc0000000} (3 GB). The Pintos kernel only wants the latter mapping, but there's a chicken-and-egg problem if we don't include the former: our current virtual address is roughly @t{0x20000}, the location where the loader put us, and we can't jump to @t{0xc0020000} until we turn on the page table, but if we turn on the page table without jumping there, then we've just pulled the rug out from under ourselves. After the page table is initialized, we load the CPU's control registers to turn on protected mode and paging, and set up the segment registers. We aren't yet equipped to handle interrupts in protected mode, so we disable interrupts. The final step is to call @func{main}. @node High-Level Kernel Initialization @subsection High-Level Kernel Initialization The kernel proper starts with the @func{main} function. The @func{main} function is written in C, as will be most of the code we encounter in Pintos from here on out. When @func{main} starts, the system is in a pretty raw state. We're in 32-bit protected mode with paging enabled, but hardly anything else is ready. Thus, the @func{main} function consists primarily of calls into other Pintos modules' initialization functions. These are usually named @func{@var{module}_init}, where @var{module} is the module's name, @file{@var{module}.c} is the module's source code, and @file{@var{module}.h} is the module's header. The first step in @func{main} is to call @func{bss_init}, which clears out the kernel's ``BSS'', which is the traditional name for a segment that should be initialized to all zeros. In most C implementations, whenever you declare a variable outside a function without providing an initializer, that variable goes into the BSS. Because it's all zeros, the BSS isn't stored in the image that the loader brought into memory. We just use @func{memset} to zero it out. Next, @func{main} calls @func{read_command_line} to break the kernel command line into arguments, then @func{parse_options} to read any options at the beginning of the command line. (Actions specified on the command line execute later.) @func{thread_init} initializes the thread system. We will defer full discussion to our discussion of Pintos threads below. It is called so early in initialization because a valid thread structure is a prerequisite for acquiring a lock, and lock acquisition in turn is important to other Pintos subsystems. Then we initialize the console and print a startup message to the console. The next block of functions we call initializes the kernel's memory system. @func{palloc_init} sets up the kernel page allocator, which doles out memory one or more pages at a time (@pxref{Page Allocator}). @func{malloc_init} sets up the allocator that handles allocations of arbitrary-size blocks of memory (@pxref{Block Allocator}). @func{paging_init} sets up a page table for the kernel (@pxref{Page Table}). In tasks 2 and later, @func{main} also calls @func{tss_init} and @func{gdt_init}. The next set of calls initializes the interrupt system. @func{intr_init} sets up the CPU's @dfn{interrupt descriptor table} (IDT) to ready it for interrupt handling (@pxref{Interrupt Infrastructure}), then @func{timer_init} and @func{kbd_init} prepare for handling timer interrupts and keyboard interrupts, respectively. @func{input_init} sets up to merge serial and keyboard input into one stream. In tasks 2 and later, we also prepare to handle interrupts caused by user programs using @func{exception_init} and @func{syscall_init}. Now that interrupts are set up, we can start the scheduler with @func{thread_start}, which creates the idle thread and enables interrupts. With interrupts enabled, interrupt-driven serial port I/O becomes possible, so we use @func{serial_init_queue} to switch to that mode. Finally, @func{timer_calibrate} calibrates the timer for accurate short delays. If the file system is compiled in, as it will starting in task 2, we initialize the IDE disks with @func{ide_init}, then the file system with @func{filesys_init}. The Pintos boot is then complete, so we print a message. Function @func{run_actions} now parses and executes actions specified on the kernel command line, such as @command{run} to run a test (in task 1) or a user program (in later tasks). Finally, if @option{-q} was specified on the kernel command line, we call @func{shutdown_power_off} to terminate the machine simulator. Otherwise, @func{main} calls @func{thread_exit}, which allows any other running threads to continue running. @node Physical Memory Map @subsection Physical Memory Map @multitable {@t{00000000}--@t{00000000}} {Hardware} {Some much longer explanatory text} @headitem Memory Range @tab Owner @tab Contents @item @t{00000000}--@t{000003ff} @tab CPU @tab Real mode interrupt table. @item @t{00000400}--@t{000005ff} @tab BIOS @tab Miscellaneous data area. @item @t{00000600}--@t{00007bff} @tab --- @tab --- @item @t{00007c00}--@t{00007dff} @tab Pintos @tab Loader. @item @t{0000e000}--@t{0000efff} @tab Pintos @tab Stack for loader; kernel stack and @struct{thread} for initial kernel thread. @item @t{0000f000}--@t{0000ffff} @tab Pintos @tab Page directory for startup code. @item @t{00010000}--@t{00020000} @tab Pintos @tab Page tables for startup code. @item @t{00020000}--@t{0009ffff} @tab Pintos @tab Kernel code, data, and uninitialized data segments. @item @t{000a0000}--@t{000bffff} @tab Video @tab VGA display memory. @item @t{000c0000}--@t{000effff} @tab Hardware @tab Reserved for expansion card RAM and ROM. @item @t{000f0000}--@t{000fffff} @tab BIOS @tab ROM BIOS. @item @t{00100000}--@t{03ffffff} @tab Pintos @tab Dynamic memory allocation. @end multitable @node Threads @section Threads @menu * struct thread:: * Thread Functions:: * Thread Switching:: @end menu @node struct thread @subsection @code{struct thread} The main Pintos data structure for threads is @struct{thread}, declared in @file{threads/thread.h}. @deftp {Structure} {struct thread} Represents a thread or a user process. In the tasks, you will have to add your own members to @struct{thread}. You may also change or delete the definitions of existing members. Every @struct{thread} occupies the beginning of its own page of memory. The rest of the page is used for the thread's stack, which grows downward from the end of the page. It looks like this: @example @group 4 kB +---------------------------------+ | kernel stack | | | | | | | | V | | grows downward | | | | | | | | | | | | | | | | | sizeof (struct thread) +---------------------------------+ | magic | | : | | : | | status | | tid | 0 kB +---------------------------------+ @end group @end example This has two consequences. First, @struct{thread} must not be allowed to grow too big. If it does, then there will not be enough room for the kernel stack. The base @struct{thread} is only a few bytes in size. It probably should stay well under 1 kB. Second, kernel stacks must not be allowed to grow too large. If a stack overflows, it will corrupt the thread state. Thus, kernel functions should not allocate large structures or arrays as non-static local variables. Use dynamic allocation with @func{malloc} or @func{palloc_get_page} instead (@pxref{Memory Allocation}). @end deftp @deftypecv {Member} {@struct{thread}} {tid_t} tid The thread's thread identifier or @dfn{tid}. Every thread must have a tid that is unique over the entire lifetime of the kernel. By default, @code{tid_t} is a @code{typedef} for @code{int} and each new thread receives the numerically next higher tid, starting from 1 for the initial process. You can change the type and the numbering scheme if you like. @end deftypecv @deftypecv {Member} {@struct{thread}} {enum thread_status} status @anchor{Thread States} The thread's state, one of the following: @defvr {Thread State} @code{THREAD_RUNNING} The thread is running. Exactly one thread is running at a given time. @func{thread_current} returns the running thread. @end defvr @defvr {Thread State} @code{THREAD_READY} The thread is ready to run, but it's not running right now. The thread could be selected to run the next time the scheduler is invoked. Ready threads are kept in a doubly linked list called @code{ready_list}. @end defvr @defvr {Thread State} @code{THREAD_BLOCKED} The thread is waiting for something, e.g.@: a lock to become available, an interrupt to be invoked. The thread won't be scheduled again until it transitions to the @code{THREAD_READY} state with a call to @func{thread_unblock}. This is most conveniently done indirectly, using one of the Pintos synchronization primitives that block and unblock threads automatically (@pxref{Synchronization}). There is no @i{a priori} way to tell what a blocked thread is waiting for, but a backtrace can help (@pxref{Backtraces}). @end defvr @defvr {Thread State} @code{THREAD_DYING} The thread will be destroyed by the scheduler after switching to the next thread. @end defvr @end deftypecv @deftypecv {Member} {@struct{thread}} {char} name[16] The thread's name as a string, or at least the first few characters of it. @end deftypecv @deftypecv {Member} {@struct{thread}} {uint8_t *} stack Every thread has its own stack to keep track of its state. When the thread is running, the CPU's stack pointer register tracks the top of the stack and this member is unused. But when the CPU switches to another thread, this member saves the thread's stack pointer. No other members are needed to save the thread's registers, because the other registers that must be saved are saved on the stack. When an interrupt occurs, whether in the kernel or a user program, an @struct{intr_frame} is pushed onto the stack. When the interrupt occurs in a user program, the @struct{intr_frame} is always at the very top of the page. @xref{Interrupt Handling}, for more information. @end deftypecv @deftypecv {Member} {@struct{thread}} {int} priority A thread priority, ranging from @code{PRI_MIN} (0) to @code{PRI_MAX} (63). Lower numbers correspond to lower priorities, so that priority 0 is the lowest priority and priority 63 is the highest. Pintos, as initially provided, ignores thread priorities, but you will implement priority scheduling in task 1 (@pxref{Priority Scheduling}). @end deftypecv @deftypecv {Member} {@struct{thread}} {@struct{list_elem}} allelem This ``list element'' is used to link the thread into the list of all threads. Each thread is inserted into this list when it is created and removed when it exits. The @func{thread_foreach} function should be used to iterate over all threads. @end deftypecv @deftypecv {Member} {@struct{thread}} {@struct{list_elem}} elem A ``list element'' used to put the thread into doubly linked lists, either @code{ready_list} (the list of threads ready to run) or a list of threads waiting on a semaphore in @func{sema_down}. It can do double duty because a thread waiting on a semaphore is not ready, and vice versa. @end deftypecv @deftypecv {Member} {@struct{thread}} {uint32_t *} pagedir Only present in task 2 and later. @xref{Page Tables}. @end deftypecv @deftypecv {Member} {@struct{thread}} {unsigned} magic Always set to @code{THREAD_MAGIC}, which is just an arbitrary number defined in @file{threads/thread.c}, and used to detect stack overflow. @func{thread_current} checks that the @code{magic} member of the running thread's @struct{thread} is set to @code{THREAD_MAGIC}. Stack overflow tends to change this value, triggering the assertion. For greatest benefit, as you add members to @struct{thread}, leave @code{magic} at the end. @end deftypecv @node Thread Functions @subsection Thread Functions @file{threads/thread.c} implements several public functions for thread support. Let's take a look at the most useful: @deftypefun void thread_init (void) Called by @func{main} to initialize the thread system. Its main purpose is to create a @struct{thread} for Pintos's initial thread. This is possible because the Pintos loader puts the initial thread's stack at the top of a page, in the same position as any other Pintos thread. Before @func{thread_init} runs, @func{thread_current} will fail because the running thread's @code{magic} value is incorrect. Lots of functions call @func{thread_current} directly or indirectly, including @func{lock_acquire} for locking a lock, so @func{thread_init} is called early in Pintos initialization. @end deftypefun @deftypefun void thread_start (void) Called by @func{main} to start the scheduler. Creates the idle thread, that is, the thread that is scheduled when no other thread is ready. Then enables interrupts, which as a side effect enables the scheduler because the scheduler runs on return from the timer interrupt, using @func{intr_yield_on_return} (@pxref{External Interrupt Handling}). @end deftypefun @deftypefun void thread_tick (void) Called by the timer interrupt at each timer tick. It keeps track of thread statistics and triggers the scheduler when a time slice expires. @end deftypefun @deftypefun void thread_print_stats (void) Called during Pintos shutdown to print thread statistics. @end deftypefun @deftypefun tid_t thread_create (const char *@var{name}, int @var{priority}, thread_func *@var{func}, void *@var{aux}) Creates and starts a new thread named @var{name} with the given @var{priority}, returning the new thread's tid. The thread executes @var{func}, passing @var{aux} as the function's single argument. @func{thread_create} allocates a page for the thread's @struct{thread} and stack and initializes its members, then it sets up a set of fake stack frames for it (@pxref{Thread Switching}). The thread is initialized in the blocked state, then unblocked just before returning, which allows the new thread to be scheduled (@pxref{Thread States}). @deftp {Type} {void thread_func (void *@var{aux})} This is the type of the function passed to @func{thread_create}, whose @var{aux} argument is passed along as the function's argument. @end deftp @end deftypefun @deftypefun void thread_block (void) Transitions the running thread from the running state to the blocked state (@pxref{Thread States}). The thread will not run again until @func{thread_unblock} is called on it, so you'd better have some way arranged for that to happen. Because @func{thread_block} is so low-level, you should prefer to use one of the synchronization primitives instead (@pxref{Synchronization}). @end deftypefun @deftypefun void thread_unblock (struct thread *@var{thread}) Transitions @var{thread}, which must be in the blocked state, to the ready state, allowing it to resume running (@pxref{Thread States}). This is called when the event that the thread is waiting for occurs, e.g.@: when the lock that the thread is waiting on becomes available. @end deftypefun @deftypefun {struct thread *} thread_current (void) Returns the running thread. @end deftypefun @deftypefun {tid_t} thread_tid (void) Returns the running thread's thread id. Equivalent to @code{thread_current ()->tid}. @end deftypefun @deftypefun {const char *} thread_name (void) Returns the running thread's name. Equivalent to @code{thread_current ()->name}. @end deftypefun @deftypefun void thread_exit (void) @code{NO_RETURN} Causes the current thread to exit. Never returns, hence @code{NO_RETURN} (@pxref{Function and Parameter Attributes}). @end deftypefun @deftypefun void thread_yield (void) Yields the CPU to the scheduler, which picks a new thread to run. The new thread might be the current thread, so you can't depend on this function to keep this thread from running for any particular length of time. @end deftypefun @deftypefun void thread_foreach (thread_action_func *@var{action}, void *@var{aux}) Iterates over all threads @var{t} and invokes @code{action(t, aux)} on each. @var{action} must refer to a function that matches the signature given by @func{thread_action_func}: @deftp {Type} {void thread_action_func (struct thread *@var{thread}, void *@var{aux})} Performs some action on a thread, given @var{aux}. @end deftp @end deftypefun @deftypefun int thread_get_priority (void) @deftypefunx void thread_set_priority (int @var{new_priority}) Stub to set and get thread priority. @xref{Priority Scheduling}. @end deftypefun @deftypefun int thread_get_nice (void) @deftypefunx void thread_set_nice (int @var{new_nice}) @deftypefunx int thread_get_recent_cpu (void) @deftypefunx int thread_get_load_avg (void) Stubs for the advanced scheduler. @xref{4.4BSD Scheduler}. @end deftypefun @node Thread Switching @subsection Thread Switching @func{schedule} is responsible for switching threads. It is internal to @file{threads/thread.c} and called only by the three public thread functions that need to switch threads: @func{thread_block}, @func{thread_exit}, and @func{thread_yield}. Before any of these functions call @func{schedule}, they disable interrupts (or ensure that they are already disabled) and then change the running thread's state to something other than running. @func{schedule} is short but tricky. It records the current thread in local variable @var{cur}, determines the next thread to run as local variable @var{next} (by calling @func{next_thread_to_run}), and then calls @func{switch_threads} to do the actual thread switch. The thread we switched to was also running inside @func{switch_threads}, as are all the threads not currently running, so the new thread now returns out of @func{switch_threads}, returning the previously running thread. @func{switch_threads} is an assembly language routine in @file{threads/switch.S}. It saves registers on the stack, saves the CPU's current stack pointer in the current @struct{thread}'s @code{stack} member, restores the new thread's @code{stack} into the CPU's stack pointer, restores registers from the stack, and returns. The rest of the scheduler is implemented in @func{thread_schedule_tail}. It marks the new thread as running. If the thread we just switched from is in the dying state, then it also frees the page that contained the dying thread's @struct{thread} and stack. These couldn't be freed prior to the thread switch because the switch needed to use it. Running a thread for the first time is a special case. When @func{thread_create} creates a new thread, it goes through a fair amount of trouble to get it started properly. In particular, the new thread hasn't started running yet, so there's no way for it to be running inside @func{switch_threads} as the scheduler expects. To solve the problem, @func{thread_create} creates some fake stack frames in the new thread's stack: @itemize @bullet @item The topmost fake stack frame is for @func{switch_threads}, represented by @struct{switch_threads_frame}. The important part of this frame is its @code{eip} member, the return address. We point @code{eip} to @func{switch_entry}, indicating it to be the function that called @func{switch_threads}. @item The next fake stack frame is for @func{switch_entry}, an assembly language routine in @file{threads/switch.S} that adjusts the stack pointer,@footnote{This is because @func{switch_threads} takes arguments on the stack and the 80@var{x}86 SVR4 calling convention requires the caller, not the called function, to remove them when the call is complete. See @bibref{SysV-i386} chapter 3 for details.} calls @func{thread_schedule_tail} (this special case is why @func{thread_schedule_tail} is separate from @func{schedule}), and returns. We fill in its stack frame so that it returns into @func{kernel_thread}, a function in @file{threads/thread.c}. @item The final stack frame is for @func{kernel_thread}, which enables interrupts and calls the thread's function (the function passed to @func{thread_create}). If the thread's function returns, it calls @func{thread_exit} to terminate the thread. @end itemize @node Synchronization @section Synchronization If sharing of resources between threads is not handled in a careful, controlled fashion, the result is usually a big mess. This is especially the case in operating system kernels, where faulty sharing can crash the entire machine. Pintos provides several synchronization primitives to help out. @cartouche @noindent@strong{Important:} For the scope of all Pintos tasks, you may assume that any 1, 2 or 4 byte read or write operation on aligned memory is atomic. All other read or write operations could potentially be interrupted or descheduled. @end cartouche @menu * Disabling Interrupts:: * Semaphores:: * Locks:: * Monitors:: * Optimization Barriers:: @end menu @node Disabling Interrupts @subsection Disabling Interrupts The crudest way to do synchronization is to disable interrupts, that is, to temporarily prevent the CPU from responding to interrupts. If interrupts are off, no other thread will preempt the running thread, because thread preemption is driven by the timer interrupt. If interrupts are on, as they normally are, then the running thread may be preempted by another at any time, whether between two C statements or even within the execution of one. Incidentally, this means that Pintos is a ``preemptible kernel,'' that is, kernel threads can be preempted at any time. Traditional Unix systems are ``nonpreemptible,'' that is, kernel threads can only be preempted at points where they explicitly call into the scheduler. (User programs can be preempted at any time in both models.) As you might imagine, preemptible kernels require more explicit synchronization. You should have little need to set the interrupt state directly. Most of the time you should use the other synchronization primitives described in the following sections. The main reason to disable interrupts is to synchronize kernel threads with external interrupt handlers, which cannot sleep and thus cannot use most other forms of synchronization (@pxref{External Interrupt Handling}). Some external interrupts cannot be postponed, even by disabling interrupts. These interrupts, called @dfn{non-maskable interrupts} (NMIs), are supposed to be used only in emergencies, e.g.@: when the computer is on fire. Pintos does not handle non-maskable interrupts. Types and functions for disabling and enabling interrupts are in @file{threads/interrupt.h}. @deftp Type {enum intr_level} One of @code{INTR_OFF} or @code{INTR_ON}, denoting that interrupts are disabled or enabled, respectively. @end deftp @deftypefun {enum intr_level} intr_get_level (void) Returns the current interrupt state. @end deftypefun @deftypefun {enum intr_level} intr_set_level (enum intr_level @var{level}) Turns interrupts on or off according to @var{level}. Returns the previous interrupt state. @end deftypefun @deftypefun {enum intr_level} intr_enable (void) Turns interrupts on. Returns the previous interrupt state. @end deftypefun @deftypefun {enum intr_level} intr_disable (void) Turns interrupts off. Returns the previous interrupt state. @end deftypefun @node Semaphores @subsection Semaphores A @dfn{semaphore} is a nonnegative integer together with two operators that manipulate it atomically, which are: @itemize @bullet @item ``Down'' or ``P'': wait for the value to become positive, then decrement it. @item ``Up'' or ``V'': increment the value (and wake up one waiting thread, if any). @end itemize A semaphore initialized to 0 may be used to wait for an event that will happen exactly once. For example, suppose thread @var{A} starts another thread @var{B} and wants to wait for @var{B} to signal that some activity is complete. @var{A} can create a semaphore initialized to 0, pass it to @var{B} as it starts it, and then ``down'' the semaphore. When @var{B} finishes its activity, it ``ups'' the semaphore. This works regardless of whether @var{A} ``downs'' the semaphore or @var{B} ``ups'' it first. A semaphore initialized to 1 is typically used for controlling access to a resource. Before a block of code starts using the resource, it ``downs'' the semaphore, then after it is done with the resource it ``ups'' the resource. In such a case a lock, described below, may be more appropriate. Semaphores can also be initialized to values larger than 1. These are rarely used. Semaphores were invented by Edsger Dijkstra and first used in the THE operating system (@bibref{Dijkstra}). Pintos' semaphore type and operations are declared in @file{threads/synch.h}. @deftp {Type} {struct semaphore} Represents a semaphore. @end deftp @deftypefun void sema_init (struct semaphore *@var{sema}, unsigned @var{value}) Initializes @var{sema} as a new semaphore with the given initial @var{value}. @end deftypefun @deftypefun void sema_down (struct semaphore *@var{sema}) Executes the ``down'' or ``P'' operation on @var{sema}, waiting for its value to become positive and then decrementing it by one. @end deftypefun @deftypefun bool sema_try_down (struct semaphore *@var{sema}) Tries to execute the ``down'' or ``P'' operation on @var{sema}, without waiting. Returns true if @var{sema} was successfully decremented, or false if it was already zero and thus could not be decremented without waiting. Calling this function in a tight loop wastes CPU time, so use @func{sema_down} or find a different approach instead. @end deftypefun @deftypefun void sema_up (struct semaphore *@var{sema}) Executes the ``up'' or ``V'' operation on @var{sema}, incrementing its value. If any threads are waiting on @var{sema}, wakes one of them up. Unlike most synchronization primitives, @func{sema_up} may be called inside an external interrupt handler (@pxref{External Interrupt Handling}). @end deftypefun Semaphores are internally implemented by disabling interrupts (@pxref{Disabling Interrupts}) and blocking and unblocking threads (@func{thread_block} and @func{thread_unblock}). Each semaphore maintains a list of waiting threads, using the linked list implementation in @file{lib/kernel/list.c}. @node Locks @subsection Locks A @dfn{lock} is like a semaphore with an initial value of 1 (@pxref{Semaphores}). A lock's equivalent of ``up'' is called ``release'', and the ``down'' operation is called ``acquire''. Compared to a semaphore, a lock has one added restriction: only the thread that acquires a lock, called the lock's ``owner'', is allowed to release it. If this restriction is a problem, it's a good sign that a semaphore should be used, instead of a lock. Locks in Pintos are not ``recursive,'' that is, it is an error for the thread currently holding a lock to try to acquire that lock. Lock types and functions are declared in @file{threads/synch.h}. @deftp {Type} {struct lock} Represents a lock. @end deftp @deftypefun void lock_init (struct lock *@var{lock}) Initializes @var{lock} as a new lock. The lock is not initially owned by any thread. @end deftypefun @deftypefun void lock_acquire (struct lock *@var{lock}) Acquires @var{lock} for the current thread, first waiting for any current owner to release it if necessary. @end deftypefun @deftypefun bool lock_try_acquire (struct lock *@var{lock}) Tries to acquire @var{lock} for use by the current thread, without waiting. Returns true if successful, false if the lock is already owned. Calling this function in a tight loop is a bad idea because it wastes CPU time, so use @func{lock_acquire} instead. @end deftypefun @deftypefun void lock_release (struct lock *@var{lock}) Releases @var{lock}, which the current thread must own. @end deftypefun @deftypefun bool lock_held_by_current_thread (const struct lock *@var{lock}) Returns true if the running thread owns @var{lock}, false otherwise. There is no function to test whether an arbitrary thread owns a lock, because the answer could change before the caller could act on it. @end deftypefun @node Monitors @subsection Monitors A @dfn{monitor} is a higher-level form of synchronization than a semaphore or a lock. A monitor consists of data being synchronized, plus a lock, called the @dfn{monitor lock}, and one or more @dfn{condition variables}. Before it accesses the protected data, a thread first acquires the monitor lock. It is then said to be ``in the monitor''. While in the monitor, the thread has control over all the protected data, which it may freely examine or modify. When access to the protected data is complete, it releases the monitor lock. Condition variables allow code in the monitor to wait for a condition to become true. Each condition variable is associated with an abstract condition, e.g.@: ``some data has arrived for processing'' or ``over 10 seconds has passed since the user's last keystroke''. When code in the monitor needs to wait for a condition to become true, it ``waits'' on the associated condition variable, which releases the lock and waits for the condition to be signaled. If, on the other hand, it has caused one of these conditions to become true, it ``signals'' the condition to wake up one waiter, or ``broadcasts'' the condition to wake all of them. The theoretical framework for monitors was laid out by C.@: A.@: R.@: Hoare (@bibref{Hoare}). Their practical usage was later elaborated in a paper on the Mesa operating system (@bibref{Lampson}). Condition variable types and functions are declared in @file{threads/synch.h}. @deftp {Type} {struct condition} Represents a condition variable. @end deftp @deftypefun void cond_init (struct condition *@var{cond}) Initializes @var{cond} as a new condition variable. @end deftypefun @deftypefun void cond_wait (struct condition *@var{cond}, struct lock *@var{lock}) Atomically releases @var{lock} (the monitor lock) and waits for @var{cond} to be signaled by some other piece of code. After @var{cond} is signaled, reacquires @var{lock} before returning. @var{lock} must be held before calling this function. Sending a signal and waking up from a wait are not an atomic operation. Thus, typically, @func{cond_wait}'s caller must recheck the condition after the wait completes and, if necessary, wait again. See the next section for an example. @end deftypefun @deftypefun void cond_signal (struct condition *@var{cond}, struct lock *@var{lock}) If any threads are waiting on @var{cond} (protected by monitor lock @var{lock}), then this function wakes up one of them. If no threads are waiting, returns without performing any action. @var{lock} must be held before calling this function. @end deftypefun @deftypefun void cond_broadcast (struct condition *@var{cond}, struct lock *@var{lock}) Wakes up all threads, if any, waiting on @var{cond} (protected by monitor lock @var{lock}). @var{lock} must be held before calling this function. @end deftypefun @subsubsection Monitor Example The classical example of a monitor is handling a buffer into which one or more ``producer'' threads write characters and out of which one or more ``consumer'' threads read characters. To implement this we need, besides the monitor lock, two condition variables which we will call @var{not_full} and @var{not_empty}: @example char buf[BUF_SIZE]; /* @r{Buffer.} */ size_t n = 0; /* @r{0 <= n <= @var{BUF_SIZE}: # of characters in buffer.} */ size_t head = 0; /* @r{@var{buf} index of next char to write (mod @var{BUF_SIZE}).} */ size_t tail = 0; /* @r{@var{buf} index of next char to read (mod @var{BUF_SIZE}).} */ struct lock lock; /* @r{Monitor lock.} */ struct condition not_empty; /* @r{Signaled when the buffer is not empty.} */ struct condition not_full; /* @r{Signaled when the buffer is not full.} */ @dots{}@r{initialize the locks and condition variables}@dots{} void put (char ch) @{ lock_acquire (&lock); while (n == BUF_SIZE) @{ /* @r{Can't add to @var{buf} as long as it's full.} */ cond_wait (¬_full, &lock); @} buf[head++ % BUF_SIZE] = ch; /* @r{Add @var{ch} to @var{buf}.} */ n++; cond_signal (¬_empty, &lock); /* @r{@var{buf} can't be empty anymore.} */ lock_release (&lock); @} char get (void) @{ char ch; lock_acquire (&lock); while (n == 0) @{ /* @r{Can't read @var{buf} as long as it's empty.} */ cond_wait (¬_empty, &lock); @} ch = buf[tail++ % BUF_SIZE]; /* @r{Get @var{ch} from @var{buf}.} */ n--; cond_signal (¬_full, &lock); /* @r{@var{buf} can't be full anymore.} */ lock_release (&lock); return ch; @} @end example Note that @code{BUF_SIZE} must divide evenly into @code{SIZE_MAX + 1} for the above code to be completely correct. Otherwise, it will fail the first time @code{head} wraps around to 0. In practice, @code{BUF_SIZE} would ordinarily be a power of 2. @node Optimization Barriers @subsection Optimization Barriers @c We should try to come up with a better example. @c Perhaps something with a linked list? An @dfn{optimization barrier} is a special statement that prevents the compiler from making assumptions about the state of memory across the barrier. The compiler will not reorder reads or writes of variables across the barrier or assume that a variable's value is unmodified across the barrier, except for local variables whose address is never taken. In Pintos, @file{threads/synch.h} defines the @code{barrier()} macro as an optimization barrier. One reason to use an optimization barrier is when data can change asynchronously, without the compiler's knowledge, e.g.@: by another thread or an interrupt handler. The @func{too_many_loops} function in @file{devices/timer.c} is an example. This function starts out by busy-waiting in a loop until a timer tick occurs: @example /* Wait for a timer tick. */ int64_t start = ticks; while (ticks == start) @{ barrier (); @} @end example @noindent Without an optimization barrier in the loop, the compiler could conclude that the loop would never terminate, because @code{start} and @code{ticks} start out equal and the loop itself never changes them. It could then ``optimize'' the function into an infinite loop, which would definitely be undesirable. Optimization barriers can be used to avoid other compiler optimizations. The @func{busy_wait} function, also in @file{devices/timer.c}, is an example. It contains this loop: @example while (loops-- > 0) @{ barrier (); @} @end example @noindent The goal of this loop is to busy-wait by counting @code{loops} down from its original value to 0. Without the barrier, the compiler could delete the loop entirely, because it produces no useful output and has no side effects. The barrier forces the compiler to pretend that the loop body has an important effect. Finally, optimization barriers can be used to force the ordering of memory reads or writes. For example, suppose we add a ``feature'' that, whenever a timer interrupt occurs, the character in global variable @code{timer_put_char} is printed on the console, but only if global Boolean variable @code{timer_do_put} is true. The best way to set up @samp{x} to be printed is then to use an optimization barrier, like this: @example timer_put_char = 'x'; barrier (); timer_do_put = true; @end example Without the barrier, the code is buggy because the compiler is free to reorder operations when it doesn't see a reason to keep them in the same order. In this case, the compiler doesn't know that the order of assignments is important, so its optimizer is permitted to exchange their order. There's no telling whether it will actually do this, and it is possible that passing the compiler different optimization flags or using a different version of the compiler will produce different behavior. Another solution is to disable interrupts around the assignments. This does not prevent reordering, but it prevents the interrupt handler from intervening between the assignments. It also has the extra runtime cost of disabling and re-enabling interrupts: @example enum intr_level old_level = intr_disable (); timer_put_char = 'x'; timer_do_put = true; intr_set_level (old_level); @end example A third solution is to mark the declarations of @code{timer_put_char} and @code{timer_do_put} as @samp{volatile}. This keyword tells the compiler that the variables are externally observable and restricts its latitude for optimization. However, the semantics of @samp{volatile} are not well-defined, so it is not a good general solution. The base Pintos code does not use @samp{volatile} at all. The following is @emph{not} a solution, because locks neither prevent interrupts nor prevent the compiler from reordering the code within the region where the lock is held: @example lock_acquire (&timer_lock); /* INCORRECT CODE */ timer_put_char = 'x'; timer_do_put = true; lock_release (&timer_lock); @end example The compiler treats invocation of any function defined externally, that is, in another source file, as a limited form of optimization barrier. Specifically, the compiler assumes that any externally defined function may access any statically or dynamically allocated data and any local variable whose address is taken. This often means that explicit barriers can be omitted. It is one reason that Pintos contains few explicit barriers. A function defined in the same source file, or in a header included by the source file, cannot be relied upon as a optimization barrier. This applies even to invocation of a function before its definition, because the compiler may read and parse the entire source file before performing optimization. @node Interrupt Handling @section Interrupt Handling An @dfn{interrupt} notifies the CPU of some event. Much of the work of an operating system relates to interrupts in one way or another. For our purposes, we classify interrupts into two broad categories: @itemize @bullet @item @dfn{Internal interrupts}, that is, interrupts caused directly by CPU instructions. System calls, attempts at invalid memory access (@dfn{page faults}), and attempts to divide by zero are some activities that cause internal interrupts. Because they are caused by CPU instructions, internal interrupts are @dfn{synchronous} or synchronized with CPU instructions. @func{intr_disable} does not disable internal interrupts. @item @dfn{External interrupts}, that is, interrupts originating outside the CPU. These interrupts come from hardware devices such as the system timer, keyboard, serial ports, and disks. External interrupts are @dfn{asynchronous}, meaning that their delivery is not synchronized with instruction execution. Handling of external interrupts can be postponed with @func{intr_disable} and related functions (@pxref{Disabling Interrupts}). @end itemize The CPU treats both classes of interrupts largely the same way, so Pintos has common infrastructure to handle both classes. The following section describes this common infrastructure. The sections after that give the specifics of external and internal interrupts. If you haven't already read chapter 3, ``Basic Execution Environment,'' in @bibref{IA32-v1}, it is recommended that you do so now. You might also want to skim chapter 5, ``Interrupt and Exception Handling,'' in @bibref{IA32-v3a}. @menu * Interrupt Infrastructure:: * Internal Interrupt Handling:: * External Interrupt Handling:: @end menu @node Interrupt Infrastructure @subsection Interrupt Infrastructure When an interrupt occurs, the CPU saves its most essential state on the current stack (determined by esp) and jumps to an interrupt handler routine. The 80@var{x}86 architecture supports 256 interrupts, numbered 0 through 255, each with an independent handler defined in an array called the @dfn{interrupt descriptor table} or IDT. In Pintos, @func{intr_init} in @file{threads/interrupt.c} sets up the IDT so that each entry points to a unique entry point in @file{threads/intr-stubs.S} named @func{intr@var{NN}_stub}, where @var{NN} is the interrupt number in hexadecimal. Because the CPU doesn't give us any other way to find out the interrupt number, this entry point pushes the interrupt number on the stack. Then it jumps to @func{intr_entry}, which pushes all the registers that the processor didn't already push for us, and then calls @func{intr_handler}, which brings us back into C in @file{threads/interrupt.c}. The main job of @func{intr_handler} is to call the function registered for handling the particular interrupt. (If no function is registered, it dumps some information to the console and panics.) It also does some extra processing for external interrupts (@pxref{External Interrupt Handling}). When @func{intr_handler} returns, the assembly code in @file{threads/intr-stubs.S} restores all the CPU registers saved earlier and directs the CPU to return from the interrupt. The following types and functions are common to all interrupts: @deftp {Type} {void intr_handler_func (struct intr_frame *@var{frame})} This is how an interrupt handler function must be declared. Its @var{frame} argument (see below) allows it to determine the cause of the interrupt and the state of the thread that was interrupted. @end deftp @deftp {Type} {struct intr_frame} The stack frame of an interrupt handler, as saved by the CPU, the interrupt stubs, and @func{intr_entry}. Its most interesting members are described below. @end deftp @deftypecv {Member} {@struct{intr_frame}} uint32_t edi @deftypecvx {Member} {@struct{intr_frame}} uint32_t esi @deftypecvx {Member} {@struct{intr_frame}} uint32_t ebp @deftypecvx {Member} {@struct{intr_frame}} uint32_t esp_dummy @deftypecvx {Member} {@struct{intr_frame}} uint32_t ebx @deftypecvx {Member} {@struct{intr_frame}} uint32_t edx @deftypecvx {Member} {@struct{intr_frame}} uint32_t ecx @deftypecvx {Member} {@struct{intr_frame}} uint32_t eax @deftypecvx {Member} {@struct{intr_frame}} uint16_t es @deftypecvx {Member} {@struct{intr_frame}} uint16_t ds Register values in the interrupted thread, pushed by @func{intr_entry}. The @code{esp_dummy} value isn't actually used (refer to the description of @code{PUSHA} in @bibref{IA32-v2b} for details). @end deftypecv @deftypecv {Member} {@struct{intr_frame}} uint32_t vec_no The interrupt vector number, ranging from 0 to 255. @end deftypecv @deftypecv {Member} {@struct{intr_frame}} uint32_t error_code The ``error code'' pushed on the stack by the CPU for some internal interrupts. @end deftypecv @deftypecv {Member} {@struct{intr_frame}} void (*eip) (void) The address of the next instruction to be executed by the interrupted thread. @end deftypecv @deftypecv {Member} {@struct{intr_frame}} {void *} esp The interrupted thread's stack pointer. @end deftypecv @deftypefun {const char *} intr_name (uint8_t @var{vec}) Returns the name of the interrupt numbered @var{vec}, or @code{"unknown"} if the interrupt has no registered name. @end deftypefun @node Internal Interrupt Handling @subsection Internal Interrupt Handling Internal interrupts are caused directly by CPU instructions executed by the running kernel thread or user process (from task 2 onward). An internal interrupt is therefore said to arise in a ``process context.'' In an internal interrupt's handler, it can make sense to examine the @struct{intr_frame} passed to the interrupt handler, or even to modify it. When the interrupt returns, modifications in @struct{intr_frame} become changes to the calling thread or process's state. For example, the Pintos system call handler returns a value to the user program by modifying the saved EAX register (@pxref{System Call Details}). There are no special restrictions on what an internal interrupt handler can or can't do. Generally they should run with interrupts enabled, just like other code, so they can be preempted by other kernel threads. Thus, they do need to synchronize with other threads on shared data and other resources (@pxref{Synchronization}). Of course, this only makes sense if they are not updating critical system data at the time. Internal interrupt handlers can be invoked recursively. For example, the system call handler might cause a page fault while attempting to read user memory. Deep recursion would risk overflowing the limited kernel stack (@pxref{struct thread}), but should be unnecessary. @deftypefun void intr_register_int (uint8_t @var{vec}, int @var{dpl}, enum intr_level @var{level}, intr_handler_func *@var{handler}, const char *@var{name}) Registers @var{handler} to be called when internal interrupt numbered @var{vec} is triggered. Names the interrupt @var{name} for debugging purposes. If @var{level} is @code{INTR_ON}, external interrupts will be processed normally during the interrupt handler's execution, which is normally desirable. Specifying @code{INTR_OFF} will cause the CPU to disable external interrupts when it invokes the interrupt handler. The effect is slightly different from calling @func{intr_disable} inside the handler, because that leaves a window of one or more CPU instructions in which external interrupts are still enabled. This is important for the page fault handler; refer to the comments in @file{userprog/exception.c} for details. @var{dpl} determines how the interrupt can be invoked. If @var{dpl} is 0, then the interrupt can be invoked only by kernel threads. Otherwise @var{dpl} should be 3, which allows user processes to invoke the interrupt with an explicit INT instruction. The value of @var{dpl} doesn't affect user processes' ability to invoke the interrupt indirectly, e.g.@: an invalid memory reference will cause a page fault regardless of @var{dpl}. @end deftypefun @node External Interrupt Handling @subsection External Interrupt Handling External interrupts are caused by events outside the CPU. They are asynchronous, so they can be invoked at any time that interrupts have not been disabled. We say that an external interrupt runs in an ``interrupt context.'' In an external interrupt, the @struct{intr_frame} passed to the handler is not very meaningful. It describes the state of the thread or process that was interrupted, but there is no way to predict which one that is. It is possible, although rarely useful, to examine it, but modifying it is a recipe for disaster. Only one external interrupt may be processed at a time. Neither internal nor external interrupt may nest within an external interrupt handler. Thus, an external interrupt's handler must run with interrupts disabled (@pxref{Disabling Interrupts}). An external interrupt handler must not sleep or yield, which rules out calling @func{lock_acquire}, @func{thread_yield}, and many other functions. Sleeping in interrupt context would effectively put the interrupted thread to sleep, too, until the interrupt handler was again scheduled and returned. This would be unfair to the unlucky thread, and it would deadlock if the handler were waiting for the sleeping thread to, e.g., release a lock. An external interrupt handler effectively monopolizes the machine and delays all other activities. Therefore, external interrupt handlers should complete as quickly as they can. Anything that requires a significant amount of CPU time should instead run in a kernel thread, possibly one that the interrupt triggers using a synchronization primitive. External interrupts are controlled by a pair of devices outside the CPU called @dfn{programmable interrupt controllers}, @dfn{PICs} for short. When @func{intr_init} sets up the CPU's IDT, it also initializes the PICs for interrupt handling. The PICs also must be ``acknowledged'' at the end of processing for each external interrupt. @func{intr_handler} takes care of that by calling @func{pic_end_of_interrupt}, which properly signals the PICs. The following functions relate to external interrupts: @deftypefun void intr_register_ext (uint8_t @var{vec}, intr_handler_func *@var{handler}, const char *@var{name}) Registers @var{handler} to be called when external interrupt numbered @var{vec} is triggered. Names the interrupt @var{name} for debugging purposes. The handler will run with interrupts disabled. @end deftypefun @deftypefun bool intr_context (void) Returns true if we are running in an interrupt context, otherwise false. Mainly used in functions that might sleep or that otherwise should not be called from interrupt context, in this form: @example ASSERT (!intr_context ()); @end example @end deftypefun @deftypefun void intr_yield_on_return (void) When called in an interrupt context, causes @func{thread_yield} to be called just before the interrupt returns. Used in the timer interrupt handler when a thread's time slice expires, to cause a new thread to be scheduled. @end deftypefun @node Memory Allocation @section Memory Allocation Pintos contains two memory allocators, one that allocates memory in units of a page, and one that can allocate blocks of any size. @menu * Page Allocator:: * Block Allocator:: @end menu @node Page Allocator @subsection Page Allocator The page allocator declared in @file{threads/palloc.h} allocates memory in units of a page. It is most often used to allocate memory one page at a time, but it can also allocate multiple contiguous pages at once. The page allocator divides the memory it allocates into two pools, called the kernel and user pools. By default, each pool gets half of system memory above @w{1 MB}, but the division can be changed with the @option{-ul} kernel command line option (@pxref{Why PAL_USER?}). An allocation request draws from one pool or the other. If one pool becomes empty, the other may still have free pages. The user pool should be used for allocating memory for user processes and the kernel pool for all other allocations. This will only become important starting with task 3. Until then, all allocations should be made from the kernel pool. Each pool's usage is tracked with a bitmap, one bit per page in the pool. A request to allocate @var{n} pages scans the bitmap for @var{n} consecutive bits set to false, indicating that those pages are free, and then sets those bits to true to mark them as used. This is a ``first fit'' allocation strategy (@pxref{Wilson}). The page allocator is subject to fragmentation. That is, it may not be possible to allocate @var{n} contiguous pages even though @var{n} or more pages are free, because the free pages are separated by used pages. In fact, in pathological cases it may be impossible to allocate 2 contiguous pages even though half of the pool's pages are free. Single-page requests can't fail due to fragmentation, so requests for multiple contiguous pages should be limited as much as possible. Pages may not be allocated from interrupt context, but they may be freed. When a page is freed, all of its bytes are cleared to @t{0xcc}, as a debugging aid (@pxref{Debugging Tips}). Page allocator types and functions are described below: @deftypefun {void *} palloc_get_page (enum palloc_flags @var{flags}) @deftypefunx {void *} palloc_get_multiple (enum palloc_flags @var{flags}, size_t @var{page_cnt}) Obtains and returns one page, or @var{page_cnt} contiguous pages, respectively. Returns a null pointer if the pages cannot be allocated. The @var{flags} argument may be any combination of the following flags: @defvr {Page Allocator Flag} @code{PAL_ASSERT} If the pages cannot be allocated, panic the kernel. This is only appropriate during kernel initialization. User processes should never be permitted to panic the kernel. @end defvr @defvr {Page Allocator Flag} @code{PAL_ZERO} Zero all the bytes in the allocated pages before returning them. If not set, the contents of newly allocated pages are unpredictable. @end defvr @defvr {Page Allocator Flag} @code{PAL_USER} Obtain the pages from the user pool. If not set, pages are allocated from the kernel pool. @end defvr @end deftypefun @deftypefun void palloc_free_page (void *@var{page}) @deftypefunx void palloc_free_multiple (void *@var{pages}, size_t @var{page_cnt}) Frees one page, or @var{page_cnt} contiguous pages, respectively, starting at @var{pages}. All of the pages must have been obtained using @func{palloc_get_page} or @func{palloc_get_multiple}. @end deftypefun @node Block Allocator @subsection Block Allocator The block allocator, declared in @file{threads/malloc.h}, can allocate blocks of any size. It is layered on top of the page allocator described in the previous section. Blocks returned by the block allocator are obtained from the kernel pool. The block allocator uses two different strategies for allocating memory. The first strategy applies to blocks that are 1 kB or smaller (one-fourth of the page size). These allocations are rounded up to the nearest power of 2, or 16 bytes, whichever is larger. Then they are grouped into a page used only for allocations of that size. The second strategy applies to blocks larger than 1 kB. These allocations (plus a small amount of overhead) are rounded up to the nearest page in size, and then the block allocator requests that number of contiguous pages from the page allocator. In either case, the difference between the allocation requested size and the actual block size is wasted. A real operating system would carefully tune its allocator to minimize this waste, but this is unimportant in an instructional system like Pintos. As long as a page can be obtained from the page allocator, small allocations always succeed. Most small allocations do not require a new page from the page allocator at all, because they are satisfied using part of a page already allocated. However, large allocations always require calling into the page allocator, and any allocation that needs more than one contiguous page can fail due to fragmentation, as already discussed in the previous section. Thus, you should minimize the number of large allocations in your code, especially those over approximately 4 kB each. When a block is freed, all of its bytes are cleared to @t{0xcc}, as a debugging aid (@pxref{Debugging Tips}). The block allocator may not be called from interrupt context. The block allocator functions are described below. Their interfaces are the same as the standard C library functions of the same names. @deftypefun {void *} malloc (size_t @var{size}) Obtains and returns a new block, from the kernel pool, at least @var{size} bytes long. Returns a null pointer if @var{size} is zero or if memory is not available. @end deftypefun @deftypefun {void *} calloc (size_t @var{a}, size_t @var{b}) Obtains a returns a new block, from the kernel pool, at least @code{@var{a} * @var{b}} bytes long. The block's contents will be cleared to zeros. Returns a null pointer if @var{a} or @var{b} is zero or if insufficient memory is available. @end deftypefun @deftypefun {void *} realloc (void *@var{block}, size_t @var{new_size}) Attempts to resize @var{block} to @var{new_size} bytes, possibly moving it in the process. If successful, returns the new block, in which case the old block must no longer be accessed. On failure, returns a null pointer, and the old block remains valid. A call with @var{block} null is equivalent to @func{malloc}. A call with @var{new_size} zero is equivalent to @func{free}. @end deftypefun @deftypefun void free (void *@var{block}) Frees @var{block}, which must have been previously returned by @func{malloc}, @func{calloc}, or @func{realloc} (and not yet freed). @end deftypefun @node Virtual Addresses @section Virtual Addresses A 32-bit virtual address can be divided into a 20-bit @dfn{page number} and a 12-bit @dfn{page offset} (or just @dfn{offset}), like this: @example @group 31 12 11 0 +-------------------+-----------+ | Page Number | Offset | +-------------------+-----------+ Virtual Address @end group @end example Header @file{threads/vaddr.h} defines these functions and macros for working with virtual addresses: @defmac PGSHIFT @defmacx PGBITS The bit index (0) and number of bits (12) of the offset part of a virtual address, respectively. @end defmac @defmac PGMASK A bit mask with the bits in the page offset set to 1, the rest set to 0 (@t{0xfff}). @end defmac @defmac PGSIZE The page size in bytes (4,096). @end defmac @deftypefun unsigned pg_ofs (const void *@var{va}) Extracts and returns the page offset in virtual address @var{va}. @end deftypefun @deftypefun uintptr_t pg_no (const void *@var{va}) Extracts and returns the page number in virtual address @var{va}. @end deftypefun @deftypefun {void *} pg_round_down (const void *@var{va}) Returns the start of the virtual page that @var{va} points within, that is, @var{va} with the page offset set to 0. @end deftypefun @deftypefun {void *} pg_round_up (const void *@var{va}) Returns @var{va} rounded up to the nearest page boundary. @end deftypefun Virtual memory in Pintos is divided into two regions: user virtual memory and kernel virtual memory (@pxref{Virtual Memory Layout}). The boundary between them is @code{PHYS_BASE}: @defmac PHYS_BASE Base address of kernel virtual memory. It defaults to @t{0xc0000000} (3 GB), but it may be changed to any multiple of @t{0x10000000} from @t{0x80000000} to @t{0xf0000000}. User virtual memory ranges from virtual address 0 up to @code{PHYS_BASE}. Kernel virtual memory occupies the rest of the virtual address space, from @code{PHYS_BASE} up to 4 GB. @end defmac @deftypefun {bool} is_user_vaddr (const void *@var{va}) @deftypefunx {bool} is_kernel_vaddr (const void *@var{va}) Returns true if @var{va} is a user or kernel virtual address, respectively, false otherwise. @end deftypefun The 80@var{x}86 architecture doesn't provide any way to directly access memory given a physical address. This ability is often necessary in an operating system kernel, so Pintos works around it by mapping kernel virtual memory one-to-one to physical memory. That is, virtual address @code{PHYS_BASE} accesses physical address 0, virtual address @code{PHYS_BASE} + @t{0x1234} accesses physical address @t{0x1234}, and so on up to the size of the machine's physical memory. Thus, adding @code{PHYS_BASE} to a physical address obtains a kernel virtual address that accesses that address; conversely, subtracting @code{PHYS_BASE} from a kernel virtual address obtains the corresponding physical address. Header @file{threads/vaddr.h} provides a pair of functions to do these translations: @deftypefun {void *} ptov (uintptr_t @var{pa}) Returns the kernel virtual address corresponding to physical address @var{pa}, which should be between 0 and the number of bytes of physical memory. @end deftypefun @deftypefun {uintptr_t} vtop (void *@var{va}) Returns the physical address corresponding to @var{va}, which must be a kernel virtual address. @end deftypefun @node Page Table @section Page Table The code in @file{pagedir.c} is an abstract interface to the 80@var{x}86 hardware page table, also called a ``page directory'' by Intel processor documentation. The page table interface uses a @code{uint32_t *} to represent a page table because this is convenient for accessing their internal structure. The sections below describe the page table interface and internals. @menu * Page Table Creation Destruction Activation:: * Page Tables Inspection and Updates:: * Page Table Accessed and Dirty Bits:: * Page Table Details:: @end menu @node Page Table Creation Destruction Activation @subsection Creation, Destruction, and Activation These functions create, destroy, and activate page tables. The base Pintos code already calls these functions where necessary, so it should not be necessary to call them yourself. @deftypefun {uint32_t *} pagedir_create (void) Creates and returns a new page table. The new page table contains Pintos's normal kernel virtual page mappings, but no user virtual mappings. Returns a null pointer if memory cannot be obtained. @end deftypefun @deftypefun void pagedir_destroy (uint32_t *@var{pd}) Frees all of the resources held by @var{pd}, including the page table itself and the frames that it maps. @end deftypefun @deftypefun void pagedir_activate (uint32_t *@var{pd}) Activates @var{pd}. The active page table is the one used by the CPU to translate memory references. @end deftypefun @node Page Tables Inspection and Updates @subsection Inspection and Updates These functions examine or update the mappings from pages to frames encapsulated by a page table. They work on both active and inactive page tables (that is, those for running and suspended processes), flushing the TLB as necessary. @deftypefun bool pagedir_set_page (uint32_t *@var{pd}, void *@var{upage}, void *@var{kpage}, bool @var{writable}) Adds to @var{pd} a mapping from user page @var{upage} to the frame identified by kernel virtual address @var{kpage}. If @var{writable} is true, the page is mapped read/write; otherwise, it is mapped read-only. User page @var{upage} must not already be mapped in @var{pd}. Kernel page @var{kpage} should be a kernel virtual address obtained from the user pool with @code{palloc_get_page(PAL_USER)} (@pxref{Why PAL_USER?}). Returns true if successful, false on failure. Failure will occur if additional memory required for the page table cannot be obtained. @end deftypefun @deftypefun {void *} pagedir_get_page (uint32_t *@var{pd}, const void *@var{uaddr}) Looks up the frame mapped to @var{uaddr} in @var{pd}. Returns the kernel virtual address for that frame, if @var{uaddr} is mapped, or a null pointer if it is not. @end deftypefun @deftypefun void pagedir_clear_page (uint32_t *@var{pd}, void *@var{page}) Marks @var{page} ``not present'' in @var{pd}. Later accesses to the page will fault. Other bits in the page table for @var{page} are preserved, permitting the accessed and dirty bits (see the next section) to be checked. This function has no effect if @var{page} is not mapped. @end deftypefun @node Page Table Accessed and Dirty Bits @subsection Accessed and Dirty Bits 80@var{x}86 hardware provides some assistance for implementing page replacement algorithms, through a pair of bits in the page table entry (PTE) for each page. On any read or write to a page, the CPU sets the @dfn{accessed bit} to 1 in the page's PTE, and on any write, the CPU sets the @dfn{dirty bit} to 1. The CPU never resets these bits to 0, but the OS may do so. Proper interpretation of these bits requires understanding of @dfn{aliases}, that is, two (or more) pages that refer to the same frame. When an aliased frame is accessed, the accessed and dirty bits are updated in only one page table entry (the one for the page used for access). The accessed and dirty bits for the other aliases are not updated. @xref{Accessed and Dirty Bits}, on applying these bits in implementing page replacement algorithms. @deftypefun bool pagedir_is_dirty (uint32_t *@var{pd}, const void *@var{page}) @deftypefunx bool pagedir_is_accessed (uint32_t *@var{pd}, const void *@var{page}) Returns true if page directory @var{pd} contains a page table entry for @var{page} that is marked dirty (or accessed). Otherwise, returns false. @end deftypefun @deftypefun void pagedir_set_dirty (uint32_t *@var{pd}, const void *@var{page}, bool @var{value}) @deftypefunx void pagedir_set_accessed (uint32_t *@var{pd}, const void *@var{page}, bool @var{value}) If page directory @var{pd} has a page table entry for @var{page}, then its dirty (or accessed) bit is set to @var{value}. @end deftypefun @node Page Table Details @subsection Page Table Details The functions provided with Pintos are sufficient to implement the tasks. However, you may still find it worthwhile to understand the hardware page table format, so we'll go into a little detail in this section. @menu * Page Table Structure:: * Page Table Entry Format:: * Page Directory Entry Format:: @end menu @node Page Table Structure @subsubsection Structure The top-level paging data structure is a page called the ``page directory'' (PD) arranged as an array of 1,024 32-bit page directory entries (PDEs), each of which represents 4 MB of virtual memory. Each PDE may point to the physical address of another page called a ``page table'' (PT) arranged, similarly, as an array of 1,024 32-bit page table entries (PTEs), each of which translates a single 4 kB virtual page to a physical page. Translation of a virtual address into a physical address follows the three-step process illustrated in the diagram below:@footnote{Actually, virtual to physical translation on the 80@var{x}86 architecture occurs via an intermediate ``linear address,'' but Pintos (and most modern 80@var{x}86 OSes) set up the CPU so that linear and virtual addresses are one and the same. Thus, you can effectively ignore this CPU feature.} @enumerate 1 @item The most-significant 10 bits of the virtual address (bits 22@dots{}31) index the page directory. If the PDE is marked ``present,'' the physical address of a page table is read from the PDE thus obtained. If the PDE is marked ``not present'' then a page fault occurs. @item The next 10 bits of the virtual address (bits 12@dots{}21) index the page table. If the PTE is marked ``present,'' the physical address of a data page is read from the PTE thus obtained. If the PTE is marked ``not present'' then a page fault occurs. @item The least-significant 12 bits of the virtual address (bits 0@dots{}11) are added to the data page's physical base address, yielding the final physical address. @end enumerate @example @group 31 22 21 12 11 0 +----------------------+----------------------+----------------------+ | Page Directory Index | Page Table Index | Page Offset | +----------------------+----------------------+----------------------+ | | | _______/ _______/ _____/ / / / / Page Directory / Page Table / Data Page / .____________. / .____________. / .____________. |1,023|____________| |1,023|____________| | |____________| |1,022|____________| |1,022|____________| | |____________| |1,021|____________| |1,021|____________| \__\|____________| |1,020|____________| |1,020|____________| /|____________| | | | | | | | | | | | \____\| |_ | | | | . | /| . | \ | . | \____\| . |_ | . | | | . | /| . | \ | . | | | . | | . | | | . | | | . | | | | | | | | | |____________| | |____________| | |____________| 4|____________| | 4|____________| | |____________| 3|____________| | 3|____________| | |____________| 2|____________| | 2|____________| | |____________| 1|____________| | 1|____________| | |____________| 0|____________| \__\0|____________| \____\|____________| / / @end group @end example Pintos provides some macros and functions that are useful for working with raw page tables: @defmac PTSHIFT @defmacx PTBITS The starting bit index (12) and number of bits (10), respectively, in a page table index. @end defmac @defmac PTMASK A bit mask with the bits in the page table index set to 1 and the rest set to 0 (@t{0x3ff000}). @end defmac @defmac PTSPAN The number of bytes of virtual address space that a single page table page covers (4,194,304 bytes, or 4 MB). @end defmac @defmac PDSHIFT @defmacx PDBITS The starting bit index (22) and number of bits (10), respectively, in a page directory index. @end defmac @defmac PDMASK A bit mask with the bits in the page directory index set to 1 and other bits set to 0 (@t{0xffc00000}). @end defmac @deftypefun uintptr_t pd_no (const void *@var{va}) @deftypefunx uintptr_t pt_no (const void *@var{va}) Returns the page directory index or page table index, respectively, for virtual address @var{va}. These functions are defined in @file{threads/pte.h}. @end deftypefun @deftypefun unsigned pg_ofs (const void *@var{va}) Returns the page offset for virtual address @var{va}. This function is defined in @file{threads/vaddr.h}. @end deftypefun @node Page Table Entry Format @subsubsection Page Table Entry Format You do not need to understand the PTE format to do the Pintos tasks, unless you wish to incorporate the page table into your supplemental page table (@pxref{Managing the Supplemental Page Table}). The actual format of a page table entry is summarized below. For complete information, refer to section 3.7, ``Page Translation Using 32-Bit Physical Addressing,'' in @bibref{IA32-v3a}. @example @group 31 12 11 9 6 5 2 1 0 +---------------------------------------+----+----+-+-+---+-+-+-+ | Physical Address | AVL| |D|A| |U|W|P| +---------------------------------------+----+----+-+-+---+-+-+-+ @end group @end example Some more information on each bit is given below. The names are @file{threads/pte.h} macros that represent the bits' values: @defmac PTE_P Bit 0, the ``present'' bit. When this bit is 1, the other bits are interpreted as described below. When this bit is 0, any attempt to access the page will page fault. The remaining bits are then not used by the CPU and may be used by the OS for any purpose. @end defmac @defmac PTE_W Bit 1, the ``read/write'' bit. When it is 1, the page is writable. When it is 0, write attempts will page fault. @end defmac @defmac PTE_U Bit 2, the ``user/supervisor'' bit. When it is 1, user processes may access the page. When it is 0, only the kernel may access the page (user accesses will page fault). Pintos clears this bit in PTEs for kernel virtual memory, to prevent user processes from accessing them. @end defmac @defmac PTE_A Bit 5, the ``accessed'' bit. @xref{Page Table Accessed and Dirty Bits}. @end defmac @defmac PTE_D Bit 6, the ``dirty'' bit. @xref{Page Table Accessed and Dirty Bits}. @end defmac @defmac PTE_AVL Bits 9@dots{}11, available for operating system use. Pintos, as provided, does not use them and sets them to 0. @end defmac @defmac PTE_ADDR Bits 12@dots{}31, the top 20 bits of the physical address of a frame. The low 12 bits of the frame's address are always 0. @end defmac The other bits are either reserved or uninteresting in a Pintos context and should be set to@tie{}0. Header @file{threads/pte.h} defines three functions for working with page table entries: @deftypefun uint32_t pte_create_kernel (uint32_t *@var{page}, bool @var{writable}) Returns a page table entry that points to @var{page}, which should be a kernel virtual address. The PTE's present bit will be set. It will be marked for kernel-only access. If @var{writable} is true, the PTE will also be marked read/write; otherwise, it will be read-only. @end deftypefun @deftypefun uint32_t pte_create_user (uint32_t *@var{page}, bool @var{writable}) Returns a page table entry that points to @var{page}, which should be the kernel virtual address of a frame in the user pool (@pxref{Why PAL_USER?}). The PTE's present bit will be set and it will be marked to allow user-mode access. If @var{writable} is true, the PTE will also be marked read/write; otherwise, it will be read-only. @end deftypefun @deftypefun {void *} pte_get_page (uint32_t @var{pte}) Returns the kernel virtual address for the frame that @var{pte} points to. The @var{pte} may be present or not-present; if it is not-present then the pointer returned is only meaningful if the address bits in the PTE actually represent a physical address. @end deftypefun @node Page Directory Entry Format @subsubsection Page Directory Entry Format Page directory entries have the same format as PTEs, except that the physical address points to a page table page instead of a frame. Header @file{threads/pte.h} defines two functions for working with page directory entries: @deftypefun uint32_t pde_create (uint32_t *@var{pt}) Returns a page directory that points to @var{page}, which should be the kernel virtual address of a page table page. The PDE's present bit will be set, it will be marked to allow user-mode access, and it will be marked read/write. @end deftypefun @deftypefun {uint32_t *} pde_get_pt (uint32_t @var{pde}) Returns the kernel virtual address for the page table page that @var{pde}, which must be marked present, points to. @end deftypefun @node Hash Table @section Hash Table Pintos provides a hash table data structure in @file{lib/kernel/hash.c}. To use it you will need to include its header file, @file{lib/kernel/hash.h}, with @code{#include <hash.h>}. No code provided with Pintos uses the hash table, which means that you are free to use it as is, modify its implementation for your own purposes, or ignore it, as you wish. Most implementations of the virtual memory task use a hash table to translate pages to frames. You may find other uses for hash tables as well. @menu * Hash Data Types:: * Basic Hash Functions:: * Hash Search Functions:: * Hash Iteration Functions:: * Hash Table Example:: * Hash Auxiliary Data:: * Hash Synchronization:: @end menu @node Hash Data Types @subsection Data Types A hash table is represented by @struct{hash}. @deftp {Type} {struct hash} Represents an entire hash table. The actual members of @struct{hash} are ``opaque.'' That is, code that uses a hash table should not access @struct{hash} members directly, nor should it need to. Instead, use hash table functions and macros. @end deftp The hash table operates on elements of type @struct{hash_elem}. @deftp {Type} {struct hash_elem} Embed a @struct{hash_elem} member in the structure you want to include in a hash table. Like @struct{hash}, @struct{hash_elem} is opaque. All functions for operating on hash table elements actually take and return pointers to @struct{hash_elem}, not pointers to your hash table's real element type. @end deftp You will often need to obtain a @struct{hash_elem} given a real element of the hash table, and vice versa. Given a real element of the hash table, you may use the @samp{&} operator to obtain a pointer to its @struct{hash_elem}. Use the @code{hash_entry()} macro to go the other direction. @deftypefn {Macro} {@var{type} *} hash_entry (struct hash_elem *@var{elem}, @var{type}, @var{member}) Returns a pointer to the structure that @var{elem}, a pointer to a @struct{hash_elem}, is embedded within. You must provide @var{type}, the name of the structure that @var{elem} is inside, and @var{member}, the name of the member in @var{type} that @var{elem} points to. For example, suppose @code{h} is a @code{struct hash_elem *} variable that points to a @struct{thread} member (of type @struct{hash_elem}) named @code{h_elem}. Then, @code{hash_entry@tie{}(h, struct thread, h_elem)} yields the address of the @struct{thread} that @code{h} points within. @end deftypefn @xref{Hash Table Example}, for an example. Each hash table element must contain a key, that is, data that identifies and distinguishes elements, which must be unique among elements in the hash table. (Elements may also contain non-key data that need not be unique.) While an element is in a hash table, its key data must not be changed. Instead, if need be, remove the element from the hash table, modify its key, then reinsert the element. For each hash table, you must write two functions that act on keys: a hash function and a comparison function. These functions must match the following prototypes: @deftp {Type} {unsigned hash_hash_func (const struct hash_elem *@var{element}, void *@var{aux})} Returns a hash of @var{element}'s data, as a value anywhere in the range of @code{unsigned int}. The hash of an element should be a pseudo-random function of the element's key. It must not depend on non-key data in the element or on any non-constant data other than the key. Pintos provides the following functions as a suitable basis for hash functions. @deftypefun unsigned hash_bytes (const void *@var{buf}, size_t *@var{size}) Returns a hash of the @var{size} bytes starting at @var{buf}. The implementation is the general-purpose @uref{http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash, Fowler-Noll-Vo hash} for 32-bit words. @end deftypefun @deftypefun unsigned hash_string (const char *@var{s}) Returns a hash of null-terminated string @var{s}. @end deftypefun @deftypefun unsigned hash_int (int @var{i}) Returns a hash of integer @var{i}. @end deftypefun If your key is a single piece of data of an appropriate type, it is sensible for your hash function to directly return the output of one of these functions. For multiple pieces of data, you may wish to combine the output of more than one call to them using, e.g., the @samp{^} (exclusive or) operator. Finally, you may entirely ignore these functions and write your own hash function from scratch, but remember that your goal is to build an operating system kernel, not to design a hash function. @xref{Hash Auxiliary Data}, for an explanation of @var{aux}. @end deftp @deftp {Type} {bool hash_less_func (const struct hash_elem *@var{a}, const struct hash_elem *@var{b}, void *@var{aux})} Compares the keys stored in elements @var{a} and @var{b}. Returns true if @var{a} is less than @var{b}, false if @var{a} is greater than or equal to @var{b}. If two elements compare equal, then they must hash to equal values. @xref{Hash Auxiliary Data}, for an explanation of @var{aux}. @end deftp @xref{Hash Table Example}, for hash and comparison function examples. A few functions accept a pointer to a third kind of function as an argument: @deftp {Type} {void hash_action_func (struct hash_elem *@var{element}, void *@var{aux})} Performs some kind of action, chosen by the caller, on @var{element}. @xref{Hash Auxiliary Data}, for an explanation of @var{aux}. @end deftp @node Basic Hash Functions @subsection Basic Functions These functions create, destroy, and inspect hash tables. @deftypefun bool hash_init (struct hash *@var{hash}, hash_hash_func *@var{hash_func}, hash_less_func *@var{less_func}, void *@var{aux}) Initializes @var{hash} as a hash table with @var{hash_func} as hash function, @var{less_func} as comparison function, and @var{aux} as auxiliary data. Returns true if successful, false on failure. @func{hash_init} calls @func{malloc} and fails if memory cannot be allocated. @xref{Hash Auxiliary Data}, for an explanation of @var{aux}, which is most often a null pointer. @end deftypefun @deftypefun void hash_clear (struct hash *@var{hash}, hash_action_func *@var{action}) Removes all the elements from @var{hash}, which must have been previously initialized with @func{hash_init}. If @var{action} is non-null, then it is called once for each element in the hash table, which gives the caller an opportunity to deallocate any memory or other resources used by the element. For example, if the hash table elements are dynamically allocated using @func{malloc}, then @var{action} could @func{free} the element. This is safe because @func{hash_clear} will not access the memory in a given hash element after calling @var{action} on it. However, @var{action} must not call any function that may modify the hash table, such as @func{hash_insert} or @func{hash_delete}. @end deftypefun @deftypefun void hash_destroy (struct hash *@var{hash}, hash_action_func *@var{action}) If @var{action} is non-null, calls it for each element in the hash, with the same semantics as a call to @func{hash_clear}. Then, frees the memory held by @var{hash}. Afterward, @var{hash} must not be passed to any hash table function, absent an intervening call to @func{hash_init}. @end deftypefun @deftypefun size_t hash_size (struct hash *@var{hash}) Returns the number of elements currently stored in @var{hash}. @end deftypefun @deftypefun bool hash_empty (struct hash *@var{hash}) Returns true if @var{hash} currently contains no elements, false if @var{hash} contains at least one element. @end deftypefun @node Hash Search Functions @subsection Search Functions Each of these functions searches a hash table for an element that compares equal to one provided. Based on the success of the search, they perform some action, such as inserting a new element into the hash table, or simply return the result of the search. @deftypefun {struct hash_elem *} hash_insert (struct hash *@var{hash}, struct hash_elem *@var{element}) Searches @var{hash} for an element equal to @var{element}. If none is found, inserts @var{element} into @var{hash} and returns a null pointer. If the table already contains an element equal to @var{element}, it is returned without modifying @var{hash}. @end deftypefun @deftypefun {struct hash_elem *} hash_replace (struct hash *@var{hash}, struct hash_elem *@var{element}) Inserts @var{element} into @var{hash}. Any element equal to @var{element} already in @var{hash} is removed. Returns the element removed, or a null pointer if @var{hash} did not contain an element equal to @var{element}. The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using @func{malloc}, then the caller must @func{free} the element after it is no longer needed. @end deftypefun The element passed to the following functions is only used for hashing and comparison purposes. It is never actually inserted into the hash table. Thus, only key data in the element needs to be initialized, and other data in the element will not be used. It often makes sense to declare an instance of the element type as a local variable, initialize the key data, and then pass the address of its @struct{hash_elem} to @func{hash_find} or @func{hash_delete}. @xref{Hash Table Example}, for an example. (Large structures should not be allocated as local variables. @xref{struct thread}, for more information.) @deftypefun {struct hash_elem *} hash_find (struct hash *@var{hash}, struct hash_elem *@var{element}) Searches @var{hash} for an element equal to @var{element}. Returns the element found, if any, or a null pointer otherwise. @end deftypefun @deftypefun {struct hash_elem *} hash_delete (struct hash *@var{hash}, struct hash_elem *@var{element}) Searches @var{hash} for an element equal to @var{element}. If one is found, it is removed from @var{hash} and returned. Otherwise, a null pointer is returned and @var{hash} is unchanged. The caller is responsible for deallocating any resources associated with the returned element, as appropriate. For example, if the hash table elements are dynamically allocated using @func{malloc}, then the caller must @func{free} the element after it is no longer needed. @end deftypefun @node Hash Iteration Functions @subsection Iteration Functions These functions allow iterating through the elements in a hash table. Two interfaces are supplied. The first requires writing and supplying a @var{hash_action_func} to act on each element (@pxref{Hash Data Types}). @deftypefun void hash_apply (struct hash *@var{hash}, hash_action_func *@var{action}) Calls @var{action} once for each element in @var{hash}, in arbitrary order. @var{action} must not call any function that may modify the hash table, such as @func{hash_insert} or @func{hash_delete}. @var{action} must not modify key data in elements, although it may modify any other data. @end deftypefun The second interface is based on an ``iterator'' data type. Idiomatically, iterators are used as follows: @example struct hash_iterator i; hash_first (&i, h); while (hash_next (&i)) @{ struct foo *f = hash_entry (hash_cur (&i), struct foo, elem); @r{@dots{}do something with @i{f}@dots{}} @} @end example @deftp {Type} {struct hash_iterator} Represents a position within a hash table. Calling any function that may modify a hash table, such as @func{hash_insert} or @func{hash_delete}, invalidates all iterators within that hash table. Like @struct{hash} and @struct{hash_elem}, @struct{hash_elem} is opaque. @end deftp @deftypefun void hash_first (struct hash_iterator *@var{iterator}, struct hash *@var{hash}) Initializes @var{iterator} to just before the first element in @var{hash}. @end deftypefun @deftypefun {struct hash_elem *} hash_next (struct hash_iterator *@var{iterator}) Advances @var{iterator} to the next element in @var{hash}, and returns that element. Returns a null pointer if no elements remain. After @func{hash_next} returns null for @var{iterator}, calling it again yields undefined behavior. @end deftypefun @deftypefun {struct hash_elem *} hash_cur (struct hash_iterator *@var{iterator}) Returns the value most recently returned by @func{hash_next} for @var{iterator}. Yields undefined behavior after @func{hash_first} has been called on @var{iterator} but before @func{hash_next} has been called for the first time. @end deftypefun @node Hash Table Example @subsection Hash Table Example Suppose you have a structure, called @struct{page}, that you want to put into a hash table. First, define @struct{page} to include a @struct{hash_elem} member: @example struct page @{ struct hash_elem hash_elem; /* @r{Hash table element.} */ void *addr; /* @r{Virtual address.} */ /* @r{@dots{}other members@dots{}} */ @}; @end example We write a hash function and a comparison function using @var{addr} as the key. A pointer can be hashed based on its bytes, and the @samp{<} operator works fine for comparing pointers: @example /* @r{Returns a hash value for page @var{p}.} */ unsigned page_hash (const struct hash_elem *p_, void *aux UNUSED) @{ const struct page *p = hash_entry (p_, struct page, hash_elem); return hash_bytes (&p->addr, sizeof p->addr); @} /* @r{Returns true if page @var{a} precedes page @var{b}.} */ bool page_less (const struct hash_elem *a_, const struct hash_elem *b_, void *aux UNUSED) @{ const struct page *a = hash_entry (a_, struct page, hash_elem); const struct page *b = hash_entry (b_, struct page, hash_elem); return a->addr < b->addr; @} @end example @noindent (The use of @code{UNUSED} in these functions' prototypes suppresses a warning that @var{aux} is unused. @xref{Function and Parameter Attributes}, for information about @code{UNUSED}. @xref{Hash Auxiliary Data}, for an explanation of @var{aux}.) Then, we can create a hash table like this: @example struct hash pages; hash_init (&pages, page_hash, page_less, NULL); @end example Now we can manipulate the hash table we've created. If @code{@var{p}} is a pointer to a @struct{page}, we can insert it into the hash table with: @example hash_insert (&pages, &p->hash_elem); @end example @noindent If there's a chance that @var{pages} might already contain a page with the same @var{addr}, then we should check @func{hash_insert}'s return value. To search for an element in the hash table, use @func{hash_find}. This takes a little setup, because @func{hash_find} takes an element to compare against. Here's a function that will find and return a page based on a virtual address, assuming that @var{pages} is defined at file scope: @example /* @r{Returns the page containing the given virtual @var{address},} @r{or a null pointer if no such page exists.} */ struct page * page_lookup (const void *address) @{ struct page p; struct hash_elem *e; p.addr = address; e = hash_find (&pages, &p.hash_elem); return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL; @} @end example @noindent @struct{page} is allocated as a local variable here on the assumption that it is fairly small. Large structures should not be allocated as local variables. @xref{struct thread}, for more information. A similar function could delete a page by address using @func{hash_delete}. @node Hash Auxiliary Data @subsection Auxiliary Data In simple cases like the example above, there's no need for the @var{aux} parameters. In these cases, just pass a null pointer to @func{hash_init} for @var{aux} and ignore the values passed to the hash function and comparison functions. (You'll get a compiler warning if you don't use the @var{aux} parameter, but you can turn that off with the @code{UNUSED} macro, as shown in the example, or you can just ignore it.) @var{aux} is useful when you have some property of the data in the hash table is both constant and needed for hashing or comparison, but not stored in the data items themselves. For example, if the items in a hash table are fixed-length strings, but the items themselves don't indicate what that fixed length is, you could pass the length as an @var{aux} parameter. @node Hash Synchronization @subsection Synchronization The hash table does not do any internal synchronization. It is the caller's responsibility to synchronize calls to hash table functions. In general, any number of functions that examine but do not modify the hash table, such as @func{hash_find} or @func{hash_next}, may execute simultaneously. However, these function cannot safely execute at the same time as any function that may modify a given hash table, such as @func{hash_insert} or @func{hash_delete}, nor may more than one function that can modify a given hash table execute safely at once. It is also the caller's responsibility to synchronize access to data in hash table elements. How to synchronize access to this data depends on how it is designed and organized, as with any other data structure.