@node Reference Guide
@appendix Reference Guide

This appendix is a reference for the Pintos code.  The reference guide
does not cover all of the code in Pintos, but it does cover those
pieces that students most often find troublesome.  You may find that
you want to read each part of the reference guide as you work on the
task where it becomes important.

We recommend using ``tags'' to follow along with references to function
and variable names (@pxref{Tags}).

@menu
* Pintos Loading::              
* Threads::                     
* Synchronization::             
* Interrupt Handling::          
* Memory Allocation::           
* Virtual Addresses::           
* Page Table::                  
* Hash Table::                  
@end menu

@node Pintos Loading
@section Loading

This section covers the Pintos loader and basic kernel
initialization.

@menu
* Pintos Loader::               
* Low-Level Kernel Initialization::
* High-Level Kernel Initialization::
* Physical Memory Map::
@end menu

@node Pintos Loader
@subsection The Loader

The first part of Pintos that runs is the loader, in
@file{threads/loader.S}.  The PC BIOS loads the loader into memory.
The loader, in turn, is responsible for finding the kernel on disk,
loading it into memory, and then jumping to its start.  It's
not important to understand exactly how the loader works, but if
you're interested, read on.  You should probably read along with the
loader's source.  You should also understand the basics of the
80@var{x}86 architecture as described by chapter 3, ``Basic Execution
Environment,'' of @bibref{IA32-v1}.

The PC BIOS loads the loader from the first sector of the first hard
disk, called the @dfn{master boot record} (MBR).  PC conventions
reserve 64 bytes of the MBR for the partition table, and Pintos uses
about 128 additional bytes for kernel command-line arguments.  This
leaves a little over 300 bytes for the loader's own code.  This is a
severe restriction that means, practically speaking, the loader must
be written in assembly language.

The Pintos loader and kernel don't have to be on the same disk, nor 
is the kernel required to be in any particular location on a
given disk.  The loader's first job, then, is to find the kernel by
reading the partition table on each hard disk, looking for a bootable
partition of the type used for a Pintos kernel.
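
As a rough illustration of that search, the following C sketch scans
the four 16-byte entries of one MBR-format sector for a bootable Pintos
kernel partition.  The real loader does this in 16-bit assembly.  The
function name @code{find_kernel_partition} is hypothetical, and the
table offset 446, the bootable flag @t{0x80}, and the partition type
@t{0x20} are assumptions taken from PC conventions and the Pintos
sources rather than from this section.

@example
#include <stdint.h>

#define SECTOR_SIZE 512
#define PART_TABLE_OFS 446      /* Assumed: 4 entries of 16 bytes. */
#define PART_ENTRY_SIZE 16
#define PART_BOOTABLE 0x80      /* Assumed: "active" flag, byte 0. */
#define PART_TYPE_KERNEL 0x20   /* Assumed: Pintos kernel type, byte 4. */

/* Returns the index (0..3) of the first bootable kernel partition
   described by the MBR in SECTOR, or -1 if there is none. */
int
find_kernel_partition (const uint8_t sector[SECTOR_SIZE])
@{
  int i;

  for (i = 0; i < 4; i++)
    @{
      const uint8_t *e = sector + PART_TABLE_OFS + i * PART_ENTRY_SIZE;
      if (e[0] == PART_BOOTABLE && e[4] == PART_TYPE_KERNEL)
        return i;
    @}
  return -1;
@}
@end example

A caller would apply this to the first sector of each hard disk in
turn and stop at the first disk for which the result is not @minus{}1.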

When the loader finds a bootable kernel partition, it reads the
partition's contents into memory at physical address @w{128 kB}.  The
kernel is at the beginning of the partition, which might be larger
than necessary due to partition boundary alignment conventions, so the
loader reads no more than @w{512 kB} (and the Pintos build process
will refuse to produce kernels larger than that).  Reading more data
than this would cross into the region from @w{640 kB} to @w{1 MB} that
the PC architecture reserves for hardware and the BIOS, and a standard
PC BIOS does not provide any means to load the kernel above @w{1 MB}.
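
The size limit follows directly from the addresses involved: a kernel
loaded at @w{128 kB} may occupy at most @w{512 kB} before it reaches
the reserved region at @w{640 kB}.  The sketch below spells out that
arithmetic.  The macro names and @code{kernel_fits} are hypothetical;
only the three sizes come from the text above.

@example
#include <stdbool.h>
#include <stdint.h>

#define KERNEL_LOAD_ADDR (128u * 1024)  /* Where the loader puts it. */
#define RESERVED_START (640u * 1024)    /* Start of hardware/BIOS area. */

/* Returns true if a kernel image of SIZE bytes, loaded at
   KERNEL_LOAD_ADDR, ends at or below RESERVED_START. */
bool
kernel_fits (uint32_t size)
@{
  return (uint64_t) KERNEL_LOAD_ADDR + size <= RESERVED_START;
@}
@end example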

The loader's final job is to extract the entry point from the loaded
kernel image and transfer control to it.  The entry point is not at a
predictable location, but the kernel's ELF header contains a pointer
to it.  The loader extracts the pointer and jumps to the location it
points to.

The Pintos kernel command line
is stored in the boot loader.  The @command{pintos} program actually
modifies a copy of the boot loader on disk each time it runs the kernel,
inserting whatever command-line arguments the user supplies to the kernel,
and then the kernel at boot time reads those arguments out of the boot
loader in memory.  This is not an elegant solution, but it is simple
and effective.

@node Low-Level Kernel Initialization
@subsection Low-Level Kernel Initialization

The loader's last action is to transfer control to the kernel's entry
point, which is @func{start} in @file{threads/start.S}.  The job of
this code is to switch the CPU from legacy 16-bit ``real mode'' into
the 32-bit ``protected mode'' used by all modern 80@var{x}86 operating
systems.

The startup code's first task is to obtain the machine's memory size
by asking the BIOS.  The simplest BIOS function for this can only
detect up to 64 MB of RAM, so that's the practical limit that Pintos
can support.  The startup code stores the memory size, in pages, in
global variable @code{init_ram_pages}.

The first part of CPU initialization is to enable the A20 line, that
is, the CPU's address line numbered 20.  For historical reasons, PCs
boot with this address line fixed at 0, which means that attempts to
access memory beyond the first 1 MB (2 raised to the 20th power) will
fail.  Pintos wants to access more memory than this, so we have to
enable it.

Next, the startup code creates a basic page table.  This page table maps
the 64 MB at the base of virtual memory (starting at virtual address
0) directly to the identical physical addresses.  It also maps the
same physical memory starting at virtual address
@code{LOADER_PHYS_BASE}, which defaults to @t{0xc0000000} (3 GB).  The
Pintos kernel only wants the latter mapping, but there's a
chicken-and-egg problem if we don't include the former: our current
virtual address is roughly @t{0x20000}, the location where the loader
put us, and we can't jump to @t{0xc0020000} until we turn on the
page table, but if we turn on the page table without jumping there,
then we've just pulled the rug out from under ourselves.
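
The relationship between the two mappings can be sketched in C as
follows.  This is a model of the address arithmetic only, not the code
that builds the page table.  @code{LOADER_PHYS_BASE} and the 64 MB size
come from the text above; the helper names @code{pd_index} and
@code{identity_to_kernel} are hypothetical, and the 4 MB span of a page
directory entry is the standard 80@var{x}86 value for 4 kB pages.

@example
#include <stdint.h>

#define LOADER_PHYS_BASE 0xc0000000u      /* Default given above. */
#define MAPPED_BYTES (64u * 1024 * 1024)  /* The 64 MB that are mapped. */

/* Index of the page directory entry that covers VADDR.  Each entry
   covers 4 MB, so the index is the top 10 bits of the address. */
unsigned
pd_index (uint32_t vaddr)
@{
  return vaddr >> 22;
@}

/* Translates an address in the low identity mapping into the
   equivalent address in the mapping at LOADER_PHYS_BASE. */
uint32_t
identity_to_kernel (uint32_t vaddr)
@{
  return vaddr + LOADER_PHYS_BASE;
@}
@end example

Under these assumptions @t{0x20000} falls under page directory entry 0
and @t{0xc0020000} under entry 768.  As long as both entries lead to
the same physical memory, execution can continue at the low address
after paging is turned on and then jump to the high one.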

After the page table is initialized, we load the CPU's control
registers to turn on protected mode and paging, and set up the segment
registers.  We aren't yet equipped to handle interrupts in protected
mode, so we disable interrupts.  The final step is to call @func{main}.

@node High-Level Kernel Initialization
@subsection High-Level Kernel Initialization

The kernel proper starts with the @func{main} function.  The
@func{main} function is written in C, as will be most of the code we
encounter in Pintos from here on out.

When @func{main} starts, the system is in a pretty raw state.  We're
in 32-bit protected mode with paging enabled, but hardly anything else is
ready.  Thus, the @func{main} function consists primarily of calls
into other Pintos modules' initialization functions.
These are usually named @func{@var{module}_init}, where
@var{module} is the module's name, @file{@var{module}.c} is the
module's source code, and @file{@var{module}.h} is the module's
header.

The first step in @func{main} is to call @func{bss_init}, which clears
out the kernel's ``BSS'', which is the traditional name for a
segment that should be initialized to all zeros.  In most C
implementations, whenever you
declare a variable outside a function without providing an
initializer, that variable goes into the BSS.  Because it's all zeros, the
BSS isn't stored in the image that the loader brought into memory.  We
just use @func{memset} to zero it out.

Next, @func{main} calls @func{read_command_line} to break the kernel command
line into arguments, then @func{parse_options} to read any options at
the beginning of the command line.  (Actions specified on the
command line execute later.)

@func{thread_init} initializes the thread system.  We defer full
discussion to the section on Pintos threads below (@pxref{Threads}).  It is called so
early in initialization because a valid thread structure is a
prerequisite for acquiring a lock, and lock acquisition in turn is
important to other Pintos subsystems.  Then we initialize the console
and print a startup message to the console.

The next block of functions we call initializes the kernel's memory
system.  @func{palloc_init} sets up the kernel page allocator, which
doles out memory one or more pages at a time (@pxref{Page Allocator}).
@func{malloc_init} sets
up the allocator that handles allocations of arbitrary-size blocks of
memory (@pxref{Block Allocator}).
@func{paging_init} sets up a page table for the kernel (@pxref{Page
Table}).

In tasks 2 and later, @func{main} also calls @func{tss_init} and
@func{gdt_init}.

The next set of calls initializes the interrupt system.
@func{intr_init} sets up the CPU's @dfn{interrupt descriptor table}
(IDT) to ready it for interrupt handling (@pxref{Interrupt
Infrastructure}), then @func{timer_init} and @func{kbd_init} prepare for
handling timer interrupts and keyboard interrupts, respectively. 
@func{input_init} sets up to merge serial and keyboard input into one
stream.  In
tasks 2 and later, we also prepare to handle interrupts caused by
user programs using @func{exception_init} and @func{syscall_init}.

Now that interrupts are set up, we can start the scheduler
with @func{thread_start}, which creates the idle thread and enables
interrupts.
With interrupts enabled, interrupt-driven serial port I/O becomes
possible, so we use
@func{serial_init_queue} to switch to that mode.  Finally,
@func{timer_calibrate} calibrates the timer for accurate short delays.

If the file system is compiled in, as it will be starting in task 2, we
initialize the IDE disks with @func{ide_init}, then the
file system with @func{filesys_init}.

The Pintos boot is then complete, so we print a message.

Function @func{run_actions} now parses and executes actions specified on
the kernel command line, such as @command{run} to run a test (in task
1) or a user program (in later tasks).

Finally, if @option{-q} was specified on the kernel command line, we
call @func{shutdown_power_off} to terminate the machine simulator.  Otherwise,
@func{main} calls @func{thread_exit}, which allows any other running
threads to continue running.

@node Physical Memory Map
@subsection Physical Memory Map

@multitable {@t{00000000}--@t{00000000}} {Hardware} {Some much longer explanatory text}
@headitem Memory Range
@tab Owner
@tab Contents

@item @t{00000000}--@t{000003ff} @tab CPU @tab Real mode interrupt table.
@item @t{00000400}--@t{000005ff} @tab BIOS @tab Miscellaneous data area.
@item @t{00000600}--@t{00007bff} @tab --- @tab ---
@item @t{00007c00}--@t{00007dff} @tab Pintos @tab Loader.
@item @t{0000e000}--@t{0000efff} @tab Pintos 
@tab Stack for loader; kernel stack and @struct{thread} for initial
kernel thread.
@item @t{0000f000}--@t{0000ffff} @tab Pintos
@tab Page directory for startup code.
@item @t{00010000}--@t{0001ffff} @tab Pintos
@tab Page tables for startup code.
@item @t{00020000}--@t{0009ffff} @tab Pintos
@tab Kernel code, data, and uninitialized data segments.
@item @t{000a0000}--@t{000bffff} @tab Video @tab VGA display memory.
@item @t{000c0000}--@t{000effff} @tab Hardware 
@tab Reserved for expansion card RAM and ROM.
@item @t{000f0000}--@t{000fffff} @tab BIOS @tab ROM BIOS.
@item @t{00100000}--@t{03ffffff} @tab Pintos @tab Dynamic memory allocation.
@end multitable

@node Threads
@section Threads

@menu
* struct thread::               
* Thread Functions::            
* Thread Switching::            
@end menu

@node struct thread
@subsection @code{struct thread}

The main Pintos data structure for threads is @struct{thread},
declared in @file{threads/thread.h}.

@deftp {Structure} {struct thread}
Represents a thread or a user process.  In the tasks, you will have
to add your own members to @struct{thread}.  You may also change or
delete the definitions of existing members.

Every @struct{thread} occupies the beginning of its own page of
memory.  The rest of the page is used for the thread's stack, which
grows downward from the end of the page.  It looks like this:

@example
@group
                  4 kB +---------------------------------+
                       |          kernel stack           |
                       |                |                |
                       |                |                |
                       |                V                |
                       |         grows downward          |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
                       |                                 |
sizeof (struct thread) +---------------------------------+
                       |              magic              |
                       |                :                |
                       |                :                |
                       |              status             |
                       |               tid               |
                  0 kB +---------------------------------+
@end group
@end example

This has two consequences.  First, @struct{thread} must not be allowed
to grow too big.  If it does, then there will not be enough room for the
kernel stack.  The base @struct{thread} is only a few bytes in size.  It
probably should stay well under 1 kB.

Second, kernel stacks must not be allowed to grow too large.  If a stack
overflows, it will corrupt the thread state.  Thus, kernel functions
should not allocate large structures or arrays as non-static local
variables.  Use dynamic allocation with @func{malloc} or
@func{palloc_get_page} instead (@pxref{Memory Allocation}).
@end deftp
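
One useful property of this layout is that any address on a thread's
kernel stack identifies the thread: rounding the address down to a page
boundary gives the start of the page, which is where the
@struct{thread} lives.  The following sketch models that arithmetic and
the stack space left over.  The names @code{page_base} and
@code{stack_room} are hypothetical, and @code{PGSIZE} is assumed to be
the 4 kB shown in the diagram.

@example
#include <stddef.h>
#include <stdint.h>

#define PGSIZE 4096u   /* Page size, as in the 4 kB diagram. */

/* Returns the address of the page containing stack address SP,
   which is where that thread's struct thread would begin. */
uintptr_t
page_base (uintptr_t sp)
@{
  return sp & ~(uintptr_t) (PGSIZE - 1);
@}

/* Returns the bytes left for the kernel stack in a page whose
   struct thread occupies THREAD_SIZE bytes. */
size_t
stack_room (size_t thread_size)
@{
  return PGSIZE - thread_size;
@}
@end example

For instance, a @struct{thread} that grew to 1 kB would leave only
3 kB of kernel stack, which is why the structure should stay small.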

@deftypecv {Member} {@struct{thread}} {tid_t} tid
The thread's thread identifier or @dfn{tid}.  Every thread must have a
tid that is unique over the entire lifetime of the kernel.  By
default, @code{tid_t} is a @code{typedef} for @code{int} and each new
thread receives the numerically next higher tid, starting from 1 for
the initial process.  You can change the type and the numbering scheme
if you like.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {enum thread_status} status
@anchor{Thread States}
The thread's state, one of the following:

@defvr {Thread State} @code{THREAD_RUNNING}
The thread is running.  Exactly one thread is running at a given time.
@func{thread_current} returns the running thread.
@end defvr

@defvr {Thread State} @code{THREAD_READY}
The thread is ready to run, but it's not running right now.  The
thread could be selected to run the next time the scheduler is
invoked.  Ready threads are kept in a doubly linked list called
@code{ready_list}.
@end defvr

@defvr {Thread State} @code{THREAD_BLOCKED}
The thread is waiting for something, e.g.@: a lock to become
available or an interrupt to be invoked.  The thread won't be scheduled
again until it transitions to the @code{THREAD_READY} state with a
call to @func{thread_unblock}.  This is most conveniently done
indirectly, using one of the Pintos synchronization primitives that
block and unblock threads automatically (@pxref{Synchronization}).

There is no @i{a priori} way to tell what a blocked thread is waiting
for, but a backtrace can help (@pxref{Backtraces}).
@end defvr

@defvr {Thread State} @code{THREAD_DYING}
The thread will be destroyed by the scheduler after switching to the
next thread.
@end defvr
@end deftypecv

@deftypecv {Member} {@struct{thread}} {char} name[16]
The thread's name as a string, or at least the first few characters of
it.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {uint8_t *} stack
Every thread has its own stack to keep track of its state.  When the
thread is running, the CPU's stack pointer register tracks the top of
the stack and this member is unused.  But when the CPU switches to
another thread, this member saves the thread's stack pointer.  No
other members are needed to save the thread's registers, because the
other registers that must be saved are saved on the stack.

When an interrupt occurs, whether in the kernel or a user program, an
@struct{intr_frame} is pushed onto the stack.  When the interrupt occurs
in a user program, the @struct{intr_frame} is always at the very top of
the page.  @xref{Interrupt Handling}, for more information.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {int} priority
A thread priority, ranging from @code{PRI_MIN} (0) to @code{PRI_MAX}
(63).  Lower numbers correspond to lower priorities, so that
priority 0 is the lowest priority and priority 63 is the highest.
Pintos, as initially provided, ignores thread priorities, but you will implement
priority scheduling in task 1 (@pxref{Priority Scheduling}).
@end deftypecv

@deftypecv {Member} {@struct{thread}} {@struct{list_elem}} allelem
This ``list element'' is used to link the thread into the list of all
threads.  Each thread is inserted into this list when it is created
and removed when it exits.  The @func{thread_foreach} function should 
be used to iterate over all threads.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {@struct{list_elem}} elem
A ``list element'' used to put the thread into doubly linked lists,
either @code{ready_list} (the list of threads ready to run) or a list of
threads waiting on a semaphore in @func{sema_down}.  It can do double
duty because a thread waiting on a semaphore is not ready, and vice
versa.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {uint32_t *} pagedir
Only present in task 2 and later.  @xref{Page Tables}.
@end deftypecv

@deftypecv {Member} {@struct{thread}} {unsigned} magic
Always set to @code{THREAD_MAGIC}, which is just an arbitrary number defined
in @file{threads/thread.c}, and used to detect stack overflow.
@func{thread_current} checks that the @code{magic} member of the running
thread's @struct{thread} is set to @code{THREAD_MAGIC}.  Stack overflow
tends to change this value, triggering the assertion.  For greatest
benefit, as you add members to @struct{thread}, leave @code{magic} at
the end.
@end deftypecv
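
The reason for keeping @code{magic} last can be seen in a small
user-space model.  The stack grows downward toward the structure, so
the last member is the first one an overflowing stack overwrites.  All
names below (@code{model_thread}, @code{model_page}, and so on) are
hypothetical, the structure is cut down to two members, and the magic
value is just an arbitrary constant chosen for the model.

@example
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PGSIZE 4096
#define MODEL_MAGIC 0xcd6abf4bu   /* Arbitrary value for this model. */

/* Cut-down stand-in for struct thread, with magic as last member. */
struct model_thread
  @{
    int tid;
    unsigned magic;
  @};

/* One page: the struct at the bottom, stack space above it. */
union model_page
  @{
    struct model_thread thread;
    uint8_t bytes[PGSIZE];
  @};

void
model_init (union model_page *p)
@{
  memset (p, 0, sizeof *p);
  p->thread.tid = 1;
  p->thread.magic = MODEL_MAGIC;
@}

/* Simulates a stack that has grown downward until only REMAINING
   bytes at the bottom of the page are untouched. */
void
model_grow_stack (union model_page *p, size_t remaining)
@{
  memset (p->bytes + remaining, 0xcc, PGSIZE - remaining);
@}

/* The check that an assertion on the magic member would perform. */
bool
model_is_thread (const union model_page *p)
@{
  return p->thread.magic == MODEL_MAGIC;
@}
@end example

In this model a stack that stops exactly at the end of the structure
leaves the check passing, while a stack that grows one byte further
already corrupts @code{magic}, before @code{tid} is touched.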

@node Thread Functions
@subsection Thread Functions

@file{threads/thread.c} implements several public functions for thread
support.  Let's take a look at the most useful:

@deftypefun void thread_init (void)
Called by @func{main} to initialize the thread system.  Its main
purpose is to create a @struct{thread} for Pintos's initial thread.
This is possible because the Pintos loader puts the initial
thread's stack at the top of a page, in the same position as any other
Pintos thread.

Before @func{thread_init} runs,
@func{thread_current} will fail because the running thread's
@code{magic} value is incorrect.  Lots of functions call
@func{thread_current} directly or indirectly, including
@func{lock_acquire} for locking a lock, so @func{thread_init} is
called early in Pintos initialization.
@end deftypefun

@deftypefun void thread_start (void)
Called by @func{main} to start the scheduler.  Creates the idle
thread, that is, the thread that is scheduled when no other thread is
ready.  Then enables interrupts, which as a side effect enables the
scheduler because the scheduler runs on return from the timer interrupt, using
@func{intr_yield_on_return} (@pxref{External Interrupt Handling}).
@end deftypefun

@deftypefun void thread_tick (void)
Called by the timer interrupt at each timer tick.  It keeps track of
thread statistics and triggers the scheduler when a time slice expires.
@end deftypefun

@deftypefun void thread_print_stats (void)
Called during Pintos shutdown to print thread statistics.
@end deftypefun

@deftypefun tid_t thread_create (const char *@var{name}, int @var{priority}, thread_func *@var{func}, void *@var{aux})
Creates and starts a new thread named @var{name} with the given
@var{priority}, returning the new thread's tid.  The thread executes
@var{func}, passing @var{aux} as the function's single argument.

@func{thread_create} allocates a page for the thread's
@struct{thread} and stack and initializes its members, then it sets
up a set of fake stack frames for it (@pxref{Thread Switching}).  The
thread is initialized in the blocked state, then unblocked just before
returning, which allows the new thread to
be scheduled (@pxref{Thread States}).

@deftp {Type} {void thread_func (void *@var{aux})}
This is the type of the function passed to @func{thread_create}, whose
@var{aux} argument is passed along as the function's argument.
@end deftp
@end deftypefun

@deftypefun void thread_block (void)
Transitions the running thread from the running state to the blocked
state (@pxref{Thread States}).  The thread will not run again until
@func{thread_unblock} is
called on it, so you'd better have some way arranged for that to happen.
Because @func{thread_block} is so low-level, you should prefer to use
one of the synchronization primitives instead (@pxref{Synchronization}).
@end deftypefun

@deftypefun void thread_unblock (struct thread *@var{thread})
Transitions @var{thread}, which must be in the blocked state, to the
ready state, allowing it to resume running (@pxref{Thread States}).
This is called when the event that the thread is waiting for occurs,
e.g.@: when the lock that 
the thread is waiting on becomes available.
@end deftypefun

@deftypefun {struct thread *} thread_current (void)
Returns the running thread.
@end deftypefun

@deftypefun {tid_t} thread_tid (void)
Returns the running thread's thread id.  Equivalent to
@code{thread_current ()->tid}.
@end deftypefun

@deftypefun {const char *} thread_name (void)
Returns the running thread's name.  Equivalent to @code{thread_current
()->name}.
@end deftypefun

@deftypefun void thread_exit (void) @code{NO_RETURN}
Causes the current thread to exit.  Never returns, hence
@code{NO_RETURN} (@pxref{Function and Parameter Attributes}).
@end deftypefun

@deftypefun void thread_yield (void)
Yields the CPU to the scheduler, which picks a new thread to run.  The
new thread might be the current thread, so you can't depend on this
function to keep this thread from running for any particular length of
time.
@end deftypefun

@deftypefun void thread_foreach (thread_action_func *@var{action}, void *@var{aux})
Iterates over all threads @var{t} and invokes @code{action(t, aux)} on each.
@var{action} must refer to a function that matches the signature 
given by @func{thread_action_func}:

@deftp {Type} {void thread_action_func (struct thread *@var{thread}, void *@var{aux})}
Performs some action on a thread, given @var{aux}.
@end deftp
@end deftypefun

@deftypefun int thread_get_priority (void)
@deftypefunx void thread_set_priority (int @var{new_priority})
Stubs to get and set the thread priority.  @xref{Priority Scheduling}.
@end deftypefun

@deftypefun int thread_get_nice (void)
@deftypefunx void thread_set_nice (int @var{new_nice})
@deftypefunx int thread_get_recent_cpu (void)
@deftypefunx int thread_get_load_avg (void)
Stubs for the advanced scheduler.  @xref{4.4BSD Scheduler}.
@end deftypefun

@node Thread Switching
@subsection Thread Switching

@func{schedule} is responsible for switching threads.  It
is internal to @file{threads/thread.c} and called only by the three
public thread functions that need to switch threads:
@func{thread_block}, @func{thread_exit}, and @func{thread_yield}.
Before any of these functions call @func{schedule}, they disable
interrupts (or ensure that they are already disabled) and then change
the running thread's state to something other than running.

@func{schedule} is short but tricky.  It records the
current thread in local variable @var{cur}, determines the next thread
to run as local variable @var{next} (by calling
@func{next_thread_to_run}), and then calls @func{switch_threads} to do
the actual thread switch.  The thread we switched to was also running
inside @func{switch_threads}, as are all the threads not currently
running, so the new thread now returns out of
@func{switch_threads}, returning the previously running thread.

@func{switch_threads} is an assembly language routine in
@file{threads/switch.S}.  It saves registers on the stack, saves the
CPU's current stack pointer in the current @struct{thread}'s @code{stack}
member, restores the new thread's @code{stack} into the CPU's stack
pointer, restores registers from the stack, and returns.

The rest of the scheduler is implemented in @func{thread_schedule_tail}.  It
marks the new thread as running.  If the thread we just switched from
is in the dying state, then it also frees the page that contained the
dying thread's @struct{thread} and stack.  The page couldn't be freed
prior to the thread switch because the switch needed to use it.

Running a thread for the first time is a special case.  When
@func{thread_create} creates a new thread, it goes through a fair
amount of trouble to get it started properly.  In particular, the new
thread hasn't started running yet, so there's no way for it to be
running inside @func{switch_threads} as the scheduler expects.  To
solve the problem, @func{thread_create} creates some fake stack frames
in the new thread's stack:

@itemize @bullet
@item
The topmost fake stack frame is for @func{switch_threads}, represented
by @struct{switch_threads_frame}.  The important part of this frame is
its @code{eip} member, the return address.  We point @code{eip} to
@func{switch_entry}, indicating it to be the function that called
@func{switch_threads}.

@item
The next fake stack frame is for @func{switch_entry}, an assembly
language routine in @file{threads/switch.S} that adjusts the stack
pointer,@footnote{This is because @func{switch_threads} takes
arguments on the stack and the 80@var{x}86 SVR4 calling convention
requires the caller, not the called function, to remove them when the
call is complete.  See @bibref{SysV-i386} chapter 3 for details.}
calls @func{thread_schedule_tail} (this special case is why
@func{thread_schedule_tail} is separate from @func{schedule}), and returns.
We fill in its stack frame so that it returns into
@func{kernel_thread}, a function in @file{threads/thread.c}.

@item
The final stack frame is for @func{kernel_thread}, which enables
interrupts and calls the thread's function (the function passed to
@func{thread_create}).  If the thread's function returns, it calls
@func{thread_exit} to terminate the thread.
@end itemize

@node Synchronization
@section Synchronization

If sharing of resources between threads is not handled in a careful, controlled fashion, the result is usually a big mess.
This is especially the case in operating system kernels, where faulty sharing can crash the entire machine.  
Pintos provides several synchronization primitives to help out.

@cartouche
@noindent@strong{Important:} For the scope of all Pintos tasks, you may assume that any 1, 2 or 4 byte read or write operation on aligned memory is atomic.
All other read or write operations could potentially be interrupted or descheduled.
@end cartouche

@menu
* Disabling Interrupts::        
* Semaphores::                  
* Locks::                       
* Monitors::                    
* Optimization Barriers::
@end menu

@node Disabling Interrupts
@subsection Disabling Interrupts

The crudest way to do synchronization is to disable interrupts, that
is, to temporarily prevent the CPU from responding to interrupts.  If
interrupts are off, no other thread will preempt the running thread,
because thread preemption is driven by the timer interrupt.  If
interrupts are on, as they normally are, then the running thread may
be preempted by another at any time, whether between two C statements
or even within the execution of one.

Incidentally, this means that Pintos is a ``preemptible kernel,'' that
is, kernel threads can be preempted at any time.  Traditional Unix
systems are ``nonpreemptible,'' that is, kernel threads can only be
preempted at points where they explicitly call into the scheduler.
(User programs can be preempted at any time in both models.)  As you
might imagine, preemptible kernels require more explicit
synchronization.

You should have little need to set the interrupt state directly.  Most
of the time you should use the other synchronization primitives
described in the following sections.  The main reason to disable
interrupts is to synchronize kernel threads with external interrupt
handlers, which cannot sleep and thus cannot use most other forms of
synchronization (@pxref{External Interrupt Handling}).

Some external interrupts cannot be postponed, even by disabling
interrupts.  These interrupts, called @dfn{non-maskable interrupts}
(NMIs), are supposed to be used only in emergencies, e.g.@: when the
computer is on fire.  Pintos does not handle non-maskable interrupts.

Types and functions for disabling and enabling interrupts are in
@file{threads/interrupt.h}.

@deftp Type {enum intr_level}
One of @code{INTR_OFF} or @code{INTR_ON}, denoting that interrupts are
disabled or enabled, respectively.
@end deftp

@deftypefun {enum intr_level} intr_get_level (void)
Returns the current interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_set_level (enum intr_level @var{level})
Turns interrupts on or off according to @var{level}.  Returns the
previous interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_enable (void)
Turns interrupts on.  Returns the previous interrupt state.
@end deftypefun

@deftypefun {enum intr_level} intr_disable (void)
Turns interrupts off.  Returns the previous interrupt state.
@end deftypefun
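
Because these functions return the previous state, the usual idiom is
to save that state when disabling interrupts and restore it afterward,
rather than enabling interrupts unconditionally.  The sketch below
demonstrates the idiom against a user-space model: the three
@code{intr_} functions here are stand-ins with the signatures listed
above that track a plain variable instead of the CPU's interrupt flag,
and @code{counter_increment} and @code{counter_get} are hypothetical.

@example
enum intr_level @{ INTR_OFF, INTR_ON @};

/* Model of the CPU interrupt flag. */
static enum intr_level model_level = INTR_ON;

enum intr_level
intr_get_level (void)
@{
  return model_level;
@}

enum intr_level
intr_set_level (enum intr_level level)
@{
  enum intr_level old = model_level;
  model_level = level;
  return old;
@}

enum intr_level
intr_disable (void)
@{
  return intr_set_level (INTR_OFF);
@}

static int counter;

/* Increments the counter with interrupts off, then restores
   whatever interrupt state the caller had. */
void
counter_increment (void)
@{
  enum intr_level old_level = intr_disable ();
  counter++;
  intr_set_level (old_level);
@}

int
counter_get (void)
@{
  return counter;
@}
@end example

Restoring the saved level makes @code{counter_increment} safe to call
both with interrupts on and from code that has already turned them
off, since it never re-enables interrupts behind its caller's back.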

@node Semaphores
@subsection Semaphores

A @dfn{semaphore} is a nonnegative integer together with two operators
that manipulate it atomically, which are:

@itemize @bullet
@item
``Down'' or ``P'': wait for the value to become positive, then
decrement it.

@item
``Up'' or ``V'': increment the value (and wake up one waiting thread,
if any).
@end itemize

A semaphore initialized to 0 may be used to wait for an event
that will happen exactly once.  For example, suppose thread @var{A}
starts another thread @var{B} and wants to wait for @var{B} to signal
that some activity is complete.  @var{A} can create a semaphore
initialized to 0, pass it to @var{B} as it starts it, and then
``down'' the semaphore.  When @var{B} finishes its activity, it
``ups'' the semaphore.  This works regardless of whether @var{A}
``downs'' the semaphore or @var{B} ``ups'' it first.
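
The pattern can be tried outside the kernel with the following sketch.
It is a user-space model, not Pintos code: @struct{semaphore},
@func{sema_init}, @func{sema_down}, and @func{sema_up} are reimplemented
here on top of POSIX threads, @code{thread_a} and @code{thread_b} are
hypothetical names, and in Pintos itself @var{A} would start @var{B}
with @func{thread_create} instead of @code{pthread_create}.

@example
#include <pthread.h>
#include <stddef.h>

/* User-space stand-in for the Pintos semaphore. */
struct semaphore
  @{
    unsigned value;
    pthread_mutex_t mutex;
    pthread_cond_t nonzero;
  @};

void
sema_init (struct semaphore *sema, unsigned value)
@{
  sema->value = value;
  pthread_mutex_init (&sema->mutex, NULL);
  pthread_cond_init (&sema->nonzero, NULL);
@}

/* Waits for the value to become positive, then decrements it. */
void
sema_down (struct semaphore *sema)
@{
  pthread_mutex_lock (&sema->mutex);
  while (sema->value == 0)
    pthread_cond_wait (&sema->nonzero, &sema->mutex);
  sema->value--;
  pthread_mutex_unlock (&sema->mutex);
@}

/* Increments the value and wakes one waiter, if any. */
void
sema_up (struct semaphore *sema)
@{
  pthread_mutex_lock (&sema->mutex);
  sema->value++;
  pthread_cond_signal (&sema->nonzero);
  pthread_mutex_unlock (&sema->mutex);
@}

static int activity_done;

/* Thread B: performs the activity, then "ups" the semaphore. */
static void *
thread_b (void *sema_)
@{
  struct semaphore *sema = sema_;
  activity_done = 1;
  sema_up (sema);
  return NULL;
@}

/* Thread A: starts B, "downs" the semaphore to wait for it, and
   returns the flag that B set (or -1 if B could not be started). */
int
thread_a (void)
@{
  struct semaphore sema;
  pthread_t b;
  int result;

  sema_init (&sema, 0);
  if (pthread_create (&b, NULL, thread_b, &sema) != 0)
    return -1;
  sema_down (&sema);
  result = activity_done;
  pthread_join (b, NULL);
  return result;
@}
@end example

If @var{B} runs first, the value is already 1 when @var{A} calls
@func{sema_down}, so @var{A} does not wait; if @var{A} runs first, it
blocks until @var{B} calls @func{sema_up}.  Either way @var{A} sees the
completed activity.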

A semaphore initialized to 1 is typically used for controlling access
to a resource.  Before a block of code starts using the resource, it
``downs'' the semaphore, then after it is done with the resource it
``ups'' the semaphore.  In such a case a lock, described below, may be
more appropriate.

Semaphores can also be initialized to values larger than 1.  These are
rarely used.

Semaphores were invented by Edsger Dijkstra and first used in the THE
operating system (@bibref{Dijkstra}).

Pintos' semaphore type and operations are declared in
@file{threads/synch.h}.  

@deftp {Type} {struct semaphore}
Represents a semaphore.
@end deftp

@deftypefun void sema_init (struct semaphore *@var{sema}, unsigned @var{value})
Initializes @var{sema} as a new semaphore with the given initial
@var{value}.
@end deftypefun

@deftypefun void sema_down (struct semaphore *@var{sema})
Executes the ``down'' or ``P'' operation on @var{sema}, waiting for
its value to become positive and then decrementing it by one.
@end deftypefun

@deftypefun bool sema_try_down (struct semaphore *@var{sema})
Tries to execute the ``down'' or ``P'' operation on @var{sema},
without waiting.  Returns true if @var{sema}
was successfully decremented, or false if it was already
zero and thus could not be decremented without waiting.  Calling this
function in a
tight loop wastes CPU time, so use @func{sema_down} or find a
different approach instead.
@end deftypefun

@deftypefun void sema_up (struct semaphore *@var{sema})
Executes the ``up'' or ``V'' operation on @var{sema},
incrementing its value.  If any threads are waiting on
@var{sema}, wakes one of them up.

Unlike most synchronization primitives, @func{sema_up} may be called
inside an external interrupt handler (@pxref{External Interrupt
Handling}).
@end deftypefun

Semaphores are internally implemented by disabling interrupts
(@pxref{Disabling Interrupts}) and  blocking and unblocking threads
(@func{thread_block} and @func{thread_unblock}).  Each semaphore maintains
a list of waiting threads, using the linked list
implementation in @file{lib/kernel/list.c}.
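
As a minimal sketch of the ``initialized to 1'' usage described above,
the fragment below guards a shared counter with a semaphore.  The names
@code{counter_sema}, @code{counter}, @func{counter_init}, and
@func{counter_increment} are invented for illustration and are not part
of Pintos:

@example
#include "threads/synch.h"

static struct semaphore counter_sema;   /* Hypothetical: guards counter. */
static int counter;                     /* Hypothetical shared resource. */

void counter_init (void) @{
  sema_init (&counter_sema, 1);         /* 1 means the resource is free. */
@}

void counter_increment (void) @{
  sema_down (&counter_sema);            /* Wait until free, then claim it. */
  counter++;
  sema_up (&counter_sema);              /* Release; wakes one waiter, if any. */
@}
@end example

@noindent
Because the same thread always does both the ``down'' and the ``up''
here, a lock would express the intent more directly.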

@node Locks
@subsection Locks

A @dfn{lock} is like a semaphore with an initial value of 1
(@pxref{Semaphores}).  A lock's equivalent of ``up'' is called
``release'', and the ``down'' operation is called ``acquire''.

Compared to a semaphore, a lock has one added restriction: only the
thread that acquires a lock, called the lock's ``owner'', is allowed to
release it.  If this restriction is a problem, it's a good sign that a
semaphore should be used, instead of a lock.

Locks in Pintos are not ``recursive,'' that is, it is an error for the
thread currently holding a lock to try to acquire that lock.

Lock types and functions are declared in @file{threads/synch.h}.

@deftp {Type} {struct lock}
Represents a lock.
@end deftp

@deftypefun void lock_init (struct lock *@var{lock})
Initializes @var{lock} as a new lock.
The lock is not initially owned by any thread.
@end deftypefun

@deftypefun void lock_acquire (struct lock *@var{lock})
Acquires @var{lock} for the current thread, first waiting for
any current owner to release it if necessary.
@end deftypefun

@deftypefun bool lock_try_acquire (struct lock *@var{lock})
Tries to acquire @var{lock} for use by the current thread, without
waiting.  Returns true if successful, false if the lock is already
owned.  Calling this function in a tight loop is a bad idea because it
wastes CPU time, so use @func{lock_acquire} instead.
@end deftypefun

@deftypefun void lock_release (struct lock *@var{lock})
Releases @var{lock}, which the current thread must own.
@end deftypefun

@deftypefun bool lock_held_by_current_thread (const struct lock *@var{lock})
Returns true if the running thread owns @var{lock},
false otherwise.
There is no function to test whether an arbitrary thread owns a lock,
because the answer could change before the caller could act on it.
@end deftypefun
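
The fragment below sketches one common way to use these functions
together.  The names @code{table_lock}, @code{table_entries}, and the
@code{table_*} functions are hypothetical and exist only for this
illustration:

@example
#include <debug.h>
#include "threads/synch.h"

static struct lock table_lock;          /* Hypothetical lock. */
static int table_entries;               /* Hypothetical data it protects. */

void table_init (void) @{
  lock_init (&table_lock);
@}

/* Helper that requires the caller to hold table_lock already. */
static void table_add_locked (void) @{
  ASSERT (lock_held_by_current_thread (&table_lock));
  table_entries++;
@}

void table_add (void) @{
  lock_acquire (&table_lock);
  table_add_locked ();
  lock_release (&table_lock);           /* Released by the acquiring thread. */
@}
@end example

@noindent
The assertion documents the helper's locking requirement and catches
callers that forget to acquire the lock.  Note that
@func{table_add_locked} must not call @func{lock_acquire} itself,
because Pintos locks are not recursive.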

@node Monitors
@subsection Monitors

A @dfn{monitor} is a higher-level form of synchronization than a
semaphore or a lock.  A monitor consists of data being synchronized,
plus a lock, called the @dfn{monitor lock}, and one or more
@dfn{condition variables}.  Before it accesses the protected data, a
thread first acquires the monitor lock.  It is then said to be ``in the
monitor''.  While in the monitor, the thread has control over all the
protected data, which it may freely examine or modify.  When access to
the protected data is complete, it releases the monitor lock.

Condition variables allow code in the monitor to wait for a condition to
become true.  Each condition variable is associated with an abstract
condition, e.g.@: ``some data has arrived for processing'' or ``over 10
seconds have passed since the user's last keystroke''.  When code in the
monitor needs to wait for a condition to become true, it ``waits'' on
the associated condition variable, which releases the lock and waits for
the condition to be signaled.  If, on the other hand, it has caused one
of these conditions to become true, it ``signals'' the condition to wake
up one waiter, or ``broadcasts'' the condition to wake all of them.

The theoretical framework for monitors was laid out by C.@: A.@: R.@:
Hoare (@bibref{Hoare}).  Their practical usage was later elaborated in a
paper on the Mesa operating system (@bibref{Lampson}).

Condition variable types and functions are declared in
@file{threads/synch.h}.

@deftp {Type} {struct condition}
Represents a condition variable.
@end deftp

@deftypefun void cond_init (struct condition *@var{cond})
Initializes @var{cond} as a new condition variable.
@end deftypefun

@deftypefun void cond_wait (struct condition *@var{cond}, struct lock *@var{lock})
Atomically releases @var{lock} (the monitor lock) and waits for
@var{cond} to be signaled by some other piece of code.  After
@var{cond} is signaled, reacquires @var{lock} before returning.
@var{lock} must be held before calling this function.

Sending a signal and waking up from a wait are not an atomic operation.
Thus, typically, @func{cond_wait}'s caller must recheck the condition
after the wait completes and, if necessary, wait again.  See the next
section for an example.
@end deftypefun

@deftypefun void cond_signal (struct condition *@var{cond}, struct lock *@var{lock})
If any threads are waiting on @var{cond} (protected by monitor lock
@var{lock}), then this function wakes up one of them.  If no threads are
waiting, returns without performing any action.
@var{lock} must be held before calling this function.
@end deftypefun

@deftypefun void cond_broadcast (struct condition *@var{cond}, struct lock *@var{lock})
Wakes up all threads, if any, waiting on @var{cond} (protected by
monitor lock @var{lock}).  @var{lock} must be held before calling this
function.
@end deftypefun

@subsubsection Monitor Example

The classical example of a monitor is handling a buffer into which one
or more
``producer'' threads write characters and out of which one or more
``consumer'' threads read characters.  To implement this we need,
besides the monitor lock, two condition variables which we will call
@var{not_full} and @var{not_empty}:

@example
char buf[BUF_SIZE];     /* @r{Buffer.} */
size_t n = 0;           /* @r{0 <= n <= @var{BUF_SIZE}: # of characters in buffer.} */
size_t head = 0;        /* @r{@var{buf} index of next char to write (mod @var{BUF_SIZE}).} */
size_t tail = 0;        /* @r{@var{buf} index of next char to read (mod @var{BUF_SIZE}).} */
struct lock lock;       /* @r{Monitor lock.} */
struct condition not_empty; /* @r{Signaled when the buffer is not empty.} */
struct condition not_full; /* @r{Signaled when the buffer is not full.} */

@dots{}@r{initialize the lock and condition variables}@dots{}

void put (char ch) @{
  lock_acquire (&lock);
  while (n == BUF_SIZE) @{            /* @r{Can't add to @var{buf} as long as it's full.} */
    cond_wait (&not_full, &lock);
  @}
  buf[head++ % BUF_SIZE] = ch;     /* @r{Add @var{ch} to @var{buf}.} */
  n++;
  cond_signal (&not_empty, &lock); /* @r{@var{buf} can't be empty anymore.} */
  lock_release (&lock);
@}

char get (void) @{
  char ch;
  lock_acquire (&lock);
  while (n == 0) @{                  /* @r{Can't read @var{buf} as long as it's empty.} */
    cond_wait (&not_empty, &lock);
  @}
  ch = buf[tail++ % BUF_SIZE];    /* @r{Get @var{ch} from @var{buf}.} */
  n--;
  cond_signal (&not_full, &lock); /* @r{@var{buf} can't be full anymore.} */
  lock_release (&lock);
  return ch;
@}
@end example

Note that @code{BUF_SIZE} must divide evenly into @code{SIZE_MAX + 1}
for the above code to be completely correct.  Otherwise, it will fail
the first time @code{head} wraps around to 0.  In practice,
@code{BUF_SIZE} would ordinarily be a power of 2.
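
To see why, here is the arithmetic worked through under two assumptions
that are ours rather than the example's: @code{size_t} is 32 bits wide,
so that @code{SIZE_MAX + 1} is 4294967296, and @code{BUF_SIZE} is 3:

@example
head = 4294967294   ->  slot 4294967294 % 3 = 2
head = 4294967295   ->  slot 4294967295 % 3 = 0
head = 0 (wrapped)  ->  slot          0 % 3 = 0    /* Slot 0 again. */
@end example

@noindent
Two consecutive writes land in slot 0, so the second can overwrite a
character that has not been read yet.  With a @code{BUF_SIZE} of 4 the
same three values of @code{head} select slots 2, 3, and 0, and the
sequence continues undisturbed across the wrap.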

@node Optimization Barriers
@subsection Optimization Barriers

@c We should try to come up with a better example.
@c Perhaps something with a linked list?

An @dfn{optimization barrier} is a special statement that prevents the
compiler from making assumptions about the state of memory across the
barrier.  The compiler will not reorder reads or writes of variables
across the barrier or assume that a variable's value is unmodified
across the barrier, except for local variables whose address is never
taken.  In Pintos, @file{threads/synch.h} defines the @code{barrier()}
macro as an optimization barrier.
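
For reference, a common way to write such a barrier for GCC is an empty
inline assembly statement with a @samp{memory} clobber.  To the best of
our knowledge this matches the definition in @file{threads/synch.h},
but treat it as a sketch and check your copy of the source:

@example
/* Emits no instructions, but tells the compiler that memory may have
   been read or written in ways it cannot see. */
#define barrier() asm volatile ("" : : : "memory")
@end example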

One reason to use an optimization barrier is when data can change
asynchronously, without the compiler's knowledge, e.g.@: by another
thread or an interrupt handler.  The @func{too_many_loops} function in
@file{devices/timer.c} is an example.  This function starts out by
busy-waiting in a loop until a timer tick occurs:

@example
/* Wait for a timer tick. */
int64_t start = ticks;
while (ticks == start) @{
  barrier ();
@}
@end example

@noindent
Without an optimization barrier in the loop, the compiler could
conclude that the loop would never terminate, because @code{start} and
@code{ticks} start out equal and the loop itself never changes them.
It could then ``optimize'' the function into an infinite loop, which
would definitely be undesirable.

Optimization barriers can be used to avoid other compiler
optimizations.  The @func{busy_wait} function, also in
@file{devices/timer.c}, is an example.  It contains this loop:

@example
while (loops-- > 0) @{
  barrier ();
@}
@end example

@noindent
The goal of this loop is to busy-wait by counting @code{loops} down
from its original value to 0.  Without the barrier, the compiler could
delete the loop entirely, because it produces no useful output and has
no side effects.  The barrier forces the compiler to pretend that the
loop body has an important effect.

Finally, optimization barriers can be used to force the ordering of
memory reads or writes.  For example, suppose we add a ``feature''
that, whenever a timer interrupt occurs, the character in global
variable @code{timer_put_char} is printed on the console, but only if
global Boolean variable @code{timer_do_put} is true.  The best way to
set up @samp{x} to be printed is then to use an optimization barrier,
like this:

@example
timer_put_char = 'x';
barrier ();
timer_do_put = true;
@end example

Without the barrier, the code is buggy because the compiler is free to
reorder operations when it doesn't see a reason to keep them in the
same order.  In this case, the compiler doesn't know that the order of
assignments is important, so its optimizer is permitted to exchange
their order.  There's no telling whether it will actually do this, and
it is possible that passing the compiler different optimization flags
or using a different version of the compiler will produce different
behavior.

Another solution is to disable interrupts around the assignments.
This does not prevent reordering, but it prevents the interrupt
handler from intervening between the assignments.  It also has the
extra runtime cost of disabling and re-enabling interrupts:

@example
enum intr_level old_level = intr_disable ();
timer_put_char = 'x';
timer_do_put = true;
intr_set_level (old_level);
@end example

A third solution is to mark the declarations of
@code{timer_put_char} and @code{timer_do_put} as @samp{volatile}.  This
keyword tells the compiler that the variables are externally observable
and restricts its latitude for optimization.  However, the semantics of
@samp{volatile} are not well-defined, so it is not a good general
solution.  The base Pintos code does not use @samp{volatile} at all.

The following is @emph{not} a solution, because locks neither prevent
interrupts nor prevent the compiler from reordering the code within the
region where the lock is held:

@example
lock_acquire (&timer_lock);     /* INCORRECT CODE */
timer_put_char = 'x';
timer_do_put = true;
lock_release (&timer_lock);
@end example

The compiler treats invocation of any function defined externally,
that is, in another source file, as a limited form of optimization
barrier.  Specifically, the compiler assumes that any externally
defined function may access any statically or dynamically allocated
data and any local variable whose address is taken.  This often means
that explicit barriers can be omitted.  It is one reason that Pintos
contains few explicit barriers.

A function defined in the same source file, or in a header included by
the source file, cannot be relied upon as an optimization barrier.
This applies even to invocation of a function before its
definition, because the compiler may read and parse the entire source
file before performing optimization.

@node Interrupt Handling
@section Interrupt Handling

An @dfn{interrupt} notifies the CPU of some event.  Much of the work
of an operating system relates to interrupts in one way or another.
For our purposes, we classify interrupts into two broad categories:

@itemize @bullet
@item
@dfn{Internal interrupts}, that is, interrupts caused directly by CPU
instructions.  System calls, attempts at invalid memory access
(@dfn{page faults}), and attempts to divide by zero are some activities
that cause internal interrupts.  Because they are caused by CPU
instructions, internal interrupts are @dfn{synchronous} or synchronized
with CPU instructions.  @func{intr_disable} does not disable internal
interrupts.

@item
@dfn{External interrupts}, that is, interrupts originating outside the
CPU.  These interrupts come from hardware devices such as the system
timer, keyboard, serial ports, and disks.  External interrupts are
@dfn{asynchronous}, meaning that their delivery is not
synchronized with instruction execution.  Handling of external interrupts
can be postponed with @func{intr_disable} and related functions
(@pxref{Disabling Interrupts}).
@end itemize

The CPU treats both classes of interrupts largely the same way,
so Pintos has common infrastructure to handle both classes.
The following section describes this
common infrastructure.  The sections after that give the specifics of
external and internal interrupts.

If you haven't already read chapter 3, ``Basic Execution Environment,''
in @bibref{IA32-v1}, it is recommended that you do so now.  You might
also want to skim chapter 5, ``Interrupt and Exception Handling,'' in
@bibref{IA32-v3a}.

@menu
* Interrupt Infrastructure::    
* Internal Interrupt Handling::  
* External Interrupt Handling::  
@end menu

@node Interrupt Infrastructure
@subsection Interrupt Infrastructure

When an interrupt occurs, the CPU saves
its most essential state on the current stack (determined by @code{esp})
and jumps to an interrupt handler routine.
The 80@var{x}86 architecture supports 256
interrupts, numbered 0 through 255, each with an independent
handler defined in an array called the @dfn{interrupt
descriptor table} or IDT.

In Pintos, @func{intr_init} in @file{threads/interrupt.c} sets up the
IDT so that each entry points to a unique entry point in
@file{threads/intr-stubs.S} named @func{intr@var{NN}_stub}, where
@var{NN} is the interrupt number in
hexadecimal.  Because the CPU doesn't give
us any other way to find out the interrupt number, this entry point
pushes the interrupt number on the stack.  Then it jumps to
@func{intr_entry}, which pushes all the registers that the processor
didn't already push for us, and then calls @func{intr_handler}, which
brings us back into C in @file{threads/interrupt.c}.

The main job of @func{intr_handler} is to call the function
registered for handling the particular interrupt.  (If no
function is registered, it dumps some information to the console and
panics.)  It also does some extra processing for external
interrupts (@pxref{External Interrupt Handling}).

When @func{intr_handler} returns, the assembly code in
@file{threads/intr-stubs.S} restores all the CPU registers saved
earlier and directs the CPU to return from the interrupt.

The following types and functions are common to all interrupts:

@deftp {Type} {void intr_handler_func (struct intr_frame *@var{frame})}
This is how an interrupt handler function must be declared.  Its @var{frame}
argument (see below) allows it to determine the cause of the interrupt
and the state of the thread that was interrupted.
@end deftp

@deftp {Type} {struct intr_frame}
The stack frame of an interrupt handler, as saved by the CPU, the interrupt
stubs, and @func{intr_entry}.  Its most interesting members are described
below.
@end deftp

@deftypecv {Member} {@struct{intr_frame}} uint32_t edi
@deftypecvx {Member} {@struct{intr_frame}} uint32_t esi
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ebp
@deftypecvx {Member} {@struct{intr_frame}} uint32_t esp_dummy
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ebx
@deftypecvx {Member} {@struct{intr_frame}} uint32_t edx
@deftypecvx {Member} {@struct{intr_frame}} uint32_t ecx
@deftypecvx {Member} {@struct{intr_frame}} uint32_t eax
@deftypecvx {Member} {@struct{intr_frame}} uint16_t es
@deftypecvx {Member} {@struct{intr_frame}} uint16_t ds
Register values in the interrupted thread, pushed by @func{intr_entry}.
The @code{esp_dummy} value isn't actually used (refer to the
description of @code{PUSHA} in @bibref{IA32-v2b} for details).
@end deftypecv

@deftypecv {Member} {@struct{intr_frame}} uint32_t vec_no
The interrupt vector number, ranging from 0 to 255.
@end deftypecv

@deftypecv {Member} {@struct{intr_frame}} uint32_t error_code
The ``error code'' pushed on the stack by the CPU for some internal
interrupts.
@end deftypecv

@deftypecv {Member} {@struct{intr_frame}} void (*eip) (void)
The address of the next instruction to be executed by the interrupted
thread.
@end deftypecv

@deftypecv {Member} {@struct{intr_frame}} {void *} esp
The interrupted thread's stack pointer.
@end deftypecv

@deftypefun {const char *} intr_name (uint8_t @var{vec})
Returns the name of the interrupt numbered @var{vec}, or
@code{"unknown"} if the interrupt has no registered name.
@end deftypefun

@node Internal Interrupt Handling
@subsection Internal Interrupt Handling

Internal interrupts are caused directly by CPU instructions executed by
the running kernel thread or user process (from task 2 onward).  An
internal interrupt is therefore said to arise in a ``process context.''

In an internal interrupt's handler, it can make sense to examine the
@struct{intr_frame} passed to the interrupt handler, or even to modify
it.  When the interrupt returns, modifications in @struct{intr_frame}
become changes to the calling thread or process's state.  For example,
the Pintos system call handler returns a value to the user program by
modifying the saved EAX register (@pxref{System Call Details}).

There are no special restrictions on what an internal interrupt
handler can or can't do.  Generally they should run with interrupts
enabled, just like other code, so they can be preempted by other
kernel threads.  Thus, they do need to synchronize with other threads
on shared data and other resources (@pxref{Synchronization}).  Of course,
running with interrupts enabled only makes sense if the handler is not
updating critical system data at the time.

Internal interrupt handlers can be invoked recursively.  For example,
the system call handler might cause a page fault while attempting to
read user memory.  Deep recursion would risk overflowing the limited
kernel stack (@pxref{struct thread}), but should be unnecessary.

@deftypefun void intr_register_int (uint8_t @var{vec}, int @var{dpl}, enum intr_level @var{level}, intr_handler_func *@var{handler}, const char *@var{name})
Registers @var{handler} to be called when internal interrupt numbered
@var{vec} is triggered.  Names the interrupt @var{name} for debugging
purposes.

If @var{level} is @code{INTR_ON}, external interrupts will be processed
normally during the interrupt handler's execution, which is normally
desirable.  Specifying @code{INTR_OFF} will cause the CPU to disable
external interrupts when it invokes the interrupt handler.  The effect
is slightly different from calling @func{intr_disable} inside the
handler, because that leaves a window of one or more CPU instructions in
which external interrupts are still enabled.  This is important for the
page fault handler; refer to the comments in @file{userprog/exception.c}
for details.

@var{dpl} determines how the interrupt can be invoked.  If @var{dpl} is
0, then the interrupt can be invoked only by kernel threads.  Otherwise
@var{dpl} should be 3, which allows user processes to invoke the
interrupt with an explicit INT instruction.  The value of @var{dpl}
doesn't affect user processes' ability to invoke the interrupt
indirectly, e.g.@: an invalid memory reference will cause a page fault
regardless of @var{dpl}.
@end deftypefun
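
As a sketch of how a registration might look, the fragment below
installs a handler that user processes can invoke with an explicit INT
instruction.  The vector number @t{0x31} and the names
@func{example_handler} and @func{example_init} are arbitrary choices
made for this illustration, not part of Pintos:

@example
#include <stdio.h>
#include "threads/interrupt.h"

static void example_handler (struct intr_frame *f) @{
  printf ("interrupt %s\n", intr_name (f->vec_no));
  f->eax = 0;       /* Becomes EAX in the interrupted code on return. */
@}

void example_init (void) @{
  /* dpl 3: user code may invoke it.  INTR_ON: external interrupts
     stay enabled while the handler runs. */
  intr_register_int (0x31, 3, INTR_ON, example_handler, "example");
@}
@end example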

@node External Interrupt Handling
@subsection External Interrupt Handling

External interrupts are caused by events outside the CPU.
They are asynchronous, so they can be invoked at any time that
interrupts have not been disabled.  We say that an external interrupt
runs in an ``interrupt context.''

In an external interrupt, the @struct{intr_frame} passed to the
handler is not very meaningful.  It describes the state of the thread
or process that was interrupted, but there is no way to predict which
one that is.  It is possible, although rarely useful, to examine it, but
modifying it is a recipe for disaster.

Only one external interrupt may be processed at a time.  Neither
internal nor external interrupts may nest within an external interrupt
handler.  Thus, an external interrupt's handler must run with interrupts
disabled (@pxref{Disabling Interrupts}).

An external interrupt handler must not sleep or yield, which rules out
calling @func{lock_acquire}, @func{thread_yield}, and many other
functions.  Sleeping in interrupt context would effectively put the
interrupted thread to sleep, too, until the interrupt handler was again
scheduled and returned.  This would be unfair to the unlucky thread, and
it would deadlock if the handler were waiting for the sleeping thread
to, e.g., release a lock.

An external interrupt handler effectively monopolizes the machine and delays 
all other activities. Therefore, external interrupt handlers should complete 
as quickly as they can.  Anything that requires a significant amount of CPU 
time should instead run in a kernel thread, possibly one that the interrupt 
triggers using a synchronization primitive.

External interrupts are controlled by a
pair of devices outside the CPU called @dfn{programmable interrupt
controllers}, @dfn{PICs} for short.  When @func{intr_init} sets up the
CPU's IDT, it also initializes the PICs for interrupt handling.  The
PICs also must be ``acknowledged'' at the end of processing for each
external interrupt.  @func{intr_handler} takes care of that by calling
@func{pic_end_of_interrupt}, which properly signals the PICs.

The following functions relate to external interrupts:

@deftypefun void intr_register_ext (uint8_t @var{vec}, intr_handler_func *@var{handler}, const char *@var{name})
Registers @var{handler} to be called when external interrupt numbered
@var{vec} is triggered.  Names the interrupt @var{name} for debugging
purposes.  The handler will run with interrupts disabled.
@end deftypefun

@deftypefun bool intr_context (void)
Returns true if we are running in an interrupt context, otherwise
false.  Mainly used in functions that might sleep
or that otherwise should not be called from interrupt context, in this
form:
@example
ASSERT (!intr_context ());
@end example
@end deftypefun

@deftypefun void intr_yield_on_return (void)
When called in an interrupt context, causes @func{thread_yield} to be
called just before the interrupt returns.  Used
in the timer interrupt handler when a thread's time slice expires, to
cause a new thread to be scheduled.
@end deftypefun
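
The advice above about keeping external interrupt handlers short can be
sketched as follows: the handler only ``ups'' a semaphore, and a kernel
thread does the slow work.  The device, the vector number @t{0x2c}, and
all of the @code{dev_*} names are hypothetical:

@example
#include <debug.h>
#include <stddef.h>
#include "threads/interrupt.h"
#include "threads/synch.h"
#include "threads/thread.h"

static struct semaphore dev_ready;      /* Counts unprocessed events. */

/* Runs in interrupt context, so it must not sleep.  sema_up() is
   one of the few synchronization calls permitted here. */
static void dev_interrupt (struct intr_frame *f UNUSED) @{
  sema_up (&dev_ready);
@}

/* Runs as an ordinary kernel thread, so it may sleep. */
static void dev_worker (void *aux UNUSED) @{
  for (;;) @{
    sema_down (&dev_ready);             /* Sleep until an event arrives. */
    /* ...do the time-consuming processing here... */
  @}
@}

void dev_init (void) @{
  sema_init (&dev_ready, 0);
  intr_register_ext (0x2c, dev_interrupt, "example device");
  thread_create ("dev-worker", PRI_DEFAULT, dev_worker, NULL);
@}
@end example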

@node Memory Allocation
@section Memory Allocation

Pintos contains two memory allocators, one that allocates memory in
units of a page, and one that can allocate blocks of any size.

@menu
* Page Allocator::              
* Block Allocator::             
@end menu

@node Page Allocator
@subsection Page Allocator

The page allocator declared in @file{threads/palloc.h} allocates
memory in units of a page.  It is most often used to allocate memory
one page at a time, but it can also allocate multiple contiguous pages
at once.

The page allocator divides the memory it allocates into two pools,
called the kernel and user pools.  By default, each pool gets half of
system memory above @w{1 MB}, but the division can be changed with the
@option{-ul} kernel
command line
option (@pxref{Why PAL_USER?}).  An allocation request draws from one
pool or the other.  If one pool becomes empty, the other may still
have free pages.  The user pool should be used for allocating memory
for user processes and the kernel pool for all other allocations.
This will only become important starting with task 3.  Until then,
all allocations should be made from the kernel pool.

Each pool's usage is tracked with a bitmap, one bit per page in
the pool.  A request to allocate @var{n} pages scans the bitmap
for @var{n} consecutive bits set to
false, indicating that those pages are free, and then sets those bits
to true to mark them as used.  This is a ``first fit'' allocation
strategy (@pxref{Wilson}).

The page allocator is subject to fragmentation.  That is, it may not
be possible to allocate @var{n} contiguous pages even though @var{n}
or more pages are free, because the free pages are separated by used
pages.  In fact, in pathological cases it may be impossible to
allocate 2 contiguous pages even though half of the pool's pages are free.
Single-page requests can't fail due to fragmentation, so
requests for multiple contiguous pages should be limited as much as
possible.

Pages may not be allocated from interrupt context, but they may be
freed.

When a page is freed, all of its bytes are cleared to @t{0xcc}, as
a debugging aid (@pxref{Debugging Tips}).

Page allocator types and functions are described below:

@deftypefun {void *} palloc_get_page (enum palloc_flags @var{flags})
@deftypefunx {void *} palloc_get_multiple (enum palloc_flags @var{flags}, size_t @var{page_cnt})
Obtains and returns one page, or @var{page_cnt} contiguous pages,
respectively.  Returns a null pointer if the pages cannot be allocated.

The @var{flags} argument may be any combination of the following flags:

@defvr {Page Allocator Flag} @code{PAL_ASSERT}
If the pages cannot be allocated, panic the kernel.  This is only
appropriate during kernel initialization.  User processes
should never be permitted to panic the kernel.
@end defvr

@defvr {Page Allocator Flag} @code{PAL_ZERO}
Zero all the bytes in the allocated pages before returning them.  If not
set, the contents of newly allocated pages are unpredictable.
@end defvr

@defvr {Page Allocator Flag} @code{PAL_USER}
Obtain the pages from the user pool.  If not set, pages are allocated
from the kernel pool.
@end defvr
@end deftypefun

@deftypefun void palloc_free_page (void *@var{page})
@deftypefunx void palloc_free_multiple (void *@var{pages}, size_t @var{page_cnt})
Frees one page, or @var{page_cnt} contiguous pages, respectively,
starting at @var{pages}.  All of the pages must have been obtained using
@func{palloc_get_page} or @func{palloc_get_multiple}.
@end deftypefun
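
A minimal usage sketch follows.  The function name
@func{use_scratch_page} is invented for illustration:

@example
#include <stdbool.h>
#include <stddef.h>
#include "threads/palloc.h"

bool use_scratch_page (void) @{
  /* One zeroed page from the kernel pool (PAL_USER is not set). */
  char *page = palloc_get_page (PAL_ZERO);
  if (page == NULL)
    return false;           /* Out of pages: fail instead of panicking. */

  page[0] = 'x';            /* ...use up to PGSIZE bytes... */

  palloc_free_page (page);
  return true;
@}
@end example

@noindent
Checking for a null pointer, rather than passing @code{PAL_ASSERT}, is
the safer choice anywhere the request might be made on behalf of a user
process.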

@node Block Allocator
@subsection Block Allocator

The block allocator, declared in @file{threads/malloc.h}, can allocate
blocks of any size.  It is layered on top of the page allocator
described in the previous section.  Blocks returned by the block
allocator are obtained from the kernel pool.

The block allocator uses two different strategies for allocating memory.
The first strategy applies to blocks that are 1 kB or smaller
(one-fourth of the page size).  These allocations are rounded up to the
nearest power of 2, or 16 bytes, whichever is larger.  Then they are
grouped into a page used only for allocations of that size.

The second strategy applies to blocks larger than 1 kB.
These allocations (plus a small amount of overhead) are rounded up to
the nearest page in size, and then the block allocator requests that
number of contiguous pages from the page allocator.

In either case, the difference between the requested allocation size
and the actual block size is wasted.  A real operating system would
carefully tune its allocator to minimize this waste, but this is
unimportant in an instructional system like Pintos.

As long as a page can be obtained from the page allocator, small
allocations always succeed.  Most small allocations do not require a
new page from the page allocator at all, because they are satisfied
using part of a page already allocated.  However, large allocations
always require calling into the page allocator, and any allocation
that needs more than one contiguous page can fail due to fragmentation,
as already discussed in the previous section.  Thus, you should
minimize the number of large allocations in your code, especially
those over approximately 4 kB each.

When a block is freed, all of its bytes are cleared to @t{0xcc}, as
a debugging aid (@pxref{Debugging Tips}).

The block allocator may not be called from interrupt context.

The block allocator functions are described below.  Their interfaces are
the same as the standard C library functions of the same names.

@deftypefun {void *} malloc (size_t @var{size})
Obtains and returns a new block, from the kernel pool, at least
@var{size} bytes long.  Returns a null pointer if @var{size} is zero or
if memory is not available.
@end deftypefun

@deftypefun {void *} calloc (size_t @var{a}, size_t @var{b})
Obtains and returns a new block, from the kernel pool, at least
@code{@var{a} * @var{b}} bytes long.  The block's contents will be
cleared to zeros.  Returns a null pointer if @var{a} or @var{b} is zero
or if insufficient memory is available.
@end deftypefun

@deftypefun {void *} realloc (void *@var{block}, size_t @var{new_size})
Attempts to resize @var{block} to @var{new_size} bytes, possibly moving
it in the process.  If successful, returns the new block, in which case
the old block must no longer be accessed.  On failure, returns a null
pointer, and the old block remains valid.

A call with @var{block} null is equivalent to @func{malloc}.  A call
with @var{new_size} zero is equivalent to @func{free}.
@end deftypefun

@deftypefun void free (void *@var{block})
Frees @var{block}, which must have been previously returned by
@func{malloc}, @func{calloc}, or @func{realloc} (and not yet freed).
@end deftypefun
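
Because @func{realloc} leaves the old block valid on failure, its result
should not be assigned directly over the only pointer to that block.
The helper below is one way to write this; @func{grow_buffer} is a
hypothetical name, and the sketch assumes @var{new_size} is nonzero:

@example
#include <stdbool.h>
#include <stddef.h>
#include "threads/malloc.h"

/* Resizes *bufp to new_size bytes (new_size must be nonzero).
   On failure, returns false and leaves *bufp valid and unchanged. */
bool grow_buffer (char **bufp, size_t new_size) @{
  char *bigger = realloc (*bufp, new_size);
  if (bigger == NULL)
    return false;           /* Old block is still valid. */
  *bufp = bigger;           /* Old pointer must no longer be used. */
  return true;
@}
@end example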

@node Virtual Addresses
@section Virtual Addresses

A 32-bit virtual address can be divided into a 20-bit @dfn{page number}
and a 12-bit @dfn{page offset} (or just @dfn{offset}), like this:

@example
@group
               31               12 11        0
              +-------------------+-----------+
              |    Page Number    |   Offset  |
              +-------------------+-----------+
                       Virtual Address
@end group
@end example

Header @file{threads/vaddr.h} defines these functions and macros for
working with virtual addresses:

@defmac PGSHIFT
@defmacx PGBITS
The bit index (0) and number of bits (12) of the offset part of a
virtual address, respectively.
@end defmac

@defmac PGMASK
A bit mask with the bits in the page offset set to 1, the rest set to 0
(@t{0xfff}).
@end defmac

@defmac PGSIZE
The page size in bytes (4,096).
@end defmac

@deftypefun unsigned pg_ofs (const void *@var{va})
Extracts and returns the page offset in virtual address @var{va}.
@end deftypefun

@deftypefun uintptr_t pg_no (const void *@var{va})
Extracts and returns the page number in virtual address @var{va}.
@end deftypefun

@deftypefun {void *} pg_round_down (const void *@var{va})
Returns the start of the virtual page that @var{va} points within, that
is, @var{va} with the page offset set to 0.
@end deftypefun

@deftypefun {void *} pg_round_up (const void *@var{va})
Returns @var{va} rounded up to the nearest page boundary.
@end deftypefun
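
As a worked example, take the arbitrarily chosen address
@t{0x08048abc}.  The expressions in the comments show the arithmetic
involved and are not necessarily how @file{threads/vaddr.h} writes it:

@example
va                  = 0x08048abc
pg_ofs (va)         = 0xabc         /* va & PGMASK */
pg_no (va)          = 0x8048        /* va >> PGBITS */
pg_round_down (va)  = 0x08048000    /* va & ~PGMASK */
pg_round_up (va)    = 0x08049000    /* (va + PGSIZE - 1) & ~PGMASK */
@end example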

Virtual memory in Pintos is divided into two regions: user virtual
memory and kernel virtual memory (@pxref{Virtual Memory Layout}).  The
boundary between them is @code{PHYS_BASE}:

@defmac PHYS_BASE
Base address of kernel virtual memory.  It defaults to @t{0xc0000000} (3
GB), but it may be changed to any multiple of @t{0x10000000} from
@t{0x80000000} to @t{0xf0000000}.

User virtual memory ranges from virtual address 0 up to
@code{PHYS_BASE}.  Kernel virtual memory occupies the rest of the
virtual address space, from @code{PHYS_BASE} up to 4 GB.
@end defmac

@deftypefun {bool} is_user_vaddr (const void *@var{va})
@deftypefunx {bool} is_kernel_vaddr (const void *@var{va})
Returns true if @var{va} is a user or kernel virtual address,
respectively, false otherwise.
@end deftypefun

The 80@var{x}86 architecture doesn't provide any way to directly access memory given
a physical address.  This ability is often necessary in an operating
system kernel, so Pintos works around it by mapping kernel virtual
memory one-to-one to physical memory.  That is, virtual address
@code{PHYS_BASE} accesses physical address 0, virtual address
@code{PHYS_BASE} + @t{0x1234} accesses physical address @t{0x1234}, and
so on up to the size of the machine's physical memory.  Thus, adding
@code{PHYS_BASE} to a physical address obtains a kernel virtual address
that accesses that address; conversely, subtracting @code{PHYS_BASE}
from a kernel virtual address obtains the corresponding physical
address.  Header @file{threads/vaddr.h} provides a pair of functions to
do these translations:

@deftypefun {void *} ptov (uintptr_t @var{pa})
Returns the kernel virtual address corresponding to physical address
@var{pa}, which should be between 0 and the number of bytes of physical
memory.
@end deftypefun

@deftypefun {uintptr_t} vtop (void *@var{va})
Returns the physical address corresponding to @var{va}, which must be a
kernel virtual address.
@end deftypefun
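
Each translation amounts to a single addition or subtraction.  The
stand-alone sketch below models this with plain integers, assuming the
default @code{PHYS_BASE} of @t{0xc0000000}.  The @code{sk_} names are
hypothetical stand-ins, not the functions declared in
@file{threads/vaddr.h}.

@example
#include <stdbool.h>
#include <stdint.h>

#define SK_PHYS_BASE 0xc0000000u    /* Assumed default PHYS_BASE. */

/* User addresses lie below the base, kernel addresses at or above. */
bool
sk_is_user_vaddr (uint32_t va)
@{
  return va < SK_PHYS_BASE;
@}

bool
sk_is_kernel_vaddr (uint32_t va)
@{
  return va >= SK_PHYS_BASE;
@}

/* Physical address to kernel virtual address. */
uint32_t
sk_ptov (uint32_t pa)
@{
  return pa + SK_PHYS_BASE;
@}

/* Kernel virtual address to physical address. */
uint32_t
sk_vtop (uint32_t va)
@{
  return va - SK_PHYS_BASE;
@}
@end example

@noindent
With these definitions, physical address @t{0x1234} maps to kernel
virtual address @t{0xc0001234}, and the reverse translation recovers
@t{0x1234}.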

@node Page Table
@section Page Table

The code in @file{pagedir.c} is an abstract interface to the 80@var{x}86
hardware page table, also called a ``page directory'' by Intel processor
documentation.  The page table interface uses a @code{uint32_t *} to
represent a page table because this is convenient for accessing its
internal structure.

The sections below describe the page table interface and internals.

@menu
* Page Table Creation Destruction Activation::  
* Page Tables Inspection and Updates::  
* Page Table Accessed and Dirty Bits::  
* Page Table Details::          
@end menu

@node Page Table Creation Destruction Activation
@subsection Creation, Destruction, and Activation

These functions create, destroy, and activate page tables.  The base
Pintos code already calls these functions where necessary, so it should
not be necessary to call them yourself.

@deftypefun {uint32_t *} pagedir_create (void)
Creates and returns a new page table.  The new page table contains
Pintos's normal kernel virtual page mappings, but no user virtual
mappings.

Returns a null pointer if memory cannot be obtained.
@end deftypefun

@deftypefun void pagedir_destroy (uint32_t *@var{pd})
Frees all of the resources held by @var{pd}, including the page table
itself and the frames that it maps.
@end deftypefun

@deftypefun void pagedir_activate (uint32_t *@var{pd})
Activates @var{pd}.  The active page table is the one used by the CPU to
translate memory references.
@end deftypefun

@node Page Tables Inspection and Updates
@subsection Inspection and Updates

These functions examine or update the mappings from pages to frames
encapsulated by a page table.  They work on both active and inactive
page tables (that is, those for running and suspended processes),
flushing the TLB as necessary.

@deftypefun bool pagedir_set_page (uint32_t *@var{pd}, void *@var{upage}, void *@var{kpage}, bool @var{writable})
Adds to @var{pd} a mapping from user page @var{upage} to the frame identified
by kernel virtual address @var{kpage}.  If @var{writable} is true, the
page is mapped read/write; otherwise, it is mapped read-only.

User page @var{upage} must not already be mapped in @var{pd}.

Kernel page @var{kpage} should be a kernel virtual address obtained from
the user pool with @code{palloc_get_page(PAL_USER)} (@pxref{Why
PAL_USER?}).

Returns true if successful, false on failure.  Failure will occur if
additional memory required for the page table cannot be obtained.
@end deftypefun

@deftypefun {void *} pagedir_get_page (uint32_t *@var{pd}, const void *@var{uaddr})
Looks up the frame mapped to @var{uaddr} in @var{pd}.  Returns the
kernel virtual address for that frame, if @var{uaddr} is mapped, or a
null pointer if it is not.
@end deftypefun

@deftypefun void pagedir_clear_page (uint32_t *@var{pd}, void *@var{page})
Marks @var{page} ``not present'' in @var{pd}.  Later accesses to
the page will fault.

Other bits in the page table for @var{page} are preserved, permitting
the accessed and dirty bits (see the next section) to be checked.

This function has no effect if @var{page} is not mapped.
@end deftypefun

@node Page Table Accessed and Dirty Bits
@subsection Accessed and Dirty Bits

80@var{x}86 hardware provides some assistance for implementing page
replacement algorithms, through a pair of bits in the page table entry
(PTE) for each page.  On any read or write to a page, the CPU sets the
@dfn{accessed bit} to 1 in the page's PTE, and on any write, the CPU
sets the @dfn{dirty bit} to 1.  The CPU never resets these bits to 0,
but the OS may do so.

Proper interpretation of these bits requires understanding of
@dfn{aliases}, that is, two (or more) pages that refer to the same
frame.  When an aliased frame is accessed, the accessed and dirty bits
are updated in only one page table entry (the one for the page used for
access).  The accessed and dirty bits for the other aliases are not
updated.

@xref{Accessed and Dirty Bits}, on applying these bits in implementing
page replacement algorithms.

@deftypefun bool pagedir_is_dirty (uint32_t *@var{pd}, const void *@var{page})
@deftypefunx bool pagedir_is_accessed (uint32_t *@var{pd}, const void *@var{page})
Returns true if page directory @var{pd} contains a page table entry for
@var{page} that is marked dirty (or accessed).  Otherwise,
returns false.
@end deftypefun

@deftypefun void pagedir_set_dirty (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
@deftypefunx void pagedir_set_accessed (uint32_t *@var{pd}, const void *@var{page}, bool @var{value})
If page directory @var{pd} has a page table entry for @var{page}, then
its dirty (or accessed) bit is set to @var{value}.
@end deftypefun

@node Page Table Details
@subsection Page Table Details

The functions provided with Pintos are sufficient to implement the
tasks.  However, you may still find it worthwhile to understand the
hardware page table format, so we'll go into a little detail in this
section.

@menu
* Page Table Structure::        
* Page Table Entry Format::     
* Page Directory Entry Format::  
@end menu

@node Page Table Structure
@subsubsection Structure

The top-level paging data structure is a page called the ``page
directory'' (PD) arranged as an array of 1,024 32-bit page directory
entries (PDEs), each of which represents 4 MB of virtual memory.  Each
PDE may point to the physical address of another page called a
``page table'' (PT) arranged, similarly, as an array of 1,024
32-bit page table entries (PTEs), each of which translates a single 4
kB virtual page to a physical page.

Translation of a virtual address into a physical address follows
the three-step process illustrated in the diagram
below:@footnote{Actually, virtual to physical translation on the
80@var{x}86 architecture occurs via an intermediate ``linear
address,'' but Pintos (and most modern 80@var{x}86 OSes) set up the CPU
so that linear and virtual addresses are one and the same.  Thus, you
can effectively ignore this CPU feature.}

@enumerate 1
@item
The most-significant 10 bits of the virtual address (bits 22@dots{}31)
index the page directory.  If the PDE is marked ``present,'' the
physical address of a page table is read from the PDE thus obtained.
If the PDE is marked ``not present'' then a page fault occurs.

@item
The next 10 bits of the virtual address (bits 12@dots{}21) index
the page table.  If the PTE is marked ``present,'' the physical
address of a data page is read from the PTE thus obtained.  If the PTE
is marked ``not present'' then a page fault occurs.

@item
The least-significant 12 bits of the virtual address (bits 0@dots{}11)
are added to the data page's physical base address, yielding the final
physical address.
@end enumerate

@example
@group
 31                  22 21                  12 11                   0
+----------------------+----------------------+----------------------+
| Page Directory Index |   Page Table Index   |    Page Offset       |
+----------------------+----------------------+----------------------+
             |                    |                     |
     _______/             _______/                _____/
    /                    /                       /
   /    Page Directory  /      Page Table       /    Data Page
  /     .____________. /     .____________.    /   .____________.
  |1,023|____________| |1,023|____________|    |   |____________|
  |1,022|____________| |1,022|____________|    |   |____________|
  |1,021|____________| |1,021|____________|    \__\|____________|
  |1,020|____________| |1,020|____________|       /|____________|
  |     |            | |     |            |        |            |
  |     |            | \____\|            |_       |            |
  |     |      .     |      /|      .     | \      |      .     |
  \____\|      .     |_      |      .     |  |     |      .     |
       /|      .     | \     |      .     |  |     |      .     |
        |      .     |  |    |      .     |  |     |      .     |
        |            |  |    |            |  |     |            |
        |____________|  |    |____________|  |     |____________|
       4|____________|  |   4|____________|  |     |____________|
       3|____________|  |   3|____________|  |     |____________|
       2|____________|  |   2|____________|  |     |____________|
       1|____________|  |   1|____________|  |     |____________|
       0|____________|  \__\0|____________|  \____\|____________|
                           /                      /
@end group
@end example
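
The three steps above can be mimicked in ordinary C.  The sketch below
is a simulation under stated assumptions, not Pintos code: ``physical
memory'' is a small array, every @code{sim_} name is hypothetical, and
a @code{false} return stands in for the page fault that real hardware
would raise.

@example
#include <stdbool.h>
#include <stdint.h>

#define SIM_PRESENT 0x1u          /* Bit 0 of a PDE or PTE. */
#define SIM_ADDR    0xfffff000u   /* Bits 12...31: frame address. */
#define SIM_FRAMES  4             /* Tiny simulated memory. */

/* Simulated physical memory: frames of 1,024 32-bit words. */
uint32_t sim_mem[SIM_FRAMES][1024];

/* Reads the 32-bit word at simulated physical address PA,
   which must lie within the simulated memory. */
uint32_t
sim_read32 (uint32_t pa)
@{
  return sim_mem[pa >> 12][(pa & 0xfff) >> 2];
@}

/* Translates VA using the page directory at physical address
   PD_PA.  Stores the result in *PA and returns true, or returns
   false where the hardware would page fault. */
bool
sim_translate (uint32_t pd_pa, uint32_t va, uint32_t *pa)
@{
  uint32_t pde, pte;

  /* Step 1: the top 10 bits index the page directory. */
  pde = sim_read32 (pd_pa + 4 * (va >> 22));
  if (!(pde & SIM_PRESENT))
    return false;

  /* Step 2: the next 10 bits index the page table. */
  pte = sim_read32 ((pde & SIM_ADDR) + 4 * ((va >> 12) & 0x3ff));
  if (!(pte & SIM_PRESENT))
    return false;

  /* Step 3: the low 12 bits are added to the frame's base. */
  *pa = (pte & SIM_ADDR) + (va & 0xfff);
  return true;
@}
@end example

@noindent
Suppose frame 0 holds the page directory, its entry 32 points to a page
table in frame 1, and entry 72 of that page table points to frame 3.
Then virtual address @t{0x08048123} translates to @t{0x3123}, while an
address whose PDE or PTE is zero makes @code{sim_translate} return
false.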

Pintos provides some macros and functions that are useful for working
with raw page tables:

@defmac PTSHIFT
@defmacx PTBITS
The starting bit index (12) and number of bits (10), respectively, in a
page table index.
@end defmac

@defmac PTMASK
A bit mask with the bits in the page table index set to 1 and the rest
set to 0 (@t{0x3ff000}).
@end defmac

@defmac PTSPAN
The number of bytes of virtual address space that a single page table
page covers (4,194,304 bytes, or 4 MB).
@end defmac

@defmac PDSHIFT
@defmacx PDBITS
The starting bit index (22) and number of bits (10), respectively, in a
page directory index.
@end defmac

@defmac PDMASK
A bit mask with the bits in the page directory index set to 1 and other
bits set to 0 (@t{0xffc00000}).
@end defmac

@deftypefun uintptr_t pd_no (const void *@var{va})
@deftypefunx uintptr_t pt_no (const void *@var{va})
Returns the page directory index or page table index, respectively, for
virtual address @var{va}.  These functions are defined in
@file{threads/pte.h}.
@end deftypefun

@deftypefun unsigned pg_ofs (const void *@var{va})
Returns the page offset for virtual address @var{va}.  This function is
defined in @file{threads/vaddr.h}.
@end deftypefun
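
As a rough illustration of how the shifts and masks fit together, the
stand-alone sketch below splits an address into its two indexes.  The
@code{SK_} and @code{sk_} names are hypothetical and model addresses as
@code{uint32_t} values; the real definitions are in
@file{threads/pte.h}.

@example
#include <stdint.h>

#define SK_PTSHIFT 12u
#define SK_PTBITS  10u
/* 0x3ff000 */
#define SK_PTMASK  (((1u << SK_PTBITS) - 1) << SK_PTSHIFT)

#define SK_PDSHIFT 22u
#define SK_PDBITS  10u
/* 0xffc00000 */
#define SK_PDMASK  (((1u << SK_PDBITS) - 1) << SK_PDSHIFT)

/* Page directory index: bits 22...31. */
uint32_t
sk_pd_no (uint32_t va)
@{
  return va >> SK_PDSHIFT;
@}

/* Page table index: bits 12...21. */
uint32_t
sk_pt_no (uint32_t va)
@{
  return (va & SK_PTMASK) >> SK_PTSHIFT;
@}
@end example

@noindent
For @t{0x08048123}, the page directory index is 32 and the page table
index is 72.  With the default @code{PHYS_BASE} of @t{0xc0000000}, the
first kernel virtual address has page directory index 768.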

@node Page Table Entry Format
@subsubsection Page Table Entry Format

You do not need to understand the PTE format to do the Pintos
tasks, unless you wish to incorporate the page table into your
supplemental page table (@pxref{Managing the Supplemental Page Table}).

The actual format of a page table entry is summarized below.  For
complete information, refer to section 3.7, ``Page Translation Using
32-Bit Physical Addressing,'' in @bibref{IA32-v3a}.

@example
@group
 31                                   12 11 9      6 5     2 1 0
+---------------------------------------+----+----+-+-+---+-+-+-+
|           Physical Address            | AVL|    |D|A|   |U|W|P|
+---------------------------------------+----+----+-+-+---+-+-+-+
@end group
@end example

Some more information on each bit is given below.  The names are
@file{threads/pte.h} macros that represent the bits' values:

@defmac PTE_P
Bit 0, the ``present'' bit.  When this bit is 1, the
other bits are interpreted as described below.  When this bit is 0, any
attempt to access the page will page fault.  The remaining bits are then
not used by the CPU and may be used by the OS for any purpose.
@end defmac

@defmac PTE_W
Bit 1, the ``read/write'' bit.  When it is 1, the page
is writable.  When it is 0, write attempts will page fault.
@end defmac

@defmac PTE_U
Bit 2, the ``user/supervisor'' bit.  When it is 1, user
processes may access the page.  When it is 0, only the kernel may access
the page (user accesses will page fault).

Pintos clears this bit in PTEs for kernel virtual memory, to prevent
user processes from accessing them.
@end defmac
 
@defmac PTE_A
Bit 5, the ``accessed'' bit.  @xref{Page Table Accessed
and Dirty Bits}.
@end defmac

@defmac PTE_D
Bit 6, the ``dirty'' bit.  @xref{Page Table Accessed and
Dirty Bits}.
@end defmac

@defmac PTE_AVL
Bits 9@dots{}11, available for operating system use.
Pintos, as provided, does not use them and sets them to 0.
@end defmac

@defmac PTE_ADDR
Bits 12@dots{}31, the top 20 bits of the physical address of a frame.
The low 12 bits of the frame's address are always 0.
@end defmac

The other bits are either reserved or uninteresting in a Pintos context and
should be set to@tie{}0.

Header @file{threads/pte.h} defines three functions for working with
page table entries:

@deftypefun uint32_t pte_create_kernel (uint32_t *@var{page}, bool @var{writable})
Returns a page table entry that points to @var{page}, which should be a
kernel virtual address.  The PTE's present bit will be set.  It will be
marked for kernel-only access.  If @var{writable} is true, the PTE will
also be marked read/write; otherwise, it will be read-only.
@end deftypefun

@deftypefun uint32_t pte_create_user (uint32_t *@var{page}, bool @var{writable})
Returns a page table entry that points to @var{page}, which should be
the kernel virtual address of a frame in the user pool (@pxref{Why
PAL_USER?}).  The PTE's present bit will be set and it will be marked to
allow user-mode access.  If @var{writable} is true, the PTE will also be
marked read/write; otherwise, it will be read-only.
@end deftypefun

@deftypefun {void *} pte_get_page (uint32_t @var{pte})
Returns the kernel virtual address for the frame that @var{pte} points
to.  The @var{pte} may be present or not-present; if it is not-present
then the pointer returned is only meaningful if the address bits in the PTE
actually represent a physical address.
@end deftypefun
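
To make the bit layout concrete, here is a stand-alone sketch that
composes and decodes an entry.  The mask values follow from the bit
positions listed above; the @code{SK_} and @code{sk_} names are
hypothetical.  Unlike @func{pte_create_user}, which takes a kernel
virtual address, this sketch takes the frame's physical address
directly, to stay independent of @code{PHYS_BASE}.

@example
#include <stdbool.h>
#include <stdint.h>

#define SK_PTE_P    0x1u          /* Bit 0: present. */
#define SK_PTE_W    0x2u          /* Bit 1: read/write. */
#define SK_PTE_U    0x4u          /* Bit 2: user/supervisor. */
#define SK_PTE_A    0x20u         /* Bit 5: accessed. */
#define SK_PTE_D    0x40u         /* Bit 6: dirty. */
#define SK_PTE_AVL  0xe00u        /* Bits 9...11: available. */
#define SK_PTE_ADDR 0xfffff000u   /* Bits 12...31: address. */

/* Builds a present, user-accessible PTE for the frame at
   page-aligned physical address PA. */
uint32_t
sk_pte_user (uint32_t pa, bool writable)
@{
  return ((pa & SK_PTE_ADDR) | SK_PTE_P | SK_PTE_U
          | (writable ? SK_PTE_W : 0));
@}

/* Extracts the frame's physical address from PTE. */
uint32_t
sk_pte_frame (uint32_t pte)
@{
  return pte & SK_PTE_ADDR;
@}

/* Tests the dirty bit in PTE. */
bool
sk_pte_dirty (uint32_t pte)
@{
  return (pte & SK_PTE_D) != 0;
@}
@end example

@noindent
For a writable frame at physical address @t{0x00123000}, the resulting
entry is @t{0x00123007}: the address bits plus the present, read/write,
and user bits.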

@node Page Directory Entry Format
@subsubsection Page Directory Entry Format

Page directory entries have the same format as PTEs, except that the
physical address points to a page table page instead of a frame.  Header
@file{threads/pte.h} defines two functions for working with page
directory entries:

@deftypefun uint32_t pde_create (uint32_t *@var{pt})
Returns a page directory entry that points to @var{pt}, which should be the
kernel virtual address of a page table page.  The PDE's present bit will
be set, it will be marked to allow user-mode access, and it will be
marked read/write.
@end deftypefun

@deftypefun {uint32_t *} pde_get_pt (uint32_t @var{pde})
Returns the kernel virtual address for the page table page that
@var{pde}, which must be marked present, points to.
@end deftypefun

@node Hash Table
@section Hash Table

Pintos provides a hash table data structure in @file{lib/kernel/hash.c}.
To use it you will need to include its header file,
@file{lib/kernel/hash.h}, with @code{#include <hash.h>}.
No code provided with Pintos uses the hash table, which means that you
are free to use it as is, modify its implementation for your own
purposes, or ignore it, as you wish.

Most implementations of the virtual memory task use a hash table to
translate pages to frames.  You may find other uses for hash tables as
well.

@menu
* Hash Data Types::             
* Basic Hash Functions::        
* Hash Search Functions::       
* Hash Iteration Functions::    
* Hash Table Example::          
* Hash Auxiliary Data::         
* Hash Synchronization::        
@end menu

@node Hash Data Types
@subsection Data Types

A hash table is represented by @struct{hash}.

@deftp {Type} {struct hash}
Represents an entire hash table.  The actual members of @struct{hash}
are ``opaque.''  That is, code that uses a hash table should not access
@struct{hash} members directly, nor should it need to.  Instead, use
hash table functions and macros.
@end deftp

The hash table operates on elements of type @struct{hash_elem}.

@deftp {Type} {struct hash_elem}
Embed a @struct{hash_elem} member in the structure you want to include
in a hash table.  Like @struct{hash}, @struct{hash_elem} is opaque.
All functions for operating on hash table elements actually take and
return pointers to @struct{hash_elem}, not pointers to your hash table's
real element type.
@end deftp

You will often need to obtain a @struct{hash_elem} given a real element
of the hash table, and vice versa.  Given a real element of the hash
table, you may use the @samp{&} operator to obtain a pointer to its
@struct{hash_elem}.  Use the @code{hash_entry()} macro to go the other
direction.

@deftypefn {Macro} {@var{type} *} hash_entry (struct hash_elem *@var{elem}, @var{type}, @var{member})
Returns a pointer to the structure that @var{elem}, a pointer to a
@struct{hash_elem}, is embedded within.  You must provide @var{type},
the name of the structure that @var{elem} is inside, and @var{member},
the name of the member in @var{type} that @var{elem} points to.

For example, suppose @code{h} is a @code{struct hash_elem *} variable
that points to a @struct{thread} member (of type @struct{hash_elem})
named @code{h_elem}.  Then, @code{hash_entry@tie{}(h, struct thread, h_elem)}
yields the address of the @struct{thread} that @code{h} points within.
@end deftypefn

@xref{Hash Table Example}, for an example.
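
The idea behind a macro like @code{hash_entry()} is pointer arithmetic
with @code{offsetof}: subtract the member's offset from the member's
address to reach the start of the enclosing structure.  The sketch
below shows that idea in isolation.  It is not the definition from
@file{lib/kernel/hash.h}; @code{sk_elem}, @code{sk_entry}, and
@code{sk_thread} are hypothetical stand-ins.

@example
#include <stddef.h>

/* Stand-in for struct hash_elem. */
struct sk_elem
  @{
    struct sk_elem *next;
  @};

/* Converts a pointer to MEMBER, embedded in a TYPE, into a
   pointer to the enclosing TYPE. */
#define sk_entry(ELEM, TYPE, MEMBER) \
  ((TYPE *) ((char *) (ELEM) - offsetof (TYPE, MEMBER)))

/* Stand-in for a structure that embeds an element. */
struct sk_thread
  @{
    int tid;
    struct sk_elem h_elem;
  @};

/* Returns the sk_thread that H is embedded within. */
struct sk_thread *
sk_thread_from_elem (struct sk_elem *h)
@{
  return sk_entry (h, struct sk_thread, h_elem);
@}
@end example

@noindent
Going the other way needs no macro: for a @code{struct sk_thread *t},
the expression @code{&t->h_elem} yields the embedded element.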

Each hash table element must contain a key, that is, data that
identifies and distinguishes elements, which must be unique
among elements in the hash table.  (Elements may
also contain non-key data that need not be unique.)  While an element is
in a hash table, its key data must not be changed.  Instead, if need be,
remove the element from the hash table, modify its key, then reinsert
the element.

For each hash table, you must write two functions that act on keys: a
hash function and a comparison function.  These functions must match the
following prototypes:

@deftp {Type} {unsigned hash_hash_func (const struct hash_elem *@var{element}, void *@var{aux})}
Returns a hash of @var{element}'s data, as a value anywhere in the range
of @code{unsigned int}.  The hash of an element should be a
pseudo-random function of the element's key.  It must not depend on
non-key data in the element or on any non-constant data other than the
key.  Pintos provides the following functions as a suitable basis for
hash functions.

@deftypefun unsigned hash_bytes (const void *@var{buf}, size_t @var{size})
Returns a hash of the @var{size} bytes starting at @var{buf}.  The
implementation is the general-purpose
@uref{http://en.wikipedia.org/wiki/Fowler_Noll_Vo_hash, Fowler-Noll-Vo
hash} for 32-bit words.
@end deftypefun

@deftypefun unsigned hash_string (const char *@var{s})
Returns a hash of null-terminated string @var{s}.
@end deftypefun

@deftypefun unsigned hash_int (int @var{i})
Returns a hash of integer @var{i}.
@end deftypefun

If your key is a single piece of data of an appropriate type, it is
sensible for your hash function to directly return the output of one of
these functions.  For multiple pieces of data, you may wish to combine
the output of more than one call to them using, e.g., the @samp{^}
(exclusive or)
operator.  Finally, you may entirely ignore these functions and write
your own hash function from scratch, but remember that your goal is to
build an operating system kernel, not to design a hash function.

@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
@end deftp

@deftp {Type} {bool hash_less_func (const struct hash_elem *@var{a}, const struct hash_elem *@var{b}, void *@var{aux})}
Compares the keys stored in elements @var{a} and @var{b}.  Returns
true if @var{a} is less than @var{b}, false if @var{a} is greater than
or equal to @var{b}.

If two elements compare equal, then they must hash to equal values.

@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
@end deftp

@xref{Hash Table Example}, for hash and comparison function examples.
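
For a key made of several pieces, one possible approach is to hash each
piece separately and combine the results with @samp{^}.  The
stand-alone sketch below illustrates this.  It uses the FNV-1 variant
of the Fowler-Noll-Vo hash with the standard 32-bit constants, but it
is only meant to show the shape of such a function and is not a copy of
@func{hash_bytes}.  The @code{sk_} names and the two-part key are
hypothetical.

@example
#include <stddef.h>
#include <stdint.h>

/* Returns an FNV-1 hash of the SIZE bytes starting at BUF_. */
uint32_t
sk_fnv1 (const void *buf_, size_t size)
@{
  const unsigned char *buf = buf_;
  uint32_t hash = 2166136261u;          /* FNV offset basis. */

  /* Multiply by the FNV prime, then XOR in the next byte. */
  while (size-- > 0)
    hash = (hash * 16777619u) ^ *buf++;
  return hash;
@}

/* A hypothetical key with two parts. */
struct sk_key
  @{
    int owner;
    int sector;
  @};

/* Hashes each part of K separately and combines the results. */
unsigned
sk_key_hash (const struct sk_key *k)
@{
  return (sk_fnv1 (&k->owner, sizeof k->owner)
          ^ sk_fnv1 (&k->sector, sizeof k->sector));
@}
@end example

@noindent
Hashing the members individually, instead of the whole structure,
keeps any padding bytes out of the hash, so that keys that compare
equal also hash equal.  Because @samp{^} is symmetric, keys whose two
parts are swapped collide under this scheme.  That costs some
performance but not correctness, since the comparison function still
tells such elements apart.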

A few functions accept a pointer to a third kind of
function as an argument:

@deftp {Type} {void hash_action_func (struct hash_elem *@var{element}, void *@var{aux})}
Performs some kind of action, chosen by the caller, on @var{element}.

@xref{Hash Auxiliary Data}, for an explanation of @var{aux}.
@end deftp

@node Basic Hash Functions
@subsection Basic Functions

These functions create, destroy, and inspect hash tables.

@deftypefun bool hash_init (struct hash *@var{hash}, hash_hash_func *@var{hash_func}, hash_less_func *@var{less_func}, void *@var{aux})
Initializes @var{hash} as a hash table with @var{hash_func} as hash
function, @var{less_func} as comparison function, and @var{aux} as
auxiliary data.
Returns true if successful, false on failure.  @func{hash_init} calls
@func{malloc} and fails if memory cannot be allocated.

@xref{Hash Auxiliary Data}, for an explanation of @var{aux}, which is
most often a null pointer.
@end deftypefun

@deftypefun void hash_clear (struct hash *@var{hash}, hash_action_func *@var{action})
Removes all the elements from @var{hash}, which must have been
previously initialized with @func{hash_init}.

If @var{action} is non-null, then it is called once for each element in
the hash table, which gives the caller an opportunity to deallocate any
memory or other resources used by the element.  For example, if the hash
table elements are dynamically allocated using @func{malloc}, then
@var{action} could @func{free} the element.  This is safe because
@func{hash_clear} will not access the memory in a given hash element
after calling @var{action} on it.  However, @var{action} must not call
any function that may modify the hash table, such as @func{hash_insert}
or @func{hash_delete}.
@end deftypefun

@deftypefun void hash_destroy (struct hash *@var{hash}, hash_action_func *@var{action})
If @var{action} is non-null, calls it for each element in the hash, with
the same semantics as a call to @func{hash_clear}.  Then, frees the
memory held by @var{hash}.  Afterward, @var{hash} must not be passed to
any hash table function, absent an intervening call to @func{hash_init}.
@end deftypefun

@deftypefun size_t hash_size (struct hash *@var{hash})
Returns the number of elements currently stored in @var{hash}.
@end deftypefun

@deftypefun bool hash_empty (struct hash *@var{hash})
Returns true if @var{hash} currently contains no elements,
false if @var{hash} contains at least one element.
@end deftypefun

@node Hash Search Functions
@subsection Search Functions

Each of these functions searches a hash table for an element that
compares equal to one provided.  Based on the success of the search,
they perform some action, such as inserting a new element into the hash
table, or simply return the result of the search.

@deftypefun {struct hash_elem *} hash_insert (struct hash *@var{hash}, struct hash_elem *@var{element})
Searches @var{hash} for an element equal to @var{element}.  If none is
found, inserts @var{element} into @var{hash} and returns a null pointer.
If the table already contains an element equal to @var{element}, it is
returned without modifying @var{hash}.
@end deftypefun

@deftypefun {struct hash_elem *} hash_replace (struct hash *@var{hash}, struct hash_elem *@var{element})
Inserts @var{element} into @var{hash}.  Any element equal to
@var{element} already in @var{hash} is removed.  Returns the element
removed, or a null pointer if @var{hash} did not contain an element
equal to @var{element}.

The caller is responsible for deallocating any resources associated with
the returned element, as appropriate.  For example, if the hash table
elements are dynamically allocated using @func{malloc}, then the caller
must @func{free} the element after it is no longer needed.
@end deftypefun

The element passed to the following functions is only used for hashing
and comparison purposes.  It is never actually inserted into the hash
table.  Thus, only key data in the element needs to be initialized, and
other data in the element will not be used.  It often makes sense to
declare an instance of the element type as a local variable, initialize
the key data, and then pass the address of its @struct{hash_elem} to
@func{hash_find} or @func{hash_delete}.  @xref{Hash Table Example}, for
an example.  (Large structures should not be
allocated as local variables.  @xref{struct thread}, for more
information.)

@deftypefun {struct hash_elem *} hash_find (struct hash *@var{hash}, struct hash_elem *@var{element})
Searches @var{hash} for an element equal to @var{element}.  Returns the
element found, if any, or a null pointer otherwise.
@end deftypefun

@deftypefun {struct hash_elem *} hash_delete (struct hash *@var{hash}, struct hash_elem *@var{element})
Searches @var{hash} for an element equal to @var{element}.  If one is
found, it is removed from @var{hash} and returned.  Otherwise, a null
pointer is returned and @var{hash} is unchanged.

The caller is responsible for deallocating any resources associated with
the returned element, as appropriate.  For example, if the hash table
elements are dynamically allocated using @func{malloc}, then the caller
must @func{free} the element after it is no longer needed.
@end deftypefun

@node Hash Iteration Functions
@subsection Iteration Functions

These functions allow iterating through the elements in a hash table.
Two interfaces are supplied.  The first requires writing and supplying a
@var{hash_action_func} to act on each element (@pxref{Hash Data Types}).

@deftypefun void hash_apply (struct hash *@var{hash}, hash_action_func *@var{action})
Calls @var{action} once for each element in @var{hash}, in arbitrary
order.  @var{action} must not call any function that may modify the hash
table, such as @func{hash_insert} or @func{hash_delete}.  @var{action}
must not modify key data in elements, although it may modify any other
data.
@end deftypefun

The second interface is based on an ``iterator'' data type.
Idiomatically, iterators are used as follows:

@example
struct hash_iterator i;

hash_first (&i, h);
while (hash_next (&i))
  @{
    struct foo *f = hash_entry (hash_cur (&i), struct foo, elem);
    @r{@dots{}do something with @i{f}@dots{}}
  @}
@end example

@deftp {Type} {struct hash_iterator}
Represents a position within a hash table.  Calling any function that
may modify a hash table, such as @func{hash_insert} or
@func{hash_delete}, invalidates all iterators within that hash table.

Like @struct{hash} and @struct{hash_elem}, @struct{hash_iterator} is opaque.
@end deftp

@deftypefun void hash_first (struct hash_iterator *@var{iterator}, struct hash *@var{hash})
Initializes @var{iterator} to just before the first element in
@var{hash}.
@end deftypefun

@deftypefun {struct hash_elem *} hash_next (struct hash_iterator *@var{iterator})
Advances @var{iterator} to the next element in @var{hash}, and returns
that element.  Returns a null pointer if no elements remain.  After
@func{hash_next} returns null for @var{iterator}, calling it again
yields undefined behavior.
@end deftypefun

@deftypefun {struct hash_elem *} hash_cur (struct hash_iterator *@var{iterator})
Returns the value most recently returned by @func{hash_next} for
@var{iterator}.  Yields undefined behavior after @func{hash_first} has
been called on @var{iterator} but before @func{hash_next} has been
called for the first time.
@end deftypefun

@node Hash Table Example
@subsection Hash Table Example

Suppose you have a structure, called @struct{page}, that you
want to put into a hash table.  First, define @struct{page} to include a
@struct{hash_elem} member:

@example
struct page
  @{
    struct hash_elem hash_elem; /* @r{Hash table element.} */
    void *addr;                 /* @r{Virtual address.} */
    /* @r{@dots{}other members@dots{}} */
  @};
@end example

We write a hash function and a comparison function using @var{addr} as
the key.  A pointer can be hashed based on its bytes, and the @samp{<}
operator works fine for comparing pointers:

@example
/* @r{Returns a hash value for page @var{p}.} */
unsigned
page_hash (const struct hash_elem *p_, void *aux UNUSED)
@{
  const struct page *p = hash_entry (p_, struct page, hash_elem);
  return hash_bytes (&p->addr, sizeof p->addr);
@}

/* @r{Returns true if page @var{a} precedes page @var{b}.} */
bool
page_less (const struct hash_elem *a_, const struct hash_elem *b_,
           void *aux UNUSED)
@{
  const struct page *a = hash_entry (a_, struct page, hash_elem);
  const struct page *b = hash_entry (b_, struct page, hash_elem);

  return a->addr < b->addr;
@}
@end example

@noindent
(The use of @code{UNUSED} in these functions' prototypes suppresses a
warning that @var{aux} is unused.  @xref{Function and Parameter
Attributes}, for information about @code{UNUSED}.  @xref{Hash Auxiliary
Data}, for an explanation of @var{aux}.)

Then, we can create a hash table like this:

@example
struct hash pages;

hash_init (&pages, page_hash, page_less, NULL);
@end example

Now we can manipulate the hash table we've created.  If @code{@var{p}}
is a pointer to a @struct{page}, we can insert it into the hash table
with:

@example
hash_insert (&pages, &p->hash_elem);
@end example

@noindent If there's a chance that @var{pages} might already contain a
page with the same @var{addr}, then we should check @func{hash_insert}'s
return value.

To search for an element in the hash table, use @func{hash_find}.  This
takes a little setup, because @func{hash_find} takes an element to
compare against.  Here's a function that will find and return a page
based on a virtual address, assuming that @var{pages} is defined at file
scope:

@example
/* @r{Returns the page containing the given virtual @var{address},}
   @r{or a null pointer if no such page exists.} */
struct page *
page_lookup (const void *address)
@{
  struct page p;
  struct hash_elem *e;

  p.addr = (void *) address;
  e = hash_find (&pages, &p.hash_elem);
  return e != NULL ? hash_entry (e, struct page, hash_elem) : NULL;
@}
@end example

@noindent
@struct{page} is allocated as a local variable here on the assumption
that it is fairly small.  Large structures should not be allocated as
local variables.  @xref{struct thread}, for more information.

A similar function could delete a page by address using
@func{hash_delete}.

@node Hash Auxiliary Data
@subsection Auxiliary Data

In simple cases like the example above, there's no need for the
@var{aux} parameters.  In these cases, just pass a null pointer to
@func{hash_init} for @var{aux} and ignore the values passed to the hash
function and comparison functions.  (You'll get a compiler warning if
you don't use the @var{aux} parameter, but you can turn that off with
the @code{UNUSED} macro, as shown in the example, or you can just ignore
it.)

@var{aux} is useful when some property of the data in the
hash table is both constant and needed for hashing or comparison,
but not stored in the data items themselves.  For example, if
the items in a hash table are fixed-length strings, but the items
themselves don't indicate what that fixed length is, you could pass
the length as an @var{aux} parameter.
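
The fixed-length string case might look roughly like the stand-alone
sketch below.  It is simplified so that it compiles outside Pintos: a
real @code{hash_less_func} receives @code{struct hash_elem} pointers
and would first convert them with @code{hash_entry()}, whereas this
hypothetical @code{sk_item_less} takes the items directly.

@example
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A hypothetical item: a fixed-length string that does not
   record its own length. */
struct sk_item
  @{
    const char *bytes;
  @};

/* Returns true if A precedes B, comparing only as many bytes
   as the size_t that AUX points to. */
bool
sk_item_less (const struct sk_item *a, const struct sk_item *b,
              void *aux)
@{
  size_t len = *(const size_t *) aux;

  return memcmp (a->bytes, b->bytes, len) < 0;
@}
@end example

@noindent
In this arrangement the caller would pass the address of a
@code{size_t} holding the length as @var{aux}.  That object must remain
valid for as long as the comparison function can be called.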

@node Hash Synchronization
@subsection Synchronization

The hash table does not do any internal synchronization.  It is the
caller's responsibility to synchronize calls to hash table functions.
In general, any number of functions that examine but do not modify the
hash table, such as @func{hash_find} or @func{hash_next}, may execute
simultaneously.  However, these functions cannot safely execute at the
same time as any function that may modify a given hash table, such as
@func{hash_insert} or @func{hash_delete}, nor may more than one function
that can modify a given hash table execute safely at once.

It is also the caller's responsibility to synchronize access to data in
hash table elements.  How to synchronize access to this data depends on
how it is designed and organized, as with any other data structure.