Commit 95429a6d authored by hlgr

added readme

parent 48114dab
#+TITLE: Interview Assignment
#+DATE:
#+LaTeX_HEADER: \usepackage{fullpage}
* Introduction
The objective of this assignment is to demonstrate an understanding of the
complex interplay between data storage and processing in modern data processing
systems. Depending on the available indices, processing can be implemented using
different algorithms with various optimizations.
You will work with a database that has the following schema:
#+begin_src sql :exports code
CREATE TABLE Items (salesdate INT, employee INT, price INT);
CREATE TABLE Orders (
salesdate INT,
employee INT,
employeemanagerid INT,
discount INT
);
CREATE TABLE Stores (
managerid INT,
latitude INT,
longitude INT,
countryid INT
);
#+end_src
* Getting started
To get started, clone the following URL: [[https://gitlab.doc.ic.ac.uk/CO572/co572-coursework-1.git]]
You may want to set up two separate build directories for the code,
one for debugging and one for benchmarking. Here is how you could do
that:
#+begin_src bash :exports code
mkdir Debug
cd Debug
cmake -DCMAKE_BUILD_TYPE=Debug ..
cd ..
mkdir Release
cd Release
cmake -DCMAKE_BUILD_TYPE=Release ..
cd ..
#+end_src
You can then compile the Debug or the Release build, respectively, by typing:
#+begin_src bash :exports code
cmake --build Debug
#+end_src
or
#+begin_src bash :exports code
cmake --build Release
#+end_src
Note that the first build of each configuration takes a long time
because it also builds the dependencies.
** Testing
To run the tests (see file ~tests.cpp~), simply run
#+begin_src bash :exports code
./Debug/tests
#+end_src
A successful run should produce output like this (pass ~-?~ for more options):
#+begin_src bash :exports code
===============================================================================
All tests passed (30 assertions in 3 test cases)
#+end_src
** Benchmarking
To run the microbenchmarks (see file ~microbenchmarks.cpp~), simply run
#+begin_src bash :exports code
./Release/microbenchmarks
#+end_src
A semi-naive implementation (building no indices) would produce output like this:
#+begin_src bash :exports code
Running ./Release/microbenchmarks
Run on (4 X 1200 MHz CPU s)
Load Average: 0.31, 0.81, 0.74
-------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
CreateIndicesBenchmark/1024 2.64 us 2.66 us 262929
CreateIndicesBenchmark/4096 2.65 us 2.66 us 262897
CreateIndicesBenchmark/32768 2.93 us 2.95 us 238952
CreateIndicesBenchmark/262144 2.93 us 2.94 us 237840
CreateIndicesBenchmark/1048576 2.93 us 2.94 us 237703
Query1Benchmark/1024 95.9 us 95.9 us 7192
Query1Benchmark/4096 1010 us 1010 us 704
Query1Benchmark/32768 46032 us 46030 us 15
Query1Benchmark/262144 2038912 us 2038335 us 1
Query1Benchmark/1048576 31529341 us 31528313 us 1
Query2Benchmark/1024 301 us 301 us 2328
Query2Benchmark/4096 5690 us 5690 us 117
Query2Benchmark/32768 120979 us 120959 us 6
Query2Benchmark/262144 5047080 us 5046606 us 1
Query2Benchmark/524288 10067662 us 10067067 us 1
Query3Benchmark/1024 1117 us 1117 us 629
Query3Benchmark/4096 11162 us 11159 us 67
Query3Benchmark/32768 565663 us 565543 us 1
Query3Benchmark/262144 38888371 us 38886797 us 1
#+end_src
A good solution (making use of appropriate indices) produces output
like this (same hardware):
#+begin_src bash :exports code
Running ./Release/microbenchmarks
Run on (4 X 1200 MHz CPU s)
Load Average: 0.24, 0.49, 0.63
-------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------
CreateIndicesBenchmark/1024 1364 us 1364 us 515
CreateIndicesBenchmark/4096 5496 us 5497 us 127
CreateIndicesBenchmark/32768 52534 us 52535 us 13
CreateIndicesBenchmark/262144 504598 us 504595 us 1
CreateIndicesBenchmark/1048576 2237887 us 2237300 us 1
Query1Benchmark/1024 249 us 249 us 2808
Query1Benchmark/4096 1460 us 1460 us 480
Query1Benchmark/32768 11800 us 11796 us 60
Query1Benchmark/262144 99317 us 99310 us 7
Query1Benchmark/1048576 397303 us 397276 us 2
Query2Benchmark/1024 139 us 139 us 5034
Query2Benchmark/4096 2126 us 2126 us 329
Query2Benchmark/32768 17721 us 17720 us 37
Query2Benchmark/262144 22913 us 22913 us 28
Query2Benchmark/524288 28612 us 28611 us 18
Query3Benchmark/1024 643 us 643 us 1090
Query3Benchmark/4096 4174 us 4173 us 168
Query3Benchmark/32768 38987 us 38984 us 18
Query3Benchmark/262144 346434 us 346411 us 2
#+end_src
* Your task
Your task is to implement three queries using state-of-the-art data
processing techniques. You shall also implement one or more indices to
accelerate the queries. You are free to implement index structures of
your choosing, but you need to justify your choice.
The file ~solution.c~ contains stubs for four functions: three of them
need to be filled with the implementation of the queries and one is a
preparation function you can use to build your index.
The objective is to make the "macrobenchmark" (a sequence of operations
simulating the workload of a real data management system) as fast as
possible. You can run the macrobenchmark yourself:
#+begin_src bash :exports code
./Release/macrobenchmark
#+end_src
You will find the implementation in ~macrobenchmark.cpp~.
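The preparation function mentioned above is where an index would be built once and then reused by the queries. The following is a minimal sketch of that idea, not the coursework's actual interface: the ~Order~ struct, ~ManagerIndex~, ~buildManagerIndex~ and ~lookupManager~ are hypothetical names, and the real stubs in the repository may use different types and signatures.
#+begin_src cpp :exports code
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical row layout mirroring the Orders table; the real types in the
// coursework repository may differ.
struct Order {
  int salesdate;
  int employee;
  int employeemanagerid;
  int discount;
};

// Hypothetical index: maps each employeemanagerid to the positions of the
// Orders rows that carry it.
using ManagerIndex = std::unordered_map<int, std::vector<std::size_t>>;

// Built once in the preparation step, then reused by every query.
ManagerIndex buildManagerIndex(const std::vector<Order>& orders) {
  ManagerIndex index;
  for (std::size_t i = 0; i < orders.size(); ++i) {
    index[orders[i].employeemanagerid].push_back(i);
  }
  return index;
}

// Returns the positions of all orders with the given manager, or an empty
// vector if there are none. Average cost is one hash lookup, not a full scan.
const std::vector<std::size_t>& lookupManager(const ManagerIndex& index,
                                              int managerId) {
  static const std::vector<std::size_t> empty;
  auto it = index.find(managerId);
  return it == index.end() ? empty : it->second;
}
#+end_src
A hash index like this suits equality predicates such as ~employeeManagerID = $2~. Range predicates usually call for a sorted structure instead, so the choice of index should follow the predicates in each query.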
** Q1:
#+begin_src sql :exports code
SELECT
COUNT(*)
FROM
Items,
Orders
WHERE
Items.price < $1
AND Orders.employeeManagerID = $2
AND Items.salesDate = Orders.salesDate
AND Items.employee = Orders.employee
#+end_src
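One possible way to evaluate Q1 is an index join: hash the ~Items~ rows on the two join columns and keep the prices of each group sorted, so that the ~price < $1~ predicate becomes a binary search. This is a sketch under assumptions rather than a reference solution; the structs and the names ~joinKey~, ~PriceIndex~, ~buildPriceIndex~ and ~query1~ are hypothetical.
#+begin_src cpp :exports code
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical row layouts mirroring the schema.
struct Item { int salesdate; int employee; int price; };
struct Order { int salesdate; int employee; int employeemanagerid; int discount; };

// Packs the two join columns into a single 64-bit hash key.
inline std::uint64_t joinKey(int salesdate, int employee) {
  return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(salesdate)) << 32) |
         static_cast<std::uint32_t>(employee);
}

// Hypothetical index: join key -> sorted prices of the matching Items rows.
using PriceIndex = std::unordered_map<std::uint64_t, std::vector<int>>;

PriceIndex buildPriceIndex(const std::vector<Item>& items) {
  PriceIndex index;
  for (const Item& item : items) {
    index[joinKey(item.salesdate, item.employee)].push_back(item.price);
  }
  for (auto& entry : index) {
    std::sort(entry.second.begin(), entry.second.end());
  }
  return index;
}

// Counts (item, order) pairs that satisfy the Q1 predicates.
std::size_t query1(const PriceIndex& index, const std::vector<Order>& orders,
                   int maxPrice, int managerId) {
  std::size_t count = 0;
  for (const Order& order : orders) {
    if (order.employeemanagerid != managerId) continue;
    auto it = index.find(joinKey(order.salesdate, order.employee));
    if (it == index.end()) continue;
    const std::vector<int>& prices = it->second;
    // Number of prices strictly below maxPrice in this group.
    count += static_cast<std::size_t>(
        std::lower_bound(prices.begin(), prices.end(), maxPrice) - prices.begin());
  }
  return count;
}
#+end_src
This version still scans all of ~Orders~ to apply the manager filter. A second index on ~employeemanagerid~ would remove that scan, at the cost of more time spent building indices.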
** Q2:
#+begin_src sql :exports code
SELECT
COUNT(*)
FROM
Items,
Orders
WHERE
Orders.discount = $1
AND Items.salesDate <= Orders.salesDate
AND Orders.salesDate <= Items.salesDate + $2
#+end_src
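Q2 joins on a range rather than on equality: an order with sales date ~d~ matches every item whose sales date lies in the interval ~[d - $2, d]~. One way to exploit this is to keep the item sales dates in a sorted array and count each interval with two binary searches. The sketch below illustrates that idea; the structs and the names ~buildSortedSalesDates~ and ~query2~ are hypothetical.
#+begin_src cpp :exports code
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical row layouts mirroring the schema.
struct Item { int salesdate; int employee; int price; };
struct Order { int salesdate; int employee; int employeemanagerid; int discount; };

// Hypothetical index: all Items.salesdate values in ascending order.
std::vector<int> buildSortedSalesDates(const std::vector<Item>& items) {
  std::vector<int> dates;
  dates.reserve(items.size());
  for (const Item& item : items) dates.push_back(item.salesdate);
  std::sort(dates.begin(), dates.end());
  return dates;
}

// Counts (item, order) pairs with order.discount == discount and
// item.salesdate <= order.salesdate <= item.salesdate + window.
std::size_t query2(const std::vector<int>& sortedDates,
                   const std::vector<Order>& orders, int discount, int window) {
  std::size_t count = 0;
  for (const Order& order : orders) {
    if (order.discount != discount) continue;
    // Widen to long long so the subtraction cannot overflow int.
    const long long lowest = static_cast<long long>(order.salesdate) - window;
    auto first = std::lower_bound(sortedDates.begin(), sortedDates.end(), lowest);
    auto last = std::upper_bound(sortedDates.begin(), sortedDates.end(),
                                 order.salesdate);
    // A negative window yields an empty interval.
    if (first < last) count += static_cast<std::size_t>(last - first);
  }
  return count;
}
#+end_src
Each qualifying order then costs two logarithmic searches instead of a pass over all of ~Items~.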
** Q3:
#+begin_src sql :exports code
SELECT
COUNT(*)
FROM
Items,
Orders,
Stores
WHERE
Stores.managerID = Orders.employeeManagerID
AND Items.salesDate = Orders.salesDate
AND Items.employee = Orders.employee
AND Stores.countryID = $1;
#+end_src
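Because Q3 only asks for a count, the three-way join does not need to materialize any joined rows. One option is to pre-aggregate both sides: count the stores per manager in the requested country, count the items per join key, and multiply the two counts for each order. The sketch below assumes hypothetical structs and a hypothetical ~query3~ signature. It builds both hash tables inside the query for brevity, although the item counts do not depend on ~$1~ and could be built once in the preparation step.
#+begin_src cpp :exports code
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical row layouts mirroring the schema.
struct Item { int salesdate; int employee; int price; };
struct Order { int salesdate; int employee; int employeemanagerid; int discount; };
struct Store { int managerid; int latitude; int longitude; int countryid; };

// Packs the two Items/Orders join columns into a single 64-bit hash key.
inline std::uint64_t joinKey(int salesdate, int employee) {
  return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(salesdate)) << 32) |
         static_cast<std::uint32_t>(employee);
}

// Counts (item, order, store) triples that satisfy the Q3 predicates.
std::size_t query3(const std::vector<Item>& items,
                   const std::vector<Order>& orders,
                   const std::vector<Store>& stores, int countryId) {
  // managerid -> number of stores in the requested country.
  std::unordered_map<int, std::size_t> storesPerManager;
  for (const Store& store : stores) {
    if (store.countryid == countryId) ++storesPerManager[store.managerid];
  }

  // (salesdate, employee) -> number of items with that key.
  std::unordered_map<std::uint64_t, std::size_t> itemsPerKey;
  for (const Item& item : items) {
    ++itemsPerKey[joinKey(item.salesdate, item.employee)];
  }

  std::size_t count = 0;
  for (const Order& order : orders) {
    auto s = storesPerManager.find(order.employeemanagerid);
    if (s == storesPerManager.end()) continue;
    auto i = itemsPerKey.find(joinKey(order.salesdate, order.employee));
    if (i == itemsPerKey.end()) continue;
    // Each matching item pairs with each matching store for this order.
    count += s->second * i->second;
  }
  return count;
}
#+end_src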