Memory hierarchy
The memory is structured as a hierarchy. At the top we have the CPU registers; traversing down, we move further away from the CPU and the access times get slower.
Here is a table showing typical access times for each type of memory.
Access time | Type |
---|---|
1-2 ns | CPU registers |
3-10 ns | L1 cache |
25-50 ns | L2 cache |
30-90 ns | Main memory |
5-20 ms | Hard drive |
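The effect of the hierarchy can be observed even from Python: traversing a NumPy array along its contiguous (row-major) axis typically beats strided column access, purely because of cache behavior. A rough, machine-dependent sketch (the array size is an arbitrary choice):

```python
import time
import numpy as np

n = 2000
a = np.ones((n, n))  # C-order: rows are contiguous in memory

# Row traversal reads contiguous memory: cache lines are fully used.
t0 = time.perf_counter()
row_sum = sum(a[i].sum() for i in range(n))
t_rows = time.perf_counter() - t0

# Column traversal jumps n * 8 bytes between elements: more cache misses.
t0 = time.perf_counter()
col_sum = sum(a[:, j].sum() for j in range(n))
t_cols = time.perf_counter() - t0

print(row_sum == col_sum, t_rows, t_cols)
```

Both loops compute the same sum; on most machines the row traversal is noticeably faster.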
Parallel programming models
Multi-threaded programming (MT)
Message Passing Interface (MPI)
Map-reduce (MR)
- distributed hash table
Spark (SP)
Resilient Distributed Datasets (RDDs)
- lineage
Hadoop
Hadoop Distributed File System (HDFS)
RAID
https://en.wikipedia.org/wiki/RAID
Redundant array of inexpensive disks (RAID) is a collection of methods for combining multiple disks into one or more logical units for performance, redundancy, or both.
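The redundancy idea behind parity-based RAID levels such as RAID 5 can be sketched in a few lines: the parity block is the XOR of the data blocks, so any single lost disk can be reconstructed from the survivors. A minimal illustration with made-up byte strings:

```python
# Three "disks" holding one stripe of data each (hypothetical contents).
disks = [b"\x01\x02", b"\x0f\x10", b"\xaa\xbb"]

# Parity block: bytewise XOR of all data blocks.
parity = bytes(a ^ b ^ c for a, b, c in zip(*disks))

# Simulate losing one disk; XOR of the survivors and the parity recovers it.
lost = 1
survivors = [d for i, d in enumerate(disks) if i != lost] + [parity]
recovered = bytes(x ^ y ^ z for x, y, z in zip(*survivors))
assert recovered == disks[lost]
```

This only tolerates a single disk failure per stripe; other RAID levels trade capacity for more redundancy or more performance.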
Bloom filters
Bloom filters are a probabilistic data structure for membership queries. There are two operations: inserting an item x and querying an item x. If x is present in the bloom filter, the query will always be answered correctly. However, there is a probability p that a query is answered positively even though the item is not present in the bloom filter. Thus, there are no false negatives, but false positives do occur. The filter uses k different hash functions. They determine which bits to set when inserting an item x, and which bits to check when querying an item x.
The filter itself is a bit array of m bits, for example:

0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0
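The two operations can be sketched with a from-scratch filter. This is a minimal illustration, not production code (real code would use a library such as pybloom); deriving the k positions from salted SHA-256 digests is one simple choice among many:

```python
import hashlib

class Bloom:
    """Minimal Bloom filter: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # True if all k bits are set (may be a false positive).
        return all(self.bits[pos] for pos in self._positions(item))


b = Bloom(m=1024, k=4)
b.add("hello")
print("hello" in b)  # always True: no false negatives
```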
If we assume uniform hashing and that n items have been inserted into a filter of m bits, then the probability that all k positions are already set when querying an item that was never inserted (a false positive) is

p = (1 - (1 - 1/m)^(kn))^k ≈ (1 - e^(-kn/m))^k

Setting k to

k = (m/n) ln 2

will minimize the error probability. The following properties hold for optimal bloom filters: the false-positive rate becomes p ≈ 0.6185^(m/n), and a target rate p requires about m/n = -log2(p)/ln 2 ≈ 1.44 log2(1/p) bits per item.
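As a sanity check, the formulas can be evaluated numerically (n and p below are example values, not from the text):

```python
import math

n, p = 10_000, 0.01  # expected number of items, target error rate

# Bits needed for an optimal filter: m = -n ln p / (ln 2)^2
m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
# Optimal number of hash functions: k = (m / n) ln 2
k = round(m / n * math.log(2))

# Plug back into p ≈ (1 - e^(-kn/m))^k to verify the target rate.
p_est = (1 - math.exp(-k * n / m)) ** k
print(m, k, round(p_est, 4))  # → 95851 7 0.01
```

So a 1% error rate costs roughly 9.6 bits per item with 7 hash functions.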
Bloom filters can be multi-threaded with some care. Counting item frequencies in a stream X, where many items occur only once, can be done approximately in the following way with a bloom filter:

```python
from pybloom import BloomFilter
from collections import defaultdict

# X: input items, p: target error rate, tau: frequency threshold
f = BloomFilter(capacity=len(X), error_rate=p)
d = defaultdict(int)
for x in X:
    if x in f:
        # Seen before (or a false positive): count it exactly from now on.
        d[x] += 1
    else:
        f.add(x)

for x in d.keys():
    # d[x] misses the first occurrence, hence the + 1.
    if (d[x] + 1) > tau:
        print(x, d[x] + 1)
```

Only items seen more than once get an entry in d, so the dictionary stays small when most items are unique; false positives in the filter can inflate a count by one.
Cache behavior
Tries
Optimization ideas
If the current solution is too slow we might try out some general ideas to make it faster:
- Change the problem
- Refactor
  - Generally, if we are using Python, the shorter the program is, the faster it is. Try to simplify and remove unnecessary code.
  - Use NumPy whenever you can.
- Compile
  - Cython
  - Numba
- Optimize
  - Measure running time, identify hot spots, and choose better data structures. Constants may matter.
  - Think about the memory hierarchy. The theoretical computational complexity is often misleading on today's computers.
- Parallelize
  - Based on the structure of the workload: data flow, computational effort, hardware, ...
  - Measure the serial and parallel parts if that is possible. Use Amdahl's law to calculate the theoretical speedup.
  - Make use of SIMD instructions.
- Choose a suitable multiprogramming paradigm:
  - Multi-threading
  - Message passing
  - Map-reduce
  - Spark
  - ...
  - Use specialized frameworks that suit the problem in question.
  - Avoid oversubscription.
- Scale the solution by:
  - Dedicated hardware: GPU, TPU
  - Clusters
  - Clouds
If the data is too big, you need to distribute it as well as the computation. Move the data to e.g.:
- RAID with SSDs
- A cluster with local disks and HDFS
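Amdahl's law mentioned in the list above is simple enough to sketch directly: if a fraction s of the work is inherently serial, the speedup on N workers is 1 / (s + (1 - s) / N), and it can never exceed 1/s.

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Theoretical speedup when serial_fraction of the work cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# With 5% serial work, 8 workers give about 5.9x ...
print(amdahl_speedup(0.05, 8))       # ≈ 5.93
# ... and even unlimited workers cannot beat the 1 / 0.05 = 20x ceiling.
print(amdahl_speedup(0.05, 10**9))   # approaches 20
```

This is why measuring the serial part first matters: it bounds what parallelization can ever buy you.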