Memory hierarchy

    The following illustration shows how memory is structured. At the top we have the CPU registers. Traversing down, we move further away from the CPU and the access times become slower.

    CPU registers
    |
    L1 cache
    |
    L2 cache
    |
    Main memory
    |
    Hard Drive

    Here is a table showing typical access times for each level of the hierarchy.

    Access time    Level
    1-2 ns         CPU registers
    3-10 ns        L1 cache
    25-50 ns       L2 cache
    30-90 ns       Main memory
    5-20 ms        Hard drive
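The effect of this hierarchy is visible even from Python. As a sketch (the matrix size and the exact timings are illustrative), traversing a NumPy matrix row by row uses each fetched cache line fully, while traversing it column by column wastes most of every line:

```python
import time
import numpy as np

n = 2000
a = np.zeros((n, n))  # C order: elements of a row are adjacent in memory

# Row traversal: consecutive elements, cache lines fully used.
t0 = time.perf_counter()
s = 0.0
for i in range(n):
    s += a[i, :].sum()
row_time = time.perf_counter() - t0

# Column traversal: each access jumps n * 8 bytes ahead, so most
# of every fetched cache line is wasted.
t0 = time.perf_counter()
s = 0.0
for j in range(n):
    s += a[:, j].sum()
col_time = time.perf_counter() - t0

print(f"rows: {row_time:.4f}s  columns: {col_time:.4f}s")
```

On most machines the column traversal is noticeably slower, even though both loops do exactly the same number of additions.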

    Parallel programming models

    Multi-threaded programming (MT)

    Message Passing Interface (MPI)

    Map-reduce (MR)

    • distributed hash table

    Spark (SP)

    Resilient Distributed Datasets (RDDs)

    • lineage

    Hadoop

    Hadoop Distributed File System (HDFS)

    RAID

    https://en.wikipedia.org/wiki/RAID

    Redundant array of inexpensive disks (RAID) is a collection of methods for combining multiple disks into one or more logical units, for the purpose of performance, redundancy, or both.

    Bloom filters

    Bloom filters are a probabilistic data structure for membership queries. There are two operations: inserting an item x and querying an item x. If x is present in the bloom filter, the query will always be answered correctly. However, there is a probability p that a query is answered positively even though the item is not present in the bloom filter. Thus there are no false negatives, but false positives do occur. The filter uses k different hash functions; they are used to set bits in the bit array when inserting an item x, and to look up those same bits when querying x.

    0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 0 1 0 0 0 0 0
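The picture above is the filter's bit array of m bits. As a sketch of the two operations (an illustration only, not the pybloom implementation used below; the class name and the hashing scheme are my own), a minimal Bloom filter can be built with hashlib by deriving the k positions from the item hashed together with an index:

```python
import hashlib

class SimpleBloom:
    """Minimal Bloom filter sketch: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m      # number of bits in the filter
        self.k = k      # number of hash functions
        self.bits = 0   # bit array, stored as one Python int

    def _positions(self, item):
        # Derive k bit positions by hashing the item together with
        # an index i, so each i acts as a distinct hash function.
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h, "big") % self.m

    def add(self, item):
        # Insert: set all k bits for this item.
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # Query: the item may be present only if all k bits are set.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))
```

An item that was added is always found; an item that was not added is reported present only if all of its k positions happen to collide with set bits.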

    If we assume uniform hashing and that n items have been inserted into a filter of m bits using k hash functions, then the probability that all k positions of a queried item are set, i.e. the false-positive probability, is

        p = (1 - (1 - 1/m)^(kn))^k ≈ (1 - e^(-kn/m))^k

    Setting k to

        k = (m/n) ln 2

    will minimize the error probability. The following properties hold for optimal bloom filters:

        p = 2^(-k),  i.e.  k = -log2(p)  and  m = -n ln(p) / (ln 2)^2
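The sizing formulas above translate directly into code. A small helper (the function name is my own) that, given a target item count n and false-positive probability p, returns the optimal number of bits m and hash functions k:

```python
import math

def bloom_parameters(n, p):
    """Optimal Bloom filter size m (bits) and hash count k
    for n items and false-positive probability p."""
    m = math.ceil(-n * math.log(p) / math.log(2) ** 2)
    k = round(m / n * math.log(2))
    return m, k

# e.g. one million items with a 1% false-positive rate:
m, k = bloom_parameters(10**6, 0.01)
print(m, k)  # about 9.6 million bits (~1.2 MB) and 7 hash functions
```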

    Bloom filters can be multi-threaded with some care. Counting items in a stream where many items occur only once can be done as follows with a bloom filter: only start counting an item in a dictionary once the filter reports it has been seen before. Singletons then never enter the dictionary, at the cost of undercounting each counted item by one (compensated for below) and of occasional false positives.

    from pybloom import BloomFilter
    from collections import defaultdict
    
    # X: the stream of items, p: false-positive rate, tau: report threshold
    f = BloomFilter(capacity=len(X), error_rate=p)
    d = defaultdict(int)
    
    for x in X:
        if x in f:        # seen before (or a false positive)
            d[x] += 1
        else:             # first occurrence: remember it, but do not count it
            f.add(x)
    
    # d[x] misses the first occurrence, so d[x] + 1 is the actual count
    for x in d.keys():
        if (d[x] + 1) > tau:
            print(x, d[x] + 1)

    Cache behavior

    Tries

    Optimization ideas

    If the current solution is too slow, we might try out some general ideas to make it faster:

    1. Change the problem
    2. Refactor

      • Generally, if we are using Python, the shorter the program is, the faster it runs. Try to simplify and remove unnecessary code.
      • Use NumPy whenever you can.
    3. Compile

      • Cython
      • Numba
    4. Optimize

      • Measure running time, identify hot spots, and use better data structures. Constant factors may matter.
      • Think about the memory hierarchy. The theoretical computational complexity is often misleading on today's computers.
    5. Parallelize

      • Parallelize based on the structure of the workload: data flow, computational effort, hardware, ...
      • Measure the serial and parallel parts if possible. Use Amdahl's law to calculate the theoretical speedup.
      • Make use of SIMD instructions.
      • Choose a suitable multiprogramming paradigm:

        • Multi-threading
        • Message passing
        • Map-reduce
        • Spark
        • ...
      • Use specialized frameworks that suit the problem in question.
      • Avoid oversubscription (more runnable threads than cores).
      • Scale the solution using:

        • Dedicated hardware: GPU, TPU
        • Clusters
        • Clouds
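Amdahl's law from point 5 can be sketched as a one-liner: with serial fraction s and N workers, the speedup is bounded by 1 / (s + (1 - s)/N). The function name and the example numbers are illustrative:

```python
def amdahl_speedup(serial_fraction, n_workers):
    """Theoretical speedup by Amdahl's law: S = 1 / (s + (1 - s)/N)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_workers)

# If 10% of the work is inherently serial, 8 workers give at most:
print(amdahl_speedup(0.1, 8))   # about 4.7x, far from the ideal 8x
```

The serial fraction dominates as N grows: even with infinitely many workers, the speedup here can never exceed 1/0.1 = 10x.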

    If the data is too big for one machine, you also need to distribute the storage. Move the data to, e.g.:

    1. RAID with SSD
    2. Cluster with local disks and HDFS