

#### SoftWrAP: A Lightweight Framework for Transactional Support of Storage Class Memory

Ellis Giles Rice University Houston, Texas erg@rice.edu Kshitij DoshiPeter VaIntel Corp.Rice UiPortland, ORHoustorkshitij.a.doshi@intel.compjv@rice

Peter Varman Rice University Houston, Texas pjv@rice.edu

Massive Storage Systems and Technologies Conference, 2015



Motivation

- In Memory Databases (IMDB) & Caches
  - Main Memory Databases MMDB
  - solidDB, Oracle TimesTen, CSQL, Memcached
  - Fast Access Speeds
  - Lack Durability (ACID) or must log to disk with long recovery times
- New Graph Databases, Neo4j, Graph 500
- Ideally, with byte-addressable persistent storage, applications could operate at inmemory access speeds without having to log to disk.



# Storage Technology



Storage Class Memory

- Fast
- Byte Addressable
- Non-Volatile
- Examples:
  - PCM
  - **ST-MRAM**

Storage

- Non-Volatile
- Slow
- **Block-Based**

Volatile ReRAM

\*Figure Adapted from: M. K. Qureshi, V. Srinivasa, and J. A. Rivers, "Scalable high performance main memory" system using phase-change memory technology," ISCA'09.

#### Memory

- Fast
- Byte Addressable
- DRAM Refreshed

#### RICE **Storage Class Memory**



#### Storage Class Memory (SCM):

- **Byte-Addressable**
- Persistent
- **High-Speed**
- Will sit alongside DRAM
- Near DRAM Speed
- Higher Density
- 1+ TB on the Main Memory Bus
- Can Replace Disks
- Gives Rise To New Applications (no longer slow, block based persistence)
- Sounds Great!





Durability of writes (even a single write) are not guaranteed.

A=1 is only in cache hierarchy and not in memory.

PERSISTENT MEMORY



PERSISTENT

MEMORY



Durability of writes are not guaranteed. A=1 is now only in a store buffer and not memory. Unfortunate side effect of also invalidating cache entry.













Another problem, even before Flush & Commit, random cache evictions can leave persistent memory in an inconsistent state.

PERSISTENT MEMORY



Undo Log





### Atomic Writes to SCM



# SoftWrAP Approach



RICE

Software-based Write-Aside Persistence

- Aliasing catches cache evictions.
- Fast path through cache hierarchy
- Re-Do Log for atomicity

```
Transaction
wrapOpen();
wrapStore(a, 5);
....
c = wrapLoad(b);
wrapClose();
```



# SoftWrAP Approach

Aliasing & Redo Log wrapOpen() CPU Fast foreground CPU Creates Log path through the Streaming Stores cache hierarchy wrapStore(x, val) using alias location. Streams location x and L1 L1 Asynchronous value to log conduit to persistent Writes x to Alias Table COHERENT WrAP memory log using CACHE Software streaming stores. wrapLoad(x) HIERARCHY mm stream si32 Load x from alias table MOVNTI or SCM if not present **Bypass cache** Approach decouples wrapClose() concurrency control WC Buffer MEM CONTROLLER Close log & PCOMMIT from persistence. // Can process table. Alias Log DRAM **SCM** Table



#### SoftWrAP Architecture





#### **SCM Emulation**

- Simulation:
  - Provided only In-Order execution.
  - Single instruction issue
  - Results depend on model of the cache and memory subsystem
  - Multi-threading scheduling
- SoftWrAP Could benefit from Out-Of-Order execution.
- Tested SoftWrAP on HW DRAM Interposer/Tracer at Intel.
- Analysis showed additional features: DRAM based provided better speedup, pre-fetching, etc..
- Validated writes proceeding to memory.

SCM Emulation:

- Streaming stores go into a software emulated write buffer and to DRAM.
- Running software on HW and DRAM

Tunable SCM Model:





# **Global Alias Table**

Hash Table A State: Retiring

| Address     | Value | Size |   | Addres |  |  |  |
|-------------|-------|------|---|--------|--|--|--|
|             |       |      | Ļ | W      |  |  |  |
|             |       |      |   | Х      |  |  |  |
| Μ           | 1     | 8    |   | Υ      |  |  |  |
| Z           | 3     | 8    |   | Z      |  |  |  |
| N           |       | 1024 |   | А      |  |  |  |
|             |       |      |   |        |  |  |  |
| Data Object |       |      |   |        |  |  |  |
| Full        |       |      |   |        |  |  |  |

| ŀ | Hash(W) Hash Table B<br>State: Active |       |      |  |  |  |  |
|---|---------------------------------------|-------|------|--|--|--|--|
|   | Address                               | Value | Size |  |  |  |  |
| > | W                                     | 5     | 4    |  |  |  |  |
|   | Х                                     | -1    | 4    |  |  |  |  |
|   | Υ                                     | 7     | 4    |  |  |  |  |
|   | Z                                     | 8     | 8    |  |  |  |  |
|   | А                                     | 5     | 4    |  |  |  |  |
|   |                                       |       |      |  |  |  |  |

e

Closed

Empty

Retiring

- Double Buffered
   (2 Hash Tables)
- Concurrent Retirement
- Supports primitives and object types
- Reads check both tables (if non-empty)
- 5 States for the Alias Table.
- Locks on state change for retirement and open/closeWrap



## **Performance Model**

- Tw is SCM write time and Talias is hash or alias time
- Ts is overhead to perform persistent memory sync (CPUID) and PCOMMIT
- One log entry for a 4-byte integer requires 4-bytes + 8-byte address = 12 bytes
- Write combining cache lines of 64 bytes of contiguous log entries and 1 write for log management.

| - ( | Commit a group | of <i>n</i> writes to SCM<br>SCM Writes | :<br>pcommits | Estimated Time                                                            |
|-----|----------------|-----------------------------------------|---------------|---------------------------------------------------------------------------|
|     | Non-Atomic     | n                                       | 1             | $nT_w + T_s$                                                              |
|     | UndoLog        | 2n + 1 + [12n/64]                       | n + 1         | $\begin{array}{c}n(2T_w+T_s)+\\ \lceil 12n/64+1\rceil T_w+T_s\end{array}$ |
|     | SoftWrAP       | $1+ \lceil 12n/64 \rceil$               | 1             | $T_w + max(n * T_{alias}, [12n/64]T_w) + T_s$                             |



#### **Response Time**



Large array, groups of 10 stores to random locations in the array, varying arrival rate of incoming transactions. Processing of alias table, size 16k, in background.



#### Alias Table Analysis



Maximum throughput (wraps per second) for various of alias table sizes in double buffered implementation.



#### **Transaction Size**



Block Copy: - All N items in 1k cont. block - Data structure allows for a pointer flip to new block.

Response time for various transaction sizes with arrival rate of 1,000 wraps per second, alias table size of max 8k entries, and SCM Tw=1µs.



Data Reuse



Maximum throughput for various percentage of data reuse across transactions with size n=10, alias table size of max 8k entries, and SCM Tw=1 $\mu$ s



- Created two VFS. One native to SCM and one using SoftWrAP (can also model Undo-Log, Non-Atomic).
- A: SCM VFS uses SCM journal (r/w bytes)
- B: SoftWrAP handles consistency



- TPC-C is an Online Transaction Processing Benchmark. Comprised of 9 tables and a number of transactions
- PY-TPCC is modified and executed to save SQL statements for TPC-C benchmark to file.
- SQLite is executed with VFS under test and generated TPC-C SQL statement file.



#### SQLite SCM TPC-C



Throughput in Transactions Per Second for the TPC-C Benchmark with SQLite. SoftWrAP has similar performance to Non-Atomic.

# RICE Conclusions and Forward Work

- Looking at additional aliasing mechanisms and enhancements such as compiler integrations.
- Evaluation on hardware when available.
- SoftWrAP is a fast, straightforward approach to ensuring transactional support for writing byte-addressed persistent data without any hardware changes.
- It provides a fast path through the cache hierarchy while utilizing a background path to persist groups of stores to SCM atomically.
- SoftWrAP decouples concurrency control from persistence.
- Being released as open source software.
- SoftWrAP has promising results that approach the performance of persistence methods that don't guarantee consistency and outperform Undo-Log approaches.



#### **Questions?**

#### Thank You!

Supported by NSF Grant CCF 1439075 and Intel SSG.