

# Parallel all the time

PLANE LEVEL PARALLELISM EXPLORATION FOR HIGH PERFORMANCE SOLID STATE DRIVES

<u>Congming Gao</u>, Liang Shi, Jason Chun Xue, Cheng Ji, Jun Yang, Youtao Zhang

Chongqing University; East China Normal University; City University of Hong Kong; University of Pittsburgh

# Outline

### **Background**

Problem Statement

#### SPD: From Plane to Die Parallelism Exploration

- Overview
- Die Level Write Construction
- Die Level GC

**Experiment Setup** 

Results

Conclusion

### Parallel Organization



### Controller Design



Wear leveling prolongs the flash lifespan

### Advanced Commands



Interleaving Command





# Problem Statement


Due to <u>the restrictions of the multi-plane command</u>, plane level parallelism is hard to exploit.

Based on these restrictions, operations that access the same die fall into one of the following four cases:

**Case 1:** Operations are issued to <u>one plane only</u> (Single Write);

**Case 2:** Can be degraded to Case 1;

**Case 3:** Two operations of the same type with <u>unaligned in-plane addresses</u> are issued to the two planes of the die (Unaligned Writes);

**Case 4:** Two operations of the same type with aligned in-plane addresses are issued to the two planes (Parallel Writes).

**Cases 1, 2, and 3 result in poor plane level parallelism in SSDs.**
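To make the Case 4 condition concrete, here is a minimal sketch of an eligibility check for two operations on the same die. The `Op` record and the function name are hypothetical, and "aligned" is assumed here to mean identical in-plane block and page addresses; real devices may define alignment differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical record for one flash operation on a die."""
    kind: str   # "read", "write", or "erase"
    plane: int  # plane index within the die
    block: int  # in-plane block address
    page: int   # in-plane page address


def can_issue_multi_plane(a: Op, b: Op) -> bool:
    """Return True only for Case 4: same type, different planes,
    and (by assumption) identical in-plane block and page addresses."""
    same_type = a.kind == b.kind
    different_planes = a.plane != b.plane
    aligned = (a.block, a.page) == (b.block, b.page)
    return same_type and different_planes and aligned
```

Under this sketch, two writes to planes 0 and 1 at the same in-plane address qualify, while two writes whose page addresses differ (Case 3) do not and would have to be served one after the other.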

The percentages of the three cases are collected and presented:







Valid pages are moved sequentially due to unaligned in-plane addresses. Write points in the new blocks are still unaligned.

For host writes and GCs,



# SPD: From Plane to Die Parallelism Exploration



SPD, an SSD from plane to die framework

# Die Level Write Construction

1. The <u>amount of data</u> issued to a die should be a multiple of *N* pages (assuming there are *N* planes in a die);

2. The *starting locations of data* should be aligned for all the planes in the same die.

The SSD buffer evicts a multiple of *N* dirty pages from one die at a time: Buffer Supported Die-Write.

A plane level dynamic allocation scheme is adopted [Tavakkol et al. 2016]
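The two conditions above can be sketched as a simple check. This is only an illustration: the function name is hypothetical, and each plane's write point is assumed to be represented as a single integer offset (the next free in-plane page position).

```python
from typing import Sequence


def is_die_write(num_pages: int, write_points: Sequence[int]) -> bool:
    """Check both die-write conditions for one die.

    write_points holds one (assumed) in-plane offset per plane, so the
    number of planes N is len(write_points).
    """
    n_planes = len(write_points)
    # Condition 1: the amount of data is a positive multiple of N pages.
    multiple_of_n = num_pages > 0 and num_pages % n_planes == 0
    # Condition 2: the starting locations are aligned across all planes.
    aligned = len(set(write_points)) == 1
    return multiple_of_n and aligned
```

For a two-plane die, four pages starting at offset 10 in both planes pass the check, while three pages, or four pages with write points at 10 and 11, do not.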

### Buffer Supported Die-Write



**Die-Write is constructed!!!** 

Organization of write buffer and the die level write construction
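A minimal sketch of such a buffer, assuming one LRU-ordered list of dirty pages per die and an eviction that always removes exactly *N* pages from a single die. The class and method names are hypothetical, the caller is assumed to know which die a page belongs to, and returning nothing when a die holds fewer than *N* pages is a simplification rather than the authors' policy.

```python
from collections import OrderedDict


class DieWriteBuffer:
    """Write buffer organized as one LRU list of dirty pages per die."""

    def __init__(self, num_dies: int, planes_per_die: int):
        self.planes_per_die = planes_per_die
        # One ordered map per die: logical page number -> data.
        self.die_lists = [OrderedDict() for _ in range(num_dies)]

    def write(self, die: int, lpn: int, data: str) -> None:
        """Insert or update a dirty page and mark it most recently used."""
        lru = self.die_lists[die]
        lru[lpn] = data
        lru.move_to_end(lpn)

    def evict(self, die: int) -> list:
        """Evict the N least recently used pages of one die.

        Returns an empty list if the die holds fewer than N dirty pages,
        since a die-write cannot be constructed in that case.
        """
        lru = self.die_lists[die]
        if len(lru) < self.planes_per_die:
            return []
        return [lru.popitem(last=False) for _ in range(self.planes_per_die)]
```

Evicting in units of *N* pages from one die is what lets every eviction satisfy the data-amount condition of a die-write.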

### Die Level GC

Traditional GC: 1. Victim block selection; 2. Valid page movement; 3. Victim block erase

Die-GC: Two Goals

- Aligning the write points of all planes when GCs are activated;
- Reducing the time cost of valid page movement.
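As a rough illustration of the second goal, the sketch below compares the time to move valid pages one at a time with moving them in multi-plane rounds. It is a simplified model, not the authors' cost formula: it assumes valid pages are spread evenly over the planes, ignores data transfer time, and uses the page read and write latencies from the experiment setup as defaults. The function name is hypothetical.

```python
import math


def move_cost_ms(valid_pages: int, planes: int = 1,
                 t_read: float = 0.075, t_write: float = 1.5) -> float:
    """Estimated time (ms) to move valid pages during GC.

    With planes == 1 the pages are moved sequentially; with planes > 1,
    one read plus one write round is assumed to move one page per plane.
    """
    rounds = math.ceil(valid_pages / planes)
    return rounds * (t_read + t_write)


sequential = move_cost_ms(64, planes=1)
parallel = move_cost_ms(64, planes=2)
```

In this model, aligned write points on a two-plane die halve the number of read and write rounds, which is the saving Die-GC aims for.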



# Experiment Setup

#### Parameters of Simulated SSD

| Parameter         | Value |
|-------------------|-------|
| SSD Configuration | 512 GB; 16 Channels; 8 Chips/Channel; 1 Die/Chip; 2 Planes/Die; 2048 Blocks/Plane; 256 Pages/Block; 4 KB Page |
| Timing Parameters | 0.075 ms for page read; 1.5 ms for page write; 3.8 ms for block erase; 25 ns for byte transfer |

#### Evaluated Workloads

| Workloads | W/R Ratio§ | FP§  | R_V§ | W_V§ | R_S§ | W_S§ |
|-----------|------------|------|------|------|------|------|
| HM_0      | 67.9%                  | 1.35 | 6.9  | 15.2 | 11.2 | 11.6 |
| PRN_0     | 93.7%                  | 2.93 | 3.0  | 20.5 | 24.8 | 11.6 |
| PRN_1     | 32.1%                  | 5.16 | 31.4 | 10.9 | 24.2 | 11.4 |
| RSR_0     | 90.7%                  | 0.31 | 1.8  | 14.6 | 15.0 | 12.6 |
| STG_0     | 76.9%                  | 0.28 | 7.4  | 9.3  | 33.6 | 12.6 |
| PROJ_0    | 82.9%                  | 1.58 | 7.2  | 56.5 | 21.9 | 35.7 |
| PROJ_3    | 4.89%                  | 1.86 | 21.6 | 2.8  | 11.9 | 29.9 |
| SRC2_0    | 88.6%                  | 0.52 | 1.9  | 13.6 | 12.2 | 11.0 |
| TS_0      | 82.6%                  | 0.57 | 4.9  | 15.9 | 17.5 | 11.8 |
| PRXY_0    | 97.06%                 | 0.17 | 0.27 | 5.8  | 9.6  | 6.2  |
| WDEV_0    | 79.9%                  | 0.34 | 3.2  | 9.2  | 16.5 | 12.1 |

- Buffer Setting:
  - Size: 1/1000 of the footprint of evaluated workloads;
  - Page organization within a die list: LRU

§ Legend:

- W/R Ratio: Write and Read Requests Ratio;
- FP: Footprint (GB);
- R\_V/W\_V: Read/Write Data Volume (GB);
- R\_S/W\_S: Average Read/Write Request Size (KB).
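As a quick sanity check on the simulated SSD configuration, the geometry can be multiplied out to confirm the stated capacity. The variable names are illustrative, and binary units are assumed (1 GB = 1024 × 1024 KB).

```python
# Geometry taken from the SSD configuration table.
channels, chips_per_channel, dies_per_chip = 16, 8, 1
planes_per_die, blocks_per_plane, pages_per_block = 2, 2048, 256
page_kb = 4

dies = channels * chips_per_channel * dies_per_chip
planes = dies * planes_per_die
pages = planes * blocks_per_plane * pages_per_block

# Assumes binary units: 1 GB = 1024 * 1024 KB.
capacity_gb = pages * page_kb / (1024 * 1024)
print(dies, planes, capacity_gb)
```

The product comes to 128 dies, 256 planes, and 512 GB, matching the configured capacity.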

### **Experiment Setup**

#### **Evaluated Schemes:**

- **Baseline-D:** Dirty pages are evicted to different dies to exploit die level parallelism;
- **Baseline-P:** Based on Baseline-D, dirty pages accessing different planes in the same die are evicted at a time;
- **TwinBlk:** <u>Aligns the write points of planes in the same die by sending data to different planes in a round-robin policy</u>;
- **ParaGC:** <u>Aligns the write points of active blocks in different planes to reduce the time cost</u> of valid page movement during the GC process;
- **SPD:** The proposed scheme.

# Results

Results without GC—Latency:

#### Read Latency Improvement without GC

|           | Baseline-D | Baseline-P | TwinBlk | ParaGC | SPD    |
|-----------|------------|------------|---------|--------|--------|
| Reduction | 0%         | 0.049%     | 0.011%  | 0%     | 0.096% |

**The read latencies of the five evaluated schemes are similar.**

#### **Results without GC—Plane Utilization:**



#### **Results without GC—Buffer Hit Ratio:**



#### Results with GC—Total GC Cost:



GC Evaluation—Average GC Cost:



SPD has the minimal GC cost compared with TwinBlk and ParaGC;

The GC cost of SPD is similar to that of Baseline-D and Baseline-P.

#### GC Evaluation—GC Count and GC Induced Erases:



Sensitivity Study (Buffer Size):



With a larger buffer, the write latencies of all schemes can be further reduced;

**Stable write latency reduction is achieved by SPD with different buffer sizes.** 

### Results

Sensitivity Study (Four Planes):



### Conclusion

- Two components are designed in the framework: Die-Write and Die-GC. Both keep the write points of all planes in the same die aligned all the time.
- The experimental results show that SPD effectively improves the write performance of SSDs by <u>48.61%</u> on average without impacting read performance.



# Q & A