
Coding for emerging archival storage media


Type

Thesis

Abstract

The race between generating digital data and storing it has prompted a search for new media that can hold our data for centuries, with fused silica and DNA in the lead. These media are at a stage of rapid research and development. Error Correcting Codes and coding schemes must be designed for these emerging media’s constraints and noise characteristics, much as a large body of work has done for communication applications. Unlike communication standards, digital data storage, particularly archival storage, can and should capitalise on longer block sizes and more complex coding. Longer blocks have the potential to reduce coding overhead and therefore cost, while the longer retrieval latency that archival storage tolerates allows for more complex algorithms. This cycle of noise characterisation and code design for storage media could be made more efficient through automation and generalisation. In this work, we present the use of Reinforcement Learning to construct long Error Correcting Codes. We show that Reinforcement Learning is effective when it targets the end goal of reducing Bit Error Rate rather than the proxy metrics used in state-of-the-art heuristics. In addition, we present a unified approach to handling constraints when coding data into DNA. Together, these provide a practical toolbox that allows a storage medium and its accompanying coding scheme to be co-designed. Finally, we show that our toolbox requires little intervention from human experts, which makes it easier to design coding schemes in lockstep with the rapid development of the media.

Date

2022-08-04

Advisors

Moore, Andrew

Keywords

archival data storage, coding, DNA data storage, error correcting code, LDPC

Qualification

Doctor of Philosophy (PhD)

Awarding Institution

University of Cambridge

Sponsorship

This work was supported by Microsoft Research through its PhD Scholarship Programme.