Coding for emerging archival storage media
Abstract
The race between generating digital data and storing it has prompted a search for new media that can hold our data for centuries, with fused silica and DNA in the lead. These media are under rapid research and development. Error-correcting codes and coding schemes must be designed for the constraints and noise characteristics of these emerging media, much as a large body of work has done for communication applications. Unlike communication standards, digital data storage, particularly archival storage, can and should capitalise on longer block sizes and more complex coding: longer blocks can reduce coding overhead and therefore cost, while the longer retrieval latency of archival storage allows more complex decoding algorithms. This cycle of noise characterisation and code design for storage media could be made more efficient through automation and generalisation. In this work, we use reinforcement learning to construct long error-correcting codes. We show that reinforcement learning is effective when it targets the end goal of reducing the bit error rate, rather than the proxy metrics used by state-of-the-art heuristics. In addition, we present a unified approach to handling the constraints that arise when coding data into DNA. Together, these provide a practical toolbox that allows a storage medium and its accompanying coding scheme to be co-designed. Finally, we show that our toolbox requires little intervention from human experts, which makes it possible to design coding schemes in lockstep with the rapid development of the media.