Read-to-index mis-assignment



The Oxford Genomics Centre (OGC) has been generating sequencing data using Illumina’s high-throughput technologies since 2008. Over these years the combined experience within the group and relationship with its end users has enabled the OGC to swiftly identify many issues relating to library preparation and sequencing and adapt procedures accordingly towards effective resolutions.

The assignment of a small proportion of sequencing reads to the wrong index is a long-known artefact of multiplexed high-throughput sequencing on Illumina platforms. Although incompletely understood, it is most likely related to the way clusters of DNA molecules are generated and imaged on the flow cell during sequencing. Recent advances in patterned flow cell and exclusion-amplification technology have greatly increased the capacity of sequencing runs. However, it is apparent that the occurrence of read-to-index mis-assignment has also increased. Best-practice approaches to library preparation can reduce the severity of this problem and help avoid complications in downstream data analysis.

Concern grew within the genomics community after Sinha et al. posted their preprint on bioRxiv and Illumina published a white paper addressing read-to-index mis-assignment, which explained the primary mechanism (index hopping) responsible for the increased levels observed with the latest platforms.

In most settings, with most sequencing applications (or library types), low-level index mis-assignment is unlikely to significantly affect the interpretation of data. However, for some kinds of sequencing experiment the appearance of sequences from one library in the reads from another sample can lead to serious misinterpretations of the data.

Sinha et al. report that the read-to-index mis-assignment artefact can affect 5–10% of sequencing reads; however, this is not consistent with observations using standard approaches to library preparation and sequencing. Analysis of sequencing data we generated in 2017 pointed to a read-to-index mis-assignment potential of up to 1.5% on the HiSeq 4000, and far less on the HiSeq 2500. We now believe that reads generated on the NovaSeq 6000 are subject to a similar level of index mis-assignment as those from the HiSeq 4000. Comments on the same issue have been posted at Enseqlopedia, QC Fail and the UC Davis Genome Center. It has also been the subject of an article in WIRED.

To complement our meticulous attention to library preparation and QC, the OGC quickly implemented a unique dual indexing strategy to maximise the potential of the HiSeq 4000 platform while minimising the rate of read-to-index mis-assignment for all projects. This strategy remains in place for use on the NovaSeq 6000. Each sample carries two index sequences, each of which is unique within the sequenced pool, and a read is assigned to a sample only if it matches both of that sample's indexes.

This methodology was initially tested on small pools and was shown to substantially reduce read-to-index mis-assignment. Thereafter, a dual index methodology was implemented where possible.
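
The matching rule behind unique dual indexing can be illustrated with a minimal Python sketch. It is not the OGC's demultiplexing pipeline: the sample names, the index sequences and the function names are hypothetical, and exact matching is assumed for simplicity, whereas production demultiplexers typically tolerate a small number of mismatches.

```python
from typing import Dict, Optional, Tuple

# Hypothetical pool: sample name -> (i7 index, i5 index).
# The sequences are placeholders chosen for illustration only.
SAMPLE_SHEET: Dict[str, Tuple[str, str]] = {
    "sample_A": ("ATTACTCG", "TATAGCCT"),
    "sample_B": ("TCCGGAGA", "ATAGAGGC"),
    "sample_C": ("CGCTCATT", "CCTATCCT"),
}


def is_unique_dual_indexed(sheet: Dict[str, Tuple[str, str]]) -> bool:
    """Return True if no i7 index and no i5 index is shared between samples."""
    i7_indexes = [i7 for i7, _ in sheet.values()]
    i5_indexes = [i5 for _, i5 in sheet.values()]
    return (len(set(i7_indexes)) == len(i7_indexes)
            and len(set(i5_indexes)) == len(i5_indexes))


def assign_read(i7: str, i5: str,
                sheet: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Assign a read only when BOTH index reads match one sample exactly."""
    for sample, (sample_i7, sample_i5) in sheet.items():
        if i7 == sample_i7 and i5 == sample_i5:
            return sample
    return None  # unexpected index pair: the read stays unassigned


def assign_by_i7_only(i7: str,
                      sheet: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Single-index assignment, shown only for contrast."""
    for sample, (sample_i7, _) in sheet.items():
        if i7 == sample_i7:
            return sample
    return None
```

In this sketch, a read carrying the i7 index of sample_A and the i5 index of sample_B (the kind of pair that index hopping would produce) is left unassigned by `assign_read`, whereas `assign_by_i7_only` would place it in sample_A. Because every index is unique within the pool, a single hopped index cannot yield a valid pair for another sample.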

Oxford Genomics Centre is committed to bringing together genomics services to support scientists wanting to exploit the latest in high-throughput genomics techniques in their research. Monitoring every aspect of operational performance allows the OGC to continually improve and rapidly adapt to ensure best-practice use of the latest technologies.


Author: Simon Engledow

If you have any questions about our protocols, would like to discuss your next sequencing project, or have any other enquiry, please email us.