We prepare and sequence a high volume of libraries specifically tailored for transcriptional data analysis. We've tested and compared the leading kits on the market and, with a view to consistency, have automated much of the workflow. Drawing on this experience, we can interpret the QC metrics collected at each step of the process and know which factors will influence the quality of the resulting sequencing data.
The most common form of library preparation begins with enrichment for mature poly-A transcripts by annealing them to oligo-(dT) magnetic beads. The benefit of this approach is that the prepared libraries, and therefore the data, predominantly represent protein-coding transcripts. This is ideal for comparative gene expression analysis and makes good use of the capacity of each sequencing lane by focusing on mature, full-length poly-A transcripts alone.
For poly-A libraries, the quality of the data is particularly influenced by the integrity of the purified total RNA.
Because the poly-A tail is located at the 3' end of each transcript, enrichment from more degraded material (fragmented RNA; lower RIN values) captures only the 3'-most fragments, so the resulting data will be biased toward the 3' region.
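The 3' bias can be illustrated with a deliberately simple toy model, which is our own assumption rather than a description of the lab protocol: each transcript suffers a number of strand breaks at uniformly random positions, and oligo-(dT) capture retains only the fragment still attached to the poly-A tail. The function names and parameters are hypothetical, and the model ignores the later fragmentation and reverse transcription steps.

```python
import random


def polya_captured_start(length: int, n_breaks: int, rng: random.Random) -> int:
    """Toy model: first position (0-based, 5'->3') still attached to the poly-A tail."""
    if n_breaks == 0:
        return 0  # intact molecule: the whole transcript is captured
    # Breaks fall uniformly along the transcript; only the 3'-most fragment keeps the tail.
    return max(rng.randrange(1, length) for _ in range(n_breaks))


def capture_profile(length: int, n_breaks: int, n_molecules: int = 2000,
                    n_bins: int = 10, seed: int = 0) -> list[float]:
    """Fraction of molecules whose captured region reaches each bin, ordered 5' to 3'."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    for _ in range(n_molecules):
        start = polya_captured_start(length, n_breaks, rng)
        for b in range(n_bins):
            bin_end = (b + 1) * length // n_bins
            if bin_end > start:  # captured region [start, length) overlaps this bin
                counts[b] += 1
    return [c / n_molecules for c in counts]


intact = capture_profile(3000, n_breaks=0)
degraded = capture_profile(3000, n_breaks=3)
print("intact:  ", [round(x, 2) for x in intact])
print("degraded:", [round(x, 2) for x in degraded])
```

With intact RNA every bin is covered equally, while with three random breaks per molecule the coverage climbs steeply toward the 3' end, which is the pattern a low RIN produces in poly-A data.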
Guidelines for purified RNA QC can be found here. Poly-A enrichment is only suitable for eukaryotic species, whose mRNAs are polyadenylated as a signal for nuclear export and translation; it is not suitable for bacteria. For the human transcriptome we normally recommend 25 million raw reads per sample for poly-A libraries.
Aside from species compatibility, there are other good reasons to choose not to enrich for poly-A prior to full-length transcript library preparation. One is that the RNA shows some level of degradation (FFPE-derived RNA, for example, or even fresh-frozen biopsies). Ribosomal RNA (rRNA) can make up ~80% of purified total RNA, so depleting it using specifically designed probes is another way to focus sequencing on analytically relevant transcripts. Whether poly-A transcripts are enriched or rRNA is depleted, the next steps are fragmentation, random-primed reverse transcription and cDNA generation. Because rRNA depletion does not rely on capturing the 3' poly-A tail, fragments from along the whole transcript are retained; processing degraded RNA through the rRNA depletion workflow therefore improves data coverage across the length of the transcripts. Of course, depending on the severity, degraded RNA still places some limits on data quality.
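To see why rRNA must be removed, here is a back-of-the-envelope sketch using the ~80% rRNA figure above. It assumes reads are sampled in proportion to the mass of each RNA species left in the library; `depletion_efficiency` is a hypothetical parameter, not a specification of any particular kit.

```python
def non_rrna_reads(total_reads: int, rrna_fraction: float,
                   depletion_efficiency: float = 0.0) -> int:
    """Expected reads from non-rRNA transcripts.

    Assumes reads are drawn in proportion to the mass of each RNA species remaining.
    rrna_fraction: share of total RNA that is rRNA (e.g. ~0.8)
    depletion_efficiency: hypothetical fraction of rRNA removed by the probes (0 to 1)
    """
    if not (0.0 <= rrna_fraction < 1.0 and 0.0 <= depletion_efficiency <= 1.0):
        raise ValueError("fractions must lie in [0, 1), efficiency in [0, 1]")
    other = 1.0 - rrna_fraction
    remaining_rrna = rrna_fraction * (1.0 - depletion_efficiency)
    return round(total_reads * other / (other + remaining_rrna))


print(non_rrna_reads(25_000_000, 0.8))        # no depletion: 5000000
print(non_rrna_reads(25_000_000, 0.8, 0.99))  # hypothetical 99% depletion
```

Without depletion, only about one read in five would come from non-rRNA transcripts; efficient depletion recovers most of the sequencing budget for informative transcripts.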
Another reason to choose rRNA depletion is to retain information on non-polyadenylated transcripts. These may include nascent (unspliced) pre-mRNA and a number of other functionally relevant long non-coding transcripts.
A major consideration when choosing rRNA depletion is that, because reads are spread across these additional RNA species, you may need up to twice the sequencing depth to reach the same detection sensitivity for expression analysis.
For example, ~25 million raw reads per sample for poly-A libraries would translate to ~50 million reads per sample for rRNA-depleted libraries.
Probes used to deplete rRNA can also be combined with probes targeting other highly abundant transcripts such as globin mRNA, a particularly important consideration when working with total RNA extracted from blood samples.
The two library types described above are the most popular approaches for analysing the full-length transcriptome of bulk material. The kit we currently use is suitable for total RNA inputs between 100 ng and 1 µg. The protocol also retains directional (strand) information: the end-specific adapters that prime the sequencing reactions allow each read to be assigned to the DNA strand from which the transcript originated.
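As an illustration of how directional information is used downstream, the sketch below infers a transcript's genomic strand from a read's alignment strand. The text does not say which stranded chemistry the kit uses, so the `"forward"`/`"reverse"` labels and the dUTP-style example are assumptions; the function name is hypothetical.

```python
FLIP = {"+": "-", "-": "+"}


def transcript_strand(read_strand: str, read_number: int, library: str) -> str:
    """Infer the genomic strand of the originating transcript from one read.

    read_strand: '+' or '-', the genomic strand the read aligned to
    read_number: 1 or 2 (mate in a paired-end run)
    library: 'reverse' (read 1 antisense to the transcript, e.g. dUTP-style),
             'forward' (read 1 sense to the transcript) or 'unstranded'
    """
    if read_strand not in FLIP or read_number not in (1, 2):
        raise ValueError("read_strand must be '+' or '-', read_number 1 or 2")
    if library == "unstranded":
        raise ValueError("unstranded libraries carry no strand information")
    if library not in ("forward", "reverse"):
        raise ValueError(f"unknown library type: {library}")
    # Read 2 comes from the opposite end of the fragment, so it aligns to the
    # opposite strand from read 1.
    read1_strand = read_strand if read_number == 1 else FLIP[read_strand]
    return read1_strand if library == "forward" else FLIP[read1_strand]


print(transcript_strand("+", 1, "reverse"))  # '-'
print(transcript_strand("+", 2, "reverse"))  # '+'
```

For an unstranded library the function refuses to answer, because reads from the two strands are indistinguishable.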
For lower-input material, typically less than 10 ng, we use the popular SMARTer amplification technology, which is suitable down to the level of a single cell.
In brief, the 3' poly-A tail is used to prime reverse transcription, combined with template switching at the 5' end of the transcript. The resulting first strands carry known sequences at both ends and are then uniformly amplified to increase the amount of double-stranded cDNA for downstream library preparation. With our current SMARTer-inspired workflow, these libraries do not retain strand information. As with our standard poly-A approach, the integrity of the purified transcripts is an important factor in generating high-quality sequencing data. For SMARTer libraries from bulk material (not single cells) we also recommend 25 million raw reads per sample.
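The template-switching idea can be sketched as string manipulation. This is a toy model under stated assumptions: the primer handle, template-switching oligo (TSO) and transcript below are hypothetical sequences written in DNA letters, the TSO is assumed to end in GGG, and exactly three non-templated C's are added by the reverse transcriptase.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def template_switch_first_strand(mrna: str, handle: str, tso: str) -> str:
    """Toy model of oligo(dT)-primed reverse transcription with template switching.

    mrna: transcript 5'->3' in DNA letters, ending in its poly-A tail
    handle: hypothetical 5' extension of the oligo(dT) primer
    tso: hypothetical template-switching oligo, 5'->3', ending in GGG
    Returns the first-strand cDNA, 5'->3'.
    """
    if not tso.endswith("GGG"):
        raise ValueError("this toy model assumes the TSO ends in GGG")
    # 1. Oligo(dT) anneals to the poly-A tail; RT copies the transcript to its 5' end.
    cdna = handle + revcomp(mrna)
    # 2. At the transcript's 5' end, the RT adds non-templated C's ...
    cdna += "CCC"
    # 3. ... which pair with the TSO's 3' GGG; the RT switches template and copies the TSO.
    cdna += revcomp(tso[:-3])
    return cdna


handle = "ACGGATCCAG"     # hypothetical primer handle
tso = "GTCAGCTAGCGGG"     # hypothetical TSO
mrna = "GCCATGGCTTCTAAGTGA" + "A" * 12
first = template_switch_first_strand(mrna, handle, tso)
print(revcomp(first))     # TSO + transcript + reverse complement of the handle
```

Because every full-length first strand is flanked by the same two known sequences, a single primer pair can amplify all of them together, which is what makes the amplification step uniform across transcripts.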
The major classes of small RNA are 21-25 nt in length. Generating good-quality sequencing data for small RNA requires a separate protocol, often performed on the same purified total RNA in parallel with standard poly-A library preparation. It is a common misconception that rRNA depletion will yield sequencing data that includes small RNA transcripts (miRNA and the like). Although these species are carried forward to subsequent steps in the protocol, their size range does not lend itself to efficient reverse transcription, amplification or recovery: library insert sizes (corresponding to the sequenced transcript fragment) for the standard RNA prep average ~210 bp.
Our small RNA workflow begins with ligation of single-stranded adapters to the 3' and 5' ends using T4 RNA ligase, followed by first-strand synthesis and then indexed amplification.
The final libraries are purified using an automated gel separation and size-selection step, and then sequenced with short, single-read runs. Degraded material will affect library success and sequence data quality. This is because ligation efficiency depends on available 5'-phosphate and 3'-hydroxyl groups, which are characteristic of the miRNA processing pathway; however, these end groups are also present on degrading RNA, so degradation fragments can also be ligated into the library. Because the small RNA insert is shorter than the sequencing read, reads run on into the 3' adapter, so all associated adapter sequence must be trimmed after sequencing. This is performed automatically in our standard small RNA data processing pipeline. Because there are no randomly fragmented transcripts, and the number of unique small RNA sequences is limited (highly dependent on organism), the complexity of a small RNA library is low. The sequencing depth required depends on the investigation, but generally 10 million raw reads per sample is sufficient.
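The trimming step and the low-complexity point can be sketched as follows. This is an exact-match illustration, not our pipeline: real trimmers tolerate sequencing errors. The adapter is an assumption (the 3' adapter commonly trimmed from Illumina TruSeq small RNA reads), since the kit is not named here, and the function names are hypothetical.

```python
ADAPTER = "TGGAATTCTCGGGTGCCAAGG"  # assumed 3' adapter; check the kit actually used


def trim_3prime_adapter(read: str, adapter: str = ADAPTER, min_overlap: int = 3) -> str:
    """Remove the 3' adapter from a read using exact matching only.

    Handles a complete adapter inside the read and a partial adapter running
    off the read's 3' end (at least min_overlap bases).
    """
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Longest adapter prefix that the read ends with.
    for k in range(min(len(adapter), len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def unique_fraction(inserts: list[str]) -> float:
    """Share of distinct insert sequences; low values indicate low library complexity."""
    return len(set(inserts)) / len(inserts) if inserts else 0.0


let7a = "TGAGGTAGTAGGTTGTATAGTT"  # 22-nt let-7a miRNA written as DNA
reads = [let7a + ADAPTER + "ATCTCGTATG", let7a + ADAPTER[:5]]
inserts = [trim_3prime_adapter(r) for r in reads]
print(inserts)
print(unique_fraction(inserts))
```

A short `min_overlap` will occasionally clip a genuine 3' end that happens to match the adapter's first bases; that trade-off is why trimmers expose it as a parameter.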
Relative gene expression analysis is based on 'counting' sequencing reads aligned to annotated features of a reference genome, while compensating for differences in transcript length. Good-quality reads can be mapped along the full length of the reference feature and can therefore also provide information on alternative splicing events from the same gene. Another approach to expression analysis is to restrict the sequenced reads to the 3' end of the transcript, so that the 'counting' is based on the equivalent region of each gene.
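One common way to compensate for transcript length is transcripts per million (TPM); the text does not say which normalisation is used, so this is an illustrative choice. The sketch uses annotated lengths for simplicity, whereas many quantification tools use an effective length.

```python
def tpm(counts: dict[str, int], lengths_bp: dict[str, int]) -> dict[str, float]:
    """Transcripts per million: divide counts by length, then rescale to sum to 1e6."""
    rates = {g: counts[g] / (lengths_bp[g] / 1000) for g in counts}  # reads per kb
    total = sum(rates.values())
    if total == 0:
        raise ValueError("no reads assigned to any feature")
    return {g: r / total * 1e6 for g, r in rates.items()}


counts = {"geneA": 500, "geneB": 500}
lengths = {"geneA": 1000, "geneB": 2000}
print(tpm(counts, lengths))
```

Equal counts on a gene twice as long imply half the expression. With 3' libraries each gene contributes reads from a comparable window, which is why such counts are often compared without this length term.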
Our current protocol for 3' libraries uses the poly-A tail as the priming site for first-strand synthesis, with the primer incorporating an adapter, followed by removal of the RNA template. Strand-specific sequencing data is ensured by the adapter sequence carried by the primer for second-strand synthesis. In this way, read 1 of the sequencing reaction corresponds to the 3' region of the mRNA transcript sequence.
The main benefit of restricting library inserts to the 3' region is that much less sequencing depth is needed to achieve the same detection sensitivity as full-length transcript libraries. Focusing on the 3' region also means that some level of degradation can be tolerated (FFPE, for example). For 3' RNA sequencing we recommend 5 million reads per sample; only single-read data is used in the analysis.
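For planning, the per-sample read recommendations quoted throughout this section can be collected into a small budgeting helper. The dictionary keys and the function name are hypothetical, the rRNA-depletion figure is the upper (2x) estimate, and converting reads into lanes would require instrument output figures that are not given here.

```python
# Recommended raw reads per sample as quoted in this section; actual needs depend
# on organism and study design (the poly-A figure refers to the human transcriptome).
RECOMMENDED_READS = {
    "polyA": 25_000_000,
    "rRNA_depleted": 50_000_000,  # up to ~2x the poly-A depth
    "smarter_bulk": 25_000_000,
    "small_rna": 10_000_000,
    "three_prime": 5_000_000,
}


def total_reads_needed(library: str, n_samples: int) -> int:
    """Total raw reads to budget for n_samples of one library type."""
    if library not in RECOMMENDED_READS:
        raise ValueError(f"unknown library type: {library}")
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    return RECOMMENDED_READS[library] * n_samples


print(total_reads_needed("polyA", 24))        # 600000000
print(total_reads_needed("three_prime", 24))  # 120000000
```

The comparison makes the trade-off concrete: the same 24 samples need five times fewer reads as 3' libraries as they do as poly-A libraries.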
Author: Simon Engledow