Data is normally retrieved from the Oxford Genomics Centre via an FTP link provided to you by your project manager. You can open the FTP link and view the folder in a browser, but we recommend a command-line client for downloading data. Please see the common questions section below for more information.
For each lane/index combination, files are named after the ID in our tracking database, e.g.:
WTCHG_123456_123 refers to the sample with index “123” in lane ID “123456”.
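If you script over the delivered files, the lane ID and index can be recovered from each name. Below is a minimal bash sketch. It assumes every file name begins with WTCHG_<lane ID>_<index> and that anything after a further underscore or the first dot (for example a read number, .bam or .fastq.gz) can be ignored. parse_sample_id is a hypothetical name chosen for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse_sample_id FILE_NAME
# Assumes names follow WTCHG_<laneID>_<index>[_<anything>][.<extension>].
parse_sample_id() {
  local name rest lane index
  name=${1##*/}            # drop any leading directory
  name=${name%%.*}         # drop extensions such as .bam or .fastq.gz
  case $name in
    WTCHG_*_*) ;;
    *) echo "unrecognised name: $1" >&2; return 1 ;;
  esac
  rest=${name#WTCHG_}      # e.g. 123456_123 (possibly followed by _<suffix>)
  lane=${rest%%_*}         # text before the first underscore
  rest=${rest#*_}          # text after the first underscore
  index=${rest%%_*}        # up to the next underscore, if any
  echo "lane=$lane index=$index"
}

# Usage:
# parse_sample_id WTCHG_123456_123    # prints: lane=123456 index=123
```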
For each sample there will be a pair of FASTQ files (a single FASTQ for single-read data), a BAM file aligned against your selected genome, and a BAM index (.bai). Note: the FASTQ files contain the complete sequence, i.e. they are not trimmed for adapters etc.
For miRNA and “Small RNA” samples we also deliver a “miRNA” subfolder containing the trimmed FASTQ for Read1, a BAM of the trimmed Read1 mapped with Bowtie2 against your selected genome, and a file of counts of annotated miRNAs for that genome (if available).
For “RNA-seq” (if requested), we will identify samples by their LIMS ID (we cannot rely on the sanity of customer-supplied sample names for file naming – sorry!) and deliver, within REX/, a folder of alignments for each sample (after trimming and merging if necessary). A README is included to help you map LIMS IDs to sample names.
We like “wget”. In its simplest form:
wget -r URL downloads everything (-r: recursive) and reports any issues.
If you don’t want to save the BAMs (or their .bai indexes), exclude them with a reject pattern:
wget -r -R "*bam*" URL
If you’re suffering from broken network connections and/or have to retry, use -c (continue) to pick up where you left off.
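Use timestamping (-N) to keep the same date/time on your files as on the originals; on a rerun, wget then only fetches files that are newer on the server or whose size differs.
So, to retrieve *just* the FASTQ (plus small files such as md5sum.txt), use wget -Nc -r -R "*bam*" URL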
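On an unreliable connection it can help to rerun that command automatically. wget already retries within a single run; -c matters when you start it again. Below is a minimal bash sketch that reruns the same command until wget exits with status 0. The helper name fetch_with_retries, the default of 5 attempts and the 30-second pause are all illustrative assumptions, not recommended settings.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch_with_retries URL [MAX_ATTEMPTS]
# Reruns the wget command above until it succeeds or MAX_ATTEMPTS
# (default 5) is reached. Because -c is set, each rerun resumes partial
# files instead of downloading them again.
fetch_with_retries() {
  local url="$1" max="${2:-5}" attempt=1
  while [ "$attempt" -le "$max" ]; do
    # -r recursive, -N keep the server's timestamps, -c continue partial
    # files, -R "*bam*" skip anything with "bam" in its name (.bam, .bam.bai)
    if wget -r -N -c -R "*bam*" "$url"; then
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      echo "attempt $attempt of $max failed; retrying in 30 s" >&2
      sleep 30
    fi
    attempt=$((attempt + 1))
  done
  echo "giving up after $max attempts" >&2
  return 1
}

# Usage, with URL set to the FTP link from your project manager:
# fetch_with_retries "$URL" 10
```

wget returns a non-zero exit status if any part of the transfer fails, including a single missing file. In that case the loop simply reruns the command, and -N and -c keep the rerun cheap.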
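To verify your download, keep the md5sum.txt file and run md5sum -c md5sum.txt from the directory holding it (md5sum resolves the listed paths relative to your current directory).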
You can do clever things with the bash shell to restrict what you check (and avoid complaints about missing files, such as BAMs you chose not to download):
md5sum -c <(grep fastq md5sum.txt)
Look up “bash process substitution” for more details, e.g. http://tldp.org/LDP/abs/html/process-sub.html
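The same check can be wrapped into a small function that prints a one-line verdict. Below is a minimal sketch assuming bash (for process substitution) and GNU md5sum (for --quiet). verify_fastq is a hypothetical name, and matching on the substring “fastq” mirrors the grep above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify_fastq [CHECKSUM_FILE]
# Checks only the FASTQ entries in the checksum file (default md5sum.txt),
# so BAMs you chose not to download are not reported as missing.
verify_fastq() {
  local sums="${1:-md5sum.txt}"
  # <(...) presents grep's output to md5sum as if it were a file;
  # --quiet hides the per-file "OK" lines but still prints failures.
  if md5sum --quiet -c <(grep fastq "$sums"); then
    echo "all FASTQ checksums OK"
  else
    echo "FASTQ checksum failed or file missing" >&2
    return 1
  fi
}

# Usage, from the directory holding md5sum.txt:
# verify_fastq md5sum.txt
```

With GNU coreutils 8.25 or later, md5sum -c --ignore-missing md5sum.txt is a simpler alternative. It skips any listed file that is not present, but then it cannot warn you that a FASTQ you expected is missing.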
Author: John Broxholme