Data sharing
Globus
We use the Globus network to share data with external collaborators. It allows us to share data, e.g. from a specific folder on the Sanger LFS cluster, directly with the outside world.
The sharing process consists of the following steps:
We share the data with the user's personal or work email address.
The user creates, or logs in to, their Globus account using the email address the data was shared with.
The user creates a personal Globus endpoint on their Linux laptop or compute cluster, their Mac laptop, or their Windows laptop.
The user activates their personal Globus endpoint by starting globus from the command line if on a cluster/Linux, or by starting the globus application if on Mac/Windows.
Once the user's personal endpoint is set up, they can transfer the data by logging in to their Globus account with the sharing email address and dragging and dropping the data.
For more information please visit the official Globus documentation.
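On Linux or a cluster, activating the endpoint from the command line might look like the sketch below. It assumes Globus Connect Personal has already been downloaded, unpacked and set up. `GCP_DIR` and `start_globus_endpoint` are hypothetical names used only for illustration, and the install path will differ on your machine.

```shell
# Assumption: Globus Connect Personal is unpacked in GCP_DIR (hypothetical path)
# and has already been set up once for your Globus account.
GCP_DIR="${GCP_DIR:-$HOME/globusconnectpersonal}"

start_globus_endpoint() {
    # Fail early if the launcher is not where we expect it.
    if [ ! -x "$GCP_DIR/globusconnectpersonal" ]; then
        echo "globusconnectpersonal not found in $GCP_DIR" >&2
        return 1
    fi
    # Start the endpoint in the background, wait briefly, then report its status.
    "$GCP_DIR/globusconnectpersonal" -start &
    sleep 5
    "$GCP_DIR/globusconnectpersonal" -status
}
```

Calling `start_globus_endpoint` should leave the endpoint running in the background, ready for the transfer from the Globus web interface.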
Note
If the user would like to verify the MD5 checksums, the MD5 sum files are located in the same shared folder as the data files.
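As a minimal sketch, the downloaded files could be checked as follows. It assumes the MD5 sum files use the standard `md5sum` format (hash, two spaces, file name). If they contain only the bare hash, compare it against the output of `md5sum` by hand instead. `verify_md5` and the file names are hypothetical, and on a Mac the `md5sum` command may need to be installed separately.

```shell
# Verify data files against a checksum file in `md5sum` format.
# Run from the directory that holds the downloaded data files.
verify_md5() {
    # $1: path to the checksum file (hypothetical name, e.g. samplename.cram.md5)
    if md5sum -c "$1"; then
        echo "checksums OK"
    else
        echo "checksum mismatch or missing file" >&2
        return 1
    fi
}

# Example usage (hypothetical paths):
#   cd /path/to/downloaded/data && verify_md5 samplename.cram.md5
```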
CRAM files
Sanger's default file format for storing NGS data is CRAM, and this is what we provide when we share data with users. Typically CRAM achieves a 40-50% space saving over the alternative BAM format, and much more than that over compressed fastq files. For more information please visit this page.
Once the user has obtained the data from Globus, it can be converted from CRAM to fastq format using the following steps:
Install samtools version 1.8 or later (in this case samtools should automatically download the right genome reference if your local installation does not have it).
Run the following commands, setting NCPU to the number of CPUs to use if you are on a multi-CPU machine. This will create the paired fastq files samplename_1.fastq.gz and samplename_2.fastq.gz:
samtools collate -O -u -@ NCPU samplename.cram tmppfx | \
samtools fastq -N -F 0x900 -@ NCPU -1 samplename_1.fastq.gz -2 samplename_2.fastq.gz -
If this does not work, try unsetting the following environment variables first, so that samtools uses its default reference lookup:
unset REF_PATH
unset REF_CACHE
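The conversion steps above can be wrapped in a small helper, sketched below under some assumptions: `cram_to_fastq` and `version_ge` are hypothetical names, the version check relies on `sort -V` (GNU coreutils), and the output files are written next to the input CRAM.

```shell
# Return success if dotted version $1 is >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Convert one CRAM file to paired, gzipped fastq files.
# $1: input CRAM file; $2: number of CPUs to use (defaults to 1).
cram_to_fastq() {
    cram="$1"
    ncpu="${2:-1}"
    sample="${cram%.cram}"

    # The first line of `samtools --version` looks like "samtools 1.17".
    version="$(samtools --version | head -n 1 | awk '{print $2}')"
    if ! version_ge "$version" "1.8"; then
        echo "samtools >= 1.8 required, found $version" >&2
        return 1
    fi

    # Same pipeline as above, with a per-sample temporary file prefix.
    samtools collate -O -u -@ "$ncpu" "$cram" "${sample}.tmppfx" | \
        samtools fastq -N -F 0x900 -@ "$ncpu" \
            -1 "${sample}_1.fastq.gz" -2 "${sample}_2.fastq.gz" -
}
```

For example, `cram_to_fastq samplename.cram 4` would use 4 CPUs. On Linux, `"$(nproc)"` can be passed as the second argument to use all available CPUs.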