Jupyter Hub

We support a Jupyter Hub server running on Sanger Cloud. Jupyter allows you to run your analysis in multiple environments (R, Python, Julia, etc.) and also to create and share notebooks containing your analysis, code, equations and visualizations. We think this is an ideal environment for any kind of downstream analysis. For more details please refer to the Jupyter Hub documentation.

How to get access

Our Jupyter Hub service is available in a web browser on any computer anywhere in the world. You will need to provide us with your GitHub ID to be able to log in. Once we notify you that your account is created, you can log in using your GitHub credentials.

Note

Only Sanger employees and their collaborators are eligible for access. To get access email cellgeni [at] sanger.ac.uk and provide your GitHub username and Sanger Faculty.

Internal JupyterHub

Central IT now has its own JupyterHub that runs directly on the FARM. It requires a VPN connection to access and is available here.

Resources

Our cluster allows single-user Jupyter instances to be spawned with up to 400GB RAM and 30 CPUs, although this depends on the cluster’s load at the time. By default we provide users with 20GB of RAM and a dynamic limit of 10 CPUs. We guarantee each Jupyter instance at least 1 core, and it can use up to 10 if needed without changing its resource configuration.

If you require more resources, you’ll have to Restart your instance and request more. Un-check the default resources and input what you need; if there are enough resources on the cluster, they’ll be reserved for your Jupyter.

Jupyter’s storage is 100GB. Try not to keep big files on Jupyter; instead, read them from the farm. Only /home/jovyan is persistent: anything outside your home folder will be lost when your session is terminated.

Please have in mind the following considerations:

  • Only input custom CPU resources if you require more than 10 CPUs, or if you’re performing a benchmark and need exactly a specific number of CPUs.

  • After you’ve finished your analysis with custom resources, please Restart your instance and go back to the defaults to free those resources and allow other users to access them.

  • If you’re only going to be checking on your farm jobs, please use a custom 8GB RAM (the default CPU is fine for this).

  • Jupyter’s cull time is 24 hours: if you have not accessed your Jupyter instance at least once in the past 24 hours, your session will be terminated.

We provide open usage metrics of our Jupyter cluster via a Grafana dashboard.

Quick Start Guide

The JupyterHub website is public, so you don’t need to turn on the VPN to use it. However, it is only available to users who have messaged us their GitHub usernames and have been whitelisted.

  1. In your browser go to https://jhub.cellgeni.sanger.ac.uk

  2. Use your GitHub credentials for authentication. It may take some time to load the first time.

  3. Select the number of CPUs, the amount of RAM and the image you would like to spawn your instance with (or use the defaults).

  4. Now you are ready to run your notebooks!

  5. RStudio is also available on JupyterHub. A new R session can be started from the Launcher, or by changing the word lab in your address bar to the word rstudio: https://jhub.cellgeni.sanger.ac.uk/user/<your-username>/rstudio

  6. You can switch to the classic Jupyter interface by changing the word lab in your address bar to the word tree: https://jhub.cellgeni.sanger.ac.uk/user/<your-username>/tree
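Since the URLs in steps 5 and 6 differ only in the last path segment, you can derive them from your lab address. A small sketch (the username below is a placeholder):

```shell
# Derive the rstudio and tree URLs from the lab URL (placeholder username)
LAB_URL="https://jhub.cellgeni.sanger.ac.uk/user/myuser/lab"
echo "${LAB_URL%/lab}/rstudio"   # RStudio interface
echo "${LAB_URL%/lab}/tree"      # classic Jupyter interface
```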

Warning

The JupyterHub environment and storage are not backed up. Please use them only for computations, and download your results (and notebooks) afterwards. You’ve been warned!

Warning

Keep your notebooks light. Notebooks over 100MB will give you unexpected errors.
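To spot notebooks approaching that limit, you can run a quick check from a terminal. A sketch; the directory defaults to your home folder:

```shell
# List notebooks larger than 100MB under a directory (defaults to your home)
DIR="${1:-$HOME}"
find "$DIR" -name '*.ipynb' -size +100M -exec ls -lh {} \;
```

Clearing large cell outputs before saving is usually the quickest way to slim a notebook down.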

Notebook templates

We provide some notebook templates with pre-installed software. These are located in the notebooks folder in Jupyter and in our Notebook GitHub repository. Corresponding example data is located in the data folder.

We recommend that before running your analysis, you make a copy of a notebook template, save it to your home folder and work with the copy.

Read more about our notebooks in the Notebook section.

Installing packages

Conda

The default conda environment is not persistent across Jupyter sessions: you can install additional packages, but they will not be there next time you start Jupyter. To have a persistent conda environment, create one inside the /home/jovyan/ folder (if you’ve already got a conda environment activated, jump to step 4):

  1. Open a new terminal (click on the Terminal icon in the Launcher)

  2. Create the environment and activate it (replace myenv with your environment name):

conda create --name myenv python=3.8
conda activate myenv
  3. Install the ipython kernel to use as a Python kernel inside your Jupyter environment. --display-name is optional; if not provided, the conda environment name will be used:

python -m ipykernel install --user --name myenv --display-name "Python (MyEnv)"
  4. Install all the packages you need, for example:

conda install numpy pandas matplotlib scipy scikit-learn
  5. Reload the main page. Now you will see your new environment in the Launcher. If you don’t see it at first, try restarting your instance.

Alternative

Instead of creating a new environment, you can also clone an existing one; this eliminates the need to reinstall shared packages:

conda create --clone old_name --name new_name

pip

pip installs Python packages to a system directory by default. To make sure your packages persist, install them in your home directory using the --user option, or install them inside an active conda environment.
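For instance, you can check where --user installs go; the path sits under /home/jovyan, which is why those packages persist (the package name in the comment is just an example):

```shell
# Print the per-user site-packages directory that `pip install --user` targets
python3 -c 'import site; print(site.getusersitepackages())'

# Packages installed there persist across restarts, e.g.:
# pip install --user scanpy
```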

R

Packages can be installed with the install.packages() function in an RStudio console:

install.packages("packageName")

or multiple packages at once:

install.packages(c("packageOne", "packageTwo", "packageThree"))

From a terminal, Rscript can be used to install packages (don’t install packages as sudo):

Rscript -e 'install.packages("packageName")'

Warning

Try not to mix conda r-* packages with R CRAN packages. For example, if you’ve installed your own R using conda (e.g. conda install r-recommended r-irkernel), install packages with conda (conda install r-hdf5r) instead of install.packages("hdf5r").

Kernels

Kernels are programming-language-specific processes that run independently and interact with Jupyter and its user interfaces. Kernels can be changed using the Kernel > Change Kernel menu.

Python Kernel

The kernel list is located outside your home directory, so it can be reset. If that happens, run this one-line command from your terminal to add every conda environment in your profile back to the kernel list.

pip install -U ipykernel; ENVS=$(conda info --envs | grep '^\w' | cut -d' ' -f1); for env in $ENVS; do source activate $env; python -m ipykernel install --user --name $env; echo "$env"; conda deactivate; done

R Kernel

If you want to run R code straight from JupyterLab without using RStudio, you can use the R kernel. If you don’t see it in the kernel list, you need to install the IRkernel package. Install the package and the spec:

install.packages('IRkernel')
IRkernel::installspec()

Managing your data

Note

Any data outside /home/jovyan will be lost when the environment is restarted. Make sure you keep the files you don’t want to lose somewhere inside the home folder.

Upload using GUI

You can copy files to and from Jupyter directly in the web interface (use the Upload button in the File Browser, and right-click > Download on a file).

Copying data to/from other hosts

You can also copy data to/from other hosts, like the farm, using a terminal (click on the Terminal icon in the Launcher).

Using rsync

Copy from the farm to the local environment:

rsync -avzh USER@farm5-login:/nfs/users/nfs_u/USER/<some-path>/ farm/

Copy from the local environment to the farm:

rsync -avzh <some-path> USER@farm5-login:/nfs/users/nfs_u/USER/

Using scp

Copy from the farm to the local environment:

scp -r USER@farm5-login:/nfs/users/nfs_u/USER/<some-path>/ farm/

Copy from the local environment to the farm:

scp -r farm/ USER@farm5-login:/nfs/users/nfs_u/USER/<some-path>/

Mounting the farm on jupyter (sshfs)

To mount the farm’s base paths (/nfs, /lustre and /warehouse) on your Jupyter instance:

  1. Open a new terminal on your Jupyter.

  2. Type mount-farm, then press Enter.

  3. When prompted, input your username and password.

The three folders will be mounted on the root folder of your instance. Try opening a new terminal and changing directory to your farm home (cd /nfs/users/nfs_u/USER) or your team’s lustre area (cd /lustre/scratch11X/team999), then type ls to see the files. You can use the same paths in your notebooks.

Note

You will not see these folders in Jupyter’s File Browser because it only shows /home/jovyan. If you really want to see them in your File Browser, you need to create symlinks from the mounted folders to your home folder. For example: ln -s /nfs /home/jovyan/nfs

Warning

Mounting folders with many files/folders inside them may affect Jupyter’s performance. We recommend only linking particular folders and not the whole mount point.
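As a sketch of that recommendation, link one specific project folder rather than the mount root (the paths below are illustrative):

```shell
# Stand-in for one specific folder on the mounted farm storage
mkdir -p /tmp/farm-demo/myproject

# Link just that folder into your home so it shows up in the File Browser
ln -sfn /tmp/farm-demo/myproject "$HOME/myproject"
ls -l "$HOME/myproject"
```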

Mounting other NFS storages

  1. Create a folder to mount the share into: mkdir -p /home/jovyan/shared

  2. Create credentials file /jovyan/.nfs-credentials:

username=YOUR_USER
password=YOUR_PASSWORD
domain=sanger
  3. Mount the storage:

sudo mount.cifs //network/path/to/share/ /home/jovyan/shared -o rw,file_mode=0777,dir_mode=0777,credentials=/jovyan/.nfs-credentials

Downloading data

By default, JupyterHub does not provide a way to download folders, but you can create an archive:

tar czvf <some-archive-name.tar.gz> <target-directory>/

and download the resulting file with the right click Download option.
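A minimal end-to-end sketch, using an example results directory:

```shell
# Bundle a folder into one compressed archive you can download in one click
mkdir -p results
echo "demo" > results/output.txt
tar czvf results.tar.gz results/

# List the archive contents to check everything made it in
tar tzf results.tar.gz
```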

Exporting notebooks

Export as PDF

To export a notebook as PDF, install the following pre-requisite software:

sudo apt-get update && sudo apt-get install -y texlive-generic-recommended

Now you can export a notebook through the File > Export Notebook As... menu.

Knit to PDF

To export an R notebook as PDF, install the following pre-requisite software:

wget -qO- "https://yihui.org/gh/tinytex/tools/install-unx.sh" | bash

If that is not enough, the easiest way is to install the whole texlive package; the downside is that it is 4.5GB:

sudo apt update && sudo apt-get install -y texlive-full

Sharing notebooks

  1. Go to your API Tokens page or go to hub/home and then click “Token” on the top menu.

  2. Type in a note like “Shared with collaborator X”

  3. Click the orange button “Request new API token”

  4. Copy the token that shows up under “Your new API Token” (e.g. ba5eba11b01dfaceca55e77ecacaca11).

  5. Go to your jupyter instance, but using the “tree” view instead of the “lab” view: https://jhub.cellgeni.sanger.ac.uk/user/<your username>/tree

  6. Find your notebook and open it. You should be on a link that looks like: https://jhub.cellgeni.sanger.ac.uk/user/<your username>/notebooks/some_notebook.ipynb

  7. Add ?token=<your API token> to the end of the link and copy it (e.g. ?token=ba5eba11b01dfaceca55e77ecacaca11).

  8. Share what you have copied. It should be something like: https://jhub.cellgeni.sanger.ac.uk/user/<your username>/notebooks/some_notebook.ipynb?token=<your API token>

  9. Once you have finished the collaboration, go to your API Tokens page and click “Revoke” to delete that access token.
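Putting steps 5 to 8 together, the share link is just your notebook URL with the token appended as a query parameter. A sketch with placeholder values:

```shell
# Placeholder values: substitute your own username, notebook and API token
JHUB_USER="your-github-username"
NOTEBOOK="some_notebook.ipynb"
API_TOKEN="ba5eba11b01dfaceca55e77ecacaca11"

echo "https://jhub.cellgeni.sanger.ac.uk/user/${JHUB_USER}/notebooks/${NOTEBOOK}?token=${API_TOKEN}"
```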

iRODS

iRODS support is provided via a wrapper script and a Singularity image already copied to your home profile. Before you start using iRODS, you’ll need to copy your environment file from the farm to your Jupyter. Open a Terminal and follow these steps:

  1. Use mount-farm and input your credentials when prompted.

  2. Copy irods_environment.json from your home directory on the farm to your Jupyter instance:

cp /nfs/users/nfs_u/USER/.irods/* ~/.irods/
  3. Run irods iinit; it will ask for your PAM password (your Sanger password, the same one you use for the farm).

  4. Run all available icommands using irods <icommand_name>. For example: irods ils or irods ihelp.

Note

Did “irods iinit” also ask for an iRODS password? Go to the farm and type head -1 ~/.irods/irods_password; the output is your password.

Warning

These instructions assume you already have an iRODS account set up on the farm. If you don’t, please contact ServiceDesk.

Running containers

The Jupyter environment includes Singularity, a container platform that allows creating and running tools in a portable and reproducible way. You can build a container using Singularity on your Jupyter instance, and then run it on the farm. Your container is a single file, and you don’t have to worry about how to install all the software you need on each different operating system. Read more about building and running Singularity containers in the official docs.

Building containers

You can build Singularity recipes in Jupyter, but for a better and more portable experience, we recommend using Dockerfiles instead. A Dockerfile is a script (a text document) that contains all the commands a user could call on the command line to assemble an image. Docker images are widely available and adopted as best practices everywhere. You can host your Docker images in Docker repositories like https://hub.docker.com or https://quay.io

However, the downside is that Docker requires sudo permissions to execute and interact with the Docker daemon that builds the images and starts/stops the containers. We’ve set up a service that allows you to log in with your Sanger credentials and enables you to build Docker images that can then be turned into Singularity images and copied over to the FARM to do your work.

The basic flow of commands looks like this:

# connect to the server
ssh USER@docker.cellgeni.sanger.ac.uk

# create a project folder for you to work on
mkdir myproject
cd myproject

# create an appropriate Dockerfile
# and then build it into an image
docker build --tag image:tag .

# once you've got the image locally on the docker daemon
# convert it to singularity
singularity build image_name.sif docker-daemon://image:tag

# copy the image off the machine into the farm
scp image_name.sif USER@farm5.internal.sanger.ac.uk:/path/on/the/farm

Troubleshooting

Restart your instance

Sometimes, a server restart might solve an issue. For that:

  1. Go to the menu “File” > “Hub Control Panel” or browse to your Hub Home

  2. Click Stop My Server

  3. Wait 2 minutes and reload the page.

  4. Access https://jhub.cellgeni.sanger.ac.uk/ to get your instance up and running again.

Check storage usage

  • Check your disk usage from a terminal using df -h /home/jovyan/ or du -ha -d 1 ~

  • Find large files in your instance. Check files larger than 1GB from a terminal using: find /home/jovyan -size +1G -ls.

  • Get the usage of folders under your home directory from a terminal: du -h --max-depth=1 /home/jovyan/

RStudio errors

  • If you get a [Errno 111] Connection refused error, try restarting the server.

  • If Rsession did not start in time, or an Error 500 or Error 504 does not let you load RStudio: go to the lab interface, start a terminal, delete the last R session, and then reload RStudio:

rm -rf ~/.rstudio/sessions                  # remove all sessions
rm -rf ~/.RData                             # remove stored session data
pkill -f rstudio-server/bin/rsession        # kill the old RStudio session process;
                                            # it will be re-created once you visit the user/rstudio URL
  • If you get a Could not start RStudio in time error, it might be because you ran out of disk space. Delete some files, move them to the farm, or request more storage.

How to get help

For any Jupyter Hub related questions please use our Slack channel. There are lots of users there who can quickly answer your questions.