Jupyter Hub
===========

We support a Jupyter Hub server running on Sanger Cloud. Jupyter allows you to run your analysis in multiple environments (``R``, ``python``, ``Julia``, etc.) and also to create and share notebooks containing your analysis, code, equations and visualizations. We think this is an ideal environment for any kind of downstream analysis. For more details please refer to `Jupyter Hub documentation `_.

How to get access
-----------------

Our Jupyter Hub service is available in a web browser on any computer anywhere in the world. You will need to provide us with your GitHub ID to be able to log in. Once we notify you that your account has been created, you can log in using your GitHub credentials.

.. note::
   **Only Sanger employees and their collaborators are eligible for access**. To get access email ``cellgeni [at] sanger.ac.uk`` and provide your GitHub username and Sanger Faculty.

Internal JupyterHub
-------------------

Central IT now have their own JupyterHub that runs directly on the FARM. It requires a VPN connection to access. It is available `here `_.

Resources
---------

Our cluster allows single-user Jupyter instances to be spawned with up to 400GB RAM and 30 CPUs, depending on the cluster's load at the time. By default we provide users with 20GB of RAM and a dynamic limit of 10 CPUs: each Jupyter instance is guaranteed at least 1 core and can use up to 10 when needed, without any change to its resource configuration.

If you require more resources, you'll have to `Restart your instance `_ and request more. Un-check the default resources and enter what you need; if there are enough resources on the cluster, they'll be reserved for your Jupyter instance.

Jupyter's storage is 100GB. Try not to keep big files on Jupyter; read them from the farm instead. Only ``/home/jovyan`` is persistent: anything outside your home folder will be lost when your session is terminated.

Please keep in mind the following considerations:

- Only enter custom CPU resources if you require more than 10 CPUs or if you're running a benchmark that needs an exact number of CPUs.
- After you've finished your analysis with custom resources, please `Restart your instance `_ and go back to the defaults to free those resources and allow other users to access them.
- If you're only going to be checking on your farm jobs, please use custom 8GB RAM (*default CPU is just fine for this*).
- Jupyter's cull time is 24 hours: if you have not accessed your Jupyter instance at least once in the past 24 hours, your session will be terminated.

We provide open usage metrics of our Jupyter cluster using `Grafana Dashboard `_.

Quick Start Guide
-----------------

The JupyterHub website is public, so you don't need to turn on VPN to use it. However, it is only available to users who have sent us their GitHub usernames and have been whitelisted.

#. In your browser go to https://jhub.cellgeni.sanger.ac.uk
#. Use your GitHub credentials for authentication. It may take some time to load the first time.
#. Select the number of CPUs, the amount of RAM and the image you would like to spawn your instance with *(or use the defaults)*.
#. Now you are ready to run your notebooks!
#. **RStudio** is also available on JupyterHub. A new R session can be started from the Launcher or by changing the word ``lab`` in your address bar to ``rstudio``: ``https://jhub.cellgeni.sanger.ac.uk/user//rstudio``
#. You can switch to the classic Jupyter interface by changing the word ``lab`` in your address bar to ``tree``: ``https://jhub.cellgeni.sanger.ac.uk/user//tree``

.. warning::
   **JupyterHub environment and storage are not backed up**. Please only use for computations and download your results (and notebooks) afterwards. You've been warned!

.. warning::
   **Keep your notebooks light**. Notebooks over 100MB *will* give you unexpected errors.
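
If you are not sure whether any of your notebooks has grown past that limit, a small shell function along these lines can list the offenders. This is a minimal sketch rather than part of the JupyterHub setup: the function name ``large_notebooks`` and its two optional arguments are hypothetical, and it assumes GNU ``find`` (for the ``M`` size suffix), as found on typical Linux images.

.. code-block:: bash

   # Hypothetical helper (not provided by JupyterHub): print every notebook
   # under a folder that is larger than a given size.
   #   $1  folder to search      (default: /home/jovyan)
   #   $2  size in find syntax   (default: 100M)
   large_notebooks() {
       local dir="${1:-/home/jovyan}"
       local size="${2:-100M}"
       # Skip the automatic checkpoint copies that Jupyter keeps next to notebooks
       find "$dir" -type f -name '*.ipynb' \
           ! -path '*/.ipynb_checkpoints/*' \
           -size "+$size" -print
   }

Calling ``large_notebooks`` with no arguments searches the home folder, while passing a folder and a size (for example ``large_notebooks /home/jovyan/notebooks 50M``) narrows the search and applies a stricter threshold.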

Notebook templates
------------------

We provide some notebook templates with the pre-installed software. These are located in the ``notebooks`` folder in Jupyter or in our `Notebook GitHub repository `_. Corresponding example data is located in the ``data`` folder. We recommend that before running your analysis, you make a copy of a notebook template, save it to your home folder and work with the copy.

Read more about our notebooks in the `Notebook section `_

Installing packages
-------------------

JHub Conda
^^^^^^^^^^

The default conda environment is not persistent across Jupyter sessions: you can install additional packages, but they will not be there next time you start Jupyter. To have a persistent conda environment, create one inside the ``/home/jovyan/`` folder *(if you've already got a conda environment activated, jump to step 4)*:

1. Open a new terminal (click on the ``Terminal`` icon in the Launcher)
2. Create the environment and activate it (replace ``myenv`` with your environment name):

   .. code-block:: bash

      conda create --name myenv python=3.8
      conda activate myenv

3. Install the IPython kernel spec (``ipykernel``) so the environment can be used as a Python kernel inside Jupyter. ``--display-name`` is optional; if it is not provided, the conda environment name will be used:

   .. code-block:: bash

      python -m ipykernel install --user --name myenv --display-name "Python (MyEnv)"

4. Install all the packages you need, for example:

   .. code-block:: bash

      conda install numpy pandas matplotlib scipy scikit-learn

5. Reload the main page. Now you will see your new environment in the Launcher. If you don't see it at first, try restarting your instance.

**Alternative**

Instead of creating a new environment, you can also clone an existing one; this removes the need to reinstall the same packages:

.. code-block:: bash

   conda create --clone old_name --name new_name

pip
^^^

``pip`` defaults to installing Python packages to a system directory. To make sure your packages persist, install them in your home directory with the ``--user`` option or **install them inside an active conda environment**.

R
^^^

Packages can be installed with the ``install.packages()`` function in an RStudio console:

.. code-block:: r

   install.packages("packageName")

or multiple packages at once:

.. code-block:: r

   install.packages(c("packageOne", "packageTwo", "packageThree"))

From a terminal, ``Rscript`` can be used to install packages **(don't install packages as sudo)**:

.. code-block:: bash

   Rscript -e 'install.packages("packageName")'

.. warning::
   **Try not to mix conda r-* packages with R CRAN packages**. For example, if you've installed your own R using conda like this ``conda install r-recommended r-irkernel``, install packages using conda ``conda install r-hdf5r`` instead of ``install.packages("hdf5r")``.

Kernels
-------

Kernels are programming-language-specific processes that run independently and interact with Jupyter and its user interfaces. Kernels can be changed using the ``Kernel`` > ``Change kernel`` menu.

Python Kernel
^^^^^^^^^^^^^

If the kernel list is stored outside your home directory, it can be reset. If that happens, run this one-line command from your terminal to add **every conda environment** on your profile to the kernel list.

.. code-block:: bash

   pip install -U ipykernel; ENVS=$(conda info --envs | grep '^\w' | cut -d' ' -f1); for env in $ENVS; do source activate $env; python -m ipykernel install --user --name $env; echo "$env"; conda deactivate; done

R Kernel
^^^^^^^^^

If you want to run R code straight from JupyterLab without using RStudio, you can use the ``R`` kernel. If you don't see it in the kernel list, you need to install the ``IRkernel`` package. Install the package and the spec:

.. code-block:: r

   install.packages('IRkernel')
   IRkernel::installspec()

Managing your data
------------------

.. note::
   Any data outside ``/home/jovyan`` will be lost when the environment is restarted. Make sure you keep the files you don't want to lose somewhere inside the home folder.

Upload using GUI
^^^^^^^^^^^^^^^^

You can copy files to and from Jupyter directly in a web interface (Menu and a button in the interface).

Copying data to/from other hosts
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can also copy data to/from other hosts, like the farm, using a terminal (click on the ``Terminal`` icon in the Launcher).

**Using rsync**

Copy from the farm to the local environment:

.. code-block:: bash

   rsync -avzh USER@farm5-login:/nfs/users/nfs_u/USER// farm/

Copy from the local environment to the farm:

.. code-block:: bash

   rsync -avzh USER@farm5-login:/nfs/users/nfs_u/USER/

**Using scp**

Copy from the farm to the local environment:

.. code-block:: bash

   scp -r USER@farm5-login:/nfs/users/nfs_u/USER// farm/

Copy from the local environment to the farm:

.. code-block:: bash

   scp -r farm/ USER@farm5-login:/nfs/users/nfs_u/USER//

Mounting the farm on jupyter (sshfs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To mount the farm's base paths (``/nfs``, ``/lustre`` and ``/warehouse``) on your Jupyter instance:

#. Open a new terminal on your Jupyter.
#. Type ``mount-farm``, then press Enter.
#. Enter your username and password when prompted.

The three folders will be mounted on the root folder of your instance. Try opening a new terminal and changing directory to your farm home (``cd /nfs/users/nfs_u/USER``) or your team's lustre (``cd /lustre/scratch11X/team999``), then type ``ls`` to see the files. You can use the same paths in your notebooks.

.. note:: You will not see these folders in Jupyter's File Browser because it only shows ``/home/jovyan``. If you really want to see them in your File Browser, you need to **create symlinks** from the mounted folders to your home folder.
   For example: ``ln -s /nfs /home/jovyan/nfs``

.. warning::
   Mounting folders with many files/folders inside them may affect Jupyter. We recommend linking only particular folders rather than the whole mount point.

Mounting other NFS storages
^^^^^^^^^^^^^^^^^^^^^^^^^^^

1. Create a folder where the share will be mounted: ``mkdir -p /home/jovyan/shared``
2. Create credentials file ``/jovyan/.nfs-credentials``:

   .. code-block:: bash

      username=YOUR_USER
      password=YOUR_PASSWORD
      domain=sanger

3. Mount the storage:

   .. code-block:: bash

      sudo mount.cifs //network/path/to/share/ /home/jovyan/shared -o rw,file_mode=0777,dir_mode=0777,credentials=/jovyan/.nfs-credentials

Downloading data
^^^^^^^^^^^^^^^^

By default, JupyterHub does not provide a way to download folders, but you can create an archive:

.. code-block:: bash

   tar cvfz /

and download the resulting file with the right-click ``Download`` option.

Exporting notebooks
^^^^^^^^^^^^^^^^^^^

Export as PDF
"""""""""""""

To export a notebook as PDF, install the following prerequisite software:

.. code-block:: bash

   sudo apt update && sudo apt-get install -y texlive-generic-recommended

Now you can export a notebook through the ``File`` > ``Export notebook as...`` menu.

Knit to PDF
"""""""""""

To export an R notebook as PDF, install the following prerequisite software:

.. code-block:: bash

   wget -qO- "https://yihui.org/gh/tinytex/tools/install-unx.sh" | bash

If that is not enough, the easiest way is to install the whole texlive package; the downside is that it is **4.5GB**:

.. code-block:: bash

   sudo apt update && sudo apt-get install -y texlive-full

Sharing notebooks
-----------------

#. Go to your `API Tokens page `__ or go to `hub/home `__ and then click **"Token"** on the top menu.
#. Type in a note like **"Shared with collaborator X"**
#. Click the orange button **"Request new API token"**
#. Copy the token that shows up under **"Your new API Token"** (e.g. ``ba5eba11b01dfaceca55e77ecacaca11``).
#. Go to your Jupyter instance, but using the **"tree"** view instead of the "lab" view: ``https://jhub.cellgeni.sanger.ac.uk/user//tree``
#. Find your notebook and open it. You should be on a link that looks like: ``https://jhub.cellgeni.sanger.ac.uk/user//notebooks/some_notebook.ipynb``
#. Add this to the end of the link: ``?token=`` and copy that link (e.g. ``?token=ba5eba11b01dfaceca55e77ecacaca11``).
#. Share what you have copied. It should be something like: ``https://jhub.cellgeni.sanger.ac.uk/user//notebooks/some_notebook.ipynb?token=``
#. Once you have finished collaborating, go to your `API Tokens page `_ and click **"Revoke"** to delete that access token.

iRODS
-----

iRODS support is provided using a wrapper script and a Singularity image already copied to your home profile. Before you start using iRODS, you'll need to copy your environment file from the farm to your Jupyter instance. Open a terminal and follow these steps:

1. Use ``mount-farm`` and enter your credentials when prompted.
2. Copy ``irods_environment.json`` from your home directory on the farm to your Jupyter instance:

   .. code-block:: bash

      cp /nfs/users/nfs_u/USER/.irods/* ~/.irods/

3. Run ``irods iinit``; it will ask for your PAM password *(Sanger password, same as the one you use for the farm).*
4. Run all `icommands available `__ using ``irods ``. For example: ``irods ils`` or ``irods ihelp``.

.. note::
   **"irods iinit" also asked for iRODS password?** Go to the farm and type: ``head -1 ~/.irods/irods_password``, the output is your password.

.. warning::
   These instructions assume you already have an iRODS account set up on the farm; if you don't, please contact ServiceDesk.

Running containers
------------------

The Jupyter environment includes **Singularity**, a container platform that allows creating and running tools in a portable and reproducible way. You can build a container using Singularity on your Jupyter instance, and then run it on the farm.
Your container is a single file, and you don’t have to worry about how to install all the software you need on each different operating system. Read more about building and running Singularity containers in the `official docs `__.

Building containers
^^^^^^^^^^^^^^^^^^^

You can build `Singularity recipes `__ in Jupyter, but for a better and more portable experience, we recommend using `Dockerfiles `__ instead. A Dockerfile is a script (a text document) that contains all the commands a user could call on the command line to assemble an image. Docker images are widely available and widely adopted as best practice. You can host your Docker images in Docker repositories like https://hub.docker.com or https://quay.io.

However, the downside is that Docker requires sudo permissions to execute and interact with the Docker daemon that builds the images and starts/stops the containers. We've set up a service that allows you to log in with your Sanger credentials and build Docker images that can then be turned into Singularity images and copied over to the FARM to do your work. The basic flow of commands looks like this:

.. code-block:: bash

   # connect to the server
   ssh USER@docker.cellgeni.sanger.ac.uk

   # create a project folder for you to work on
   mkdir myproject
   cd myproject

   # create an appropriate Dockerfile
   # and then build it into an image
   docker build --tag image:tag .

   # once you've got the image locally on the docker daemon
   # convert it to singularity
   singularity build image_name.sif docker-daemon://image:tag

   # copy the image off the machine into the farm
   scp image_name.sif USER@farm5.internal.sanger.ac.uk:/path/on/the/farm

Troubleshooting
---------------

Restart your instance
^^^^^^^^^^^^^^^^^^^^^

Sometimes, a server restart might solve an issue. To restart:

#. Go to the menu "File" > "Hub Control Panel" or browse to your `Hub Home `__
#. Click ``Stop My Server``
#. Wait 2 minutes and reload the page.
#. Access `https://jhub.cellgeni.sanger.ac.uk/ `__ to get your instance up and running again.

Check storage usage
^^^^^^^^^^^^^^^^^^^

- Check your disk usage from a terminal using ``df -h /home/jovyan/`` or ``du -ha -d 1 ~``
- Find large files in your instance. Check files larger than 1GB from a terminal using: ``find /home/jovyan -size +1G -ls``.
- Get usage of general folders under your home directory from a terminal: ``du -h --max-depth=1 /home/jovyan/``

RStudio errors
^^^^^^^^^^^^^^

- ``[Errno 111] Connection refused`` error: try restarting the server.
- ``Rsession did not start in time``, ``Error 500`` or ``Error 504`` that does not allow you to load RStudio: go to the ``lab`` interface, start a terminal, delete the last R session and then reload RStudio:

  .. code-block:: bash

     rm -rf ~/.rstudio/sessions  # remove all sessions
     rm -rf ~/.RData  # remove stored session data
     # -f matches the full command line, which is needed when the pattern is a path
     pkill -f /usr/lib/rstudio-server/bin/rsession  # kill the old RStudio session process
     # it will be re-created once you visit the user/rstudio URL

- ``Could not start RStudio in time`` error: you might have run out of disk space. Delete some files, move them to the farm or request more storage.

How to get help
---------------

For any Jupyter Hub related questions please use our `Slack channel `__. There are lots of users there who can quickly answer your questions.