{"id":42,"date":"2016-09-14T13:58:41","date_gmt":"2016-09-14T11:58:41","guid":{"rendered":"http:\/\/staff.um.edu.mt\/jebej02\/blog\/?p=42"},"modified":"2018-08-16T11:09:07","modified_gmt":"2018-08-16T09:09:07","slug":"setting-up-a-bioinformatics-summer-school","status":"publish","type":"post","link":"https:\/\/bitsilla.com\/blog\/2016\/09\/setting-up-a-bioinformatics-summer-school\/","title":{"rendered":"Setting up a Bioinformatics Summer School"},"content":{"rendered":"<p>As part of the <a href=\"http:\/\/www.um.edu.mt\/project\/trainmalta\">TrainMALTA<\/a> EU project activities, I volunteered\/was tasked with setting up the IT infrastructure for the HTS (or NGS) bioinformatics summer school. It has been quite an experience, and the whole setup is far from trivial &#8211; so I thought I&#8217;d document parts of it here. Habitually, I turned to Google to see what others in my shoes had done, and nothing turned up. Nothing on Google &#8211; this setup must be worth documenting!<\/p>\n<p><!--more--><\/p>\n<p>We required a rig which fulfils the following criteria:<\/p>\n<ul>\n<li>Supports ~40 participants<\/li>\n<li>Supports six tutors, with their diverse software (and genomic storage) requirements<\/li>\n<li>Makes it easy to propagate changes\/software to everyone<\/li>\n<li>Is cheap(ish) &#8211; doh!<\/li>\n<li>Requires minimal participant setup (ideally none). \u00a0We want to run the whole programme via a web browser.<\/li>\n<li>Is tried and tested in this kind of classroom environment, so as not to get any surprises on the day (when it&#8217;s too late&#8230;)<\/li>\n<li>Has a backup system in case of hiccups<\/li>\n<\/ul>\n<p>Luckily, our project partners at the University of Cambridge and Ume\u00e5 (Sweden) have ample experience running this sort of course (thanks, Bastian! All of the setup below is his). 
\u00a0They immediately suggested we go for a cloud solution (Amazon Web Services), which suited us fine as local resources for this kind of course are limited, at best. \u00a0The architecture they suggested is shown below.<\/p>\n<figure id=\"attachment_45\" aria-describedby=\"caption-attachment-45\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/setup.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-45\" src=\"http:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/setup-300x256.png\" alt=\"TrainMALTA Summer school hardware setup (click to enlarge)\" width=\"300\" height=\"256\" srcset=\"https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/setup-300x256.png 300w, https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/setup-768x655.png 768w, https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/setup.png 912w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-45\" class=\"wp-caption-text\">TrainMALTA Summer school hardware setup (click to enlarge)<\/figcaption><\/figure>\n<p>There are two main machines &#8211; a web host (or gateway), which gives participants easy access to the bioinformatics software and materials, and a workhorse, which does all the heavy lifting.<\/p>\n<p>The web host is a t2.micro AWS instance (free-tier eligible) which runs a Node.js web server. 
\u00a0A web page built with the Jade templating engine lists the participants, with links to each participant&#8217;s hosted services (shown below).<\/p>\n<figure id=\"attachment_56\" aria-describedby=\"caption-attachment-56\" style=\"width: 300px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-medium wp-image-56\" src=\"http:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost-300x155.png\" alt=\"The web host is just a front-end to the services offered by the workhorse\" width=\"300\" height=\"155\" srcset=\"https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost-300x155.png 300w, https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost-768x397.png 768w, https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost-1024x529.png 1024w, https:\/\/bitsilla.com\/blog\/wp-content\/uploads\/2016\/09\/webhost.png 1840w\" sizes=\"auto, (max-width: 300px) 100vw, 300px\" \/><\/a><figcaption id=\"caption-attachment-56\" class=\"wp-caption-text\">The web host is just a front-end to the services offered by the workhorse<\/figcaption><\/figure>\n<p>The workhorse machine is an m4.10xlarge AWS instance ($2.64\/hr) with 160 GiB RAM and 40 vCPUs &#8211; ample for our 35+ students. \u00a0Only the tutors log in to this machine directly (best to create individual user accounts and a tutors group too). \u00a0The idea is to run one Docker container per participant, with each participant&#8217;s services exposed on a different set of ports (details later).<\/p>\n<p>On the workhorse we create a <code>data\/<\/code> directory in the root folder (<code>\/<\/code>) with a subfolder for each day of the course (e.g. <code>day1\/<\/code>, <code>day2\/<\/code>, <code>day3\/<\/code>, <code>day4\/<\/code>, <code>day5\/<\/code>). 
\u00a0Each day can be further subdivided into <code>slides\/<\/code> and <code>practicals\/<\/code>, where the former holds the course materials and the latter the data used for hands-on practicals and exercises. \u00a0This data directory is mounted directly, in read-only mode, in each container instance (one per participant) using Docker&#8217;s <code>-v<\/code> option with the <code>:ro<\/code> suffix. \u00a0On the workhorse we also mount a 5000 GiB Amazon volume under <code>\/participants<\/code>. \u00a0Here we create as many home directories as we have participants (<code>01\/<\/code>, <code>02\/<\/code>, <code>03\/<\/code>, etc.).<\/p>\n<p>We start off with Bastian&#8217;s Docker image, which is conveniently <a href=\"https:\/\/hub.docker.com\/r\/bschiffthaler\/ngs\/\">publicly available<\/a> and which you can retrieve with <code>docker pull bschiffthaler\/ngs<\/code>. \u00a0From this image we first run a single container, forwarding ports for ssh, HTTP (the image runs an Apache instance), Gate One and RStudio. \u00a0This container uses port suffix 00:<\/p>\n<pre class=\"toolbar:1 scroll:true lang:sh decode:true\">docker run -d -p 10000:80 -p 10100:443 -p 10200:8787 -p 2200:22 -p 8800:8888 -v \/data:\/data:ro -v \/participants\/00:\/home\/training 7f5a79c3410d\r\n<\/pre>\n<p>Ports on the workhorse now forward into the container: 10000 to Apache, 10100 to Gate One (a browser-based ssh client), 10200 to RStudio (login training\/training), 2200 to the container&#8217;s ssh service, and 8800 to port 8888 (Jupyter&#8217;s default port). \u00a0The command also mounts <code>\/data<\/code> from the workhorse into the container (read-only), and mounts <code>\/participants\/00<\/code> as the <code>training<\/code> user&#8217;s home directory in the container. \u00a0On the workhorse, create a user participant_docker with uid 2000 (this is a requirement) and make it the owner of the home directories. 
\u00a0The <code>training<\/code> user in the container also has uid 2000, so it can write to its home directory.<\/p>\n<p>As always, training requirements differ between courses, so each tutor will almost certainly need to add and install their own bits of kit in the container. \u00a0One way to do this is to let each tutor log in to the container as root (using something like <code>docker exec -it dcd97a21424e \/bin\/bash<\/code>). \u00a0This allows them to <code>apt-get install<\/code> or <code>pip install<\/code> any other software they would like. \u00a0Note that all tutors need to be in the <code>docker<\/code> group to run docker commands on the workhorse (e.g. to connect to the container). \u00a0After finalizing the software installs, commit the container to a new image, which is simple enough (e.g. <code>docker commit dcd97a21424e trainmalta2016_v4<\/code>). \u00a0A more maintainable approach is to edit the Dockerfile so that all the software is built into the image, but if you are in a hurry&#8230; When the tutors finish their software+data installs, we copy the training user&#8217;s home directory (<code>\/participants\/00<\/code>) to each of the participant directories <code>01\/<\/code>, <code>02\/<\/code>, <code>03\/<\/code>, etc.<\/p>\n<p>You can then launch a container for each participant using a bash script like:<\/p>\n<pre class=\"toolbar:1 scroll:true lang:sh decode:true \">#!\/bin\/bash\r\n# Start one container per participant, numbered 01..PARTICIPANTS.\r\n\r\nIMAGEID=trainmalta2016_v4   # image committed by the tutors (see above)\r\nPARTICIPANTS=50             # at most 99: ports use a two-digit suffix\r\n\r\nfor i in $(seq 1 \"$PARTICIPANTS\")\r\ndo\r\n    f=$(printf '%02d' \"$i\")   # zero-pad, e.g. participant 7 gets ports 10007, 10107, 10207, 2207, 8807\r\n    echo \"Creating container for participant $f ...\"\r\n    docker run -d -p 100$f:80 -p 101$f:443 -p 102$f:8787 -p 22$f:22 -p 88$f:8888 -v \/data:\/data:ro -v \/participants\/$f:\/home\/training \"$IMAGEID\" \/usr\/bin\/supervisord -c \/etc\/supervisor\/conf.d\/supervisord.conf\r\ndone\r\n<\/pre>\n<p>We pass the supervisor daemon (supervisord) as the container command because the list of startup programs has changed from the original image to include Jupyter Notebook.<\/p>\n<p>This worked well for most of our training needs, with a few exceptions. \u00a0For example, we wanted to show the participants the <a href=\"http:\/\/software.broadinstitute.org\/software\/igv\/\">IGV<\/a> browser. \u00a0However, IGV launches its own desktop GUI, and the workhorse runs headless. \u00a0Participants were asked to install IGV locally (on their laptops or on the classroom&#8217;s permanent machines) and then download the files of interest from the Apache directory listing in their browser. \u00a0More advanced participants used sshfs to mount the directory from the workhorse locally. \u00a0A possible (untested) workaround would be to ssh into the workhorse with X11 forwarding.<\/p>\n<p>The summer school is mid-way through and (hopefully without jinxing it) has gone well so far.<\/p>\n<h1>Update, July 2018<\/h1>\n<p>We used a very similar setup for the <a href=\"https:\/\/www.um.edu.mt\/project\/trainmalta\/resourcesopp#Schools\">TrainMALTA Summer School in epigenetics<\/a> (2018).\u00a0 I noticed that some important points were missing, mainly:<\/p>\n<ul>\n<li><a href=\"http:\/\/modules.sourceforge.net\/\">modules<\/a> should be installed in <code>docker_training<\/code>&#8217;s home directory, which is mounted in the Docker container as the <code>training<\/code> user&#8217;s home directory<\/li>\n<li>The <code>docker_training<\/code> user on the workhorse must have the same uid (2000) as the <code>training<\/code> user in the Docker container<\/li>\n<li>The contents of the <code>docker_training<\/code> home directory should be copied into each participant&#8217;s directory (e.g. 
<code>\/participants\/00<\/code>, <code>\/participants\/01<\/code>, etc.)<\/li>\n<li>The code for the front end can be downloaded from <a href=\"https:\/\/microasp.upsc.se\/bastian\/course-web\">https:\/\/microasp.upsc.se\/bastian\/course-web<\/a><\/li>\n<li>The front end is a Node.js server using Jade templates; you will need to edit the pages for the materials, programme and student lists, the connection details for the AWS instance, etc.<\/li>\n<li>To start the front-end server, run <code>DEBUG=\"untitled1:server\" nodejs .\/bin\/www<\/code> from the project&#8217;s root directory<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>As part of the TrainMALTA EU project activities, I volunteered\/was tasked with setting up the IT infrastructure for the HTS (or NGS) bioinformatics summer school. It has been quite an experience, and the whole setup is far from trivial &#8211; so I thought I&#8217;d document parts of it here. Habitually, I turned to google to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-42","post","type-post","status-publish","format-standard","hentry","category-uncategorized"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/pap6Kd-G","_links":{"self":[{"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/posts\/42","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddabl
e":true,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/comments?post=42"}],"version-history":[{"count":3,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/posts\/42\/revisions"}],"predecessor-version":[{"id":227,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/posts\/42\/revisions\/227"}],"wp:attachment":[{"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/media?parent=42"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/categories?post=42"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/bitsilla.com\/blog\/wp-json\/wp\/v2\/tags?post=42"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}