Author: jp

  • M.Sc. Dissertation Examples

    The best way for postgraduate students to visualize their final deliverable (i.e. their dissertation) is to get hold of two or three works from others who did brilliantly. I have been meaning to put these online for ages. Two things may differ from your own volume: the number of ECTS (this affects the level of detail) and the study programme (this affects the word count and general submission regulations).

    So here you go. Below please find three “distinction-quality”, 60 ECTS, M.Sc. in AI dissertations from Etienne, Kenneth, and Joseph (all brilliant ex-students of mine). Two of these dissertations (Joseph's and Etienne's) resulted in publications. Note that these dissertations were submitted for the M.Sc. in AI programme at the University of Malta, but they will still help you get a feel for what is required (and for what excellence looks like).

  • Standardizing a molecule using RDKit

    Cheminformatics is hard. That is a great quote from Prof. Paul Finn. I think part of it is due to the nature of chemistry (e.g. which is the correct tautomer for this molecule?), and part of it is because of the lack of “standard” process definitions.

    So I am revisiting the standardization (of the molecule)/normalization (of functional groups) pipeline for ML, and I had to post to the extremely helpful RDKit mailing list for help (here). Using the excellent sources they pointed me to, I ended up with the following (which will surely come in handy in a few months' time when I go through the whole process again):

    from rdkit import Chem
    from rdkit.Chem.MolStandardize import rdMolStandardize

    def standardize(smiles):
        # follows the steps in
        # https://github.com/greglandrum/RSC_OpenScience_Standardization_202104/blob/main/MolStandardize%20pieces.ipynb
        # as described **excellently** (by Greg) in
        # https://www.youtube.com/watch?v=eWTApNX8dJQ
        mol = Chem.MolFromSmiles(smiles)

        # removeHs, disconnect metal atoms, normalize the molecule, reionize the molecule
        clean_mol = rdMolStandardize.Cleanup(mol)

        # if many fragments, get the "parent" (the actual mol we are interested in)
        parent_clean_mol = rdMolStandardize.FragmentParent(clean_mol)

        # try to neutralize molecule
        uncharger = rdMolStandardize.Uncharger() # annoying, but necessary as no convenience method exists
        uncharged_parent_clean_mol = uncharger.uncharge(parent_clean_mol)

        # note that no attempt is made at reionization at this step
        # nor at ionization at some pH (rdkit has no pKa calculator)
        # the main aim is to represent all molecules from different sources
        # in a (single) standard way, for use in ML, catalogue, etc.

        te = rdMolStandardize.TautomerEnumerator() # idem
        taut_uncharged_parent_clean_mol = te.Canonicalize(uncharged_parent_clean_mol)

        return taut_uncharged_parent_clean_mol
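
    A small usage sketch, in case it helps future me. It assumes the same steps as standardize() above, wrapped in a hypothetical helper (standardize_to_smiles is my own name, not an RDKit function) that returns a canonical SMILES string and gives back None when the input SMILES cannot be parsed. The three input strings are made-up examples, not data from a real project.

```python
from rdkit import Chem
from rdkit.Chem.MolStandardize import rdMolStandardize


def standardize_to_smiles(smiles):
    """Hypothetical wrapper: same steps as standardize(), but returns a
    canonical SMILES string, or None if the input cannot be parsed."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:  # MolFromSmiles returns None for an invalid SMILES
        return None
    mol = rdMolStandardize.Cleanup(mol)
    mol = rdMolStandardize.FragmentParent(mol)
    mol = rdMolStandardize.Uncharger().uncharge(mol)
    mol = rdMolStandardize.TautomerEnumerator().Canonicalize(mol)
    return Chem.MolToSmiles(mol)


# illustrative inputs: a salt, its neutral parent, and a string that is not SMILES
records = ["[Na+].CC(=O)[O-]", "CC(=O)O", "not-a-smiles"]
standardized = {s: standardize_to_smiles(s) for s in records}
```

    The idea is that the salt and the neutral acid should collapse to the same string, which is what you want before de-duplicating a dataset for ML, while the unparseable entry is flagged with None instead of raising.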
    
  • The 10+ Commandments of Undertaking Postgraduate Research

    This post is written somewhat tongue-in-cheek; I do not mean for you to take these points as unassailable. No doubt there are some omissions here, but I had to pick my top ten (plus). Looking back, these are the things from my M.Sc. (Imperial) and D.Phil. (University of Oxford) experiences that have forged the way I do scientific research and built my academic character. My B.Sc. was too long ago to recall anything accurately! I find the exercise of writing them down useful (see the second commandment…).

    Stop trying to be funny, JP
  • Content Tips for your Dissertation or Project Write-up

    These tips are mostly focused on writing an academic (and scientific) dissertation, but I think most of these suggestions are sensible enough to feature in any lengthy document. Based on my experience as an examiner and a supervisor, these are the most common things I notice each time I pick up a project write-up.

    Writing is hard, but the tips in this post can make it better
  • LaTeX Tips for your Dissertation or Project Write-up

    I see many students who struggle with LaTeX write-ups and who burn in typesetting hell for their mortal sins (hanging lists, anyone?). This post will focus on some of the finer details needed to produce a perfectly typeset document. It is the first of a two-part series and focuses on LaTeX hints. The second part will focus on the actual write-up/content.

    Typesetting your dissertation is a perilous but satisfying journey
  • Presenting Your Work – Assignments/Dissertations

    Some presentation tips, based on my experience of the things which trouble students (and for which they lose marks) and the things which irked me in previous study-units’ presentations.

    JP Presenting @ valletta.ai
    Presenting is fun. But it is your responsibility to make it so for the audience as well.


  • Three Tips for a Successful Start to your Academic Project

    So, you have contacted me to undertake a project (Dissertation/FYP/Thesis) together. That is great! I am enjoying it already. I always find myself repeating the same pointers to each student at the start (or throughout the project really), so I thought I would write them down for posterity. Here comes my recipe for success, for the first few weeks at least.

    My projects are typically on the life sciences/computer science interface (but we’ve had many different ones – including intelligent automated sports betting). But these three ingredients apply to any kind of project really.

  • Computer Aided Drug Design (CADD) – Reading Lists

    I am always sending the same canned response to students who would like to do an FYP or a dissertation with me on the subjects I dabble in: Computer-Aided Drug Design (Discovery), Virtual Screening (VS), Ligand-Based and Structure-Based methods, Cheminformatics, Bioinformatics and Computational Chemistry. Perhaps the first step for any student is to realize the hierarchy of these fields (and the differences between them). I am including a reading list, which helps you bootstrap the subject and hopefully helps you determine if this is really something for you. The jargon will be daunting at first (especially if you are a computer scientist), but that is only an initial hurdle and hopefully you will get familiar with the big words quickly. You do not need to understand everything; you just need to understand enough. Remember, brick walls are there to show us how badly we want things! (Watch this: long and touching).

  • Bioinformatics Starter Pack – Getting started in Bioinformatics

    I often get the question “I’m interested in Bioinformatics but how do I get started?”. That question can typically be answered with “a Google search”, but that is true for most non-existential questions nowadays. This post will give you pointers to the material you need to cover to understand the basics of what the bioinformatics field is about. Unfortunately, there is currently no undergraduate/postgraduate course with a focus on Bioinformatics (in Malta*), so attending a series of (local) lectures is not an option at present. Do not despair: there is plenty of material to go around.

    Note: This material is not specific to one area in bioinformatics, e.g. genomics, but aims to give a general (soft) introduction to the field!
