Computational Life Sciences

This page lists all the talks/workshops/activities of the Computational Life Sciences (cls-uom) research group at the University of Malta.

This group is interested in applying computer science techniques to problems in molecular biology, chemistry, pharmacology, and drug-discovery. As biomedical and healthcare research becomes more interdisciplinary, we realize the need to bring computer scientists, wet-lab biologists, mathematicians, statisticians, geneticists, etc. together. We organize bimonthly talks and workshops, mostly in bioinformatics and cheminformatics, where academics and post-graduate students present their work in an informal and friendly setting.

For more information, and to receive notifications of future activities, please subscribe to our Google Group.

Talks

The following talks are listed in reverse chronological order.

Date/Time	Speaker	Location	Title & Abstract
Tue 10th April 2018, 4-5pm	Mr Karl Pullicino	ICT Faculty Building, CS seminar room 38, Block B, 1st floor	A MapReduce approach to Genome Alignment. Recent years have brought enormous growth in DNA sequencing capacity and speed, thanks to the application of next-generation sequencing (NGS) technologies. The alignment of read sequences to a given reference genome is crucial for downstream diagnostic analysis. Finding the optimal alignment of short DNA reads from a biological sample to a reference human genome requires big data techniques, since the size of the reads is in the region of 200 GB. In this dissertation we present three approaches to distributed sequence alignment of genomic data. The first is based on an optimization of the Smith-Waterman algorithm. The other two are based on the MapReduce programming paradigm. MR-BWA distributes BWA, an industry-standard tool for aligning genomic reads, in a different manner from existing work. MR-BWT-FM applies low-level optimizations to suffix array and BWT creation; the results are used to build a custom FM-Index, which in turn is used for distributed genome sequence alignment. The application also produces insights and charts about the results. We evaluate the performance and correctness of both approaches by comparing our output with that of similar tools, using standard datasets from the 1000 Genomes Project. Performance and correctness results for both distributed approaches are comparable with those of similar tools, whilst the final custom FM-Index is smaller than the standard BWA index. The source code of the software described in this dissertation is publicly available at https://github.com/kpullu/msc.
Mon 17th July 2017, 4pm	Mr Joseph Bonello	ICT Faculty Building, CS seminar room 38, Block B, 1st floor	Protein Function Prediction Using Homologues (Slides). Homology refers to the existence of a common origin between a pair of proteins in different organisms. Proteins consist of multiple domains – conserved regions of sequence and structure that can function independently from the rest of the protein chain. Protein function prediction methods based on homology take advantage of the many pairwise homology relationships between individual domain sequences. This project attempts to create a set of scores that can be used to predict the possible domain functions that a protein can possess. The study uses CATH Superfamilies and CATH Functional Families (FunFams) to generate the scores. CATH is a database that provides a hierarchical protein-domain classification for proteins obtained from the PDB. The Superfamilies and FunFams provide a natural grouping for proteins that share the same evolutionary origin (homologous superfamilies). This grouping can be exploited to generate similarity scores between the domains and the families. Two methods have been developed for function prediction based on these principles. The first method uses set theory: the proteins belonging to a Superfamily or a FunFam are used to determine which GO Terms are more likely to occur in the group. The second method uses a statistical calculation to represent the presence of GO Terms in a family.
Thu 1st June 2017, 4pm	Dr Jean-Paul Ebejer	ICT Faculty Building, CS seminar room 38, Block B, 1st floor	Computer-Aided Drug Design. Computer-Aided Drug Design (CADD) plays an increasingly critical role in the drug-discovery process. CADD involves the application of computer algorithms to improve pharmaceutical productivity. These include algorithms for identifying the biological target involved in a disease, predicting toxicity and side effects, and searching a database for molecules which exhibit a therapeutic effect against a particular protein of interest. The last of these is known as Virtual Screening. In this talk I will give an overview of CADD with particular emphasis on virtual screening. I will describe the successes, challenges and limitations of the approach. Finally, I will briefly present a novel virtual screening method we have developed. This interdisciplinary talk is aimed at a broad audience.
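
The genome alignment abstract (10th April 2018) mentions building an FM-Index from a suffix array and the BWT. As a rough illustration of that general idea only, the following Python sketch builds a toy FM-Index and counts exact matches of a read with backward search. It is not the MR-BWT-FM implementation: the function names are hypothetical, the suffix array is built naively, nothing is distributed, and the text is assumed to end with a `$` sentinel that sorts before every other character.

```python
def suffix_array(text):
    # Naive construction by sorting all suffixes; adequate for a toy example only.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    # The BWT character for each sorted suffix is the character preceding it
    # (cyclically, so the suffix starting at 0 is preceded by the sentinel).
    return "".join(text[i - 1] for i in sa)


def build_fm_index(bwt):
    # first_col[c]: number of characters in the text strictly smaller than c.
    # occ[c][i]:    number of occurrences of c in bwt[:i].
    alphabet = sorted(set(bwt))
    counts = {c: 0 for c in alphabet}
    occ = {c: [0] for c in alphabet}
    for ch in bwt:
        counts[ch] += 1
        for c in alphabet:
            occ[c].append(counts[c])
    first_col = {}
    total = 0
    for c in alphabet:
        first_col[c] = total
        total += counts[c]
    return first_col, occ


def count_matches(pattern, first_col, occ, n):
    # Backward search: process the pattern right to left, narrowing the
    # half-open suffix-array interval [lo, hi) of suffixes prefixed by it.
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in first_col:
            return 0
        lo = first_col[ch] + occ[ch][lo]
        hi = first_col[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


def index_text(text):
    # Convenience wrapper (hypothetical): text must end with the '$' sentinel.
    sa = suffix_array(text)
    bwt = bwt_from_suffix_array(text, sa)
    first_col, occ = build_fm_index(bwt)
    return sa, bwt, first_col, occ


if __name__ == "__main__":
    reference = "ACGTACGT$"
    sa, bwt, first_col, occ = index_text(reference)
    for read in ("ACG", "GTA", "TT"):
        print(read, count_matches(read, first_col, occ, len(reference)))
```

A full occurrence table, as used here, costs memory proportional to the text length times the alphabet size; practical indexes typically sample it, which is one place where index size can be traded against lookup speed.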