cls
Differences
This shows you the differences between two versions of the page.
cls [2019/03/14 09:07] – jp | cls [2019/05/23 08:49] (current) – jp | ||
---|---|---|---|
Line 3: | Line 3: | ||
{{ : | {{ : | ||
- | This group is interested in applying computer science techniques to problems in molecular biology, chemistry, pharmacology, | + | This group is interested in applying computer science techniques to problems in molecular biology, chemistry, pharmacology, |
- | For more information, | + | For more information, |
===== Talks ===== | ===== Talks ===== | ||
Line 12: | Line 12: | ||
^ Date/Time ^ Speaker | ^ Date/Time ^ Speaker | ||
- | | Fri 22nd March 2019, 2:30-3:30pm | Mr Kenneth Penza | Faculty of ICT, Room 9, Level -1, Block A | **GO term predictions in CATH: a machine learning approach** \\ Proteins are composed of amino acid chains that perform tasks such as regulation and signalling within an organism. The protein folds into a three-dimensional structure, giving it functionality. A protein function is characterised through laboratory experiments or computational methods. Physical laboratory experiments are expensive and have a low throughput, but are more reliable. Two proteins are related if they have a common ancestor. The ancestry (homology) link is established through conserved regions in the protein sequence. Homology relationships can be utilised to transfer protein functions between related proteins. This research investigates machine learning techniques to predict protein function from protein properties and proportions from structural databases such as CATH and PFAM. This work used two Machine Learning (ML) approaches to tackle this problem. The first approach was the application of automatic feature selection using Support Vector Machines (SVM) and Random Forest (RF) techniques on species-specific datasets. The second approach applied the different species-specific datasets to Neural Networks with different hidden layer configurations. These techniques were evaluated on CAFA3 targets against the CAFA2 shared task. The RF models with the species-specific feature set perform at the same level as the best CAFA2 submission for Homo sapiens and are superior to the best CAFA2 submission for E. coli.| | + | | Fri 24th May 2019, 12: |
+ | | Fri 15th March 2019, 2:30-3:30pm | Mr Kenneth Penza | Faculty of ICT, Room 9, Level -1, Block A | **GO term predictions in CATH: a machine learning approach** \\ Proteins are composed of amino acid chains that perform tasks such as regulation and signalling within an organism. The protein folds into a three-dimensional structure, giving it functionality. A protein function is characterised through laboratory experiments or computational methods. Physical laboratory experiments are expensive and have a low throughput, but are more reliable. Two proteins are related if they have a common ancestor. The ancestry (homology) link is established through conserved regions in the protein sequence. Homology relationships can be utilised to transfer protein functions between related proteins. This research investigates machine learning techniques to predict protein function from protein properties and proportions from structural databases such as CATH and PFAM. This work used two Machine Learning (ML) approaches to tackle this problem. The first approach was the application of automatic feature selection using Support Vector Machines (SVM) and Random Forest (RF) techniques on species-specific datasets. The second approach applied the different species-specific datasets to Neural Networks with different hidden layer configurations. These techniques were evaluated on CAFA3 targets against the CAFA2 shared task. The RF models with the species-specific feature set perform at the same level as the best CAFA2 submission for Homo sapiens and are superior to the best CAFA2 submission for E. coli.| | ||
| Wed 16th January 2019, 12-1pm | Mr Nicholas Mamo | BM402 (CMMB, Biomedical Sciences Building) | **Demystifying Blockchain Technology: The Blockchain for Non-Computer Scientists** \\ If you live in Malta, then you also inhabit the self-proclaimed Blockchain Island. What does that actually mean? What is the blockchain, and how will it change the way that we think about data? Blockchains are mentioned in the same breath as crypto-currencies, | | Wed 16th January 2019, 12-1pm | Mr Nicholas Mamo | BM402 (CMMB, Biomedical Sciences Building) | **Demystifying Blockchain Technology: The Blockchain for Non-Computer Scientists** \\ If you live in Malta, then you also inhabit the self-proclaimed Blockchain Island. What does that actually mean? What is the blockchain, and how will it change the way that we think about data? Blockchains are mentioned in the same breath as crypto-currencies, | ||
| Tue 10th April 2018, 4-5pm | Mr Karl Pullicino | ICT Faculty Building, CS seminar room 38, Block B, 1st floor | **A MapReduce approach to Genome Alignment** \\ Recent years brought an enormous growth in DNA sequencing capacity and speed, thanks to the application of next-generation sequencing (NGS) technologies. The alignment of read sequences to a given reference genome is crucial for further diagnostic downstream analysis. Finding the optimal alignment of short DNA reads from a biological sample to a reference human genome requires big data techniques, since the reads’ size is in the region of 200GB. In this dissertation we present three approaches to perform distributed sequence alignment of genomic data. The first one is based on an optimization of the Smith-Waterman algorithm. The other two approaches are based on the MapReduce programming paradigm. MR-BWA presents a novel approach, distributing BWA in a different manner than existing work. BWA is an industry-standard software tool used for genomic read alignment. MR-BWT-FM presents low-level optimizations on suffix array and BWT creation, which are used to create a custom FM-Index, which in turn is used for distributed genome sequence alignment. The application's output provides insights and charts about the results. We evaluate the performance and correctness of both approaches by comparing our output with that of similar tools, using standard datasets from the 1000 Genomes Project. Performance and correctness results for both distributed approaches are comparable with similar tools, whilst the final custom FM-Index size is smaller than the standard BWA index size. 
The source code of the software described in this dissertation is publicly available at {{https:// | | Tue 10th April 2018, 4-5pm | Mr Karl Pullicino | ICT Faculty Building, CS seminar room 38, Block B, 1st floor | **A MapReduce approach to Genome Alignment** \\ Recent years brought an enormous growth in DNA sequencing capacity and speed, thanks to the application of next-generation sequencing (NGS) technologies. The alignment of read sequences to a given reference genome is crucial for further diagnostic downstream analysis. Finding the optimal alignment of short DNA reads from a biological sample to a reference human genome requires big data techniques, since the reads’ size is in the region of 200GB. In this dissertation we present three approaches to perform distributed sequence alignment of genomic data. The first one is based on an optimization of the Smith-Waterman algorithm. The other two approaches are based on the MapReduce programming paradigm. MR-BWA presents a novel approach, distributing BWA in a different manner than existing work. BWA is an industry-standard software tool used for genomic read alignment. MR-BWT-FM presents low-level optimizations on suffix array and BWT creation, which are used to create a custom FM-Index, which in turn is used for distributed genome sequence alignment. The application's output provides insights and charts about the results. We evaluate the performance and correctness of both approaches by comparing our output with that of similar tools, using standard datasets from the 1000 Genomes Project. Performance and correctness results for both distributed approaches are comparable with similar tools, whilst the final custom FM-Index size is smaller than the standard BWA index size. The source code of the software described in this dissertation is publicly available at {{https:// |
cls.1552554439.txt.gz · Last modified: 2019/03/14 09:07 by jp