VanML 2019 Conference Notes

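The Vancouver Machine Learning: Genomics meeting took place right after NeurIPS 2019. This was actually the first academic conference I have ever attended.
I did have one chance as an undergraduate: Agricultural Genomics 2017, a Nature conference, was held at the National Key Laboratory of Crop Genetic Improvement at my agricultural university, and I could have gone to listen (if I had wanted to),
but in the end I did not go. That was probably because I was busy preparing for a mathematical modeling contest at the time. The two events fell back to back, so once the national contest ended I was too exhausted to do anything at all.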

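Back to the point: this meeting was highly specialized and purely a workshop in nature, small but refined. I learned a lot about which directions the leading researchers in the field are pursuing and which methods they use. A rewarding trip overall (and the food was really good ԅ(¯﹃¯ԅ)).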

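Itinerary

I arrived at the port at noon on the 15th and took the ferry to Vancouver.

Scenery on the way to the pier

Arriving at the terminal
The ferry

En route

Views at sea

Arriving in Vancouver

It was really cold at night, with hardly anyone on the streets

The conference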

First session

Dr. Jennifer Listgarten, Professor in the Electrical Engineering and Computer Sciences department and the Center for Computational Biology at Berkeley.

Theme: Machine Learning in protein detection.

Major Problems and Objectives

Proteins we want to search for
- Problem: a massive search space of about 10^L sequences; at L ~ 50 the count was described as close to the number of atoms in outer space, and L only gets larger.
Proteins we want to refine, e.g.
- Carbon fixation (RuBisCO)
- Viral delivery vectors for gene therapy
- Genetic scissors (CRISPR)
The story is similar for small molecules in drug design.
(Copyright © https://blog.csdn.net/s_gy_zetrov. All Rights Reserved)

How exactly do we do protein design/optimization?

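Traditionally, this relies on lab-based measurement, as in directed evolution, the approach pioneered by Frances Arnold, winner of the 2018 Nobel Prize in Chemistry. The process is a loop: mutate the current best sequences, measure the variants in the lab, keep the best ones, and repeat.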

We can use machine learning to improve this greedy random search!

Replace lab measurements with a predictive model
Replace random greedy search with intelligent search
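
The greedy random search described above can be sketched as a toy loop. This is only an illustration under stated assumptions: `toy_fitness`, the hidden `target`, and all parameter values are hypothetical stand-ins, not anything shown in the talk. The `measure` argument marks the spot where a lab measurement would be swapped for a predictive model.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def toy_fitness(seq, target):
    """Hypothetical stand-in for a lab measurement: count matches to a hidden target."""
    return sum(a == b for a, b in zip(seq, target))


def mutate(seq, rng):
    """Return a copy of seq with one random position replaced by a random residue."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + rng.choice(ALPHABET) + seq[pos + 1:]


def greedy_random_search(start, measure, rounds, pool_size, rng):
    """Each round: mutate the current best, measure the pool, keep any improvement."""
    best, best_score = start, measure(start)
    history = [best_score]
    for _ in range(rounds):
        pool = [mutate(best, rng) for _ in range(pool_size)]
        candidate = max(pool, key=measure)
        score = measure(candidate)
        if score > best_score:
            best, best_score = candidate, score
        history.append(best_score)
    return best, best_score, history


rng = random.Random(0)
target = "".join(rng.choice(ALPHABET) for _ in range(12))
start = "".join(rng.choice(ALPHABET) for _ in range(12))
best, best_score, history = greedy_random_search(
    start, lambda s: toy_fitness(s, target), rounds=50, pool_size=20, rng=rng
)
print(start, "->", best, "score", best_score)
```

Every call to `measure` corresponds to an experiment in the lab setting, which is why replacing it with a cheap model and replacing the blind mutation step with a smarter proposal are the two natural places for machine learning to help.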

A typical predictive model (a.k.a. a stochastic oracle) predicts a property, with some uncertainty, from a sequence.

Assumption: given a "black box" oracle that can predict a property from a sequence, we want a method that tells us which sequences to choose in order to either maximize the property or hit a specified value.

The proposed solution is based on `model-based optimization (MBO)`.

Building on MBO, the talk introduced `Design by adaptive sampling (DbAS)`.
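
As I understand it, the core loop of this kind of adaptive sampling is: draw sequences from a generative model, ask the oracle how likely each one is to exceed a threshold, refit the model with those probabilities as weights, and raise the threshold over time. Below is a toy sketch in that spirit, not the speaker's implementation. The per-position categorical model, the Gaussian oracle, the hidden `TARGET`, and every parameter value are assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
L = 8
TARGET = "ACDEFGHI"  # hypothetical hidden optimum used by the toy oracle


def oracle_mean(seq):
    """Toy oracle mean: number of positions matching TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def prob_at_least(seq, gamma, sigma=1.0):
    """P(y >= gamma | seq) for an assumed Gaussian oracle N(oracle_mean(seq), sigma^2)."""
    z = (gamma - oracle_mean(seq)) / sigma
    return 0.5 * (1.0 - math.erf(z / math.sqrt(2.0)))


def sample(probs, rng):
    """Draw one sequence from the per-position categorical model."""
    return "".join(rng.choices(ALPHABET, weights=p)[0] for p in probs)


def weighted_fit(seqs, weights, pseudo=0.1):
    """Weighted maximum-likelihood update of per-position residue frequencies."""
    probs = []
    for pos in range(L):
        counts = {a: pseudo for a in ALPHABET}
        for s, w in zip(seqs, weights):
            counts[s[pos]] += w
        total = sum(counts.values())
        probs.append([counts[a] / total for a in ALPHABET])
    return probs


def dbas_sketch(iterations=15, n_samples=200, quantile=0.8, seed=0):
    rng = random.Random(seed)
    probs = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in range(L)]  # uniform start
    gamma = -math.inf
    mean_scores = []
    for _ in range(iterations):
        seqs = [sample(probs, rng) for _ in range(n_samples)]
        scores = sorted(oracle_mean(s) for s in seqs)
        # Threshold = a quantile of the current oracle predictions, never lowered.
        gamma = max(gamma, scores[int(quantile * (n_samples - 1))])
        weights = [prob_at_least(s, gamma) for s in seqs]
        probs = weighted_fit(seqs, weights)
        mean_scores.append(sum(scores) / n_samples)
    return probs, mean_scores


probs, mean_scores = dbas_sketch()
print("mean oracle score, first vs last iteration:", mean_scores[0], mean_scores[-1])
```

Because the weights are probabilities from the oracle rather than hard accept/reject decisions, the oracle's uncertainty directly shapes which samples drive the next model update.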

Problems with all of this

We simply assume that the oracle is unbiased and has good uncertainty estimates. But there are "black holes" in these models (neural networks); for example, adversarial noise can cause an image to be misclassified.

Credit: https://www.ibm.com/developerworks/community/blogs/Analytics4Gov/entry/Adversarial_Robustness_Toolbox_How_IBM_protects_your_Neural_Network_against_adversarial_attacks?lang=en
So, in fact, these models are not intelligent at all! They just work better on some problems!
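
The adversarial-noise failure mentioned above can be reproduced on something far simpler than a neural network. The sketch below applies the fast-gradient-sign idea to a hypothetical linear classifier; the weights, input, and `epsilon` are made-up values chosen so that a small per-feature change flips the prediction.

```python
def predict(weights, bias, x):
    """Linear classifier: return 1 if w.x + b > 0, else 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else 0


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def adversarial_example(weights, x, label, epsilon):
    """Shift every feature by epsilon in the direction that hurts the given label.

    For a linear score the gradient with respect to x is just `weights`,
    so the sign of each weight tells us which way to push each feature.
    """
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical toy model: 100 features, each with a small weight.
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(100)]
bias = 0.0
x = [0.5 if i % 2 == 0 else 0.3 for i in range(100)]  # score = 50*0.05 - 50*0.03 = 1.0

clean_label = predict(weights, bias, x)
x_adv = adversarial_example(weights, x, clean_label, epsilon=0.2)
adv_label = predict(weights, bias, x_adv)
max_change = max(abs(a - b) for a, b in zip(x, x_adv))
print(clean_label, adv_label, max_change)
```

No single feature moves by more than `epsilon`, yet the many small shifts add up across dimensions. An oracle used inside an optimization loop is exposed to the same effect, since the search actively looks for inputs that score well.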

Second session

Dr. Sharon Browning, Professor in the Department of Biostatistics at the University of Washington. Co-developed the popular BEAGLE software in collaboration with Dr. Brian Browning. Recent work includes investigations of the contribution of archaic humans to current-day human genomics, and extensions to the BEAGLE software involving identity-by-descent.

Finding Identity by Descent (IBD) segments in population samples

Tested on the UK Biobank dataset (500K individuals)
Identity by Descent (IBD): segments of DNA that two or more individuals have inherited from a common ancestor.

The smaller the population, the more IBD sharing there is.
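
To make the idea of a shared segment concrete, here is a deliberately naive sketch that reports long runs of identical alleles between two haplotypes. It is not how BEAGLE or any production IBD tool works: real methods have to deal with genotype errors, phasing, and genetic distance, and identical alleles alone do not prove shared ancestry. The function name, the 0/1 allele strings, and `min_len` are assumptions for illustration.

```python
def shared_segments(hap_a, hap_b, min_len):
    """Return (start, end) pairs, end exclusive, of maximal runs of at least
    `min_len` consecutive markers where the two haplotypes carry the same allele."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    segments = []
    run_start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if run_start is None:
                run_start = i  # a new matching run begins here
        else:
            if run_start is not None and i - run_start >= min_len:
                segments.append((run_start, i))
            run_start = None
    # Close a run that extends to the last marker.
    if run_start is not None and len(hap_a) - run_start >= min_len:
        segments.append((run_start, len(hap_a)))
    return segments


hap_a = "0110100111010010"
hap_b = "0110100110010111"
segments = shared_segments(hap_a, hap_b, min_len=4)
print(segments)
```

The `min_len` cutoff plays the role of a length filter: short matching runs are common by chance, while long ones are better candidates for recent shared ancestry.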

Third session

Dr. David Aanensen, Director of the Centre for Genomic Pathogen Surveillance at the Wellcome Sanger Institute, Senior Group Leader in Genomic Surveillance at the Big Data Institute at the University of Oxford, and Director of the NIHR-funded Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance.

Online platforms for visualizing and identifying pathogens

Example of a disease outbreak: note that the tree shapes were similar; first there are a few branches, then there is an outbreak.

Online platforms

https://microreact.org/showcase

https://pathogen.watch/

Fourth session

Dr. Alexandre Bouchard-Côté, Professor of Statistics at the University of British Columbia. His research focuses on computational statistics and machine learning, with applications to linguistics and biology.

A Bayesian method to reconstruct single-cell phylogenetic trees from copy number events, such as those that arise in cancers with high genomic instability

The method is motivated by low-depth genome-wide data, which can be obtained for increasingly large numbers of cells thanks to technologies such as Direct Library Preparation or 10x Single Cell Genomics. Computing the posterior distribution in this model at scale is challenging. Recent advances in Bayesian computational statistics can be used to parallelize the posterior inference computation across an arbitrary number of cores, touching on topics such as non-reversible methods and change-of-measure approaches. The posterior inference methods described are available through an open source Bayesian modelling language called Blang, which can be used for a range of phylogenetic problems, including more traditional phylogenetic models, as well as other Bayesian analysis problems. The motivating copy-number-based phylogenetic model is implemented in Blang and available in a cancer Bayesian phylogenetics and population genetics library that is under active development. This library has been used to infer phylogenetic trees on over 4000 cells using over 60 cores.

https://www.stat.ubc.ca/~bouchard/blang/
