Vancouver Machine Learning 2019: Conference Notes


VanML 2019 Conference Notes

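Right after NeurIPS 2019 came the Vancouver Machine Learning: Genomics meeting. This was actually my first academic conference.
I did have one chance as an undergraduate, when the Nature conference Agricultural Genomics 2017 was held at the National Key Laboratory of Crop Genetic Improvement at my agricultural university. I could have gone to listen (if I had wanted to),
but in the end I did not make it. That was probably because I was busy preparing for the mathematical modeling contest at the time. The two events fell back to back, so once the national contest was over I was too exhausted to do anything.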

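Back to the point: this meeting was highly specialized and purely a research workshop, small but high in quality. I learned a lot about which directions other leading researchers in the field are working on and which methods they use. Quite rewarding overall (and the food was really good ԅ(¯﹃¯ԅ))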

Itinerary

Arrived at the port at noon on the 15th and took the ferry to Vancouver.

Scenery on the way to the pier
Arriving at the terminal
The ferry
En route
Views at sea
Arriving in Vancouver

It was really cold at night, and there was hardly anyone on the streets

The conference

Sessions at SFU

First session

Dr. Jennifer Listgarten, Professor in the Electrical Engineering and Computer Sciences department and the Center for Computational Biology at Berkeley.

Theme: Machine Learning in protein detection.

Major Problems and Objectives

  • Proteins we want to search for
    • Problem: a massive search space of 10^L sequences. At L ~ 50 that count is already close to the number of atoms in outer space, and L is usually larger.
  • Proteins we want to refine, e.g.
    • Carbon fixation (RuBisCo)
    • Viral delivery for gene therapy
    • Genetic scissors (CRISPR)
  • Similar story for small molecules in drug design

(Copyright © https://blog.csdn.net/s_gy_zetrov. All Rights Reserved)

How exactly do we do protein design/optimization?

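Traditionally this is done with lab-based measurement, as in the directed evolution work of Frances Arnold, winner of the 2018 Nobel Prize in Chemistry. The process is a loop: mutate the current sequences, measure the variants in the lab, keep the best ones, and repeat.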

We can use machine learning to improve this greedy random search!

  1. Replace lab measurement with a predictive model
  2. Replace random greedy search with intelligent search

A normal predictive model (a.k.a. a stochastic oracle) takes a sequence as input and returns a prediction of the property, with some uncertainty.

Assumption: given a "black box" oracle (something that can predict a property from a sequence), we want a method that tells us which sequence to choose to either maximize the property or hit a specified value.

Introducing a solution based on model-based optimization (MBO)

Related paper:

https://arxiv.org/pdf/1905.10474.pdf

We replace search over x with search over 𝜃 in P(x|𝜃).


Advantages:

  • Model can sample broad areas of sequence space
  • Does not require gradients of f
  • Can incorporate uncertainty
  • Provides a set of candidate solutions, not just one
  • Search is now cast in the language of probability

Detailed algorithm:

  1. Sample from the "search model" P(x|𝜃)
  2. Evaluate the samples on f(x)
  3. Update 𝜃_t to 𝜃_{t+1} so that the generative model favors sequences with a large value of f(x)
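
The three steps above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the method from the paper: the search model is a per-position categorical distribution, the oracle is a toy function that counts matches to a made-up `TARGET` string, and the update simply refits the model on the top-scoring fraction of samples (an EDA-style rule). All names here (`oracle`, `sample`, `refit`, `mbo`, `TARGET`) are hypothetical.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids
TARGET = "MKTAYIAKQR"              # hypothetical optimum, for illustration only


def oracle(seq):
    """Toy stand-in for f(x): fraction of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


def sample(theta, n, rng):
    """Step 1: draw n sequences from the search model P(x | theta)."""
    return ["".join(rng.choices(ALPHABET, weights=row)[0] for row in theta)
            for _ in range(n)]


def refit(seqs, pseudocount=0.1):
    """Step 3: re-estimate theta from the selected sequences (smoothed counts)."""
    theta = []
    for i in range(len(TARGET)):
        counts = [pseudocount + sum(s[i] == a for s in seqs) for a in ALPHABET]
        total = sum(counts)
        theta.append([c / total for c in counts])
    return theta


def mbo(n_iter=30, n_samples=200, keep=0.2, seed=0):
    rng = random.Random(seed)
    # Start from a uniform distribution at every position.
    theta = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in TARGET]
    history = []
    for _ in range(n_iter):
        seqs = sample(theta, n_samples, rng)
        ranked = sorted(seqs, key=oracle, reverse=True)   # Step 2: evaluate f(x)
        elite = ranked[: int(keep * n_samples)]           # keep the best fraction
        theta = refit(elite)
        history.append(sum(map(oracle, seqs)) / n_samples)
    return theta, history


theta, history = mbo()
best = "".join(ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in theta)
print(best, round(history[0], 3), round(history[-1], 3))
```

Note that the loop never needs a gradient of `oracle`, and the final `theta` describes a whole distribution of candidates rather than a single sequence.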

Building on MBO: Design by Adaptive Sampling (DbAS)

Related paper:

https://arxiv.org/pdf/1810.03714.pdf

This method aims to solve the MBO Objective:

[Figure: the MBO objective, with one term circled in green and one circled in blue]

The term in the green circle can be handled with Estimation of Distribution Algorithms (EDAs), a widely used approach to solving the MBO objective. One important aspect of any EDA is the choice of search model. In this scenario we can choose a variational autoencoder (VAE), a hidden Markov model (HMM), etc.

In the blue circle, S is the desired set of property values (e.g. fluorescence > α)

The term in the blue circle is the stochastic predictive model (the "oracle"), which maps an input sequence to a distribution over property values (e.g. through its CDF).
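
For reference, the objective in the DbAS paper linked above can be written roughly as follows. This is a sketch of the notation rather than a copy of the slide, so treat the exact form as an assumption. Here P(x|𝜃) is the search model and P(S|x) is the oracle's probability that sequence x has a property value in the desired set S.

```latex
\hat{\theta}
  = \operatorname*{arg\,max}_{\theta} \, \log P(S \mid \theta)
  = \operatorname*{arg\,max}_{\theta} \, \log \mathbb{E}_{P(x \mid \theta)} \bigl[ P(S \mid x) \bigr]
```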

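DbAS is intimately related to EDAs: like an EDA, it repeatedly samples from the search model, scores the samples with the oracle, and refits the search model to the weighted samples.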
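
A DbAS-style iteration can be sketched as follows. This is a simplified illustration, not the authors' implementation. It assumes a Gaussian oracle with a made-up mean function and a fixed noise level `SIGMA`, a per-position categorical search model, and a threshold `gamma` that is raised to a quantile of the current samples at each step. The names `oracle_mean`, `prob_in_set`, `weighted_refit` and `dbas` are hypothetical.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"
TARGET = "MKTAYIAKQR"   # hypothetical optimum, for illustration only
SIGMA = 0.05            # assumed oracle noise (standard deviation)


def oracle_mean(seq):
    """Toy oracle mean mu(x): fraction of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET)) / len(TARGET)


def prob_in_set(seq, gamma):
    """P(S | x) for S = {y >= gamma}, with y | x ~ Normal(mu(x), SIGMA^2)."""
    z = (gamma - oracle_mean(seq)) / SIGMA
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # equals 1 - Phi(z)


def weighted_refit(seqs, weights, pseudocount=0.1):
    """Weighted maximum-likelihood fit of a per-position categorical model."""
    theta = []
    for i in range(len(TARGET)):
        counts = [pseudocount + sum(w for s, w in zip(seqs, weights) if s[i] == a)
                  for a in ALPHABET]
        total = sum(counts)
        theta.append([c / total for c in counts])
    return theta


def dbas(n_iter=30, n_samples=200, quantile=0.8, seed=0):
    rng = random.Random(seed)
    theta = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in TARGET]
    gamma = -math.inf
    for _ in range(n_iter):
        seqs = ["".join(rng.choices(ALPHABET, weights=row)[0] for row in theta)
                for _ in range(n_samples)]
        means = sorted(oracle_mean(s) for s in seqs)
        # Tighten the target set S = {y >= gamma}; the threshold never decreases.
        gamma = max(gamma, means[int(quantile * (n_samples - 1))])
        # Weight each sample by the oracle's probability of landing in S.
        weights = [prob_in_set(s, gamma) for s in seqs]
        theta = weighted_refit(seqs, weights)
    return theta, gamma


theta, gamma = dbas()
print(gamma)
```

Compared with a hard top-k selection, the weights come from the oracle's own uncertainty: a sequence predicted just below the threshold still contributes a little to the refit.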

Problems with all of this

All of this simply assumes that the oracle is unbiased and has good uncertainty estimates. But there are "black holes" in these models (neural networks). For example, adversarial noise can cause an image to be misclassified.

[Figure: adversarial noise causing an image to be misclassified]

Credit: https://www.ibm.com/developerworks/community/blogs/Analytics4Gov/entry/Adversarial_Robustness_Toolbox_How_IBM_protects_your_Neural_Network_against_adversarial_attacks?lang=en

So in fact, these models are not intelligent at all! They just work better on some problems.

Second session


Dr. Sharon Browning, Professor in the Department of Biostatistics at the University of Washington. Co-developed the popular BEAGLE software in collaboration with Dr. Brian Browning. Recent work includes investigations of the contribution of archaic humans to current-day human genomics, and extensions to the BEAGLE software involving identity-by-descent.

Finding identity by descent (IBD) segments in population samples

Tested on the UK Biobank dataset (500K individuals)

Identity by descent (IBD): segments of DNA that individuals share because they inherited them from a common ancestor.
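
As a toy illustration of what a shared segment looks like in data, the sketch below scans two haplotypes (strings of 0/1 alleles) for long runs of identical alleles. This only checks identity by state over a run of markers, used here as a rough proxy for IBD. It is not the method from the paper or from BEAGLE, and `shared_segments`, `hap_a` and `hap_b` are made-up names and data.

```python
def shared_segments(hap_a, hap_b, min_len):
    """Return (start, end) pairs, end exclusive, for runs of identical
    alleles that span at least min_len markers."""
    segments = []
    start = None
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if start is None:
                start = i            # a new matching run begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None             # a mismatch ends the current run
    n = min(len(hap_a), len(hap_b))
    if start is not None and n - start >= min_len:
        segments.append((start, n))  # a run that reaches the end
    return segments


hap_a = "0110100111010010"
hap_b = "0010100111010110"
print(shared_segments(hap_a, hap_b, min_len=5))  # prints [(2, 13)]
```

Real IBD detection has to cope with genotype errors, phasing errors and genetic map distances, which is why a plain exact-match scan like this is only a starting point.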

Related paper:

https://www.biorxiv.org/content/10.1101/2019.12.12.874685v1

The smaller the population, the more IBD sharing there is.


Third session

Dr. David Aanensen, Director of the Centre for Genomic Pathogen Surveillance at the Wellcome Sanger Institute, Senior Group Leader in Genomic Surveillance at the Big Data Institute at the University of Oxford, and Director of the NIHR-funded Global Health Research Unit on Genomic Surveillance of Antimicrobial Resistance.

Online platform for visualizing and identifying pathogens

Example of a disease outbreak: note that the tree shapes are similar. First there are a few branches, then there is an outbreak.


Online platform

https://microreact.org/showcase

https://pathogen.watch/

Fourth session

Dr. Alexandre Bouchard-Côté, Professor of Statistics at the University of British Columbia. His research focuses on computational statistics and machine learning, with applications to linguistics and biology.

A Bayesian method to reconstruct single-cell phylogenetic trees from copy number events, such as those that arise in cancers with high genomic instability

The method is motivated by low-depth genome-wide data, which can be obtained for increasingly large numbers of cells thanks to technologies such as Direct Library Preparation or 10x Single Cell Genomics. Computing the posterior distribution in this model at scale is challenging. Recent advances in Bayesian computational statistics can be used to parallelize the posterior inference computation over an arbitrary number of cores, touching on topics such as non-reversible methods and change-of-measure approaches. The posterior inference methods described are available through an open-source Bayesian modelling language called Blang, which can be used for a range of phylogenetic problems, including more traditional phylogenetic models, as well as other Bayesian analysis problems. The motivating copy-number-based phylogenetic model is implemented in Blang and available in a cancer Bayesian phylogenetics and population genetics library that is under active development. This library has been used to infer phylogenetic trees on over 4000 cells using over 60 cores.
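
To give a feel for how posterior sampling can be spread over several chains, here is a toy parallel tempering sketch with a deterministic even/odd swap schedule, which is one kind of non-reversible scheme. That this matches what the talk covered is an assumption, and the code is not Blang's implementation. The bimodal 1D target, the `betas` ladder, and the names `log_target` and `parallel_tempering` are all made up for illustration. Chain i targets the density raised to the power `betas[i]`.

```python
import math
import random


def log_target(x):
    """Toy bimodal log-density: equal mix of Normal(-4, 0.5^2) and Normal(4, 0.5^2),
    up to an additive constant."""
    a = -0.5 * ((x + 4.0) / 0.5) ** 2
    b = -0.5 * ((x - 4.0) / 0.5) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio, rng):
    """Metropolis acceptance test for a given log acceptance ratio."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def parallel_tempering(betas, n_rounds=5000, step=1.0, seed=0):
    rng = random.Random(seed)
    states = [-4.0] * len(betas)     # every chain starts in the left mode
    cold_samples = []
    for r in range(n_rounds):
        # Local exploration: one random-walk Metropolis step per chain.
        # These steps are independent, so they could run on separate cores.
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.gauss(0.0, step)
            if accept(beta * (log_target(proposal) - log_target(states[i])), rng):
                states[i] = proposal
        # Communication: even pairs on even rounds, odd pairs on odd rounds.
        for i in range(r % 2, len(betas) - 1, 2):
            log_ratio = (betas[i] - betas[i + 1]) * (
                log_target(states[i + 1]) - log_target(states[i]))
            if accept(log_ratio, rng):
                states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[-1])   # the last chain has beta = 1
    return cold_samples


betas = [0.01, 0.03, 0.1, 0.3, 1.0]
samples = parallel_tempering(betas)
frac_right = sum(x > 0 for x in samples) / len(samples)
print(round(frac_right, 2))
```

A single random-walk chain started at -4 would almost never reach the mode at +4. With tempering, the flatter chains cross the barrier and hand those states down to the `beta = 1` chain through swaps.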

https://www.stat.ubc.ca/~bouchard/blang/
