PyTorch: How to Automatically Optimize/Tune Model Hyperparameters

Background

Broadly speaking, there are two ways to improve a model's performance:

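1. Use NAS (neural architecture search) to find the best model architecture and parameter settings for a given task
2. Optimize the model's parameters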

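Admittedly, NAS consumes an enormous amount of compute, so it is only feasible with very large computing resources and has a high barrier to entry. The second approach needs far fewer resources, which makes it the more reasonable choice for users on a small budget. This is especially true in competitions such as Kaggle, where many teams do not have Google-scale compute, so the second approach is their most important means of improving model performance.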

Optimizing Model Parameters

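First, to be clear: "optimizing model parameters" here means optimizing a model's hyperparameters in the deep learning era. What is a hyperparameter? It is a model setting that must be chosen by hand rather than learned from data, such as the learning rate, the mini-batch size, or the decay rate. Parameter optimization in this article refers specifically to these hyperparameters.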
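The distinction can be illustrated with a minimal sketch, assuming a toy one-weight linear model (the function `train_linear` and the data are made up for illustration). The weight `w` is learned by gradient descent, while `learning_rate` and `epochs` are hyperparameters that have to be picked from outside:

```python
def train_linear(xs, ys, learning_rate, epochs=200):
    """Fit y = w * x by gradient descent on the mean squared error.

    `w` is a model parameter (learned from the data);
    `learning_rate` and `epochs` are hyperparameters (chosen by hand).
    """
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    return w, loss


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated with a true weight of 2

# Same data, same model; only the hyperparameter changes.
for lr in (0.001, 0.05, 0.2):
    w, loss = train_linear(xs, ys, learning_rate=lr)
    print(f"learning_rate={lr}: w={w:.4g}, loss={loss:.4g}")
```

On this toy problem a learning rate that is too small leaves the weight short of its target, and one that is too large makes training diverge, which is why the choice is worth searching over.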

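In the era of traditional machine learning, models with a relatively simple structure, such as random forests, have a limited number of hyperparameters whose ranges are usually small and often not continuous; for example, a random forest's depth parameter can only take integer values. In such cases hyperparameters were mainly selected, that is optimized, by grid search.
Clearly, grid search wastes a lot of compute and offers no guarantee of efficiency. Over time, three main tuning methods became established in traditional machine learning: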

grid search
random search
Bayesian optimization

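Of these, the latter two generally work better, and Bayesian optimization in particular performs best. It was widely used at the time in competitions such as Kaggle (I have seen a Kaggle team use Bayesian tuning to raise a model's accuracy from 76% to 95%).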
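The first two methods are easy to compare side by side. The following is a minimal sketch, assuming a made-up `toy_score` function that stands in for a real validation score; the hyperparameter names and ranges are illustrative only. Both searches get the same budget of 16 evaluations:

```python
import itertools
import math
import random


def toy_score(learning_rate, max_depth):
    """Hypothetical validation score, peaking at learning_rate=0.03 and max_depth=7."""
    lr_term = math.exp(-(math.log10(learning_rate) - math.log10(0.03)) ** 2)
    depth_term = math.exp(-((max_depth - 7) / 4) ** 2)
    return lr_term * depth_term


def grid_search(lr_values, depth_values):
    """Evaluate every combination on a fixed grid and return the best one."""
    candidates = itertools.product(lr_values, depth_values)
    return max(candidates, key=lambda c: toy_score(*c))


def random_search(n_trials, seed=0):
    """Sample combinations at random (log-uniform learning rate) and return the best one."""
    rng = random.Random(seed)
    candidates = [(10 ** rng.uniform(-4, 0), rng.randint(2, 15)) for _ in range(n_trials)]
    return max(candidates, key=lambda c: toy_score(*c))


grid_best = grid_search([1e-4, 1e-3, 1e-2, 1e-1], [2, 6, 10, 14])  # 4 x 4 = 16 evaluations
rand_best = random_search(n_trials=16)                             # also 16 evaluations
print("grid  :", grid_best, round(toy_score(*grid_best), 4))
print("random:", rand_best, round(toy_score(*rand_best), 4))
```

Grid search can only ever return one of the four values listed per axis, whereas random search tries a different value of each hyperparameter on every trial. Neither method uses earlier results to decide where to look next, which is the gap Bayesian optimization fills.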

Bayesian Optimization

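The core idea of Bayesian optimization comes from Bayes' rule: use what has already been observed (how the model performed under some sets of hyperparameters) to infer how the model would perform under another set. In short, it estimates the whole hyperparameter space from a small number of evaluated combinations and picks the most promising combination from that estimate. The estimate is usually made with Gaussian process regression; see reference [1] for the details. If you want to try Bayesian optimization with a traditional algorithm, it can be used together with sklearn. Here is a snippet for testing, based on the bayes_opt package: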

# coding: utf-8

import numpy as np
from bayes_opt import BayesianOptimization
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Synthetic binary classification data
x, y = make_classification(n_samples=1000, n_features=10, n_classes=2)

# Baseline: cross-validated ROC AUC of a random forest with default hyperparameters
rf = RandomForestClassifier()
print(np.mean(cross_val_score(rf, x, y, cv=20, scoring='roc_auc')))


def rf_cv(n_estimators, min_samples_split, max_features, max_depth):
    '''Objective to maximize; every argument is a hyperparameter.'''
    # The optimizer proposes floats, so integer-valued hyperparameters are cast with int()
    val = cross_val_score(
        RandomForestClassifier(n_estimators=int(n_estimators),
                               min_samples_split=int(min_samples_split),
                               max_features=min(max_features, 0.999),
                               max_depth=int(max_depth),
                               random_state=2),
        x,
        y,
        scoring='roc_auc', cv=20
    ).mean()
    return val


# Search range (lower bound, upper bound) for each hyperparameter
rf_bo = BayesianOptimization(rf_cv,
                             {'n_estimators': (10, 250),
                              'min_samples_split': (2, 25),
                              'max_features': (0.1, 0.999),
                              'max_depth': (5, 15)},
                             )

rf_bo.maximize()
print(rf_bo.max)  # best score found and the hyperparameters that produced it
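To make the idea described in this section concrete, here is a minimal pure-Python sketch of a Bayesian optimization loop on a one-dimensional toy problem. It is an illustration under stated assumptions, not how bayes_opt is implemented: the `objective` function is a made-up stand-in for a cross-validation score, the surrogate is a zero-mean Gaussian process with a fixed RBF kernel, and the next point is chosen by maximizing expected improvement over a fixed grid.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel between two scalars."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)      # K^-1 y
    v = solve(K, k_star)      # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean, std, best):
    """Expected amount by which a point beats the best value seen so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def objective(x):
    """Hypothetical black box standing in for an expensive validation score."""
    return math.exp(-(x - 0.7) ** 2 / 0.02) + 0.5 * math.exp(-(x - 0.2) ** 2 / 0.01)


grid = [i / 100 for i in range(101)]   # candidate hyperparameter values in [0, 1]
xs = [0.0, 0.5, 1.0]                   # a few initial evaluations
ys = [objective(x) for x in xs]

for _ in range(10):
    best = max(ys)
    candidates = [x for x in grid if x not in xs]
    # Pick the candidate the surrogate considers most promising, then evaluate it for real.
    x_next = max(candidates,
                 key=lambda x: expected_improvement(*gp_posterior(xs, ys, x), best))
    xs.append(x_next)
    ys.append(objective(x_next))

best_y, best_x = max(zip(ys, xs))
print(f"best x = {best_x:.2f}, best score = {best_y:.4f}")
```

Each round spends only one real evaluation of `objective`; all the searching happens on the cheap surrogate, which is why the approach suits expensive models.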

Hyperparameter Optimization in Deep Learning Frameworks

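Here is the problem: in a deep learning framework, a model is usually far more complex than before and consumes far more compute. Even Bayesian methods, which in theory need fewer runs, become hard to afford.
The common answer is to build a dedicated optimization platform for a specific framework.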

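There are currently two mainstream options: 1. Google Vizier 2. PyTorch Ax and BoTorch [4]
The former is a tool Google originally built for internal hyperparameter tuning, described in a paper published in 2017 [2]. At the time of writing only its calling interface is public, and the optimization itself has to run on Google's cloud platform. Google Vizier is in fact a powerful AutoML framework with many good properties, but it is not friendly to ordinary developers.
The latter is the pair released by Facebook: BoTorch, a Bayesian optimization library built on PyTorch, and Ax, an adaptive experimentation platform. It is more broadly applicable. For teams or organizations with ordinary needs that want to develop quickly, PyTorch Ax and BoTorch are the recommended choice.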

Installation

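Installation is very simple and keeps PyTorch's usual easy-to-maintain style: just install with pip. For details, see the installation instructions of:
Ax
BoTorch
Note, however, that both of them require torch 1.5.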
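After installing (for example with `pip install ax-platform botorch`, the names under which the two projects are published on PyPI), the versions that ended up in the environment can be checked with a short script. This is a minimal sketch; `installed_version` is a helper name made up for illustration:

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("torch", "botorch", "ax-platform"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Comparing the reported torch version against the requirement of the BoTorch and Ax releases you installed helps catch a mismatch before running any optimization code.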

Usage

The first step is to use Ax and BoTorch to optimize a custom model of our own. Here is some code for reference:

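import os

import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumptions (not defined in the original snippet): the device and dtype used below,
# and the directory that holds the pretrained weights.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
dtype = torch.float
PRETRAINED_LOCATION = "./pretrained_models"


class VAE(nn.Module):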
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 400)
        self.fc21 = nn.Linear(400, 20)
        self.fc22 = nn.Linear(400, 20)
        self.fc3 = nn.Linear(20, 400)
        self.fc4 = nn.Linear(400, 784)

    def encode(self, x):
        h1 = F.relu(self.fc1(x))
        return self.fc21(h1), self.fc22(h1)

    def reparameterize(self, mu, logvar):
        std = torch.exp(0.5*logvar)
        eps = torch.randn_like(std)
        return mu + eps*std

    def decode(self, z):
        h3 = F.relu(self.fc3(z))
        return torch.sigmoid(self.fc4(h3))

    def forward(self, x):
        mu, logvar = self.encode(x.view(-1, 784))
        z = self.reparameterize(mu, logvar)
        return self.decode(z), mu, logvar

vae_model = VAE().to(device)
vae_state_dict = torch.load(os.path.join(PRETRAINED_LOCATION, "mnist_vae.pt"), map_location=device)
vae_model.load_state_dict(vae_state_dict)

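As shown above, this is just an ordinary PyTorch model definition. The next step is to build the objective function that Bayesian optimization will optimize, which is of course a black box. This function simply describes how well the model performs: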

def score(y):
    """Returns a 'score' for each digit from 0 to 9. It is modeled as a squared exponential
    centered at the digit '3'.
    """
    return torch.exp(-2 * (y - 3)**2)


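# Given the scoring function, we can now write our overall objective, which starts with an image and outputs a score. Let's say the objective computes the expected score given the probabilities from the classifier.
# Note: cnn_model is not defined in this article. It is assumed to be a pretrained MNIST
# classifier that returns log-probabilities (hence the torch.exp below).
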
def score_image_recognition(x):
    """The input x is an image and an expected score based on the CNN classifier and
    the scoring function is returned.
    """
    with torch.no_grad():
        probs = torch.exp(cnn_model(x))  # b x 10
        scores = score(torch.arange(10, device=device, dtype=dtype)).expand(probs.shape)
    return (probs * scores).sum(dim=1)


# Finally, we define a helper function `decode` that takes a batch of latent vectors and decodes them into 28 x 28 images with the VAE. We use batched Bayesian optimization to search over the 20-dimensional latent space.

def decode(train_x):
    with torch.no_grad():
        decoded = vae_model.decode(train_x)
    return decoded.view(train_x.shape[0], 1, 28, 28)
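As a quick check on what `score_image_recognition` computes, the expected score can be worked out by hand for a few probability vectors. This is a pure-Python sketch of the same arithmetic; the helper `expected_score` and the probability vectors are made up for illustration, and no classifier is involved:

```python
import math


def score(digit):
    """Same scoring rule as above: a squared exponential centered at the digit 3."""
    return math.exp(-2 * (digit - 3) ** 2)


def expected_score(probs):
    """Sum over digits of P(digit) * score(digit), for one image."""
    return sum(p * score(digit) for digit, p in enumerate(probs))


# Hypothetical classifier outputs over the digits 0..9
confident_three = [0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0]
two_or_three = [0, 0, 0.5, 0.5, 0, 0, 0, 0, 0, 0]
confident_seven = [0, 0, 0, 0, 0, 0, 0, 1.0, 0, 0]

for name, probs in [("confident 3", confident_three),
                    ("2 or 3", two_or_three),
                    ("confident 7", confident_seven)]:
    print(f"{name}: {expected_score(probs):.4g}")
```

The objective is therefore highest for images the classifier confidently reads as a 3 and falls off quickly for any other digit, which is what the optimizer is driving the decoded images toward.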

Next comes the Bayesian optimization itself, using BoTorch and Ax. The process has four main steps, which are introduced one by one in the code below:

# 1. Define how the model is initialized for each run
from botorch.models import SingleTaskGP
from gpytorch.mlls.exact_marginal_log_likelihood import ExactMarginalLogLikelihood


bounds = torch.tensor([[-6.0] * 20, [6.0] * 20], device=device, dtype=dtype)


def initialize_model(n=5):
    # generate training data  
    train_x = (bounds[1] - bounds[0]) * torch.rand(n, 20, device=device, dtype=dtype) + bounds[0]
    train_obj = score_image_recognition(decode(train_x))
    best_observed_value = train_obj.max().item()
    
    # define models for objective and constraint
    model = SingleTaskGP(train_X=train_x, train_Y=train_obj)
    model = model.to(train_x)
    
    mll = ExactMarginalLogLikelihood(model.likelihood, model)
    mll = mll.to(train_x)
    
    return train_x, train_obj, mll, model, best_observed_value

from botorch.optim import joint_optimize


BATCH_SIZE = 3

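# 2. Define how new candidates are proposed and evaluated
def optimize_acqf_and_get_observation(acq_func):
    """Optimizes the acquisition function, and returns a new candidate and a noisy observation"""

    # Optimize the acquisition function. The variables being searched are the 20 latent
    # dimensions limited by `bounds`; q=BATCH_SIZE is only the number of candidates
    # proposed per round, not a hyperparameter being tuned.
    # joint_optimize is the early-BoTorch name; later releases provide optimize_acqf,
    # which returns a (candidates, acquisition values) tuple instead.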
    candidates = joint_optimize(
        acq_function=acq_func,
        bounds=bounds,
        q=BATCH_SIZE,
        num_restarts=10,
        raw_samples=200,
    )

    # observe new values 
    new_x = candidates.detach()
    new_obj = score_image_recognition(decode(new_x))
    return new_x, new_obj
 
# 3. Set up the optimization run
from botorch import fit_gpytorch_model
from botorch.acquisition.monte_carlo import qExpectedImprovement
from botorch.sampling.samplers import SobolQMCNormalSampler

seed=1
torch.manual_seed(seed)

N_BATCH = 50
MC_SAMPLES = 2000
best_observed = []

# call helper function to initialize model
train_x, train_obj, mll, model, best_value = initialize_model(n=5)
best_observed.append(best_value)

# 4. Run the optimization
import warnings
warnings.filterwarnings("ignore")

print("\nRunning BO ", end='')

# run N_BATCH rounds of BayesOpt after the initial random batch
for iteration in range(N_BATCH):    

    # fit the model
    fit_gpytorch_model(mll)

    # define the qEI acquisition module using a QMC sampler
    qmc_sampler = SobolQMCNormalSampler(num_samples=MC_SAMPLES, seed=seed)
    qEI = qExpectedImprovement(model=model, sampler=qmc_sampler, best_f=best_value)

    # optimize and get new observation
    new_x, new_obj = optimize_acqf_and_get_observation(qEI)

    # update training points
    train_x = torch.cat((train_x, new_x))
    train_obj = torch.cat((train_obj, new_obj))

    # update progress
    best_value = score_image_recognition(decode(train_x)).max().item()
    best_observed.append(best_value)

    # reinitialize the model so it is ready for fitting on next iteration
    model.set_train_data(train_x, train_obj, strict=False)
    
    print(".", end='')

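In short, optimizing with BoTorch and Ax comes down to the four steps above. The key is to specify the parameters to be optimized correctly: their ranges go into `bounds`, and the optimizer returns the proposed values as `candidates`.
The steps above apply to a general-purpose surrogate such as Gaussian process regression. BoTorch also supports custom RBF surrogate functions; see the BoTorch documentation for details. Some research results [3] indicate that certain RBF functions optimize well.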

References

[1] https://www.cnblogs.com/marsggbo/p/9866764.html
[2] Google Vizier: A Service for Black-Box Optimization
[3] Efficient Hyperparameter Optimization of Deep Learning Algorithms Using Deterministic RBF Surrogates
[4] BoTorch: Programmable Bayesian Optimization in PyTorch
