Multi-GPU training with Keras
multi_gpu_model(model, gpus, cpu_merge=True, cpu_relocation=False)
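This function has two extremely important parameters, cpu_merge and cpu_relocation. They decide whether your training succeeds or fails: if, for example, you set them to False but the model is actually held on the CPU during training, or the other way around, the parameter updates will fail.
The correct usage is to follow the official example. This is how I do it: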
# Build the template model under the CPU scope, then replicate it onto 2 GPUs.
with tf.device('/cpu:0'):
    predict_model = keras.models.Model(inputs=input_images, outputs=cls_pred)
muti_gpu_model = keras.utils.multi_gpu_model(model=predict_model, gpus=2)
Freeze test:
One open question: when I want to freeze parameters, should I freeze the multi-GPU model, the original model, or both?
Checking device names:
tf.test.gpu_device_name()
Returns the name of a GPU device if available or the empty string.
tf.contrib.eager.list_devices()
Returns the names of the available devices, as a list.
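A minimal sketch of how these lookups might be wrapped. Note that tf.contrib only exists in TensorFlow 1.x, so this sketch swaps in tf.config.list_physical_devices, assuming TensorFlow 2.1 or later. The helper names gpu_device_name and list_gpu_devices are my own, and the fallback for a missing TensorFlow install is only there so the snippet runs anywhere.

```python
try:
    import tensorflow as tf
except ImportError:  # TensorFlow is not installed on this machine
    tf = None


def gpu_device_name() -> str:
    """Return the name of a GPU device, or '' if none is available."""
    if tf is None:
        return ""
    return tf.test.gpu_device_name()


def list_gpu_devices() -> list:
    """Return the names of the physical GPUs (assumes TensorFlow >= 2.1)."""
    if tf is None:
        return []
    return [device.name for device in tf.config.list_physical_devices("GPU")]


name = gpu_device_name()
print("GPU device:", name if name else "none found")
print("GPU devices:", list_gpu_devices())
```

On a machine without a GPU both calls return empty results, so it is worth checking them before passing gpus=2 to multi_gpu_model.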
Functions for using a specific device:
import tensorflow as tf

# Place the operations on device "GPU:0" in the "ps" job.
device_spec = tf.DeviceSpec(job="ps", device_type="GPU", device_index=0)
with tf.device(device_spec):
    # Both my_var and squared_var will be placed on /job:ps/device:GPU:0.
    my_var = tf.Variable(..., name="my_variable")
    squared_var = tf.square(my_var)
If a DeviceSpec is partially specified, it will be merged with other DeviceSpecs according to the scope in which it is defined. DeviceSpec components defined in inner scopes take precedence over those defined in outer scopes.
with tf.device(tf.DeviceSpec(job="train")):
    with tf.device(tf.DeviceSpec(job="ps", device_type="GPU", device_index=0)):
        # Nodes created here will be assigned to /job:ps/device:GPU:0.
        pass
    with tf.device(tf.DeviceSpec(device_type="GPU", device_index=1)):
        # Nodes created here will be assigned to /job:train/device:GPU:1.
        pass
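To make the merge rule concrete, here is a small pure-Python sketch of it. It does not use TensorFlow and is not how the library implements the merge; merge_specs and FIELDS are hypothetical names, and specs are modelled as plain dicts where a missing or None field means "unspecified".

```python
# Hypothetical model of the nesting rule: inner fields win, unset ones are inherited.
FIELDS = ("job", "replica", "task", "device_type", "device_index")


def merge_specs(outer: dict, inner: dict) -> dict:
    """Merge two partial specs; fields set in `inner` take precedence."""
    merged = {}
    for field in FIELDS:
        inner_value = inner.get(field)
        # Compare against None so that a device_index of 0 still counts as set.
        merged[field] = inner_value if inner_value is not None else outer.get(field)
    return merged


outer = {"job": "train"}
inner_a = {"job": "ps", "device_type": "GPU", "device_index": 0}
inner_b = {"device_type": "GPU", "device_index": 1}

merged_a = merge_specs(outer, inner_a)
merged_b = merge_specs(outer, inner_b)
print(merged_a["job"], merged_a["device_type"], merged_a["device_index"])  # ps GPU 0
print(merged_b["job"], merged_b["device_type"], merged_b["device_index"])  # train GPU 1
```

The two results match the comments in the nested example: the first inner scope overrides the job, while the second inherits job="train" from the outer scope.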
A DeviceSpec consists of 5 components, each of which is optionally specified:
- Job: The job name.
- Replica: The replica index.
- Task: The task index.
- Device type: The device type string (e.g. "CPU" or "GPU").
- Device index: The device index.
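As a rough illustration of how the five components map onto device strings such as /job:ps/device:GPU:0, here is a sketch with a stand-in class. SimpleDeviceSpec is a hypothetical name, not the TensorFlow class, and printing "*" for an unspecified device index is my assumption for representing "any" index.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SimpleDeviceSpec:
    """Hypothetical stand-in holding the five optional components."""
    job: Optional[str] = None
    replica: Optional[int] = None
    task: Optional[int] = None
    device_type: Optional[str] = None
    device_index: Optional[int] = None

    def to_string(self) -> str:
        """Join only the components that were specified."""
        parts = []
        if self.job is not None:
            parts.append(f"/job:{self.job}")
        if self.replica is not None:
            parts.append(f"/replica:{self.replica}")
        if self.task is not None:
            parts.append(f"/task:{self.task}")
        if self.device_type is not None:
            # Assumption: "*" stands for "any" index when none is given.
            index = "*" if self.device_index is None else str(self.device_index)
            parts.append(f"/device:{self.device_type}:{index}")
        return "".join(parts)


full = SimpleDeviceSpec(job="ps", replica=0, task=0, device_type="GPU", device_index=0)
partial = SimpleDeviceSpec(job="ps", device_type="GPU", device_index=0)
print(full.to_string())     # /job:ps/replica:0/task:0/device:GPU:0
print(partial.to_string())  # /job:ps/device:GPU:0
```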
__init__
__init__(
job=None,
replica=None,
task=None,
device_type=None,
device_index=None
)
Create a new DeviceSpec object.
Args:
- job: string. Optional job name.
- replica: int. Optional replica index.
- task: int. Optional task index.
- device_type: Optional device type string (e.g. "CPU" or "GPU").
- device_index: int. Optional device index. If left unspecified, device represents 'any' device_index.
A ready-made, packaged module for this:
tf.distribute