大家看看这是什么问题呀？

横戈在马 · 发表于 2022-5-15 16:36:33

星级打分

1
2
3
4
5

平均分:NAN 参与人数:0 我的评分:未评

==================== Model Summary ====================
==                                                 ==
==          Model name: TestMuskIronMan_SAEHD    ==
==                                                 ==
==    Current iteration: 0                         ==
==                                                 ==
==------------------ Model Options ------------------==
==                                                 ==
==          resolution: 256                      ==
==          face_type: wf                      ==
==    models_opt_on_gpu: True                      ==
==                archi: liae-ud                   ==
==             ae_dims: 512                      ==
==             e_dims: 128                      ==
==             d_dims: 128                      ==
==          d_mask_dims: 128                      ==
==    masked_training: False                   ==
==    eyes_mouth_prio: False                   ==
==          uniform_yaw: False                   ==
==       blur_out_mask: False                   ==
==          adabelief: True                      ==
==          lr_dropout: n                         ==
==          random_warp: True                      ==
==    random_hsv_power: 0.01                      ==
==    true_face_power: 0.0                      ==
==    face_style_power: 0.0                      ==
==       bg_style_power: 0.0                      ==
==             ct_mode: rct                      ==
==             clipgrad: True                      ==
==             pretrain: False                   ==
==    autobackup_hour: 4                         ==
== write_preview_history: False                   ==
==          target_iter: 0                         ==
==    random_src_flip: False                   ==
==    random_dst_flip: False                   ==
==          batch_size: 16                      ==
==          gan_power: 0.1                      ==
==       gan_patch_size: 32                      ==
==             gan_dims: 32                      ==
==                                                 ==

=======================================================
Starting. Press "Enter" to stop training and save model.

Trying to do the first iteration. If an error occurs, reduce the model parameters.

!!!
Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly.
https://i.imgur.com/B7cmDCB.jpg
!!!
You are training the model from scratch. It is strongly recommended to use a pretrained model to speed up the training and improve the quality.

Error: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Original stack trace for 'Conv2D_26':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 424, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 243, in forward
m = self.upscalem2(m)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 71, in forward
x = self.conv1(x)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2397, in conv2d
name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
target_list, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[{{node Conv2D_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[{{node Conv2D_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
iter, iter_time = model.train_one_iter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
losses = self.onTrainOneIter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
self.target_dstm_em:target_dstm_em,
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Original stack trace for 'Conv2D_26':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 424, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 243, in forward
m = self.upscalem2(m)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 71, in forward
x = self.conv1(x)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2397, in conv2d
name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

横戈在马 · 发表于 2022-5-15 16:37:23

我百度了下，好像是显存问题，但是我这些参数都不大呀，不知道为啥

saimon · 发表于 2022-5-15 16:43:25

什么显卡啊，BS竟然开到16

横戈在马 · 发表于 2022-5-15 16:52:46

saimon 发表于 2022-5-15 16:43
什么显卡啊，BS竟然开到16

我bs改小点，试试

WinKK · 发表于 2022-5-15 17:25:27

==             e_dims: 128                      ==
==             d_dims: 128                      ==
==          d_mask_dims: 128                      ==

这几个参数很奇特啊，你这模型哪里来的呀？

来路不明的模型还是还要浪费时间也妙。

zzz上海 · 发表于 2022-5-15 17:34:06

爆显存了，bs开小点啊。

benlei · 发表于 2022-5-15 17:42:22

一般操作都是调整bs

1450898298 · 发表于 2022-5-15 20:18:34

BS竟然开到16太高了

		自动登录	找回密码
密码			立即注册（仅限QQ邮箱）