Can anyone tell me what this problem is?

横戈在马 · posted on 2022-5-15 16:36:32

==================== Model Summary ====================
==                                                 ==
==          Model name: TestMuskIronMan_SAEHD    ==
==                                                 ==
==    Current iteration: 0                         ==
==                                                 ==
==------------------ Model Options ------------------==
==                                                 ==
==          resolution: 256                      ==
==          face_type: wf                      ==
==    models_opt_on_gpu: True                      ==
==                archi: liae-ud                   ==
==             ae_dims: 512                      ==
==             e_dims: 128                      ==
==             d_dims: 128                      ==
==          d_mask_dims: 128                      ==
==    masked_training: False                   ==
==    eyes_mouth_prio: False                   ==
==          uniform_yaw: False                   ==
==       blur_out_mask: False                   ==
==          adabelief: True                      ==
==          lr_dropout: n                         ==
==          random_warp: True                      ==
==    random_hsv_power: 0.01                      ==
==    true_face_power: 0.0                      ==
==    face_style_power: 0.0                      ==
==       bg_style_power: 0.0                      ==
==             ct_mode: rct                      ==
==             clipgrad: True                      ==
==             pretrain: False                   ==
==    autobackup_hour: 4                         ==
== write_preview_history: False                   ==
==          target_iter: 0                         ==
==    random_src_flip: False                   ==
==    random_dst_flip: False                   ==
==          batch_size: 16                      ==
==          gan_power: 0.1                      ==
==       gan_patch_size: 32                      ==
==             gan_dims: 32                      ==
==                                                 ==

=======================================================
Starting. Press "Enter" to stop training and save model.

Trying to do the first iteration. If an error occurs, reduce the model parameters.

!!!
Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly.
https://i.imgur.com/B7cmDCB.jpg
!!!
You are training the model from scratch. It is strongly recommended to use a pretrained model to speed up the training and improve the quality.

Error: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Original stack trace for 'Conv2D_26':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 424, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 243, in forward
m = self.upscalem2(m)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 71, in forward
x = self.conv1(x)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2397, in conv2d
name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
target_list, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[{{node Conv2D_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[{{node Conv2D_26}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
iter, iter_time = model.train_one_iter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
losses = self.onTrainOneIter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
self.target_dstm_em:target_dstm_em,
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

      [[concat_15/concat/_151]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: OOM when allocating tensor with shape[16,1024,64,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node Conv2D_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:101) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Input Source operations connected to node Conv2D_26:
Pad_26 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:87)
decoder/upscalem2/conv1/weight/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py:61)

Original stack trace for 'Conv2D_26':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 424, in on_initialize
gpu_pred_src_src, gpu_pred_src_srcm = self.decoder(gpu_src_code)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 243, in forward
m = self.upscalem2(m)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\archis\DeepFakeArchi.py", line 71, in forward
x = self.conv1(x)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2D.py", line 101, in forward
x = tf.nn.conv2d(x, weight, strides, 'VALID', dilations=dilations, data_format=nn.data_format)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2397, in conv2d
name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 972, in conv2d
data_format=data_format, dilations=dilations, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

asd123aaa · posted on 2022-5-15 16:44:47

On Windows 10 you need to enable hardware-accelerated GPU scheduling. It is under Display settings -> Graphics settings.

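横戈在马 · posted on 2022-5-15 17:20:17

asd123aaa posted on 2022-5-15 16:44
On Windows 10 you need to enable hardware-accelerated GPU scheduling. It is under Display settings -> Graphics settings.

OK, I'll give it a try.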

zzz上海 · posted on 2022-5-15 17:34:42

You ran out of VRAM. Lower the batch size (bs).
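
For a rough sense of scale, the tensor named in the OOM message can be sized by hand. The sketch below is only back-of-the-envelope arithmetic: it assumes "type float" means 4 bytes per element and that the first dimension of shape[16,1024,64,64] is the batch (it matches batch_size: 16 in the summary). The helper name `tensor_bytes` is made up for illustration and is not part of DeepFaceLab or TensorFlow.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


MIB = 1024 * 1024

# Shape copied from the OOM message; first dimension assumed to be the batch.
failing_shape = (16, 1024, 64, 64)
print(tensor_bytes(failing_shape) / MIB, "MiB")  # 256.0 MiB

# The same tensor at smaller batch sizes (hypothetical values).
for batch_size in (16, 8, 4):
    shape = (batch_size,) + failing_shape[1:]
    print(batch_size, tensor_bytes(shape) / MIB, "MiB")
```

This is one activation tensor only. The full model also holds weights, optimizer state, and many other activations, so the total is far larger; the tensor in the message is just the allocation that no longer fit. Since its size scales linearly with the batch dimension, halving the batch size halves it.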

左骏 · posted on 2022-5-15 19:16:42

Following along to learn from this.

echo999 · posted on 2022-5-18 16:47:05

The OOM is caused by insufficient GPU memory.

烟鬼 · posted on 2022-5-18 17:08:31

Bro, with these settings even a 3090 would probably struggle.

横戈在马 · posted on 2022-5-21 06:04:14

Thanks for the replies, everyone. After I reduced the Patch Size, it worked.
