deepfacelab中文网


Total newbie's first post — help needed!!

OP | Posted 2023-2-23 12:04:43
I'm a complete beginner. Laptop RTX 3050 with 4 GB VRAM. Quick96 works fine and runs fairly fast, but I have a few questions for the experts:

1. With Quick96, is it true that the face never becomes sharp no matter how many hundred thousand iterations I run? It always looks slightly blurred.

2. I can't train with SAEHD at all — what's wrong? Here's the output:

======================== Model Summary ========================
==                                                           ==
==            Model name: hhh_SAEHD                          ==
==                                                           ==
==     Current iteration: 0                                  ==
==                                                           ==
==---------------------- Model Options ----------------------==
==                                                           ==
==            resolution: 128                                ==
==             face_type: f                                  ==
==     models_opt_on_gpu: True                               ==
==                 archi: liae-ud                            ==
==               ae_dims: 256                                ==
==                e_dims: 64                                 ==
==                d_dims: 64                                 ==
==           d_mask_dims: 22                                 ==
==       masked_training: True                               ==
==       eyes_mouth_prio: True                               ==
==           uniform_yaw: False                              ==
==         blur_out_mask: True                               ==
==             adabelief: True                               ==
==            lr_dropout: n                                  ==
==           random_warp: True                               ==
==      random_hsv_power: 0.0                                ==
==       true_face_power: 0.0                                ==
==      face_style_power: 0.0                                ==
==        bg_style_power: 0.0                                ==
==               ct_mode: none                               ==
==              clipgrad: False                              ==
==              pretrain: False                              ==
==       autobackup_hour: 1                                  ==
== write_preview_history: False                              ==
==           target_iter: 0                                  ==
==       random_src_flip: False                              ==
==       random_dst_flip: True                               ==
==            batch_size: 4                                  ==
==             gan_power: 0.0                                ==
==        gan_patch_size: 16                                 ==
==              gan_dims: 16                                 ==
==                                                           ==
==----------------------- Running On ------------------------==
==                                                           ==
==          Device index: 0                                  ==
==                  Name: NVIDIA GeForce RTX 3050 Laptop GPU ==
==                  VRAM: 1.63GB                             ==
==                                                           ==
===============================================================
Starting. Press "Enter" to stop training and save model.

Trying to do the first iteration. If an error occurs, reduce the model parameters.

!!!
Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly.
https://i.imgur.com/B7cmDCB.jpg
!!!
You are training the model from scratch. It is strongly recommended to use a pretrained model to speed up the training and improve the quality.




Error: 2 root error(s) found.
  (0) Resource exhausted: failed to allocate memory
         [[node mul_229 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:63) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[concat_8/concat/_123]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: failed to allocate memory
         [[node mul_229 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:63) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node mul_229:
src_dst_opt/ms_decoder/upscalem0/conv1/weight_0/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:37)

Input Source operations connected to node mul_229:
src_dst_opt/ms_decoder/upscalem0/conv1/weight_0/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:37)

Original stack trace for 'mul_229':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 564, in on_initialize
    src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 63, in get_update_op
    m_t = self.beta_1*ms + (1.0-self.beta_1) * g
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1076, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1400, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1710, in _mul_dispatch
    return multiply(x, y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 530, in multiply
    return gen_math_ops.mul(x, y, name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6245, in mul
    "Mul", x=x, y=y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
    return fn(*args)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
    target_list, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: failed to allocate memory
         [[{{node mul_229}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[concat_8/concat/_123]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: failed to allocate memory
         [[{{node mul_229}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
    iter, iter_time = model.train_one_iter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
    losses = self.onTrainOneIter()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
    src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
    self.target_dstm_em:target_dstm_em,
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
    run_metadata_ptr)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
    feed_dict_tensor, options, run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
    run_metadata)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: failed to allocate memory
         [[node mul_229 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:63) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

         [[concat_8/concat/_123]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

  (1) Resource exhausted: failed to allocate memory
         [[node mul_229 (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:63) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node mul_229:
src_dst_opt/ms_decoder/upscalem0/conv1/weight_0/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:37)

Input Source operations connected to node mul_229:
src_dst_opt/ms_decoder/upscalem0/conv1/weight_0/read (defined at D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py:37)

Original stack trace for 'mul_229':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
    debug=debug)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
    self.on_initialize()
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 564, in on_initialize
    src_dst_loss_gv_op = self.src_dst_opt.get_update_op (nn.average_gv_list (gpu_G_loss_gvs))
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\optimizers\AdaBelief.py", line 63, in get_update_op
    m_t = self.beta_1*ms + (1.0-self.beta_1) * g
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\variables.py", line 1076, in _run_op
    return tensor_oper(a.value(), *args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1400, in r_binary_op_wrapper
    return func(x, y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 1710, in _mul_dispatch
    return multiply(x, y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
    return target(*args, **kwargs)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\math_ops.py", line 530, in multiply
    return gen_math_ops.mul(x, y, name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_math_ops.py", line 6245, in mul
    "Mul", x=x, y=y, name=name)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
    attrs=attr_protos, op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
    op_def=op_def)
  File "D:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)

[Attachment: sa.jpg]

Many thanks!
OP | Posted 2023-2-24 07:54:32
Last edited by softglow on 2023-2-24 09:04

Help, anyone — is this a case of running out of VRAM?
Posted 2023-2-25 13:50:58
Any OOM error means you've run out of VRAM. Quick96's default resolution is quite low, which is why it can run at all.
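
For reference, a trimmed-down SAEHD configuration along these lines will often fit in 4 GB of VRAM — the exact values are illustrative, not guaranteed, and what fits depends on your source material and driver overhead:

    resolution: 96
    ae_dims: 128
    e_dims: 48
    d_dims: 48
    d_mask_dims: 16
    models_opt_on_gpu: False   (keeps optimizer state in system RAM instead of VRAM)
    adabelief: False           (AdaBelief keeps extra optimizer state per weight)
    batch_size: 2

Lower resolution and dims have the biggest effect; batch size can be raised again once training starts without errors.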
Posted 2023-2-25 13:54:21
With Quick96, more iterations is not always better — it depends on your material. Sometimes the model actually gets worse the longer it trains, and the parameters matter too. Quick96 is fine for casual training, but for a serious, polished result it won't do, because you can't adjust its parameters. Your problem, though, is probably just a hardware limitation...
OP | Posted 2023-2-25 19:17:11
Thanks a lot — it was indeed the GPU falling short. After changing models_opt_on_gpu to "no" it runs. Many thanks for everyone's help!
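
For anyone landing here later: that option is one of the interactive prompts shown when you launch the SAEHD trainer. Answering it roughly like this (exact prompt wording may differ between DFL builds) moves the model weights and optimizer state into system RAM, trading speed for VRAM headroom:

    [y] Place models and optimizer on GPU ( y/n ?:help ) : n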
OP | Posted 2023-2-25 19:23:30
Quick96 really isn't good enough. My source footage is all 1920×1080 close-up faces; after 200,000 iterations the motion keeps up fine, but the face is blurry and the mask edges are harsh. It's only good for a first taste.
Posted 2023-3-3 10:38:23
Last edited by come3002 on 2023-3-3 10:43

OP: in DFL, VRAM is productivity. On your laptop the 3050 only has about 1 GB of VRAM actually available, which is far too little — normally you'd want around 3 GB.
I suggest reading the tutorials on raising VRAM utilization; get it to at least 2.7 GB.
Another strategy is to enable virtual memory: put the page file on the same drive as DFL, with a minimum size of 70 GB and a maximum of 100 GB (see the sketch below).

My laptop only has a 1650, weaker than yours, with 3.2 GB usable and a 70-100 GB page file. It can run Cat's 224 model and even 256 —
far better results than 128.
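
A minimal sketch of the page-file setup described above, assuming Windows 10 and DFL installed on drive D: (the drive and sizes are just this thread's suggestion):

    1. System Properties -> Advanced -> Performance "Settings..." -> Advanced -> Virtual memory "Change..."
    2. Untick "Automatically manage paging file size for all drives"
    3. Select the DFL drive (D: here) and choose "Custom size"
    4. Initial size: 71680 MB (70 GB), Maximum size: 102400 MB (100 GB)
    5. Click Set, then OK, and reboot

This doesn't add real VRAM; it gives the CPU-side model/optimizer data somewhere to spill when system RAM runs short.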
[Attachments: 1650 显存.jpg, 4.jpg, 正方形 对比明显.jpg — VRAM screenshot and side-by-side comparison]
