Please help, everyone: I can't get training to run at all

sanchez0120 · Posted on 2023-2-27 21:45:01

I trained with the default parameters, but all I get is errors. Could someone please take a look and tell me what is causing this?

Thanks in advance for the help.

Running trainer.

[new] No saved models found. Enter a name of a new model :
new

Model first run.

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
  [0] : NVIDIA GeForce RTX 3060 Laptop GPU

[0] Which GPU indexes to choose? :
0

[0] Autobackup every N hour ( 0..24 ?:help ) :
0
[n] Write preview history ( y/n ?:help ) :
n
[0] Target iteration :
0
[n] Flip SRC faces randomly ( y/n ?:help ) :
n
[y] Flip DST faces randomly ( y/n ?:help ) :
y
[4] Batch_size ( ?:help ) :
4
[128] Resolution ( 64-640 ?:help ) :
128
[f] Face type ( h/mf/f/wf/head ?:help ) : wf
wf
[liae-ud] AE architecture ( ?:help ) : df-udt
df-udt
[256] AutoEncoder dimensions ( 32-1024 ?:help ) :
256
[64] Encoder dimensions ( 16-256 ?:help ) :
64
[64] Decoder dimensions ( 16-256 ?:help ) :
64
[22] Decoder mask dimensions ( 16-256 ?:help ) :
22
[y] Masked training ( y/n ?:help ) :
y
[n] Eyes and mouth priority ( y/n ?:help ) : y
[n] Uniform yaw distribution of samples ( y/n ?:help ) : y
[n] Blur out mask ( y/n ?:help ) : y
[y] Place models and optimizer on GPU ( y/n ?:help ) : y
[y] Use AdaBelief optimizer? ( y/n ?:help ) : y
[n] Use learning rate dropout ( n/y/cpu ?:help ) : y
y
[y] Enable random warp of samples ( y/n ?:help ) : n
[0.0] Random hue/saturation/light intensity ( 0.0 .. 0.3 ?:help ) :
0.0
[0.0] GAN power ( 0.0 .. 5.0 ?:help ) : 0.1
0.1
[16] GAN patch size ( 3-640 ?:help ) :
16
[16] GAN dimensions ( 4-512 ?:help ) :
16
[0.0] 'True face' power. ( 0.0000 .. 1.0 ?:help ) :
0.0
[0.0] Face style power ( 0.0..100.0 ?:help ) :
0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) :
0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) : rct
rct
[n] Enable gradient clipping ( y/n ?:help ) :
n
[n] Enable pretraining mode ( y/n ?:help ) :
n
Initializing models: 100%|###############################################################| 7/7 [00:02<00:00,  3.41it/s]
Loaded 14256 packed faces from C:\DeepFaceLab_NVIDIA_RTX3000_series\workspace\data_src\aligned
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 462.57it/s]
Loaded 10229 packed faces from C:\DeepFaceLab_NVIDIA_RTX3000_series\workspace\data_dst\aligned
Sort by yaw: 100%|##################################################################| 128/128 [00:00<00:00, 675.91it/s]
======================== Model Summary ========================
==                                                          ==
==          Model name: new_SAEHD                         ==
==                                                          ==
==    Current iteration: 0                               ==
==                                                          ==
==---------------------- Model Options ----------------------==
==                                                          ==
==          resolution: 128                               ==
==          face_type: wf                               ==
==    models_opt_on_gpu: True                            ==
==                archi: df-udt                            ==
==             ae_dims: 256                               ==
==             e_dims: 64                               ==
==             d_dims: 64                               ==
==          d_mask_dims: 22                               ==
==    masked_training: True                            ==
==    eyes_mouth_prio: True                            ==
==          uniform_yaw: True                            ==
==       blur_out_mask: True                            ==
==          adabelief: True                            ==
==          lr_dropout: y                               ==
==          random_warp: False                            ==
==    random_hsv_power: 0.0                               ==
==    true_face_power: 0.0                               ==
==    face_style_power: 0.0                               ==
==       bg_style_power: 0.0                               ==
==             ct_mode: rct                               ==
==             clipgrad: False                            ==
==             pretrain: False                            ==
==    autobackup_hour: 0                               ==
== write_preview_history: False                            ==
==          target_iter: 0                               ==
==    random_src_flip: False                            ==
==    random_dst_flip: True                            ==
==          batch_size: 4                               ==
==          gan_power: 0.1                               ==
==       gan_patch_size: 16                               ==
==             gan_dims: 16                               ==
==                                                          ==
==----------------------- Running On ------------------------==
==                                                          ==
==       Device index: 0                               ==
==                Name: NVIDIA GeForce RTX 3060 Laptop GPU ==
==                VRAM: 3.41GB                            ==
==                                                          ==
===============================================================
Starting. Press "Enter" to stop training and save model.

Trying to do the first iteration. If an error occurs, reduce the model parameters.

!!!
Windows 10 users IMPORTANT notice. You should set this setting in order to work correctly.
https://i.imgur.com/B7cmDCB.jpg
!!!
You are training the model from scratch. It is strongly recommended to use a pretrained model to speed up the training and improve the quality.

Error: OOM when allocating tensor with shape[4,32,65,65] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node conv2d_transpose_1 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:81) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

Errors may have originated from an input operation.
Input Source operations connected to node conv2d_transpose_1:
stack_1 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:74)
D_src/upconvs_1/weight/read (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:43)
concat_4 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\PatchDiscriminator.py:184)

Original stack trace for 'conv2d_transpose_1':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 518, in on_initialize
gpu_pred_src_src_d2          = self.D_src(gpu_pred_src_src_masked_opt)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\PatchDiscriminator.py", line 183, in forward
x = tf.nn.leaky_relu( upconv(x), 0.2 )
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py", line 81, in forward
x = tf.nn.conv2d_transpose(x, weight, output_shape, strides, padding=self.padding, data_format=nn.data_format)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2613, in conv2d_transpose
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2698, in conv2d_transpose_v2
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1291, in conv2d_backprop_input
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

Traceback (most recent call last):
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1375, in _do_call
return fn(*args)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1360, in _run_fn
target_list, run_metadata)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1453, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,32,65,65] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[{{node conv2d_transpose_1}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 129, in trainerThread
iter, iter_time = model.train_one_iter()
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 474, in train_one_iter
losses = self.onTrainOneIter()
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 774, in onTrainOneIter
src_loss, dst_loss = self.src_dst_train (warped_src, target_src, target_srcm, target_srcm_em, warped_dst, target_dst, target_dstm, target_dstm_em)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 584, in src_dst_train
self.target_dstm_em:target_dstm_em,
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 968, in run
run_metadata_ptr)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1191, in _run
feed_dict_tensor, options, run_metadata)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1369, in _do_run
run_metadata)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\client\session.py", line 1394, in _do_call
raise type(e)(node_def, op, message)  # pylint: disable=no-value-for-parameter
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[4,32,65,65] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node conv2d_transpose_1 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:81) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info. This isn't available when running in Eager mode.

Errors may have originated from an input operation.
Input Source operations connected to node conv2d_transpose_1:
stack_1 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:74)
D_src/upconvs_1/weight/read (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py:43)
concat_4 (defined at C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\PatchDiscriminator.py:184)

Original stack trace for 'conv2d_transpose_1':
  File "threading.py", line 884, in _bootstrap
  File "threading.py", line 916, in _bootstrap_inner
  File "threading.py", line 864, in run
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\mainscripts\Trainer.py", line 58, in trainerThread
debug=debug)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\ModelBase.py", line 193, in __init__
self.on_initialize()
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\models\Model_SAEHD\Model.py", line 518, in on_initialize
gpu_pred_src_src_d2          = self.D_src(gpu_pred_src_src_masked_opt)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\ModelBase.py", line 117, in __call__
return self.forward(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\models\PatchDiscriminator.py", line 183, in forward
x = tf.nn.leaky_relu( upconv(x), 0.2 )
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\LayerBase.py", line 14, in __call__
return self.forward(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\DeepFaceLab\core\leras\layers\Conv2DTranspose.py", line 81, in forward
x = tf.nn.conv2d_transpose(x, weight, output_shape, strides, padding=self.padding, data_format=nn.data_format)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2613, in conv2d_transpose
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\util\dispatch.py", line 206, in wrapper
return target(*args, **kwargs)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\nn_ops.py", line 2698, in conv2d_transpose_v2
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\ops\gen_nn_ops.py", line 1291, in conv2d_backprop_input
name=name)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 750, in _apply_op_helper
attrs=attr_protos, op_def=op_def)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 3569, in _create_op_internal
op_def=op_def)
  File "C:\DeepFaceLab_NVIDIA_RTX3000_series\_internal\python-3.6.8\lib\site-packages\tensorflow\python\framework\ops.py", line 2045, in __init__
self._traceback = tf_stack.extract_stack_for_node(self._c_op)

by100 · Posted on 2023-2-27 21:58:30

Read through the forum tutorials. OOM usually means there is not enough VRAM.
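
As a rough worked example, the size of the tensor named in the error message can be computed from its shape. This is only a sketch: it assumes "type float" in the message means 4-byte float32, and the helper name tensor_bytes is made up for illustration. The failing allocation comes out at about 2 MiB, which suggests the VRAM was already almost full by the time this tensor was requested, rather than this one tensor being too large.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_element=4):
    """Return the size in bytes of a dense tensor with the given shape.

    bytes_per_element=4 assumes float32; that is an assumption about
    what "type float" means in the OOM message.
    """
    return reduce(mul, shape, 1) * bytes_per_element


# Shape copied from the error line: "OOM when allocating tensor with shape[4,32,65,65]"
oom_shape = (4, 32, 65, 65)
size = tensor_bytes(oom_shape)

# Prints: 2163200 bytes = 2.06 MiB
print(f"{size} bytes = {size / 2**20:.2f} MiB")
```

The first dimension (4) matches the batch_size shown in the model summary, so under the same assumption this particular tensor would shrink in proportion if the batch size were lowered.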

come3002 · Posted on 2023-2-28 14:19:40

Device index: 0                               ==
==                Name: NVIDIA GeForce RTX 3060 Laptop GPU ==
==                VRAM: 3.41GB                            ==
==                                                          ==
==========================
DFL demands a lot of GPU VRAM. VRAM is productivity.

Your 3060 has only 3.41 GB of usable VRAM. 3.4 GB is too little; even an ordinary model will run out of VRAM.

Even a 3050 8 GB can have up to 7.1 GB of usable VRAM.
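
To see how much VRAM is actually free before starting the trainer, something like the following could be used. It is a minimal sketch, assuming nvidia-smi is installed and on the PATH; the function names parse_gpu_memory and query_gpu_memory are made up for illustration, and the sample line at the bottom uses invented numbers, not a measurement from this thread.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, values in MiB without units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,memory.total,memory.used,memory.free",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse the CSV output of the query above into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        # Split from the right so a comma inside the GPU name would not break parsing.
        name, total, used, free = [field.strip() for field in line.rsplit(",", 3)]
        rows.append({
            "name": name,
            "total_mib": int(total),
            "used_mib": int(used),
            "free_mib": int(free),
        })
    return rows


def query_gpu_memory():
    """Run nvidia-smi and return the parsed rows (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)


# Hypothetical sample line (invented values), used so the parser can be tried
# without a GPU. On a real machine, call query_gpu_memory() instead.
sample = "Example GPU, 6144, 2652, 3492\n"
for gpu in parse_gpu_memory(sample):
    print(f"{gpu['name']}: {gpu['free_mib']} MiB free of {gpu['total_mib']} MiB")
```

If the free figure is far below the total, other programs are holding VRAM, and closing them before training may help.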

ronld · Posted on 2023-2-28 17:26:51

A laptop 3060? That is a VRAM error. Either switch to a smaller model or get a GPU with more VRAM.

lknet · Posted on 2023-3-1 11:15:20

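Try raising the virtual memory (page file) size, then answer n to "Place models and optimizer on GPU" so the model is kept in system RAM. It will be much slower, but it should run.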

sapphireshi · Posted on 2023-3-5 11:30:11

Try the ice build of DFL pinned at the top of the software download section. It should work.
