Experts, is this a case of insufficient VRAM? Is there a setting somewhere I can change?

cdf2dfl · posted 2021-1-17 12:49:04

Running trainer.

Choose one of saved models, or enter a name to create a new model.
[r] : rename
[d] : delete

[0] : 256liae - latest
: 0
0
Loading 256liae_SAEHD model...

Choose one or several GPU idxs (separated by comma).

[CPU] : CPU
  [0] : Quadro K1200

[0] Which GPU indexes to choose? : 0
0

Press enter in 2 seconds to override model settings.
[3] Autobackup every N hour ( 0..24 ?:help ) : 3
3
[n] Write preview history ( y/n ?:help ) : n
[0] Target iteration : 0
0
[n] Flip faces randomly ( y/n ?:help ) :
n
[8] Batch_size ( ?:help ) : 8
8
[n] Masked training ( y/n ?:help ) : y
[y] Eyes priority ( y/n ?:help ) : y
[y] Uniform yaw distribution of samples ( y/n ?:help ) :
y
[y] Place models and optimizer on GPU ( y/n ?:help ) :
y
[y] Use learning rate dropout ( n/y/cpu ?:help ) :
y
[n] Enable random warp of samples ( y/n ?:help ) :
n
[0.1] GAN power ( 0.0 .. 10.0 ?:help ) :
0.1
[0.0] Face style power ( 0.0..100.0 ?:help ) :
0.0
[0.0] Background style power ( 0.0..100.0 ?:help ) :
0.0
[none] Color transfer for src faceset ( none/rct/lct/mkl/idt/sot ?:help ) :
none
[n] Enable gradient clipping ( y/n ?:help ) :
n
[n] Enable pretraining mode ( y/n ?:help ) :
n
Initializing models: 100%|###############################################################| 7/7 [00:06<00:00,  1.08it/s]
Loading samples: 100%|##############################################################| 799/799 [00:02<00:00, 389.62it/s]
Sort by yaw: 100%|#################################################################| 128/128 [00:00<00:00, 3767.04it/s]
Loading samples: 100%|##############################################################| 464/464 [00:01<00:00, 462.90it/s]
Sort by yaw: 100%|#################################################################| 128/128 [00:00<00:00, 6404.28it/s]
============= Model Summary ==============
==                                     ==
==          Model name: 256liae_SAEHD ==
==                                     ==
==    Current iteration: 1041024    ==
==                                     ==
==----------- Model Options ------------==
==                                     ==
==          resolution: 256          ==
==          face_type: wf          ==
==    models_opt_on_gpu: True       ==
==                archi: liae-ud    ==
==             ae_dims: 256          ==
==             e_dims: 64          ==
==             d_dims: 64          ==
==          d_mask_dims: 22          ==
==    masked_training: True       ==
==          eyes_prio: True       ==
==          uniform_yaw: True       ==
==          lr_dropout: y          ==
==          random_warp: False       ==
==          gan_power: 0.1          ==
==    true_face_power: 0.0          ==
==    face_style_power: 0.0          ==
==       bg_style_power: 0.0          ==
==             ct_mode: none       ==
==             clipgrad: False       ==
==             pretrain: False       ==
==    autobackup_hour: 3          ==
== write_preview_history: False       ==
==          target_iter: 0          ==
==          random_flip: False       ==
==          batch_size: 8          ==
==                                     ==
==------------- Running On -------------==
==                                     ==
==       Device index: 0          ==
==                Name: Quadro K1200  ==
==                VRAM: 4.00GB       ==
==                                     ==
==========================================
Starting. Press "Enter" to stop training and save model.
Error: OOM when allocating tensor with shape[3,3,512,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[3,3,512,2048] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
      [[node src_dst_opt_1/Select_18 (defined at D:\AI2\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\core\leras\ops\__init__.py:207) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

      [[node concat_46 (defined at D:\AI2\DeepFaceLab_NVIDIA\_internal\DeepFaceLab\models\Model_SAEHD\Model.py:484) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

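ps2wwttq · posted 2021-1-17 14:02:37

Yes. You only have 4 GB of VRAM, and with batch_size set to 8 it is bound to overflow. Your card can handle a batch_size of 4 at most, and you may even have to drop to 2.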
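
For a sense of scale, the tensor named in the OOM message can be sized by hand. This is only a rough sketch: it assumes "type float" in the log means 32-bit floats (4 bytes per element), and the helper name `tensor_bytes` is made up for illustration, not part of DeepFaceLab or TensorFlow.

```python
from math import prod

def tensor_bytes(shape, bytes_per_element=4):
    """Memory needed by one dense tensor, in bytes (assumes float32 = 4 bytes)."""
    return prod(shape) * bytes_per_element

failed_shape = (3, 3, 512, 2048)  # shape from the OOM message in the log
size_mib = tensor_bytes(failed_shape) / 2**20
vram_mib = 4.00 * 1024            # "VRAM: 4.00GB" from the model summary

print(f"{size_mib:.0f} MiB requested, {size_mib / vram_mib:.2%} of total VRAM")
```

The failed request is small next to 4 GB, which suggests the card was already nearly full when it arrived. That is consistent with the advice to lower batch_size rather than look for one oversized tensor.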

cdf2dfl · posted 2021-1-17 14:32:04

Lesson learned. Back on 1.0 I usually ran with 8.

xiaoxin · posted 2021-1-17 17:08:54

A resolution of 192 is plenty for your card. Your hardware is not really up to 256.
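
A back-of-the-envelope comparison of the two resolutions, as a rough sketch only. It assumes per-sample image and activation memory grows with the pixel count (resolution squared), which is an approximation: model weights and optimizer state do not scale this way, so real savings will differ.

```python
def pixel_ratio(new_res, old_res):
    """Ratio of pixel counts between two square resolutions."""
    return (new_res ** 2) / (old_res ** 2)

ratio = pixel_ratio(192, 256)
print(f"192x192 has {ratio:.2%} of the pixels of 256x256")
```

Under that assumption, a 192 model works with a little over half the pixels per sample that the 256 model in the log does.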

Buer · posted 2021-1-17 23:40:08

Going by the title alone, my blind guess is that you need to lower the batch size (bs).

cdf2dfl · posted 2021-1-18 09:46:56

Indeed, with BS set to 4 it now runs normally. Thanks, everyone.
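
The trial-and-error used in this thread (8 fails, try 4, maybe 2) can be sketched as a halving loop. Everything here is hypothetical: `try_step` stands in for "run one training iteration at this batch size", `fake_step` merely imitates the behaviour reported above, and `MemoryError` is a placeholder for whatever out-of-memory error the real framework raises. DeepFaceLab asks for batch_size interactively, so this is not its API.

```python
def largest_fitting_batch_size(try_step, start=8, minimum=1):
    """Halve the batch size until try_step succeeds; return None if nothing fits."""
    batch_size = start
    while batch_size >= minimum:
        try:
            try_step(batch_size)
            return batch_size
        except MemoryError:
            batch_size //= 2  # out of memory: retry with half the batch
    return None

def fake_step(batch_size):
    # Hypothetical stand-in: pretends anything above 4 runs out of memory,
    # mirroring what was reported for the 4 GB card in this thread.
    if batch_size > 4:
        raise MemoryError(f"batch_size {batch_size} does not fit")

print(largest_fitting_batch_size(fake_step))
```

Halving keeps the number of failed attempts small. If even the minimum does not fit, the function returns None, and a lower resolution would be the next thing to try.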

solobabbit · posted 2021-1-19 08:51:39

Speaking as a P1000 owner: 256 just won't run on it.

jiaoxiangyu · posted 2021-1-19 23:41:49

OOM... yeah.

linker666 · posted 2021-2-2 11:04:49

I ran into a similar problem too. I'll go back and try this.

wanghao · posted 2021-2-24 21:34:16

I've had a similar problem. Browsing around really does pay off. I'll go back and debug it step by step.
