Gurus, please stick around and help me look at a problem with LoRA training!!

546078000 (OP) | Posted 2023-3-26 17:32:57
prepare tokenizer
update token length: 225
Use DreamBooth method.
ignore directory without repeats: <built-in function dir>
ignore directory without repeats: <built-in function dir>
prepare images.
found directory train\wang\3_wang contains 300 image files
900 train images with repeating.
0 reg images.
no regularization images
[Dataset 0]
  batch_size: 1
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 512
  bucket_reso_steps: 64
  bucket_no_upscale: False

  [Subset 0 of Dataset 0]
    image_dir: "train\wang\3_wang"
    image_count: 300
    num_repeats: 3
    shuffle_caption: True
    keep_tokens: 0
    caption_dropout_rate: 0.0
    caption_dropout_every_n_epoches: 0
    caption_tag_dropout_rate: 0.0
    color_aug: False
    flip_aug: False
    face_crop_aug_range: None
    random_crop: False
    is_reg: False
    class_tokens: wang
    caption_extension: .txt


[Dataset 0]
loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 3123.46it/s]
make buckets
number of images (including repeats)
bucket 0: resolution (512, 512), count: 450
mean ar error (without repeats): 0.0
prepare accelerator
Using accelerator 0.15.0 or above.
load StableDiffusion checkpoint
loading u-net: <All keys matched successfully>
loading vae: <All keys matched successfully>
loading text encoder: <All keys matched successfully>
Replace CrossAttention.forward to use xformers
[Dataset 0]
caching latents.
100%|████████████████████████████████████████████████████████████████████████████████| 150/150 [01:59<00:00,  1.26it/s]
import network module: networks.lora
create LoRA network. base dim (rank): 32, alpha: 32.0
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/ ... iewform?usp=sf_link
================================================================================
CUDA SETUP: Loading binary F:\AI\lora-scripts\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
override steps. steps for 10 epochs: 4500
Traceback (most recent call last):
  File "F:\AI\lora-scripts\sd-scripts\train_network.py", line 699, in <module>
    train(args)
  File "F:\AI\lora-scripts\sd-scripts\train_network.py", line 216, in train
    unet, text_encoder, network, optimizer, train_dataloader, lr_scheduler = accelerator.prepare(
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 876, in prepare
    result = tuple(
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 877, in <genexpr>
    self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 741, in _prepare_one
    return self.prepare_model(obj, device_placement=device_placement)
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\accelerator.py", line 912, in prepare_model
    model = model.to(self.device)
  File "F:\AI\lora-scripts\venv\lib\site-packages\transformers\modeling_utils.py", line 1749, in to
    return super().to(*args, **kwargs)
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 3 more times]
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "F:\AI\lora-scripts\venv\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43 GiB already allocated; 0 bytes free; 3.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "F:\AI\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
    simple_launcher(args)
  File "F:\AI\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['F:\\AI\\lora-scripts\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/wang', '--output_dir=./output', '--logging_dir=./logs', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=wangyuchun', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=512', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished
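
A note on what the error itself suggests: the message recommends setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF to reduce allocator fragmentation. Below is a minimal sketch of relaunching the same command with that variable set; the arguments are the ones listed in the CalledProcessError above, and the value 128 is only a starting point to tune, not a recommendation. Keep in mind this only helps with fragmentation; it cannot create VRAM that a 4 GiB card does not have. sd-scripts also exposes a --gradient_checkpointing flag (check --help on your copy) that trades training speed for a lower memory footprint.

import os
import subprocess

# Set the allocator option in the child process environment, so it is
# already in effect when torch initializes CUDA inside train_network.py.
env = os.environ.copy()
env["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"  # tune as needed

cmd = [
    r"F:\AI\lora-scripts\venv\Scripts\python.exe",
    "./sd-scripts/train_network.py",
    # ... the same arguments as in the traceback above ...
]
subprocess.run(cmd, env=env, check=True)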

AKERSHUS | Posted 2023-3-26 17:55:37
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43 GiB already allocated; 0 bytes free; 3.50 GiB reserved in total by PyTorch) If reserved memory ... Look carefully at the hardware this training needs; you've clearly run out of VRAM.
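
For anyone comparing against their own card: you can check free versus total VRAM before launching training. A minimal sketch, assuming a reasonably recent PyTorch with CUDA available:

import torch

# mem_get_info returns (free_bytes, total_bytes) for the given device.
free, total = torch.cuda.mem_get_info(0)
print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

In the log above, 3.43 GiB of the 4.00 GiB card was already allocated before the model had finished moving to the device, so the failed 20 MiB allocation was just the last straw.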
lisen | Posted 2023-3-26 17:56:00
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43 GiB already allocated; 0 bytes free; 3.50 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.

Looks like you ran out of VRAM.
546078000 (OP) | Posted 2023-3-26 17:59:28
lisen posted on 2023-3-26 17:56:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43  ...

Ah, got it, thanks a lot!
546078000 (OP) | Posted 2023-3-26 18:00:35
AKERSHUS posted on 2023-3-26 17:55:
CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 4.00 GiB total capacity; 3.43 GiB already al ...

Thanks! You mentioned renting a GPU before, how does that work? Where do I rent one?
AKERSHUS | Posted 2023-3-28 00:24:50
546078000 posted on 2023-3-26 18:00:
Thanks! You mentioned renting a GPU before, how does that work? Where do I rent one?

Tencent Cloud. They run monthly flash sales on GPU servers; a T4 for 45 RMB.