Posted on 2023-4-8 16:36:56
Last edited by xhuaeo_o on 2023-4-8 16:47
After the error appeared, I increased virtual memory to 100 GB, but training still fails. How much virtual memory is usually needed?
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/ ... iewform?usp=sf_link
================================================================================
CUDA SETUP: Loading binary D:\Program Files\lora-scripts\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
use 8-bit AdamW optimizer | {}
override steps. steps for 10 epochs is / 指定エポックまでのステップ数: 2400
running training / 学習開始
num train images * repeats / 学習画像の数×繰り返し回数: 240
num reg images / 正則化画像の数: 0
num batches per epoch / 1epochのバッチ数: 240
num epochs / epoch数: 10
batch size per device / バッチサイズ: 1
gradient accumulation steps / 勾配を合計するステップ数 = 1
total optimization steps / 学習ステップ数: 2400
steps: 0%| | 0/2400 [00:00<?, ?it/s]epoch 1/10
Error no kernel image is available for execution on the device at line 167 in file D:\ai\tool\bitsandbytes\csrc\ops.cu
Traceback (most recent call last):
File "C:\Users\xXx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\xXx\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "D:\Program Files\lora-scripts\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "D:\Program Files\lora-scripts\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "D:\Program Files\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "D:\Program Files\lora-scripts\venv\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Program Files\\lora-scripts\\venv\\Scripts\\python.exe', './sd-scripts/train_network.py', '--enable_bucket', '--pretrained_model_name_or_path=./sd-models/model.ckpt', '--train_data_dir=./train/cyy', '--output_dir=./output', '--logging_dir=./logs', '--log_prefix=cyy', '--resolution=512,512', '--network_module=networks.lora', '--max_train_epochs=10', '--learning_rate=1e-4', '--unet_lr=1e-4', '--text_encoder_lr=1e-5', '--lr_scheduler=cosine_with_restarts', '--lr_warmup_steps=0', '--lr_scheduler_num_cycles=1', '--network_dim=32', '--network_alpha=32', '--output_name=cyy', '--train_batch_size=1', '--save_every_n_epochs=2', '--mixed_precision=fp16', '--save_precision=fp16', '--seed=1337', '--cache_latents', '--clip_skip=2', '--prior_loss_weight=1', '--max_token_length=225', '--caption_extension=.txt', '--save_model_as=safetensors', '--min_bucket_reso=256', '--max_bucket_reso=1024', '--keep_tokens=0', '--xformers', '--shuffle_caption', '--use_8bit_adam']' returned non-zero exit status 1.
Train finished
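A note on the log above: "no kernel image is available for execution on the device" is the CUDA runtime's standard message for a binary that contains no compiled code for the GPU's architecture, so it is probably not tied to virtual memory size. A minimal sketch for checking which architecture the GPU reports, assuming PyTorch is importable from the same venv (the helper names `format_capability` and `gpu_capability` are made up for illustration):

```python
# Sketch: print the GPU's CUDA compute capability so it can be compared with
# the architectures the bitsandbytes DLL was built for.
# Assumption: PyTorch is importable in this venv; otherwise None is returned.


def format_capability(capability):
    """Turn a (major, minor) tuple such as (7, 5) into the 'sm_75' form."""
    major, minor = capability
    return f"sm_{major}{minor}"


def gpu_capability(device_index=0):
    """Return (major, minor) for the GPU, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(device_index)


if __name__ == "__main__":
    cap = gpu_capability()
    if cap is None:
        print("No CUDA device visible to PyTorch (or PyTorch is not installed).")
    else:
        print("GPU compute capability:", format_capability(cap))
```

Run it with the venv's own python.exe. If the reported architecture turns out to be one the loaded libbitsandbytes_cuda116.dll was not built for, a commonly suggested workaround is to drop `--use_8bit_adam` from the command so that training does not call into bitsandbytes. That is an assumption about this setup, not something the log confirms.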
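For reference, the step counts printed in the log are consistent with simple arithmetic on the image count, batch size, and epoch count. A rough sketch, assuming every batch is full-sized and ignoring any effect of bucketing (the function name `total_steps` is hypothetical, not part of sd-scripts):

```python
import math


def total_steps(num_images_times_repeats, batch_size, grad_accum_steps, epochs):
    """Estimate optimizer steps: batches per epoch, divided by the
    gradient accumulation steps, times the number of epochs."""
    batches_per_epoch = math.ceil(num_images_times_repeats / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    # Values taken from the log: 240 images x repeats, batch size 1,
    # gradient accumulation 1, 10 epochs.
    print(total_steps(240, 1, 1, 10))
```

With the values from the log this gives 2400, matching the "total optimization steps" line, so the run was configured as intended and failed on the first step.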