LoRA inference results are incorrect after fine-tuning a visual understanding model #3000
zhuchen1109
started this conversation in General
Replies: 3 comments 1 reply
-
mlp.0 and mlp.1: these two should not need TP (tensor parallelism).
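(For illustration only, with made-up class names: a tensor-parallel linear shards its weight across ranks and needs an all-reduce over the partial outputs, while a replicated linear keeps the full weight on every rank and needs neither. Small layers like the merger's MLP can simply stay replicated:)

```python
import torch
import torch.nn as nn

class ReplicatedLinear(nn.Module):
    """Every rank keeps the full weight; no cross-rank communication."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)

class RowParallelLinear(nn.Module):
    """Each rank keeps a slice of the input dim; the partial outputs must
    be summed across ranks with an all-reduce (sketched single-process)."""

    def __init__(self, in_features: int, out_features: int, world_size: int):
        super().__init__()
        assert in_features % world_size == 0, 'input dim must shard evenly'
        self.linear = nn.Linear(in_features // world_size, out_features)

    def forward(self, x_shard: torch.Tensor) -> torch.Tensor:
        partial = self.linear(x_shard)
        # torch.distributed.all_reduce(partial)  # needed in a real TP run
        return partial
```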
-
While debugging I found that in the vision part's mlp.fc1 layer, the output tensor of the inference result contains a large number of NaN values. May I ask what could cause this? Inference with transformers does not show this problem. The corresponding location in the inference code:
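(Independent of the exact location referenced above: a generic way to find where NaNs first appear during a forward pass is to register forward hooks on every module. All names below are placeholders:)

```python
import torch
import torch.nn as nn

def install_nan_hooks(model: nn.Module) -> None:
    """Print every module whose output contains NaN during a forward pass."""

    def make_hook(name: str):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                print(f'NaN in output of {name}: '
                      f'dtype={output.dtype}, shape={tuple(output.shape)}')
        return hook

    for name, module in model.named_modules():
        module.register_forward_hook(make_hook(name))

# Usage sketch: install_nan_hooks(model), then run one inference and
# read off the first module reported.
```

One common culprit when this happens in fp16 but not under transformers is activation overflow, for example when the reference implementation runs the vision tower in bf16/fp32 while the inference engine casts it to fp16.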
-
I went through the is_tp and all_reduce settings of every layer that inherits from BaseLinear and set them all to False. There are still NaN values. May I ask what could be causing this?
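(A sketch of that sweep, assuming is_tp and all_reduce are plain boolean attributes as described above; the import path of BaseLinear depends on the lmdeploy version, so it is left as a parameter:)

```python
import torch.nn as nn

def disable_tp(model: nn.Module, base_linear_cls: type) -> None:
    """Force every BaseLinear-derived layer to run replicated (no TP)."""
    for _, module in model.named_modules():
        if isinstance(module, base_linear_cls):
            module.is_tp = False        # do not shard this layer's weight
            module.all_reduce = False   # and skip the cross-rank reduction

# Usage sketch: disable_tp(model, BaseLinear)
```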
-
I fine-tuned the Qwen2-VL-7B-Instruct model with swift. The fine-tuning "target_modules" were [ "up_proj", "attn.proj", "qkv", "down_proj", "mlp.0", "gate_proj", "k_proj", "o_proj", "fc2", "q_proj", "mlp.2", "v_proj", "fc1" ], which includes attn.proj, mlp.0, and mlp.2 from the vision part.
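(For context, a sketch of why those entries reach the vision part, assuming PEFT-style suffix matching: an entry such as "mlp.0" targets any module whose qualified name ends with it, e.g. "visual.merger.mlp.0".)

```python
# PEFT-style suffix matching, sketched; not swift's actual implementation.
def matches(module_name: str, targets: list[str]) -> bool:
    return any(module_name == t or module_name.endswith('.' + t)
               for t in targets)

print(matches('visual.merger.mlp.0', ['mlp.0']))               # True
print(matches('model.layers.0.mlp.down_proj', ['down_proj']))  # True
```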


The first problem I hit: in the add_adapters method in patch.py, mod.lora_adapters[target_name] = lora fails because target_name must not contain ".". I bypassed this by modifying the code logic, and the modified logic still loads the weights correctly in the later load_lora_weights step. The change is shown in the screenshot below:
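(The screenshot itself is not included; for illustration, a minimal sketch of this kind of workaround, assuming the adapters live in an nn.ModuleDict, whose keys must not contain ".". Apart from lora_adapters and target_name, every name below is made up, not lmdeploy's actual code:)

```python
import torch.nn as nn

def safe_key(target_name: str) -> str:
    # nn.ModuleDict rejects keys containing '.', so map 'mlp.0' -> 'mlp__0'.
    return target_name.replace('.', '__')

def register_lora(mod: nn.Module, target_name: str, lora: nn.Module) -> None:
    if not hasattr(mod, 'lora_adapters'):
        mod.lora_adapters = nn.ModuleDict()
    mod.lora_adapters[safe_key(target_name)] = lora

def lookup_lora(mod: nn.Module, target_name: str) -> nn.Module:
    # The weight-loading step must apply the same mapping when resolving
    # checkpoint keys (which still contain '.') back to adapters.
    return mod.lora_adapters[safe_key(target_name)]
```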
The second problem: because the visual.merger.mlp layer is not implemented with BaseLinear, the mlp.0 and mlp.2 layers cannot load LoRA weights. I changed the original nn.Linear to a BaseLinear implementation, as shown in the screenshot below:
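(Again only a sketch: LoRALinear below is a stand-in for a BaseLinear-based implementation, not the actual patch. In Qwen2-VL the merger's mlp is Sequential(Linear, GELU, Linear), so mlp.0 and mlp.2 are its two Linear layers:)

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Stand-in for a BaseLinear-style layer that can carry LoRA weights."""

    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)  # adapter starts as a zero delta

    def forward(self, x):
        return self.base(x) + self.lora_b(self.lora_a(x))

def wrap_merger_mlp(mlp: nn.Sequential) -> nn.Sequential:
    # Wrap only the Linear layers; leave the GELU in place.
    return nn.Sequential(
        *(LoRALinear(m) if isinstance(m, nn.Linear) else m for m in mlp))
```

Keeping is_tp and all_reduce off for these wrapped layers would match the first reply above.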
After these modifications the model initializes and runs normally, but when I ran the validation set, all the results were wrong.
May I ask what is wrong with my modifications, and what else do I need to do to make this work correctly?