Segmentation Fault #105
Comments
Yeah, that produces the same issue. I was able to test it with a different device as the root node and it worked fine, so it might be an issue with some corporate setup. The new root node also has more memory, which could be the reason. If I test it with another device with memory similar to my Mac, I'll update you guys. Thanks for the help though. This is a really awesome tool btw, wonder how it will perform for Llama 3.1 405B 👀
I met the same issue a few days ago. A segmentation fault is usually caused by an invalid pointer, so you can use gdb together with a ulimit setting to locate the bad pointer's position, or use printf to narrow it down. It's not a problem caused by one specific error; any wrong pointer can trigger it. My error was a pointer running past the end of a buffer. :-) `ulimit -c 1000` plus some related settings will produce a core file when a segmentation fault occurs, and you can read that file with gdb to find the location of the bad pointer. Good luck.
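A minimal sketch of that workflow, assuming a Linux machine and that the crashing binary is `./dllama` built with debug symbols; the model and tokenizer file names are placeholders:

```bash
# Allow core files to be written when the process crashes.
ulimit -c unlimited

# Reproduce the segmentation fault; a core file is dumped into the working
# directory (or wherever /proc/sys/kernel/core_pattern points).
./dllama chat --model dllama_model.m --tokenizer dllama_tokenizer.t

# Open the core file with gdb and print a backtrace to see which pointer
# dereference failed.
gdb ./dllama core -ex bt
```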
I wanted to check if this problem occurs on Linux, but it seems it does not.
Llama 3.1 8B Q40:
Llama 3 8B Q40:
It's definitely not a simple case.
I suspect the
EDIT: In the meantime I found a tiny bug; maybe it was related.
Hello. Frankly, I think that is because I have disabled IPv6.
@lipere123 could you please provide a bit more context? What model are you trying to run, and how much RAM does each device have?
Hello. Thank you very much for your quick answer. The network is a passthrough, set up like that. The distributed storage is Ceph, mounted via hostpath in LXC; ceph-cli will come with an updated version of my infrastructure code when I have the time. I have a minimal install on the master. The scripts involved are install_dllama.sh, dllama-models.sh, dllama-run.sh and dllama-inference-llama3_1_8b.sh, and I start the run with ./dllama-inference-llama3_1_8b.sh. On my workers the log shows:
Thanks again.
@lipere123 in this case I think the reason is that you are trying to run 7 nodes:
Distributed Llama supports only a power-of-two number of nodes (1, 2, 4, 8, ...).
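As a minimal sketch of a valid topology, assuming the worker/inference flags documented for Distributed Llama (IPs, ports and model file names below are placeholders), one root plus 3 workers gives 4 nodes:

```bash
# On each of the 3 worker machines:
./dllama worker --port 9998 --nthreads 4

# On the root node, list exactly those 3 workers (root + 3 workers = 4 nodes):
./dllama inference \
  --model dllama_model_llama3_1_8b_q40.m \
  --tokenizer dllama_tokenizer_llama3_1.t \
  --buffer-float-type q80 \
  --prompt "Hello" --steps 32 --nthreads 4 \
  --workers 10.0.0.2:9998 10.0.0.3:9998 10.0.0.4:9998
```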
Hello. Okay, now it is working for 8B, and also for 405B. A few questions:
Thanks in advance.
@lipere123 try to run 405B with a smaller context:
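As a hedged sketch, assuming the CLI exposes a --max-seq-len option to cap the context length (the flag name and the file names here are assumptions, not the exact command from the reply):

```bash
# A shorter context shrinks the KV cache, so the 405B model needs less RAM per node.
./dllama inference \
  --model dllama_model_llama3_1_405b_q40.m \
  --tokenizer dllama_tokenizer_llama3_1.t \
  --buffer-float-type q80 \
  --max-seq-len 1024 \
  --prompt "Hello" --steps 32 --nthreads 4 \
  --workers 10.0.0.2:9998 10.0.0.3:9998 10.0.0.4:9998
```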
Yes, this could be improved. Now the main goal for the
Not started.
Feel free to create a PR.
I don't know, but
Nope.
With Llama 3 8B, inference works; however, api and chat do not. They produce a segmentation fault.
Workers terminate like:
I'm using 3 Linux machines with 8 GB of RAM each, and an Intel Mac with 16 GB as the root node.
This all works fine with TinyLlama though.
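For context, a minimal sketch of the two modes involved, assuming the standard dllama entry points and placeholder file names and addresses:

```bash
# Inference mode: completes normally on this cluster.
./dllama inference --model dllama_model_llama3_8b_q40.m \
  --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 \
  --prompt "Hello" --steps 32 --nthreads 4 \
  --workers 10.0.0.2:9998 10.0.0.3:9998 10.0.0.4:9998

# Chat mode: produces the segmentation fault on this setup.
./dllama chat --model dllama_model_llama3_8b_q40.m \
  --tokenizer dllama_tokenizer_llama3.t --buffer-float-type q80 \
  --nthreads 4 --workers 10.0.0.2:9998 10.0.0.3:9998 10.0.0.4:9998
```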