Run on Apple Mac Silicon chip #43
Hey! This code was only tested using NVIDIA GPUs, I believe they have a guide for PyTorch here: https://developer.apple.com/metal/pytorch/ :) |
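For reference, a minimal availability check in the spirit of the snippet in Apple's guide (the import guard is my own addition so the sketch degrades gracefully where PyTorch isn't installed):

```python
# A sketch of verifying the MPS (Metal) backend, similar to the check shown
# in Apple's PyTorch-on-Metal guide. Guarded import so it runs without torch.
try:
    import torch
except ImportError:
    torch = None

def mps_available() -> bool:
    """True only when this PyTorch build exposes a usable MPS backend."""
    if torch is None:
        return False
    mps = getattr(torch.backends, "mps", None)
    return bool(mps and mps.is_available())

if mps_available():
    x = torch.ones(1, device="mps")  # tensor allocated on the Apple GPU
    print(x)
else:
    print("MPS device not found; falling back to CPU.")
```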
Created a pull request to address this |
I checked out the code from the pull request and tried to run it. It's been going most of the weekend on my Mac. :) Specifically these commands:

```shell
# Prepare NanoGPT data
python data/enwik8/prepare.py
python data/shakespeare_char/prepare.py
python data/text8/prepare.py
```

Do these even need to be run? I am curious what the actual minimal command set is, if most of the AI work is actually done through OpenAI, etc. |
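For anyone wondering what those scripts do: a hypothetical sketch of a character-level prepare step in the spirit of nanoGPT's `data/shakespeare_char/prepare.py` (the function name and in-memory return are my own; the real scripts write `train.bin`/`val.bin` to disk):

```python
import numpy as np

def prepare_char_dataset(text: str, val_fraction: float = 0.1):
    # Build the character vocabulary and an encoder table.
    chars = sorted(set(text))
    stoi = {ch: i for i, ch in enumerate(chars)}
    # Encode the whole corpus as uint16 token ids (nanoGPT's on-disk dtype).
    ids = np.array([stoi[ch] for ch in text], dtype=np.uint16)
    # Hold out the tail of the corpus for validation.
    split = int(len(ids) * (1 - val_fraction))
    train_ids, val_ids = ids[:split], ids[split:]
    # The real scripts end with train_ids.tofile("train.bin") and the like;
    # here we just return the arrays for illustration.
    return train_ids, val_ids, stoi

train_ids, val_ids, stoi = prepare_char_dataset("hello gpt " * 100)
print(len(train_ids), len(val_ids), len(stoi))  # 900 100 8
```

The resulting token files are what the later training runs consume, which is why skipping the prepare step breaks the pipeline.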
These prepare the data for the nanoGPT runs. The remaining commands run the baseline, which will be machine dependent for things like training speed! |
Right, but I am not using a machine with a GPU. I want to offload as much of this as possible to a third-party tool (Colab? Some other service that has a GPU?). So the fact that this dependency exists is kind of a blocker for usage. Maybe there's something else I am missing about what this is for? |
I think modern machine learning is quite hard without a GPU. Later parts of the pipeline will attempt dozens of runs, each of which could take hours without a GPU. I would recommend services like Lambda where you can rent GPUs per hour. The component you are referring to is comparatively cheap compared to what the AI scientist could choose to run. |
Why not use GPT4o-mini/Claude instead of a local nanoGPT? Not totally sure of the value of a hybrid approach here, given the cost of cutting off Mac users, since we're already providing API keys. |
Additionally, is it REALLY a dependency? It looks like it creates an artifact that is used later when you're using GPT4o-mini. Is that artifact actually a requirement for subsequent steps? |
GPT4o/Claude is the foundation model that proposes ideas. NanoGPT is the actual model that is modified and trained. This is a different model for different templates. |
The preparation steps create the training data and baseline for comparison. Very much essential. Different templates have different preparation steps. |
I found that the current project can only run in CPU mode on an Apple silicon chip. I don't know if the GPU backend, 'mps', can be added. I set the device parameter to mps, but it didn't work.
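For what it's worth, a device-selection sketch with a CPU fallback (assuming PyTorch >= 1.12, where the MPS backend first appeared; the import guard is only so the sketch runs anywhere). Even with mps selected, individual ops that Metal doesn't implement can still fail unless `PYTORCH_ENABLE_MPS_FALLBACK=1` is set in the environment:

```python
try:
    import torch
except ImportError:
    torch = None

def pick_device() -> str:
    """Prefer CUDA, then Apple's MPS backend, then plain CPU."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(f"training on: {device}")
```

Hard-coding `device = "mps"` in a config that the code later overrides (or that feeds ops unsupported on Metal) would match the "set it but it didn't work" symptom described above.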