
Run on Apple Mac Silicon chip #43


Open
ncamcl opened this issue Aug 18, 2024 · 10 comments

Comments

@ncamcl

ncamcl commented Aug 18, 2024

I found that this project can currently only run in CPU mode on Apple silicon chips. I don't know whether the GPU backend, 'mps', can be added. I set the device parameter to mps, but it didn't work.

@conglu1997
Collaborator

Hey! This code was only tested on NVIDIA GPUs. I believe Apple has a guide for PyTorch here: https://developer.apple.com/metal/pytorch/ :)
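For anyone trying this locally, a minimal device-selection sketch (assumptions: a PyTorch build with MPS support; `pick_device` is an illustrative helper, not part of this repo):

```python
import torch

def pick_device() -> torch.device:
    """Prefer CUDA, then Apple's MPS backend, then fall back to CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

# Tensors created on the selected device; on an M-series Mac with a
# recent PyTorch this should report "mps".
device = pick_device()
x = torch.ones(2, 2, device=device)
print(x.device.type)
```

Note that `torch.backends.mps.is_available()` requires macOS 12.3+ and a recent PyTorch; older builds simply fall through to CPU here.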

@junhua

junhua commented Aug 22, 2024

> Hey! This code was only tested using NVIDIA GPUs, I believe they have a guide for PyTorch here: https://developer.apple.com/metal/pytorch/ :)

Created a pull request to address this

@conglu1997

@cvanvlack

I checked out the code from the pull request and tried to run it. It has been running for most of the weekend on my Mac. :)

Specifically, on these commands:

```shell
# Prepare NanoGPT data
python data/enwik8/prepare.py
python data/shakespeare_char/prepare.py
python data/text8/prepare.py
```

Do these even need to be run? I am curious what the actual minimal command set is, if most of the AI work is actually done through OpenAI, etc.

@conglu1997
Collaborator

conglu1997 commented Aug 26, 2024

These prepare the data for the nanogpt runs.

The remaining commands run the baseline, which will be machine-dependent for things like training speed!
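As a rough illustration (assumptions: the real `prepare.py` scripts download a corpus and use dataset-specific tokenization; this toy version uses a character-level vocabulary on an inline string), the preparation step boils down to encoding text to integer token ids and writing binary train/val splits for fast memory-mapped loading:

```python
import numpy as np

# Stand-in for the downloaded corpus (the real scripts fetch
# enwik8 / text8 / Shakespeare and may use a BPE tokenizer instead).
text = "hello world"

# Build a character-level vocabulary and encode the text.
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
ids = np.array([stoi[c] for c in text], dtype=np.uint16)

# 90/10 train/val split, written as raw uint16 arrays.
n = int(0.9 * len(ids))
ids[:n].tofile("train.bin")
ids[n:].tofile("val.bin")
```

The training code can then memory-map `train.bin` / `val.bin` rather than re-tokenizing on every run, which is why the prepare step only needs to happen once per dataset.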

@cvanvlack

Right, but I am not using a machine with a GPU. I want to offload as much of this as possible to a third-party tool (Colab? Some other service that has a GPU?).

So the fact that this dependency exists is kind of a blocker for usage.

Maybe there's something else I am missing about what this is for?

@conglu1997
Collaborator

conglu1997 commented Aug 26, 2024

I think modern machine learning is quite hard without a GPU. Later parts of the pipeline will attempt dozens of runs, each of which could take hours without a GPU. I would recommend services like Lambda, where you can rent GPUs by the hour.

The component you are referring to is extremely cheap compared to what the AI Scientist could choose to run.

@mruckman1

Why not use GPT4o-mini/Claude instead of a local nanoGPT? I'm not totally sure of the value of a hybrid approach here, given the cost of cutting off Mac users, since we're already providing API keys.

@cvanvlack

Additionally, is it REALLY a dependency? It looks like it creates an artifact that is used later when you're using GPT4o-mini.

Is that artifact actually a requirement for subsequent steps?

@conglu1997
Collaborator

> Why not use GPT4o-mini/Claude instead of a local nanogpt? Not totally sure of the value here for a hybrid approach given the cost of cutting off mac users since we're already providing API keys.

GPT4o/Claude is the foundation model that proposes ideas. NanoGPT is the actual model that gets modified and trained. Different templates use different models.

@conglu1997
Collaborator

conglu1997 commented Aug 28, 2024

> Additionally, is it REALLY a dependency? It looks like it creates an artifact that is used later when you’re using the GPT4o-mini.
>
> Is that artifact actually a requirement for subsequent steps?

The preparation steps create the training data and the baseline for comparison, so they are very much essential. Different templates have different preparation steps.
