
Restoring from VM Backups on Google Cloud

Isabelle Guyon edited this page Aug 6, 2019 · 17 revisions

See https://github.com/codalab/codalab-competitions/wiki/Developers-Instance-Hosts---Commands-and-Reference-Guide for common debug commands + docker container information.

Restoring from Snapshots on Google Cloud Storage

  • Start from your Google Cloud Console and click VM instances under Compute Engine in the side menu
img
  • Select create new instance on the top menu
img
  • Configure your machine options and name. For a production-level Codalab server with heavy traffic, we recommend 4-8 vCPUs and 16-32 GB of RAM.
img
  • Under boot disk, select change
img
  • Select the snapshots tab.

  • Select the snapshot you wish to restore from. Each snapshot's creation date is displayed here as well.

img
  • Ensure the disk size and type are correct and submit the form. Note: We recommend at least 50 GB of disk space.
img
  • Under firewall configuration, enable HTTP/HTTPS traffic
img
  • Expand Management, Security, Disks, Networking, Sole Tenancy

image

  • Add the tag allow-rabbitmq-and-flower under network tags. (This allows RabbitMQ, Celery, etc. to communicate with workers.)

image

  • Submit the form and create your new instance from a snapshot backup.
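For scripted restores, the console steps above correspond roughly to one gcloud call. This is a sketch only, assuming the gcloud CLI is installed and authenticated. The function name and all argument values are hypothetical placeholders, and e2-standard-4 (4 vCPUs, 16 GB) is just one machine type that fits the recommendation above. The http-server and https-server tags are the ones the console's HTTP/HTTPS checkboxes add; the matching firewall rules are assumed to exist in the project already.

```shell
#!/usr/bin/env bash
# Sketch: create a VM whose boot disk is restored from a snapshot.
# Arguments: instance name, zone, snapshot name, optional machine type.
# All values are placeholders supplied by the caller.
create_instance_from_snapshot() {
    local name="$1" zone="$2" snapshot="$3"
    local machine_type="${4:-e2-standard-4}"   # assumption: 4 vCPUs, 16 GB RAM

    gcloud compute instances create "$name" \
        --zone="$zone" \
        --machine-type="$machine_type" \
        --source-snapshot="$snapshot" \
        --boot-disk-size=50GB \
        --tags=http-server,https-server,allow-rabbitmq-and-flower
}
```

A call might look like `create_instance_from_snapshot codalab-restore us-central1-a my-codalab-snapshot`, where all three names are made up for illustration.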

  • SSH into the instance via the SSH button from the instances menu

img
  • Change directory to the Ubuntu user's codalab-competitions directory. (Note: if you SSH in as the Ubuntu user, it is in the directory you start in.)
img
  • !OPTIONAL!
  • If the DNS is not yet configured to point to the instance IP (e.g. autodl.lri.fr is not pointing to the new IP) and you would like to use the instance in the meantime, you need to edit the project's Docker environment configuration file (.env). Sudo is not needed if you are SSH'ed in as Ubuntu. (You need an editor to do this; if you hate vim, run sudo apt install emacs to get emacs.)
img
  • Change all references to your domain name to the instance IP. The most important setting is CADDY_DOMAIN, as this is what Caddy will try to serve. If the domain or IP it is trying to serve does not match the address it is being served from, it will not work, and you will see a message like autodl.lri.fr is not served from this instance. If you use an IP, make sure to append :80 to it so that SSL is not used. Otherwise you will receive an SSL error, because Caddy will try to run with SSL enabled but will not be able to retrieve a certificate.
img
  • WARNING: Google Cloud assigns a NEW DYNAMIC IP every time you restore your VM, so unless you had a static IP assigned, you need to redo this procedure every time you restart your VM.
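Because this edit has to be repeated whenever the IP changes, it can help to script it. The function below is a minimal sketch, assuming the .env file stores the setting as a plain `CADDY_DOMAIN=value` line. The function name and the `.bak` backup suffix are hypothetical and not part of Codalab.

```shell
#!/usr/bin/env bash
# point_env_at_ip FILE OLD_DOMAIN IP
# Swap OLD_DOMAIN for IP everywhere in FILE, then force CADDY_DOMAIN to
# IP:80 so Caddy serves plain HTTP. The original is kept as FILE.bak.
point_env_at_ip() {
    local file="$1" old_domain="$2" ip="$3"
    local pattern
    # Escape dots so sed matches the domain literally.
    pattern=$(printf '%s' "$old_domain" | sed 's/\./\\./g')

    cp "$file" "$file.bak"
    sed -e "s/${pattern}/${ip}/g" \
        -e "s/^CADDY_DOMAIN=.*/CADDY_DOMAIN=${ip}:80/" \
        "$file.bak" > "$file"
}
```

For example, `point_env_at_ip .env autodl.lri.fr 203.0.113.10` (the IP is a documentation placeholder) rewrites the file in place. Review the result before restarting the containers.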

  • If you want to assign a new URL, put the new URL in CADDY_DOMAIN instead of the IP address. Do not forget to create an A record at your domain registrar or DNS provider so that your domain points to your IP address. Here is how it is done at Moniker (it may take a while for the DNS to propagate):

Arecord
  • Run docker-compose up -d to update all containers (Note: Your output should show the containers as having been recreated)
img
  • Verify you can connect via web browser to the instance

  • If you get the below error after following these steps, make sure you're not trying to use https.

img
  • If you get a message similar to: "The Codalab site is not currently available" then Codalab is probably still starting up. Check the django container logs with docker-compose logs -f django. The last few steps should be checking static files and running any migrations.

OPTIONAL: If you are not restoring a specific domain, or are only doing a test, it is very likely that the instance will not have the default worker enabled and that no compute workers will be attached. To re-enable the default worker, simply rename or delete docker-compose.override.yaml. It may also be necessary to ensure the correct worker version is specified.
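Renaming the override file rather than deleting it keeps the change reversible. A minimal sketch, assuming the file sits in the codalab-competitions directory; the function name and the `.disabled` suffix are hypothetical:

```shell
#!/usr/bin/env bash
# disable_compose_override [DIR]
# Move docker-compose.override.yaml out of the way, if it exists,
# so that docker-compose no longer picks it up.
disable_compose_override() {
    local dir="${1:-.}"
    local override="$dir/docker-compose.override.yaml"
    if [ -f "$override" ]; then
        mv "$override" "$override.disabled"
        echo "renamed $override"
    else
        echo "no override file in $dir"
    fi
}
```

After running `disable_compose_override .`, the containers presumably need to be brought up again with docker-compose up -d for the change to take effect.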

  • Verify all services are running with docker ps
img
  • Access the logs with the command docker-compose logs -f (Use docker-compose logs -f <container> to view a specific container's logs)
img
  • Upload test competitions + submissions and verify everything is working correctly. If your submissions get stuck, make sure you're submitting to the default queue.
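Since Codalab can take a while to start, the browser check above can be supplemented with a small polling helper. This is a sketch under assumptions: the function name and default retry values are hypothetical, and it only confirms that something answers with a successful HTTP status. It cannot tell whether the page served is the real site or a "not currently available" notice.

```shell
#!/usr/bin/env bash
# wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# Poll URL with curl until it returns a successful HTTP status,
# giving up after ATTEMPTS tries.
wait_for_http() {
    local url="$1" attempts="${2:-30}" delay="${3:-10}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        # -f makes curl exit non-zero on HTTP errors (4xx/5xx).
        if curl -fsS -o /dev/null --max-time 5 "$url"; then
            echo "up after $i attempt(s): $url"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "no successful response from $url" >&2
    return 1
}
```

For instance, `wait_for_http http://203.0.113.10:80` (a placeholder address) retries for about five minutes with the defaults. If it gives up, docker-compose logs -f django is the next place to look.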