Welcome to the BU Data Science Association Docker Workshop! This workshop demonstrates the power of Docker and Dev Containers for creating reproducible data science environments.
By the end of this workshop, you'll understand:
- How Docker containers provide consistent, reproducible environments
- The benefits of using Dev Containers for data science projects
- How to set up a complete Python data science stack with Docker
- Best practices for containerized development workflows
Before you start, make sure you have:
- Docker Desktop installed and running
- Visual Studio Code with the Dev Containers extension
- Basic familiarity with Python and Jupyter notebooks
To set up the environment with VS Code Dev Containers:

- Clone this repository and enter it:
  `git clone <repository-url>`, then `cd Docker`
- Open the folder in VS Code:
  `code .`
- Reopen the folder in a container:
  - Press `Ctrl+Shift+P` (Windows/Linux) or `Cmd+Shift+P` (Mac)
  - Type "Dev Containers: Reopen in Container"
  - Select the command and wait for the container to build
- Start exploring:
  - Open `docker_workshop_demo.ipynb`
  - Run the cells to see Docker in action!
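Once the container has built, a quick sanity check in the first notebook cell can confirm you are really working inside it. The sketch below is a hedged example, not part of the workshop notebook: the function names are hypothetical, and the `/.dockerenv` test is only a common heuristic (Docker creates that file in containers it starts; other runtimes may not). The expected Python version follows the "Python 3.11" base image described in this README.

```python
import sys
from pathlib import Path

# Assumption: the image is built on Python 3.11, as this README describes.
EXPECTED_PYTHON = (3, 11)


def looks_like_docker_container() -> bool:
    """Heuristic only: Docker creates an empty /.dockerenv file in its containers."""
    return Path("/.dockerenv").exists()


def python_matches(expected=EXPECTED_PYTHON) -> bool:
    """Compare the running interpreter's major.minor version with `expected`."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print("Inside a Docker container (heuristic):", looks_like_docker_container())
    print("Python", sys.version.split()[0], "matches", EXPECTED_PYTHON, ":", python_matches())
```

If the first line prints `False` while you expected to be in the container, VS Code is most likely still attached to your local folder.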
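Alternatively, to use the Docker CLI without VS Code:

- Build the Docker image:
  `docker build -t docker-workshop .`
- Run the container, publishing port 8888 and mounting the current folder at `/workspace`:
  `docker run -it -p 8888:8888 -v $(pwd):/workspace docker-workshop`
- Inside the container, start Jupyter Lab (`--ip=0.0.0.0` makes it reachable through the published port):
  `jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root`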
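The `-p` and `-v` flags both read as `HOST:CONTAINER`, which is easy to mix up. As an illustration, the hypothetical helper below rebuilds the `docker run` command above as an argument list so each side of the mapping is explicit; it assumes the image tag `docker-workshop` from the build step and that Jupyter listens on port 8888 inside the container.

```python
import shlex
from pathlib import Path


def docker_run_args(image="docker-workshop", host_dir=".", host_port=8888):
    """Hypothetical helper: the README's `docker run` command as an argv list.

    -p HOST_PORT:8888     publishes the container's port 8888 on the host
    -v HOST_DIR:/workspace bind-mounts the project folder into the container
    """
    return [
        "docker", "run", "-it",
        "-p", f"{host_port}:8888",
        "-v", f"{Path(host_dir).resolve()}:/workspace",
        image,
    ]


if __name__ == "__main__":
    # Host port 8889 avoids a clash if something on your machine already uses 8888.
    print(shlex.join(docker_run_args(host_port=8889)))
```

With `host_port=8889`, Jupyter still listens on 8888 inside the container, but you open it at `http://localhost:8889` on your machine. The list can also be passed directly to `subprocess.run(...)` from a terminal session.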
The repository contains:

- `docker_workshop_demo.ipynb`: Main demonstration notebook showcasing multiple Python libraries
- `Dockerfile`: Custom Docker image definition with an optimized Python environment
- `requirements.txt`: Python dependencies for a reproducible environment
- `.devcontainer/devcontainer.json`: VS Code Dev Container configuration
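Because `requirements.txt` is what makes the environment reproducible, it can help to check that the installed packages actually match it. The sketch below is a simplified checker using only the standard library; it is not part of this repository, its function names are hypothetical, and it handles one version specifier per line. It compares exact `==` pins and, for ranges such as pandas 2.0+, only confirms the package is installed.

```python
from importlib.metadata import PackageNotFoundError, version

# Two-character operators are listed first so ">=" is not mistaken for ">".
OPERATORS = ("==", ">=", "<=", "~=", "!=", ">", "<")


def parse_requirement(line):
    """Return (name, operator, version) for a simple requirement line, or None.

    Simplification: extras, multiple comma-separated specifiers and environment
    markers after ';' are not fully parsed.
    """
    line = line.split("#", 1)[0].split(";", 1)[0].strip()
    if not line or line.startswith("-"):
        return None  # blank line, comment, or a pip option such as -r
    for op in OPERATORS:
        if op in line:
            name, wanted = line.split(op, 1)
            return name.strip(), op, wanted.strip()
    return line, "", ""


def check_pins(lines):
    """List problems: missing packages and exact (==) pins that do not match."""
    problems = []
    for line in lines:
        req = parse_requirement(line)
        if req is None:
            continue
        name, op, wanted = req
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if op == "==" and installed != wanted:
            problems.append(f"{name}: want {wanted}, have {installed}")
    return problems
```

Inside the container you might run `with open("requirements.txt") as f: print(check_pins(f) or "all requirements satisfied")`. For full specifier support, the third-party `packaging` library is the usual choice.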
The workshop notebook includes hands-on examples of:
- Data Manipulation (pandas, numpy)
  - Creating and processing datasets
  - Statistical analysis
  - Data cleaning and transformation
- Data Visualization (matplotlib, seaborn, plotly)
  - Static plots and charts
  - Interactive visualizations
  - 3D plotting and dashboards
- Web & API Integration (requests, BeautifulSoup)
  - HTTP requests and API consumption
  - Web scraping techniques
  - Data fetching from external sources
- Machine Learning (scikit-learn)
  - Classification algorithms
  - Model training and evaluation
  - Feature importance analysis
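To give a feel for the clean-then-summarize pattern behind the data manipulation examples, here is a dependency-free sketch using the standard library's `statistics` module. It is an illustration only, not code from the notebook: the `RAW` values and the `clean`/`summarize` helpers are hypothetical.

```python
import statistics

# Hypothetical raw readings with the kinds of gaps real data often has.
RAW = ["12.5", "", "13.1", "n/a", "11.8", "14.0"]


def clean(values):
    """Keep only entries that parse as numbers, similar to dropping NaNs in pandas."""
    cleaned = []
    for value in values:
        try:
            cleaned.append(float(value))
        except ValueError:
            continue  # skip blanks and placeholders such as "n/a"
    return cleaned


def summarize(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    print(summarize(clean(RAW)))
```

In pandas, the same idea is usually written as `pd.to_numeric(series, errors="coerce").dropna().describe()`, which scales to whole DataFrames.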
Running the workshop in a container gives you:

- **Environment Isolation**: No conflicts with your local setup
- **Reproducibility**: Same environment on every machine
- **Easy Setup**: One command to get started
- **Consistency**: Everyone installs from the same `requirements.txt`, so library versions match (exactly, when versions are pinned)
- **Portability**: Runs anywhere Docker runs
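One practical way to see the reproducibility benefit is to print an "environment fingerprint" on two machines and compare the output. The sketch below is hypothetical (not from the workshop repository); the package list is an assumption based on the libraries named in this README, using their PyPI distribution names.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names for the libraries this README mentions.
PACKAGES = [
    "numpy", "pandas", "matplotlib", "seaborn", "plotly",
    "scikit-learn", "requests", "beautifulsoup4",
]


def environment_fingerprint(packages=PACKAGES):
    """Map 'python' and each package name to its installed version (or 'missing')."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "missing"
    return info


if __name__ == "__main__":
    for key, value in environment_fingerprint().items():
        print(f"{key:>15}: {value}")
```

Run inside the container on two laptops, the output should be identical; run on your host, it will usually differ, which is exactly the problem containers solve.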
The workshop covers:

- Containerization Fundamentals
  - What are containers and why use them?
  - Docker vs. virtual machines
  - Container lifecycle management
- Dev Containers
  - VS Code integration
  - Configuration best practices
  - Extension management in containers
- Data Science Workflows
  - Dependency management
  - Jupyter integration
  - Version control with containers
The Docker image is set up as follows:

- Base Image: Python 3.11 slim
- Key Libraries: pandas 2.0+, matplotlib, seaborn, plotly, scikit-learn
- Development Tools: Jupyter Lab, black formatter, flake8 linter
- Container Features: Non-root user, optimized caching, security best practices
The Dev Container configuration provides:

- Port Forwarding: Jupyter Lab (8888), HTTP server (8080), Dev server (3000)
- Volume Mounting: Workspace folder mounted for persistent changes
- Extension Auto-Install: Python, Jupyter, and development extensions
- Environment Variables: Pre-configured for optimal development experience
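For orientation, a `devcontainer.json` providing the features above might have roughly the shape below. Since that file is JSON (with comments allowed), the sketch expresses it as a Python dictionary serialized with `json.dumps`. Treat every value as hypothetical: the user name, environment variable and display name are illustrations, not copied from this repository's `.devcontainer/devcontainer.json`. The keys (`build`, `forwardPorts`, `remoteUser`, `containerEnv`, `customizations.vscode.extensions`) are standard Dev Container properties; paths under `build` are relative to the `.devcontainer/` folder.

```python
import json

# Hypothetical configuration sketch; the repository's real file may differ.
devcontainer = {
    "name": "BU DSA Docker Workshop",
    "build": {"dockerfile": "../Dockerfile", "context": ".."},
    "forwardPorts": [8888, 8080, 3000],  # Jupyter Lab, HTTP server, dev server
    "remoteUser": "vscode",  # assumed non-root user; it must exist in the image
    "containerEnv": {"PYTHONUNBUFFERED": "1"},  # example environment variable
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",
                "ms-toolsai.jupyter",
                "ms-python.black-formatter",
                "ms-python.flake8",
            ]
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(devcontainer, indent=2))
```

No explicit mount is needed for the workspace: Dev Containers mount the folder you opened by default.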
Container won't start:
- Ensure Docker Desktop is running
- Check available disk space (more than 2 GB free is recommended)
- Restart Docker Desktop if needed
Jupyter won't connect:
- Verify port 8888 is forwarded
- Check the container logs for errors
- Try restarting the container
Extensions not working:
- Reload VS Code window
- Check that the extensions are installed inside the container, not only locally
- Try rebuilding the container
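Two of the checks above (free disk space and whether anything answers on port 8888) can be scripted on the host. This is a minimal preflight sketch with hypothetical helper names; note that on Docker Desktop, images live in a virtual disk whose size is set in Docker Desktop's settings, so host free space is only a first check.

```python
import shutil
import socket

# The README recommends more than 2 GB free; treated here as 2 GiB.
MIN_FREE_BYTES = 2 * 1024**3


def enough_disk(path=".", minimum=MIN_FREE_BYTES):
    """True if the filesystem holding `path` has at least `minimum` bytes free."""
    return shutil.disk_usage(path).free >= minimum


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or unreachable
        return False


if __name__ == "__main__":
    print("Enough free disk space:", enough_disk())
    print("Something is listening on port 8888:", port_is_listening(8888))
```

If the port check prints `False` while the container is running, Jupyter has probably not started yet or the port was not published or forwarded.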
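Useful commands for debugging:

- Check container status: `docker ps`
- View container logs: `docker logs docker-workshop-container`
- Open a shell in the running container: `docker exec -it docker-workshop-container bash`
- Rebuild the container (if needed): in VS Code, press `Ctrl+Shift+P` (`Cmd+Shift+P` on Mac) and run "Dev Containers: Rebuild Container"

If your container has a different name (containers started by Dev Containers or by a plain `docker run` usually get auto-generated names), use the name listed by `docker ps` instead of `docker-workshop-container`.

Found an issue or have suggestions? We welcome contributions!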
- Fork the repository
- Create a feature branch
- Make your changes
- Test in the containerized environment
- Submit a pull request
This workshop is open source and available under the MIT License.
After completing this workshop:
- Experiment: Modify the notebook and add your own data analysis
- Extend: Add new libraries to `requirements.txt` and rebuild
- Share: Use this setup as a template for your own projects
- Learn More: Explore Docker Compose for multi-container applications
Happy Dockerizing!
BU Data Science Association - Fall 2025