Merge pull request #4 from hcp-uw/dev
added some pages
elimelt authored Jan 12, 2024
2 parents 36d14a1 + e5c87a0 commit ac60f68
Showing 7 changed files with 243 additions and 0 deletions.
8 changes: 8 additions & 0 deletions docs/tech/info/_category_.json
@@ -0,0 +1,8 @@
{
"label": "Information",
"position": 3,
"link": {
"type": "generated-index",
"description": "The following documentation contains relevant technical information that might help you while developing your project."
}
}
88 changes: 88 additions & 0 deletions docs/tech/info/ai.md
@@ -0,0 +1,88 @@
# Helpful Info for AI/ML Projects

Below is a list of helpful resources for AI/ML projects. Whether you are starting from the ground up and training your own models, or using a pre-configured solution, these resources will help you get started.

## Commonly Used AI/ML Frameworks and Libraries

### TensorFlow

TensorFlow is an open-source machine learning library developed by Google. It was built to meet Google's need for systems that can build and train neural networks to detect and decipher patterns and correlations, analogous to the way humans learn and reason. Google uses it for both research and production in products including speech recognition, Gmail, Google Photos, and Search, many of which previously relied on standard pattern recognition algorithms.

### PyTorch

PyTorch is an open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing, primarily developed by Facebook's AI Research lab (FAIR). It is free and open-source software released under the Modified BSD license. Although the Python interface is more polished and the primary focus of development, PyTorch also has a C++ interface.

### Keras

Keras is an open-source software library that provides a Python interface for artificial neural networks and acts as an interface for the TensorFlow library. Keras was developed to let deep learning engineers build and experiment with different models very quickly. It sits at a higher level of abstraction than TensorFlow, so common tasks such as defining layers and training a model take less code. It was developed as part of the research effort of project ONEIROS (Open-ended Neuro-Electronic Intelligent Robot Operating System), and its primary author and maintainer is François Chollet, a Google engineer. Chollet is also the author of the Xception deep neural network model.

### Scikit-learn

Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms including support vector machines, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.

### NumPy

NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

### Pandas

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. It is free software released under the three-clause BSD license. The name is derived from the term "panel data", an econometrics term for data sets that include observations over multiple time periods for the same individuals.

### Matplotlib

Matplotlib is a plotting library for the Python programming language and its numerical mathematics extension NumPy. It provides an object-oriented API for embedding plots into applications using general-purpose GUI toolkits like Tkinter, wxPython, Qt, or GTK+. There is also a procedural "pylab" interface based on a state machine (like OpenGL), designed to closely resemble that of MATLAB, though its use is discouraged. SciPy makes use of Matplotlib.

### SciPy

SciPy is a free and open-source Python library used for scientific computing and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.

### OpenCV

OpenCV is a library of programming functions mainly aimed at real-time computer vision. Originally developed by Intel, it was later supported by Willow Garage and then Itseez (which was later acquired by Intel). The library is cross-platform and free to use under an open-source license (BSD for older releases, Apache 2.0 since version 4.5.0).

### Jupyter Notebook

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more.

## AI/ML Datasets

### Kaggle

Kaggle is an online community of data scientists and machine learners, owned by Google LLC. Kaggle allows users to find and publish data sets, explore and build models in a web-based data-science environment, work with other data scientists and machine learning engineers, and enter competitions to solve data science challenges. Kaggle got its start by offering machine learning competitions and now also offers a public data platform, a cloud-based workbench for data science, and short form AI education.

### UCI Machine Learning Repository

The UCI Machine Learning Repository is a collection of databases, domain theories, and data generators that are used by the machine learning community for the empirical analysis of machine learning algorithms. The archive was created as an ftp archive in 1987 by David Aha and fellow graduate students at UC Irvine. Since that time, it has been widely used by students, educators, and researchers all over the world as a primary source of machine learning data sets. As an indication of the impact of the archive, it has been cited over 1000 times, making it one of the top 100 most cited "papers" in all of computer science. The current version of the web site was designed in 2007 by Arthur Asuncion and David Newman, and was developed by Arthur Asuncion, UC Irvine.

### Google Dataset Search

Google Dataset Search is a search engine from Google that helps researchers locate online data that is freely available for use. The company launched the service on September 5, 2018, and stated that the product was targeted at scientists and data journalists. The service indexes data from government agency databases, public sources, and digital libraries. It was inspired by the Fake News Challenge and the work of Altmetric.

### AWS Public Datasets

AWS hosts a variety of public datasets that anyone can access for free. This includes datasets from the U.S. Census Bureau, NASA, NOAA, and many other organizations and companies.

### Microsoft Research Open Data

Microsoft Research Open Data is a collection of free datasets from Microsoft Research to advance state-of-the-art research in areas such as natural language processing, computer vision, and domain specific sciences.

### Stanford Large Network Dataset Collection

The Stanford Large Network Dataset Collection (SNAP) is a collection of datasets from a variety of domains and disciplines. SNAP is designed to facilitate empirical research in network science and network mining. SNAP is being developed by Jure Leskovec and collaborators at Stanford University, with the help of many contributors.

### Google Cloud Public Datasets

Google Cloud Public Datasets provide a playground for those new to big data and data analysis. They offer a repository of more than 100 public datasets from different industries, which you can join with your own data to produce new insights. Google Cloud Public Datasets are hosted on Google Cloud Storage and can be accessed by anyone.

### Data.gov

Data.gov is a U.S. government website launched in late May 2009 by the then Federal Chief Information Officer (CIO) of the United States, Vivek Kundra. According to its website, the purpose of Data.gov is to increase public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. The site is a repository for federal, state, local, and tribal government information, made available to the public. Its datasets are available in nine languages, including English, French, German, and Spanish. It contains data from a range of federal agencies, covering agriculture, business, climate, consumer, ecosystems, education, energy, finance, health, local government, manufacturing, ocean, public safety, and science and research.

### Datahub

Datahub is a community-run catalogue of useful sets of data on the Internet. You can collect links here to data from around the web for yourself and others to use, or search for data that others have collected.

### Data.world

Data.world is a social network for data people. It's a platform for data scientists and analysts to find and share data, connect with other users, and work together to solve data problems.

Binary file added docs/tech/info/assets/git-diagram.jpeg
58 changes: 58 additions & 0 deletions docs/tech/info/databases.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
# Databases

This is a collection of resources for learning about databases.

## Introduction

A database is a collection of data that is organized so that it can be easily accessed, managed, and updated. Databases are used in a wide variety of applications, and are crucial to persisting data in web applications.

## Database Management Systems

A database management system (DBMS) is a software system that allows users to define, create, and maintain a database. It provides an interface for users to interact with the database, and allows users to perform operations such as querying, updating, and deleting data.
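
The operations named above can be sketched with Python's built-in `sqlite3` module acting as the DBMS. This is a minimal illustration, not a recommended schema: the `tasks` table and its columns are made up for the example, and the database lives only in memory.

```python
import sqlite3

# An in-memory SQLite database stands in for "a DBMS" here.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# Define: create a table.
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, title TEXT, done INTEGER)"
)

# Create: insert records. Values are passed separately from the SQL text.
conn.executemany(
    "INSERT INTO tasks (title, done) VALUES (?, ?)",
    [("write docs", 0), ("review PR", 0), ("fix bug", 0)],
)

# Update: mark one task as done.
conn.execute("UPDATE tasks SET done = 1 WHERE title = ?", ("review PR",))

# Delete: remove one task.
conn.execute("DELETE FROM tasks WHERE title = ?", ("fix bug",))

# Query: read back what is left.
remaining = conn.execute("SELECT title, done FROM tasks ORDER BY id").fetchall()
print(remaining)
conn.close()
```

Passing values through `?` placeholders instead of building SQL strings by hand is the usual way to avoid SQL injection.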

### Relational Databases

A relational database is a type of database that stores data in tables. Each table consists of rows and columns, and each row represents a single record. Each column represents a single attribute of the record.

Relational databases are the most common type of database, and are used in a wide variety of applications. They are typically used for storing structured data, such as user information, product information, and financial records.
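
As a rough sketch of what "rows and columns" looks like in practice, the example below uses Python's built-in `sqlite3` module with two related tables. The `users` and `orders` tables, their columns, and the sample data are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # lets us read columns by name

# Each table has fixed columns (attributes); each inserted row is one record.
# orders.user_id links an order back to the user who placed it.
conn.executescript("""
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    total_cents INTEGER NOT NULL
);
INSERT INTO users (id, name) VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders (user_id, total_cents) VALUES (1, 1250), (1, 800), (2, 4300);
""")

# A join combines rows from both tables through the shared user id.
rows = conn.execute("""
    SELECT users.name AS name, SUM(orders.total_cents) AS spent_cents
    FROM users JOIN orders ON orders.user_id = users.id
    GROUP BY users.id
    ORDER BY users.id
""").fetchall()

totals = {row["name"]: row["spent_cents"] for row in rows}
print(totals)
conn.close()
```

Splitting the data into two tables and joining them on demand is what makes the database "relational": each fact is stored once and combined at query time.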

#### MySQL

MySQL is an open-source relational database management system (RDBMS) based on SQL. It is one of the most popular databases in the world, and is used by many large companies, including Facebook, Google, and Twitter.

#### PostgreSQL

PostgreSQL is an open-source relational database management system (RDBMS) based on SQL. It is another one of the most popular databases in the world, and is used by many large companies, including Apple, Netflix, and Spotify.

#### Supabase

Supabase is a feature-rich, open-source Firebase alternative. It is a hosted database that provides a variety of features, including authentication, authorization, and real-time updates. It is built on top of PostgreSQL, and is designed to be easy to use and integrate with existing applications. It also has a free tier that includes a fully managed Postgres instance, making it a great choice for small projects.

#### SQLite

SQLite is an open-source relational database management system (RDBMS) based on SQL. It is one of the most widely deployed databases in the world, and is comparatively lightweight and easy to use, making it a popular choice for embedded systems, mobile applications, and beginner developers.


### Non-Relational Databases

A non-relational database is a type of database that does not store data in tables. Instead, it stores data in a variety of different formats, such as key-value pairs, documents, and graphs.

Non-relational databases are typically used for storing unstructured data, such as text, images, and videos. They are also used for storing data that is not easily represented in a relational database.
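
To make the three formats mentioned above concrete, here is a small sketch that models each one with plain Python data structures. These are in-memory stand-ins with made-up sample data, not the API of any real database.

```python
import json

# Key-value: an opaque value looked up by a unique key.
key_value_store = {"session:42": "user=ada;theme=dark"}

# Document: self-contained, nested records; fields can vary per document.
documents = [
    {"_id": 1, "name": "Ada", "tags": ["admin"], "address": {"city": "Seattle"}},
    {"_id": 2, "name": "Grace"},  # no tags or address, and that is fine
]

# Graph: nodes plus the edges between them, stored as an adjacency list.
follows = {"ada": ["grace"], "grace": ["ada", "linus"], "linus": []}

# Querying a document store usually means filtering on (possibly missing) fields.
seattle_names = [
    doc["name"]
    for doc in documents
    if doc.get("address", {}).get("city") == "Seattle"
]

# Querying a graph usually means walking edges.
followers_of_ada = sorted(
    user for user, targets in follows.items() if "ada" in targets
)

print(key_value_store["session:42"])
print(json.dumps(documents[0]))
print(seattle_names, followers_of_ada)
```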

#### MongoDB

MongoDB is an open-source document database based on JSON. It is one of the most popular non-relational databases, and is a great choice for storing unstructured data, such as text, images, and videos.

#### Redis

Redis is an open-source key-value store. It is one of the most popular non-relational databases, and is a great tool for caching data, storing session data, and implementing queues because of its speed and simplicity.
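
The caching use case can be sketched without a Redis server. The `TTLCache` class below is a hypothetical in-memory stand-in for a key-value cache with expiry; it is not the Redis client API, and `slow_lookup` is a placeholder for an expensive database query.

```python
import time


class TTLCache:
    """In-memory stand-in for a key-value cache; not the Redis client API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired: behave as a cache miss
            return None
        return value


db_calls = 0


def slow_lookup(user_id):
    """Hypothetical expensive database query."""
    global db_calls
    db_calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id):
    """Check the cache first; on a miss, query and store the result."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = slow_lookup(user_id)
    cache.set(key, value, ttl_seconds=60)
    return value


cache = TTLCache()
first = get_user(cache, 7)
second = get_user(cache, 7)  # served from the cache, no second lookup
print(first == second, db_calls)
```

The expiry time keeps stale entries from living forever, which is the main trade-off to tune when caching data that can change.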

#### Firebase

Firebase is a Backend-as-a-Service (BaaS) that provides developers with a variety of tools and services. It is a great choice for small projects because it is easy to use and has a generous free tier. It has a simple API, and provides a variety of features, including queryable JSON storage, blob storage, authentication, and real-time updates.

#### DynamoDB

DynamoDB is a fully managed NoSQL database service that provides fast and extremely scalable storage. It is a great choice for applications that require high performance and low latency, but often requires a lot of upfront schema design to ensure your use case is supported. Although it is primarily a key-value store, there are a variety of data types that can be stored in DynamoDB, including strings, numbers, lists, and maps.


38 changes: 38 additions & 0 deletions docs/tech/info/git.md
@@ -0,0 +1,38 @@
# Using Git in a Team Project

## Introduction

Git is a version control system that allows you to track changes to your code over time. It is a powerful tool that allows you to collaborate with others on a project, and is a standard tool in the software development industry.

This is not a comprehensive guide to Git, but rather a quick introduction to the most common commands you will use when working on a team project.

Feel free to use this as a reference, but you will likely need to do additional research to learn more about Git.

## Git Basics

The basic workflow of Git can be summed up in the following graphic:

![Git Workflow](./assets/git-diagram.jpeg)


## Git Commands

| Command | Description |
| --- | --- |
| `git init` | Initializes a new Git repository in the current directory. You shouldn't need to do this for HCP |
| `git clone <url>` | Clones a remote Git repository to your local machine. You will use this to create a local copy of your repo |
| `git add <file>` | Adds a file to the staging area. You will use this to add files before you commit them |
| `git commit -m "<message>"` | Commits the staged files to the local repository with the given message (quote the message if it contains spaces). You will use this to save your changes to your local repo |
| `git push` | Pushes your local commits to the remote repository. You will use this to share your changes with your team |
| `git pull` | Pulls the latest changes from the remote repository. You will use this to get the latest changes from your team |
| `git status` | Shows the current status of your local repository. You will use this to see which files have been changed, added, or deleted |
| `git log` | Shows the commit history of your local repository. You will use this to see the commit messages and commit hashes of your commits |
| `git branch` | Lists your local branches and marks the current one with `*`. You will use this to see which branch you are working on |
| `git checkout <branch>` | Switches to the specified branch. You will use this to switch between branches |
| `git checkout -b <branch>` | Creates a new branch and switches to it. You will use this to create a new branch |
| `git reset --hard <commit>` | Moves the current branch to the specified commit and discards all uncommitted changes to tracked files. You will use this to return your local repository to an earlier state |
| `git reset --hard origin/<branch>` | Resets the current branch to your local copy of the remote branch, as of your last `git fetch` or `git pull`. You will use this to throw away local commits and changes and match the remote |
| `git reset --hard HEAD~<number>` | Moves the current branch back by the specified number of commits. You will use this to drop your most recent local commits |
| `git reset --hard` | Resets tracked files to the last commit. You will use this to discard uncommitted changes. This cannot be undone, so check `git status` first |
| `git reset --hard origin/master` | Resets the current branch to the latest fetched commit on the remote master branch. You will use this to make your local branch match the remote exactly |
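
The everyday loop from the table (`init`, `add`, `commit`, `status`, `log`) can be tried safely in a throwaway directory. The sketch below drives Git from Python's `subprocess` module; it assumes `git` is installed and on your `PATH`, and the file name, commit message, and identity are placeholders. For a real HCP project you would `git clone` your team's repo rather than run `git init`.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its stdout."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init")
        # Throwaway identity so the commit works on a machine with no Git config.
        git(repo, "config", "user.name", "Example Dev")
        git(repo, "config", "user.email", "dev@example.com")
        git(repo, "config", "commit.gpgsign", "false")

        (repo / "notes.txt").write_text("first draft\n")
        git(repo, "add", "notes.txt")         # stage the file
        git(repo, "commit", "-m", "Add notes")  # save it to the local repo

        status = git(repo, "status", "--short")  # empty when nothing is pending
        log = git(repo, "log", "--oneline")      # one line per commit
        return status, log


# Only run the demo when git is actually available.
if shutil.which("git"):
    status, log = demo()
    print(repr(status))
    print(log)
```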

8 changes: 8 additions & 0 deletions docs/tech/templates/_category_.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
{
"label": "Project Templates",
"position": 2,
"link": {
"type": "generated-index",
"description": "The following documentation will cover the various project templates that were created for project teams to use as a starting point."
}
}