diff --git a/.DS_Store b/.DS_Store
index 96d0776..0095a44 100644
Binary files a/.DS_Store and b/.DS_Store differ
diff --git a/final-project/.gitignore b/final-project/.gitignore
new file mode 100644
index 0000000..0e31486
--- /dev/null
+++ b/final-project/.gitignore
@@ -0,0 +1,16 @@
+# Ignore the dataset folder
+# IMPORTANT: make sure that your dataset is stored in a
+# folder with the same name as the one specified below!
+
+# data
+*.png
+*.jpg
+food_data/*
+hw1_data/*
+# train related log
+*.pth
+*.log
+*.mat
+*.csv
+# binary files
+*.pyc
diff --git a/final-project/1101_DLCV_final_report_NO_QQ_NO_LIFE.pdf b/final-project/1101_DLCV_final_report_NO_QQ_NO_LIFE.pdf
new file mode 100644
index 0000000..8df9f91
Binary files /dev/null and b/final-project/1101_DLCV_final_report_NO_QQ_NO_LIFE.pdf differ
diff --git a/final-project/README.md b/final-project/README.md
new file mode 100644
index 0000000..333ed7f
--- /dev/null
+++ b/final-project/README.md
@@ -0,0 +1,144 @@
+# DLCV Final Project ( Food-classification-Challenge )
+
+# How to run this code? (for TAs)
+
+    # Download all our models and unzip them (~5GB)
+    bash get_checkpoints.sh
+
+    # Train our ensemble models (22, 33, 38, 40); this may take a very long time...
+    # You can skip this!
+    # bash train.sh
+    bash train.sh
+
+    # Generate the outputs of our ensemble models with test-time augmentation on the 4 test types.
+    # This may take 12 hours or more; we have already saved these results as .mat files.
+    # You can also skip this!
+    # bash test_TTA.sh
+    bash test_TTA.sh checkpoints
+
+    # Run the ensembles and generate the 4 Kaggle submission files in the folder "checkpoints"
+    # bash test_ensemble.sh
+    bash test_ensemble.sh checkpoints
+
+## Train
+
+    python3 train_template.py
+## BBN Train [ViT/ResNet/SWIN]
+
+    python3 train_template_BBN.py -mode [UNIFORM/BALANCED]
+
+BBN [2020 CVPR] corresponds to UNIFORM; our proposed model uses BALANCED.
+## Long-Tail Train
+
+    python3 train_template_LT.py -LT_EXP RESAMPLE/REWEIGHT/TODO
+|LT_EXP |Feature|
+|----- |--------|
+|LDAM (default)|LDAM Loss [2019 NeurIPS]|
+|RESAMPLE|Balanced DataLoader with CrossEntropy|
+|REVERSE|Reversed DataLoader with CrossEntropy|
+|TODO|...|
+
+Ref: https://github.com/robotframework/RIDE.git
+## Test
+
+    python3 test_template.py
+## Test with TTA module
+
+    python3 test_template_TTA.py -mode TEST/VALID
+##### Transforms
+
+| Transform | Parameters | Values |
+|----------------|:-------------------------:|:---------------------------------:|
+| HorizontalFlip (good in our task) | - | - |
+| VerticalFlip (bad in our task QQ) | - | - |
+| Rotate90 (bad in our task QQ) | angles | List\[0, 90, 180, 270] |
+| Scale | scales<br>interpolation | List\[float]<br>"nearest"/"linear" |
+| Resize | sizes<br>original_size<br>interpolation | List\[Tuple\[int, int]]<br>Tuple\[int, int]<br>"nearest"/"linear" |
+| Add | values | List\[float] |
+| Multiply | factors | List\[float] |
+| FiveCrops (bad in our task QQ) | crop_height<br>crop_width | int<br>int |
+
+##### Aliases
+
+  - flip_transform (horizontal + vertical flips)
+  - hflip_transform (horizontal flip)
+  - d4_transform (flips + rotations of 0, 90, 180, 270)
+  - multiscale_transform (scale transform, takes scales as input parameter)
+  - five_crop_transform (corner crops + center crop)
+  - ten_crop_transform (five crops + five crops on horizontal flip)
+
+##### Merge modes
+ - mean
+ - gmean (geometric mean)
+ - sum
+ - max
+ - min
+ - tsharpen ([temperature sharpen](https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/107716#latest-624046) with t=0.5)
+
+For more details, please refer to the ttach repo.
+Ref: https://github.com/qubvel/ttach
+## Download all our models
+- Download the models from [here](https://drive.google.com/drive/u/3/folders/1XuJa60KacC_cbu-2Xphb3m0rAK2W4AGj)
+
+## Load Swin model
+- Download the model from [here](https://drive.google.com/file/d/1HFuSt0OEQzbMC65E4GmRxLlxelPL1DRT/view?usp=sharing)
+- Put the file **swin_large_patch4_window12_384_22kto1k.pth** under ./model_zoo/swin/
+- Remember to set the img_size to 384 for this model
+- Download the **fine-tuned model (reversed sampler & gradaccum 16)** [here](https://drive.google.com/file/d/1Ee_rOaq4OpNFndOxRDoN195M87BpE6JE/view?usp=sharing)
+
+## Load ResNeSt50/ResNeSt269 model
+- Download the models from [here](https://drive.google.com/drive/u/3/folders/1XuJa60KacC_cbu-2Xphb3m0rAK2W4AGj)
+- Put the file **resnest50_v1.pth** / **resnest269_v1.pth** under ./model_zoo/pytorch_resnest/
+- Remember to set the img_size to 224 for the resnest50 model
+- Remember to set the img_size to 320 for the resnest269 model
+- Remember to pip install fvcore
+
+## Automatic Submission to Kaggle
+
+    export KAGGLE_USERNAME=datadinosaur
+    export KAGGLE_KEY=xxxxxxxxxxxxxx
+    bash test_kaggle.sh $1 $2    # $1: model_path (e.g., baseline/), $2: commit message
+## File structure
+```
+final-project-challenge-3-no_qq_no_life/
+│
+├── train_template.py - main script to start training
+├── test_template.py - evaluation of trained model
+├── test_template_TTA.py - evaluation of trained model with TTA module
+├── base/
+│   ├── dataset.py
+│   ├── trainer.py
+│   └── tester.py
+└── model_zoo/
+    ├── swin/*
+    ├── vgg16.py
+    ├── BBN/*
+    └── pytorch_pretrained_vit/*
+```
+# Usage
+To start working on this final project, you should clone this repository into your local machine by using the following command:
+
+    git clone https://github.com/DLCV-Fall-2021/final-project-challenge-3-<team_name>.git
+Note that you should replace `<team_name>` with your own team name.
+
+For more details, please click [this link](https://drive.google.com/drive/folders/13PQuQv4dllmdlA7lJNiLDiZ7gOxge2oJ?usp=sharing) to view the slides of Final Project - Food image classification. **Note that the video and introduction pdf files for the final project can be accessed in your NTU COOL.**
+
+### Dataset
+In the starter code of this repository, we have provided a shell script for downloading and extracting the dataset for this assignment. For Linux users, simply use the following command.
+
+    bash ./get_dataset.sh
+The shell script will automatically download the dataset and store the data in a folder called `food_data`. Note that this command by default only works on Linux. If you are using other operating systems, you should download the dataset from [this link](https://drive.google.com/file/d/1IYWPK8h9FWyo0p4-SCAatLGy0l5omQaw/view?usp=sharing) and unzip the compressed file manually.

> ⚠️ ***IMPORTANT NOTE*** ⚠️
> 1. Please do not disclose the dataset! Also, do not upload your get_dataset.sh to your (public) Github.
+> 2. You should keep a copy of the dataset only in your local machine. **DO NOT** upload the dataset to this remote repository. If you extract the dataset manually, be sure to put them in a folder called `food_data` under the root directory of your local repository so that it will be included in the default `.gitignore` file. + +> 🆕 ***NOTE*** +> For the sake of conformity, please use the `python3` command to call your `.py` files in all your shell scripts. Do not use `python` or other aliases, otherwise your commands may fail in our autograding scripts. + +# Q&A +If you have any problems related to Final Project, you may +- Use TA hours +- Contact TAs by e-mail ([ntudlcv@gmail.com](mailto:ntudlcv@gmail.com)) +- Post your question under Final Project FAQ section in NTU Cool Discussion + diff --git a/final-project/attn_vis.py b/final-project/attn_vis.py new file mode 100644 index 0000000..50cbc6e --- /dev/null +++ b/final-project/attn_vis.py @@ -0,0 +1,72 @@ +import os +import torch +import argparse +from sklearn.manifold import TSNE +from tqdm import tqdm +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +import matplotlib as mpl +from PIL import Image +import torchvision.transforms as tranforms + +from model_zoo.swin.swin_transformer_vis import get_swin +from base_vis.dataset import FoodDataset,ChunkSampler,P1_Dataset +from util import * + +if __name__ == '__main__': + # print(model) + # layers[3].blocks.mlp.fc1 + parser = argparse.ArgumentParser() + parser.add_argument("-load", "--load",default='',type=str , help='') + parser.add_argument("-model_path", "--model_path",default="baseline",type=str , help='') + + parser.add_argument("-img_size", "--img_size", default=384,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=1,type=int , help='') + parser.add_argument("-val_data_dir","--val_data_dir", default = "../final-project-challenge-3-no_qq_no_life/food_data/val",type=str, help ="Validation images directory") + args = parser.parse_args() + + device = model_setting() + fix_seeds(87) + + if not os.path.exists(os.path.join(args.model_path, 'attn')): + os.makedirs(os.path.join(args.model_path, 'attn')) + + raw_class_list = [558, 925, 945, 827, 880, 800, 929, 633, 515, 326] + confuse_class_list = [610, 294, 485, 866, 88, 759, 809, 297, 936, 33] + class_list = raw_class_list + confuse_class_list + num_per_class = 1 + + + val_dataset = FoodDataset(args.val_data_dir,img_size=args.img_size,mode = "val", class_list=class_list, num_per_class=num_per_class) + val_loader = torch.utils.data.DataLoader(val_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + # print(model) + if args.load: + model.load_state_dict(torch.load(args.load)) + print("model loaded from {}".format(args.load)) + model.to(device) + model.eval() + resize = tranforms.Resize((384,384)) + with torch.no_grad(): + for i, (data, label) in enumerate(tqdm(val_loader)): + data = data.to(device) + output, attn = model(data) # attn: 1, 48, 144, 144 + attn = attn.squeeze(0).cpu().numpy() # (48, 144, 144) + avg_attn_map = attn[1, :, :] + # avg_attn_map = np.mean(attn, axis=0) # (144, 144) + avg_attn_map = np.mean(avg_attn_map, axis=0) + avg_attn_map = np.reshape(avg_attn_map, (12,12)) + + original_image = val_dataset.getOriginalImage(i) + avg_attn_map = np.array(resize(Image.fromarray(avg_attn_map))) + print(attn.shape, original_image.shape) + plt.cla() + plt.clf() + plt.axis('off') 
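+            # Overlay visualization: draw the original image first, then alpha-blend the upsampled attention map on top as a heatmap.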
+ plt.imshow(original_image) + plt.imshow(avg_attn_map, alpha=0.5, cmap='rainbow') + plt.savefig(os.path.join(args.model_path, 'attn, '{}.png'.format(label.item()))) \ No newline at end of file diff --git a/final-project/base/dataset.py b/final-project/base/dataset.py new file mode 100644 index 0000000..30edc80 --- /dev/null +++ b/final-project/base/dataset.py @@ -0,0 +1,319 @@ +from PIL import Image +import os +import re +import glob +import numpy as np +import pandas as pd +import random +import torch +from torch.utils.data import sampler,Dataset,DataLoader +import torch.nn.functional as F +from torchvision import transforms +import torchvision +filenameToPILImage = lambda x: Image.open(x) + + +############## +# Dataset +############## +class FoodDataset(Dataset): + def __init__(self,data_path,mode,img_size): + """ + for training and validation data + return data with label + """ + self.data_path = data_path + self.img_size = img_size + self.mode = mode + self.file_list,self.label_list = self.parse_folder() + self.num = len(self.file_list) + print("load %d images from %s"%(self.num,self.data_path)) + if mode == "train": + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.RandomHorizontalFlip(p=0.5), + transforms.RandomRotation(10), + torchvision.transforms.ColorJitter(), + transforms.CenterCrop(self.img_size), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + else: + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + self.freq_list = [] + f = open(os.path.join(self.data_path,"../label2name.txt"), encoding='utf8') + for line in f.readlines(): + if (line.find("f") != -1): + self.freq_list.append(0) + elif (line.find("c") != -1): + self.freq_list.append(1) + else: + self.freq_list.append(2) + f.close + def parse_folder(self): + ''' + output : file _dict + ''' + file_list = [] + label_list = [] + for class_id in range(0,1000): + str_id = str(class_id) + sub_folder = os.path.join(self.data_path,str_id) + sub_list = sorted([name for name in os.listdir(sub_folder) if os.path.isfile(os.path.join(sub_folder, name))]) + file_list.extend(sub_list) + label_list.extend([str_id]*len(sub_list)) + return file_list,label_list + + def __len__(self) -> int: + return self.num + def __getitem__(self, index): + img_path = os.path.join(self.data_path,self.label_list[index],self.file_list[index]) + label = int(self.label_list[index]) + # Preprocessing -> normalize image + img = self.transform(img_path) + return img,label +class FoodTestDataset(Dataset): + def __init__(self,csv_path,data_path,img_size): + """ + for training and validation data + return data without label + """ + self.data_df = pd.read_csv(csv_path) + self.data_path = data_path + self.img_size = img_size + self.num = len(self.data_df) + print("load %d images from %s"%(self.num,self.data_path)) + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + def __getitem__(self, index): + img_path = os.path.join(self.data_path,"{:06d}.jpg".format(self.data_df.loc[index, "image_id"])) + img = self.transform(img_path) + return img + def __len__(self) -> int: + return self.num + +############################# +# Long 
tail food dataloader # +############################# + +class FoodLTDataLoader(DataLoader): + """ + 2021 ICLR RIDE + modified from ImageNetLT Data Loader + counting statistics of data,and construct a list of class_num + base on this list of class_num,we can do reweight/resample + """ + def __init__(self, data_dir, img_size, batch_size, shuffle=True, num_workers=1, training=True, balanced=False, reversed=False, retain_epoch_size=True): + if training: + dataset = FoodDataset(os.path.join(data_dir,"train"),"train",img_size = img_size) + val_dataset = FoodDataset(os.path.join(data_dir,"val"),"val",img_size = img_size) + else: # test + dataset = FoodTestDataset(os.path.join(data_dir,"test"),img_size = img_size) + val_dataset = None + + self.dataset = dataset + self.val_dataset = val_dataset + + self.n_samples = len(self.dataset) + + num_classes = len(np.unique(dataset.label_list)) + assert num_classes == 1000 + + cls_num_list = [0] * num_classes + for label in dataset.label_list: + cls_num_list[int(label)] += 1 + + self.cls_num_list = cls_num_list + + if balanced: + if training: + print("Use balanced sampler for data loader") + buckets = [[] for _ in range(num_classes)] + for idx, label in enumerate(dataset.label_list): + buckets[int(label)].append(idx) + sampler = BalancedSampler(buckets, retain_epoch_size) + shuffle = False + else: + print("Test set will not be evaluated with balanced sampler, nothing is done to make it balanced") + elif reversed: + if training: + print("Use reversed sampler for data loader") + max_num = max(self.cls_num_list) + class_weight = [max_num / i for i in self.cls_num_list] + buckets = [[] for _ in range(num_classes)] + for idx, label in enumerate(dataset.label_list): + buckets[int(label)].append(idx) + sampler = ReversedSampler(buckets, retain_epoch_size, class_weight) + shuffle = False + else: + print("Test set will not be evaluated with reversed sampler") + else: + # uniform sampler + print("Use uniform sampler for data loader") + if training: + sampler = None + shuffle = True + else: + print("Test set will not be evaluated with shuffle order") + + self.shuffle = shuffle + self.init_kwargs = { + 'batch_size': batch_size, + 'shuffle': self.shuffle, + 'num_workers': num_workers + } + self.val_init_kwargs = { + 'batch_size': batch_size, + 'shuffle': False, # For validation data,always false + 'num_workers': num_workers + } + super().__init__(dataset=self.dataset, **self.init_kwargs, sampler=sampler) # Note that sampler does not apply to validation set + + def split_validation(self): + # If you do not want to validate: + #return None + # If you want to validate: + return DataLoader(dataset=self.val_dataset, **self.val_init_kwargs) + + +####################### +# hw1 dataloader +# Sanity check workload +# use hw1 data to verify correctness of training code +####################### +class P1_Dataset(Dataset): + def __init__(self,data_path,img_size,val_mode): + self.data_path = data_path + self.val_mode = val_mode + self.file_list=sorted([name for name in os.listdir(self.data_path) if os.path.isfile(os.path.join(self.data_path, name)) ]) + self.num = len(self.file_list) + print("load %d images from %s"%(self.num,self.data_path)) + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((img_size,img_size)), + transforms.RandomRotation(degrees=10, resample=Image.BILINEAR), + transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) + ]) + def __len__(self) -> int: + 
return self.num + def __getitem__(self, index): + label_idx = int(re.findall(r'(\d+)_\d+.png',self.file_list[index])[0]) + # Preprocessing -> normalize image + image_data = self.transform(os.path.join(self.data_path,self.file_list[index]))#.unsqueeze(0) + return image_data,label_idx +############## +# Sampler +############## +################################### +# 2021 CVPR RIDE Balanced sampler # +################################### +class BalancedSampler(sampler.Sampler): + def __init__(self, buckets, retain_epoch_size=False): + for bucket in buckets: + random.shuffle(bucket) + + self.bucket_num = len(buckets) + self.buckets = buckets + self.bucket_pointers = [0 for _ in range(self.bucket_num)] + self.retain_epoch_size = retain_epoch_size + + def __iter__(self): + count = self.__len__() + while count > 0: + yield self._next_item() + count -= 1 + + def _next_item(self): + bucket_idx = random.randint(0, self.bucket_num - 1) + bucket = self.buckets[bucket_idx] + item = bucket[self.bucket_pointers[bucket_idx]] + self.bucket_pointers[bucket_idx] += 1 + if self.bucket_pointers[bucket_idx] == len(bucket): + self.bucket_pointers[bucket_idx] = 0 + random.shuffle(bucket) + return item + + def __len__(self): + if self.retain_epoch_size: + return sum([len(bucket) for bucket in self.buckets]) # Acrually we need to upscale to next full batch + else: + return max([len(bucket) for bucket in self.buckets]) * self.bucket_num # Ensures every instance has the chance to be visited in an epoch + +class ReversedSampler(sampler.Sampler): + def __init__(self, buckets, retain_epoch_size=False, class_weight=None): + for bucket in buckets: + random.shuffle(bucket) + self.class_weight = class_weight + self.sum_weight = sum(self.class_weight) + self.bucket_num = len(buckets) + self.buckets = buckets + self.bucket_pointers = [0 for _ in range(self.bucket_num)] + self.retain_epoch_size = retain_epoch_size + + def sample_class_index_by_weight(self): + rand_number, now_sum = random.random() * self.sum_weight, 0 + for i in range(len(self.class_weight)): + now_sum += self.class_weight[i] + if rand_number <= now_sum: + return i + + def __iter__(self): + count = self.__len__() + while count > 0: + yield self._next_item() + count -= 1 + + def _next_item(self): + bucket_idx = self.sample_class_index_by_weight() + bucket = self.buckets[bucket_idx] + item = bucket[self.bucket_pointers[bucket_idx]] + self.bucket_pointers[bucket_idx] += 1 + if self.bucket_pointers[bucket_idx] == len(bucket): + self.bucket_pointers[bucket_idx] = 0 + random.shuffle(bucket) + return item + + def __len__(self): + if self.retain_epoch_size: + return sum([len(bucket) for bucket in self.buckets]) # Acrually we need to upscale to next full batch + else: + return max([len(bucket) for bucket in self.buckets]) * self.bucket_num # Ensures every instance has the chance to be visited in an epoch + +class ChunkSampler(sampler.Sampler): + """Samples elements sequentially from some offset for sanity check + Arguments: + num_samples: # of desired datapoints + start: offset where we should start selecting from + """ + def __init__(self, num_samples, start = 0): + self.num_samples = num_samples + self.start = start + + + def __iter__(self): + return iter(range(self.start, self.start + self.num_samples)) + + def __len__(self): + return self.num_samples +if __name__ == "__main__": + import time + start = time.time() + #D = FoodDataset("food_data/train","train",img_size = 384 ) + D = 
FoodTestDataset("food_data/testcase/sample_submission_comm_track.csv","food_data/test",img_size = 384 ) + d = D.__getitem__(0) + print(d[0]) + + print(time.time()-start) + diff --git a/final-project/base/loss.py b/final-project/base/loss.py new file mode 100644 index 0000000..2ea7068 --- /dev/null +++ b/final-project/base/loss.py @@ -0,0 +1,288 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +import numpy as np + +import random + +eps = 1e-7 + +# CE and LDAM are supported + +# If you would like to add other losses, please have a look at: +# Focal Loss: https://github.com/kaidic/LDAM-DRW +# CRD, PKT, and SP Related Part: https://github.com/HobbitLong/RepDistiller + +def focal_loss(input_values, gamma): + """Computes the focal loss""" + p = torch.exp(-input_values) + loss = (1 - p) ** gamma * input_values + return loss.mean() + +class FocalLoss(nn.Module): + def __init__(self, cls_num_list=None, weight=None, gamma=0.): + super(FocalLoss, self).__init__() + assert gamma >= 0 + self.gamma = gamma + self.weight = weight + + def _hook_before_epoch(self, epoch): + pass + + def forward(self, output_logits, target): + return focal_loss(F.cross_entropy(output_logits, target, reduction='none', weight=self.weight), self.gamma) + +class CrossEntropyLoss(nn.Module): + def __init__(self, cls_num_list=None, reweight_CE=False): + super().__init__() + if reweight_CE: + idx = 1 # condition could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights = None + + def to(self, device): + super().to(device) + if self.per_cls_weights is not None: + self.per_cls_weights = self.per_cls_weights.to(device) + + return self + + def forward(self, output_logits, target): # output is logits + return F.cross_entropy(output_logits, target, weight=self.per_cls_weights) + +class LDAMLoss(nn.Module): + def __init__(self, cls_num_list=None, max_m=0.5, s=30, reweight_epoch=-1): + super().__init__() + if cls_num_list is None: + # No cls_num_list is provided, then we cannot adjust cross entropy with LDAM. 
+ self.m_list = None + else: + self.reweight_epoch = reweight_epoch + m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list)) + m_list = m_list * (max_m / np.max(m_list)) + m_list = torch.tensor(m_list, dtype=torch.float, requires_grad=False) + self.m_list = m_list + assert s > 0 + self.s = s + if reweight_epoch != -1: + idx = 1 # condition could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights_enabled = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights_enabled = None + self.per_cls_weights = None + + def to(self, device): + super().to(device) + if self.m_list is not None: + self.m_list = self.m_list.to(device) + + if self.per_cls_weights_enabled is not None: + self.per_cls_weights_enabled = self.per_cls_weights_enabled.to(device) + + return self + + def _hook_before_epoch(self, epoch): + if self.reweight_epoch != -1: + self.epoch = epoch + + if epoch > self.reweight_epoch: + self.per_cls_weights = self.per_cls_weights_enabled + else: + self.per_cls_weights = None + + def get_final_output(self, output_logits, target): + x = output_logits + + index = torch.zeros_like(x, dtype=torch.uint8, device=x.device) + index.scatter_(1, target.data.view(-1, 1), 1) + + index_float = index.float() + batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0,1)) + + batch_m = batch_m.view((-1, 1)) + x_m = x - batch_m * self.s + + final_output = torch.where(index, x_m, x) + return final_output + + def forward(self, output_logits, target): + if self.m_list is None: + return F.cross_entropy(output_logits, target) + + final_output = self.get_final_output(output_logits, target) + return F.cross_entropy(final_output, target, weight=self.per_cls_weights) + +""" +2021 ICLR RIDE +Don't support plug-and play feature to our repo QQ +""" + +class RIDELoss(nn.Module): + def __init__(self, cls_num_list=None, base_diversity_temperature=1.0, max_m=0.5, s=30, reweight=True, reweight_epoch=-1, + base_loss_factor=1.0, additional_diversity_factor=-0.2, reweight_factor=0.05): + super().__init__() + self.base_loss = F.cross_entropy + self.base_loss_factor = base_loss_factor + if not reweight: + self.reweight_epoch = -1 + else: + self.reweight_epoch = reweight_epoch + + # LDAM is a variant of cross entropy and we handle it with self.m_list. + if cls_num_list is None: + # No cls_num_list is provided, then we cannot adjust cross entropy with LDAM. + self.m_list = None + self.per_cls_weights_enabled = None + self.per_cls_weights_enabled_diversity = None + else: + # We will use LDAM loss if we provide cls_num_list. 
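+            # LDAM margin per class: m_j ∝ n_j^(-1/4), rescaled so the largest margin equals max_m
+            # (illustration: counts [100, 10] with max_m=0.5 give margins of roughly [0.28, 0.50]).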
+ print("Class distributuion",cls_num_list) + m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list)) + m_list = m_list * (max_m / np.max(m_list)) + m_list = torch.tensor(m_list, dtype=torch.float, requires_grad=False) + self.m_list = m_list + self.s = s + assert s > 0 + + if reweight_epoch != -1: + idx = 1 # condition could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights_enabled = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights_enabled = None + + cls_num_list = np.array(cls_num_list) / np.sum(cls_num_list) + C = len(cls_num_list) + per_cls_weights = C * cls_num_list * reweight_factor + 1 - reweight_factor + + # Experimental normalization: This is for easier hyperparam tuning, the effect can be described in the learning rate so the math formulation keeps the same. + # At the same time, the 1 - max trick that was previously used is not required since weights are already adjusted. + per_cls_weights = per_cls_weights / np.max(per_cls_weights) + + assert np.all(per_cls_weights > 0), "reweight factor is too large: out of bounds" + # save diversity per_cls_weights + self.per_cls_weights_enabled_diversity = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False).cuda() + + self.base_diversity_temperature = base_diversity_temperature + self.additional_diversity_factor = additional_diversity_factor + + def to(self, device): + super().to(device) + if self.m_list is not None: + self.m_list = self.m_list.to(device) + + if self.per_cls_weights_enabled is not None: + self.per_cls_weights_enabled = self.per_cls_weights_enabled.to(device) + + if self.per_cls_weights_enabled_diversity is not None: + self.per_cls_weights_enabled_diversity = self.per_cls_weights_enabled_diversity.to(device) + + return self + + def _hook_before_epoch(self, epoch): + if self.reweight_epoch != -1: + self.epoch = epoch + + if epoch > self.reweight_epoch: + self.per_cls_weights_base = self.per_cls_weights_enabled + self.per_cls_weights_diversity = self.per_cls_weights_enabled_diversity + else: + self.per_cls_weights_base = None + self.per_cls_weights_diversity = None + + def get_final_output(self, output_logits, target): + x = output_logits + + index = torch.zeros_like(x, dtype=torch.uint8, device=x.device) + index.scatter_(1, target.data.view(-1, 1), 1) + + index_float = index.float() + batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0,1)) + + batch_m = batch_m.view((-1, 1)) + x_m = x - batch_m * self.s + + final_output = torch.where(index, x_m, x) + return final_output + + def forward(self, output_logits, target, extra_info=None): + if extra_info is None: + return self.base_loss(output_logits, target) + + loss = 0 + + # Adding RIDE Individual Loss for each expert + for logits_item in extra_info['logits']: + ride_loss_logits = output_logits if self.additional_diversity_factor == 0 else logits_item + if self.m_list is None: + loss += self.base_loss_factor * self.base_loss(ride_loss_logits, target) + else: + final_output = self.get_final_output(ride_loss_logits, target) + loss += self.base_loss_factor * self.base_loss(final_output, target, weight=self.per_cls_weights_base) + + base_diversity_temperature = self.base_diversity_temperature + + if self.per_cls_weights_diversity is not None: + diversity_temperature = 
base_diversity_temperature * self.per_cls_weights_diversity.view((1, -1)) + temperature_mean = diversity_temperature.mean().item() + else: + diversity_temperature = base_diversity_temperature + temperature_mean = base_diversity_temperature + + output_dist = F.log_softmax(logits_item / diversity_temperature, dim=1) + with torch.no_grad(): + # Using the mean takes only linear instead of quadratic time in computing and has only a slight difference so using the mean is preferred here + mean_output_dist = F.softmax(output_logits / diversity_temperature, dim=1) + + loss += self.additional_diversity_factor * temperature_mean * temperature_mean * F.kl_div(output_dist, mean_output_dist, reduction='batchmean') + + return loss + +class RIDELossWithDistill(nn.Module): + def __init__(self, cls_num_list=None, additional_distill_loss_factor=1.0, distill_temperature=1.0, ride_loss_factor=1.0, **kwargs): + super().__init__() + self.ride_loss = RIDELoss(cls_num_list=cls_num_list, **kwargs) + self.distill_temperature = distill_temperature + + self.ride_loss_factor = ride_loss_factor + self.additional_distill_loss_factor = additional_distill_loss_factor + + def to(self, device): + super().to(device) + self.ride_loss = self.ride_loss.to(device) + return self + + def _hook_before_epoch(self, epoch): + self.ride_loss._hook_before_epoch(epoch) + + def forward(self, student, target=None, teacher=None, extra_info=None): + output_logits = student + if extra_info is None: + return self.ride_loss(output_logits, target) + + loss = 0 + num_experts = len(extra_info['logits']) + for logits_item in extra_info['logits']: + loss += self.ride_loss_factor * self.ride_loss(output_logits, target, extra_info) + distill_temperature = self.distill_temperature + + student_dist = F.log_softmax(student / distill_temperature, dim=1) + with torch.no_grad(): + teacher_dist = F.softmax(teacher / distill_temperature, dim=1) + + distill_loss = F.kl_div(student_dist, teacher_dist, reduction='batchmean') + distill_loss = distill_temperature * distill_temperature * distill_loss + loss += self.additional_distill_loss_factor * distill_loss + return loss diff --git a/final-project/base/tester.py b/final-project/base/tester.py new file mode 100644 index 0000000..6bc3d10 --- /dev/null +++ b/final-project/base/tester.py @@ -0,0 +1,122 @@ +import torch +import os +from scipy.io import savemat +from tqdm import tqdm +from util import gen_logger +import csv +import pandas as pd + +class BaseTester(): + def __init__(self, + device, + model, + test_loader, + val_loader, + load_model_path, + mat_file, + kaggle_file, + criterion): + self.device = device + self.model = model.to(self.device) + self.test_loader = test_loader + self.val_loader = val_loader + self.model_path = load_model_path + self.logger = gen_logger(os.path.join(load_model_path,"valid.log")) + #self.logger = gen_logger(kaggle_file) + self.mat_file = mat_file + self.kaggle_file = kaggle_file + self.criterion = criterion + self.output_list= {"out":[], + "out_softmax":[], + "pred":[], + "label":[] + } + def test(self): + ''' + log kaggle submission file + ''' + label_list = [] + self.model.eval() + output_list_out = [] + output_list_out_softmax = [] + output_list_pred = [] + with torch.no_grad(): + for i, t_imgs in enumerate(tqdm(self.test_loader)): + t_imgs = t_imgs.to(self.device) + P_pred = self.model(t_imgs) + _,pred_label=torch.max(P_pred,1) + label_list.extend(P_pred.argmax(dim=-1).cpu().numpy().tolist()) + for i in range(len(pred_label)): + output_list_out.append(P_pred[i].cpu().numpy()) 
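+                    # also cache the softmax probabilities; they go into the saved .mat file, presumably what the later ensembling step combines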
+ output_list_out_softmax.append(torch.nn.functional.softmax(P_pred[i],dim=0).cpu().numpy()) + output_list_pred.append(pred_label[i].cpu().numpy()) + self.output_list["out"] = output_list_out + self.output_list["out_softmax"] = output_list_out_softmax + self.output_list["pred"] = output_list_pred + self.save_info_list() + image_ids = ["{:06d}".format(i) for i in self.test_loader.dataset.data_df.image_id] + # write predictions + df = pd.DataFrame({"image_id": image_ids, 'label': label_list}) + df.to_csv(self.kaggle_file, index=False) + print("===> File saved as {}".format(self.kaggle_file)) + + def valid_and_savemat(self): + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + self.model.eval() + output_list_out = [] + output_list_out_softmax = [] + output_list_pred = [] + output_list_label = [] + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(tqdm(self.val_loader)): + data,label =data.to(self.device),label.to(self.device) + output =self.model(data) + val_loss += self.criterion(output,label).item() + _,pred_label=torch.max(output,1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + for i in range(len(label)): + if (self.val_loader.dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (self.val_loader.dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + for i in range(len(label)): + output_list_out.append(output[i].cpu().numpy()) + output_list_out_softmax.append(torch.nn.functional.softmax(output[i],dim=0).cpu().numpy()) + output_list_pred.append(pred_label[i].cpu().numpy()) + output_list_label.append(label[i].cpu().numpy()) + self.output_list["out"] = output_list_out + self.output_list["out_softmax"] = output_list_out_softmax + self.output_list["pred"] = output_list_pred + self.output_list["label"] = output_list_label + self.save_info_list() + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + self.logger.info("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + self.logger.info("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + self.logger.info("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + self.logger.info("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + self.logger.info("Validation accuracy (rare) : {:5f}".format(val_acc_rate_rare)) + self.logger.info("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare)) ) + + def save_info_list(self): + if self.mat_file: + savemat(self.mat_file, self.output_list) + self.logger.info("Save output .mat to {}".format(self.mat_file)) \ No newline at end of file diff --git a/final-project/base/trainer.py b/final-project/base/trainer.py new file mode 100644 index 0000000..47389aa --- /dev/null +++ b/final-project/base/trainer.py @@ -0,0 +1,350 @@ +import torch +import time +import os +from scipy.io import savemat +from tqdm import tqdm 
+from util import gen_logger + +class BaseTrainer(): + def __init__(self, + device, + model, + optimizer, + scheduler, + MAX_EPOCH, + criterion, + train_loader, + val_loader, + lr, + batch_size, + gradaccum_size, + model_path, + save_period): + self.device = device + self.model = model.to(self.device) + self.optimizer = optimizer + self.MAX_EPOCH = MAX_EPOCH + self.criterion = criterion + self.train_loader = train_loader + self.val_loader = val_loader + self.model_path = model_path + self.scheduler = scheduler + self.save_period = save_period + self.lr = lr + self.batch_size = batch_size + self.gradaccum_size = gradaccum_size + self.logger = gen_logger(os.path.join(model_path,"console.log")) + # training related + self.val_best_acc = -1.0 + self.info_list = {"train_loss":[], + "val_loss":[], + "train_acc":[], + "val_acc":[], + "lr":lr, + "batch_size":batch_size + } + def train(self): + for epoch in range(self.MAX_EPOCH): + start = time.time() + self.logger.info("====================================================") + self.logger.info("Epoch {}: train".format(epoch)) + self._train(epoch) + self.logger.info("Epoch {}: validation".format(epoch)) + self._valid(epoch) + self.logger.info("Total {:5f} sec per epoch".format(time.time()-start)) + # log training info per epoch + self.save_info_list() + def _train(self,epoch): + self.model.train() + train_nums = 0.0 + train_loss = 0.0 + train_acc = 0.0 + # train + loss_step = 1 + for batch_idx,(data,label) in enumerate(tqdm(self.train_loader)): + data,label =data.to(self.device),label.to(self.device) + output = self.model(data) + loss = self.criterion(output,label) + train_loss +=loss.item() + #self.logger.info("Epoch:{:d} | batch: {:d}| loss: {:5f}".format(epoch,batch_idx,train_loss)) + # update: backpropagation,lr schedule + loss = loss / self.gradaccum_size + loss.backward() + if (loss_step % self.gradaccum_size == 0) or ((batch_idx + 1) == len(self.train_loader)): + self.optimizer.step() + self.optimizer.zero_grad() + loss_step += 1 + _,pred_label=torch.max(output,1) + train_acc+=(label==pred_label).sum().item() + train_nums+=pred_label.size(0) + if self.scheduler is not None: + self.scheduler.step() + train_acc_rate = train_acc/train_nums + self.logger.info("Training loss:{:5f},Training Accuracy:{:5f}".format(train_loss,train_acc_rate) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(train_acc_rate) + self.info_list["train_loss"].append(train_loss) + def _valid(self,epoch): + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + self.model.eval() + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(tqdm(self.val_loader)): + data,label =data.to(self.device),label.to(self.device) + output =self.model(data) + val_loss += self.criterion(output,label).item() + _,pred_label=torch.max(output,1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + for i in range(len(label)): + if (self.val_loader.dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (self.val_loader.dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + 
self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + self.logger.info("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + self.logger.info("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + self.logger.info("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + self.logger.info("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + self.logger.info("Validation accuracy (rare) : {:5f}".format(val_acc_rate_rare)) + self.logger.info("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare)) ) + # save info about loss,accurracy + self.info_list["val_acc"].append(val_acc_rate) + self.info_list["val_loss"].append(val_loss) + if self.val_best_acc < val_acc_rate: + self.val_best_acc = val_acc_rate + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_best.pth")) + self.logger.info("Save best model") + if epoch % self.save_period == 0: + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_{:d}.pth".format(epoch))) + + def save_info_list(self): + savemat("{}/Loss.mat".format(self.model_path), self.info_list) + self.logger.info("Save learning history to {}/Loss.mat".format(self.model_path)) + + +class BBNTrainer(BaseTrainer): + def __init__(self, + device, + combiner, + model, + optimizer, + scheduler, + MAX_EPOCH, + criterion, + train_loader, + train_loader_reverse, + val_loader, + lr, + batch_size, + gradaccum_size, + model_path, + save_period): + self.device = device + self.model = model.to(self.device) + self.combiner = combiner + self.optimizer = optimizer + self.MAX_EPOCH = MAX_EPOCH + self.criterion = criterion + self.train_loader = train_loader + self.train_loader_reverse = train_loader_reverse + self.val_loader = val_loader + self.model_path = model_path + self.scheduler = scheduler + self.save_period = save_period + self.lr = lr + self.batch_size = batch_size + self.gradaccum_size = gradaccum_size + self.logger = gen_logger(os.path.join(model_path,"console.log")) + # training related + self.val_best_acc = -1.0 + self.info_list = {"train_loss":[], + "val_loss":[], + "train_acc":[], + "val_acc":[], + "balanced_val_acc":[], + "balanced_val_loss":[], + "reversed_val_acc":[], + "reversed_val_loss":[], + "lr":lr, + "batch_size":batch_size + } + + def _train(self,epoch): + # Enter training mode + self.model.train() + self.combiner.reset_epoch(epoch) + train_nums = 0.0 + train_loss = 0.0 + train_acc = 0.0 + # train + loss_step = 1 + len_dataloader = min(len(self.train_loader), len(self.train_loader_reverse)) + data_iter = iter(self.train_loader) + data_reverse_iter = iter(self.train_loader_reverse) + for batch_idx in tqdm(range(len_dataloader)): + # sameple bidirectional data + ################## + # Balanced sampler + ################## + try: + data,label = data_iter.next() + except StopIteration: + data_iter = iter(tqdm(self.train_loader)) + data,label = data_iter.next() + ################## + # Reverse sampler + ################## + try: + data_reverse,label_reverse = data_reverse_iter.next() + except StopIteration: + data_reverse_iter = iter(self.train_loader_reverse) + data_reverse,label_reverse = data_reverse_iter.next() + + data,label = data.to(self.device),label.to(self.device) + data_reverse,label_reverse = 
data_reverse.to(self.device),label_reverse.to(self.device) + + feature_a, feature_b = ( + self.model(data, feature_cb=True), + self.model(data_reverse, feature_rb=True), + ) + ################## + # alpha weight + ################## + l = 1 - ((self.combiner.epoch - 1) / self.combiner.div_epoch) ** 2 # parabolic decay + #l = 0.5 # fix + #l = math.cos((self.epoch-1) / self.div_epoch * math.pi /2) # cosine decay + #l = 1 - (1 - ((self.epoch - 1) / self.div_epoch) ** 2) * 1 # parabolic increment + #l = 1 - (self.epoch-1) / self.div_epoch # linear decay + #l = np.random.beta(self.alpha, self.alpha) # beta distribution + #l = 1 if self.epoch <= 120 else 0 # seperated stage + + mixed_feature = 2 * torch.cat((l * feature_a, (1-l) * feature_b), dim=1) + output = self.model(mixed_feature, classifier_flag=True) + loss = l * self.criterion(output, label) + (1 - l) * self.criterion(output, label_reverse) + + train_loss +=loss.item() + # self.logger.info("Epoch:{} | batch: {}| loss: {}".format(epoch,batch_idx,train_loss)) + # update: backpropagation,lr schedule + loss = loss / self.gradaccum_size + loss.backward() + if (loss_step % self.gradaccum_size == 0) or ((batch_idx + 1) == len(self.train_loader)): + self.optimizer.step() + self.optimizer.zero_grad() + loss_step += 1 + _,pred_label=torch.max(output,1) + ############################### + # compute average accurracy + ############################### + train_acc+=(l*(label==pred_label)+(1-l)*(label_reverse==pred_label)).sum().item() + train_nums+=pred_label.size(0) + if self.scheduler is not None: + self.scheduler.step() + train_acc_rate = train_acc/train_nums + self.logger.info("Training loss:{:5f},Training Accuracy:{:5f}".format(train_loss,train_acc_rate) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(train_acc_rate) + self.info_list["train_loss"].append(train_loss) + + def _valid_separate(self,epoch,branch = -1): + + self.logger.info("==============================") + if branch == 0: + self.logger.info("Balanced branch model") + branch_flag = "balanced" + self.inference_cache = [] + elif branch == 1: + self.logger.info("Reversed branch model") + branch_flag = "reversed" + else: + assert (False) + self.logger.info("==============================") + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + self.model.eval() + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(tqdm(self.val_loader)): + data,label =data.to(self.device),label.to(self.device) + ########################## + # Choose branch of model # + ########################## + if branch == 0: + # cache balanced branch + output,output_b = self.model(data,separate_classifier_flag=True) + self.inference_cache.append(output_b) + elif branch == 1: + output = self.inference_cache[batch_idx] + + val_loss += self.criterion(output,label).item() + _,pred_label=torch.max(output,1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + + for i in range(len(label)): + if (self.val_loader.dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (self.val_loader.dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation 
accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + self.logger.info("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + self.logger.info("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + self.logger.info("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + self.logger.info("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + self.logger.info("Validation accuracy (rare) : {:5f}".format(val_acc_rate_rare)) + self.logger.info("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare)) ) + # save info about loss,accurracy + self.info_list["{}_val_acc".format(branch_flag)].append(val_acc_rate) + self.info_list["{}_val_loss".format(branch_flag)].append(val_loss) + + + def train(self): + for epoch in range(1,self.MAX_EPOCH+1): + start = time.time() + self.logger.info("====================================================") + self.logger.info("Epoch {}: train".format(epoch)) + self._train(epoch) + self.logger.info("Epoch {}: validation".format(epoch)) + self._valid(epoch) + self.logger.info("Epoch {}: validation in Balanced Sampler model".format(epoch)) + self._valid_separate(epoch,branch=0) + self.logger.info("Epoch {}: validation in Reversed Sampler model".format(epoch)) + self._valid_separate(epoch,branch=1) + self.logger.info("Total {:5f} sec per epoch".format(time.time()-start)) + # log training info per epoch + self.save_info_list() + + diff --git a/final-project/base_vis/dataset.py b/final-project/base_vis/dataset.py new file mode 100644 index 0000000..0e2b776 --- /dev/null +++ b/final-project/base_vis/dataset.py @@ -0,0 +1,341 @@ +from PIL import Image +import os +import re +import glob +import numpy as np +import pandas as pd +import random +import torch +from torch.utils.data import sampler,Dataset,DataLoader +import torch.nn.functional as F +from torchvision import transforms +import torchvision + +filenameToPILImage = lambda x: Image.open(x) + + +############## +# Dataset +############## +class FoodDataset(Dataset): + def __init__(self,data_path,mode,img_size, class_list=None, num_per_class=None): + """ + for training and validation data + return data with label + """ + # visualize # + self.class_list = class_list + self.num_per_class = num_per_class + + self.data_path = data_path + self.img_size = img_size + self.mode = mode + self.file_list,self.label_list = self.parse_folder_visualize() + self.num = len(self.file_list) + print("load %d images from %s"%(self.num,self.data_path)) + if mode == "train": + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.RandomHorizontalFlip(p=0.5), + transforms.RandomRotation(10), + torchvision.transforms.ColorJitter(), + transforms.CenterCrop(self.img_size), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + else: + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + self.freq_list = [] + f = open('../final-project-challenge-3-no_qq_no_life/food_data/label2name.txt', encoding='utf8') + for line in f.readlines(): + 
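+            # label2name.txt appears to tag each class as f/c/r (frequent/common/rare); map these to 0/1/2 for the per-bucket accuracy reports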
if (line.find("f") != -1): + self.freq_list.append(0) + elif (line.find("c") != -1): + self.freq_list.append(1) + else: + self.freq_list.append(2) + f.close + def parse_folder(self): + ''' + output : file _dict + ''' + file_list = [] + label_list = [] + for class_id in range(0,1000): + str_id = str(class_id) + sub_folder = os.path.join(self.data_path,str_id) + sub_list = sorted([name for name in os.listdir(sub_folder) if os.path.isfile(os.path.join(sub_folder, name))]) + file_list.extend(sub_list) + label_list.extend([str_id]*len(sub_list)) + return file_list,label_list + + + def parse_folder_visualize(self): + ''' + output : file _dict + ''' + file_list = [] + label_list = [] + for class_id in self.class_list: + str_id = str(class_id) + sub_folder = os.path.join(self.data_path,str_id) + sub_list = sorted([name for name in os.listdir(sub_folder) if os.path.isfile(os.path.join(sub_folder, name))]) + if len(sub_list) > self.num_per_class: + sub_list = sub_list[:self.num_per_class] + file_list.extend(sub_list) + label_list.extend([str_id]*len(sub_list)) + return file_list,label_list + + def getOriginalImage(self, index): + img_path = os.path.join(self.data_path,self.label_list[index],self.file_list[index]) + transform_original = transforms.Resize((384, 384)) + img = Image.open(img_path) + orginal_img = np.array(transform_original(img)) + return orginal_img + + def __len__(self) -> int: + return self.num + def __getitem__(self, index): + img_path = os.path.join(self.data_path,self.label_list[index],self.file_list[index]) + label = int(self.label_list[index]) + # Preprocessing -> normalize image + img = self.transform(img_path) + return img,label + +class FoodTestDataset(Dataset): + def __init__(self,csv_path,data_path,img_size): + """ + for training and validation data + return data without label + """ + self.data_df = pd.read_csv(csv_path) + self.data_path = data_path + self.img_size = img_size + self.num = len(self.data_df) + print("load %d images from %s"%(self.num,self.data_path)) + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((self.img_size,self.img_size)), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + def __getitem__(self, index): + img_path = os.path.join(self.data_path,"{:06d}.jpg".format(self.data_df.loc[index, "image_id"])) + img = self.transform(img_path) + return img + def __len__(self) -> int: + return self.num + +############################# +# Long tail food dataloader # +############################# + +class FoodLTDataLoader(DataLoader): + """ + 2021 ICLR RIDE + modified from ImageNetLT Data Loader + counting statistics of data,and construct a list of class_num + base on this list of class_num,we can do reweight/resample + """ + def __init__(self, data_dir, img_size, batch_size, shuffle=True, num_workers=1, training=True, balanced=False, reversed=False, retain_epoch_size=True): + if training: + dataset = FoodDataset(os.path.join(data_dir,"train"),"train",img_size = img_size) + val_dataset = FoodDataset(os.path.join(data_dir,"val"),"val",img_size = img_size) + else: # test + dataset = FoodTestDataset(os.path.join(data_dir,"test"),img_size = img_size) + val_dataset = None + + self.dataset = dataset + self.val_dataset = val_dataset + + self.n_samples = len(self.dataset) + + num_classes = len(np.unique(dataset.label_list)) + assert num_classes == 1000 + + cls_num_list = [0] * num_classes + for label in dataset.label_list: + cls_num_list[int(label)] += 1 + + self.cls_num_list = cls_num_list + 
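+        # cls_num_list holds the per-class image counts; the balanced / reversed samplers below are built from it
+        # (the reversed sampler weights class j by max(n) / n_j, so the rarest classes are drawn most often)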
+ if balanced: + if training: + buckets = [[] for _ in range(num_classes)] + for idx, label in enumerate(dataset.label_list): + buckets[int(label)].append(idx) + sampler = BalancedSampler(buckets, retain_epoch_size) + shuffle = False + else: + print("Test set will not be evaluated with balanced sampler, nothing is done to make it balanced") + elif reversed: + if training: + max_num = max(self.cls_num_list) + class_weight = [max_num / i for i in self.cls_num_list] + buckets = [[] for _ in range(num_classes)] + for idx, label in enumerate(dataset.label_list): + buckets[int(label)].append(idx) + sampler = ReversedSampler(buckets, retain_epoch_size, class_weight) + shuffle = False + else: + print("Test set will not be evaluated with reversed sampler") + else: + sampler = None + + self.shuffle = shuffle + self.init_kwargs = { + 'batch_size': batch_size, + 'shuffle': self.shuffle, + 'num_workers': num_workers + } + self.val_init_kwargs = { + 'batch_size': batch_size, + 'shuffle': False, # For validation data,always false + 'num_workers': num_workers + } + super().__init__(dataset=self.dataset, **self.init_kwargs, sampler=sampler) # Note that sampler does not apply to validation set + + def split_validation(self): + # If you do not want to validate: + #return None + # If you want to validate: + return DataLoader(dataset=self.val_dataset, **self.val_init_kwargs) + + +####################### +# hw1 dataloader +# Sanity check workload +# use hw1 data to verify correctness of training code +####################### +class P1_Dataset(Dataset): + def __init__(self,data_path,val_mode): + self.data_path = data_path + self.val_mode = val_mode + self.file_list=sorted([name for name in os.listdir(self.data_path) if os.path.isfile(os.path.join(self.data_path, name)) ]) + self.num = len(self.file_list) + print("load %d images from %s"%(self.num,self.data_path)) + self.transform = transforms.Compose([ + filenameToPILImage, + transforms.Resize((224,224)), + transforms.RandomRotation(degrees=10, resample=Image.BILINEAR), + transforms.RandomHorizontalFlip(), + transforms.ToTensor(), + transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) + ]) + def __len__(self) -> int: + return self.num + def __getitem__(self, index): + label_idx = int(re.findall(r'(\d+)_\d+.png',self.file_list[index])[0]) + # Preprocessing -> normalize image + image_data = self.transform(os.path.join(self.data_path,self.file_list[index]))#.unsqueeze(0) + return image_data,label_idx +############## +# Sampler +############## +################################### +# 2021 CVPR RIDE Balanced sampler # +################################### +class BalancedSampler(sampler.Sampler): + def __init__(self, buckets, retain_epoch_size=False): + for bucket in buckets: + random.shuffle(bucket) + + self.bucket_num = len(buckets) + self.buckets = buckets + self.bucket_pointers = [0 for _ in range(self.bucket_num)] + self.retain_epoch_size = retain_epoch_size + + def __iter__(self): + count = self.__len__() + while count > 0: + yield self._next_item() + count -= 1 + + def _next_item(self): + bucket_idx = random.randint(0, self.bucket_num - 1) + bucket = self.buckets[bucket_idx] + item = bucket[self.bucket_pointers[bucket_idx]] + self.bucket_pointers[bucket_idx] += 1 + if self.bucket_pointers[bucket_idx] == len(bucket): + self.bucket_pointers[bucket_idx] = 0 + random.shuffle(bucket) + return item + + def __len__(self): + if self.retain_epoch_size: + return sum([len(bucket) for bucket in self.buckets]) # Acrually we need to upscale to next full 
batch + else: + return max([len(bucket) for bucket in self.buckets]) * self.bucket_num # Ensures every instance has the chance to be visited in an epoch + +class ReversedSampler(sampler.Sampler): + def __init__(self, buckets, retain_epoch_size=False, class_weight=None): + for bucket in buckets: + random.shuffle(bucket) + self.class_weight = class_weight + self.sum_weight = sum(self.class_weight) + self.bucket_num = len(buckets) + self.buckets = buckets + self.bucket_pointers = [0 for _ in range(self.bucket_num)] + self.retain_epoch_size = retain_epoch_size + + def sample_class_index_by_weight(self): + rand_number, now_sum = random.random() * self.sum_weight, 0 + for i in range(len(self.class_weight)): + now_sum += self.class_weight[i] + if rand_number <= now_sum: + return i + + def __iter__(self): + count = self.__len__() + while count > 0: + yield self._next_item() + count -= 1 + + def _next_item(self): + bucket_idx = self.sample_class_index_by_weight() + bucket = self.buckets[bucket_idx] + item = bucket[self.bucket_pointers[bucket_idx]] + self.bucket_pointers[bucket_idx] += 1 + if self.bucket_pointers[bucket_idx] == len(bucket): + self.bucket_pointers[bucket_idx] = 0 + random.shuffle(bucket) + return item + + def __len__(self): + if self.retain_epoch_size: + return sum([len(bucket) for bucket in self.buckets]) # Acrually we need to upscale to next full batch + else: + return max([len(bucket) for bucket in self.buckets]) * self.bucket_num # Ensures every instance has the chance to be visited in an epoch + +class ChunkSampler(sampler.Sampler): + """Samples elements sequentially from some offset for sanity check + Arguments: + num_samples: # of desired datapoints + start: offset where we should start selecting from + """ + def __init__(self, num_samples, start = 0): + self.num_samples = num_samples + self.start = start + + + def __iter__(self): + return iter(range(self.start, self.start + self.num_samples)) + + def __len__(self): + return self.num_samples +if __name__ == "__main__": + import time + start = time.time() + #D = FoodDataset("food_data/train","train",img_size = 384 ) + D = FoodTestDataset("food_data/testcase/sample_submission_comm_track.csv","food_data/test",img_size = 384 ) + d = D.__getitem__(0) + print(d[0]) + + print(time.time()-start) + diff --git a/final-project/base_vis/loss.py b/final-project/base_vis/loss.py new file mode 100644 index 0000000..2ea7068 --- /dev/null +++ b/final-project/base_vis/loss.py @@ -0,0 +1,288 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +import numpy as np + +import random + +eps = 1e-7 + +# CE and LDAM are supported + +# If you would like to add other losses, please have a look at: +# Focal Loss: https://github.com/kaidic/LDAM-DRW +# CRD, PKT, and SP Related Part: https://github.com/HobbitLong/RepDistiller + +def focal_loss(input_values, gamma): + """Computes the focal loss""" + p = torch.exp(-input_values) + loss = (1 - p) ** gamma * input_values + return loss.mean() + +class FocalLoss(nn.Module): + def __init__(self, cls_num_list=None, weight=None, gamma=0.): + super(FocalLoss, self).__init__() + assert gamma >= 0 + self.gamma = gamma + self.weight = weight + + def _hook_before_epoch(self, epoch): + pass + + def forward(self, output_logits, target): + return focal_loss(F.cross_entropy(output_logits, target, reduction='none', weight=self.weight), self.gamma) + +class CrossEntropyLoss(nn.Module): + def __init__(self, cls_num_list=None, reweight_CE=False): + super().__init__() + if reweight_CE: + idx = 1 # condition 
could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights = None + + def to(self, device): + super().to(device) + if self.per_cls_weights is not None: + self.per_cls_weights = self.per_cls_weights.to(device) + + return self + + def forward(self, output_logits, target): # output is logits + return F.cross_entropy(output_logits, target, weight=self.per_cls_weights) + +class LDAMLoss(nn.Module): + def __init__(self, cls_num_list=None, max_m=0.5, s=30, reweight_epoch=-1): + super().__init__() + if cls_num_list is None: + # No cls_num_list is provided, then we cannot adjust cross entropy with LDAM. + self.m_list = None + else: + self.reweight_epoch = reweight_epoch + m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list)) + m_list = m_list * (max_m / np.max(m_list)) + m_list = torch.tensor(m_list, dtype=torch.float, requires_grad=False) + self.m_list = m_list + assert s > 0 + self.s = s + if reweight_epoch != -1: + idx = 1 # condition could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights_enabled = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights_enabled = None + self.per_cls_weights = None + + def to(self, device): + super().to(device) + if self.m_list is not None: + self.m_list = self.m_list.to(device) + + if self.per_cls_weights_enabled is not None: + self.per_cls_weights_enabled = self.per_cls_weights_enabled.to(device) + + return self + + def _hook_before_epoch(self, epoch): + if self.reweight_epoch != -1: + self.epoch = epoch + + if epoch > self.reweight_epoch: + self.per_cls_weights = self.per_cls_weights_enabled + else: + self.per_cls_weights = None + + def get_final_output(self, output_logits, target): + x = output_logits + + index = torch.zeros_like(x, dtype=torch.uint8, device=x.device) + index.scatter_(1, target.data.view(-1, 1), 1) + + index_float = index.float() + batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0,1)) + + batch_m = batch_m.view((-1, 1)) + x_m = x - batch_m * self.s + + final_output = torch.where(index, x_m, x) + return final_output + + def forward(self, output_logits, target): + if self.m_list is None: + return F.cross_entropy(output_logits, target) + + final_output = self.get_final_output(output_logits, target) + return F.cross_entropy(final_output, target, weight=self.per_cls_weights) + +""" +2021 ICLR RIDE +Don't support plug-and play feature to our repo QQ +""" + +class RIDELoss(nn.Module): + def __init__(self, cls_num_list=None, base_diversity_temperature=1.0, max_m=0.5, s=30, reweight=True, reweight_epoch=-1, + base_loss_factor=1.0, additional_diversity_factor=-0.2, reweight_factor=0.05): + super().__init__() + self.base_loss = F.cross_entropy + self.base_loss_factor = base_loss_factor + if not reweight: + self.reweight_epoch = -1 + else: + self.reweight_epoch = reweight_epoch + + # LDAM is a variant of cross entropy and we handle it with self.m_list. 
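Before the RIDELoss constructor continues below, here is a minimal, self-contained sketch of the margin trick that both LDAMLoss above and RIDELoss reuse. The class counts, logits, and target are toy values chosen only for illustration, and the `one_hot` mask is a compact stand-in for the `scatter_`-based index built in `get_final_output`.

```python
import numpy as np
import torch
import torch.nn.functional as F

cls_num_list = [1000, 100, 10]     # assumed head / medium / tail class counts
max_m, s = 0.5, 30                 # defaults used by LDAMLoss above

# Per-class margins proportional to n_j^(-1/4), rescaled so the rarest class gets max_m.
m_list = 1.0 / np.sqrt(np.sqrt(np.array(cls_num_list)))
m_list = torch.tensor(m_list * (max_m / np.max(m_list)), dtype=torch.float)
# m_list is roughly tensor([0.158, 0.281, 0.500])

logits = torch.tensor([[2.0, 1.0, 0.5]])
target = torch.tensor([2])         # one sample from the tail class

# Subtract the scaled margin from the target logit only, then apply cross entropy.
index = F.one_hot(target, num_classes=len(cls_num_list)).bool()
batch_m = m_list[target].unsqueeze(1)
adjusted = torch.where(index, logits - s * batch_m, logits)
loss = F.cross_entropy(adjusted, target)
print(loss.item())
```

The effective-number per-class weights computed above can additionally be passed as the `weight` argument of `F.cross_entropy`, which is what the `reweight_epoch` hook toggles per epoch.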
+ if cls_num_list is None: + # No cls_num_list is provided, then we cannot adjust cross entropy with LDAM. + self.m_list = None + self.per_cls_weights_enabled = None + self.per_cls_weights_enabled_diversity = None + else: + # We will use LDAM loss if we provide cls_num_list. + print("Class distributuion",cls_num_list) + m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list)) + m_list = m_list * (max_m / np.max(m_list)) + m_list = torch.tensor(m_list, dtype=torch.float, requires_grad=False) + self.m_list = m_list + self.s = s + assert s > 0 + + if reweight_epoch != -1: + idx = 1 # condition could be put in order to set idx + betas = [0, 0.9999] + effective_num = 1.0 - np.power(betas[idx], cls_num_list) + per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num) + per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list) + self.per_cls_weights_enabled = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False) + else: + self.per_cls_weights_enabled = None + + cls_num_list = np.array(cls_num_list) / np.sum(cls_num_list) + C = len(cls_num_list) + per_cls_weights = C * cls_num_list * reweight_factor + 1 - reweight_factor + + # Experimental normalization: This is for easier hyperparam tuning, the effect can be described in the learning rate so the math formulation keeps the same. + # At the same time, the 1 - max trick that was previously used is not required since weights are already adjusted. + per_cls_weights = per_cls_weights / np.max(per_cls_weights) + + assert np.all(per_cls_weights > 0), "reweight factor is too large: out of bounds" + # save diversity per_cls_weights + self.per_cls_weights_enabled_diversity = torch.tensor(per_cls_weights, dtype=torch.float, requires_grad=False).cuda() + + self.base_diversity_temperature = base_diversity_temperature + self.additional_diversity_factor = additional_diversity_factor + + def to(self, device): + super().to(device) + if self.m_list is not None: + self.m_list = self.m_list.to(device) + + if self.per_cls_weights_enabled is not None: + self.per_cls_weights_enabled = self.per_cls_weights_enabled.to(device) + + if self.per_cls_weights_enabled_diversity is not None: + self.per_cls_weights_enabled_diversity = self.per_cls_weights_enabled_diversity.to(device) + + return self + + def _hook_before_epoch(self, epoch): + if self.reweight_epoch != -1: + self.epoch = epoch + + if epoch > self.reweight_epoch: + self.per_cls_weights_base = self.per_cls_weights_enabled + self.per_cls_weights_diversity = self.per_cls_weights_enabled_diversity + else: + self.per_cls_weights_base = None + self.per_cls_weights_diversity = None + + def get_final_output(self, output_logits, target): + x = output_logits + + index = torch.zeros_like(x, dtype=torch.uint8, device=x.device) + index.scatter_(1, target.data.view(-1, 1), 1) + + index_float = index.float() + batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0,1)) + + batch_m = batch_m.view((-1, 1)) + x_m = x - batch_m * self.s + + final_output = torch.where(index, x_m, x) + return final_output + + def forward(self, output_logits, target, extra_info=None): + if extra_info is None: + return self.base_loss(output_logits, target) + + loss = 0 + + # Adding RIDE Individual Loss for each expert + for logits_item in extra_info['logits']: + ride_loss_logits = output_logits if self.additional_diversity_factor == 0 else logits_item + if self.m_list is None: + loss += self.base_loss_factor * self.base_loss(ride_loss_logits, target) + else: + final_output = 
self.get_final_output(ride_loss_logits, target) + loss += self.base_loss_factor * self.base_loss(final_output, target, weight=self.per_cls_weights_base) + + base_diversity_temperature = self.base_diversity_temperature + + if self.per_cls_weights_diversity is not None: + diversity_temperature = base_diversity_temperature * self.per_cls_weights_diversity.view((1, -1)) + temperature_mean = diversity_temperature.mean().item() + else: + diversity_temperature = base_diversity_temperature + temperature_mean = base_diversity_temperature + + output_dist = F.log_softmax(logits_item / diversity_temperature, dim=1) + with torch.no_grad(): + # Using the mean takes only linear instead of quadratic time in computing and has only a slight difference so using the mean is preferred here + mean_output_dist = F.softmax(output_logits / diversity_temperature, dim=1) + + loss += self.additional_diversity_factor * temperature_mean * temperature_mean * F.kl_div(output_dist, mean_output_dist, reduction='batchmean') + + return loss + +class RIDELossWithDistill(nn.Module): + def __init__(self, cls_num_list=None, additional_distill_loss_factor=1.0, distill_temperature=1.0, ride_loss_factor=1.0, **kwargs): + super().__init__() + self.ride_loss = RIDELoss(cls_num_list=cls_num_list, **kwargs) + self.distill_temperature = distill_temperature + + self.ride_loss_factor = ride_loss_factor + self.additional_distill_loss_factor = additional_distill_loss_factor + + def to(self, device): + super().to(device) + self.ride_loss = self.ride_loss.to(device) + return self + + def _hook_before_epoch(self, epoch): + self.ride_loss._hook_before_epoch(epoch) + + def forward(self, student, target=None, teacher=None, extra_info=None): + output_logits = student + if extra_info is None: + return self.ride_loss(output_logits, target) + + loss = 0 + num_experts = len(extra_info['logits']) + for logits_item in extra_info['logits']: + loss += self.ride_loss_factor * self.ride_loss(output_logits, target, extra_info) + distill_temperature = self.distill_temperature + + student_dist = F.log_softmax(student / distill_temperature, dim=1) + with torch.no_grad(): + teacher_dist = F.softmax(teacher / distill_temperature, dim=1) + + distill_loss = F.kl_div(student_dist, teacher_dist, reduction='batchmean') + distill_loss = distill_temperature * distill_temperature * distill_loss + loss += self.additional_distill_loss_factor * distill_loss + return loss diff --git a/final-project/base_vis/tester.py b/final-project/base_vis/tester.py new file mode 100644 index 0000000..563509b --- /dev/null +++ b/final-project/base_vis/tester.py @@ -0,0 +1,43 @@ +import torch +import time +import os +from scipy.io import savemat +from tqdm import tqdm +from util import gen_logger +import csv +import itertools +import pandas as pd +import numpy as np +class BaseTester(): + def __init__(self, + device, + model, + test_loader, + load_model_path, + kaggle_file): + self.device = device + self.model = model.to(self.device) + self.test_loader = test_loader + self.model_path = load_model_path + self.logger = gen_logger(os.path.join(load_model_path,"test.log")) + self.kaggle_file = kaggle_file + def test(self): + ''' + log kaggle submission file + ''' + label_list = [] + self.model.eval() + with torch.no_grad(): + for i, t_imgs in enumerate(tqdm(self.test_loader)): + t_imgs = t_imgs.to(self.device) + P_pred = self.model(t_imgs) + _,pred_label=torch.max(P_pred,1) + item = pred_label.flatten().cpu().squeeze().tolist() + label_list.append(item) + label_list = np.concatenate(label_list) 
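+        # The per-batch prediction lists gathered above are concatenated into one array;
+        # the lines below pair them with zero-padded image ids from the test dataframe
+        # and write them out as the Kaggle submission CSV.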
+ image_ids = ["{:06d}".format(i) for i in self.test_loader.dataset.data_df.image_id] + # write predictions + df = pd.DataFrame({"image_id": image_ids, 'label': label_list}) + df.to_csv(self.kaggle_file, index=False) + print("===> File saved as {}".format(self.kaggle_file)) + diff --git a/final-project/base_vis/trainer.py b/final-project/base_vis/trainer.py new file mode 100644 index 0000000..8f69164 --- /dev/null +++ b/final-project/base_vis/trainer.py @@ -0,0 +1,304 @@ +import torch +import time +import os +from scipy.io import savemat +from tqdm import tqdm +from util import gen_logger +import torch.nn.functional as F +import pandas as pd +from sklearn.metrics import confusion_matrix +import numpy as np + +class BaseTrainer(): + def __init__(self, + device, + model, + optimizer, + scheduler, + MAX_EPOCH, + criterion, + train_loader, + val_loader, + lr, + batch_size, + gradaccum_size, + model_path, + save_period): + self.device = device + self.model = model.to(self.device) + self.optimizer = optimizer + self.MAX_EPOCH = MAX_EPOCH + self.criterion = criterion + self.train_loader = train_loader + self.val_loader = val_loader + self.model_path = model_path + self.scheduler = scheduler + self.save_period = save_period + self.lr = lr + self.batch_size = batch_size + self.gradaccum_size = gradaccum_size + self.logger = gen_logger(os.path.join(model_path,"console.log")) + # training related + self.val_best_acc = -1.0 + self.info_list = {"train_loss":[], + "val_loss":[], + "train_acc":[], + "val_acc":[], + "lr":lr, + "batch_size":batch_size + } + def train(self): + for epoch in range(self.MAX_EPOCH): + start = time.time() + self.logger.info("====================================================") + # self.logger.info("Epoch {}: train".format(epoch)) + # self._train(epoch) + # self.logger.info("Epoch {}: validation".format(epoch)) + self._valid(epoch) + # self._valid_visualize(epoch) + self.logger.info("Total {:5f} sec per epoch".format(time.time()-start)) + # log training info per epoch + self.save_info_list() + def _train(self,epoch): + self.model.train() + train_nums = 0.0 + train_loss = 0.0 + train_acc = 0.0 + # train + loss_step = 1 + for batch_idx,(data,label) in enumerate(tqdm(self.train_loader)): + data,label =data.to(self.device),label.to(self.device) + output = self.model(data) + loss = self.criterion(output,label) + train_loss +=loss.item() + # self.logger.info("Epoch:{} | batch: {}| loss: {}".format(epoch,batch_idx,train_loss)) + # update: backpropagation,lr schedule + loss = loss / self.gradaccum_size + loss.backward() + if (loss_step % self.gradaccum_size == 0) or ((batch_idx + 1) == len(self.train_loader)): + self.optimizer.step() + self.optimizer.zero_grad() + loss_step += 1 + _,pred_label=torch.max(output,1) + train_acc+=(label==pred_label).sum().item() + train_nums+=pred_label.size(0) + if self.scheduler is not None: + self.scheduler.step() + train_acc_rate = train_acc/train_nums + self.logger.info("Training loss:{:5f},Training Accuracy:{:5f}".format(train_loss,train_acc_rate) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(train_acc_rate) + self.info_list["train_loss"].append(train_loss) + def _valid(self,epoch): + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + self.model.eval() + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(tqdm(self.val_loader)): + data,label 
=data.to(self.device),label.to(self.device) + output =self.model(data) + val_loss += self.criterion(output,label).item() + _,pred_label=torch.max(output,1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + for i in range(len(label)): + if (self.val_loader.dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (self.val_loader.dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + self.logger.info("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + self.logger.info("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + self.logger.info("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + self.logger.info("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + self.logger.info("Validation accuracy (rare) : {:5f}".format(val_acc_rate_rare)) + self.logger.info("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare)) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(val_acc_rate) + self.info_list["val_loss"].append(val_loss) + # if self.val_best_acc < val_acc_rate: + # self.val_best_acc = val_acc_rate + # torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_best.pth")) + # self.logger.info("Save best model") + # if epoch % self.save_period == 0: + # torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_{:d}.pth".format(epoch))) + + def save_info_list(self): + savemat("{}/Loss.mat".format(self.model_path), self.info_list) + self.logger.info("Save learning history to {}/Loss.mat".format(self.model_path)) + + def _valid_visualize(self, epoch): + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + self.model.eval() + + y_true = [] + y_pred = [] + label2correct = dict(zip([i for i in range(1000)], [0 for i in range(1000)])) + label2total = dict(zip([i for i in range(1000)], [0 for i in range(1000)])) + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(tqdm(self.val_loader)): + data,label =data.to(self.device),label.to(self.device) + output =self.model(data) + val_loss += self.criterion(output,label).item() + _,pred_label=torch.max(output,1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + y_pred.extend(pred_label.cpu().tolist()) + y_true.extend(label.cpu().tolist()) + for i in range(len(label)): + label2correct[label[i].item()] += (label[i]==pred_label[i]).item() + label2total[label[i].item()] += 1 + if (self.val_loader.dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (self.val_loader.dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + acc_list = [] + for correct, 
total in zip(list(label2correct.values()), list(label2total.values())): + acc_list.append(round(correct/total, 2)) + freq_list = [] + name_list = [] + f = open('../final-project-challenge-3-no_qq_no_life/food_data/label2name.txt', encoding='utf8') + for line in f.readlines(): + _id, freq, name = line.split() + freq_list.append(freq) + name_list.append(name) + # label, freq/comm/rare, name, total, correct, acc + # df = pd.DataFrame(list(zip(list(range(1000)), freq_list, name_list, list(label2total.values()), list(label2correct.values()), acc_list)), + # columns=['label', 'frequency', 'name', 'total', 'correct', 'accuracy']) + # df.to_csv(os.path.join(self.model_path, 'valid_info.csv'), index=False) + # confusion matrix + print('computing confusion matrix...') + cm = confusion_matrix(y_true, y_pred, labels=[i for i in range(1000)]) + np.save(os.path.join(self.model_path, 'cm.npy'), cm) + + + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + self.logger.info("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + self.logger.info("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + self.logger.info("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + self.logger.info("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + self.logger.info("Validation accuracy (rare) : {:5f}".format(val_acc_rate_rare)) + self.logger.info("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare)) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(val_acc_rate) + self.info_list["val_loss"].append(val_loss) + if self.val_best_acc < val_acc_rate: + self.val_best_acc = val_acc_rate + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_best.pth")) + self.logger.info("Save best model") + if epoch % self.save_period == 0: + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_{:d}.pth".format(epoch))) + def train_seperate(self): + for epoch in range(self.MAX_EPOCH): + start = time.time() + self.logger.info("====================================================") + self.logger.info("Epoch {}: train".format(epoch)) + self._train_seperate(epoch) + self.logger.info("Epoch {}: validation".format(epoch)) + self._valid_seperate(epoch) + self.logger.info("Total {:5f} sec per epoch".format(time.time()-start)) + # log training info per epoch + self.save_info_list() + + def _train_seperate(self,epoch): + # seperately update different layer's loss + self.model.train() + train_nums = 0.0 + train_loss = 0.0 + train_acc = 0.0 + # train + for batch_idx,(data,label) in enumerate(tqdm(self.train_loader)): + data,label =data.to(self.device),label.to(self.device) + self.optimizer.zero_grad() + output1, output2, output3, output4 = self.model(data) + loss1 = self.criterion(output1, label) + loss2 = self.criterion(output2, label) + loss3 = self.criterion(output3, label) + loss4 = self.criterion(output4, label) + loss = (loss1 + loss2 + loss3 + loss4) / 4 + train_loss += (loss1.item() + loss2.item() + loss3.item() + loss4.item()) / 4 + # self.logger.info("Epoch:{} | batch: {}| loss: {}".format(epoch,batch_idx,train_loss)) + # update: 
backpropagation,lr schedule + loss.backward() + self.optimizer.step() + if self.scheduler is not None: + self.scheduler.step() + logit = (F.softmax(output1, dim=-1) + F.softmax(output2, dim=-1) + F.softmax(output3, dim=-1) + F.softmax(output4, dim=-1)) / 4 + _,pred_label=torch.max(logit, 1) + train_acc+=(label==pred_label).sum().item() + train_nums+=pred_label.size(0) + if batch_idx % 100 == 0 and batch_idx > 0: + print("Training loss:{:5f},Training Accuracy:{:5f}".format(loss.item(),train_acc/train_nums)) + train_acc_rate = train_acc/train_nums + self.logger.info("Training loss:{:5f},Training Accuracy:{:5f}".format(train_loss,train_acc_rate) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(train_acc_rate) + self.info_list["train_loss"].append(train_loss) + + def _valid_seperate(self,epoch): + val_nums = 0.0 + val_loss = 0.0 + val_acc = 0.0 + self.model.eval() + with torch.no_grad(): + for batch_idx,(data,label) in enumerate(self.val_loader): + data,label =data.to(self.device),label.to(self.device) + output1, output2, output3, output4 = self.model(data) + loss1 = self.criterion(output1, label) + loss2 = self.criterion(output2, label) + loss3 = self.criterion(output3, label) + loss4 = self.criterion(output4, label) + loss = (loss1 + loss2 + loss3 + loss4) / 4 + val_loss += (loss1.item() + loss2.item() + loss3.item() + loss4.item()) / 4 + logit = (F.softmax(output1, dim=-1) + F.softmax(output2, dim=-1) + F.softmax(output3, dim=-1) + F.softmax(output4, dim=-1)) / 4 + _,pred_label=torch.max(logit, 1) + val_acc+=(label==pred_label).sum().item() + val_nums+=pred_label.size(0) + val_acc_rate = val_acc/val_nums + self.logger.info("Validation loss:{:5f} Validation accuracy:{:5f}".format(val_loss,val_acc_rate ) ) + self.logger.info("Val_acc {:d} Val_nums {:d}".format(int(val_acc),int(val_nums)) ) + # save info about loss,accurracy + self.info_list["train_acc"].append(val_acc_rate) + self.info_list["val_loss"].append(val_loss) + if self.val_best_acc < val_acc_rate: + self.val_best_acc = val_acc_rate + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_best.pth")) + self.logger.info("Save best model") + if epoch % self.save_period == 0: + torch.save(self.model.state_dict(), os.path.join(self.model_path,"model_{:d}.pth".format(epoch))) + diff --git a/final-project/get_checkpoints.sh b/final-project/get_checkpoints.sh new file mode 100644 index 0000000..a39e46f --- /dev/null +++ b/final-project/get_checkpoints.sh @@ -0,0 +1,3 @@ +wget --no-check-certificate "https://onedrive.live.com/download?cid=EDF068524DEBC79F&resid=EDF068524DEBC79F%211077&authkey=ACO8aUAM6pV8jmw" -O "checkpoints.zip" +unzip checkpoints.zip +cp -f checkpoints/swin_large_patch4_window12_384_22kto1k.pth model_zoo/swin diff --git a/final-project/get_dataset.sh b/final-project/get_dataset.sh new file mode 100644 index 0000000..0ba6d77 --- /dev/null +++ b/final-project/get_dataset.sh @@ -0,0 +1,8 @@ +# Download dataset +wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1IYWPK8h9FWyo0p4-SCAatLGy0l5omQaw' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1IYWPK8h9FWyo0p4-SCAatLGy0l5omQaw" -O food_data.zip && rm -rf /tmp/cookies.txt + +# Unzip the downloaded zip file +unzip ./food_data.zip + +# Remove the downloaded zip file +rm ./food_data.zip diff --git a/final-project/hw1_data b/final-project/hw1_data new file 
mode 120000 index 0000000..8212527 --- /dev/null +++ b/final-project/hw1_data @@ -0,0 +1 @@ +/home/r09021/DLCV_110FALL/hw1-leo870823/hw1_data/p1_data \ No newline at end of file diff --git a/final-project/model_zoo/BBN/combiner.py b/final-project/model_zoo/BBN/combiner.py new file mode 100644 index 0000000..8e024cd --- /dev/null +++ b/final-project/model_zoo/BBN/combiner.py @@ -0,0 +1,77 @@ +import numpy as np +import torch, math + + +class Combiner: + def __init__(self, MaxEpoch,model_type, device): + self.type = model_type + self.device = device + self.epoch_number = MaxEpoch + self.func = torch.nn.Softmax(dim=1) + self.initilize_all_parameters() + + def initilize_all_parameters(self): + self.alpha = 0.2 + if self.epoch_number in [90, 180]: + self.div_epoch = 100 * (self.epoch_number // 100 + 1) + else: + self.div_epoch = self.epoch_number + + def reset_epoch(self, epoch): + self.epoch = epoch + + def forward(self, model, criterion, image, label, meta, **kwargs): + return eval("self.{}".format(self.type))( + model, criterion, image, label, meta, **kwargs + ) + + def default(self, model, criterion, image, label, **kwargs): + image, label = image.to(self.device), label.to(self.device) + output = model(image) + loss = criterion(output, label) + + return loss + + def gen_weight(self): + for epoch in range(1,self.epoch_number+1): + self.reset_epoch(epoch) + l = 1 - ((self.epoch - 1) / self.div_epoch) ** 2 # parabolic decay + print("Epoch {:d}:{:5f} , {:5f}".format(epoch,l,1-l)) + + def bbn_mix(self, model, criterion, image, label, meta, **kwargs): + + image_a, image_b = image.to(self.device), meta["sample_image"].to(self.device) + label_a, label_b = label.to(self.device), meta["sample_label"].to(self.device) + + feature_a, feature_b = ( + model(image_a, feature_cb=True), + model(image_b, feature_rb=True), + ) + + l = 1 - ((self.epoch - 1) / self.div_epoch) ** 2 # parabolic decay + #l = 0.5 # fix + #l = math.cos((self.epoch-1) / self.div_epoch * math.pi /2) # cosine decay + #l = 1 - (1 - ((self.epoch - 1) / self.div_epoch) ** 2) * 1 # parabolic increment + #l = 1 - (self.epoch-1) / self.div_epoch # linear decay + #l = np.random.beta(self.alpha, self.alpha) # beta distribution + #l = 1 if self.epoch <= 120 else 0 # seperated stage + + mixed_feature = 2 * torch.cat((l * feature_a, (1-l) * feature_b), dim=1) + output = model(mixed_feature, classifier_flag=True) + loss = l * criterion(output, label_a) + (1 - l) * criterion(output, label_b) + + #now_result = torch.argmax(self.func(output), 1) + #now_acc = ( + # l * accuracy(now_result.cpu().numpy(), label_a.cpu().numpy())[0] + # + (1 - l) * accuracy(now_result.cpu().numpy(), label_b.cpu().numpy())[0] + #) + + + return loss + + +if __name__ == "__main__": + C = Combiner(MaxEpoch = 10, + model_type = "QQ", + device = "cuda") + C.gen_weight() diff --git a/final-project/model_zoo/BBN/network.py b/final-project/model_zoo/BBN/network.py new file mode 100644 index 0000000..ee716dd --- /dev/null +++ b/final-project/model_zoo/BBN/network.py @@ -0,0 +1,138 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +from ..pytorch_pretrained_vit import ViT + + +################# +# Common Module +################# +class GAP(nn.Module): + """Global Average pooling + Widely used in ResNet, Inception, DenseNet, etc. 
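+
+    Shape sketch (illustrative): a (B, C, H, W) feature map becomes (B, C, 1, 1);
+    callers in this file then flatten it to (B, C) before the classifier.
+
+        >>> import torch
+        >>> GAP()(torch.randn(2, 512, 7, 7)).shape
+        torch.Size([2, 512, 1, 1])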
+ """ + + def __init__(self): + super(GAP, self).__init__() + self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) + + def forward(self, x): + x = self.avgpool(x) + # x = x.view(x.shape[0], -1) + return x + +class Identity(nn.Module): + def __init__(self): + super(Identity, self).__init__() + + def forward(self, x): + return x + + +################## +# BNN architecture +################## +class BNNetwork(nn.Module): + def __init__(self, backbone_model,num_classes=1000,num_features=4096,mode="swin"): + super(BNNetwork, self).__init__() + self.num_classes = num_classes + if mode == "swin": + #################### + # SWIN backbone + #################### + self.backbone = backbone_model + self.module = Identity() + print(self.backbone) + elif mode == "ViT" : + #################### + # ViT backbone + #################### + model_name = "B_16_imagenet1k" + self.backbone = ViT(model_name, pretrained=True,num_classes=num_classes,image_size=324) + #self.backbone.norm = Identity() + #self.backbone.fc = Identity() + self.module = GAP() + print(self.backbone) + elif mode == "ResNet50": + #################### + # ResNet50 backbone + #################### + self.backbone = backbone_model + self.module = GAP() + print(self.backbone) + else: + print("invalid model mode QQ") + self.num_features = num_features + self.classifier = nn.Linear(self.num_features, self.num_classes, bias=True) + + ## swin flag classifier + #self.swin_classifier=nn.Sequential( + # nn.Linear(self.num_features*2,num_classes) + # ) + + + def forward(self, x, **kwargs): + if "feature_flag" in kwargs or "feature_cb" in kwargs or "feature_rb" in kwargs: + return self.extract_feature(x, **kwargs) + elif "classifier_flag" in kwargs: + return self.classifier(x) + + x = self.backbone(x) + x = self.module(x) + x = x.view(x.shape[0], -1) + + if "separate_classifier_flag" in kwargs: + return self.separate_classifier(x) + x = self.classifier(x) + return x + + + def extract_feature(self, x, **kwargs): + if len(kwargs) > 0: + x = self.backbone(x, **kwargs) + else: + x = self.backbone(x) + x = self.module(x) + x = x.view(x.shape[0], -1) + + return x + + + def freeze_backbone(self): + print("Freezing backbone .......") + for p in self.backbone.parameters(): + p.requires_grad = False + + + def load_backbone_model(self, backbone_path=""): + self.backbone.load_model(backbone_path) + print("Backbone has been loaded...") + + + def load_model(self, model_path): + pretrain_dict = torch.load( + model_path #, map_location="cpu" if self.cfg.CPU_MODE else "cuda" + ) + pretrain_dict = pretrain_dict['state_dict'] if 'state_dict' in pretrain_dict else pretrain_dict + model_dict = self.state_dict() + from collections import OrderedDict + new_dict = OrderedDict() + for k, v in pretrain_dict.items(): + if k.startswith("module"): + new_dict[k[7:]] = v + else: + new_dict[k] = v + model_dict.update(new_dict) + self.load_state_dict(model_dict) + print("Model has been loaded...") + + def separate_classifier(self,fcfb): + # weight of FC + feature_len = self.classifier.weight.shape[1]//2 + Wc = torch.permute(self.classifier.weight[:,:feature_len],dims=(1,0)) + Wb = torch.permute(self.classifier.weight[:,feature_len:],dims=(1,0)) + Bias = self.classifier.bias #WcWb[1][1] + # y=xW^T+bias + o_Wcfc = torch.matmul(fcfb[:,:feature_len],Wc)+0.5*Bias + o_Wbfb = torch.matmul(fcfb[:,feature_len:],Wb)+0.5*Bias + return o_Wcfc,o_Wbfb diff --git a/final-project/model_zoo/BBN/resnet.py b/final-project/model_zoo/BBN/resnet.py new file mode 100644 index 0000000..2554a02 --- /dev/null +++ 
b/final-project/model_zoo/BBN/resnet.py @@ -0,0 +1,296 @@ +import torch +import torch.nn as nn +import torch.nn.functional as F +import math + +model_urls = { + "resnet18": "https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth", + "resnet34": "https://s3.amazonaws.com/pytorch/models/resnet34-333f7ec4.pth", + "resnet50": "https://s3.amazonaws.com/pytorch/models/resnet50-19c8e357.pth", + "resnet101": "https://s3.amazonaws.com/pytorch/models/resnet101-5d3b4d8f.pth", + "resnet152": "https://s3.amazonaws.com/pytorch/models/resnet152-b121ed2d.pth", +} + + +class BasicBlock(nn.Module): + expansion = 1 + + def __init__(self, inplanes, planes, stride=1): + super(BasicBlock, self).__init__() + self.conv1 = nn.Conv2d( + inplanes, planes, kernel_size=3, padding=1, bias=False, stride=stride + ) + self.bn1 = nn.BatchNorm2d(planes) + self.relu = nn.ReLU(inplace=True) + self.conv2 = nn.Conv2d( + planes, planes, kernel_size=3, padding=1, bias=False, stride=1 + ) + self.bn2 = nn.BatchNorm2d(planes) + # self.downsample = downsample + if stride != 1 or self.expansion * planes != inplanes: + self.downsample = nn.Sequential( + nn.Conv2d( + inplanes, + self.expansion * planes, + kernel_size=1, + stride=stride, + bias=False, + ), + nn.BatchNorm2d(self.expansion * planes), + ) + else: + self.downsample = None + + def forward(self, x): + identity = x + + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + + out = self.conv2(out) + out = self.bn2(out) + + if self.downsample is not None: + identity = self.downsample(x) + + out += identity + out = self.relu(out) + + return out + + +class BottleNeck(nn.Module): + + expansion = 4 + + def __init__(self, inplanes, planes, stride=1): + super(BottleNeck, self).__init__() + self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False) + self.bn1 = nn.BatchNorm2d(planes) + self.relu1 = nn.ReLU(True) + self.conv2 = nn.Conv2d( + planes, planes, kernel_size=3, stride=stride, padding=1, bias=False + ) + self.bn2 = nn.BatchNorm2d(planes) + self.relu2 = nn.ReLU(True) + self.conv3 = nn.Conv2d( + planes, planes * self.expansion, kernel_size=1, bias=False + ) + self.bn3 = nn.BatchNorm2d(planes * self.expansion) + if stride != 1 or self.expansion * planes != inplanes: + self.downsample = nn.Sequential( + nn.Conv2d( + inplanes, + self.expansion * planes, + kernel_size=1, + stride=stride, + bias=False, + ), + nn.BatchNorm2d(self.expansion * planes), + ) + else: + self.downsample = None + self.relu = nn.ReLU(True) + + def forward(self, x): + out = self.relu1(self.bn1(self.conv1(x))) + + out = self.relu2(self.bn2(self.conv2(out))) + + out = self.bn3(self.conv3(out)) + + if self.downsample != None: + residual = self.downsample(x) + else: + residual = x + out = out + residual + out = self.relu(out) + return out + +class ResNet(nn.Module): + def __init__( + self, + cfg, + block_type, + num_blocks, + last_layer_stride=2, + ): + super(ResNet, self).__init__() + self.inplanes = 64 + self.block = block_type + self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) + self.bn1 = nn.BatchNorm2d(64) + self.relu = nn.ReLU(True) + self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + self.layer1 = self._make_layer(num_blocks[0], 64) + self.layer2 = self._make_layer( + num_blocks[1], 128, stride=2 + ) + self.layer3 = self._make_layer( + num_blocks[2], 256, stride=2 + ) + self.layer4 = self._make_layer( + num_blocks[3], + 512, + stride=last_layer_stride, + ) + + def load_model(self, pretrain): + print("Loading Backbone pretrain model from 
{}......".format(pretrain)) + model_dict = self.state_dict() + pretrain_dict = torch.load(pretrain) + pretrain_dict = pretrain_dict["state_dict"] if "state_dict" in pretrain_dict else pretrain_dict + from collections import OrderedDict + + new_dict = OrderedDict() + for k, v in pretrain_dict.items(): + if k.startswith("module"): + k = k[7:] + if "fc" not in k and "classifier" not in k: + k = k.replace("backbone.", "") + new_dict[k] = v + + model_dict.update(new_dict) + self.load_state_dict(model_dict) + print("Backbone model has been loaded......") + + def _make_layer(self, num_block, planes, stride=1): + strides = [stride] + [1] * (num_block - 1) + layers = [] + for now_stride in strides: + layers.append( + self.block( + self.inplanes, planes, stride=now_stride + ) + ) + self.inplanes = planes * self.block.expansion + return nn.Sequential(*layers) + + def forward(self, x): + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + out = self.pool(out) + + out = self.layer1(out) + out = self.layer2(out) + out = self.layer3(out) + out = self.layer4(out) + + return out + +class BBN_ResNet(nn.Module): + def __init__( + self, + cfg, + block_type, + num_blocks, + last_layer_stride=2, + ): + super(BBN_ResNet, self).__init__() + self.inplanes = 64 + self.block = block_type + + self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False) + self.bn1 = nn.BatchNorm2d(64) + self.relu = nn.ReLU(True) + self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + + self.layer1 = self._make_layer(num_blocks[0], 64) + self.layer2 = self._make_layer(num_blocks[1], 128, stride=2) + self.layer3 = self._make_layer(num_blocks[2], 256, stride=2) + self.layer4 = self._make_layer(num_blocks[3] - 1, 512, stride=last_layer_stride) + + self.cb_block = self.block(self.inplanes, self.inplanes // 4, stride=1) + self.rb_block = self.block(self.inplanes, self.inplanes // 4, stride=1) + + def load_model(self, pretrain): + print("Loading Backbone pretrain model from {}......".format(pretrain)) + model_dict = self.state_dict() + pretrain_dict = torch.load(pretrain) + pretrain_dict = pretrain_dict["state_dict"] if "state_dict" in pretrain_dict else pretrain_dict + from collections import OrderedDict + + new_dict = OrderedDict() + for k, v in pretrain_dict.items(): + if k.startswith("module"): + k = k[7:] + if "fc" not in k and "classifier" not in k: + k = k.replace("backbone.", "") + new_dict[k] = v + + model_dict.update(new_dict) + self.load_state_dict(model_dict) + print("Backbone model has been loaded......") + + def _make_layer(self, num_block, planes, stride=1): + strides = [stride] + [1] * (num_block - 1) + layers = [] + for now_stride in strides: + layers.append(self.block(self.inplanes, planes, stride=now_stride)) + self.inplanes = planes * self.block.expansion + return nn.Sequential(*layers) + + def forward(self, x, **kwargs): + out = self.conv1(x) + out = self.bn1(out) + out = self.relu(out) + out = self.pool(out) + + out = self.layer1(out) + out = self.layer2(out) + out = self.layer3(out) + out = self.layer4(out) + + # TODO handle non-fixed weight + if "feature_cb" in kwargs: + out = self.cb_block(out) + return out + elif "feature_rb" in kwargs: + out = self.rb_block(out) + return out + out1 = self.cb_block(out) + out2 = self.rb_block(out) + out = torch.cat((out1, out2), dim=1) + + return out + +def res50( + cfg, + pretrain=True, + pretrained_model="/data/Data/pretrain_models/resnet50-19c8e357.pth", + last_layer_stride=2, +): + resnet = ResNet( + cfg, + BottleNeck, + [3, 4, 6, 3], + 
last_layer_stride=last_layer_stride, + ) + if pretrain and pretrained_model != "": + resnet.load_model(pretrain=pretrained_model) + else: + print("Choose to train from scratch") + return resnet + +def bbn_res50( + cfg, + pretrain=True, + pretrained_model="/data/Data/pretrain_models/resnet50-19c8e357.pth", + last_layer_stride=2, +): + resnet = BBN_ResNet( + cfg, + BottleNeck, + [3, 4, 6, 4], + last_layer_stride=last_layer_stride, + ) + print(resnet) + if pretrain and pretrained_model != "": + resnet.load_model(pretrain=pretrained_model) + else: + print("Choose to train from scratch") + return resnet + diff --git a/final-project/model_zoo/pytorch_pretrained_vit/__init__.py b/final-project/model_zoo/pytorch_pretrained_vit/__init__.py new file mode 100644 index 0000000..2c1dd69 --- /dev/null +++ b/final-project/model_zoo/pytorch_pretrained_vit/__init__.py @@ -0,0 +1,5 @@ +__version__ = "0.0.7" + +from .model import ViT,ViT_revised +from .configs import * +from .utils import load_pretrained_weights diff --git a/final-project/model_zoo/pytorch_pretrained_vit/configs.py b/final-project/model_zoo/pytorch_pretrained_vit/configs.py new file mode 100644 index 0000000..80c0481 --- /dev/null +++ b/final-project/model_zoo/pytorch_pretrained_vit/configs.py @@ -0,0 +1,105 @@ +"""configs.py - ViT model configurations, based on: +https://github.com/google-research/vision_transformer/blob/master/vit_jax/configs.py +""" + +def get_base_config(): + """Base ViT config ViT""" + return dict( + dim=768, + ff_dim=3072, + num_heads=12, + num_layers=12, + attention_dropout_rate=0.0, + dropout_rate=0.1, + representation_size=768, + classifier='token' + ) + +def get_b16_config(): + """Returns the ViT-B/16 configuration.""" + config = get_base_config() + config.update(dict(patches=(16, 16))) + return config + +def get_b32_config(): + """Returns the ViT-B/32 configuration.""" + config = get_b16_config() + config.update(dict(patches=(32, 32))) + return config + +def get_l16_config(): + """Returns the ViT-L/16 configuration.""" + config = get_base_config() + config.update(dict( + patches=(16, 16), + dim=1024, + ff_dim=4096, + num_heads=16, + num_layers=24, + attention_dropout_rate=0.0, + dropout_rate=0.1, + representation_size=1024 + )) + return config + +def get_l32_config(): + """Returns the ViT-L/32 configuration.""" + config = get_l16_config() + config.update(dict(patches=(32, 32))) + return config + +def drop_head_variant(config): + config.update(dict(representation_size=None)) + return config + + +PRETRAINED_MODELS = { + 'B_16': { + 'config': get_b16_config(), + 'num_classes': 21843, + 'image_size': (224, 224), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/B_16.pth" + }, + 'B_32': { + 'config': get_b32_config(), + 'num_classes': 21843, + 'image_size': (224, 224), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/B_32.pth" + }, + 'L_16': { + 'config': get_l16_config(), + 'num_classes': 21843, + 'image_size': (224, 224), + 'url': None + }, + 'L_32': { + 'config': get_l32_config(), + 'num_classes': 21843, + 'image_size': (224, 224), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/L_32.pth" + }, + 'B_16_imagenet1k': { + 'config': drop_head_variant(get_b16_config()), + 'num_classes': 1000, + 'image_size': (384, 384), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/B_16_imagenet1k.pth" + }, + 'B_32_imagenet1k': { + 'config': drop_head_variant(get_b32_config()), + 
'num_classes': 1000, + 'image_size': (384, 384), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/B_32_imagenet1k.pth" + }, + 'L_16_imagenet1k': { + 'config': drop_head_variant(get_l16_config()), + 'num_classes': 1000, + 'image_size': (384, 384), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/L_16_imagenet1k.pth" + }, + 'L_32_imagenet1k': { + 'config': drop_head_variant(get_l32_config()), + 'num_classes': 1000, + 'image_size': (384, 384), + 'url': "https://github.com/lukemelas/PyTorch-Pretrained-ViT/releases/download/0.0.2/L_32_imagenet1k.pth" + }, +} diff --git a/final-project/model_zoo/pytorch_pretrained_vit/model.py b/final-project/model_zoo/pytorch_pretrained_vit/model.py new file mode 100755 index 0000000..0e22585 --- /dev/null +++ b/final-project/model_zoo/pytorch_pretrained_vit/model.py @@ -0,0 +1,250 @@ +"""model.py - Model and module class for ViT. + They are built to mirror those in the official Jax implementation. +""" + +from os import name +from typing import Optional +import torch +from torch import nn +from torch.nn import functional as F + +from .transformer import Transformer +from .utils import load_pretrained_weights, as_tuple +from .configs import PRETRAINED_MODELS + + +class PositionalEmbedding1D(nn.Module): + """Adds (optionally learned) positional embeddings to the inputs.""" + + def __init__(self, seq_len, dim): + super().__init__() + self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, dim)) + + def forward(self, x): + """Input has shape `(batch_size, seq_len, emb_dim)`""" + return x + self.pos_embedding, self.pos_embedding + + +class ViT(nn.Module): + """ + Args: + name (str): Model name, e.g. 'B_16' + pretrained (bool): Load pretrained weights + in_channels (int): Number of channels in input data + num_classes (int): Number of classes, default 1000 + + References: + [1] https://openreview.net/forum?id=YicbFdNTTy + """ + + def __init__( + self, + name: Optional[str] = None, + pretrained: bool = False, + patches: int = 16, + dim: int = 768, + ff_dim: int = 3072, + num_heads: int = 12, + num_layers: int = 12, + attention_dropout_rate: float = 0.0, + dropout_rate: float = 0.1, + representation_size: Optional[int] = None, + load_repr_layer: bool = False, + classifier: str = 'token', + positional_embedding: str = '1d', + in_channels: int = 3, + image_size: Optional[int] = None, + num_classes: Optional[int] = None, + ): + super().__init__() + + # Configuration + if name is None: + check_msg = 'must specify name of pretrained model' + assert not pretrained, check_msg + assert not resize_positional_embedding, check_msg + if num_classes is None: + num_classes = 1000 + if image_size is None: + image_size = 384 + else: # load pretrained model + assert name in PRETRAINED_MODELS.keys(), \ + 'name should be in: ' + ', '.join(PRETRAINED_MODELS.keys()) + config = PRETRAINED_MODELS[name]['config'] + patches = config['patches'] + dim = config['dim'] + ff_dim = config['ff_dim'] + num_heads = config['num_heads'] + num_layers = config['num_layers'] + attention_dropout_rate = config['attention_dropout_rate'] + dropout_rate = config['dropout_rate'] + representation_size = config['representation_size'] + classifier = config['classifier'] + if image_size is None: + image_size = PRETRAINED_MODELS[name]['image_size'] + if num_classes is None: + num_classes = PRETRAINED_MODELS[name]['num_classes'] + self.image_size = image_size + + # Image and patch sizes + h, w = as_tuple(image_size) # image sizes + fh, fw = 
as_tuple(patches) # patch sizes + gh, gw = h // fh, w // fw # number of patches + seq_len = gh * gw + + # Patch embedding + self.patch_embedding = nn.Conv2d(in_channels, dim, kernel_size=(fh, fw), stride=(fh, fw)) + + # Class token + if classifier == 'token': + self.class_token = nn.Parameter(torch.zeros(1, 1, dim)) + seq_len += 1 + + # Positional embedding + if positional_embedding.lower() == '1d': + self.positional_embedding = PositionalEmbedding1D(seq_len, dim) + else: + raise NotImplementedError() + + # Transformer + self.transformer = Transformer(num_layers=num_layers, dim=dim, num_heads=num_heads, + ff_dim=ff_dim, dropout=dropout_rate) + + # Representation layer + if representation_size and load_repr_layer: + self.pre_logits = nn.Linear(dim, representation_size) + pre_logits_size = representation_size + else: + pre_logits_size = dim + + # Classifier head + self.norm = nn.LayerNorm(pre_logits_size, eps=1e-6) + self.fc = nn.Linear(pre_logits_size, num_classes) + + # Initialize weights + self.init_weights() + + # Load pretrained model + if pretrained: + pretrained_num_channels = 3 + pretrained_num_classes = PRETRAINED_MODELS[name]['num_classes'] + pretrained_image_size = PRETRAINED_MODELS[name]['image_size'] + load_pretrained_weights( + self, name, + load_first_conv=(in_channels == pretrained_num_channels), + load_fc=(num_classes == pretrained_num_classes), + load_repr_layer=load_repr_layer, + resize_positional_embedding=(image_size != pretrained_image_size), + ) + + @torch.no_grad() + def init_weights(self): + def _init(m): + if isinstance(m, nn.Linear): + nn.init.xavier_uniform_(m.weight) # _trunc_normal(m.weight, std=0.02) # from .initialization import _trunc_normal + if hasattr(m, 'bias') and m.bias is not None: + nn.init.normal_(m.bias, std=1e-6) # nn.init.constant(m.bias, 0) + self.apply(_init) + nn.init.constant_(self.fc.weight, 0) + nn.init.constant_(self.fc.bias, 0) + nn.init.normal_(self.positional_embedding.pos_embedding, std=0.02) # _trunc_normal(self.positional_embedding.pos_embedding, std=0.02) + nn.init.constant_(self.class_token, 0) + + def forward(self, x): + """Breaks image into patches, applies transformer, applies MLP head. + + Args: + x (tensor): `b,c,fh,fw` + """ + b, c, fh, fw = x.shape + x = self.patch_embedding(x) # b,d,gh,gw + x = x.flatten(2).transpose(1, 2) # b,gh*gw,d + if hasattr(self, 'class_token'): + x = torch.cat((self.class_token.expand(b, -1, -1), x), dim=1) # b,gh*gw+1,d + if hasattr(self, 'positional_embedding'): + x,pos = self.positional_embedding(x) # b,gh*gw+1,d + x = self.transformer(x) # b,gh*gw+1,d + if hasattr(self, 'pre_logits'): + x = self.pre_logits(x) + x = torch.tanh(x) + if hasattr(self, 'fc'): + x = self.norm(x)[:, 0] # b,d + x = self.fc(x) # b,num_classes + return x# ,pos + +class ViT_revised(ViT): + """ + Args: + name (str): Model name, e.g. 
'B_16' + pretrained (bool): Load pretrained weights + in_channels (int): Number of channels in input data + num_classes (int): Number of classes, default 1000 + + References: + [1] https://openreview.net/forum?id=YicbFdNTTy + """ + + def __init__( + self, + name: Optional[str] = None, + pretrained: bool = False, + patches: int = 16, + dim: int = 768, + ff_dim: int = 3072, + num_heads: int = 12, + num_layers: int = 12, + attention_dropout_rate: float = 0.0, + dropout_rate: float = 0.1, + representation_size: Optional[int] = None, + load_repr_layer: bool = False, + classifier: str = 'token', + positional_embedding: str = '1d', + in_channels: int = 3, + image_size: Optional[int] = None, + num_classes: Optional[int] = None, + ): + super().__init__(name=name, + pretrained=pretrained, + patches=patches, + dim=dim, + ff_dim=ff_dim, + num_heads=num_heads, + num_layers=num_layers, + dropout_rate=dropout_rate, + representation_size = representation_size, + load_repr_layer = load_repr_layer, + classifier = classifier, + positional_embedding=positional_embedding, + in_channels = in_channels, + image_size = image_size, + num_classes = num_classes + ) + + + # Representation layer + if representation_size and load_repr_layer: + self.pre_logits = nn.Linear(dim, representation_size) + pre_logits_size = representation_size + else: + pre_logits_size = dim + + + # Revised Classifier head + self.fc = nn.Linear(pre_logits_size, 1000) + self.fc2 = nn.Linear(1000,num_classes) + self.relu = nn.ReLU() + # model initialization + # self.init_weights() + def forward(self,x): + x = super().forward(x) + x = self.relu(x) + x = self.fc2(x) + x = self.relu(x) + return x + +if __name__ == "__main__": + # Model + model_name = 'B_16_imagenet1k' + #model_ViT = ViT(model_name, pretrained=True,num_classes=37,image_size=args.image_size) + model_ViT = ViT_revised(model_name, pretrained=True,num_classes=37,image_size=384) + print(model_ViT) \ No newline at end of file diff --git a/final-project/model_zoo/pytorch_pretrained_vit/transformer.py b/final-project/model_zoo/pytorch_pretrained_vit/transformer.py new file mode 100644 index 0000000..a2deddb --- /dev/null +++ b/final-project/model_zoo/pytorch_pretrained_vit/transformer.py @@ -0,0 +1,102 @@ +""" +Adapted from https://github.com/lukemelas/simple-bert +""" + +import numpy as np +from torch import nn +from torch import Tensor +from torch.nn import functional as F + + +def split_last(x, shape): + "split the last dimension to given shape" + shape = list(shape) + assert shape.count(-1) <= 1 + if -1 in shape: + shape[shape.index(-1)] = int(x.size(-1) / -np.prod(shape)) + return x.view(*x.size()[:-1], *shape) + + +def merge_last(x, n_dims): + "merge the last n_dims to a dimension" + s = x.size() + assert n_dims > 1 and n_dims < len(s) + return x.view(*s[:-n_dims], -1) + + +class MultiHeadedSelfAttention(nn.Module): + """Multi-Headed Dot Product Attention""" + def __init__(self, dim, num_heads, dropout): + super().__init__() + self.proj_q = nn.Linear(dim, dim) + self.proj_k = nn.Linear(dim, dim) + self.proj_v = nn.Linear(dim, dim) + self.drop = nn.Dropout(dropout) + self.n_heads = num_heads + self.scores = None # for visualization + + def forward(self, x, mask): + """ + x, q(query), k(key), v(value) : (B(batch_size), S(seq_len), D(dim)) + mask : (B(batch_size) x S(seq_len)) + * split D(dim) into (H(n_heads), W(width of head)) ; D = H * W + """ + # (B, S, D) -proj-> (B, S, D) -split-> (B, S, H, W) -trans-> (B, H, S, W) + q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x) + q, k, v = 
(split_last(x, (self.n_heads, -1)).transpose(1, 2) for x in [q, k, v]) + # (B, H, S, W) @ (B, H, W, S) -> (B, H, S, S) -softmax-> (B, H, S, S) + scores = q @ k.transpose(-2, -1) / np.sqrt(k.size(-1)) + if mask is not None: + mask = mask[:, None, None, :].float() + scores -= 10000.0 * (1.0 - mask) + scores = self.drop(F.softmax(scores, dim=-1)) + # (B, H, S, S) @ (B, H, S, W) -> (B, H, S, W) -trans-> (B, S, H, W) + h = (scores @ v).transpose(1, 2).contiguous() + # -merge-> (B, S, D) + h = merge_last(h, 2) + self.scores = scores + return h + + +class PositionWiseFeedForward(nn.Module): + """FeedForward Neural Networks for each position""" + def __init__(self, dim, ff_dim): + super().__init__() + self.fc1 = nn.Linear(dim, ff_dim) + self.fc2 = nn.Linear(ff_dim, dim) + + def forward(self, x): + # (B, S, D) -> (B, S, D_ff) -> (B, S, D) + return self.fc2(F.gelu(self.fc1(x))) + + +class Block(nn.Module): + """Transformer Block""" + def __init__(self, dim, num_heads, ff_dim, dropout): + super().__init__() + self.attn = MultiHeadedSelfAttention(dim, num_heads, dropout) + self.proj = nn.Linear(dim, dim) + self.norm1 = nn.LayerNorm(dim, eps=1e-6) + self.pwff = PositionWiseFeedForward(dim, ff_dim) + self.norm2 = nn.LayerNorm(dim, eps=1e-6) + self.drop = nn.Dropout(dropout) + + def forward(self, x, mask): + h = self.drop(self.proj(self.attn(self.norm1(x), mask))) + x = x + h + h = self.drop(self.pwff(self.norm2(x))) + x = x + h + return x + + +class Transformer(nn.Module): + """Transformer with Self-Attentive Blocks""" + def __init__(self, num_layers, dim, num_heads, ff_dim, dropout): + super().__init__() + self.blocks = nn.ModuleList([ + Block(dim, num_heads, ff_dim, dropout) for _ in range(num_layers)]) + + def forward(self, x, mask=None): + for block in self.blocks: + x = block(x, mask) + return x diff --git a/final-project/model_zoo/pytorch_pretrained_vit/utils.py b/final-project/model_zoo/pytorch_pretrained_vit/utils.py new file mode 100755 index 0000000..da40561 --- /dev/null +++ b/final-project/model_zoo/pytorch_pretrained_vit/utils.py @@ -0,0 +1,116 @@ +"""utils.py - Helper functions +""" + +import numpy as np +import torch +from torch.utils import model_zoo + +from .configs import PRETRAINED_MODELS + + +def load_pretrained_weights( + model, + model_name=None, + weights_path=None, + load_first_conv=True, + load_fc=True, + load_repr_layer=False, + resize_positional_embedding=False, + verbose=True, + strict=True, +): + """Loads pretrained weights from weights path or download using url. + Args: + model (Module): Full model (a nn.Module) + model_name (str): Model name (e.g. B_16) + weights_path (None or str): + str: path to pretrained weights file on the local disk. + None: use pretrained weights downloaded from the Internet. + load_first_conv (bool): Whether to load patch embedding. + load_fc (bool): Whether to load pretrained weights for fc layer at the end of the model. 
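+        load_repr_layer (bool): Whether to also load the pre-logits representation layer, if present.
+        strict (bool): If True, assert that only the expected keys are missing or unexpected after loading.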
+ resize_positional_embedding=False, + verbose (bool): Whether to print on completion + """ + assert bool(model_name) ^ bool(weights_path), 'Expected exactly one of model_name or weights_path' + + # Load or download weights + if weights_path is None: + url = PRETRAINED_MODELS[model_name]['url'] + if url: + state_dict = model_zoo.load_url(url) + else: + raise ValueError(f'Pretrained model for {model_name} has not yet been released') + else: + state_dict = torch.load(weights_path) + + # Modifications to load partial state dict + expected_missing_keys = [] + if not load_first_conv and 'patch_embedding.weight' in state_dict: + expected_missing_keys += ['patch_embedding.weight', 'patch_embedding.bias'] + if not load_fc and 'fc.weight' in state_dict: + expected_missing_keys += ['fc.weight', 'fc.bias'] + if not load_repr_layer and 'pre_logits.weight' in state_dict: + expected_missing_keys += ['pre_logits.weight', 'pre_logits.bias'] + for key in expected_missing_keys: + state_dict.pop(key) + + # Change size of positional embeddings + if resize_positional_embedding: + posemb = state_dict['positional_embedding.pos_embedding'] + posemb_new = model.state_dict()['positional_embedding.pos_embedding'] + state_dict['positional_embedding.pos_embedding'] = \ + resize_positional_embedding_(posemb=posemb, posemb_new=posemb_new, + has_class_token=hasattr(model, 'class_token')) + maybe_print('Resized positional embeddings from {} to {}'.format( + posemb.shape, posemb_new.shape), verbose) + + # Load state dict + ret = model.load_state_dict(state_dict, strict=False) + if strict: + assert set(ret.missing_keys) == set(expected_missing_keys), \ + 'Missing keys when loading pretrained weights: {}'.format(ret.missing_keys) + assert not ret.unexpected_keys, \ + 'Missing keys when loading pretrained weights: {}'.format(ret.unexpected_keys) + maybe_print('Loaded pretrained weights.', verbose) + else: + maybe_print('Missing keys when loading pretrained weights: {}'.format(ret.missing_keys), verbose) + maybe_print('Unexpected keys when loading pretrained weights: {}'.format(ret.unexpected_keys), verbose) + return ret + + +def maybe_print(s: str, flag: bool): + if flag: + print(s) + + +def as_tuple(x): + return x if isinstance(x, tuple) else (x, x) + + +def resize_positional_embedding_(posemb, posemb_new, has_class_token=True): + """Rescale the grid of position embeddings in a sensible manner""" + from scipy.ndimage import zoom + + # Deal with class token + ntok_new = posemb_new.shape[1] + if has_class_token: # this means classifier == 'token' + posemb_tok, posemb_grid = posemb[:, :1], posemb[0, 1:] + ntok_new -= 1 + else: + posemb_tok, posemb_grid = posemb[:, :0], posemb[0] + + # Get old and new grid sizes + gs_old = int(np.sqrt(len(posemb_grid))) + gs_new = int(np.sqrt(ntok_new)) + posemb_grid = posemb_grid.reshape(gs_old, gs_old, -1) + + # Rescale grid + zoom_factor = (gs_new / gs_old, gs_new / gs_old, 1) + posemb_grid = zoom(posemb_grid, zoom_factor, order=1) + posemb_grid = posemb_grid.reshape(1, gs_new * gs_new, -1) + posemb_grid = torch.from_numpy(posemb_grid) + + # Deal with class token and return + posemb = torch.cat([posemb_tok, posemb_grid], dim=1) + return posemb + diff --git a/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_nightly.yml b/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_nightly.yml new file mode 100644 index 0000000..aed6dfb --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_nightly.yml @@ -0,0 +1,31 @@ +# This workflows will upload a 
Python Package using Twine when a release is created +# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries + +name: Pypi Nightly + +on: + schedule: + - cron: "0 12 * * *" + +jobs: + deploy: + + runs-on: ubuntu-18.04 + + steps: + - uses: actions/checkout@master + - name: Set up Python + uses: actions/setup-python@v1 + with: + python-version: '3.7' + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install setuptools wheel twine pypandoc + - name: Build and publish + env: + TWINE_USERNAME: ${{ secrets.pypi_username }} + TWINE_PASSWORD: ${{ secrets.pypi_password }} + run: | + python setup.py sdist bdist_wheel + twine upload dist/* --verbose diff --git a/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_release.yml b/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_release.yml new file mode 100644 index 0000000..51d673e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/.github/workflows/pypi_release.yml @@ -0,0 +1,32 @@ +# This workflows will upload a Python Package using Twine when a release is created +# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries + +name: Pypi Release + +on: + release: + types: [created] + +jobs: + deploy: + + runs-on: ubuntu-18.04 + + steps: + - uses: actions/checkout@master + - name: Set up Python + uses: actions/setup-python@v1 + with: + python-version: '3.7' + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install setuptools wheel twine pypandoc + - name: Build and publish + env: + TWINE_USERNAME: ${{ secrets.pypi_username }} + TWINE_PASSWORD: ${{ secrets.pypi_password }} + RELEASE: 1 + run: | + python setup.py sdist bdist_wheel + twine upload dist/* --verbose diff --git a/final-project/model_zoo/pytorch_resnest/.github/workflows/unit_test.yml b/final-project/model_zoo/pytorch_resnest/.github/workflows/unit_test.yml new file mode 100644 index 0000000..70a085f --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/.github/workflows/unit_test.yml @@ -0,0 +1,36 @@ +# This workflow will install Python dependencies, run tests and lint with a single version of Python +# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions + +name: Unit Test + +on: + push: + branches: [ master ] + pull_request: + branches: [ master ] + +jobs: + build: + + runs-on: ubuntu-latest + + steps: + - uses: actions/checkout@v2 + - uses: seanmiddleditch/gha-setup-ninja@master + + - name: Set up Python + uses: actions/setup-python@v1 + with: + python-version: 3.7 + + - name: Install package + run: | + python -m pip install --upgrade pip + pip install numpy -I + pip install pytest torch mxnet + pip install nose + pip install -e . 
+ + - name: Run pytest + run: | + for f in tests/*.py; do python "$f"; done diff --git a/final-project/model_zoo/pytorch_resnest/.gitignore b/final-project/model_zoo/pytorch_resnest/.gitignore new file mode 100644 index 0000000..0db19f2 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/.gitignore @@ -0,0 +1,132 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +pip-wheel-metadata/ +share/python-wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.nox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +*.py,cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 +db.sqlite3-journal + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# IPython +profile_default/ +ipython_config.py + +# pyenv +.python-version + +# pipenv +# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. +# However, in case of collaboration, if having platform-specific dependencies or dependencies +# having no cross-platform support, pipenv may install dependencies that don't work, or not +# install all needed dependencies. +#Pipfile.lock + +# PEP 582; used by e.g. github.com/David-OConnor/pyflow +__pypackages__/ + +# Celery stuff +celerybeat-schedule +celerybeat.pid + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ +.dmypy.json +dmypy.json + +# Pyre type checker +.pyre/ +*.swp +*.DS_Store +version.py diff --git a/final-project/model_zoo/pytorch_resnest/LICENSE b/final-project/model_zoo/pytorch_resnest/LICENSE new file mode 100644 index 0000000..261eeb9 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. 
+ + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. 
If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
diff --git a/final-project/model_zoo/pytorch_resnest/README.md b/final-project/model_zoo/pytorch_resnest/README.md new file mode 100644 index 0000000..ab6a88e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/README.md @@ -0,0 +1,170 @@ +[![PyPI](https://img.shields.io/pypi/v/resnest.svg)](https://pypi.python.org/pypi/resnest) +[![PyPI Pre-release](https://img.shields.io/badge/pypi--prerelease-v0.0.6-ff69b4.svg)](https://pypi.org/project/resnest/#history) +[![PyPI Nightly](https://github.com/zhanghang1989/ResNeSt/workflows/Pypi%20Nightly/badge.svg)](https://github.com/zhanghang1989/ResNeSt/actions) +[![Downloads](http://pepy.tech/badge/resnest)](http://pepy.tech/project/resnest) +[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) +[![Unit Test](https://github.com/zhanghang1989/ResNeSt/workflows/Unit%20Test/badge.svg)](https://github.com/zhanghang1989/ResNeSt/actions) +[![arXiv](http://img.shields.io/badge/cs.CV-arXiv%3A2004.08955-B31B1B.svg)](https://arxiv.org/abs/2004.08955) + +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/panoptic-segmentation-on-coco-panoptic)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-panoptic?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/semantic-segmentation-on-ade20k)](https://paperswithcode.com/sota/semantic-segmentation-on-ade20k?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/semantic-segmentation-on-cityscapes-val)](https://paperswithcode.com/sota/semantic-segmentation-on-cityscapes-val?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/semantic-segmentation-on-pascal-context)](https://paperswithcode.com/sota/semantic-segmentation-on-pascal-context?p=resnest-split-attention-networks) + + +# ResNeSt +Split-Attention Network, A New ResNet Variant. It significantly boosts the performance of downstream models such as Mask R-CNN, Cascade R-CNN and DeepLabV3. + +![](./miscs/abstract.jpg) + +### Table of Contents +0. [Pretrained Models](#pretrained-models) +0. [Transfer Learning Models](#transfer-learning-models) +0. [Verify ImageNet Results](#verify-imagenet-results) +0. [How to Train](#how-to-train) +0. [Reference](#reference) + + +### Pypi / GitHub Install + +0. 
Install this package repo, note that you only need to choose one of the options + +```bash +# using github url +pip install git+https://github.com/zhanghang1989/ResNeSt + +# using pypi +pip install resnest --pre +``` + +## Pretrained Models + +| | crop size | PyTorch | Gluon | +|-------------|-----------|---------|-------| +| ResNeSt-50 | 224 | 81.03 | 81.04 | +| ResNeSt-101 | 256 | 82.83 | 82.81 | +| ResNeSt-200 | 320 | 83.84 | 83.88 | +| ResNeSt-269 | 416 | 84.54 | 84.53 | + +- **3rd party implementations** are available: [Tensorflow](https://github.com/QiaoranC/tf_ResNeSt_RegNet_model), [Caffe](https://github.com/NetEase-GameAI/ResNeSt-caffe), [JAX](https://github.com/n2cholas/jax-resnet/). + +- Extra ablation study models are available in [link](./ablation.md) + +### PyTorch Models + +- Load using Torch Hub + +```python +import torch +# get list of models +torch.hub.list('zhanghang1989/ResNeSt', force_reload=True) + +# load pretrained models, using ResNeSt-50 as an example +net = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True) +``` + + +- Load using python package + +```python +# using ResNeSt-50 as an example +from resnest.torch import resnest50 +net = resnest50(pretrained=True) +``` + + +### Gluon Models + +- Load pretrained model: + +```python +# using ResNeSt-50 as an example +from resnest.gluon import resnest50 +net = resnest50(pretrained=True) +``` + +## Transfer Learning Models + +### Detectron2 + +We provide a wrapper for training Detectron2 models with ResNeSt backbone at [d2](./d2). Training configs and pretrained models are released. See details in [d2](./d2). + +### MMDetection + +The ResNeSt backbone has been adopted by [MMDetection](https://github.com/open-mmlab/mmdetection/tree/master/configs/resnest). + +### Semantic Segmentation + +- PyTorch models and training: Please visit [PyTorch Encoding Toolkit](https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html). +- Gluon models and training: Please visit [GluonCV Toolkit](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#ade20k-dataset). + + +## Verify ImageNet Results: + +**Note:** the inference speed reported in the paper are tested using Gluon implementation with RecordIO data. + +### Prepare ImageNet dataset: + +Here we use raw image data format for simplicity, please follow [GluonCV tutorial](https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html) if you would like to use RecordIO format. + +```bash +cd scripts/dataset/ +# assuming you have downloaded the dataset in the current folder +python prepare_imagenet.py --download-dir ./ +``` + +### Torch Model + +```bash +# use resnest50 as an example +cd scripts/torch/ +python verify.py --model resnest50 --crop-size 224 +``` + +### Gluon Model + +```bash +# use resnest50 as an example +cd scripts/gluon/ +python verify.py --model resnest50 --crop-size 224 +``` + +## How to Train + +### ImageNet Models + +- Training with MXNet Gluon: Please visit [Gluon folder](./scripts/gluon/). +- Training with PyTorch: Please visit [PyTorch Encoding Toolkit](https://hangzhang.org/PyTorch-Encoding/model_zoo/imagenet.html) (slightly worse than Gluon implementation). + +### Detectron Models + +For object detection and instance segmentation models, please visit our [detectron2-ResNeSt fork](https://github.com/zhanghang1989/detectron2-ResNeSt). + +### Semantic Segmentation + +- Training with PyTorch: [Encoding Toolkit](https://hangzhang.org/PyTorch-Encoding/model_zoo/segmentation.html). 
+- Training with MXNet: [GluonCV Toolkit](https://gluon-cv.mxnet.io/model_zoo/segmentation.html#ade20k-dataset). + +## Reference + +**ResNeSt: Split-Attention Networks** [[arXiv](https://arxiv.org/pdf/2004.08955.pdf)] + +Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Muller, R. Manmatha, Mu Li and Alex Smola + +``` +@article{zhang2020resnest, +title={ResNeSt: Split-Attention Networks}, +author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander}, +journal={arXiv preprint arXiv:2004.08955}, +year={2020} +} +``` + +### Major Contributors + +- ResNeSt Backbone ([Hang Zhang](https://hangzhang.org/)) +- Detectron Models ([Chongruo Wu](https://github.com/chongruo), [Zhongyue Zhang](http://zhongyuezhang.com/)) +- Semantic Segmentation ([Yi Zhu](https://sites.google.com/view/yizhu/home)) +- Distributed Training ([Haibin Lin](https://sites.google.com/view/haibinlin/)) diff --git a/final-project/model_zoo/pytorch_resnest/ablation.md b/final-project/model_zoo/pytorch_resnest/ablation.md new file mode 100644 index 0000000..125db4f --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/ablation.md @@ -0,0 +1,45 @@ +## Pretrained Models + +| | setting | #P | GFLOPs | PyTorch | Gluon | +|-----------------|---------|-------|--------|---------|-------| +| ResNeSt-50-fast | 1s1x64d | 26.3M | 4.34 | 80.33 | 80.35 | +| ResNeSt-50-fast | 2s1x64d | 27.5M | 4.34 | 80.53 | 80.65 | +| ResNeSt-50-fast | 4s1x64d | 31.9M | 4.35 | 80.76 | 80.90 | +| ResNeSt-50-fast | 1s2x40d | 25.9M | 4.38 | 80.59 | 80.72 | +| ResNeSt-50-fast | 2s2x40d | 26.9M | 4.38 | 80.61 | 80.84 | +| ResNeSt-50-fast | 4s2x40d | 30.4M | 4.41 | 81.14 | 81.17 | +| ResNeSt-50-fast | 1s4x24d | 25.7M | 4.42 | 80.99 | 80.97 | + +### PyTorch Models + +- Load using Torch Hub + +```python +import torch +# get list of models +torch.hub.list('zhanghang1989/ResNeSt', force_reload=True) + +# load pretrained models, using ResNeSt-50-fast_2s1x64d as an example +net = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50_fast_2s1x64d', pretrained=True) +``` + + +- Load using python package + +```python +# using ResNeSt-50 as an example +from resnest.torch import resnest50_fast_2s1x64d +net = resnest50_fast_2s1x64d(pretrained=True) +``` + + +### Gluon Models + +- Load pretrained model: + +```python +# using ResNeSt-50 as an example +from resnest.gluon import resnest50_fast_2s1x64d +net = resnest50_fast_2s1x64d(pretrained=True) +``` + diff --git a/final-project/model_zoo/pytorch_resnest/configs/Base-ResNet50.yaml b/final-project/model_zoo/pytorch_resnest/configs/Base-ResNet50.yaml new file mode 100644 index 0000000..9d04d18 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/configs/Base-ResNet50.yaml @@ -0,0 +1,7 @@ +MODEL: + NAME: 'resnet50' +TRAINING: + EPOCHS: 120 + BATCH_SIZE: 32 +OPTIMIZER: + LR: 0.0125 diff --git a/final-project/model_zoo/pytorch_resnest/d2/README.md b/final-project/model_zoo/pytorch_resnest/d2/README.md new file mode 100644 index 0000000..971106e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/README.md @@ -0,0 +1,213 @@ +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/instance-segmentation-on-coco)](https://paperswithcode.com/sota/instance-segmentation-on-coco?p=resnest-split-attention-networks) 
+[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/object-detection-on-coco)](https://paperswithcode.com/sota/object-detection-on-coco?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/panoptic-segmentation-on-coco-panoptic)](https://paperswithcode.com/sota/panoptic-segmentation-on-coco-panoptic?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/instance-segmentation-on-coco-minival)](https://paperswithcode.com/sota/instance-segmentation-on-coco-minival?p=resnest-split-attention-networks) +[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/resnest-split-attention-networks/object-detection-on-coco-minival)](https://paperswithcode.com/sota/object-detection-on-coco-minival?p=resnest-split-attention-networks) + +# ResNeSt (Detectron2 Wrapper) + +Code for detection and instance segmentation experiments in [ResNeSt](https://hangzhang.org/files/resnest.pdf). + + +## Training and Inference +Please follow [INSTALL.md](https://github.com/facebookresearch/detectron2/blob/master/INSTALL.md) to install detectron2. + +To train a model with 8 gpus, please run +```shell +python train_net.py --num-gpus 8 --config-file your_config.yaml +``` + +For inference +```shell +python train_net.py \ + --config-file your_config.yaml + --eval-only MODEL.WEIGHTS /path/to/checkpoint_file +``` + +For the inference demo, please see [GETTING_STARTED.md](https://github.com/facebookresearch/detectron2/blob/master/GETTING_STARTED.md). + +## Pretrained Models + +### Object Detection + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| Method | Backbone | mAP% | download |
+|--------|----------|------|----------|
+| Faster R-CNN | ResNet-50 | 39.25 | config \| model \| log |
+| | ResNet-101 | 41.37 | config \| model \| log |
+| | ResNeSt-50 (ours) | 42.33 | config \| model \| log |
+| | ResNeSt-50-DCNv2 (ours) | 44.11 | config \| model \| log |
+| | ResNeSt-101 (ours) | 44.72 | config \| model \| log |
+| Cascade R-CNN | ResNet-50 | 42.52 | config \| model \| log |
+| | ResNet-101 | 44.03 | config \| model \| log |
+| | ResNeSt-50 (ours) | 45.41 | config \| model \| log |
+| | ResNeSt-101 (ours) | 47.50 | config \| model \| log |
+| | ResNeSt-200 (ours) | 49.03 | config \| model \| log |
+
+We train all models with FPN, SyncBN, and image scale augmentation (the short side of each image is picked randomly from 640 to 800). A 1x learning rate schedule is used. All results are reported on the COCO-2017 validation set.
+
+### Instance Segmentation
+
+| Method | Backbone | bbox | mask | download |
+|--------|----------|------|------|----------|
+| Mask R-CNN | ResNet-50 | 39.97 | 36.05 | config \| model \| log |
+| | ResNet-101 | 41.78 | 37.51 | config \| model \| log |
+| | ResNeSt-50 (ours) | 42.81 | 38.14 | config \| model \| log |
+| | ResNeSt-101 (ours) | 45.75 | 40.65 | config \| model \| log |
+| Cascade R-CNN | ResNet-50 | 43.06 | 37.19 | config \| model \| log |
+| | ResNet-101 | 44.79 | 38.52 | config \| model \| log |
+| | ResNeSt-50 (ours) | 46.19 | 39.55 | config \| model \| log |
+| | ResNeSt-101 (ours) | 48.30 | 41.56 | config \| model \| log |
+| | ResNeSt-200-tricks-3x (ours) | 50.54 | 44.21 | config \| model \| log |
+| | ResNeSt-200-dcn-tricks-3x (ours) | 50.91 | 44.50 | config \| model \| log |
+| | | 53.30* | 47.10* | |
+
+All models are trained with FPN and SyncBN. For data augmentation, input images' shorter sides are randomly scaled to one of (640, 672, 704, 736, 768, 800). A 1x learning rate schedule is used unless otherwise specified. All results are reported on the COCO-2017 validation set. The values with * show the multi-scale testing performance on test-dev2019.
+
+### Panoptic Segmentation
+
+| Backbone | bbox | mask | PQ | download |
+|----------|------|------|----|----------|
+| ResNeSt-200 | 51.00 | 43.68 | 47.90 | config \| model \| log |
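+
+Beyond the `train_net.py` commands above, the released configs can also be consumed programmatically. The snippet below is a minimal, untested sketch: it assumes detectron2 is installed, relies on `add_resnest_config` from `resnest.d2` (the same helper imported by `train_net.py` in this folder), and uses placeholder config/checkpoint paths relative to `d2/`.
+
+```python
+from detectron2.config import get_cfg
+from detectron2.engine import DefaultPredictor
+from resnest.d2 import add_resnest_config  # registers the extra ResNeSt backbone keys (e.g. RADIX)
+
+cfg = get_cfg()
+add_resnest_config(cfg)
+cfg.merge_from_file("configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml")
+cfg.MODEL.WEIGHTS = "/path/to/checkpoint_file"  # placeholder: point to the released checkpoint for this config
+predictor = DefaultPredictor(cfg)  # input images are expected in cfg.INPUT.FORMAT (RGB for the ResNeSt configs)
+```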
+ + +## Reference + +**ResNeSt: Split-Attention Networks** [[arXiv](https://arxiv.org/pdf/2004.08955.pdf)] + +Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Muller, R. Manmatha, Mu Li and Alex Smola + +``` +@article{zhang2020resnest, +title={ResNeSt: Split-Attention Networks}, +author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander}, +journal={arXiv preprint arXiv:2004.08955}, +year={2020} +} +``` + +### Contributors +[Chongruo Wu](https://github.com/chongruo), [Zhongyue Zhang](http://zhongyuezhang.com/), [Hang Zhang](https://hangzhang.org/) diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/Base-RCNN-FPN.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/Base-RCNN-FPN.yaml new file mode 100644 index 0000000..3e020f2 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/Base-RCNN-FPN.yaml @@ -0,0 +1,42 @@ +MODEL: + META_ARCHITECTURE: "GeneralizedRCNN" + BACKBONE: + NAME: "build_resnet_fpn_backbone" + RESNETS: + OUT_FEATURES: ["res2", "res3", "res4", "res5"] + FPN: + IN_FEATURES: ["res2", "res3", "res4", "res5"] + ANCHOR_GENERATOR: + SIZES: [[32], [64], [128], [256], [512]] # One size for each in feature map + ASPECT_RATIOS: [[0.5, 1.0, 2.0]] # Three aspect ratios (same for all in feature maps) + RPN: + IN_FEATURES: ["p2", "p3", "p4", "p5", "p6"] + PRE_NMS_TOPK_TRAIN: 2000 # Per FPN level + PRE_NMS_TOPK_TEST: 1000 # Per FPN level + # Detectron1 uses 2000 proposals per-batch, + # (See "modeling/rpn/rpn_outputs.py" for details of this legacy issue) + # which is approximately 1000 proposals per-image since the default batch size for FPN is 2. 
+ POST_NMS_TOPK_TRAIN: 1000 + POST_NMS_TOPK_TEST: 1000 + ROI_HEADS: + NAME: "StandardROIHeads" + IN_FEATURES: ["p2", "p3", "p4", "p5"] + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_FC: 2 + POOLER_RESOLUTION: 7 + ROI_MASK_HEAD: + NAME: "MaskRCNNConvUpsampleHead" + NUM_CONV: 4 + POOLER_RESOLUTION: 14 +DATASETS: + TRAIN: ("coco_2017_train",) + TEST: ("coco_2017_val",) +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 + STEPS: (60000, 80000) + MAX_ITER: 90000 +INPUT: + MIN_SIZE_TRAIN: (640, 672, 704, 736, 768, 800) +VERSION: 2 diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..659656b --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,30 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl" + MASK_ON: False + RESNETS: + DEPTH: 101 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + RPN: + POST_NMS_TOPK_TRAIN: 2000 +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 +TEST: + PRECISE_BN: + ENABLED: True + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..8d3c2ad --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,30 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" + MASK_ON: False + RESNETS: + DEPTH: 50 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + RPN: + POST_NMS_TOPK_TRAIN: 2000 +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 +TEST: + PRECISE_BN: + ENABLED: True + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..94c9072 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,34 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest101_detectron-486f69a8.pth" + MASK_ON: False + RESNETS: + DEPTH: 101 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: 
[58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_200_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_200_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..338f686 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_200_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,34 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest200_detectron-02644020.pth" + MASK_ON: False + RESNETS: + DEPTH: 200 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_50_FPN_syncbn_range-scale-1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_50_FPN_syncbn_range-scale-1x.yaml new file mode 100644 index 0000000..c61906f --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_cascade_rcnn_ResNeSt_50_FPN_syncbn_range-scale-1x.yaml @@ -0,0 +1,34 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest50_detectron-255b5649.pth" + MASK_ON: False + RESNETS: + DEPTH: 50 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..f55c188 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_101_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,30 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl" + MASK_ON: False + RESNETS: + DEPTH: 101 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 +TEST: + PRECISE_BN: + ENABLED: True + + + + + + diff --git 
a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..7d5b581 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_R_50_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,25 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" + MASK_ON: False + RESNETS: + STRIDE_IN_1X1: True + DEPTH: 50 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 +TEST: + PRECISE_BN: + ENABLED: True + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..f759107 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_101_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,29 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest101_detectron-486f69a8.pth" + MASK_ON: False + RESNETS: + DEPTH: 101 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_dcn_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_dcn_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..df53f92 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_dcn_syncbn_range-scale_1x.yaml @@ -0,0 +1,38 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest50_detectron-255b5649.pth" + MASK_ON: False + RESNETS: + DEPTH: 50 + STRIDE_IN_1X1: False + RADIX: 2 + DEFORM_ON_PER_STAGE: [False, True, True, True] # on Res3,Res4,Res5 + DEFORM_MODULATED: True + DEFORM_NUM_GROUPS: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + + + + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_syncbn_range-scale_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_syncbn_range-scale_1x.yaml new file mode 100644 index 0000000..08c48dc --- /dev/null +++ 
b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-Detection/faster_rcnn_ResNeSt_50_FPN_syncbn_range-scale_1x.yaml @@ -0,0 +1,35 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest50_detectron-255b5649.pth" + MASK_ON: False + RESNETS: + DEPTH: 50 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + MIN_SIZE_TRAIN: (640, 800) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1333 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + + + + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_101_FPN_syncbn_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_101_FPN_syncbn_1x.yaml new file mode 100644 index 0000000..12390be --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_101_FPN_syncbn_1x.yaml @@ -0,0 +1,23 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl" + MASK_ON: True + RESNETS: + DEPTH: 101 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_50_FPN_syncbn_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_50_FPN_syncbn_1x.yaml new file mode 100644 index 0000000..142c6f6 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_R_50_FPN_syncbn_1x.yaml @@ -0,0 +1,23 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" + MASK_ON: True + RESNETS: + DEPTH: 50 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml new file mode 100644 index 0000000..3656aeb --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml @@ -0,0 +1,35 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest101_detectron-486f69a8.pth" + MASK_ON: True + RESNETS: + DEPTH: 101 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] 
+SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_dcn_syncBN_all_tricks_3x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_dcn_syncBN_all_tricks_3x.yaml new file mode 100644 index 0000000..1d69288 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_dcn_syncBN_all_tricks_3x.yaml @@ -0,0 +1,46 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest200_detectron-02644020.pth" + MASK_ON: True + RESNETS: + DEPTH: 200 + STRIDE_IN_1X1: False + RADIX: 2 + DEFORM_ON_PER_STAGE: [False, True, True, True] # on Res3,Res4,Res5 + DEFORM_MODULATED: True + DEFORM_NUM_GROUPS: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NUM_CONV: 8 + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 + STEPS: (240000, 255000) + MAX_ITER: 270000 +INPUT: + MIN_SIZE_TRAIN: (640, 864) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1440 + CROP: + ENABLED: True + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + AUG: + ENABLED: False diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_syncBN_all_tricks_3x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_syncBN_all_tricks_3x.yaml new file mode 100644 index 0000000..26a2eaf --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_200_FPN_syncBN_all_tricks_3x.yaml @@ -0,0 +1,47 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest200_detectron-02644020.pth" + MASK_ON: True + RESNETS: + DEPTH: 200 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NUM_CONV: 8 + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 + STEPS: (240000, 255000) + MAX_ITER: 270000 +INPUT: + MIN_SIZE_TRAIN: (640, 864) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1440 + CROP: + ENABLED: True + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + + + + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml new file mode 100644 index 0000000..d1423f6 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_cascade_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml @@ -0,0 +1,37 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: 
"https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest50_detectron-255b5649.pth" + MASK_ON: True + RESNETS: + DEPTH: 50 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + ROI_MASK_HEAD: + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_syncbn_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_syncbn_1x.yaml new file mode 100644 index 0000000..2a3cff4 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_syncbn_1x.yaml @@ -0,0 +1,19 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-101.pkl" + MASK_ON: True + RESNETS: + DEPTH: 101 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + ROI_MASK_HEAD: + NORM: "SyncBN" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_syncbn_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_syncbn_1x.yaml new file mode 100644 index 0000000..314bcbb --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_syncbn_1x.yaml @@ -0,0 +1,19 @@ +_BASE_: "../Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "detectron2://ImageNetPretrained/MSRA/R-50.pkl" + MASK_ON: True + RESNETS: + DEPTH: 50 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + ROI_MASK_HEAD: + NORM: "SyncBN" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml new file mode 100644 index 0000000..f30488c --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_101_FPN_syncBN_1x.yaml @@ -0,0 +1,28 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest101_detectron-486f69a8.pth" + MASK_ON: True + RESNETS: + DEPTH: 101 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + ROI_MASK_HEAD: + NORM: "SyncBN" + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml new file mode 100644 index 0000000..a9112aa --- /dev/null +++ 
b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml @@ -0,0 +1,28 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest50_detectron-255b5649.pth" + MASK_ON: True + RESNETS: + DEPTH: 50 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + ROI_MASK_HEAD: + NORM: "SyncBN" + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 +INPUT: + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/ResNeSt-Base-Panoptic-FPN.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/ResNeSt-Base-Panoptic-FPN.yaml new file mode 100644 index 0000000..3ce4548 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/ResNeSt-Base-Panoptic-FPN.yaml @@ -0,0 +1,9 @@ +_BASE_: "../ResNest-Base-RCNN-FPN.yaml" +MODEL: + META_ARCHITECTURE: "PanopticFPN" + MASK_ON: True + SEM_SEG_HEAD: + LOSS_WEIGHT: 0.5 +DATASETS: + TRAIN: ("coco_2017_train_panoptic_separated",) + TEST: ("coco_2017_val_panoptic_separated",) diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/panoptic_ResNeSt_200_FPN_syncBN_tricks_3x.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/panoptic_ResNeSt_200_FPN_syncBN_tricks_3x.yaml new file mode 100644 index 0000000..0c33ad8 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/COCO-PanopticSegmentation/panoptic_ResNeSt_200_FPN_syncBN_tricks_3x.yaml @@ -0,0 +1,42 @@ +_BASE_: "ResNeSt-Base-Panoptic-FPN.yaml" +MODEL: + WEIGHTS: "https://s3.us-west-1.wasabisys.com/resnest/detectron/resnest200_detectron-02644020.pth" + RESNETS: + DEPTH: 200 + STRIDE_IN_1X1: False + RADIX: 2 + NORM: "SyncBN" + FPN: + NORM: "SyncBN" + ROI_HEADS: + NAME: CascadeROIHeads + ROI_BOX_HEAD: + NAME: "FastRCNNConvFCHead" + NUM_CONV: 4 + NUM_FC: 1 + NORM: "SyncBN" + CLS_AGNOSTIC_BBOX_REG: True + SEM_SEG_HEAD: + NORM: "SyncBN" + RPN: + POST_NMS_TOPK_TRAIN: 2000 + PIXEL_MEAN: [123.68, 116.779, 103.939] + PIXEL_STD: [58.393, 57.12, 57.375] +SOLVER: + IMS_PER_BATCH: 16 + BASE_LR: 0.02 + STEPS: (240000, 255000) + MAX_ITER: 270000 +INPUT: + MIN_SIZE_TRAIN: (400, 1000) + MIN_SIZE_TRAIN_SAMPLING: "range" + MAX_SIZE_TRAIN: 1440 + FORMAT: "RGB" +TEST: + PRECISE_BN: + ENABLED: True + AUG: + ENABLED: True + + + diff --git a/final-project/model_zoo/pytorch_resnest/d2/configs/ResNest-Base-RCNN-FPN.yaml b/final-project/model_zoo/pytorch_resnest/d2/configs/ResNest-Base-RCNN-FPN.yaml new file mode 100644 index 0000000..619d279 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/configs/ResNest-Base-RCNN-FPN.yaml @@ -0,0 +1,4 @@ +_BASE_: "Base-RCNN-FPN.yaml" +MODEL: + BACKBONE: + NAME: "build_resnest_fpn_backbone" diff --git a/final-project/model_zoo/pytorch_resnest/d2/datasets/prepare_coco.py b/final-project/model_zoo/pytorch_resnest/d2/datasets/prepare_coco.py new file mode 100644 index 0000000..b1d15d0 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/datasets/prepare_coco.py @@ -0,0 +1,64 @@ +"""Prepare MS COCO datasets""" +import os +import shutil +import argparse +import zipfile +from resnest.utils import download, mkdir + +_TARGET_DIR = os.path.expanduser('./coco') + +def 
parse_args(): + parser = argparse.ArgumentParser( + description='Initialize MS COCO dataset.', + epilog='Example: python mscoco.py --download-dir ~/mscoco', + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument('--download-dir', type=str, default=None, help='dataset directory on disk') + args = parser.parse_args() + return args + +def download_coco(path, overwrite=False): + _DOWNLOAD_URLS = [ + ('http://images.cocodataset.org/zips/train2017.zip', + '10ad623668ab00c62c096f0ed636d6aff41faca5'), + ('http://images.cocodataset.org/zips/val2017.zip', + '4950dc9d00dbe1c933ee0170f5797584351d2a41'), + ('http://images.cocodataset.org/annotations/annotations_trainval2017.zip', + '8551ee4bb5860311e79dace7e79cb91e432e78b3'), + ('https://hangzh.s3.amazonaws.com/encoding/data/coco/train_ids.pth', + '12cd266f97c8d9ea86e15a11f11bcb5faba700b6'), + ('https://hangzh.s3.amazonaws.com/encoding/data/coco/val_ids.pth', + '4ce037ac33cbf3712fd93280a1c5e92dae3136bb'), + ] + mkdir(path) + for url, checksum in _DOWNLOAD_URLS: + filename = download(url, path=path, overwrite=overwrite, sha1_hash=checksum) + # extract + if os.path.splitext(filename)[1] == '.zip': + with zipfile.ZipFile(filename) as zf: + zf.extractall(path=path) + else: + shutil.move(filename, os.path.join(path, 'annotations/'+os.path.basename(filename))) + + +def install_coco_api(): + repo_url = "https://github.com/cocodataset/cocoapi" + os.system("git clone " + repo_url) + os.system("cd cocoapi/PythonAPI/ && python setup.py install") + shutil.rmtree('cocoapi') + try: + import pycocotools + except Exception: + print("Installing COCO API failed, please install it manually %s"%(repo_url)) + + +if __name__ == '__main__': + args = parse_args() + mkdir(os.path.expanduser('~/.encoding/data')) + if args.download_dir is not None: + if os.path.isdir(_TARGET_DIR): + os.remove(_TARGET_DIR) + # make symlink + os.symlink(args.download_dir, _TARGET_DIR) + else: + download_coco(_TARGET_DIR, overwrite=False) + install_coco_api() diff --git a/final-project/model_zoo/pytorch_resnest/d2/train_net.py b/final-project/model_zoo/pytorch_resnest/d2/train_net.py new file mode 100644 index 0000000..6fbb7b6 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/d2/train_net.py @@ -0,0 +1,170 @@ +#!/usr/bin/env python +# Copyright (c) Facebook, Inc. and its affiliates. +""" +Detection Training Script. + +This scripts reads a given config file and runs the training or evaluation. +It is an entry point that is made to train standard models in detectron2. + +In order to let one script support training of many models, +this script contains logic that are specific to these built-in models and therefore +may not be suitable for your own project. +For example, your research project perhaps only needs a single "evaluator". + +Therefore, we recommend you to use detectron2 as an library and take +this file as an example of how to use the library. +You may want to write your own script with your datasets and other customizations. 
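+
+Example usage (illustrative; the flags come from detectron2's default_argument_parser,
+and the config path refers to the ResNeSt config shipped in this d2/ folder):
+
+ python train_net.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml --num-gpus 8
+
+ python train_net.py --config-file configs/COCO-InstanceSegmentation/mask_rcnn_ResNeSt_50_FPN_syncBN_1x.yaml --eval-only MODEL.WEIGHTS /path/to/model_final.pth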
+""" + +import logging +import os +from collections import OrderedDict +import torch + +import detectron2.utils.comm as comm +from detectron2.checkpoint import DetectionCheckpointer +from detectron2.config import get_cfg +from detectron2.data import MetadataCatalog +from detectron2.engine import DefaultTrainer, default_argument_parser, default_setup, hooks, launch +from detectron2.evaluation import ( + CityscapesInstanceEvaluator, + CityscapesSemSegEvaluator, + COCOEvaluator, + COCOPanopticEvaluator, + DatasetEvaluators, + LVISEvaluator, + PascalVOCDetectionEvaluator, + SemSegEvaluator, + verify_results, +) +from detectron2.modeling import GeneralizedRCNNWithTTA +from resnest.d2 import add_resnest_config + + +class Trainer(DefaultTrainer): + """ + We use the "DefaultTrainer" which contains pre-defined default logic for + standard training workflow. They may not work for you, especially if you + are working on a new research project. In that case you can write your + own training loop. You can use "tools/plain_train_net.py" as an example. + """ + + @classmethod + def build_evaluator(cls, cfg, dataset_name, output_folder=None): + """ + Create evaluator(s) for a given dataset. + This uses the special metadata "evaluator_type" associated with each builtin dataset. + For your own dataset, you can simply create an evaluator manually in your + script and do not have to worry about the hacky if-else logic here. + """ + if output_folder is None: + output_folder = os.path.join(cfg.OUTPUT_DIR, "inference") + evaluator_list = [] + evaluator_type = MetadataCatalog.get(dataset_name).evaluator_type + if evaluator_type in ["sem_seg", "coco_panoptic_seg"]: + evaluator_list.append( + SemSegEvaluator( + dataset_name, + distributed=True, + output_dir=output_folder, + ) + ) + if evaluator_type in ["coco", "coco_panoptic_seg"]: + evaluator_list.append(COCOEvaluator(dataset_name, output_dir=output_folder)) + if evaluator_type == "coco_panoptic_seg": + evaluator_list.append(COCOPanopticEvaluator(dataset_name, output_folder)) + if evaluator_type == "cityscapes_instance": + assert ( + torch.cuda.device_count() >= comm.get_rank() + ), "CityscapesEvaluator currently do not work with multiple machines." + return CityscapesInstanceEvaluator(dataset_name) + if evaluator_type == "cityscapes_sem_seg": + assert ( + torch.cuda.device_count() >= comm.get_rank() + ), "CityscapesEvaluator currently do not work with multiple machines." + return CityscapesSemSegEvaluator(dataset_name) + elif evaluator_type == "pascal_voc": + return PascalVOCDetectionEvaluator(dataset_name) + elif evaluator_type == "lvis": + return LVISEvaluator(dataset_name, output_dir=output_folder) + if len(evaluator_list) == 0: + raise NotImplementedError( + "no Evaluator for the dataset {} with the type {}".format( + dataset_name, evaluator_type + ) + ) + elif len(evaluator_list) == 1: + return evaluator_list[0] + return DatasetEvaluators(evaluator_list) + + @classmethod + def test_with_TTA(cls, cfg, model): + logger = logging.getLogger("detectron2.trainer") + # In the end of training, run an evaluation with TTA + # Only support some R-CNN models. 
+ logger.info("Running inference with test-time augmentation ...") + model = GeneralizedRCNNWithTTA(cfg, model) + evaluators = [ + cls.build_evaluator( + cfg, name, output_folder=os.path.join(cfg.OUTPUT_DIR, "inference_TTA") + ) + for name in cfg.DATASETS.TEST + ] + res = cls.test(cfg, model, evaluators) + res = OrderedDict({k + "_TTA": v for k, v in res.items()}) + return res + + +def setup(args): + """ + Create configs and perform basic setups. + """ + cfg = get_cfg() + add_resnest_config(cfg) + cfg.merge_from_file(args.config_file) + cfg.merge_from_list(args.opts) + cfg.freeze() + default_setup(cfg, args) + return cfg + + +def main(args): + cfg = setup(args) + + if args.eval_only: + model = Trainer.build_model(cfg) + DetectionCheckpointer(model, save_dir=cfg.OUTPUT_DIR).resume_or_load( + cfg.MODEL.WEIGHTS, resume=args.resume + ) + res = Trainer.test(cfg, model) + if cfg.TEST.AUG.ENABLED: + res.update(Trainer.test_with_TTA(cfg, model)) + if comm.is_main_process(): + verify_results(cfg, res) + return res + + """ + If you'd like to do anything fancier than the standard training logic, + consider writing your own training loop (see plain_train_net.py) or + subclassing the trainer. + """ + trainer = Trainer(cfg) + trainer.resume_or_load(resume=args.resume) + if cfg.TEST.AUG.ENABLED: + trainer.register_hooks( + [hooks.EvalHook(0, lambda: trainer.test_with_TTA(cfg, trainer.model))] + ) + return trainer.train() + + +if __name__ == "__main__": + args = default_argument_parser().parse_args() + print("Command Line Args:", args) + launch( + main, + args.num_gpus, + num_machines=args.num_machines, + machine_rank=args.machine_rank, + dist_url=args.dist_url, + args=(args,), + ) diff --git a/final-project/model_zoo/pytorch_resnest/hubconf.py b/final-project/model_zoo/pytorch_resnest/hubconf.py new file mode 100644 index 0000000..10ac36e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/hubconf.py @@ -0,0 +1,6 @@ +dependencies = ['torch'] + +from resnest.torch import resnest50, resnest101, resnest200, resnest269 +from resnest.torch import (resnest50_fast_1s1x64d, resnest50_fast_2s1x64d, resnest50_fast_4s1x64d, + resnest50_fast_1s2x40d, resnest50_fast_2s2x40d, resnest50_fast_4s2x40d, + resnest50_fast_1s4x24d) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/final-project/model_zoo/pytorch_resnest/resnest/d2/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/d2/__init__.py new file mode 100644 index 0000000..74ff03a --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/d2/__init__.py @@ -0,0 +1,2 @@ +from .resnest import build_resnest_backbone, build_resnest_fpn_backbone +from .config import add_resnest_config diff --git a/final-project/model_zoo/pytorch_resnest/resnest/d2/config.py b/final-project/model_zoo/pytorch_resnest/resnest/d2/config.py new file mode 100644 index 0000000..471347f --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/d2/config.py @@ -0,0 +1,20 @@ +from detectron2.config import CfgNode as CN + +def add_resnest_config(cfg): + """Add config for ResNeSt + """ + # Place the stride 2 conv on the 1x1 filter + # Use True only for the original MSRA ResNet; + # use False for C2 and Torch models + cfg.MODEL.RESNETS.STRIDE_IN_1X1 = False + # Apply deep stem + cfg.MODEL.RESNETS.DEEP_STEM = True + # Apply avg after conv2 in the BottleBlock + # When AVD=True, the STRIDE_IN_1X1 should be False + 
cfg.MODEL.RESNETS.AVD = True + # Apply avg_down to the downsampling layer for residual path + cfg.MODEL.RESNETS.AVG_DOWN = True + # Radix in ResNeSt + cfg.MODEL.RESNETS.RADIX = 2 + # Bottleneck_width in ResNeSt + cfg.MODEL.RESNETS.BOTTLENECK_WIDTH = 64 diff --git a/final-project/model_zoo/pytorch_resnest/resnest/d2/resnest.py b/final-project/model_zoo/pytorch_resnest/resnest/d2/resnest.py new file mode 100644 index 0000000..2d881c8 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/d2/resnest.py @@ -0,0 +1,734 @@ +import numpy as np +import fvcore.nn.weight_init as weight_init +import torch +import torch.nn.functional as F +from torch import nn + +from detectron2.layers import ( + Conv2d, + DeformConv, + FrozenBatchNorm2d, + ModulatedDeformConv, + ShapeSpec, + get_norm, +) + +from detectron2.modeling.backbone import Backbone, FPN, BACKBONE_REGISTRY +from detectron2.modeling.backbone.fpn import LastLevelMaxPool + +__all__ = [ + "ResNeSt", + "build_resnest_backbone", + "build_resnest_fpn_backbone", +] + + +class ResNetBlockBase(nn.Module): + def __init__(self, in_channels, out_channels, stride): + """ + The `__init__` method of any subclass should also contain these arguments. + + Args: + in_channels (int): + out_channels (int): + stride (int): + """ + super().__init__() + self.in_channels = in_channels + self.out_channels = out_channels + self.stride = stride + + def freeze(self): + for p in self.parameters(): + p.requires_grad = False + FrozenBatchNorm2d.convert_frozen_batchnorm(self) + return self + + +class BasicBlock(ResNetBlockBase): + def __init__(self, in_channels, out_channels, *, stride=1, norm="BN"): + """ + The standard block type for ResNet18 and ResNet34. + + Args: + in_channels (int): Number of input channels. + out_channels (int): Number of output channels. + stride (int): Stride for the first conv. + norm (str or callable): A callable that takes the number of + channels and returns a `nn.Module`, or a pre-defined string + (one of {"FrozenBN", "BN", "GN"}). + """ + super().__init__(in_channels, out_channels, stride) + + if in_channels != out_channels: + self.shortcut = Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=stride, + bias=False, + norm=get_norm(norm, out_channels), + ) + else: + self.shortcut = None + + self.conv1 = Conv2d( + in_channels, + out_channels, + kernel_size=3, + stride=stride, + padding=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + + self.conv2 = Conv2d( + out_channels, + out_channels, + kernel_size=3, + stride=1, + padding=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + + for layer in [self.conv1, self.conv2, self.shortcut]: + if layer is not None: # shortcut can be None + weight_init.c2_msra_fill(layer) + + def forward(self, x): + out = self.conv1(x) + out = F.relu_(out) + out = self.conv2(out) + + if self.shortcut is not None: + shortcut = self.shortcut(x) + else: + shortcut = x + + out += shortcut + out = F.relu_(out) + return out + + +class BottleneckBlock(ResNetBlockBase): + def __init__( + self, + in_channels, + out_channels, + *, + bottleneck_channels, + stride=1, + num_groups=1, + norm="BN", + stride_in_1x1=False, + dilation=1, + avd=False, + avg_down=False, + radix=2, + bottleneck_width=64, + ): + """ + Args: + norm (str or callable): a callable that takes the number of + channels and return a `nn.Module`, or a pre-defined string + (one of {"FrozenBN", "BN", "GN"}). 
+ stride_in_1x1 (bool): when stride==2, whether to put stride in the + first 1x1 convolution or the bottleneck 3x3 convolution. + """ + super().__init__(in_channels, out_channels, stride) + + self.avd = avd and (stride>1) + self.avg_down = avg_down + self.radix = radix + + cardinality = num_groups + group_width = int(bottleneck_channels * (bottleneck_width / 64.)) * cardinality + + if in_channels != out_channels: + if self.avg_down: + self.shortcut_avgpool = nn.AvgPool2d(kernel_size=stride, stride=stride, + ceil_mode=True, count_include_pad=False) + self.shortcut = Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + else: + self.shortcut = Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=stride, + bias=False, + norm=get_norm(norm, out_channels), + ) + else: + self.shortcut = None + + # The original MSRA ResNet models have stride in the first 1x1 conv + # The subsequent fb.torch.resnet and Caffe2 ResNe[X]t implementations have + # stride in the 3x3 conv + stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride) + + self.conv1 = Conv2d( + in_channels, + group_width, + kernel_size=1, + stride=stride_1x1, + bias=False, + norm=get_norm(norm, group_width), + ) + + if self.radix>1: + from .splat import SplAtConv2d + self.conv2 = SplAtConv2d( + group_width, group_width, kernel_size=3, + stride = 1 if self.avd else stride_3x3, + padding=dilation, dilation=dilation, + groups=cardinality, bias=False, + radix=self.radix, + norm=norm, + ) + else: + self.conv2 = Conv2d( + group_width, + group_width, + kernel_size=3, + stride=1 if self.avd else stride_3x3, + padding=1 * dilation, + bias=False, + groups=num_groups, + dilation=dilation, + norm=get_norm(norm, group_width), + ) + + if self.avd: + self.avd_layer = nn.AvgPool2d(3, stride, padding=1) + + self.conv3 = Conv2d( + group_width, + out_channels, + kernel_size=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + + if self.radix>1: + for layer in [self.conv1, self.conv3, self.shortcut]: + if layer is not None: # shortcut can be None + weight_init.c2_msra_fill(layer) + else: + for layer in [self.conv1, self.conv2, self.conv3, self.shortcut]: + if layer is not None: # shortcut can be None + weight_init.c2_msra_fill(layer) + + # Zero-initialize the last normalization in each residual branch, + # so that at the beginning, the residual branch starts with zeros, + # and each residual block behaves like an identity. + # See Sec 5.1 in "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour": + # "For BN layers, the learnable scaling coefficient γ is initialized + # to be 1, except for each residual block's last BN + # where γ is initialized to be 0." + + # nn.init.constant_(self.conv3.norm.weight, 0) + # TODO this somehow hurts performance when training GN models from scratch. + # Add it as an option when we need to use this code to train a backbone. 
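+ # Note on the layout assembled above (see forward() below): 1x1 conv ->
+ # 3x3 split-attention conv when radix > 1 (plain grouped 3x3 otherwise) ->
+ # optional AvgPool2d (AVD) -> 1x1 conv, added to an (optionally
+ # average-pooled) projection shortcut.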
+ + def forward(self, x): + out = self.conv1(x) + out = F.relu_(out) + + if self.radix>1: + out = self.conv2(out) + else: + out = self.conv2(out) + out = F.relu_(out) + + if self.avd: + out = self.avd_layer(out) + + out = self.conv3(out) + + if self.shortcut is not None: + if self.avg_down: + x = self.shortcut_avgpool(x) + shortcut = self.shortcut(x) + else: + shortcut = x + + out += shortcut + out = F.relu_(out) + return out + + +class DeformBottleneckBlock(ResNetBlockBase): + def __init__( + self, + in_channels, + out_channels, + *, + bottleneck_channels, + stride=1, + num_groups=1, + norm="BN", + stride_in_1x1=False, + dilation=1, + deform_modulated=False, + deform_num_groups=1, + avd=False, + avg_down=False, + radix=2, + bottleneck_width=64, + ): + """ + Similar to :class:`BottleneckBlock`, but with deformable conv in the 3x3 convolution. + """ + super().__init__(in_channels, out_channels, stride) + self.deform_modulated = deform_modulated + self.avd = avd and (stride>1) + self.avg_down = avg_down + self.radix = radix + + cardinality = num_groups + group_width = int(bottleneck_channels * (bottleneck_width / 64.)) * cardinality + + if in_channels != out_channels: + if self.avg_down: + self.shortcut_avgpool = nn.AvgPool2d(kernel_size=stride, stride=stride, + ceil_mode=True, count_include_pad=False) + self.shortcut = Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + else: + self.shortcut = Conv2d( + in_channels, + out_channels, + kernel_size=1, + stride=stride, + bias=False, + norm=get_norm(norm, out_channels), + ) + else: + self.shortcut = None + + stride_1x1, stride_3x3 = (stride, 1) if stride_in_1x1 else (1, stride) + + self.conv1 = Conv2d( + in_channels, + group_width, + kernel_size=1, + stride=stride_1x1, + bias=False, + norm=get_norm(norm, group_width), + ) + + if deform_modulated: + deform_conv_op = ModulatedDeformConv + # offset channels are 2 or 3 (if with modulated) * kernel_size * kernel_size + offset_channels = 27 + else: + deform_conv_op = DeformConv + offset_channels = 18 + + self.conv2_offset = Conv2d( + bottleneck_channels, + offset_channels * deform_num_groups, + kernel_size=3, + stride=1 if self.avd else stride_3x3, + padding=1 * dilation, + dilation=dilation, + groups=deform_num_groups, + ) + if self.radix>1: + from .splat import SplAtConv2d_dcn + self.conv2 = SplAtConv2d_dcn( + group_width, group_width, kernel_size=3, + stride = 1 if self.avd else stride_3x3, + padding=dilation, dilation=dilation, + groups=cardinality, bias=False, + radix=self.radix, + norm=norm, + deform_conv_op=deform_conv_op, + deformable_groups=deform_num_groups, + deform_modulated=deform_modulated, + + ) + else: + self.conv2 = deform_conv_op( + bottleneck_channels, + bottleneck_channels, + kernel_size=3, + stride=1 if self.avd else stride_3x3, + padding=1 * dilation, + bias=False, + groups=num_groups, + dilation=dilation, + deformable_groups=deform_num_groups, + norm=get_norm(norm, bottleneck_channels), + ) + + if self.avd: + self.avd_layer = nn.AvgPool2d(3, stride, padding=1) + + self.conv3 = Conv2d( + group_width, + out_channels, + kernel_size=1, + bias=False, + norm=get_norm(norm, out_channels), + ) + + if self.radix>1: + for layer in [self.conv1, self.conv3, self.shortcut]: + if layer is not None: # shortcut can be None + weight_init.c2_msra_fill(layer) + else: + for layer in [self.conv1, self.conv2, self.conv3, self.shortcut]: + if layer is not None: # shortcut can be None + weight_init.c2_msra_fill(layer) + + 
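+ # Zero-initialize the offset predictor so that, at the start of training, the
+ # deformable convolution samples the regular 3x3 grid of a standard convolution.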
nn.init.constant_(self.conv2_offset.weight, 0) + nn.init.constant_(self.conv2_offset.bias, 0) + + def forward(self, x): + out = self.conv1(x) + out = F.relu_(out) + + if self.radix>1: + offset = self.conv2_offset(out) + out = self.conv2(out, offset) + else: + if self.deform_modulated: + offset_mask = self.conv2_offset(out) + offset_x, offset_y, mask = torch.chunk(offset_mask, 3, dim=1) + offset = torch.cat((offset_x, offset_y), dim=1) + mask = mask.sigmoid() + out = self.conv2(out, offset, mask) + else: + offset = self.conv2_offset(out) + out = self.conv2(out, offset) + out = F.relu_(out) + + if self.avd: + out = self.avd_layer(out) + + out = self.conv3(out) + + if self.shortcut is not None: + if self.avg_down: + x = self.shortcut_avgpool(x) + shortcut = self.shortcut(x) + else: + shortcut = x + + out += shortcut + out = F.relu_(out) + return out + + +def make_stage(block_class, num_blocks, first_stride, **kwargs): + """ + Create a resnet stage by creating many blocks. + + Args: + block_class (class): a subclass of ResNetBlockBase + num_blocks (int): + first_stride (int): the stride of the first block. The other blocks will have stride=1. + A `stride` argument will be passed to the block constructor. + kwargs: other arguments passed to the block constructor. + + Returns: + list[nn.Module]: a list of block module. + """ + blocks = [] + for i in range(num_blocks): + blocks.append(block_class(stride=first_stride if i == 0 else 1, **kwargs)) + kwargs["in_channels"] = kwargs["out_channels"] + return blocks + + +class BasicStem(nn.Module): + def __init__(self, in_channels=3, out_channels=64, norm="BN", + deep_stem=False, stem_width=32): + """ + Args: + norm (str or callable): a callable that takes the number of + channels and return a `nn.Module`, or a pre-defined string + (one of {"FrozenBN", "BN", "GN"}). + """ + super().__init__() + self.deep_stem = deep_stem + + if self.deep_stem: + self.conv1_1 = Conv2d(3, stem_width, kernel_size=3, stride=2, + padding=1, bias=False, + norm=get_norm(norm, stem_width), + ) + self.conv1_2 = Conv2d(stem_width, stem_width, kernel_size=3, stride=1, + padding=1, bias=False, + norm=get_norm(norm, stem_width), + ) + self.conv1_3 = Conv2d(stem_width, stem_width*2, kernel_size=3, stride=1, + padding=1, bias=False, + norm=get_norm(norm, stem_width*2), + ) + for layer in [self.conv1_1, self.conv1_2, self.conv1_3]: + if layer is not None: + weight_init.c2_msra_fill(layer) + else: + self.conv1 = Conv2d( + in_channels, + out_channels, + kernel_size=7, + stride=2, + padding=3, + bias=False, + norm=get_norm(norm, out_channels), + ) + weight_init.c2_msra_fill(self.conv1) + + def forward(self, x): + if self.deep_stem: + x = self.conv1_1(x) + x = F.relu_(x) + x = self.conv1_2(x) + x = F.relu_(x) + x = self.conv1_3(x) + x = F.relu_(x) + else: + x = self.conv1(x) + x = F.relu_(x) + x = F.max_pool2d(x, kernel_size=3, stride=2, padding=1) + return x + + @property + def out_channels(self): + if self.deep_stem: + return self.conv1_3.out_channels + else: + return self.conv1.out_channels + + @property + def stride(self): + return 4 # = stride 2 conv -> stride 2 max pool + + +class ResNeSt(Backbone): + def __init__(self, stem, stages, num_classes=None, out_features=None): + """ + Args: + stem (nn.Module): a stem module + stages (list[list[ResNetBlock]]): several (typically 4) stages, + each contains multiple :class:`ResNetBlockBase`. + num_classes (None or int): if None, will not perform classification. 
+ out_features (list[str]): name of the layers whose outputs should + be returned in forward. Can be anything in "stem", "linear", or "res2" ... + If None, will return the output of the last layer. + """ + super(ResNeSt, self).__init__() + self.stem = stem + self.num_classes = num_classes + + current_stride = self.stem.stride + self._out_feature_strides = {"stem": current_stride} + self._out_feature_channels = {"stem": self.stem.out_channels} + + self.stages_and_names = [] + for i, blocks in enumerate(stages): + for block in blocks: + assert isinstance(block, ResNetBlockBase), block + curr_channels = block.out_channels + stage = nn.Sequential(*blocks) + name = "res" + str(i + 2) + self.add_module(name, stage) + self.stages_and_names.append((stage, name)) + self._out_feature_strides[name] = current_stride = int( + current_stride * np.prod([k.stride for k in blocks]) + ) + self._out_feature_channels[name] = blocks[-1].out_channels + + if num_classes is not None: + self.avgpool = nn.AdaptiveAvgPool2d((1, 1)) + self.linear = nn.Linear(curr_channels, num_classes) + + # Sec 5.1 in "Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour": + # "The 1000-way fully-connected layer is initialized by + # drawing weights from a zero-mean Gaussian with standard deviation of 0.01." + nn.init.normal_(self.linear.weight, std=0.01) + name = "linear" + + if out_features is None: + out_features = [name] + self._out_features = out_features + assert len(self._out_features) + children = [x[0] for x in self.named_children()] + for out_feature in self._out_features: + assert out_feature in children, "Available children: {}".format(", ".join(children)) + + def forward(self, x): + outputs = {} + x = self.stem(x) + if "stem" in self._out_features: + outputs["stem"] = x + for stage, name in self.stages_and_names: + x = stage(x) + if name in self._out_features: + outputs[name] = x + if self.num_classes is not None: + x = self.avgpool(x) + x = torch.flatten(x, 1) + x = self.linear(x) + if "linear" in self._out_features: + outputs["linear"] = x + return outputs + + def output_shape(self): + return { + name: ShapeSpec( + channels=self._out_feature_channels[name], stride=self._out_feature_strides[name] + ) + for name in self._out_features + } + + +@BACKBONE_REGISTRY.register() +def build_resnest_backbone(cfg, input_shape): + """ + Create a ResNeSt instance from config. + + Returns: + ResNeSt: a :class:`ResNeSt` instance. + """ + + depth = cfg.MODEL.RESNETS.DEPTH + stem_width = {50: 32, 101: 64, 152: 64, 200: 64, 269: 64}[depth] + radix = cfg.MODEL.RESNETS.RADIX + deep_stem = cfg.MODEL.RESNETS.DEEP_STEM or (radix > 1) + + # need registration of new blocks/stems? 
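+ # Stem: a single 7x7 conv, or the three stacked 3x3 convs of the "deep stem"
+ # (DEEP_STEM is forced on whenever radix > 1).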
+ norm = cfg.MODEL.RESNETS.NORM + stem = BasicStem( + in_channels=input_shape.channels, + out_channels=cfg.MODEL.RESNETS.STEM_OUT_CHANNELS, + norm=norm, + deep_stem=deep_stem, + stem_width=stem_width, + ) + freeze_at = cfg.MODEL.BACKBONE.FREEZE_AT + + if freeze_at >= 1: + for p in stem.parameters(): + p.requires_grad = False + stem = FrozenBatchNorm2d.convert_frozen_batchnorm(stem) + + # fmt: off + out_features = cfg.MODEL.RESNETS.OUT_FEATURES + num_groups = cfg.MODEL.RESNETS.NUM_GROUPS + width_per_group = cfg.MODEL.RESNETS.WIDTH_PER_GROUP + bottleneck_channels = num_groups * width_per_group + in_channels = cfg.MODEL.RESNETS.STEM_OUT_CHANNELS + out_channels = cfg.MODEL.RESNETS.RES2_OUT_CHANNELS + stride_in_1x1 = cfg.MODEL.RESNETS.STRIDE_IN_1X1 + res5_dilation = cfg.MODEL.RESNETS.RES5_DILATION + deform_on_per_stage = cfg.MODEL.RESNETS.DEFORM_ON_PER_STAGE + deform_modulated = cfg.MODEL.RESNETS.DEFORM_MODULATED + deform_num_groups = cfg.MODEL.RESNETS.DEFORM_NUM_GROUPS + avd = cfg.MODEL.RESNETS.AVD or (radix > 1) + avg_down = cfg.MODEL.RESNETS.AVG_DOWN or (radix > 1) + bottleneck_width = cfg.MODEL.RESNETS.BOTTLENECK_WIDTH + # fmt: on + assert res5_dilation in {1, 2}, "res5_dilation cannot be {}.".format(res5_dilation) + + num_blocks_per_stage = { + 18: [2, 2, 2, 2], + 34: [3, 4, 6, 3], + 50: [3, 4, 6, 3], + 101: [3, 4, 23, 3], + 152: [3, 8, 36, 3], + 200: [3, 24, 36, 3], + 269: [3, 30, 48, 8], + }[depth] + + if depth in [18, 34]: + assert out_channels == 64, "Must set MODEL.RESNETS.RES2_OUT_CHANNELS = 64 for R18/R34" + assert not any( + deform_on_per_stage + ), "MODEL.RESNETS.DEFORM_ON_PER_STAGE unsupported for R18/R34" + assert res5_dilation == 1, "Must set MODEL.RESNETS.RES5_DILATION = 1 for R18/R34" + assert num_groups == 1, "Must set MODEL.RESNETS.NUM_GROUPS = 1 for R18/R34" + + stages = [] + + # Avoid creating variables without gradients + # It consumes extra memory and may cause allreduce to fail + out_stage_idx = [{"res2": 2, "res3": 3, "res4": 4, "res5": 5}[f] for f in out_features] + max_stage_idx = max(out_stage_idx) + in_channels = 2*stem_width if deep_stem else in_channels + for idx, stage_idx in enumerate(range(2, max_stage_idx + 1)): + dilation = res5_dilation if stage_idx == 5 else 1 + first_stride = 1 if idx == 0 or (stage_idx == 5 and dilation == 2) else 2 + stage_kargs = { + "num_blocks": num_blocks_per_stage[idx], + "first_stride": first_stride, + "in_channels": in_channels, + "out_channels": out_channels, + "norm": norm, + "avd": avd, + "avg_down": avg_down, + "radix": radix, + "bottleneck_width": bottleneck_width, + } + # Use BasicBlock for R18 and R34. 
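+ # Otherwise use BottleneckBlock, or DeformBottleneckBlock when deformable
+ # convolution is enabled for this stage, with the ResNeSt kwargs set above.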
+ if depth in [18, 34]: + stage_kargs["block_class"] = BasicBlock + else: + stage_kargs["bottleneck_channels"] = bottleneck_channels + stage_kargs["stride_in_1x1"] = stride_in_1x1 + stage_kargs["dilation"] = dilation + stage_kargs["num_groups"] = num_groups + if deform_on_per_stage[idx]: + stage_kargs["block_class"] = DeformBottleneckBlock + stage_kargs["deform_modulated"] = deform_modulated + stage_kargs["deform_num_groups"] = deform_num_groups + else: + stage_kargs["block_class"] = BottleneckBlock + blocks = make_stage(**stage_kargs) + in_channels = out_channels + out_channels *= 2 + bottleneck_channels *= 2 + + if freeze_at >= stage_idx: + for block in blocks: + block.freeze() + stages.append(blocks) + return ResNeSt(stem, stages, out_features=out_features) + +@BACKBONE_REGISTRY.register() +def build_resnest_fpn_backbone(cfg, input_shape: ShapeSpec): + """ + Args: + cfg: a detectron2 CfgNode + Returns: + backbone (Backbone): backbone module, must be a subclass of :class:`Backbone`. + """ + bottom_up = build_resnest_backbone(cfg, input_shape) + in_features = cfg.MODEL.FPN.IN_FEATURES + out_channels = cfg.MODEL.FPN.OUT_CHANNELS + backbone = FPN( + bottom_up=bottom_up, + in_features=in_features, + out_channels=out_channels, + norm=cfg.MODEL.FPN.NORM, + top_block=LastLevelMaxPool(), + fuse_type=cfg.MODEL.FPN.FUSE_TYPE, + ) + return backbone diff --git a/final-project/model_zoo/pytorch_resnest/resnest/d2/splat.py b/final-project/model_zoo/pytorch_resnest/resnest/d2/splat.py new file mode 100644 index 0000000..c60fcda --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/d2/splat.py @@ -0,0 +1,183 @@ +"""Split-Attention""" + +import torch +from torch import nn +import torch.nn.functional as F +from torch.nn import Module, Linear, BatchNorm2d, ReLU +from torch.nn.modules.utils import _pair + +from detectron2.layers import ( + Conv2d, + get_norm, +) + +__all__ = ['SplAtConv2d', 'SplAtConv2d_dcn'] + +class DropBlock2D(object): + def __init__(self, *args, **kwargs): + raise NotImplementedError + +class SplAtConv2d(Module): + """Split-Attention Conv2d + """ + def __init__(self, in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), + dilation=(1, 1), groups=1, bias=True, + radix=2, reduction_factor=4, + rectify=False, rectify_avg=False, norm=None, + dropblock_prob=0.0, **kwargs): + super(SplAtConv2d, self).__init__() + padding = _pair(padding) + self.rectify = rectify and (padding[0] > 0 or padding[1] > 0) + self.rectify_avg = rectify_avg + inter_channels = max(in_channels*radix//reduction_factor, 32) + self.radix = radix + self.cardinality = groups + self.channels = channels + self.dropblock_prob = dropblock_prob + if self.rectify: + from rfconv import RFConv2d + self.conv = RFConv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, average_mode=rectify_avg, **kwargs) + else: + self.conv = Conv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, **kwargs) + self.use_bn = norm is not None + if self.use_bn: + self.bn0 = get_norm(norm, channels*radix) + self.relu = ReLU(inplace=True) + self.fc1 = Conv2d(channels, inter_channels, 1, groups=self.cardinality) + if self.use_bn: + self.bn1 = get_norm(norm, inter_channels) + self.fc2 = Conv2d(inter_channels, channels*radix, 1, groups=self.cardinality) + if dropblock_prob > 0.0: + self.dropblock = DropBlock2D(dropblock_prob, 3) + self.rsoftmax = rSoftMax(radix, groups) + + def forward(self, x): + x = self.conv(x) + if 
self.use_bn: + x = self.bn0(x) + if self.dropblock_prob > 0.0: + x = self.dropblock(x) + x = self.relu(x) + + batch, rchannel = x.shape[:2] + if self.radix > 1: + splited = torch.split(x, rchannel//self.radix, dim=1) + gap = sum(splited) + else: + gap = x + gap = F.adaptive_avg_pool2d(gap, 1) + gap = self.fc1(gap) + + if self.use_bn: + gap = self.bn1(gap) + gap = self.relu(gap) + + atten = self.fc2(gap) + atten = self.rsoftmax(atten).view(batch, -1, 1, 1) + + if self.radix > 1: + attens = torch.split(atten, rchannel//self.radix, dim=1) + out = sum([att*split for (att, split) in zip(attens, splited)]) + else: + out = atten * x + return out.contiguous() + +class rSoftMax(nn.Module): + def __init__(self, radix, cardinality): + super().__init__() + self.radix = radix + self.cardinality = cardinality + + def forward(self, x): + batch = x.size(0) + if self.radix > 1: + x = x.view(batch, self.cardinality, self.radix, -1).transpose(1, 2) + x = F.softmax(x, dim=1) + x = x.reshape(batch, -1) + else: + x = torch.sigmoid(x) + return x + + +class SplAtConv2d_dcn(Module): + """Split-Attention Conv2d with dcn + """ + def __init__(self, in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), + dilation=(1, 1), groups=1, bias=True, + radix=2, reduction_factor=4, + rectify=False, rectify_avg=False, norm=None, + dropblock_prob=0.0, + deform_conv_op=None, + deformable_groups=1, + deform_modulated=False, + **kwargs): + super(SplAtConv2d_dcn, self).__init__() + self.deform_modulated = deform_modulated + + padding = _pair(padding) + self.rectify = rectify and (padding[0] > 0 or padding[1] > 0) + self.rectify_avg = rectify_avg + inter_channels = max(in_channels*radix//reduction_factor, 32) + self.radix = radix + self.cardinality = groups + self.channels = channels + self.dropblock_prob = dropblock_prob + if self.rectify: + from rfconv import RFConv2d + self.conv = RFConv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, average_mode=rectify_avg, **kwargs) + else: + self.conv = deform_conv_op(in_channels, channels*radix, kernel_size, stride, padding[0], dilation, + groups=groups*radix, bias=bias, deformable_groups=deformable_groups, **kwargs) + self.use_bn = norm is not None + if self.use_bn: + self.bn0 = get_norm(norm, channels*radix) + self.relu = ReLU(inplace=True) + self.fc1 = Conv2d(channels, inter_channels, 1, groups=self.cardinality) + if self.use_bn: + self.bn1 = get_norm(norm, inter_channels) + self.fc2 = Conv2d(inter_channels, channels*radix, 1, groups=self.cardinality) + if dropblock_prob > 0.0: + self.dropblock = DropBlock2D(dropblock_prob, 3) + self.rsoftmax = rSoftMax(radix, groups) + + def forward(self, x, offset_input): + + if self.deform_modulated: + offset_x, offset_y, mask = torch.chunk(offset_input, 3, dim=1) + offset = torch.cat((offset_x, offset_y), dim=1) + mask = mask.sigmoid() + x = self.conv(x, offset, mask) + else: + x = self.conv(x, offset_input) + + if self.use_bn: + x = self.bn0(x) + if self.dropblock_prob > 0.0: + x = self.dropblock(x) + x = self.relu(x) + + batch, rchannel = x.shape[:2] + if self.radix > 1: + splited = torch.split(x, rchannel//self.radix, dim=1) + gap = sum(splited) + else: + gap = x + gap = F.adaptive_avg_pool2d(gap, 1) + gap = self.fc1(gap) + + if self.use_bn: + gap = self.bn1(gap) + gap = self.relu(gap) + + atten = self.fc2(gap) + atten = self.rsoftmax(atten).view(batch, -1, 1, 1) + + if self.radix > 1: + attens = torch.split(atten, rchannel//self.radix, dim=1) + out = sum([att*split for (att, 
split) in zip(attens, splited)]) + else: + out = atten * x + return out.contiguous() diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/__init__.py new file mode 100644 index 0000000..ec51eaa --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/__init__.py @@ -0,0 +1,3 @@ +from .resnest import * +from .ablation import * +from .model_zoo import get_model diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/ablation.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/ablation.py new file mode 100644 index 0000000..2b53616 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/ablation.py @@ -0,0 +1,107 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +"""Ablation Study Models for ResNeSt""" +from .resnet import ResNet, Bottleneck +from mxnet import cpu + +__all__ = ['resnest50_fast_1s1x64d', 'resnest50_fast_2s1x64d', 'resnest50_fast_4s1x64d', + 'resnest50_fast_1s2x40d', 'resnest50_fast_2s2x40d', 'resnest50_fast_4s2x40d', + 'resnest50_fast_1s4x24d'] + +def resnest50_fast_1s1x64d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, cardinality=1, bottleneck_width=64, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_1s1x64d', + root=root), ctx=ctx) + return model + +def resnest50_fast_2s1x64d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, cardinality=1, bottleneck_width=64, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_2s1x64d', + root=root), ctx=ctx) + return model + +def resnest50_fast_4s1x64d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=4, cardinality=1, bottleneck_width=64, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_4s1x64d', + root=root), ctx=ctx) + return model + +def resnest50_fast_1s2x40d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, cardinality=2, bottleneck_width=40, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_1s2x40d', + root=root), ctx=ctx) + return model + +def resnest50_fast_2s2x40d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, cardinality=2, bottleneck_width=40, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + 
use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_2s2x40d', + root=root), ctx=ctx) + return model + +def resnest50_fast_4s2x40d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=4, cardinality=2, bottleneck_width=40, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_4s2x40d', + root=root), ctx=ctx) + return model + +def resnest50_fast_1s4x24d(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, cardinality=4, bottleneck_width=24, + deep_stem=True, avg_down=True, + avd=True, avd_first=True, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnetv1f_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50_fast_1s4x24d', + root=root), ctx=ctx) + return model + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/data_utils.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/data_utils.py new file mode 100644 index 0000000..2b1a67a --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/data_utils.py @@ -0,0 +1,63 @@ +from PIL import Image + +import mxnet as mx +from mxnet.gluon import Block +from ..transforms import * + +class RandAugment(object): + def __init__(self, n, m): + self.n = n + self.m = m + self.augment_list = rand_augment_list() + self.topil = ToPIL() + + def __call__(self, img): + img = self.topil(img) + ops = random.choices(self.augment_list, k=self.n) + for op, minval, maxval in ops: + if random.random() > random.uniform(0.2, 0.8): + continue + val = (float(self.m) / 30) * float(maxval - minval) + minval + img = op(img, val) + return img + + +class ToPIL(object): + """Convert image from ndarray format to PIL + """ + def __call__(self, img): + x = Image.fromarray(img.asnumpy()) + return x + +class ToNDArray(object): + def __call__(self, img): + x = mx.nd.array(np.array(img), mx.cpu(0)) + return x + +class AugmentationBlock(Block): + r""" + AutoAugment Block + + Example + ------- + >>> from autogluon.utils.augment import AugmentationBlock, autoaug_imagenet_policies + >>> aa_transform = AugmentationBlock(autoaug_imagenet_policies()) + """ + def __init__(self, policies): + """ + plicies : list of (name, pr, level) + """ + super().__init__() + self.policies = policies + self.topil = ToPIL() + self.tond = ToNDArray() + + def forward(self, img): + img = self.topil(img) + policy = random.choice(self.policies) + for name, pr, level in policy: + if random.random() > pr: + continue + img = apply_augment(img, name, level) + img = self.tond(img) + return img diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/dropblock.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/dropblock.py new file mode 100644 index 0000000..6fcd81b --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/dropblock.py @@ -0,0 +1,73 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree 
+##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +import mxnet as mx +from functools import partial +from mxnet.gluon.nn import MaxPool2D, Block, HybridBlock + +__all__ = ['DropBlock', 'set_drop_prob', 'DropBlockScheduler'] + +class DropBlock(HybridBlock): + def __init__(self, drop_prob, block_size, c, h, w): + super().__init__() + self.drop_prob = drop_prob + self.block_size = block_size + self.c, self.h, self.w = c, h, w + self.numel = c * h * w + pad_h = max((block_size - 1), 0) + pad_w = max((block_size - 1), 0) + self.padding = (pad_h//2, pad_h-pad_h//2, pad_w//2, pad_w-pad_w//2) + self.dtype = 'float32' + + def hybrid_forward(self, F, x): + if not mx.autograd.is_training() or self.drop_prob <= 0: + return x + gamma = self.drop_prob * (self.h * self.w) / (self.block_size ** 2) / \ + ((self.w - self.block_size + 1) * (self.h - self.block_size + 1)) + # generate mask + mask = F.random.uniform(0, 1, shape=(1, self.c, self.h, self.w), dtype=self.dtype) < gamma + mask = F.Pooling(mask, pool_type='max', + kernel=(self.block_size, self.block_size), pad=self.padding) + mask = 1 - mask + y = F.broadcast_mul(F.broadcast_mul(x, mask), + (1.0 * self.numel / mask.sum(axis=0, exclude=True).expand_dims(1).expand_dims(1).expand_dims(1))) + return y + + def cast(self, dtype): + super(DropBlock, self).cast(dtype) + self.dtype = dtype + + def __repr__(self): + reprstr = self.__class__.__name__ + '(' + \ + 'drop_prob: {}, block_size{}'.format(self.drop_prob, self.block_size) +')' + return reprstr + +def set_drop_prob(drop_prob, module): + """ + Example: + from functools import partial + apply_drop_prob = partial(set_drop_prob, 0.1) + net.apply(apply_drop_prob) + """ + if isinstance(module, DropBlock): + module.drop_prob = drop_prob + + +class DropBlockScheduler(object): + def __init__(self, net, start_prob, end_prob, num_epochs): + self.net = net + self.start_prob = start_prob + self.end_prob = end_prob + self.num_epochs = num_epochs + + def __call__(self, epoch): + ratio = self.start_prob + 1.0 * (self.end_prob - self.start_prob) * (epoch + 1) / self.num_epochs + assert (ratio >= 0 and ratio <= 1) + apply_drop_prob = partial(set_drop_prob, ratio) + self.net.apply(apply_drop_prob) + self.net.hybridize() + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_store.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_store.py new file mode 100644 index 0000000..effc3f5 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_store.py @@ -0,0 +1,105 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""Model store which provides pretrained models.""" +from __future__ import print_function + +__all__ = ['get_model_file', 'purge'] + +import os +import zipfile + +from ..utils import download, check_sha1 + +_model_sha1 = {name: checksum for checksum, name in [ + ('bcfefe1dd1dd1ef5cfed5563123c1490ea37b42e', 'resnest50'), + ('5da943b3230f071525a98639945a6b3b3a45ac95', 'resnest101'), + ('0c5d117df664ace220aa6fc2922c094bb079d381', 'resnest200'), + ('11ae7f5da2bcdbad05ba7e84f9b74383e717f3e3', 'resnest269'), + ('5e16dbe56f1fba8e1bc2faddd91f874bfbd74193', 'resnest50_fast_1s1x64d'), + ('85eb779a5e313d74b5e5390dae02aa8082a0f469', 'resnest50_fast_2s1x64d'), + 
('3f215532c6d8e07a10df116309993d4479fc3e4b', 'resnest50_fast_4s1x64d'), + ('af3514c2ec757a3a9666a75b82f142ed47d55bee', 'resnest50_fast_1s2x40d'), + ('2db13245aa4967cf5e8617cb4911880dd41628a4', 'resnest50_fast_2s2x40d'), + ('b24d515797832e02da4da9c8a15effd5e44cfb56', 'resnest50_fast_4s2x40d'), + ('7318153ddb5e542a20cc6c58192f3c6209cff9ed', 'resnest50_fast_1s4x24d'), + ]} + +encoding_repo_url = 'https://github.com/zhanghang1989/ResNeSt/releases/download/weights_step2' +_url_format = '{repo_url}/{file_name}.zip' + +def short_hash(name): + if name not in _model_sha1: + raise ValueError('Pretrained model for {name} is not available.'.format(name=name)) + return _model_sha1[name][:8] + +def get_model_file(name, root=os.path.join('~', '.encoding', 'models')): + r"""Return location for the pretrained on local file system. + This function will download from online model zoo when model cannot be found or has mismatch. + The root directory will be created if it doesn't exist. + Parameters + ---------- + name : str + Name of the model. + root : str, default '~/.encoding/models' + Location for keeping the model parameters. + Returns + ------- + file_path + Path to the requested pretrained model file. + """ + if name not in _model_sha1: + import gluoncv as gcv + return gcv.model_zoo.model_store.get_model_file(name, root=root) + file_name = '{name}-{short_hash}'.format(name=name, short_hash=short_hash(name)) + root = os.path.expanduser(root) + file_path = os.path.join(root, file_name+'.params') + sha1_hash = _model_sha1[name] + if os.path.exists(file_path): + if check_sha1(file_path, sha1_hash): + return file_path + else: + print('Mismatch in the content of model file {} detected.' + + ' Downloading again.'.format(file_path)) + else: + print('Model file {} is not found. Downloading.'.format(file_path)) + + if not os.path.exists(root): + os.makedirs(root) + + zip_file_path = os.path.join(root, file_name+'.zip') + repo_url = os.environ.get('ENCODING_REPO', encoding_repo_url) + if repo_url[-1] != '/': + repo_url = repo_url + '/' + download(_url_format.format(repo_url=repo_url, file_name=file_name), + path=zip_file_path, + overwrite=True) + with zipfile.ZipFile(zip_file_path) as zf: + zf.extractall(root) + os.remove(zip_file_path) + + if check_sha1(file_path, sha1_hash): + return file_path + else: + raise ValueError('Downloaded file has different hash. Please try again.') + +def purge(root=os.path.join('~', '.encoding', 'models')): + r"""Purge all pretrained model files in local file store. + Parameters + ---------- + root : str, default '~/.encoding/models' + Location for keeping the model parameters. 
+ """ + root = os.path.expanduser(root) + files = os.listdir(root) + for f in files: + if f.endswith(".params"): + os.remove(os.path.join(root, f)) + +def pretrained_model_list(): + return list(_model_sha1.keys()) + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_zoo.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_zoo.py new file mode 100644 index 0000000..df33a2f --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/model_zoo.py @@ -0,0 +1,59 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +from .resnest import * +from .ablation import * + +_all__ = ['get_model', 'get_model_list'] + +models = { + 'resnest50': resnest50, + 'resnest101': resnest101, + 'resnest200': resnest200, + 'resnest269': resnest269, + 'resnest50_fast_1s1x64d': resnest50_fast_1s1x64d, + 'resnest50_fast_2s1x64d': resnest50_fast_2s1x64d, + 'resnest50_fast_4s1x64d': resnest50_fast_4s1x64d, + 'resnest50_fast_1s2x40d': resnest50_fast_1s2x40d, + 'resnest50_fast_2s2x40d': resnest50_fast_2s2x40d, + 'resnest50_fast_4s2x40d': resnest50_fast_4s2x40d, + 'resnest50_fast_1s4x24d': resnest50_fast_1s4x24d, + } + +def get_model(name, **kwargs): + """Returns a pre-defined model by name + Parameters + ---------- + name : str + Name of the model. + pretrained : bool + Whether to load the pretrained weights for model. + root : str, default '~/.encoding/models' + Location for keeping the model parameters. + Returns + ------- + Module: + The model. + """ + + name = name.lower() + if name in models: + net = models[name](**kwargs) + else: + raise ValueError('%s\n\t%s' % (str(name), '\n\t'.join(sorted(models.keys())))) + return net + +def get_model_list(): + """Get the entire list of model names in model_zoo. + Returns + ------- + list of str + Entire list of model names in model_zoo. 
+ """ + return models.keys() + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnest.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnest.py new file mode 100644 index 0000000..60eec62 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnest.py @@ -0,0 +1,55 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""ResNeSt implemented in Gluon.""" + +__all__ = ['resnest50', 'resnest101', + 'resnest200', 'resnest269'] + +from .resnet import ResNet, Bottleneck +from mxnet import cpu + +def resnest50(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, cardinality=1, bottleneck_width=64, + deep_stem=True, avg_down=True, + avd=True, avd_first=False, + use_splat=True, dropblock_prob=0.1, + name_prefix='resnest_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest50', root=root), ctx=ctx) + return model + +def resnest101(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 4, 23, 3], + radix=2, cardinality=1, bottleneck_width=64, + deep_stem=True, avg_down=True, stem_width=64, + avd=True, avd_first=False, use_splat=True, dropblock_prob=0.1, + name_prefix='resnest_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest101', root=root), ctx=ctx) + return model + +def resnest200(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 24, 36, 3], deep_stem=True, avg_down=True, stem_width=64, + avd=True, use_splat=True, dropblock_prob=0.1, final_drop=0.2, + name_prefix='resnest_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest200', root=root), ctx=ctx) + return model + +def resnest269(pretrained=False, root='~/.mxnet/models', ctx=cpu(0), **kwargs): + model = ResNet(Bottleneck, [3, 30, 48, 8], deep_stem=True, avg_down=True, stem_width=64, + avd=True, use_splat=True, dropblock_prob=0.1, final_drop=0.2, + name_prefix='resnest_', **kwargs) + if pretrained: + from .model_store import get_model_file + model.load_parameters(get_model_file('resnest269', root=root), ctx=ctx) + return model diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnet.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnet.py new file mode 100644 index 0000000..5e1482e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/resnet.py @@ -0,0 +1,339 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""ResNets, implemented in Gluon.""" +# pylint: disable=arguments-differ,unused-argument,missing-docstring +from __future__ import division + +import os +import math +from mxnet.context import cpu +from mxnet.gluon.block import HybridBlock +from mxnet.gluon import nn +from mxnet.gluon.nn import BatchNorm + +from .dropblock import DropBlock +from .splat import SplitAttentionConv + +__all__ = 
['ResNet', 'Bottleneck'] + +def _update_input_size(input_size, stride): + sh, sw = (stride, stride) if isinstance(stride, int) else stride + ih, iw = (input_size, input_size) if isinstance(input_size, int) else input_size + oh, ow = math.ceil(ih / sh), math.ceil(iw / sw) + input_size = (oh, ow) + return input_size + +class Bottleneck(HybridBlock): + """ResNet Bottleneck + """ + # pylint: disable=unused-argument + expansion = 4 + def __init__(self, channels, cardinality=1, bottleneck_width=64, strides=1, dilation=1, + downsample=None, previous_dilation=1, norm_layer=None, + norm_kwargs=None, last_gamma=False, + dropblock_prob=0, input_size=None, use_splat=False, + radix=2, avd=False, avd_first=False, in_channels=None, + split_drop_ratio=0, **kwargs): + super(Bottleneck, self).__init__() + group_width = int(channels * (bottleneck_width / 64.)) * cardinality + norm_kwargs = norm_kwargs if norm_kwargs is not None else {} + self.dropblock_prob = dropblock_prob + self.use_splat = use_splat + self.avd = avd and (strides > 1 or previous_dilation != dilation) + self.avd_first = avd_first + if self.dropblock_prob > 0: + self.dropblock1 = DropBlock(dropblock_prob, 3, group_width, *input_size) + if self.avd: + if avd_first: + input_size = _update_input_size(input_size, strides) + self.dropblock2 = DropBlock(dropblock_prob, 3, group_width, *input_size) + if not avd_first: + input_size = _update_input_size(input_size, strides) + else: + input_size = _update_input_size(input_size, strides) + self.dropblock2 = DropBlock(dropblock_prob, 3, group_width, *input_size) + self.dropblock3 = DropBlock(dropblock_prob, 3, channels*4, *input_size) + self.conv1 = nn.Conv2D(channels=group_width, kernel_size=1, + use_bias=False, in_channels=in_channels) + self.bn1 = norm_layer(in_channels=group_width, **norm_kwargs) + self.relu1 = nn.Activation('relu') + if self.use_splat: + self.conv2 = SplitAttentionConv(channels=group_width, kernel_size=3, strides = 1 if self.avd else strides, + padding=dilation, dilation=dilation, groups=cardinality, use_bias=False, + in_channels=group_width, norm_layer=norm_layer, norm_kwargs=norm_kwargs, + radix=radix, drop_ratio=split_drop_ratio, **kwargs) + else: + self.conv2 = nn.Conv2D(channels=group_width, kernel_size=3, strides = 1 if self.avd else strides, + padding=dilation, dilation=dilation, groups=cardinality, use_bias=False, + in_channels=group_width, **kwargs) + self.bn2 = norm_layer(in_channels=group_width, **norm_kwargs) + self.relu2 = nn.Activation('relu') + self.conv3 = nn.Conv2D(channels=channels*4, kernel_size=1, use_bias=False, in_channels=group_width) + if not last_gamma: + self.bn3 = norm_layer(in_channels=channels*4, **norm_kwargs) + else: + self.bn3 = norm_layer(in_channels=channels*4, gamma_initializer='zeros', + **norm_kwargs) + if self.avd: + self.avd_layer = nn.AvgPool2D(3, strides, padding=1) + self.relu3 = nn.Activation('relu') + self.downsample = downsample + self.dilation = dilation + self.strides = strides + + def hybrid_forward(self, F, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + if self.dropblock_prob > 0: + out = self.dropblock1(out) + out = self.relu1(out) + + if self.avd and self.avd_first: + out = self.avd_layer(out) + + if self.use_splat: + out = self.conv2(out) + if self.dropblock_prob > 0: + out = self.dropblock2(out) + else: + out = self.conv2(out) + out = self.bn2(out) + if self.dropblock_prob > 0: + out = self.dropblock2(out) + out = self.relu2(out) + + if self.avd and not self.avd_first: + out = self.avd_layer(out) + + out = 
self.conv3(out) + out = self.bn3(out) + + if self.downsample is not None: + residual = self.downsample(x) + + if self.dropblock_prob > 0: + out = self.dropblock3(out) + + out = out + residual + out = self.relu3(out) + + return out + +class ResNet(HybridBlock): + """ ResNet Variants Definations + Parameters + ---------- + block : Block + Class for the residual block. Options are BasicBlockV1, BottleneckV1. + layers : list of int + Numbers of layers in each block + classes : int, default 1000 + Number of classification classes. + dilated : bool, default False + Applying dilation strategy to pretrained ResNet yielding a stride-8 model, + typically used in Semantic Segmentation. + norm_layer : object + Normalization layer used (default: :class:`mxnet.gluon.nn.BatchNorm`) + Can be :class:`mxnet.gluon.nn.BatchNorm` or :class:`mxnet.gluon.contrib.nn.SyncBatchNorm`. + last_gamma : bool, default False + Whether to initialize the gamma of the last BatchNorm layer in each bottleneck to zero. + deep_stem : bool, default False + Whether to replace the 7x7 conv1 with 3 3x3 convolution layers. + avg_down : bool, default False + Whether to use average pooling for projection skip connection between stages/downsample. + final_drop : float, default 0.0 + Dropout ratio before the final classification layer. + use_global_stats : bool, default False + Whether forcing BatchNorm to use global statistics instead of minibatch statistics; + optionally set to True if finetuning using ImageNet classification pretrained models. + Reference: + - He, Kaiming, et al. "Deep residual learning for image recognition." + Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. + - Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." 
+ """ + # pylint: disable=unused-variable + def __init__(self, block, layers, cardinality=1, bottleneck_width=64, + classes=1000, dilated=False, dilation=1, norm_layer=BatchNorm, + norm_kwargs=None, last_gamma=False, deep_stem=False, stem_width=32, + avg_down=False, final_drop=0.0, use_global_stats=False, + name_prefix='', dropblock_prob=0, input_size=224, + use_splat=False, radix=2, avd=False, avd_first=False, split_drop_ratio=0): + self.cardinality = cardinality + self.bottleneck_width = bottleneck_width + self.inplanes = stem_width*2 if deep_stem else 64 + self.radix = radix + self.split_drop_ratio = split_drop_ratio + self.avd_first = avd_first + super(ResNet, self).__init__(prefix=name_prefix) + norm_kwargs = norm_kwargs if norm_kwargs is not None else {} + if use_global_stats: + norm_kwargs['use_global_stats'] = True + self.norm_kwargs = norm_kwargs + with self.name_scope(): + if not deep_stem: + self.conv1 = nn.Conv2D(channels=64, kernel_size=7, strides=2, + padding=3, use_bias=False, in_channels=3) + else: + self.conv1 = nn.HybridSequential(prefix='conv1') + self.conv1.add(nn.Conv2D(channels=stem_width, kernel_size=3, strides=2, + padding=1, use_bias=False, in_channels=3)) + self.conv1.add(norm_layer(in_channels=stem_width, **norm_kwargs)) + self.conv1.add(nn.Activation('relu')) + self.conv1.add(nn.Conv2D(channels=stem_width, kernel_size=3, strides=1, + padding=1, use_bias=False, in_channels=stem_width)) + self.conv1.add(norm_layer(in_channels=stem_width, **norm_kwargs)) + self.conv1.add(nn.Activation('relu')) + self.conv1.add(nn.Conv2D(channels=stem_width*2, kernel_size=3, strides=1, + padding=1, use_bias=False, in_channels=stem_width)) + input_size = _update_input_size(input_size, 2) + self.bn1 = norm_layer(in_channels=64 if not deep_stem else stem_width*2, + **norm_kwargs) + self.relu = nn.Activation('relu') + self.maxpool = nn.MaxPool2D(pool_size=3, strides=2, padding=1) + input_size = _update_input_size(input_size, 2) + self.layer1 = self._make_layer(1, block, 64, layers[0], avg_down=avg_down, + norm_layer=norm_layer, last_gamma=last_gamma, use_splat=use_splat, + avd=avd) + self.layer2 = self._make_layer(2, block, 128, layers[1], strides=2, avg_down=avg_down, + norm_layer=norm_layer, last_gamma=last_gamma, use_splat=use_splat, + avd=avd) + input_size = _update_input_size(input_size, 2) + if dilated or dilation==4: + self.layer3 = self._make_layer(3, block, 256, layers[2], strides=1, dilation=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + self.layer4 = self._make_layer(4, block, 512, layers[3], strides=1, dilation=4, pre_dilation=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + elif dilation==3: + # special + self.layer3 = self._make_layer(3, block, 256, layers[2], strides=1, dilation=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + self.layer4 = self._make_layer(4, block, 512, layers[3], strides=2, dilation=2, pre_dilation=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + elif dilation==2: + self.layer3 = self._make_layer(3, block, 256, layers[2], strides=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, 
dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + self.layer4 = self._make_layer(4, block, 512, layers[3], strides=1, dilation=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + else: + self.layer3 = self._make_layer(3, block, 256, layers[2], strides=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + input_size = _update_input_size(input_size, 2) + self.layer4 = self._make_layer(4, block, 512, layers[3], strides=2, + avg_down=avg_down, norm_layer=norm_layer, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd) + input_size = _update_input_size(input_size, 2) + self.avgpool = nn.GlobalAvgPool2D() + self.flat = nn.Flatten() + self.drop = None + if final_drop > 0.0: + self.drop = nn.Dropout(final_drop) + self.fc = nn.Dense(in_units=512 * block.expansion, units=classes) + + def _make_layer(self, stage_index, block, planes, blocks, strides=1, dilation=1, + pre_dilation=1, avg_down=False, norm_layer=None, + last_gamma=False, + dropblock_prob=0, input_size=224, use_splat=False, avd=False): + downsample = None + if strides != 1 or self.inplanes != planes * block.expansion: + downsample = nn.HybridSequential(prefix='down%d_'%stage_index) + with downsample.name_scope(): + if avg_down: + if pre_dilation == 1: + downsample.add(nn.AvgPool2D(pool_size=strides, strides=strides, + ceil_mode=True, count_include_pad=False)) + elif strides==1: + downsample.add(nn.AvgPool2D(pool_size=1, strides=1, + ceil_mode=True, count_include_pad=False)) + else: + downsample.add(nn.AvgPool2D(pool_size=pre_dilation*strides, strides=strides, padding=1, + ceil_mode=True, count_include_pad=False)) + downsample.add(nn.Conv2D(channels=planes * block.expansion, kernel_size=1, + strides=1, use_bias=False, in_channels=self.inplanes)) + downsample.add(norm_layer(in_channels=planes * block.expansion, + **self.norm_kwargs)) + else: + downsample.add(nn.Conv2D(channels=planes * block.expansion, + kernel_size=1, strides=strides, use_bias=False, + in_channels=self.inplanes)) + downsample.add(norm_layer(in_channels=planes * block.expansion, + **self.norm_kwargs)) + + layers = nn.HybridSequential(prefix='layers%d_'%stage_index) + with layers.name_scope(): + if dilation in (1, 2): + layers.add(block(planes, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, + strides=strides, dilation=pre_dilation, + downsample=downsample, previous_dilation=dilation, + norm_layer=norm_layer, norm_kwargs=self.norm_kwargs, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd, avd_first=self.avd_first, + radix=self.radix, in_channels=self.inplanes, + split_drop_ratio=self.split_drop_ratio)) + elif dilation == 4: + layers.add(block(planes, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, + strides=strides, dilation=pre_dilation, + downsample=downsample, previous_dilation=dilation, + norm_layer=norm_layer, norm_kwargs=self.norm_kwargs, + last_gamma=last_gamma, dropblock_prob=dropblock_prob, + input_size=input_size, use_splat=use_splat, avd=avd, avd_first=self.avd_first, + radix=self.radix, in_channels=self.inplanes, + split_drop_ratio=self.split_drop_ratio)) + else: + raise RuntimeError("=> unknown dilation size: {}".format(dilation)) + + input_size = 
_update_input_size(input_size, strides) + self.inplanes = planes * block.expansion + for i in range(1, blocks): + layers.add(block(planes, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, dilation=dilation, + previous_dilation=dilation, norm_layer=norm_layer, + norm_kwargs=self.norm_kwargs, last_gamma=last_gamma, + dropblock_prob=dropblock_prob, input_size=input_size, + use_splat=use_splat, avd=avd, avd_first=self.avd_first, + radix=self.radix, in_channels=self.inplanes, + split_drop_ratio=self.split_drop_ratio)) + + return layers + + def hybrid_forward(self, F, x): + x = self.conv1(x) + x = self.bn1(x) + x = self.relu(x) + x = self.maxpool(x) + + x = self.layer1(x) + x = self.layer2(x) + x = self.layer3(x) + x = self.layer4(x) + + x = self.avgpool(x) + x = self.flat(x) + if self.drop is not None: + x = self.drop(x) + x = self.fc(x) + + return x diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/splat.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/splat.py new file mode 100644 index 0000000..54b7d4a --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/splat.py @@ -0,0 +1,84 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +import mxnet as mx +from mxnet.gluon import nn +from mxnet.gluon.nn import Conv2D, Block, HybridBlock, Dense, BatchNorm, Activation + +__all__ = ['SplitAttentionConv'] + +USE_BN = True + +class SplitAttentionConv(HybridBlock): + def __init__(self, channels, kernel_size, strides=(1, 1), padding=(0, 0), + dilation=(1, 1), groups=1, radix=2, *args, in_channels=None, r=2, + norm_layer=BatchNorm, norm_kwargs=None, drop_ratio=0, **kwargs): + super().__init__() + norm_kwargs = norm_kwargs if norm_kwargs is not None else {} + inter_channels = max(in_channels*radix//2//r, 32) + self.radix = radix + self.cardinality = groups + self.conv = Conv2D(channels*radix, kernel_size, strides, padding, dilation, + groups=groups*radix, *args, in_channels=in_channels, **kwargs) + if USE_BN: + self.bn = norm_layer(in_channels=channels*radix, **norm_kwargs) + self.relu = Activation('relu') + self.fc1 = Conv2D(inter_channels, 1, in_channels=channels, groups=self.cardinality) + if USE_BN: + self.bn1 = norm_layer(in_channels=inter_channels, **norm_kwargs) + self.relu1 = Activation('relu') + if drop_ratio > 0: + self.drop = nn.Dropout(drop_ratio) + else: + self.drop = None + self.fc2 = Conv2D(channels*radix, 1, in_channels=inter_channels, groups=self.cardinality) + self.channels = channels + self.rsoftmax = rSoftMax(radix, groups) + + def hybrid_forward(self, F, x): + x = self.conv(x) + if USE_BN: + x = self.bn(x) + x = self.relu(x) + if self.radix > 1: + splited = F.split(x, self.radix, axis=1) + gap = sum(splited) + else: + gap = x + gap = F.contrib.AdaptiveAvgPooling2D(gap, 1) + gap = self.fc1(gap) + if USE_BN: + gap = self.bn1(gap) + atten = self.relu1(gap) + if self.drop: + atten = self.drop(atten) + atten = self.fc2(atten).reshape((0, self.radix, self.channels)) + atten = self.rsoftmax(atten).reshape((0, -1, 1, 1)) + if self.radix > 1: + atten = F.split(atten, self.radix, axis=1) + outs = [F.broadcast_mul(att, split) for (att, split) in zip(atten, splited)] + out = sum(outs) + else: + out = F.broadcast_mul(atten, x) + return out + + +class rSoftMax(nn.HybridBlock): + def 
__init__(self, radix, cardinality): + super().__init__() + self.radix = radix + self.cardinality = cardinality + + def hybrid_forward(self, F, x): + if self.radix > 1: + x = x.reshape((0, self.cardinality, self.radix, -1)).swapaxes(1, 2) + x = F.softmax(x, axis=1) + x = x.reshape((0, -1)) + else: + x = F.sigmoid(x) + return x + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/gluon/transforms.py b/final-project/model_zoo/pytorch_resnest/resnest/gluon/transforms.py new file mode 100644 index 0000000..c1c7125 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/gluon/transforms.py @@ -0,0 +1,411 @@ +# code adapted from: +# https://github.com/kakaobrain/fast-autoaugment +# https://github.com/rpmcruz/autoaugment +import math +import random + +import numpy as np +from collections import defaultdict +import PIL, PIL.ImageOps, PIL.ImageEnhance, PIL.ImageDraw +from PIL import Image + +random_mirror = True + +RESAMPLE_MODE=Image.BICUBIC + +def ShearX(img, v): # [-0.3, 0.3] + assert -0.3 <= v <= 0.3 + if random_mirror and random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, v, 0, 0, 1, 0), + RESAMPLE_MODE) + + +def ShearY(img, v): # [-0.3, 0.3] + assert -0.3 <= v <= 0.3 + if random_mirror and random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, 0, 0, v, 1, 0), + RESAMPLE_MODE) + + +def TranslateX(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert -0.45 <= v <= 0.45 + if random_mirror and random.random() > 0.5: + v = -v + v = v * img.size[0] + return img.transform(img.size, Image.AFFINE, (1, 0, v, 0, 1, 0), + RESAMPLE_MODE) + + +def TranslateY(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert -0.45 <= v <= 0.45 + if random_mirror and random.random() > 0.5: + v = -v + v = v * img.size[1] + return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, v), + RESAMPLE_MODE) + + +def TranslateXabs(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v + if random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, 0, v, 0, 1, 0), + RESAMPLE_MODE) + + +def TranslateYabs(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v + if random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, v), + RESAMPLE_MODE) + + +def Rotate(img, v): # [-30, 30] + assert -30 <= v <= 30 + if random_mirror and random.random() > 0.5: + v = -v + return img.rotate(v) + + +def AutoContrast(img, _): + return PIL.ImageOps.autocontrast(img) + + +def Invert(img, _): + return PIL.ImageOps.invert(img) + + +def Equalize(img, _): + return PIL.ImageOps.equalize(img) + + +def Flip(img, _): # not from the paper + return PIL.ImageOps.mirror(img) + + +def Solarize(img, v): # [0, 256] + assert 0 <= v <= 256 + return PIL.ImageOps.solarize(img, v) + + +def SolarizeAdd(img, addition=0, threshold=128): + img_np = np.array(img).astype(np.int) + img_np = img_np + addition + img_np = np.clip(img_np, 0, 255) + img_np = img_np.astype(np.uint8) + img = Image.fromarray(img_np) + return PIL.ImageOps.solarize(img, threshold) + + +def Posterize(img, v): # [4, 8] + #assert 4 <= v <= 8 + v = int(v) + return PIL.ImageOps.posterize(img, v) + +def Contrast(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Contrast(img).enhance(v) + + +def Color(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Color(img).enhance(v) + + +def Brightness(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return 
PIL.ImageEnhance.Brightness(img).enhance(v) + + +def Sharpness(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Sharpness(img).enhance(v) + + +def CutoutAbs(img, v): # [0, 60] => percentage: [0, 0.2] + # assert 0 <= v <= 20 + if v < 0: + return img + w, h = img.size + x0 = np.random.uniform(w) + y0 = np.random.uniform(h) + + x0 = int(max(0, x0 - v / 2.)) + y0 = int(max(0, y0 - v / 2.)) + x1 = min(w, x0 + v) + y1 = min(h, y0 + v) + + xy = (x0, y0, x1, y1) + color = (125, 123, 114) + # color = (0, 0, 0) + img = img.copy() + PIL.ImageDraw.Draw(img).rectangle(xy, color) + return img + + +def Cutout(img, v): # [0, 60] => percentage: [0, 0.2] + assert 0.0 <= v <= 0.2 + if v <= 0.: + return img + + v = v * img.size[0] + return CutoutAbs(img, v) + + + +def TranslateYAbs(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v <= 10 + if random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, 0, 0, 0, 1, v), + resample=RESAMPLE_MODE) + + +def TranslateXAbs(img, v): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v <= 10 + if random.random() > 0.5: + v = -v + return img.transform(img.size, Image.AFFINE, (1, 0, v, 0, 1, 0), + resample=RESAMPLE_MODE) + + +def Posterize2(img, v): # [0, 4] + assert 0 <= v <= 4 + v = int(v) + return PIL.ImageOps.posterize(img, v) + + + +def SamplePairing(imgs): # [0, 0.4] + def f(img1, v): + i = np.random.choice(len(imgs)) + img2 = Image.fromarray(imgs[i]) + return Image.blend(img1, img2, v) + + return f + + +def augment_list(for_autoaug=True): # 16 oeprations and their ranges + l = [ + (ShearX, -0.3, 0.3), # 0 + (ShearY, -0.3, 0.3), # 1 + (TranslateX, -0.45, 0.45), # 2 + (TranslateY, -0.45, 0.45), # 3 + (Rotate, -30, 30), # 4 + (AutoContrast, 0, 1), # 5 + (Invert, 0, 1), # 6 + (Equalize, 0, 1), # 7 + (Solarize, 0, 256), # 8 + (Posterize, 4, 8), # 9 + (Contrast, 0.1, 1.9), # 10 + (Color, 0.1, 1.9), # 11 + (Brightness, 0.1, 1.9), # 12 + (Sharpness, 0.1, 1.9), # 13 + (Cutout, 0, 0.2), # 14 + # (SamplePairing(imgs), 0, 0.4), # 15 + ] + if for_autoaug: + l += [ + (CutoutAbs, 0, 20), # compatible with auto-augment + (Posterize2, 0, 4), # 9 + (TranslateXAbs, 0, 10), # 9 + (TranslateYAbs, 0, 10), # 9 + ] + return l + + +augment_dict = {fn.__name__: (fn, v1, v2) for fn, v1, v2 in augment_list()} + +PARAMETER_MAX = 10 + + +def float_parameter(level, maxval): + return float(level) * maxval / PARAMETER_MAX + + +def int_parameter(level, maxval): + return int(float_parameter(level, maxval)) + + +def autoaug2fastaa(f): + def autoaug(): + mapper = defaultdict(lambda: lambda x: x) + mapper.update({ + 'ShearX': lambda x: float_parameter(x, 0.3), + 'ShearY': lambda x: float_parameter(x, 0.3), + 'TranslateX': lambda x: int_parameter(x, 10), + 'TranslateY': lambda x: int_parameter(x, 10), + 'Rotate': lambda x: int_parameter(x, 30), + 'Solarize': lambda x: 256 - int_parameter(x, 256), + 'Posterize2': lambda x: 4 - int_parameter(x, 4), + 'Contrast': lambda x: float_parameter(x, 1.8) + .1, + 'Color': lambda x: float_parameter(x, 1.8) + .1, + 'Brightness': lambda x: float_parameter(x, 1.8) + .1, + 'Sharpness': lambda x: float_parameter(x, 1.8) + .1, + 'CutoutAbs': lambda x: int_parameter(x, 20) + }) + + def low_high(name, prev_value): + _, low, high = get_augment(name) + return float(prev_value - low) / (high - low) + + policies = f() + new_policies = [] + for policy in policies: + new_policies.append([(name, pr, low_high(name, mapper[name](level))) for name, pr, level in policy]) + return new_policies + + return autoaug + + 
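+# NOTE: autoaug2fastaa converts AutoAugment-style policies (integer levels on a
+# 0-10 scale, PARAMETER_MAX = 10) into FastAutoAugment-style policies whose levels
+# are normalized to [0, 1] over each op's (low, high) range in augment_dict.
+# Worked example for the policy entry ('Rotate', 0.8, 8) used below:
+#   mapper['Rotate'](8) = int_parameter(8, 30) = 24 (degrees)
+#   low_high('Rotate', 24) = (24 - (-30)) / (30 - (-30)) = 0.9
+# apply_augment later undoes the normalization: 0.9 * (30 - (-30)) + (-30) = 24.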
+@autoaug2fastaa +def autoaug_imagenet_policies(): + return [ + [('Posterize2', 0.4, 8), ('Rotate', 0.6, 9)], + [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], + [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)], + [('Posterize2', 0.6, 7), ('Posterize2', 0.6, 6)], + [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], + [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)], + [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)], + [('Posterize2', 0.8, 5), ('Equalize', 1.0, 2)], + [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)], + [('Equalize', 0.6, 8), ('Posterize2', 0.4, 6)], + [('Rotate', 0.8, 8), ('Color', 0.4, 0)], + [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)], + [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)], + [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], + [('Color', 0.6, 4), ('Contrast', 1.0, 8)], + [('Rotate', 0.8, 8), ('Color', 1.0, 0)], + [('Color', 0.8, 8), ('Solarize', 0.8, 7)], + [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)], + [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)], + [('Color', 0.4, 0), ('Equalize', 0.6, 3)], + [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)], + [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)], + [('Invert', 0.6, 4), ('Equalize', 1.0, 8)], + [('Color', 0.6, 4), ('Contrast', 1.0, 8)], + [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)], + ] + + +def get_augment(name): + return augment_dict[name] + + +def apply_augment(img, name, level): + augment_fn, low, high = get_augment(name) + return augment_fn(img.copy(), level * (high - low) + low) + + +def rand_augment_list(): # 16 oeprations and their ranges + l = [ + (AutoContrast, 0, 1), + (Equalize, 0, 1), + (Invert, 0, 1), + (Rotate, 0, 30), + (Posterize, 0, 4), + (Solarize, 0, 256), + (SolarizeAdd, 0, 110), + (Color, 0.1, 1.9), + (Contrast, 0.1, 1.9), + (Brightness, 0.1, 1.9), + (Sharpness, 0.1, 1.9), + (ShearX, 0., 0.3), + (ShearY, 0., 0.3), + (CutoutAbs, 0, 40), + (TranslateXabs, 0., 100), + (TranslateYabs, 0., 100), + ] + + return l + + + +class ERandomCrop: + # pylint: disable=misplaced-comparison-constant + def __init__(self, imgsize, min_covered=0.1, aspect_ratio_range=(3./4, 4./3), + area_range=(0.1, 1.0), max_attempts=10): + assert 0.0 < min_covered + assert 0 < aspect_ratio_range[0] <= aspect_ratio_range[1] + assert 0 < area_range[0] <= area_range[1] + assert 1 <= max_attempts + + self.min_covered = min_covered + self.aspect_ratio_range = aspect_ratio_range + self.area_range = area_range + self.max_attempts = max_attempts + self._fallback = ECenterCrop(imgsize) + + def __call__(self, img): + # https://github.com/tensorflow/tensorflow/blob/9274bcebb31322370139467039034f8ff852b004/tensorflow/core/kernels/sample_distorted_bounding_box_op.cc#L111 + original_width, original_height = img.size + min_area = self.area_range[0] * (original_width * original_height) + max_area = self.area_range[1] * (original_width * original_height) + + for _ in range(self.max_attempts): + aspect_ratio = random.uniform(*self.aspect_ratio_range) + height = int(round(math.sqrt(min_area / aspect_ratio))) + max_height = int(round(math.sqrt(max_area / aspect_ratio))) + + if max_height * aspect_ratio > original_width: + max_height = (original_width + 0.5 - 1e-7) / aspect_ratio + max_height = int(max_height) + if max_height * aspect_ratio > original_width: + max_height -= 1 + + if max_height > original_height: + max_height = original_height + + if height >= max_height: + height = max_height + + height = int(round(random.uniform(height, max_height))) + width = int(round(height * aspect_ratio)) + area = width * height + + if area < min_area or area > max_area: + continue + if width > 
original_width or height > original_height: + continue + if area < self.min_covered * (original_width * original_height): + continue + if width == original_width and height == original_height: + return self._fallback(img) + + x = random.randint(0, original_width - width) + y = random.randint(0, original_height - height) + return img.crop((x, y, x + width, y + height)) + + return self._fallback(img) + +class ECenterCrop: + """Crop the given PIL Image and resize it to desired size. + Args: + img (PIL Image): Image to be cropped. (0,0) denotes the top left corner of the image. + output_size (sequence or int): (height, width) of the crop box. If int, + it is used for both directions + Returns: + PIL Image: Cropped image. + """ + def __init__(self, imgsize): + self.imgsize = imgsize + import torchvision.transforms as pth_transforms + self.resize_method = pth_transforms.Resize((imgsize, imgsize), interpolation=RESAMPLE_MODE) + + def __call__(self, img): + image_width, image_height = img.size + image_short = min(image_width, image_height) + + crop_size = float(self.imgsize) / (self.imgsize + 32) * image_short + + crop_height, crop_width = crop_size, crop_size + crop_top = int(round((image_height - crop_height) / 2.)) + crop_left = int(round((image_width - crop_width) / 2.)) + img = img.crop((crop_left, crop_top, crop_left + crop_width, crop_top + crop_height)) + return self.resize_method(img) + + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/__init__.py new file mode 100644 index 0000000..aed4fa3 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/__init__.py @@ -0,0 +1 @@ +from .models import * diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/config.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/config.py new file mode 100644 index 0000000..50590cc --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/config.py @@ -0,0 +1,53 @@ +import os +from fvcore.common.config import CfgNode as _CfgNode +from .utils import PathManager + +class CN(_CfgNode): + @classmethod + def _open_cfg(cls, filename): + return PathManager.open(filename, "r") + +CfgNode = CN + +_C = CN() + +_C.SEED = 1 + +## data related +_C.DATA = CN() +_C.DATA.DATASET = 'ImageNet' +# assuming you've set up the dataset using provided script +_C.DATA.ROOT = os.path.expanduser('~/.encoding/data/ILSVRC2012') +_C.DATA.BASE_SIZE = None +_C.DATA.CROP_SIZE = 224 +_C.DATA.LABEL_SMOOTHING = 0.0 +_C.DATA.MIXUP = 0.0 +_C.DATA.RAND_AUG = False + +## model related +_C.MODEL = CN() +_C.MODEL.NAME = 'resnet50' +_C.MODEL.FINAL_DROP = False + +## training params +_C.TRAINING = CN() +# (per-gpu batch size) +_C.TRAINING.BATCH_SIZE = 64 +_C.TRAINING.TEST_BATCH_SIZE = 256 +_C.TRAINING.LAST_GAMMA = False +_C.TRAINING.EPOCHS = 120 +_C.TRAINING.START_EPOCHS = 0 +_C.TRAINING.WORKERS = 4 + +## optimizer params +_C.OPTIMIZER = CN() +# (per-gpu lr) +_C.OPTIMIZER.LR = 0.025 +_C.OPTIMIZER.LR_SCHEDULER = 'cos' +_C.OPTIMIZER.MOMENTUM = 0.9 +_C.OPTIMIZER.WEIGHT_DECAY = 1e-4 +_C.OPTIMIZER.DISABLE_BN_WD = False +_C.OPTIMIZER.WARMUP_EPOCHS = 0 + +def get_cfg() -> CN: + return _C.clone() diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/__init__.py new file mode 100644 index 0000000..ec907ff --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/__init__.py @@ -0,0 +1,2 @@ +from .build 
import get_dataset, RESNEST_DATASETS_REGISTRY +from .imagenet import ImageNet diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/build.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/build.py new file mode 100644 index 0000000..f00936d --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/build.py @@ -0,0 +1,6 @@ +from fvcore.common.registry import Registry + +RESNEST_DATASETS_REGISTRY = Registry('RESNEST_DATASETS') + +def get_dataset(dataset_name): + return RESNEST_DATASETS_REGISTRY.get(dataset_name) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/imagenet.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/imagenet.py new file mode 100644 index 0000000..bb42b36 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/datasets/imagenet.py @@ -0,0 +1,25 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2018 +## +## This source code is licensed under the MIT-style license found in the +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import os +import torchvision.transforms as transforms +import torchvision.datasets as datasets + +import warnings +warnings.filterwarnings("ignore", "(Possibly )?corrupt EXIF data", UserWarning) + +from .build import RESNEST_DATASETS_REGISTRY + +@RESNEST_DATASETS_REGISTRY.register() +class ImageNet(datasets.ImageFolder): + def __init__(self, root=os.path.expanduser('~/.encoding/data/ILSVRC2012'), transform=None, + target_transform=None, train=True, **kwargs): + split='train' if train == True else 'val' + root = os.path.join(root, split) + super().__init__(root, transform, target_transform) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/loss.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/loss.py new file mode 100644 index 0000000..8c31655 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/loss.py @@ -0,0 +1,63 @@ +import torch +import torch.nn.functional as F +import torch.nn as nn +from torch.autograd import Variable +from resnest.torch.utils import MixUpWrapper + +__all__ = ['LabelSmoothing', 'NLLMultiLabelSmooth', 'get_criterion'] + +def get_criterion(cfg, train_loader, gpu): + if cfg.DATA.MIXUP > 0: + train_loader = MixUpWrapper(cfg.DATA.MIXUP, 1000, train_loader, gpu) + criterion = NLLMultiLabelSmooth(cfg.DATA.LABEL_SMOOTHING) + elif cfg.DATA.LABEL_SMOOTHING > 0.0: + criterion = LabelSmoothing(cfg.DATA.LABEL_SMOOTHING) + else: + criterion = torch.nn.CrossEntropyLoss() + return criterion, train_loader + +class LabelSmoothing(nn.Module): + """ + NLL loss with label smoothing. + """ + def __init__(self, smoothing=0.1): + """ + Constructor for the LabelSmoothing module. 
+ :param smoothing: label smoothing factor + """ + super(LabelSmoothing, self).__init__() + self.confidence = 1.0 - smoothing + self.smoothing = smoothing + + def forward(self, x, target): + logprobs = torch.nn.functional.log_softmax(x, dim=-1) + + nll_loss = -logprobs.gather(dim=-1, index=target.unsqueeze(1)) + nll_loss = nll_loss.squeeze(1) + smooth_loss = -logprobs.mean(dim=-1) + loss = self.confidence * nll_loss + self.smoothing * smooth_loss + return loss.mean() + +class NLLMultiLabelSmooth(nn.Module): + def __init__(self, smoothing = 0.1): + super(NLLMultiLabelSmooth, self).__init__() + self.confidence = 1.0 - smoothing + self.smoothing = smoothing + + def forward(self, x, target): + if self.training: + x = x.float() + target = target.float() + logprobs = torch.nn.functional.log_softmax(x, dim = -1) + + nll_loss = -logprobs * target + nll_loss = nll_loss.sum(-1) + + smooth_loss = -logprobs.mean(dim=-1) + + loss = self.confidence * nll_loss + self.smoothing * smooth_loss + + return loss.mean() + else: + return torch.nn.functional.cross_entropy(x, target) + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/__init__.py new file mode 100644 index 0000000..2acf216 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/__init__.py @@ -0,0 +1,2 @@ +from .resnest import * +from .ablation import * diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/ablation.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/ablation.py new file mode 100644 index 0000000..f91fe6a --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/ablation.py @@ -0,0 +1,106 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""ResNeSt ablation study models""" + +import torch +from .resnet import ResNet, Bottleneck + +__all__ = ['resnest50_fast_1s1x64d', 'resnest50_fast_2s1x64d', 'resnest50_fast_4s1x64d', + 'resnest50_fast_1s2x40d', 'resnest50_fast_2s2x40d', 'resnest50_fast_4s2x40d', + 'resnest50_fast_1s4x24d'] + +_url_format = 'https://github.com/zhanghang1989/ResNeSt/releases/download/weights_step1/{}-{}.pth' + +_model_sha256 = {name: checksum for checksum, name in [ + ('d8fbf808', 'resnest50_fast_1s1x64d'), + ('44938639', 'resnest50_fast_2s1x64d'), + ('f74f3fc3', 'resnest50_fast_4s1x64d'), + ('32830b84', 'resnest50_fast_1s2x40d'), + ('9d126481', 'resnest50_fast_2s2x40d'), + ('41d14ed0', 'resnest50_fast_4s2x40d'), + ('d4a4f76f', 'resnest50_fast_1s4x24d'), + ]} + +def short_hash(name): + if name not in _model_sha256: + raise ValueError('Pretrained model for {name} is not available.'.format(name=name)) + return _model_sha256[name][:8] + +resnest_model_urls = {name: _url_format.format(name, short_hash(name)) for + name in _model_sha256.keys() +} + +def resnest50_fast_1s1x64d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_1s1x64d'], progress=True, check_hash=True)) + return model + +def 
resnest50_fast_2s1x64d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_2s1x64d'], progress=True, check_hash=True)) + return model + +def resnest50_fast_4s1x64d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=4, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_4s1x64d'], progress=True, check_hash=True)) + return model + +def resnest50_fast_1s2x40d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, groups=2, bottleneck_width=40, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_1s2x40d'], progress=True, check_hash=True)) + return model + +def resnest50_fast_2s2x40d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, groups=2, bottleneck_width=40, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_2s2x40d'], progress=True, check_hash=True)) + return model + +def resnest50_fast_4s2x40d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=4, groups=2, bottleneck_width=40, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_4s2x40d'], progress=True, check_hash=True)) + return model + +def resnest50_fast_1s4x24d(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=1, groups=4, bottleneck_width=24, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=True, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50_fast_1s4x24d'], progress=True, check_hash=True)) + return model diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/build.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/build.py new file mode 100644 index 0000000..26e7239 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/build.py @@ -0,0 +1,6 @@ +from fvcore.common.registry import Registry + +RESNEST_MODELS_REGISTRY = Registry('RESNEST_MODELS') + +def get_model(model_name): + return RESNEST_MODELS_REGISTRY.get(model_name) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnest.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnest.py new file mode 100644 index 0000000..6308a21 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnest.py @@ -0,0 +1,76 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file 
in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""ResNeSt models""" + +import torch +from .resnet import ResNet, Bottleneck + +__all__ = ['resnest50', 'resnest101', 'resnest200', 'resnest269'] +from .build import RESNEST_MODELS_REGISTRY + +_url_format = 'https://github.com/zhanghang1989/ResNeSt/releases/download/weights_step1/{}-{}.pth' + +_model_sha256 = {name: checksum for checksum, name in [ + ('528c19ca', 'resnest50'), + ('22405ba7', 'resnest101'), + ('75117900', 'resnest200'), + ('0cc87c48', 'resnest269'), + ]} + +def short_hash(name): + if name not in _model_sha256: + raise ValueError('Pretrained model for {name} is not available.'.format(name=name)) + return _model_sha256[name][:8] + +resnest_model_urls = {name: _url_format.format(name, short_hash(name)) for + name in _model_sha256.keys() +} + +@RESNEST_MODELS_REGISTRY.register() +def resnest50(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 6, 3], + radix=2, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=32, avg_down=True, + avd=True, avd_first=False, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest50'], progress=True, check_hash=True)) + return model + +@RESNEST_MODELS_REGISTRY.register() +def resnest101(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 4, 23, 3], + radix=2, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=64, avg_down=True, + avd=True, avd_first=False, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest101'], progress=True, check_hash=True)) + return model + +@RESNEST_MODELS_REGISTRY.register() +def resnest200(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 24, 36, 3], + radix=2, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=64, avg_down=True, + avd=True, avd_first=False, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest200'], progress=True, check_hash=True)) + return model + +@RESNEST_MODELS_REGISTRY.register() +def resnest269(pretrained=False, root='~/.encoding/models', **kwargs): + model = ResNet(Bottleneck, [3, 30, 48, 8], + radix=2, groups=1, bottleneck_width=64, + deep_stem=True, stem_width=64, avg_down=True, + avd=True, avd_first=False, **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnest269'], progress=True, check_hash=True)) + return model diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnet.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnet.py new file mode 100644 index 0000000..de4ef50 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/resnet.py @@ -0,0 +1,354 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +"""ResNet variants""" +import math +import torch +import torch.nn as nn + +from .splat import SplAtConv2d, DropBlock2D +from .build import RESNEST_MODELS_REGISTRY + +__all__ = ['ResNet', 'Bottleneck'] + +_url_format = 'https://s3.us-west-1.wasabisys.com/resnest/torch/{}-{}.pth' 
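+# NOTE: _model_sha256 below is left empty in this copy of resnet.py, so
+# resnest_model_urls ends up as an empty dict; the plain resnet50/resnet101/
+# resnet152 constructors at the bottom of this file will therefore raise a
+# KeyError when called with pretrained=True. Fill in (checksum, name) pairs
+# (or load a state_dict manually) before relying on those pretrained branches.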
+ +_model_sha256 = {name: checksum for checksum, name in [ + ]} + + +def short_hash(name): + if name not in _model_sha256: + raise ValueError('Pretrained model for {name} is not available.'.format(name=name)) + return _model_sha256[name][:8] + +resnest_model_urls = {name: _url_format.format(name, short_hash(name)) for + name in _model_sha256.keys() +} + +class GlobalAvgPool2d(nn.Module): + def __init__(self): + """Global average pooling over the input's spatial dimensions""" + super(GlobalAvgPool2d, self).__init__() + + def forward(self, inputs): + return nn.functional.adaptive_avg_pool2d(inputs, 1).view(inputs.size(0), -1) + +class Bottleneck(nn.Module): + """ResNet Bottleneck + """ + # pylint: disable=unused-argument + expansion = 4 + def __init__(self, inplanes, planes, stride=1, downsample=None, + radix=1, cardinality=1, bottleneck_width=64, + avd=False, avd_first=False, dilation=1, is_first=False, + rectified_conv=False, rectify_avg=False, + norm_layer=None, dropblock_prob=0.0, last_gamma=False): + super(Bottleneck, self).__init__() + group_width = int(planes * (bottleneck_width / 64.)) * cardinality + self.conv1 = nn.Conv2d(inplanes, group_width, kernel_size=1, bias=False) + self.bn1 = norm_layer(group_width) + self.dropblock_prob = dropblock_prob + self.radix = radix + self.avd = avd and (stride > 1 or is_first) + self.avd_first = avd_first + + if self.avd: + self.avd_layer = nn.AvgPool2d(3, stride, padding=1) + stride = 1 + + if dropblock_prob > 0.0: + self.dropblock1 = DropBlock2D(dropblock_prob, 3) + if radix == 1: + self.dropblock2 = DropBlock2D(dropblock_prob, 3) + self.dropblock3 = DropBlock2D(dropblock_prob, 3) + + if radix >= 1: + self.conv2 = SplAtConv2d( + group_width, group_width, kernel_size=3, + stride=stride, padding=dilation, + dilation=dilation, groups=cardinality, bias=False, + radix=radix, rectify=rectified_conv, + rectify_avg=rectify_avg, + norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + elif rectified_conv: + from rfconv import RFConv2d + self.conv2 = RFConv2d( + group_width, group_width, kernel_size=3, stride=stride, + padding=dilation, dilation=dilation, + groups=cardinality, bias=False, + average_mode=rectify_avg) + self.bn2 = norm_layer(group_width) + else: + self.conv2 = nn.Conv2d( + group_width, group_width, kernel_size=3, stride=stride, + padding=dilation, dilation=dilation, + groups=cardinality, bias=False) + self.bn2 = norm_layer(group_width) + + self.conv3 = nn.Conv2d( + group_width, planes * 4, kernel_size=1, bias=False) + self.bn3 = norm_layer(planes*4) + + if last_gamma: + from torch.nn.init import zeros_ + zeros_(self.bn3.weight) + self.relu = nn.ReLU(inplace=True) + self.downsample = downsample + self.dilation = dilation + self.stride = stride + + def forward(self, x): + residual = x + + out = self.conv1(x) + out = self.bn1(out) + if self.dropblock_prob > 0.0: + out = self.dropblock1(out) + out = self.relu(out) + + if self.avd and self.avd_first: + out = self.avd_layer(out) + + out = self.conv2(out) + if self.radix == 0: + out = self.bn2(out) + if self.dropblock_prob > 0.0: + out = self.dropblock2(out) + out = self.relu(out) + + if self.avd and not self.avd_first: + out = self.avd_layer(out) + + out = self.conv3(out) + out = self.bn3(out) + if self.dropblock_prob > 0.0: + out = self.dropblock3(out) + + if self.downsample is not None: + residual = self.downsample(x) + + out += residual + out = self.relu(out) + + return out + +class ResNet(nn.Module): + """ResNet Variants + + Parameters + ---------- + block : Block + Class for the residual 
block. Options are BasicBlockV1, BottleneckV1. + layers : list of int + Numbers of layers in each block + classes : int, default 1000 + Number of classification classes. + dilated : bool, default False + Applying dilation strategy to pretrained ResNet yielding a stride-8 model, + typically used in Semantic Segmentation. + norm_layer : object + Normalization layer used in backbone network (default: :class:`mxnet.gluon.nn.BatchNorm`; + for Synchronized Cross-GPU BachNormalization). + + Reference: + + - He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. + + - Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." + """ + # pylint: disable=unused-variable + def __init__(self, block, layers, radix=1, groups=1, bottleneck_width=64, + num_classes=1000, dilated=False, dilation=1, + deep_stem=False, stem_width=64, avg_down=False, + rectified_conv=False, rectify_avg=False, + avd=False, avd_first=False, + final_drop=0.0, dropblock_prob=0, + last_gamma=False, norm_layer=nn.BatchNorm2d): + self.cardinality = groups + self.bottleneck_width = bottleneck_width + # ResNet-D params + self.inplanes = stem_width*2 if deep_stem else 64 + self.avg_down = avg_down + self.last_gamma = last_gamma + # ResNeSt params + self.radix = radix + self.avd = avd + self.avd_first = avd_first + + super(ResNet, self).__init__() + self.rectified_conv = rectified_conv + self.rectify_avg = rectify_avg + if rectified_conv: + from rfconv import RFConv2d + conv_layer = RFConv2d + else: + conv_layer = nn.Conv2d + conv_kwargs = {'average_mode': rectify_avg} if rectified_conv else {} + if deep_stem: + self.conv1 = nn.Sequential( + conv_layer(3, stem_width, kernel_size=3, stride=2, padding=1, bias=False, **conv_kwargs), + norm_layer(stem_width), + nn.ReLU(inplace=True), + conv_layer(stem_width, stem_width, kernel_size=3, stride=1, padding=1, bias=False, **conv_kwargs), + norm_layer(stem_width), + nn.ReLU(inplace=True), + conv_layer(stem_width, stem_width*2, kernel_size=3, stride=1, padding=1, bias=False, **conv_kwargs), + ) + else: + self.conv1 = conv_layer(3, 64, kernel_size=7, stride=2, padding=3, + bias=False, **conv_kwargs) + self.bn1 = norm_layer(self.inplanes) + self.relu = nn.ReLU(inplace=True) + self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1) + self.layer1 = self._make_layer(block, 64, layers[0], norm_layer=norm_layer, is_first=False) + self.layer2 = self._make_layer(block, 128, layers[1], stride=2, norm_layer=norm_layer) + if dilated or dilation == 4: + self.layer3 = self._make_layer(block, 256, layers[2], stride=1, + dilation=2, norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + self.layer4 = self._make_layer(block, 512, layers[3], stride=1, + dilation=4, norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + elif dilation==2: + self.layer3 = self._make_layer(block, 256, layers[2], stride=2, + dilation=1, norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + self.layer4 = self._make_layer(block, 512, layers[3], stride=1, + dilation=2, norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + else: + self.layer3 = self._make_layer(block, 256, layers[2], stride=2, + norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + self.layer4 = self._make_layer(block, 512, layers[3], stride=2, + norm_layer=norm_layer, + dropblock_prob=dropblock_prob) + self.avgpool = GlobalAvgPool2d() + self.drop = nn.Dropout(final_drop) if final_drop > 0.0 else None + self.fc = 
nn.Linear(512 * block.expansion, num_classes) + + for m in self.modules(): + if isinstance(m, nn.Conv2d): + n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels + m.weight.data.normal_(0, math.sqrt(2. / n)) + elif isinstance(m, norm_layer): + m.weight.data.fill_(1) + m.bias.data.zero_() + + def _make_layer(self, block, planes, blocks, stride=1, dilation=1, norm_layer=None, + dropblock_prob=0.0, is_first=True): + downsample = None + if stride != 1 or self.inplanes != planes * block.expansion: + down_layers = [] + if self.avg_down: + if dilation == 1: + down_layers.append(nn.AvgPool2d(kernel_size=stride, stride=stride, + ceil_mode=True, count_include_pad=False)) + else: + down_layers.append(nn.AvgPool2d(kernel_size=1, stride=1, + ceil_mode=True, count_include_pad=False)) + down_layers.append(nn.Conv2d(self.inplanes, planes * block.expansion, + kernel_size=1, stride=1, bias=False)) + else: + down_layers.append(nn.Conv2d(self.inplanes, planes * block.expansion, + kernel_size=1, stride=stride, bias=False)) + down_layers.append(norm_layer(planes * block.expansion)) + downsample = nn.Sequential(*down_layers) + + layers = [] + if dilation == 1 or dilation == 2: + layers.append(block(self.inplanes, planes, stride, downsample=downsample, + radix=self.radix, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, + avd=self.avd, avd_first=self.avd_first, + dilation=1, is_first=is_first, rectified_conv=self.rectified_conv, + rectify_avg=self.rectify_avg, + norm_layer=norm_layer, dropblock_prob=dropblock_prob, + last_gamma=self.last_gamma)) + elif dilation == 4: + layers.append(block(self.inplanes, planes, stride, downsample=downsample, + radix=self.radix, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, + avd=self.avd, avd_first=self.avd_first, + dilation=2, is_first=is_first, rectified_conv=self.rectified_conv, + rectify_avg=self.rectify_avg, + norm_layer=norm_layer, dropblock_prob=dropblock_prob, + last_gamma=self.last_gamma)) + else: + raise RuntimeError("=> unknown dilation size: {}".format(dilation)) + + self.inplanes = planes * block.expansion + for i in range(1, blocks): + layers.append(block(self.inplanes, planes, + radix=self.radix, cardinality=self.cardinality, + bottleneck_width=self.bottleneck_width, + avd=self.avd, avd_first=self.avd_first, + dilation=dilation, rectified_conv=self.rectified_conv, + rectify_avg=self.rectify_avg, + norm_layer=norm_layer, dropblock_prob=dropblock_prob, + last_gamma=self.last_gamma)) + + return nn.Sequential(*layers) + + def forward(self, x): + x = self.conv1(x) + x = self.bn1(x) + x = self.relu(x) + x = self.maxpool(x) + + x = self.layer1(x) + x = self.layer2(x) + x = self.layer3(x) + x = self.layer4(x) + + x = self.avgpool(x) + x = torch.flatten(x, 1) + if self.drop: + x = self.drop(x) + x = self.fc(x) + + return x + +@RESNEST_MODELS_REGISTRY.register() +def resnet50(pretrained=False, root='~/.encoding/models', **kwargs): + """Constructs a ResNet-50 model. + Args: + pretrained (bool): If True, returns a model pre-trained on ImageNet + """ + model = ResNet(Bottleneck, [3, 4, 6, 3], **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnet50'], progress=True, check_hash=True)) + return model + + +@RESNEST_MODELS_REGISTRY.register() +def resnet101(pretrained=False, root='~/.encoding/models', **kwargs): + """Constructs a ResNet-101 model. 
+ Args: + pretrained (bool): If True, returns a model pre-trained on ImageNet + """ + model = ResNet(Bottleneck, [3, 4, 23, 3], **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnet101'], progress=True, check_hash=True)) + return model + + +@RESNEST_MODELS_REGISTRY.register() +def resnet152(pretrained=False, root='~/.encoding/models', **kwargs): + """Constructs a ResNet-152 model. + Args: + pretrained (bool): If True, returns a model pre-trained on ImageNet + """ + model = ResNet(Bottleneck, [3, 8, 36, 3], **kwargs) + if pretrained: + model.load_state_dict(torch.hub.load_state_dict_from_url( + resnest_model_urls['resnet152'], progress=True, check_hash=True)) + return model diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/models/splat.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/splat.py new file mode 100644 index 0000000..6a7df9e --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/models/splat.py @@ -0,0 +1,103 @@ +"""Split-Attention""" + +import torch +from torch import nn +import torch.nn.functional as F +from torch.nn import Conv2d, Module, Linear, BatchNorm2d, ReLU +from torch.nn.modules.utils import _pair + +__all__ = ['SplAtConv2d', 'DropBlock2D'] + +class DropBlock2D(object): + def __init__(self, *args, **kwargs): + raise NotImplementedError + +class SplAtConv2d(Module): + """Split-Attention Conv2d + """ + def __init__(self, in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), + dilation=(1, 1), groups=1, bias=True, + radix=2, reduction_factor=4, + rectify=False, rectify_avg=False, norm_layer=None, + dropblock_prob=0.0, **kwargs): + super(SplAtConv2d, self).__init__() + padding = _pair(padding) + self.rectify = rectify and (padding[0] > 0 or padding[1] > 0) + self.rectify_avg = rectify_avg + inter_channels = max(in_channels*radix//reduction_factor, 32) + self.radix = radix + self.cardinality = groups + self.channels = channels + self.dropblock_prob = dropblock_prob + if self.rectify: + from rfconv import RFConv2d + self.conv = RFConv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, average_mode=rectify_avg, **kwargs) + else: + self.conv = Conv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, **kwargs) + self.use_bn = norm_layer is not None + if self.use_bn: + self.bn0 = norm_layer(channels*radix) + self.relu = ReLU(inplace=True) + self.fc1 = Conv2d(channels, inter_channels, 1, groups=self.cardinality) + if self.use_bn: + self.bn1 = norm_layer(inter_channels) + self.fc2 = Conv2d(inter_channels, channels*radix, 1, groups=self.cardinality) + if dropblock_prob > 0.0: + self.dropblock = DropBlock2D(dropblock_prob, 3) + self.rsoftmax = rSoftMax(radix, groups) + + def forward(self, x): + x = self.conv(x) + if self.use_bn: + x = self.bn0(x) + if self.dropblock_prob > 0.0: + x = self.dropblock(x) + x = self.relu(x) + + batch, rchannel = x.shape[:2] + if self.radix > 1: + if torch.__version__ < '1.5': + splited = torch.split(x, int(rchannel//self.radix), dim=1) + else: + splited = torch.split(x, rchannel//self.radix, dim=1) + gap = sum(splited) + else: + gap = x + gap = F.adaptive_avg_pool2d(gap, 1) + gap = self.fc1(gap) + + if self.use_bn: + gap = self.bn1(gap) + gap = self.relu(gap) + + atten = self.fc2(gap) + atten = self.rsoftmax(atten).view(batch, -1, 1, 1) + + if self.radix > 1: + if torch.__version__ < '1.5': + attens = torch.split(atten, 
int(rchannel//self.radix), dim=1) + else: + attens = torch.split(atten, rchannel//self.radix, dim=1) + out = sum([att*split for (att, split) in zip(attens, splited)]) + else: + out = atten * x + return out.contiguous() + +class rSoftMax(nn.Module): + def __init__(self, radix, cardinality): + super().__init__() + self.radix = radix + self.cardinality = cardinality + + def forward(self, x): + batch = x.size(0) + if self.radix > 1: + x = x.view(batch, self.cardinality, self.radix, -1).transpose(1, 2) + x = F.softmax(x, dim=1) + x = x.reshape(batch, -1) + else: + x = torch.sigmoid(x) + return x + diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/__init__.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/__init__.py new file mode 100644 index 0000000..f5e6f5c --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/__init__.py @@ -0,0 +1 @@ +from .build import get_transform, RESNEST_TRANSFORMS_REGISTRY diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/autoaug.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/autoaug.py new file mode 100644 index 0000000..3ecc697 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/autoaug.py @@ -0,0 +1,197 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +# code adapted from: +# https://github.com/kakaobrain/fast-autoaugment +# https://github.com/rpmcruz/autoaugment +import math +import random + +import numpy as np +from collections import defaultdict +import PIL, PIL.ImageOps, PIL.ImageEnhance, PIL.ImageDraw + +RESAMPLE_MODE=PIL.Image.BICUBIC + +RANDOM_MIRROR = True + +def ShearX(img, v, resample=RESAMPLE_MODE): # [-0.3, 0.3] + assert -0.3 <= v <= 0.3 + if RANDOM_MIRROR and random.random() > 0.5: + v = -v + return img.transform(img.size, PIL.Image.AFFINE, (1, v, 0, 0, 1, 0), + resample=resample) + +def ShearY(img, v, resample=RESAMPLE_MODE): # [-0.3, 0.3] + assert -0.3 <= v <= 0.3 + if RANDOM_MIRROR and random.random() > 0.5: + v = -v + return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, v, 1, 0), + resample=resample) + + +def TranslateX(img, v, resample=RESAMPLE_MODE): # [-150, 150] => percentage: [-0.45, 0.45] + assert -0.45 <= v <= 0.45 + if RANDOM_MIRROR and random.random() > 0.5: + v = -v + v = v * img.size[0] + return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0), + resample=resample) + + +def TranslateY(img, v, resample=RESAMPLE_MODE): # [-150, 150] => percentage: [-0.45, 0.45] + assert -0.45 <= v <= 0.45 + if RANDOM_MIRROR and random.random() > 0.5: + v = -v + v = v * img.size[1] + return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v), + resample=resample) + + +def TranslateXabs(img, v, resample=RESAMPLE_MODE): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v + if random.random() > 0.5: + v = -v + return img.transform(img.size, PIL.Image.AFFINE, (1, 0, v, 0, 1, 0), + resample=resample) + + +def TranslateYabs(img, v, resample=RESAMPLE_MODE): # [-150, 150] => percentage: [-0.45, 0.45] + assert 0 <= v + if random.random() > 0.5: + v = -v + return img.transform(img.size, PIL.Image.AFFINE, (1, 0, 0, 0, 1, v), + resample=resample) + + +def Rotate(img, v): # [-30, 30] + assert -30 <= v <= 30 + if RANDOM_MIRROR 
and random.random() > 0.5: + v = -v + return img.rotate(v) + + +def AutoContrast(img, _): + return PIL.ImageOps.autocontrast(img) + + +def Invert(img, _): + return PIL.ImageOps.invert(img) + + +def Equalize(img, _): + return PIL.ImageOps.equalize(img) + + +def Flip(img, _): # not from the paper + return PIL.ImageOps.mirror(img) + + +def Solarize(img, v): # [0, 256] + assert 0 <= v <= 256 + return PIL.ImageOps.solarize(img, v) + + +def SolarizeAdd(img, addition=0, threshold=128): + img_np = np.array(img).astype(np.int) + img_np = img_np + addition + img_np = np.clip(img_np, 0, 255) + img_np = img_np.astype(np.uint8) + img = PIL.Image.fromarray(img_np) + return PIL.ImageOps.solarize(img, threshold) + + +def Posterize(img, v): # [4, 8] + #assert 4 <= v <= 8 + v = int(v) + return PIL.ImageOps.posterize(img, v) + +def Contrast(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Contrast(img).enhance(v) + + +def Color(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Color(img).enhance(v) + + +def Brightness(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Brightness(img).enhance(v) + + +def Sharpness(img, v): # [0.1,1.9] + assert 0.1 <= v <= 1.9 + return PIL.ImageEnhance.Sharpness(img).enhance(v) + + +def CutoutAbs(img, v): # [0, 60] => percentage: [0, 0.2] + # assert 0 <= v <= 20 + if v < 0: + return img + w, h = img.size + x0 = np.random.uniform(w) + y0 = np.random.uniform(h) + + x0 = int(max(0, x0 - v / 2.)) + y0 = int(max(0, y0 - v / 2.)) + x1 = min(w, x0 + v) + y1 = min(h, y0 + v) + + xy = (x0, y0, x1, y1) + color = (125, 123, 114) + # color = (0, 0, 0) + img = img.copy() + PIL.ImageDraw.Draw(img).rectangle(xy, color) + return img + + +def Cutout(img, v): # [0, 60] => percentage: [0, 0.2] + assert 0.0 <= v <= 0.2 + if v <= 0.: + return img + + v = v * img.size[0] + return CutoutAbs(img, v) + +def rand_augment_list(): # 16 oeprations and their ranges + l = [ + (AutoContrast, 0, 1), + (Equalize, 0, 1), + (Invert, 0, 1), + (Rotate, 0, 30), + (Posterize, 0, 4), + (Solarize, 0, 256), + (SolarizeAdd, 0, 110), + (Color, 0.1, 1.9), + (Contrast, 0.1, 1.9), + (Brightness, 0.1, 1.9), + (Sharpness, 0.1, 1.9), + (ShearX, 0., 0.3), + (ShearY, 0., 0.3), + (CutoutAbs, 0, 40), + (TranslateXabs, 0., 100), + (TranslateYabs, 0., 100), + ] + + return l + +class RandAugment(object): + def __init__(self, n, m): + self.n = n + self.m = m + self.augment_list = rand_augment_list() + + def __call__(self, img): + ops = random.choices(self.augment_list, k=self.n) + for op, minval, maxval in ops: + if random.random() > random.uniform(0.2, 0.8): + continue + val = (float(self.m) / 30) * float(maxval - minval) + minval + img = op(img, val) + return img diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/build.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/build.py new file mode 100644 index 0000000..95eaf33 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/build.py @@ -0,0 +1,53 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +import torch +from torchvision.transforms import * +from .transforms import * +from fvcore.common.registry import Registry + +RESNEST_TRANSFORMS_REGISTRY = Registry('RESNEST_TRANSFORMS') + +def 
get_transform(dataset_name): + return RESNEST_TRANSFORMS_REGISTRY.get(dataset_name.lower()) + +@RESNEST_TRANSFORMS_REGISTRY.register() +def imagenet(base_size=None, crop_size=224, rand_aug=False): + normalize = Normalize(mean=[0.485, 0.456, 0.406], + std=[0.229, 0.224, 0.225]) + base_size = base_size if base_size is not None else int(1.0 * crop_size / 0.875) + train_transforms = [] + val_transforms = [] + if rand_aug: + from .autoaug import RandAugment + train_transforms.append(RandAugment(2, 12)) + + train_transforms.extend([ + ERandomCrop(crop_size), + RandomHorizontalFlip(), + ColorJitter(0.4, 0.4, 0.4), + ToTensor(), + Lighting(0.1, _imagenet_pca['eigval'], _imagenet_pca['eigvec']), + normalize, + ]) + val_transforms.extend([ + ECenterCrop(crop_size), + ToTensor(), + normalize, + ]) + transform_train = Compose(train_transforms) + transform_val = Compose(val_transforms) + return transform_train, transform_val + +_imagenet_pca = { + 'eigval': torch.Tensor([0.2175, 0.0188, 0.0045]), + 'eigvec': torch.Tensor([ + [-0.5675, 0.7192, 0.4009], + [-0.5808, -0.0045, -0.8140], + [-0.5836, -0.6948, 0.4203], + ]) +} diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/transforms.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/transforms.py new file mode 100644 index 0000000..6ecd589 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/transforms/transforms.py @@ -0,0 +1,122 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +import math +import random + +from PIL import Image +from torchvision.transforms import Resize, InterpolationMode + +__all__ = ['Lighting', 'ERandomCrop', 'ECenterCrop'] + +class Lighting(object): + """Lighting noise(AlexNet - style PCA - based noise)""" + + def __init__(self, alphastd, eigval, eigvec): + self.alphastd = alphastd + self.eigval = eigval + self.eigvec = eigvec + + def __call__(self, img): + if self.alphastd == 0: + return img + + alpha = img.new().resize_(3).normal_(0, self.alphastd) + rgb = self.eigvec.type_as(img).clone()\ + .mul(alpha.view(1, 3).expand(3, 3))\ + .mul(self.eigval.view(1, 3).expand(3, 3))\ + .sum(1).squeeze() + + return img.add(rgb.view(3, 1, 1).expand_as(img)) + + +#https://github.com/kakaobrain/fast-autoaugment/blob/master/FastAutoAugment/data.py +class ERandomCrop: + def __init__(self, imgsize, min_covered=0.1, aspect_ratio_range=(3./4, 4./3), + area_range=(0.1, 1.0), max_attempts=10): + assert 0.0 < min_covered + assert 0 < aspect_ratio_range[0] <= aspect_ratio_range[1] + assert 0 < area_range[0] <= area_range[1] + assert 1 <= max_attempts + + self.imgsize = imgsize + self.min_covered = min_covered + self.aspect_ratio_range = aspect_ratio_range + self.area_range = area_range + self.max_attempts = max_attempts + self._fallback = ECenterCrop(imgsize) + self.resize_method = Resize((imgsize, imgsize), + interpolation=InterpolationMode.BILINEAR) + + def __call__(self, img): + original_width, original_height = img.size + min_area = self.area_range[0] * (original_width * original_height) + max_area = self.area_range[1] * (original_width * original_height) + + for _ in range(self.max_attempts): + aspect_ratio = random.uniform(*self.aspect_ratio_range) + height = int(round(math.sqrt(min_area / aspect_ratio))) + max_height = 
int(round(math.sqrt(max_area / aspect_ratio))) + + if max_height * aspect_ratio > original_width: + max_height = (original_width + 0.5 - 1e-7) / aspect_ratio + max_height = int(max_height) + if max_height * aspect_ratio > original_width: + max_height -= 1 + + if max_height > original_height: + max_height = original_height + + if height >= max_height: + height = max_height + + height = int(round(random.uniform(height, max_height))) + width = int(round(height * aspect_ratio)) + area = width * height + + if area < min_area or area > max_area: + continue + if width > original_width or height > original_height: + continue + if area < self.min_covered * (original_width * original_height): + continue + if width == original_width and height == original_height: + return self._fallback(img) + + x = random.randint(0, original_width - width) + y = random.randint(0, original_height - height) + img = img.crop((x, y, x + width, y + height)) + return self.resize_method(img) + + return self._fallback(img) + + +class ECenterCrop: + """Crop the given PIL Image and resize it to desired size. + Args: + img (PIL Image): Image to be cropped. (0,0) denotes the top left corner of the image. + output_size (sequence or int): (height, width) of the crop box. If int, + it is used for both directions + Returns: + PIL Image: Cropped image. + """ + def __init__(self, imgsize): + self.imgsize = imgsize + self.resize_method = Resize((imgsize, imgsize), + interpolation=InterpolationMode.BILINEAR) + + def __call__(self, img): + image_width, image_height = img.size + image_short = min(image_width, image_height) + + crop_size = float(self.imgsize) / (self.imgsize + 32) * image_short + + crop_height, crop_width = crop_size, crop_size + crop_top = int(round((image_height - crop_height) / 2.)) + crop_left = int(round((image_width - crop_width) / 2.)) + img = img.crop((crop_left, crop_top, crop_left + crop_width, crop_top + crop_height)) + return self.resize_method(img) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/torch/utils.py b/final-project/model_zoo/pytorch_resnest/resnest/torch/utils.py new file mode 100644 index 0000000..91dc9df --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/torch/utils.py @@ -0,0 +1,225 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## ECE Department, Rutgers University +## Email: zhang.hang@rutgers.edu +## Copyright (c) 2017 +## +## This source code is licensed under the MIT-style license found in the +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import os +import math +import atexit +import shutil +import functools +import threading +import numpy as np +import torch + +from iopath.common.file_io import PathManager as PathManagerBase + +__all__ = ['accuracy', 'AverageMeter', 'LR_Scheduler', 'mkdir', + 'torch_dist_sum', 'MixUpWrapper', 'save_checkpoint', + 'cached_log_stream', 'PathManager'] + +PathManager = PathManagerBase() + +def accuracy(output, target, topk=(1,)): + """Computes the accuracy over the k top predictions for the specified values of k""" + with torch.no_grad(): + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].reshape(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res + + +class 
AverageMeter(object): + """Computes and stores the average and current value""" + def __init__(self): + self.reset() + + def reset(self): + #self.val = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + #self.val = val + self.sum += val * n + self.count += n + + @property + def avg(self): + avg = 0 if self.count == 0 else self.sum / self.count + return avg + + +def torch_dist_sum(gpu, *args): + process_group = torch.distributed.group.WORLD + tensor_args = [] + pending_res = [] + for arg in args: + if isinstance(arg, torch.Tensor): + tensor_arg = arg.clone().reshape(-1).detach().cuda(gpu) + else: + tensor_arg = torch.tensor(arg).reshape(-1).cuda(gpu) + tensor_args.append(tensor_arg) + pending_res.append(torch.distributed.all_reduce(tensor_arg, group=process_group, async_op=True)) + for res in pending_res: + res.wait() + return tensor_args + +def get_rank(): + if torch.distributed.is_initialized(): + rank = torch.distributed.get_rank() + else: + rank = 0 + return rank + +def master_only(func): + @functools.wraps(func) + def wrapper(*args, **kwargs): + if get_rank() == 0: + return func(*args, **kwargs) + else: + return None + return wrapper + +@master_only +def master_only_print(*args): + """master-only print""" + print(*args) + +class LR_Scheduler(object): + """Learning Rate Scheduler + + Step mode: ``lr = baselr * 0.1 ^ {floor(epoch-1 / lr_step)}`` + + Cosine mode: ``lr = baselr * 0.5 * (1 + cos(iter/maxiter))`` + + Poly mode: ``lr = baselr * (1 - iter/maxiter) ^ 0.9`` + + Args: + args: :attr:`args.lr_scheduler` lr scheduler mode (`cos`, `poly`), + :attr:`args.lr` base learning rate, :attr:`args.epochs` number of epochs, + :attr:`args.lr_step` + + iters_per_epoch: number of iterations per epoch + """ + def __init__(self, mode, base_lr, num_epochs, iters_per_epoch=0, + lr_step=0, warmup_epochs=0, quiet=False, + logger=None): + self.mode = mode + self.quiet = quiet + self.logger = logger + if not quiet: + msg = 'Using {} LR scheduler with warm-up epochs of {}!'.format(self.mode, warmup_epochs) + if self.logger: + self.logger.info(msg) + else: + master_only_print() + if mode == 'step': + assert lr_step + self.base_lr = base_lr + self.lr_step = lr_step + self.iters_per_epoch = iters_per_epoch + self.epoch = -1 + self.warmup_iters = warmup_epochs * iters_per_epoch + self.total_iters = (num_epochs - warmup_epochs) * iters_per_epoch + + def __call__(self, optimizer, i, epoch, best_pred): + T = epoch * self.iters_per_epoch + i + # warm up lr schedule + if self.warmup_iters > 0 and T < self.warmup_iters: + lr = self.base_lr * 1.0 * T / self.warmup_iters + elif self.mode == 'cos': + T = T - self.warmup_iters + lr = 0.5 * self.base_lr * (1 + math.cos(1.0 * T / self.total_iters * math.pi)) + elif self.mode == 'poly': + T = T - self.warmup_iters + lr = self.base_lr * pow((1 - 1.0 * T / self.total_iters), 0.9) + elif self.mode == 'step': + lr = self.base_lr * (0.1 ** (epoch // self.lr_step)) + else: + raise NotImplementedError + if epoch > self.epoch and (epoch == 0 or best_pred > 0.0): + if not self.quiet: + msg = '\n=>Epoch %i, learning rate = %.4f, \ + previous best = %.4f' % (epoch, lr, best_pred) + if self.logger: + self.logger.info(msg) + else: + master_only_print() + self.epoch = epoch + assert lr >= 0 + self._adjust_learning_rate(optimizer, lr) + + def _adjust_learning_rate(self, optimizer, lr): + for i in range(len(optimizer.param_groups)): + optimizer.param_groups[i]['lr'] = lr + + +class MixUpWrapper(object): + def __init__(self, alpha, num_classes, dataloader, device): + 
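+        # `alpha` parameterizes the Beta(alpha, alpha) distribution from which
+        # mixup_loader() below draws the per-batch mixing coefficient.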
self.alpha = alpha + self.dataloader = dataloader + self.num_classes = num_classes + self.device = device + + def mixup_loader(self, loader): + def mixup(alpha, num_classes, data, target): + with torch.no_grad(): + bs = data.size(0) + c = np.random.beta(alpha, alpha) + perm = torch.randperm(bs).cuda() + + md = c * data + (1-c) * data[perm, :] + mt = c * target + (1-c) * target[perm, :] + return md, mt + + for input, target in loader: + input, target = input.cuda(self.device), target.cuda(self.device) + target = torch.nn.functional.one_hot(target, self.num_classes) + i, t = mixup(self.alpha, self.num_classes, input, target) + yield i, t + + def __len__(self): + return len(self.dataloader) + + def __iter__(self): + return self.mixup_loader(self.dataloader) + +@master_only +def save_checkpoint(state, directory, is_best, filename='checkpoint.pth'): + """Saves checkpoint to disk""" + mkdir(directory) + filename = os.path.join(directory, filename) + with PathManager.open(filename, "wb") as f: + torch.save(state, f) + best_filename = os.path.join(directory, 'model_best.pth') + if is_best: + with PathManager.open(best_filename, "wb") as f: + torch.save(state, f) + +# cache the opened file object, so that different calls to `setup_logger` +# with the same file name can safely write to the same file. +@functools.lru_cache(maxsize=None) +def cached_log_stream(filename): + # use 1K buffer if writing to cloud storage + io = PathManager.open(filename, "a", buffering=1024 if "://" in filename else -1) + atexit.register(io.close) + return io + +def mkdir(path): + """Make directory at the specified local path with special error handling. + """ + PathManager.mkdirs(path) diff --git a/final-project/model_zoo/pytorch_resnest/resnest/utils.py b/final-project/model_zoo/pytorch_resnest/resnest/utils.py new file mode 100644 index 0000000..409db63 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/resnest/utils.py @@ -0,0 +1,131 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +import os +from pathlib import Path +import requests +import errno +import shutil +import hashlib +import zipfile +import logging +from tqdm import tqdm + +logger = logging.getLogger(__name__) + +__all__ = ['unzip', 'download', 'mkdir', 'check_sha1', 'raise_num_file'] + +def unzip(zip_file_path, root=os.path.expanduser('./')): + """Unzips files located at `zip_file_path` into parent directory specified by `root`. + """ + folders = [] + with zipfile.ZipFile(zip_file_path) as zf: + zf.extractall(root) + for name in zf.namelist(): + folder = Path(name).parts[0] + if folder not in folders: + folders.append(folder) + folders = folders[0] if len(folders) == 1 else tuple(folders) + return folders + +def download(url, path=None, overwrite=False, sha1_hash=None): + """Download files from a given URL. + + Parameters + ---------- + url : str + URL where file is located + path : str, optional + Destination path to store downloaded file. By default stores to the + current directory with same name as in url. + overwrite : bool, optional + Whether to overwrite destination file if one already exists at this location. + sha1_hash : str, optional + Expected sha1 hash in hexadecimal digits (will ignore existing file when hash is specified + but doesn't match). 
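+        For example, calling download(url, path='./pretrained',
+        sha1_hash='0123456789abcdef0123456789abcdef01234567') with an
+        illustrative hash value re-downloads the file whenever the cached
+        copy's checksum differs from it.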
+ + Returns + ------- + str + The file path of the downloaded file. + """ + if path is None: + fname = url.split('/')[-1] + else: + path = os.path.expanduser(path) + if os.path.isdir(path): + fname = os.path.join(path, url.split('/')[-1]) + else: + fname = path + + if overwrite or not os.path.exists(fname) or (sha1_hash and not check_sha1(fname, sha1_hash)): + dirname = os.path.dirname(os.path.abspath(os.path.expanduser(fname))) + if not os.path.exists(dirname): + os.makedirs(dirname) + + logger.info('Downloading %s from %s...'%(fname, url)) + r = requests.get(url, stream=True) + if r.status_code != 200: + raise RuntimeError("Failed downloading url %s"%url) + total_length = r.headers.get('content-length') + with open(fname, 'wb') as f: + if total_length is None: # no content length header + for chunk in r.iter_content(chunk_size=1024): + if chunk: # filter out keep-alive new chunks + f.write(chunk) + else: + total_length = int(total_length) + for chunk in tqdm(r.iter_content(chunk_size=1024), + total=int(total_length / 1024. + 0.5), + unit='KB', unit_scale=False, dynamic_ncols=True): + f.write(chunk) + + if sha1_hash and not check_sha1(fname, sha1_hash): + raise UserWarning('File {} is downloaded but the content hash does not match. ' \ + 'The repo may be outdated or download may be incomplete. ' \ + 'If the "repo_url" is overridden, consider switching to ' \ + 'the default repo.'.format(fname)) + + return fname + + +def check_sha1(filename, sha1_hash): + """Check whether the sha1 hash of the file content matches the expected hash. + + Parameters + ---------- + filename : str + Path to the file. + sha1_hash : str + Expected sha1 hash in hexadecimal digits. + + Returns + ------- + bool + Whether the file content matches the expected hash. + """ + sha1 = hashlib.sha1() + with open(filename, 'rb') as f: + while True: + data = f.read(1048576) + if not data: + break + sha1.update(data) + + return sha1.hexdigest() == sha1_hash + + +def mkdir(path): + """Make directory at the specified local path with special error handling. 
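+
+    An already-existing directory is tolerated here (errno.EEXIST on an
+    existing directory is ignored); any other OSError is re-raised.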
+ """ + try: + os.makedirs(path) + except OSError as exc: # Python >2.5 + if exc.errno == errno.EEXIST and os.path.isdir(path): + pass + else: + raise diff --git a/final-project/model_zoo/pytorch_resnest/scripts/dataset/prepare_imagenet.py b/final-project/model_zoo/pytorch_resnest/scripts/dataset/prepare_imagenet.py new file mode 100644 index 0000000..eb9efe7 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/dataset/prepare_imagenet.py @@ -0,0 +1,90 @@ +"""Prepare the ImageNet dataset""" +import os +import argparse +import tarfile +import pickle +import gzip +import subprocess +from tqdm import tqdm +import subprocess +from resnest.utils import check_sha1, download, mkdir + +_TARGET_DIR = os.path.expanduser('~/.encoding/data/ILSVRC2012') +_TRAIN_TAR = 'ILSVRC2012_img_train.tar' +_TRAIN_TAR_SHA1 = '43eda4fe35c1705d6606a6a7a633bc965d194284' +_VAL_TAR = 'ILSVRC2012_img_val.tar' +_VAL_TAR_SHA1 = '5f3f73da3395154b60528b2b2a2caf2374f5f178' + +def parse_args(): + parser = argparse.ArgumentParser( + description='Setup the ImageNet dataset.', + formatter_class=argparse.ArgumentDefaultsHelpFormatter) + parser.add_argument('--download-dir', required=True, + help="The directory that contains downloaded tar files") + parser.add_argument('--target-dir', default=_TARGET_DIR, + help="The directory to store extracted images") + parser.add_argument('--checksum', action='store_true', + help="If check integrity before extracting.") + parser.add_argument('--with-rec', action='store_true', + help="If build image record files.") + parser.add_argument('--num-thread', type=int, default=1, + help="Number of threads to use when building image record file.") + args = parser.parse_args() + return args + +def check_file(filename, checksum, sha1): + if not os.path.exists(filename): + raise ValueError('File not found: '+filename) + if checksum and not check_sha1(filename, sha1): + raise ValueError('Corrupted file: '+filename) + +def extract_train(tar_fname, target_dir, with_rec=False, num_thread=1): + mkdir(target_dir) + with tarfile.open(tar_fname) as tar: + print("Extracting "+tar_fname+"...") + # extract each class one-by-one + pbar = tqdm(total=len(tar.getnames())) + for class_tar in tar: + pbar.set_description('Extract '+class_tar.name) + tar.extract(class_tar, target_dir) + class_fname = os.path.join(target_dir, class_tar.name) + class_dir = os.path.splitext(class_fname)[0] + os.mkdir(class_dir) + with tarfile.open(class_fname) as f: + f.extractall(class_dir) + os.remove(class_fname) + pbar.update(1) + pbar.close() + +def extract_val(tar_fname, target_dir, with_rec=False, num_thread=1): + mkdir(target_dir) + print('Extracting ' + tar_fname) + with tarfile.open(tar_fname) as tar: + tar.extractall(target_dir) + # build rec file before images are moved into subfolders + # move images to proper subfolders + subprocess.call(["wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash"], + cwd=target_dir, shell=True) + + +def main(): + args = parse_args() + + target_dir = os.path.expanduser(args.target_dir) + #if os.path.exists(target_dir): + # raise ValueError('Target dir ['+target_dir+'] exists. 
Remove it first') + + download_dir = os.path.expanduser(args.download_dir) + train_tar_fname = os.path.join(download_dir, _TRAIN_TAR) + check_file(train_tar_fname, args.checksum, _TRAIN_TAR_SHA1) + val_tar_fname = os.path.join(download_dir, _VAL_TAR) + check_file(val_tar_fname, args.checksum, _VAL_TAR_SHA1) + + build_rec = args.with_rec + if build_rec: + os.makedirs(os.path.join(target_dir, 'rec')) + extract_train(train_tar_fname, os.path.join(target_dir, 'train'), build_rec, args.num_thread) + extract_val(val_tar_fname, os.path.join(target_dir, 'val'), build_rec, args.num_thread) + +if __name__ == '__main__': + main() diff --git a/final-project/model_zoo/pytorch_resnest/scripts/gluon/README.md b/final-project/model_zoo/pytorch_resnest/scripts/gluon/README.md new file mode 100644 index 0000000..3362011 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/gluon/README.md @@ -0,0 +1,44 @@ +## Train ResNeSt with MXNet Gluon + +For training with PyTorch, please visit [PyTorch Encoding Toolkit](https://hangzhang.org/PyTorch-Encoding/model_zoo/imagenet.html) + +### Install MXNet with Horovod + +```bash +# assuming you have CUDA 10.0 on your machine +pip install mxnet-cu100 +HOROVOD_GPU_ALLREDUCE=NCCL pip install -v --no-cache-dir horovod +pip install --no-cache mpi4py +``` + +### Prepare ImageNet recordio data format + +- Unfortunately ,this is required for training using MXNet Gluon. Please follow the [GluonCV tutorial](https://gluon-cv.mxnet.io/build/examples_datasets/recordio.html) to prepare the data. +- Copy the data into ramdisk (optional): + + ``` + cd ~/ + sudo mkdir -p /media/ramdisk + sudo mount -t tmpfs -o size=200G tmpfs /media/ramdisk + cp -r /home/ubuntu/data/ILSVRC2012/ /media/ramdisk + ``` + +### Training command + +Using ResNeSt-50 as the target model: + +```bash +horovodrun -np 64 --hostfile hosts python train.py \ +--rec-train /media/ramdisk/ILSVRC2012/train.rec \ +--rec-val /media/ramdisk/ILSVRC2012/val.rec \ +--model resnest50 --lr 0.05 --num-epochs 270 --batch-size 128 \ +--use-rec --dtype float32 --warmup-epochs 5 --last-gamma --no-wd \ +--label-smoothing --mixup --save-dir params_ resnest50 \ +--log-interval 50 --eval-frequency 5 --auto_aug --input-size 224 +``` + +### Verify pretrained model + +```bash +python verify.py --model resnest50 --crop-size 224 --resume params_ resnest50/imagenet-resnest50-269.params +``` \ No newline at end of file diff --git a/final-project/model_zoo/pytorch_resnest/scripts/gluon/train.py b/final-project/model_zoo/pytorch_resnest/scripts/gluon/train.py new file mode 100644 index 0000000..97ede40 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/gluon/train.py @@ -0,0 +1,477 @@ +# Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. + +import os + +# disable autotune +os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '1' +#os.environ['MXNET_GPU_MEM_POOL_TYPE'] = 'Round' +os.environ['MXNET_GPU_MEM_POOL_ROUND_LINEAR_CUTOFF'] = '26' +os.environ['MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN_FWD'] = '999' +os.environ['MXNET_EXEC_BULK_EXEC_MAX_NODE_TRAIN_BWD'] = '25' +os.environ['MXNET_GPU_COPY_NTHREADS'] = '1' +os.environ['MXNET_OPTIMIZER_AGGREGATION_SIZE'] = '54' + +import argparse +import logging +import math +import time +import random +from PIL import Image + +import horovod.mxnet as hvd +import mxnet as mx +import numpy as np +from mxnet import autograd, gluon, lr_scheduler +from mxnet.io import DataBatch, DataIter +from mxnet.gluon.data.vision import transforms + +from resnest.gluon import get_model +from resnest.utils import mkdir +from resnest.gluon.transforms import ERandomCrop, ECenterCrop +from torchvision.transforms import transforms as pth_transforms + +try: + from mpi4py import MPI +except ImportError: + logging.info('mpi4py is not installed. Use "pip install --no-cache mpi4py" to install') + MPI = None + +# Training settings +parser = argparse.ArgumentParser(description='MXNet ImageNet Example', + formatter_class=argparse.ArgumentDefaultsHelpFormatter) +parser.add_argument('--use-rec', action='store_true', default=False, + help='use image record iter for data input (default: False)') +parser.add_argument('--data-nthreads', type=int, default=8, + help='number of threads for data decoding (default: 2)') +parser.add_argument('--rec-train', type=str, default='', + help='the training data') +parser.add_argument('--rec-val', type=str, default='', + help='the validation data') +parser.add_argument('--batch-size', type=int, default=128, + help='training batch size per device (default: 128)') +parser.add_argument('--dtype', type=str, default='float32', + help='data type for training (default: float32)') +parser.add_argument('--num-epochs', type=int, default=90, + help='number of training epochs (default: 90)') +parser.add_argument('--lr', type=float, default=0.05, + help='learning rate for a single GPU (default: 0.05)') +parser.add_argument('--momentum', type=float, default=0.9, + help='momentum value for optimizer (default: 0.9)') +parser.add_argument('--wd', type=float, default=0.0001, + help='weight decay rate (default: 0.0001)') +parser.add_argument('--warmup-lr', type=float, default=0.0, + help='starting warmup learning rate (default: 0.0)') +parser.add_argument('--warmup-epochs', type=int, default=10, + help='number of warmup epochs (default: 10)') +parser.add_argument('--last-gamma', action='store_true', default=False, + help='whether to init gamma of the last BN layer in \ + each bottleneck to 0 (default: False)') +parser.add_argument('--mixup', action='store_true', + help='whether train the model with mix-up. default is false.') +parser.add_argument('--mixup-alpha', type=float, default=0.2, + help='beta distribution parameter for mixup sampling, default is 0.2.') +parser.add_argument('--mixup-off-epoch', type=int, default=0, + help='how many last epochs to train without mixup, default is 0.') +parser.add_argument('--label-smoothing', action='store_true', + help='use label smoothing or not in training. 
default is false.') +parser.add_argument('--no-wd', action='store_true', + help='whether to remove weight decay on bias, and beta/gamma for batchnorm layers.') + +parser.add_argument('--model', type=str, default='resnet50_v1', + help='type of model to use. see vision_model for options.') +parser.add_argument('--use-pretrained', action='store_true', default=False, + help='load pretrained model weights (default: False)') +parser.add_argument('--no-cuda', action='store_true', default=False, + help='disables CUDA training (default: False)') +parser.add_argument('--eval-frequency', type=int, default=0, + help='frequency of evaluating validation accuracy \ + when training with gluon mode (default: 0)') +parser.add_argument('--log-interval', type=int, default=40, + help='number of batches to wait before logging (default: 40)') +parser.add_argument('--save-frequency', type=int, default=20, + help='frequency of model saving (default: 0)') +parser.add_argument('--save-dir', type=str, default='params', + help='directory of saved models') +# data +parser.add_argument('--input-size', type=int, default=224, + help='size of the input image size. default is 224') +parser.add_argument('--crop-ratio', type=float, default=0.875, + help='Crop ratio during validation. default is 0.875') +# resume +parser.add_argument('--resume-epoch', type=int, default=0, + help='epoch to resume training from.') +parser.add_argument('--resume-params', type=str, default='', + help='path of parameters to load from.') +parser.add_argument('--resume-states', type=str, default='', + help='path of trainer state to load from.') +# new tricks +parser.add_argument('--dropblock-prob', type=float, default=0, + help='DropBlock prob. default is 0.') +parser.add_argument('--auto_aug', action='store_true', + help='use auto_aug. default is false.') +args = parser.parse_args() + +# Horovod: initialize Horovod +hvd.init() +num_workers = hvd.size() +rank = hvd.rank() +local_rank = hvd.local_rank() + +if rank==0: + logging.basicConfig(level=logging.INFO) + logging.info(args) + +num_classes = 1000 +num_training_samples = 1281167 +batch_size = args.batch_size +epoch_size = \ + int(math.ceil(int(num_training_samples // num_workers) / batch_size)) + + +lr_sched = lr_scheduler.CosineScheduler( + args.num_epochs * epoch_size, + base_lr=(args.lr * num_workers), + warmup_steps=(args.warmup_epochs * epoch_size), + warmup_begin_lr=args.warmup_lr +) + + +class SplitSampler(mx.gluon.data.sampler.Sampler): + """ Split the dataset into `num_parts` parts and sample from the part with + index `part_index` + + Parameters + ---------- + length: int + Number of examples in the dataset + num_parts: int + Partition the data into multiple parts + part_index: int + The index of the part to read from + """ + def __init__(self, length, num_parts=1, part_index=0, random=True): + # Compute the length of each partition + self.part_len = length // num_parts + # Compute the start index for this partition + self.start = self.part_len * part_index + # Compute the end index for this partition + self.end = self.start + self.part_len + self.random = random + + def __iter__(self): + # Extract examples between `start` and `end`, shuffle and return them. 
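+        # Each worker receives a disjoint, contiguous slice of
+        # length // num_parts indices, so any remainder at the tail of the
+        # dataset is silently dropped (e.g. length=1000 with num_parts=64
+        # yields 15 indices per worker and leaves 40 examples unused).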
+ indices = list(range(self.start, self.end)) + if self.random: + random.shuffle(indices) + return iter(indices) + + def __len__(self): + return self.part_len + +def get_train_data(rec_train, batch_size, data_nthreads, input_size, crop_ratio, args): + def train_batch_fn(batch, ctx): + data = batch[0].as_in_context(ctx) + label = batch[1].as_in_context(ctx) + return data, label + + jitter_param = 0.4 + lighting_param = 0.1 + resize = int(math.ceil(input_size / crop_ratio)) + + train_transforms = [] + if args.auto_aug: + print('Using AutoAugment') + from resnest.gluon.data_utils import AugmentationBlock, autoaug_imagenet_policies + train_transforms.append(AugmentationBlock(autoaug_imagenet_policies())) + + if input_size >= 320: + train_transforms.extend([ + ERandomCrop(input_size), + pth_transforms.Resize((input_size, input_size), interpolation=Image.BICUBIC), + pth_transforms.RandomHorizontalFlip(), + pth_transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4), + transforms.RandomLighting(lighting_param), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + else: + train_transforms.extend([ + transforms.RandomResizedCrop(input_size), + transforms.RandomFlipLeftRight(), + transforms.RandomColorJitter(brightness=jitter_param, contrast=jitter_param, + saturation=jitter_param), + transforms.RandomLighting(lighting_param), + transforms.ToTensor(), + transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + ]) + + transform_train = transforms.Compose(train_transforms) + + train_set = mx.gluon.data.vision.ImageRecordDataset(rec_train).transform_first(transform_train) + train_sampler = SplitSampler(len(train_set), num_parts=num_workers, part_index=rank) + + train_data = gluon.data.DataLoader(train_set, batch_size=batch_size,# shuffle=True, + last_batch='discard', num_workers=data_nthreads, + sampler=train_sampler) + return train_data, train_batch_fn + + +def get_val_data(rec_val, batch_size, data_nthreads, input_size, crop_ratio): + def val_batch_fn(batch, ctx): + data = batch[0].as_in_context(ctx) + label = batch[1].as_in_context(ctx) + return data, label + + normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + crop_ratio = crop_ratio if crop_ratio > 0 else 0.875 + resize = int(math.ceil(input_size/crop_ratio)) + + + if input_size >= 320: + transform_test = transforms.Compose([ + pth_transforms.ToPIL(), + ECenterCrop(input_size), + pth_transforms.Resize((input_size, input_size), interpolation=Image.BICUBIC), + pth_transforms.ToNDArray(), + transforms.ToTensor(), + normalize + ]) + else: + transform_test = transforms.Compose([ + transforms.Resize(resize, keep_ratio=True), + transforms.CenterCrop(input_size), + transforms.ToTensor(), + normalize + ]) + + val_set = mx.gluon.data.vision.ImageRecordDataset(rec_val).transform_first(transform_test) + + val_sampler = SplitSampler(len(val_set), num_parts=num_workers, part_index=rank) + val_data = gluon.data.DataLoader(val_set, batch_size=batch_size, + num_workers=data_nthreads, + sampler=val_sampler) + + return val_data, val_batch_fn + +# Horovod: pin GPU to local rank +context = mx.cpu(local_rank) if args.no_cuda else mx.gpu(local_rank) + +train_data, train_batch_fn = get_train_data(args.rec_train, batch_size, args.data_nthreads, + args.input_size, args.crop_ratio, args) +val_data, val_batch_fn = get_val_data(args.rec_val, batch_size, args.data_nthreads, args.input_size, + args.crop_ratio) + +# Get model from GluonCV model zoo +# 
https://gluon-cv.mxnet.io/model_zoo/index.html +kwargs = {'ctx': context, + 'pretrained': args.use_pretrained, + 'classes': num_classes, + 'input_size': args.input_size} + +if args.last_gamma: + kwargs['last_gamma'] = True + +if args.dropblock_prob > 0: + kwargs['dropblock_prob'] = args.dropblock_prob + +net = get_model(args.model, **kwargs) +net.cast(args.dtype) + +from resnest.gluon.dropblock import DropBlockScheduler +# does not impact normal model +drop_scheduler = DropBlockScheduler(net, 0, 0.1, args.num_epochs) + +if rank==0: + logging.info(net) + +# Create initializer +initializer = mx.init.Xavier(rnd_type='gaussian', factor_type="in", magnitude=2) + +def train_gluon(): + if args.save_dir: + save_dir = args.save_dir + save_dir = os.path.expanduser(save_dir) + mkdir(save_dir) + else: + save_dir = './' + save_frequency = 0 + + def evaluate(epoch): + acc_top1 = mx.metric.Accuracy() + acc_top5 = mx.metric.TopKAccuracy(5) + for _, batch in enumerate(val_data): + data, label = val_batch_fn(batch, context) + output = net(data.astype(args.dtype, copy=False)) + acc_top1.update([label], [output]) + acc_top5.update([label], [output]) + + top1_name, top1_acc = acc_top1.get() + top5_name, top5_acc = acc_top5.get() + if MPI is not None: + comm = MPI.COMM_WORLD + res1 = comm.gather(top1_acc, root=0) + res2 = comm.gather(top5_acc, root=0) + if rank==0: + if MPI is not None: + #logging.info('MPI gather res1: {}'.format(res1)) + top1_acc = sum(res1) / len(res1) + top5_acc = sum(res2) / len(res2) + logging.info('Epoch[%d] Rank[%d]\tValidation-%s=%f\tValidation-%s=%f', + epoch, rank, top1_name, top1_acc, top5_name, top5_acc) + + # Hybridize and initialize model + net.hybridize() + if args.resume_params is not '': + net.load_parameters(args.resume_params, ctx = context) + + else: + net.initialize(initializer, ctx=context) + + if args.no_wd: + for k, v in net.collect_params('.*beta|.*gamma|.*bias').items(): + v.wd_mult = 0.0 + + # Horovod: fetch and broadcast parameters + params = net.collect_params() + if params is not None: + hvd.broadcast_parameters(params, root_rank=0) + + # Create optimizer + optimizer = 'nag' + optimizer_params = {'wd': args.wd, + 'momentum': args.momentum, + 'lr_scheduler': lr_sched} + if args.dtype == 'float16': + optimizer_params['multi_precision'] = True + opt = mx.optimizer.create(optimizer, **optimizer_params) + + # Horovod: create DistributedTrainer, a subclass of gluon.Trainer + trainer = hvd.DistributedTrainer(params, opt) + if args.resume_states is not '': + trainer.load_states(args.resume_states) + + # Create loss function and train metric + if args.label_smoothing or args.mixup: + sparse_label_loss = False + else: + sparse_label_loss = True + + loss_fn = gluon.loss.SoftmaxCrossEntropyLoss(sparse_label=sparse_label_loss) + if args.mixup: + train_metric = mx.metric.RMSE() + else: + train_metric = mx.metric.Accuracy() + + def mixup_transform(label, classes, lam=1, eta=0.0): + if isinstance(label, mx.nd.NDArray): + label = [label] + res = [] + for l in label: + y1 = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) + y2 = l[::-1].one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) + res.append(lam*y1 + (1-lam)*y2) + return res + + def smooth(label, classes, eta=0.1): + if isinstance(label, mx.NDArray): + label = [label] + smoothed = [] + for l in label: + res = l.one_hot(classes, on_value = 1 - eta + eta/classes, off_value = eta/classes) + smoothed.append(res) + return smoothed + + # Train model + for epoch in 
range(args.resume_epoch, args.num_epochs): + drop_scheduler(epoch) + tic = time.time() + train_metric.reset() + + btic = time.time() + for nbatch, batch in enumerate(train_data, start=1): + data, label = train_batch_fn(batch, context) + data, label = [data], [label] + if args.mixup: + lam = np.random.beta(args.mixup_alpha, args.mixup_alpha) + if epoch >= args.num_epochs - args.mixup_off_epoch: + lam = 1 + data = [lam*X + (1-lam)*X[::-1] for X in data] + + if args.label_smoothing: + eta = 0.1 + else: + eta = 0.0 + label = mixup_transform(label, num_classes, lam, eta) + + elif args.label_smoothing: + hard_label = label + label = smooth(label, num_classes) + + with autograd.record(): + outputs = [net(X.astype(args.dtype, copy=False)) for X in data] + loss = [loss_fn(yhat, y.astype(args.dtype, copy=False)) for yhat, y in zip(outputs, label)] + for l in loss: + l.backward() + trainer.step(batch_size) + + if args.mixup: + output_softmax = [mx.nd.SoftmaxActivation(out.astype('float32', copy=False)) \ + for out in outputs] + train_metric.update(label, output_softmax) + else: + if args.label_smoothing: + train_metric.update(hard_label, outputs) + else: + train_metric.update(label, outputs) + + if args.log_interval and nbatch % args.log_interval == 0: + if rank == 0: + logging.info('Epoch[%d] Batch[%d] Loss[%.3f]', epoch, nbatch, + loss[0].mean().asnumpy()[0]) + + train_metric_name, train_metric_score = train_metric.get() + logging.info('Epoch[%d] Rank[%d] Batch[%d]\t%s=%f\tlr=%f', + epoch, rank, nbatch, train_metric_name, train_metric_score, trainer.learning_rate) + btic = time.time() + + # Report metrics + elapsed = time.time() - tic + _, acc = train_metric.get() + if rank == 0: + logging.info('Epoch[%d] Rank[%d] Batch[%d]\tTime cost=%.2f\tTrain-metric=%f', + epoch, rank, nbatch, elapsed, acc) + epoch_speed = num_workers * batch_size * nbatch / elapsed + logging.info('Epoch[%d]\tSpeed: %.2f samples/sec', epoch, epoch_speed) + + # Evaluate performance + if args.eval_frequency and (epoch + 1) % args.eval_frequency == 0: + evaluate(epoch) + + # Save model + if args.save_frequency and (epoch + 1) % args.save_frequency == 0: + net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, args.model, epoch)) + trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, args.model, epoch)) + + # Evaluate performance at the end of training + evaluate(epoch) + + net.save_parameters('%s/imagenet-%s-%d.params'%(save_dir, args.model, args.num_epochs-1)) + trainer.save_states('%s/imagenet-%s-%d.states'%(save_dir, args.model, args.num_epochs-1)) + +if __name__ == '__main__': + train_gluon() diff --git a/final-project/model_zoo/pytorch_resnest/scripts/gluon/verify.py b/final-project/model_zoo/pytorch_resnest/scripts/gluon/verify.py new file mode 100644 index 0000000..f79b59c --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/gluon/verify.py @@ -0,0 +1,155 @@ + +#os.environ['MXNET_CUDNN_AUTOTUNE_DEFAULT'] = '0' + +import argparse, os, math, time, sys + +import mxnet as mx +from mxnet import gluon +from mxnet.gluon.data.vision import transforms +from mxnet.contrib.quantization import * + +from resnest.gluon import get_model + +from PIL import Image + +# CLI +def parse_args(): + parser = argparse.ArgumentParser(description='Train a model for image classification.') + parser.add_argument('--data-dir', type=str, default='~/.encoding/data/ILSVRC2012/', + help='Imagenet directory for validation.') + parser.add_argument('--rec-dir', type=str, default=None, + help='recio directory for validation.') + 
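+    # NOTE: --batch-size is the per-device value; main() below multiplies it
+    # by --num-gpus to obtain the effective evaluation batch size.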
parser.add_argument('--batch-size', type=int, default=32, + help='training batch size per device (CPU/GPU).') + parser.add_argument('--num-gpus', type=int, default=8, + help='number of gpus to use.') + parser.add_argument('-j', '--num-data-workers', dest='num_workers', default=32, type=int, + help='number of preprocessing workers') + parser.add_argument('--model', type=str, default='model', required=False, + help='type of model to use. see vision_model for options.') + parser.add_argument('--resume', type=str, default=None, + help='put the path to resuming file if needed') + parser.add_argument('--crop-size', type=int, default=224, + help='input shape of the image, default is 224.') + parser.add_argument('--crop-ratio', type=float, default=0.875, + help='The ratio for crop and input size, for validation dataset only') + parser.add_argument('--dtype', type=str, + help='training data type') + parser.add_argument('--dilation', type=int, default=1, + help='network dilation. default 1 (no-dilation)') + opt = parser.parse_args() + return opt + +def test(network, ctx, val_data, batch_fn): + acc_top1 = mx.metric.Accuracy() + acc_top5 = mx.metric.TopKAccuracy(5) + acc_top1.reset() + acc_top5.reset() + num_batch = len(val_data) + num = 0 + start = time.time() + + iterator = enumerate(val_data) + next_i, next_batch = next(iterator) + next_data, next_label = batch_fn(next_batch, ctx) + stop = False + while not stop: + i = next_i + data = next_data + label = next_label + outputs = [network(X.astype(opt.dtype, copy=False)) for X in data] + try: + next_i, next_batch = next(iterator) + next_data, next_label = batch_fn(next_batch, ctx) + if next_i == 5: + # warm-up + num = 0 + mx.nd.waitall() + start = time.time() + except StopIteration: + stop = True + acc_top1.update(label, outputs) + acc_top5.update(label, outputs) + _, top1 = acc_top1.get() + _, top5 = acc_top5.get() + print('%d / %d : %.8f, %.8f'%(i, num_batch, 1-top1, 1-top5)) + num += batch_size + + end = time.time() + speed = num / (end - start) + print('Throughput is %f img/sec.'% speed) + + _, top1 = acc_top1.get() + _, top5 = acc_top5.get() + return (1-top1, 1-top5) + + +if __name__ == '__main__': + opt = parse_args() + + batch_size = opt.batch_size + classes = 1000 + + num_gpus = opt.num_gpus + if num_gpus > 0: + batch_size *= num_gpus + ctx = [mx.gpu(i) for i in range(num_gpus)] if num_gpus > 0 else [mx.cpu()] + num_workers = opt.num_workers + + input_size = opt.crop_size + model_name = opt.model + pretrained = True if not opt.resume else False + + kwargs = {'ctx': ctx, 'pretrained': pretrained, 'classes': classes} + + if opt.dilation > 1: + kwargs['dilation'] = opt.dilation + + net = get_model(model_name, **kwargs) + net.cast(opt.dtype) + if opt.resume: + net.load_parameters(opt.resume, ctx=ctx) + else: + net.hybridize() + + normalize = transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]) + crop_ratio = opt.crop_ratio if opt.crop_ratio > 0 else 0.875 + resize = int(math.ceil(input_size/crop_ratio)) + + if input_size >= 320: + from resnest.gluon.transforms import ECenterCrop + from resnest.gluon.data_utils import ToPIL, ToNDArray + transform_test = transforms.Compose([ + ToPIL(), + ECenterCrop(input_size), + ToNDArray(), + transforms.ToTensor(), + normalize + ]) + else: + transform_test = transforms.Compose([ + transforms.Resize(resize, keep_ratio=True), + transforms.CenterCrop(input_size), + transforms.ToTensor(), + normalize + ]) + + if not opt.rec_dir: + from gluoncv.data import imagenet + val_data = gluon.data.DataLoader( + 
imagenet.classification.ImageNet(opt.data_dir, train=False).transform_first(transform_test), + batch_size=batch_size, shuffle=False, num_workers=num_workers) + else: + imgrec = os.path.join(opt.rec_dir, 'val.rec') + imgidx = os.path.join(opt.rec_dir, 'val.idx') + val_data = gluon.data.DataLoader( + mx.gluon.data.vision.ImageRecordDataset(imgrec).transform_first(transform_test), + batch_size=batch_size, shuffle=False, num_workers=num_workers) + + def batch_fn(batch, ctx): + data = gluon.utils.split_and_load(batch[0], ctx_list=ctx, batch_axis=0) + label = gluon.utils.split_and_load(batch[1], ctx_list=ctx, batch_axis=0) + return data, label + + err_top1_val, err_top5_val = test(net, ctx, val_data, batch_fn) + print(err_top1_val, err_top5_val) diff --git a/final-project/model_zoo/pytorch_resnest/scripts/torch/train.py b/final-project/model_zoo/pytorch_resnest/scripts/torch/train.py new file mode 100644 index 0000000..5d03d62 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/torch/train.py @@ -0,0 +1,321 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import os +import time +import json +import logging +import argparse + +import torch +import torch.distributed as dist +import torch.multiprocessing as mp +from torch.nn.parallel import DistributedDataParallel + +from resnest.torch.config import get_cfg +from resnest.torch.models.build import get_model +from resnest.torch.datasets import get_dataset +from resnest.torch.transforms import get_transform +from resnest.torch.loss import get_criterion +from resnest.torch.utils import (save_checkpoint, accuracy, + AverageMeter, LR_Scheduler, torch_dist_sum, mkdir, + cached_log_stream, PathManager) + +logger = logging.getLogger(__name__) +logger.setLevel(logging.INFO) + +class Options(): + def __init__(self): + # data settings + parser = argparse.ArgumentParser(description='ResNeSt Training') + parser.add_argument('--config-file', type=str, default=None, + help='training configs') + parser.add_argument('--outdir', type=str, default='output', + help='output directory') + # checking point + parser.add_argument('--resume', type=str, default=None, + help='put the path to resuming file if needed') + # distributed + parser.add_argument('--world-size', default=1, type=int, + help='number of nodes for distributed training') + parser.add_argument('--rank', default=0, type=int, + help='node rank for distributed training') + parser.add_argument('--dist-url', default='tcp://localhost:23456', type=str, + help='url used to set up distributed training') + parser.add_argument('--dist-backend', default='nccl', type=str, + help='distributed backend') + # evaluation option + parser.add_argument('--eval-only', action='store_true', default= False, + help='evaluating') + parser.add_argument('--export', type=str, default=None, + help='put the path to resuming file if needed') + self.parser = parser + + def parse(self): + args = self.parser.parse_args() + return args + +def main(): + args = Options().parse() + ngpus_per_node = torch.cuda.device_count() + args.world_size = ngpus_per_node * args.world_size + + # load config + cfg = get_cfg() + cfg.merge_from_file(args.config_file) + + cfg.OPTIMIZER.LR = cfg.OPTIMIZER.LR * args.world_size + mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, 
args, cfg)) + +# global variable +best_pred = 0.0 +acclist_train = [] +acclist_val = [] + +def main_worker(gpu, ngpus_per_node, args, cfg): + args.gpu = gpu + args.rank = args.rank * ngpus_per_node + gpu + logger.info(f'rank: {args.rank} / {args.world_size}') + dist.init_process_group(backend=args.dist_backend, + init_method=args.dist_url, + world_size=args.world_size, + rank=args.rank) + torch.cuda.set_device(args.gpu) + if args.gpu == 0: + mkdir(args.outdir) + filename = os.path.join(args.outdir, 'log.txt') + fh = logging.StreamHandler(cached_log_stream(filename)) + fh.setLevel(logging.INFO) + logger.addHandler(fh) + plain_formatter = logging.Formatter( + "[%(asctime)s] %(name)s %(levelname)s: %(message)s", datefmt="%m/%d %H:%M:%S" + ) + fh.setFormatter(plain_formatter) + logger.info(args) + + # init the global + global best_pred, acclist_train, acclist_val + + # seed + torch.manual_seed(cfg.SEED) + torch.cuda.manual_seed(cfg.SEED) + + # init dataloader + transform_train, transform_val = get_transform(cfg.DATA.DATASET)( + cfg.DATA.BASE_SIZE, cfg.DATA.CROP_SIZE, cfg.DATA.RAND_AUG) + trainset = get_dataset(cfg.DATA.DATASET)(root=cfg.DATA.ROOT, + transform=transform_train, + train=True, + download=True) + valset = get_dataset(cfg.DATA.DATASET)(root=cfg.DATA.ROOT, + transform=transform_val, + train=False, + download=True) + + train_sampler = torch.utils.data.distributed.DistributedSampler(trainset) + train_loader = torch.utils.data.DataLoader( + trainset, batch_size=cfg.TRAINING.BATCH_SIZE, shuffle=False, + num_workers=cfg.TRAINING.WORKERS, pin_memory=True, + sampler=train_sampler) + + val_sampler = torch.utils.data.distributed.DistributedSampler(valset, shuffle=False) + val_loader = torch.utils.data.DataLoader( + valset, batch_size=cfg.TRAINING.TEST_BATCH_SIZE, shuffle=False, + num_workers=cfg.TRAINING.WORKERS, pin_memory=True, + sampler=val_sampler) + + # init the model + model_kwargs = {} + if cfg.MODEL.FINAL_DROP > 0.0: + model_kwargs['final_drop'] = cfg.MODEL.FINAL_DROP + + if cfg.TRAINING.LAST_GAMMA: + model_kwargs['last_gamma'] = True + + model = get_model(cfg.MODEL.NAME)(**model_kwargs) + + if args.gpu == 0: + logger.info(model) + + criterion, train_loader = get_criterion(cfg, train_loader, args.gpu) + + model.cuda(args.gpu) + criterion.cuda(args.gpu) + model = DistributedDataParallel(model, device_ids=[args.gpu]) + + # criterion and optimizer + if cfg.OPTIMIZER.DISABLE_BN_WD: + parameters = model.named_parameters() + param_dict = {} + for k, v in parameters: + param_dict[k] = v + bn_params = [v for n, v in param_dict.items() if ('bn' in n or 'bias' in n)] + rest_params = [v for n, v in param_dict.items() if not ('bn' in n or 'bias' in n)] + if args.gpu == 0: + logger.info(" Weight decay NOT applied to BN parameters ") + logger.info(f'len(parameters): {len(list(model.parameters()))} = {len(bn_params)} + {len(rest_params)}') + optimizer = torch.optim.SGD([{'params': bn_params, 'weight_decay': 0 }, + {'params': rest_params, 'weight_decay': cfg.OPTIMIZER.WEIGHT_DECAY}], + lr=cfg.OPTIMIZER.LR, + momentum=cfg.OPTIMIZER.MOMENTUM, + weight_decay=cfg.OPTIMIZER.WEIGHT_DECAY) + else: + optimizer = torch.optim.SGD(model.parameters(), + lr=cfg.OPTIMIZER.LR, + momentum=cfg.OPTIMIZER.MOMENTUM, + weight_decay=cfg.OPTIMIZER.WEIGHT_DECAY) + # check point + if args.resume is not None: + if os.path.isfile(args.resume): + if args.gpu == 0: + logger.info(f"=> loading checkpoint '{args.resume}'") + with PathManager.open(args.resume, "rb") as f: + checkpoint = torch.load(f) + cfg.TRAINING.START_EPOCHS = 
checkpoint['epoch'] + 1 if cfg.TRAINING.START_EPOCHS == 0 \ + else cfg.TRAINING.START_EPOCHS + best_pred = checkpoint['best_pred'] + acclist_train = checkpoint['acclist_train'] + acclist_val = checkpoint['acclist_val'] + model.module.load_state_dict(checkpoint['state_dict']) + optimizer.load_state_dict(checkpoint['optimizer']) + if args.gpu == 0: + logger.info(f"=> loaded checkpoint '{args.resume}' (epoch {checkpoint['epoch']})") + else: + raise RuntimeError (f"=> no resume checkpoint found at '{args.resume}'") + + scheduler = LR_Scheduler(cfg.OPTIMIZER.LR_SCHEDULER, + base_lr=cfg.OPTIMIZER.LR, + num_epochs=cfg.TRAINING.EPOCHS, + iters_per_epoch=len(train_loader), + warmup_epochs=cfg.OPTIMIZER.WARMUP_EPOCHS) + def train(epoch): + train_sampler.set_epoch(epoch) + model.train() + losses = AverageMeter() + top1 = AverageMeter() + global best_pred, acclist_train + for batch_idx, (data, target) in enumerate(train_loader): + scheduler(optimizer, batch_idx, epoch, best_pred) + if not cfg.DATA.MIXUP: + data, target = data.cuda(args.gpu), target.cuda(args.gpu) + optimizer.zero_grad() + output = model(data) + loss = criterion(output, target) + loss.backward() + optimizer.step() + + if not cfg.DATA.MIXUP: + acc1 = accuracy(output, target, topk=(1,)) + top1.update(acc1[0], data.size(0)) + + losses.update(loss.item(), data.size(0)) + if batch_idx % 100 == 0 and args.gpu == 0: + if cfg.DATA.MIXUP: + logger.info('Batch: %d| Loss: %.3f'%(batch_idx, losses.avg)) + else: + logger.info('Batch: %d| Loss: %.3f | Top1: %.3f'%(batch_idx, losses.avg, top1.avg)) + + acclist_train += [top1.avg] + + def validate(epoch): + model.eval() + top1 = AverageMeter() + top5 = AverageMeter() + global best_pred, acclist_train, acclist_val + is_best = False + for batch_idx, (data, target) in enumerate(val_loader): + data, target = data.cuda(args.gpu), target.cuda(args.gpu) + with torch.no_grad(): + output = model(data) + acc1, acc5 = accuracy(output, target, topk=(1, 5)) + top1.update(acc1[0], data.size(0)) + top5.update(acc5[0], data.size(0)) + + # sum all + sum1, cnt1, sum5, cnt5 = torch_dist_sum(args.gpu, top1.sum, top1.count, top5.sum, top5.count) + top1_acc = sum(sum1) / sum(cnt1) + top5_acc = sum(sum5) / sum(cnt5) + + if args.gpu == 0: + logger.info('Validation: Top1: %.3f | Top5: %.3f'%(top1_acc, top5_acc)) + if args.eval_only: + return top1_acc, top5_acc + + # save checkpoint + acclist_val += [top1_acc] + if top1_acc > best_pred: + best_pred = top1_acc + is_best = True + save_checkpoint({ + 'epoch': epoch, + 'state_dict': model.module.state_dict(), + 'optimizer': optimizer.state_dict(), + 'best_pred': best_pred, + 'acclist_train':acclist_train, + 'acclist_val':acclist_val, + }, + directory=args.outdir, + is_best=False, + filename=f'checkpoint_{epoch}.pth') + return top1_acc.item(), top5_acc.item() + + if args.export: + if args.gpu == 0: + with PathManager.open(args.export + '.pth', "wb") as f: + torch.save(model.module.state_dict(), f) + return + + if args.eval_only: + top1_acc, top5_acc = validate(cfg.TRAINING.START_EPOCHS) + metrics = { + "top1": top1_acc, + "top5": top5_acc, + } + if args.gpu == 0: + with PathManager.open(os.path.join(args.outdir, 'metrics.json'), "w") as f: + json.dump(metrics, f) + return + + for epoch in range(cfg.TRAINING.START_EPOCHS, cfg.TRAINING.EPOCHS): + tic = time.time() + train(epoch) + if epoch % 10 == 0: + top1_acc, top5_acc = validate(epoch) + elapsed = time.time() - tic + if args.gpu == 0: + logger.info(f'Epoch: {epoch}, Time cost: {elapsed}') + + # final evaluation + top1_acc, 
top5_acc = validate(cfg.TRAINING.START_EPOCHS - 1) + if args.gpu == 0: + # save final checkpoint + save_checkpoint({ + 'epoch': cfg.TRAINING.EPOCHS - 1, + 'state_dict': model.module.state_dict(), + 'optimizer': optimizer.state_dict(), + 'best_pred': best_pred, + 'acclist_train':acclist_train, + 'acclist_val':acclist_val, + }, + directory=args.outdir, + is_best=False, + filename='checkpoint_final.pth') + + # save final model weights + with PathManager.open(os.path.join(args.outdir, 'model_weights.pth'), "wb") as f: + torch.save(model.module.state_dict(), f) + + metrics = { + "top1": top1_acc, + "top5": top5_acc, + } + with PathManager.open(os.path.join(args.outdir, 'metrics.json'), "w") as f: + json.dump(metrics, f) + +if __name__ == "__main__": + main() diff --git a/final-project/model_zoo/pytorch_resnest/scripts/torch/verify.py b/final-project/model_zoo/pytorch_resnest/scripts/torch/verify.py new file mode 100644 index 0000000..ec96fa0 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/scripts/torch/verify.py @@ -0,0 +1,198 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## This source code is licensed under the MIT-style license found in the +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +from __future__ import print_function +import os +import argparse +from tqdm import tqdm + +import torch +import torch.nn as nn + +import PIL +import torchvision.transforms as transforms +import torchvision.datasets as datasets + +import warnings +warnings.filterwarnings("ignore", "(Possibly )?corrupt EXIF data", UserWarning) + +class Options(): + def __init__(self): + # data settings + parser = argparse.ArgumentParser(description='Deep Encoding') + parser.add_argument('--base-size', type=int, default=None, + help='base image size') + parser.add_argument('--crop-size', type=int, default=224, + help='crop image size') + # model params + parser.add_argument('--model', type=str, default='densenet', + help='network model type (default: densenet)') + # training hyper params + parser.add_argument('--batch-size', type=int, default=128, metavar='N', + help='batch size for training (default: 128)') + parser.add_argument('--workers', type=int, default=32, + metavar='N', help='dataloader threads') + # cuda, seed and logging + parser.add_argument('--no-cuda', action='store_true', + default=False, help='disables CUDA training') + parser.add_argument('--seed', type=int, default=1, metavar='S', + help='random seed (default: 1)') + # checking point + parser.add_argument('--resume', type=str, default=None, + help='put the path to resuming file if needed') + parser.add_argument('--verify', type=str, default=None, + help='put the path to resuming file if needed') + self.parser = parser + + def parse(self): + args = self.parser.parse_args() + return args + + +def main(): + # init the args + args = Options().parse() + args.cuda = not args.no_cuda and torch.cuda.is_available() + print(args) + torch.manual_seed(args.seed) + if args.cuda: + torch.cuda.manual_seed(args.seed) + # init dataloader + interp = PIL.Image.BILINEAR if args.crop_size < 320 else PIL.Image.BICUBIC + base_size = args.base_size if args.base_size is not None else int(1.0 * args.crop_size / 0.875) + transform_val = transforms.Compose([ + ECenterCrop(args.crop_size), + transforms.ToTensor(), + transforms.Normalize(mean=[0.485, 0.456, 0.406], 
+ std=[0.229, 0.224, 0.225]), + ]) + valset = ImageNetDataset(transform=transform_val, train=False) + val_loader = torch.utils.data.DataLoader( + valset, batch_size=args.batch_size, shuffle=False, + num_workers=args.workers, pin_memory=True if args.cuda else False) + + # init the model + model_kwargs = {} + + assert args.model in torch.hub.list('zhanghang1989/ResNeSt', force_reload=True) + model = torch.hub.load('zhanghang1989/ResNeSt', args.model, pretrained=True) + print(model) + + if args.cuda: + model.cuda() + # Please use CUDA_VISIBLE_DEVICES to control the number of gpus + model = nn.DataParallel(model) + + # checkpoint + if args.verify: + if os.path.isfile(args.verify): + print("=> loading checkpoint '{}'".format(args.verify)) + model.module.load_state_dict(torch.load(args.verify)) + else: + raise RuntimeError ("=> no verify checkpoint found at '{}'".\ + format(args.verify)) + elif args.resume is not None: + if os.path.isfile(args.resume): + print("=> loading checkpoint '{}'".format(args.resume)) + checkpoint = torch.load(args.resume) + model.module.load_state_dict(checkpoint['state_dict']) + else: + raise RuntimeError ("=> no resume checkpoint found at '{}'".\ + format(args.resume)) + + model.eval() + top1 = AverageMeter() + top5 = AverageMeter() + is_best = False + tbar = tqdm(val_loader, desc='\r') + for batch_idx, (data, target) in enumerate(tbar): + if args.cuda: + data, target = data.cuda(), target.cuda() + with torch.no_grad(): + output = model(data) + acc1, acc5 = accuracy(output, target, topk=(1, 5)) + top1.update(acc1[0], data.size(0)) + top5.update(acc5[0], data.size(0)) + + tbar.set_description('Top1: %.3f | Top5: %.3f'%(top1.avg, top5.avg)) + + print('Top1 Acc: %.3f | Top5 Acc: %.3f '%(top1.avg, top5.avg)) + +class ECenterCrop: + """Crop the given PIL Image and resize it to desired size. + Args: + img (PIL Image): Image to be cropped. (0,0) denotes the top left corner of the image. + output_size (sequence or int): (height, width) of the crop box. If int, + it is used for both directions + Returns: + PIL Image: Cropped image. 
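The verification script above resolves ResNeSt backbones through `torch.hub`; the same calls work standalone (entrypoints in that hub repo include `resnest50`, `resnest101`, `resnest200` and `resnest269`, and downloading pretrained weights needs network access):

```python
import torch

print(torch.hub.list('zhanghang1989/ResNeSt'))                    # available entrypoints
model = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)
model.eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))                    # ImageNet logits
print(logits.shape)                                               # torch.Size([1, 1000])
```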
+ """ + def __init__(self, imgsize): + self.imgsize = imgsize + self.resize_method = transforms.Resize((imgsize, imgsize), interpolation=PIL.Image.BICUBIC) + + def __call__(self, img): + image_width, image_height = img.size + image_short = min(image_width, image_height) + + crop_size = float(self.imgsize) / (self.imgsize + 32) * image_short + + crop_height, crop_width = crop_size, crop_size + crop_top = int(round((image_height - crop_height) / 2.)) + crop_left = int(round((image_width - crop_width) / 2.)) + img = img.crop((crop_left, crop_top, crop_left + crop_width, crop_top + crop_height)) + return self.resize_method(img) + +class ImageNetDataset(datasets.ImageFolder): + BASE_DIR = "ILSVRC2012" + def __init__(self, root=os.path.expanduser('~/.encoding/data'), transform=None, + target_transform=None, train=True, **kwargs): + split='train' if train == True else 'val' + root = os.path.join(root, self.BASE_DIR, split) + super(ImageNetDataset, self).__init__(root, transform, target_transform) + +def accuracy(output, target, topk=(1,)): + """Computes the accuracy over the k top predictions for the specified values of k""" + with torch.no_grad(): + maxk = max(topk) + batch_size = target.size(0) + + _, pred = output.topk(maxk, 1, True, True) + pred = pred.t() + correct = pred.eq(target.view(1, -1).expand_as(pred)) + + res = [] + for k in topk: + correct_k = correct[:k].view(-1).float().sum(0, keepdim=True) + res.append(correct_k.mul_(100.0 / batch_size)) + return res + +class AverageMeter(object): + """Computes and stores the average and current value""" + def __init__(self): + self.reset() + + def reset(self): + #self.val = 0 + self.sum = 0 + self.count = 0 + + def update(self, val, n=1): + #self.val = val + self.sum += val * n + self.count += n + + @property + def avg(self): + avg = 0 if self.count == 0 else self.sum / self.count + return avg + +if __name__ == "__main__": + main() + diff --git a/final-project/model_zoo/pytorch_resnest/setup.py b/final-project/model_zoo/pytorch_resnest/setup.py new file mode 100644 index 0000000..654b339 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/setup.py @@ -0,0 +1,65 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import io +import os +import subprocess + +from setuptools import setup, find_packages + +cwd = os.path.dirname(os.path.abspath(__file__)) + +version = '0.0.6' +try: + if not os.getenv('RELEASE'): + from datetime import date + today = date.today() + day = today.strftime("b%Y%m%d") + version += day +except Exception: + pass + +def create_version_file(): + global version, cwd + print('-- Building version ' + version) + version_path = os.path.join(cwd, 'resnest', 'version.py') + with open(version_path, 'w') as f: + f.write('"""This is resnest version file."""\n') + f.write("__version__ = '{}'\n".format(version)) + +requirements = [ + 'numpy', + 'tqdm', + 'nose', + 'torch>=1.0.0', + 'Pillow', + 'scipy', + 'requests', + 'iopath', + 'fvcore', +] + +if __name__ == '__main__': + create_version_file() + setup( + name="resnest", + version=version, + author="Hang Zhang", + author_email="zhanghang0704@gmail.com", + url="https://github.com/zhanghang1989/ResNeSt", + description="ResNeSt", + long_description=open('README.md').read(), + long_description_content_type='text/markdown', + 
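A quick check of the `ECenterCrop` transform defined above: it center-crops a square whose side is `imgsize / (imgsize + 32)` of the shorter edge (exactly 0.875 for `imgsize=224`, matching the `base_size = crop_size / 0.875` default earlier) and then resizes to `imgsize x imgsize`:

```python
import PIL.Image

crop = ECenterCrop(224)                       # class defined in verify.py above
img = PIL.Image.new('RGB', (500, 375))        # dummy 500x375 image, short side 375
out = crop(img)
print(out.size)                               # (224, 224); crop box side ~= 0.875 * 375 ~= 328
```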
license='Apache-2.0', + install_requires=requirements, + packages=find_packages(exclude=["scripts", "examples", "tests"]), + package_data={'resnest': [ + 'LICENSE', + ]}, + ) + diff --git a/final-project/model_zoo/pytorch_resnest/tests/test_gluon.py b/final-project/model_zoo/pytorch_resnest/tests/test_gluon.py new file mode 100644 index 0000000..e823306 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/tests/test_gluon.py @@ -0,0 +1,28 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## This source code is licensed under the MIT-style license found in the +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import mxnet as mx + +def test_model_inference(): + # get all models + from resnest.gluon.model_store import _model_sha1 + from resnest.gluon import get_model + + model_list = _model_sha1.keys() + + x = mx.random.uniform(shape=(1, 3, 224, 224)) + for model_name in model_list: + print('Doing: ', model_name) + model = get_model(model_name, pretrained=True) + y = model(x) + +if __name__ == "__main__": + import nose + nose.runmodule() + diff --git a/final-project/model_zoo/pytorch_resnest/tests/test_radix_major.py b/final-project/model_zoo/pytorch_resnest/tests/test_radix_major.py new file mode 100644 index 0000000..391fc95 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/tests/test_radix_major.py @@ -0,0 +1,133 @@ +import numpy as np +import torch +from torch import nn +import torch.nn.functional as F +from torch.nn import Conv2d, Module, Linear, BatchNorm2d, ReLU +from torch.nn.modules.utils import _pair + +from resnest.torch.models.splat import SplAtConv2d, DropBlock2D + +class RadixMajorNaiveImp(Module): + """Split-Attention Conv2d + """ + def __init__(self, in_channels, channels, kernel_size, stride=(1, 1), padding=(0, 0), + dilation=(1, 1), groups=1, bias=True, + radix=2, reduction_factor=4, + rectify=False, rectify_avg=False, norm_layer=None, + dropblock_prob=0.0, **kwargs): + super(RadixMajorNaiveImp, self).__init__() + padding = _pair(padding) + self.rectify = rectify and (padding[0] > 0 or padding[1] > 0) + self.rectify_avg = rectify_avg + inter_channels = max(in_channels*radix//reduction_factor, 32) + self.radix = radix + self.cardinality = groups + self.channels = channels + self.dropblock_prob = dropblock_prob + if self.rectify: + from rfconv import RFConv2d + self.conv = RFConv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, average_mode=rectify_avg, **kwargs) + else: + self.conv = Conv2d(in_channels, channels*radix, kernel_size, stride, padding, dilation, + groups=groups*radix, bias=bias, **kwargs) + self.use_bn = norm_layer is not None + assert not self.use_bn + + self.relu = ReLU(inplace=True) + cardinal_group_width = channels // groups + cardinal_inter_channels = inter_channels // groups + + self.fc1 = nn.ModuleList([nn.Linear(cardinal_group_width, cardinal_inter_channels) for _ in range(groups)]) + self.fc2 = nn.ModuleList([nn.Linear(cardinal_inter_channels, cardinal_group_width*radix) for _ in range(groups)]) + + if dropblock_prob > 0.0: + self.dropblock = DropBlock2D(dropblock_prob, 3) + + def forward(self, x): + x = self.conv(x) + if self.dropblock_prob > 0.0: + x = self.dropblock(x) + x = self.relu(x) + + batch, channel = x.shape[:2] + cardinality = self.cardinality + radix = self.radix + 
+ tiny_group_width = channel//radix//cardinality + all_groups = torch.split(x, tiny_group_width, dim=1) + + out = [] + for k in range(cardinality): + U_k = [all_groups[r * cardinality + k] for r in range(radix)] + U_k = sum(U_k) + gap_k = F.adaptive_avg_pool2d(U_k, 1).squeeze() + atten_k = self.fc2[k](self.fc1[k](gap_k)) + if radix > 1: + x_k = [all_groups[r * cardinality + k] for r in range(radix)] + x_k = torch.cat(x_k, dim=1) + atten_k = atten_k.view(batch, radix, -1) + atten_k = F.softmax(atten_k, dim=1) + else: + x_k = all_groups[k] + atten_k = torch.sigmoid(atten_k) + attended_k = x_k * atten_k.view(batch, -1, 1, 1) + out_k = sum(torch.split(attended_k, attended_k.size(1)//self.radix, dim=1)) + out.append(out_k) + + return torch.cat(out, dim=1).contiguous() + +@torch.no_grad() +def sync_weigths(m1, m2): + m1.conv.weight.copy_(torch.from_numpy(m2.conv.weight.data.numpy())) + nn.init.ones_(m1.fc1.weight) + nn.init.ones_(m1.fc2.weight) + nn.init.zeros_(m1.fc1.bias) + nn.init.zeros_(m1.fc2.bias) + for m in m2.fc1: + nn.init.ones_(m.weight) + nn.init.zeros_(m.bias) + for m in m2.fc2: + nn.init.ones_(m.weight) + nn.init.zeros_(m.bias) + +def _AssertTensorClose(a, b, atol=1e-3, rtol=1e-3): + npa, npb = a.cpu().detach().numpy(), b.cpu().detach().numpy() + assert np.allclose(npa, npb, atol=atol), \ + 'Tensor close check failed\n{}\n{}\nadiff={}, rdiff={}'.format( + a, b, np.abs(npa - npb).max(), np.abs((npa - npb) / np.fmax(npa, 1e-5)).max()) + +def test_radix_major(): + device = torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu') + def compare_two_imp(batch, height, width, + in_channels, channels, + kernel_size, stride, padding, + radix, groups): + layer1 = SplAtConv2d(in_channels, channels, kernel_size, stride, padding, radix=radix, groups=groups, bias=False) + layer2 = RadixMajorNaiveImp(in_channels, channels, kernel_size, stride, padding, radix=radix, groups=groups, bias=False) + sync_weigths(layer1, layer2) + layer1 = layer1.to(device) + layer2 = layer2.to(device) + x = torch.rand(batch, in_channels, height, width).to(device) + y1 = layer1(x) + y2 = layer2(x) + _AssertTensorClose(y1, y2) + + for batch in [2, 4, 8, 32]: + for height in [7, 14, 28, 56]: + width = height + for in_channels in [16, 64, 128]: + channels = in_channels + for kernel_size in [3, 5]: + padding = kernel_size // 2 + for stride in [1, 2]: + for radix in [1, 2, 4]: + for groups in [1, 2, 4]: + compare_two_imp( + batch, height, width, in_channels, + channels, kernel_size, stride, padding, + radix, groups) + +if __name__ == "__main__": + import nose + nose.runmodule() diff --git a/final-project/model_zoo/pytorch_resnest/tests/test_torch.py b/final-project/model_zoo/pytorch_resnest/tests/test_torch.py new file mode 100644 index 0000000..430d9e9 --- /dev/null +++ b/final-project/model_zoo/pytorch_resnest/tests/test_torch.py @@ -0,0 +1,31 @@ +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +## Created by: Hang Zhang +## Email: zhanghang0704@gmail.com +## Copyright (c) 2020 +## +## This source code is licensed under the MIT-style license found in the +## LICENSE file in the root directory of this source tree +##+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ + +import torch +import importlib +import inspect + +def test_model_inference(): + # get all models + import resnest.torch as module + functions = inspect.getmembers(module, inspect.isfunction) + model_list = [f[0] for f in functions] + + get_model = importlib.import_module('resnest.torch') + 
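The test above checks `SplAtConv2d` against a naive radix-major reimplementation; the layer itself can also be exercised directly (same import path and constructor arguments as in the test; split-attention keeps the declared number of output channels, so the shape check below is expected to hold):

```python
import torch
from resnest.torch.models.splat import SplAtConv2d

layer = SplAtConv2d(64, 64, kernel_size=3, stride=1, padding=1,
                    radix=2, groups=2, bias=False)
layer.eval()
with torch.no_grad():
    y = layer(torch.rand(2, 64, 56, 56))
print(y.shape)                                # torch.Size([2, 64, 56, 56])
```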
x = torch.rand(1, 3, 224, 224) + for model_name in model_list: + print('Doing: ', model_name) + net = getattr(get_model, model_name) + model = net(pretrained=True) + model.eval() + y = model(x) + +if __name__ == "__main__": + import nose + nose.runmodule() diff --git a/final-project/model_zoo/swin/swin_transformer.py b/final-project/model_zoo/swin/swin_transformer.py new file mode 100644 index 0000000..a66366c --- /dev/null +++ b/final-project/model_zoo/swin/swin_transformer.py @@ -0,0 +1,622 @@ +import torch +import torch.nn as nn +import torch.utils.checkpoint as checkpoint +from timm.models.layers import DropPath, to_2tuple, trunc_normal_ + +def get_swin(ckpt): + ''' + According to model type, define model parameters + Reference: https://github.com/microsoft/Swin-Transformer/tree/main/configs + ''' + if ckpt == "./model_zoo/swin/swin_base_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 128 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 4, 8, 16, 32 ] + drop_path_rate = 0.5 + elif ckpt == "./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth": + img_size = 384 + path_size = 4 + window_size = 12 + embed_dim = 192 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 6, 12, 24, 48 ] + drop_path_rate = 0.2 + elif ckpt == "./model_zoo/swin/swin_tiny_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 96 + depths = [ 2, 2, 6, 2 ] + num_heads = [ 3, 6, 12, 24 ] + drop_path_rate = 0.2 + + + model = SwinTransformer(img_size=img_size, + patch_size=path_size, + in_chans=3, + num_classes=1000, + embed_dim=embed_dim, + depths=depths, + num_heads=num_heads, + window_size=window_size, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + drop_path_rate=drop_path_rate, + ape=False, + patch_norm=True, + use_checkpoint=False) + checkpoint = torch.load(ckpt) + model.load_state_dict(checkpoint, strict=False) + + return model + + +class Mlp(nn.Module): + def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): + super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +def window_partition(x, window_size): + """ + Args: + x: (B, H, W, C) + window_size (int): window size + Returns: + windows: (num_windows*B, window_size, window_size, C) + """ + B, H, W, C = x.shape + x = x.view(B, H // window_size, window_size, W // window_size, window_size, C) + windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C) + return windows + + +def window_reverse(windows, window_size, H, W): + """ + Args: + windows: (num_windows*B, window_size, window_size, C) + window_size (int): Window size + H (int): Height of image + W (int): Width of image + Returns: + x: (B, H, W, C) + """ + B = int(windows.shape[0] / (H * W / window_size / window_size)) + x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1) + x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1) + return x + + +class WindowAttention(nn.Module): + r""" Window based multi-head self attention (W-MSA) module with relative position bias. + It supports both of shifted and non-shifted window. 
+ Args: + dim (int): Number of input channels. + window_size (tuple[int]): The height and width of the window. + num_heads (int): Number of attention heads. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set + attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0 + proj_drop (float, optional): Dropout ratio of output. Default: 0.0 + """ + + def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.): + + super().__init__() + self.dim = dim + self.window_size = window_size # Wh, Ww + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim ** -0.5 + + # define a parameter table of relative position bias + self.relative_position_bias_table = nn.Parameter( + torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads)) # 2*Wh-1 * 2*Ww-1, nH + + # get pair-wise relative position index for each token inside the window + coords_h = torch.arange(self.window_size[0]) + coords_w = torch.arange(self.window_size[1]) + coords = torch.stack(torch.meshgrid([coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = torch.flatten(coords, 1) # 2, Wh*Ww + relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :] # 2, Wh*Ww, Wh*Ww + relative_coords = relative_coords.permute(1, 2, 0).contiguous() # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += self.window_size[0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += self.window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1 + relative_position_index = relative_coords.sum(-1) # Wh*Ww, Wh*Ww + self.register_buffer("relative_position_index", relative_position_index) + + self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + trunc_normal_(self.relative_position_bias_table, std=.02) + self.softmax = nn.Softmax(dim=-1) + + def forward(self, x, mask=None): + """ + Args: + x: input features with shape of (num_windows*B, N, C) + mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None + """ + B_, N, C = x.shape + qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) + q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple) + + q = q * self.scale + attn = (q @ k.transpose(-2, -1)) + + relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view( + self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + + if mask is not None: + nW = mask.shape[0] + attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0) + attn = attn.view(-1, self.num_heads, N, N) + attn = self.softmax(attn) + else: + attn = self.softmax(attn) + + attn = self.attn_drop(attn) + + x = (attn @ v).transpose(1, 2).reshape(B_, N, C) + x = self.proj(x) + x = self.proj_drop(x) + return x + + def extra_repr(self) -> str: + return f'dim={self.dim}, window_size={self.window_size}, num_heads={self.num_heads}' + + def flops(self, N): + # calculate flops for 1 window with token length of N + flops = 0 + # qkv = self.qkv(x) + flops += N * self.dim * 3 * self.dim + 
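The relative-position indexing built in `WindowAttention.__init__` can be checked in isolation: a 7x7 window has 49 tokens and (2*7-1)^2 = 169 distinct relative offsets, so the index table is 49x49 with values in [0, 168]:

```python
import torch

window_size = (7, 7)
coords = torch.stack(torch.meshgrid([torch.arange(window_size[0]),
                                     torch.arange(window_size[1])]))      # 2, Wh, Ww
coords_flatten = torch.flatten(coords, 1)                                 # 2, Wh*Ww
relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :]
relative_coords = relative_coords.permute(1, 2, 0).contiguous()
relative_coords[:, :, 0] += window_size[0] - 1                            # shift to start from 0
relative_coords[:, :, 1] += window_size[1] - 1
relative_coords[:, :, 0] *= 2 * window_size[1] - 1
relative_position_index = relative_coords.sum(-1)

print(relative_position_index.shape)   # torch.Size([49, 49])
print(relative_position_index.max())   # tensor(168) -> indexes a 169-row bias table
```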
# attn = (q @ k.transpose(-2, -1)) + flops += self.num_heads * N * (self.dim // self.num_heads) * N + # x = (attn @ v) + flops += self.num_heads * N * N * (self.dim // self.num_heads) + # x = self.proj(x) + flops += N * self.dim * self.dim + return flops + + +class SwinTransformerBlock(nn.Module): + r""" Swin Transformer Block. + Args: + dim (int): Number of input channels. + input_resolution (tuple[int]): Input fSwilotion. + num_heads (int): Number of attention heads. + window_size (int): Window size. + shift_size (int): Shift size for SW-MSA. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float, optional): Stochastic depth rate. Default: 0.0 + act_layer (nn.Module, optional): Activation layer. Default: nn.GELU + norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm + """ + + def __init__(self, dim, input_resolution, num_heads, window_size=7, shift_size=0, + mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0., + act_layer=nn.GELU, norm_layer=nn.LayerNorm): + super().__init__() + self.dim = dim + self.input_resolution = input_resolution + self.num_heads = num_heads + self.window_size = window_size + self.shift_size = shift_size + self.mlp_ratio = mlp_ratio + if min(self.input_resolution) <= self.window_size: + # if window size is larger than input resolution, we don't partition windows + self.shift_size = 0 + self.window_size = min(self.input_resolution) + assert 0 <= self.shift_size < self.window_size, "shift_size must in 0-window_size" + + self.norm1 = norm_layer(dim) + self.attn = WindowAttention( + dim, window_size=to_2tuple(self.window_size), num_heads=num_heads, + qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop) + + self.drop_path = DropPath(drop_path) if drop_path > 0. 
else nn.Identity() + self.norm2 = norm_layer(dim) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) + + if self.shift_size > 0: + # calculate attention mask for SW-MSA + H, W = self.input_resolution + img_mask = torch.zeros((1, H, W, 1)) # 1 H W 1 + h_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + w_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + cnt = 0 + for h in h_slices: + for w in w_slices: + img_mask[:, h, w, :] = cnt + cnt += 1 + + mask_windows = window_partition(img_mask, self.window_size) # nW, window_size, window_size, 1 + mask_windows = mask_windows.view(-1, self.window_size * self.window_size) + attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2) + attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0)) + else: + attn_mask = None + + self.register_buffer("attn_mask", attn_mask) + + def forward(self, x): + H, W = self.input_resolution + B, L, C = x.shape + assert L == H * W, "input feature has wrong size" + + shortcut = x + x = self.norm1(x) + x = x.view(B, H, W, C) + + # cyclic shift + if self.shift_size > 0: + shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2)) + else: + shifted_x = x + + # partition windows + x_windows = window_partition(shifted_x, self.window_size) # nW*B, window_size, window_size, C + x_windows = x_windows.view(-1, self.window_size * self.window_size, C) # nW*B, window_size*window_size, C + + # W-MSA/SW-MSA + attn_windows = self.attn(x_windows, mask=self.attn_mask) # nW*B, window_size*window_size, C + + # merge windows + attn_windows = attn_windows.view(-1, self.window_size, self.window_size, C) + shifted_x = window_reverse(attn_windows, self.window_size, H, W) # B H' W' C + + # reverse cyclic shift + if self.shift_size > 0: + x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2)) + else: + x = shifted_x + x = x.view(B, H * W, C) + + # FFN + x = shortcut + self.drop_path(x) + x = x + self.drop_path(self.mlp(self.norm2(x))) + + return x + + def extra_repr(self) -> str: + return f"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, " \ + f"window_size={self.window_size}, shift_size={self.shift_size}, mlp_ratio={self.mlp_ratio}" + + def flops(self): + flops = 0 + H, W = self.input_resolution + # norm1 + flops += self.dim * H * W + # W-MSA/SW-MSA + nW = H * W / self.window_size / self.window_size + flops += nW * self.attn.flops(self.window_size * self.window_size) + # mlp + flops += 2 * H * W * self.dim * self.dim * self.mlp_ratio + # norm2 + flops += self.dim * H * W + return flops + + +class PatchMerging(nn.Module): + r""" Patch Merging Layer. + Args: + input_resolution (tuple[int]): Resolution of input feature. + dim (int): Number of input channels. + norm_layer (nn.Module, optional): Normalization layer. 
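A small numeric check of the SW-MSA mask built in `SwinTransformerBlock.__init__`: with an 8x8 feature map, `window_size=4` and `shift_size=2`, each of the four windows gets a 16x16 mask whose -100 entries stop tokens from attending across regions that were only made adjacent by the cyclic shift:

```python
import torch

H = W = 8
window_size, shift_size = 4, 2
img_mask = torch.zeros((1, H, W, 1))                         # 1 H W 1
h_slices = (slice(0, -window_size),
            slice(-window_size, -shift_size),
            slice(-shift_size, None))
cnt = 0
for h in h_slices:
    for w in h_slices:                                       # same slicing for both axes
        img_mask[:, h, w, :] = cnt
        cnt += 1

mask_windows = window_partition(img_mask, window_size)       # helper defined above
mask_windows = mask_windows.view(-1, window_size * window_size)
attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2)
attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0))
print(attn_mask.shape)                                       # torch.Size([4, 16, 16])
```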
Default: nn.LayerNorm + """ + + def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm): + super().__init__() + self.input_resolution = input_resolution + self.dim = dim + self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False) + self.norm = norm_layer(4 * dim) + + def forward(self, x): + """ + x: B, H*W, C + """ + H, W = self.input_resolution + B, L, C = x.shape + assert L == H * W, "input feature has wrong size" + assert H % 2 == 0 and W % 2 == 0, f"x size ({H}*{W}) are not even." + + x = x.view(B, H, W, C) + + x0 = x[:, 0::2, 0::2, :] # B H/2 W/2 C + x1 = x[:, 1::2, 0::2, :] # B H/2 W/2 C + x2 = x[:, 0::2, 1::2, :] # B H/2 W/2 C + x3 = x[:, 1::2, 1::2, :] # B H/2 W/2 C + x = torch.cat([x0, x1, x2, x3], -1) # B H/2 W/2 4*C + x = x.view(B, -1, 4 * C) # B H/2*W/2 4*C + + x = self.norm(x) + x = self.reduction(x) + + return x + + def extra_repr(self) -> str: + return f"input_resolution={self.input_resolution}, dim={self.dim}" + + def flops(self): + H, W = self.input_resolution + flops = H * W * self.dim + flops += (H // 2) * (W // 2) * 4 * self.dim * 2 * self.dim + return flops + + +class BasicLayer(nn.Module): + """ A basic Swin Transformer layer for one stage. + Args: + dim (int): Number of input channels. + input_resolution (tuple[int]): Input resolution. + depth (int): Number of blocks. + num_heads (int): Number of attention heads. + window_size (int): Local window size. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0 + norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm + downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None + use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False. 
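Shape check for `PatchMerging` above: each 2x2 group of neighbouring tokens is concatenated and projected, so the token count drops by 4x while the channel width doubles:

```python
import torch

merge = PatchMerging(input_resolution=(56, 56), dim=96)
x = torch.rand(2, 56 * 56, 96)          # B, H*W, C
y = merge(x)
print(y.shape)                          # torch.Size([2, 784, 192]) == (B, H/2 * W/2, 2*C)
```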
+ """ + + def __init__(self, dim, input_resolution, depth, num_heads, window_size, + mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., + drop_path=0., norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False): + + super().__init__() + self.dim = dim + self.input_resolution = input_resolution + self.depth = depth + self.use_checkpoint = use_checkpoint + + # build blocks + self.blocks = nn.ModuleList([ + SwinTransformerBlock(dim=dim, input_resolution=input_resolution, + num_heads=num_heads, window_size=window_size, + shift_size=0 if (i % 2 == 0) else window_size // 2, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, qk_scale=qk_scale, + drop=drop, attn_drop=attn_drop, + drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path, + norm_layer=norm_layer) + for i in range(depth)]) + + # patch merging layer + if downsample is not None: + self.downsample = downsample(input_resolution, dim=dim, norm_layer=norm_layer) + else: + self.downsample = None + + def forward(self, x): + for blk in self.blocks: + if self.use_checkpoint: + x = checkpoint.checkpoint(blk, x) + else: + x = blk(x) + if self.downsample is not None: + x = self.downsample(x) + return x + + def extra_repr(self) -> str: + return f"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}" + + def flops(self): + flops = 0 + for blk in self.blocks: + flops += blk.flops() + if self.downsample is not None: + flops += self.downsample.flops() + return flops + + +class PatchEmbed(nn.Module): + """ Image to Patch Embedding + Args: + img_size (int): Image size. Default: 224. + patch_size (int): Patch token size. Default: 4. + in_chans (int): Number of input image channels. Default: 3. + embed_dim (int): Number of linear projection output channels. Default: 96. + norm_layer (nn.Module, optional): Normalization layer. Default: None + """ + + def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None): + super().__init__() + img_size = to_2tuple(img_size) + patch_size = to_2tuple(patch_size) + patches_resolution = [img_size[0] // patch_size[0], img_size[1] // patch_size[1]] + self.img_size = img_size + self.patch_size = patch_size + self.patches_resolution = patches_resolution + self.num_patches = patches_resolution[0] * patches_resolution[1] + + self.in_chans = in_chans + self.embed_dim = embed_dim + + self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) + if norm_layer is not None: + self.norm = norm_layer(embed_dim) + else: + self.norm = None + + def forward(self, x): + B, C, H, W = x.shape + # FIXME look at relaxing size constraints + assert H == self.img_size[0] and W == self.img_size[1], \ + f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})." + x = self.proj(x).flatten(2).transpose(1, 2) # B Ph*Pw C + if self.norm is not None: + x = self.norm(x) + return x + + def flops(self): + Ho, Wo = self.patches_resolution + flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1]) + if self.norm is not None: + flops += Ho * Wo * self.embed_dim + return flops + + +class SwinTransformer(nn.Module): + r""" Swin Transformer + A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - + https://arxiv.org/pdf/2103.14030 + Args: + img_size (int | tuple(int)): Input image size. Default 224 + patch_size (int | tuple(int)): Patch size. Default: 4 + in_chans (int): Number of input image channels. 
Default: 3 + num_classes (int): Number of classes for classification head. Default: 1000 + embed_dim (int): Patch embedding dimension. Default: 96 + depths (tuple(int)): Depth of each Swin Transformer layer. + num_heads (tuple(int)): Number of attention heads in different layers. + window_size (int): Window size. Default: 7 + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4 + qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. Default: None + drop_rate (float): Dropout rate. Default: 0 + attn_drop_rate (float): Attention dropout rate. Default: 0 + drop_path_rate (float): Stochastic depth rate. Default: 0.1 + norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm. + ape (bool): If True, add absolute position embedding to the patch embedding. Default: False + patch_norm (bool): If True, add normalization after patch embedding. Default: True + use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False + """ + + def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, + embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], + window_size=7, mlp_ratio=4., qkv_bias=True, qk_scale=None, + drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1, + norm_layer=nn.LayerNorm, ape=False, patch_norm=True, + use_checkpoint=False, **kwargs): + super().__init__() + + self.num_classes = num_classes + self.num_layers = len(depths) + self.embed_dim = embed_dim + self.ape = ape + self.patch_norm = patch_norm + self.num_features = int(embed_dim * 2 ** (self.num_layers - 1)) + self.mlp_ratio = mlp_ratio + + # split image into non-overlapping patches + self.patch_embed = PatchEmbed( + img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, + norm_layer=norm_layer if self.patch_norm else None) + num_patches = self.patch_embed.num_patches + patches_resolution = self.patch_embed.patches_resolution + self.patches_resolution = patches_resolution + + # absolute position embedding + if self.ape: + self.absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)) + trunc_normal_(self.absolute_pos_embed, std=.02) + + self.pos_drop = nn.Dropout(p=drop_rate) + + # stochastic depth + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule + + # build layers + self.layers = nn.ModuleList() + for i_layer in range(self.num_layers): + layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer), + input_resolution=(patches_resolution[0] // (2 ** i_layer), + patches_resolution[1] // (2 ** i_layer)), + depth=depths[i_layer], + num_heads=num_heads[i_layer], + window_size=window_size, + mlp_ratio=self.mlp_ratio, + qkv_bias=qkv_bias, qk_scale=qk_scale, + drop=drop_rate, attn_drop=attn_drop_rate, + drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])], + norm_layer=norm_layer, + downsample=PatchMerging if (i_layer < self.num_layers - 1) else None, + use_checkpoint=use_checkpoint) + self.layers.append(layer) + + self.norm = norm_layer(self.num_features) + self.avgpool = nn.AdaptiveAvgPool1d(1) + self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity() + + self.apply(self._init_weights) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight, std=.02) + if isinstance(m, nn.Linear) and m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.LayerNorm): + nn.init.constant_(m.bias, 0) + 
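A minimal forward-pass check of the `SwinTransformer` defined here, using the Swin-T sized configuration from `get_swin` (small enough to run on CPU):

```python
import torch

tiny = SwinTransformer(img_size=224, patch_size=4, embed_dim=96,
                       depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24],
                       window_size=7, num_classes=1000)
tiny.eval()
with torch.no_grad():
    logits = tiny(torch.rand(1, 3, 224, 224))
print(logits.shape)                     # torch.Size([1, 1000])
```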
nn.init.constant_(m.weight, 1.0) + + @torch.jit.ignore + def no_weight_decay(self): + return {'absolute_pos_embed'} + + @torch.jit.ignore + def no_weight_decay_keywords(self): + return {'relative_position_bias_table'} + + def forward_features(self, x): + x = self.patch_embed(x) + if self.ape: + x = x + self.absolute_pos_embed + x = self.pos_drop(x) + + for layer in self.layers: + x = layer(x) + + x = self.norm(x) # B L C + x = self.avgpool(x.transpose(1, 2)) # B C 1 + x = torch.flatten(x, 1) + return x + + def forward(self, x): + x = self.forward_features(x) + x = self.head(x) + return x + + def flops(self): + flops = 0 + flops += self.patch_embed.flops() + for i, layer in enumerate(self.layers): + flops += layer.flops() + flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers) + flops += self.num_features * self.num_classes + return flops \ No newline at end of file diff --git a/final-project/model_zoo/swin/swin_transformer_bbn.py b/final-project/model_zoo/swin/swin_transformer_bbn.py new file mode 100644 index 0000000..f875243 --- /dev/null +++ b/final-project/model_zoo/swin/swin_transformer_bbn.py @@ -0,0 +1,211 @@ +from .swin_transformer import * + + +def get_swin_bbn(ckpt): + ''' + According to model type, define model parameters + Reference: https://github.com/microsoft/Swin-Transformer/tree/main/configs + ''' + if ckpt == "./model_zoo/swin/swin_base_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 128 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 4, 8, 16, 32 ] + drop_path_rate = 0.5 + elif ckpt == "./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth": + img_size = 384 + path_size = 4 + window_size = 12 + embed_dim = 192 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 6, 12, 24, 48 ] + drop_path_rate = 0.2 + elif ckpt == "./model_zoo/swin/swin_tiny_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 96 + depths = [ 2, 2, 6, 2 ] + num_heads = [ 3, 6, 12, 24 ] + drop_path_rate = 0.2 + else: + img_size = 384 + path_size = 4 + window_size = 12 + embed_dim = 192 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 6, 12, 24, 48 ] + drop_path_rate = 0.2 + + model = BBN_SwinTransformer(img_size=img_size, + patch_size=path_size, + in_chans=3, + num_classes=1000, + embed_dim=embed_dim, + depths=depths, + num_heads=num_heads, + window_size=window_size, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + drop_path_rate=drop_path_rate, + ape=False, + patch_norm=True, + use_checkpoint=False) + #print(model) + checkpoint = torch.load(ckpt) + model.load_state_dict(checkpoint, strict=False) + + return model + + + +class BBN_SwinTransformer(SwinTransformer): + r""" Swin Transformer + A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - + https://arxiv.org/pdf/2103.14030 + Args: + img_size (int | tuple(int)): Input image size. Default 224 + patch_size (int | tuple(int)): Patch size. Default: 4 + in_chans (int): Number of input image channels. Default: 3 + num_classes (int): Number of classes for classification head. Default: 1000 + embed_dim (int): Patch embedding dimension. Default: 96 + depths (tuple(int)): Depth of each Swin Transformer layer. + num_heads (tuple(int)): Number of attention heads in different layers. + window_size (int): Window size. Default: 7 + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4 + qkv_bias (bool): If True, add a learnable bias to query, key, value. 
Default: True + qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. Default: None + drop_rate (float): Dropout rate. Default: 0 + attn_drop_rate (float): Attention dropout rate. Default: 0 + drop_path_rate (float): Stochastic depth rate. Default: 0.1 + norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm. + ape (bool): If True, add absolute position embedding to the patch embedding. Default: False + patch_norm (bool): If True, add normalization after patch embedding. Default: True + use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False + """ + + def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, + embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], + window_size=7, mlp_ratio=4., qkv_bias=True, qk_scale=None, + drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1, + norm_layer=nn.LayerNorm, ape=False, patch_norm=True, + use_checkpoint=False, **kwargs): + super().__init__() + + self.num_classes = num_classes + self.num_layers = len(depths) + self.embed_dim = embed_dim + self.ape = ape + self.patch_norm = patch_norm + self.num_features = int(embed_dim * 2 ** (self.num_layers - 1)) + self.mlp_ratio = mlp_ratio + + # split image into non-overlapping patches + self.patch_embed = PatchEmbed( + img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, + norm_layer=norm_layer if self.patch_norm else None) + num_patches = self.patch_embed.num_patches + patches_resolution = self.patch_embed.patches_resolution + self.patches_resolution = patches_resolution + + # absolute position embedding + if self.ape: + self.absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)) + trunc_normal_(self.absolute_pos_embed, std=.02) + + self.pos_drop = nn.Dropout(p=drop_rate) + + # stochastic depth + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule + + # build layers + self.layers = nn.ModuleList() + for i_layer in range(self.num_layers): + layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer), + input_resolution=(patches_resolution[0] // (2 ** i_layer), + patches_resolution[1] // (2 ** i_layer)), + depth=depths[i_layer], + num_heads=num_heads[i_layer], + window_size=window_size, + mlp_ratio=self.mlp_ratio, + qkv_bias=qkv_bias, qk_scale=qk_scale, + drop=drop_rate, attn_drop=attn_drop_rate, + drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])], + norm_layer=norm_layer, + downsample=PatchMerging if (i_layer < self.num_layers - 1) else None, + use_checkpoint=use_checkpoint) + self.layers.append(layer) + + self.norm = norm_layer(self.num_features) + self.avgpool = nn.AdaptiveAvgPool1d(1) + self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity() + + + ########################################## + # BNN related layer(Hard core coding.....) 
+ ########################################## + self.fc_block = nn.Linear(self.num_features,2048) + self.fr_block = nn.Linear(self.num_features,2048) + + + self.apply(self._init_weights) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight, std=.02) + if isinstance(m, nn.Linear) and m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.LayerNorm): + nn.init.constant_(m.bias, 0) + nn.init.constant_(m.weight, 1.0) + + @torch.jit.ignore + def no_weight_decay(self): + return {'absolute_pos_embed'} + + @torch.jit.ignore + def no_weight_decay_keywords(self): + return {'relative_position_bias_table'} + + def forward_features(self, x): + x = self.patch_embed(x) + if self.ape: + x = x + self.absolute_pos_embed + x = self.pos_drop(x) + + for layer in self.layers: + x = layer(x) + + x = self.norm(x) # B L C + x = self.avgpool(x.transpose(1, 2)) # B C 1 + x = torch.flatten(x, 1) + return x + + def forward(self, x, **kwargs): + ########################################## + # BNN related layer(Hard core coding.....) + ########################################## + x = self.forward_features(x) + if "feature_cb" in kwargs: + out = self.fc_block(x) + return out + elif "feature_rb" in kwargs: + out = self.fr_block(x) + return out + out1 = self.fc_block(x) + out2 = self.fr_block(x) + out = torch.cat((out1, out2), dim=1) + return out + + def flops(self): + flops = 0 + flops += self.patch_embed.flops() + for i, layer in enumerate(self.layers): + flops += layer.flops() + flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers) + flops += self.num_features * self.num_classes + return flops \ No newline at end of file diff --git a/final-project/model_zoo/swin/swin_transformer_vis.py b/final-project/model_zoo/swin/swin_transformer_vis.py new file mode 100644 index 0000000..cb77a6b --- /dev/null +++ b/final-project/model_zoo/swin/swin_transformer_vis.py @@ -0,0 +1,626 @@ +import torch +import torch.nn as nn +import torch.utils.checkpoint as checkpoint +from timm.models.layers import DropPath, to_2tuple, trunc_normal_ + +def get_swin(ckpt): + ''' + According to model type, define model parameters + Reference: https://github.com/microsoft/Swin-Transformer/tree/main/configs + ''' + if ckpt == "./model_zoo/swin/swin_base_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 128 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 4, 8, 16, 32 ] + drop_path_rate = 0.5 + elif ckpt == "./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth": + img_size = 384 + path_size = 4 + window_size = 12 + embed_dim = 192 + depths = [ 2, 2, 18, 2 ] + num_heads = [ 6, 12, 24, 48 ] + drop_path_rate = 0.2 + elif ckpt == "./model_zoo/swin/swin_tiny_patch4_window7_224.pth": + img_size = 224 + path_size = 4 + window_size = 7 + embed_dim = 96 + depths = [ 2, 2, 6, 2 ] + num_heads = [ 3, 6, 12, 24 ] + drop_path_rate = 0.2 + + + model = SwinTransformer(img_size=img_size, + patch_size=path_size, + in_chans=3, + num_classes=1000, + embed_dim=embed_dim, + depths=depths, + num_heads=num_heads, + window_size=window_size, + mlp_ratio=4., + qkv_bias=True, + qk_scale=None, + drop_rate=0.0, + drop_path_rate=drop_path_rate, + ape=False, + patch_norm=True, + use_checkpoint=False) + checkpoint = torch.load(ckpt) + model.load_state_dict(checkpoint['model'], strict=False) + + return model + + +class Mlp(nn.Module): + def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.): + 
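The BBN forward above exposes the two branches through keyword flags: passing `feature_cb` or `feature_rb` returns a single 2048-d branch feature (presumably the conventional and re-balancing branches of BBN), otherwise both are concatenated into a 4096-d vector; the classifier over that vector lives outside this file. A sketch, assuming the Swin-L checkpoint is available locally:

```python
import torch

model = get_swin_bbn("./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth")
model.eval()
x = torch.rand(1, 3, 384, 384)
with torch.no_grad():
    f_cb = model(x, feature_cb=True)    # single-branch feature, shape (1, 2048)
    f_rb = model(x, feature_rb=True)    # single-branch feature, shape (1, 2048)
    both = model(x)                     # concatenated branches, shape (1, 4096)
print(f_cb.shape, f_rb.shape, both.shape)
```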
super().__init__() + out_features = out_features or in_features + hidden_features = hidden_features or in_features + self.fc1 = nn.Linear(in_features, hidden_features) + self.act = act_layer() + self.fc2 = nn.Linear(hidden_features, out_features) + self.drop = nn.Dropout(drop) + + def forward(self, x): + x = self.fc1(x) + x = self.act(x) + x = self.drop(x) + x = self.fc2(x) + x = self.drop(x) + return x + + +def window_partition(x, window_size): + """ + Args: + x: (B, H, W, C) + window_size (int): window size + Returns: + windows: (num_windows*B, window_size, window_size, C) + """ + B, H, W, C = x.shape + x = x.view(B, H // window_size, window_size, W // window_size, window_size, C) + windows = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(-1, window_size, window_size, C) + return windows + + +def window_reverse(windows, window_size, H, W): + """ + Args: + windows: (num_windows*B, window_size, window_size, C) + window_size (int): Window size + H (int): Height of image + W (int): Width of image + Returns: + x: (B, H, W, C) + """ + B = int(windows.shape[0] / (H * W / window_size / window_size)) + x = windows.view(B, H // window_size, W // window_size, window_size, window_size, -1) + x = x.permute(0, 1, 3, 2, 4, 5).contiguous().view(B, H, W, -1) + return x + + +class WindowAttention(nn.Module): + r""" Window based multi-head self attention (W-MSA) module with relative position bias. + It supports both of shifted and non-shifted window. + Args: + dim (int): Number of input channels. + window_size (tuple[int]): The height and width of the window. + num_heads (int): Number of attention heads. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set + attn_drop (float, optional): Dropout ratio of attention weight. Default: 0.0 + proj_drop (float, optional): Dropout ratio of output. 
Default: 0.0 + """ + + def __init__(self, dim, window_size, num_heads, qkv_bias=True, qk_scale=None, attn_drop=0., proj_drop=0.): + + super().__init__() + self.dim = dim + self.window_size = window_size # Wh, Ww + self.num_heads = num_heads + head_dim = dim // num_heads + self.scale = qk_scale or head_dim ** -0.5 + + # define a parameter table of relative position bias + self.relative_position_bias_table = nn.Parameter( + torch.zeros((2 * window_size[0] - 1) * (2 * window_size[1] - 1), num_heads)) # 2*Wh-1 * 2*Ww-1, nH + + # get pair-wise relative position index for each token inside the window + coords_h = torch.arange(self.window_size[0]) + coords_w = torch.arange(self.window_size[1]) + coords = torch.stack(torch.meshgrid([coords_h, coords_w])) # 2, Wh, Ww + coords_flatten = torch.flatten(coords, 1) # 2, Wh*Ww + relative_coords = coords_flatten[:, :, None] - coords_flatten[:, None, :] # 2, Wh*Ww, Wh*Ww + relative_coords = relative_coords.permute(1, 2, 0).contiguous() # Wh*Ww, Wh*Ww, 2 + relative_coords[:, :, 0] += self.window_size[0] - 1 # shift to start from 0 + relative_coords[:, :, 1] += self.window_size[1] - 1 + relative_coords[:, :, 0] *= 2 * self.window_size[1] - 1 + relative_position_index = relative_coords.sum(-1) # Wh*Ww, Wh*Ww + self.register_buffer("relative_position_index", relative_position_index) + + self.qkv = nn.Linear(dim, dim * 3, bias=qkv_bias) + self.attn_drop = nn.Dropout(attn_drop) + self.proj = nn.Linear(dim, dim) + self.proj_drop = nn.Dropout(proj_drop) + + trunc_normal_(self.relative_position_bias_table, std=.02) + self.softmax = nn.Softmax(dim=-1) + + def forward(self, x, mask=None): + """ + Args: + x: input features with shape of (num_windows*B, N, C) + mask: (0/-inf) mask with shape of (num_windows, Wh*Ww, Wh*Ww) or None + """ + B_, N, C = x.shape + qkv = self.qkv(x).reshape(B_, N, 3, self.num_heads, C // self.num_heads).permute(2, 0, 3, 1, 4) + q, k, v = qkv[0], qkv[1], qkv[2] # make torchscript happy (cannot use tensor as tuple) + + q = q * self.scale + attn = (q @ k.transpose(-2, -1)) + attn_copy = attn + + relative_position_bias = self.relative_position_bias_table[self.relative_position_index.view(-1)].view( + self.window_size[0] * self.window_size[1], self.window_size[0] * self.window_size[1], -1) # Wh*Ww,Wh*Ww,nH + relative_position_bias = relative_position_bias.permute(2, 0, 1).contiguous() # nH, Wh*Ww, Wh*Ww + attn = attn + relative_position_bias.unsqueeze(0) + + if mask is not None: + nW = mask.shape[0] + attn = attn.view(B_ // nW, nW, self.num_heads, N, N) + mask.unsqueeze(1).unsqueeze(0) + attn = attn.view(-1, self.num_heads, N, N) + attn = self.softmax(attn) + else: + attn = self.softmax(attn) + + attn = self.attn_drop(attn) + + x = (attn @ v).transpose(1, 2).reshape(B_, N, C) + x = self.proj(x) + x = self.proj_drop(x) + return x, attn_copy + + def extra_repr(self) -> str: + return f'dim={self.dim}, window_size={self.window_size}, num_heads={self.num_heads}' + + def flops(self, N): + # calculate flops for 1 window with token length of N + flops = 0 + # qkv = self.qkv(x) + flops += N * self.dim * 3 * self.dim + # attn = (q @ k.transpose(-2, -1)) + flops += self.num_heads * N * (self.dim // self.num_heads) * N + # x = (attn @ v) + flops += self.num_heads * N * N * (self.dim // self.num_heads) + # x = self.proj(x) + flops += N * self.dim * self.dim + return flops + + +class SwinTransformerBlock(nn.Module): + r""" Swin Transformer Block. + Args: + dim (int): Number of input channels. + input_resolution (tuple[int]): Input fSwilotion. 
+ num_heads (int): Number of attention heads. + window_size (int): Window size. + shift_size (int): Shift size for SW-MSA. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float, optional): Stochastic depth rate. Default: 0.0 + act_layer (nn.Module, optional): Activation layer. Default: nn.GELU + norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm + """ + + def __init__(self, dim, input_resolution, num_heads, window_size=7, shift_size=0, + mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., drop_path=0., + act_layer=nn.GELU, norm_layer=nn.LayerNorm): + super().__init__() + self.dim = dim + self.input_resolution = input_resolution + self.num_heads = num_heads + self.window_size = window_size + self.shift_size = shift_size + self.mlp_ratio = mlp_ratio + if min(self.input_resolution) <= self.window_size: + # if window size is larger than input resolution, we don't partition windows + self.shift_size = 0 + self.window_size = min(self.input_resolution) + assert 0 <= self.shift_size < self.window_size, "shift_size must in 0-window_size" + + self.norm1 = norm_layer(dim) + self.attn = WindowAttention( + dim, window_size=to_2tuple(self.window_size), num_heads=num_heads, + qkv_bias=qkv_bias, qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop) + + self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity() + self.norm2 = norm_layer(dim) + mlp_hidden_dim = int(dim * mlp_ratio) + self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop) + + if self.shift_size > 0: + # calculate attention mask for SW-MSA + H, W = self.input_resolution + img_mask = torch.zeros((1, H, W, 1)) # 1 H W 1 + h_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + w_slices = (slice(0, -self.window_size), + slice(-self.window_size, -self.shift_size), + slice(-self.shift_size, None)) + cnt = 0 + for h in h_slices: + for w in w_slices: + img_mask[:, h, w, :] = cnt + cnt += 1 + + mask_windows = window_partition(img_mask, self.window_size) # nW, window_size, window_size, 1 + mask_windows = mask_windows.view(-1, self.window_size * self.window_size) + attn_mask = mask_windows.unsqueeze(1) - mask_windows.unsqueeze(2) + attn_mask = attn_mask.masked_fill(attn_mask != 0, float(-100.0)).masked_fill(attn_mask == 0, float(0.0)) + else: + attn_mask = None + + self.register_buffer("attn_mask", attn_mask) + + def forward(self, x): + H, W = self.input_resolution + B, L, C = x.shape + assert L == H * W, "input feature has wrong size" + + shortcut = x + x = self.norm1(x) + x = x.view(B, H, W, C) + + # cyclic shift + if self.shift_size > 0: + shifted_x = torch.roll(x, shifts=(-self.shift_size, -self.shift_size), dims=(1, 2)) + else: + shifted_x = x + + # partition windows + x_windows = window_partition(shifted_x, self.window_size) # nW*B, window_size, window_size, C + x_windows = x_windows.view(-1, self.window_size * self.window_size, C) # nW*B, window_size*window_size, C + + # W-MSA/SW-MSA + attn_windows, attn = self.attn(x_windows, mask=self.attn_mask) # nW*B, window_size*window_size, C + + # merge windows + attn_windows = attn_windows.view(-1, 
self.window_size, self.window_size, C) + shifted_x = window_reverse(attn_windows, self.window_size, H, W) # B H' W' C + + # reverse cyclic shift + if self.shift_size > 0: + x = torch.roll(shifted_x, shifts=(self.shift_size, self.shift_size), dims=(1, 2)) + else: + x = shifted_x + x = x.view(B, H * W, C) + + # FFN + x = shortcut + self.drop_path(x) + x = x + self.drop_path(self.mlp(self.norm2(x))) + + return x, attn + + def extra_repr(self) -> str: + return f"dim={self.dim}, input_resolution={self.input_resolution}, num_heads={self.num_heads}, " \ + f"window_size={self.window_size}, shift_size={self.shift_size}, mlp_ratio={self.mlp_ratio}" + + def flops(self): + flops = 0 + H, W = self.input_resolution + # norm1 + flops += self.dim * H * W + # W-MSA/SW-MSA + nW = H * W / self.window_size / self.window_size + flops += nW * self.attn.flops(self.window_size * self.window_size) + # mlp + flops += 2 * H * W * self.dim * self.dim * self.mlp_ratio + # norm2 + flops += self.dim * H * W + return flops + + +class PatchMerging(nn.Module): + r""" Patch Merging Layer. + Args: + input_resolution (tuple[int]): Resolution of input feature. + dim (int): Number of input channels. + norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm + """ + + def __init__(self, input_resolution, dim, norm_layer=nn.LayerNorm): + super().__init__() + self.input_resolution = input_resolution + self.dim = dim + self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False) + self.norm = norm_layer(4 * dim) + + def forward(self, x): + """ + x: B, H*W, C + """ + H, W = self.input_resolution + B, L, C = x.shape + assert L == H * W, "input feature has wrong size" + assert H % 2 == 0 and W % 2 == 0, f"x size ({H}*{W}) are not even." + + x = x.view(B, H, W, C) + + x0 = x[:, 0::2, 0::2, :] # B H/2 W/2 C + x1 = x[:, 1::2, 0::2, :] # B H/2 W/2 C + x2 = x[:, 0::2, 1::2, :] # B H/2 W/2 C + x3 = x[:, 1::2, 1::2, :] # B H/2 W/2 C + x = torch.cat([x0, x1, x2, x3], -1) # B H/2 W/2 4*C + x = x.view(B, -1, 4 * C) # B H/2*W/2 4*C + + x = self.norm(x) + x = self.reduction(x) + + return x + + def extra_repr(self) -> str: + return f"input_resolution={self.input_resolution}, dim={self.dim}" + + def flops(self): + H, W = self.input_resolution + flops = H * W * self.dim + flops += (H // 2) * (W // 2) * 4 * self.dim * 2 * self.dim + return flops + + +class BasicLayer(nn.Module): + """ A basic Swin Transformer layer for one stage. + Args: + dim (int): Number of input channels. + input_resolution (tuple[int]): Input resolution. + depth (int): Number of blocks. + num_heads (int): Number of attention heads. + window_size (int): Local window size. + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. + qkv_bias (bool, optional): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float | None, optional): Override default qk scale of head_dim ** -0.5 if set. + drop (float, optional): Dropout rate. Default: 0.0 + attn_drop (float, optional): Attention dropout rate. Default: 0.0 + drop_path (float | tuple[float], optional): Stochastic depth rate. Default: 0.0 + norm_layer (nn.Module, optional): Normalization layer. Default: nn.LayerNorm + downsample (nn.Module | None, optional): Downsample layer at the end of the layer. Default: None + use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False. 
+ """ + + def __init__(self, dim, input_resolution, depth, num_heads, window_size, + mlp_ratio=4., qkv_bias=True, qk_scale=None, drop=0., attn_drop=0., + drop_path=0., norm_layer=nn.LayerNorm, downsample=None, use_checkpoint=False): + + super().__init__() + self.dim = dim + self.input_resolution = input_resolution + self.depth = depth + self.use_checkpoint = use_checkpoint + + # build blocks + self.blocks = nn.ModuleList([ + SwinTransformerBlock(dim=dim, input_resolution=input_resolution, + num_heads=num_heads, window_size=window_size, + shift_size=0 if (i % 2 == 0) else window_size // 2, + mlp_ratio=mlp_ratio, + qkv_bias=qkv_bias, qk_scale=qk_scale, + drop=drop, attn_drop=attn_drop, + drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path, + norm_layer=norm_layer) + for i in range(depth)]) + + # patch merging layer + if downsample is not None: + self.downsample = downsample(input_resolution, dim=dim, norm_layer=norm_layer) + else: + self.downsample = None + + def forward(self, x): + for blk in self.blocks: + if self.use_checkpoint: + x, attn = checkpoint.checkpoint(blk, x) + else: + x, attn = blk(x) + if self.downsample is not None: + x = self.downsample(x) + return x, attn + + def extra_repr(self) -> str: + return f"dim={self.dim}, input_resolution={self.input_resolution}, depth={self.depth}" + + def flops(self): + flops = 0 + for blk in self.blocks: + flops += blk.flops() + if self.downsample is not None: + flops += self.downsample.flops() + return flops + + +class PatchEmbed(nn.Module): + """ Image to Patch Embedding + Args: + img_size (int): Image size. Default: 224. + patch_size (int): Patch token size. Default: 4. + in_chans (int): Number of input image channels. Default: 3. + embed_dim (int): Number of linear projection output channels. Default: 96. + norm_layer (nn.Module, optional): Normalization layer. Default: None + """ + + def __init__(self, img_size=224, patch_size=4, in_chans=3, embed_dim=96, norm_layer=None): + super().__init__() + img_size = to_2tuple(img_size) + patch_size = to_2tuple(patch_size) + patches_resolution = [img_size[0] // patch_size[0], img_size[1] // patch_size[1]] + self.img_size = img_size + self.patch_size = patch_size + self.patches_resolution = patches_resolution + self.num_patches = patches_resolution[0] * patches_resolution[1] + + self.in_chans = in_chans + self.embed_dim = embed_dim + + self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size) + if norm_layer is not None: + self.norm = norm_layer(embed_dim) + else: + self.norm = None + + def forward(self, x): + B, C, H, W = x.shape + # FIXME look at relaxing size constraints + assert H == self.img_size[0] and W == self.img_size[1], \ + f"Input image size ({H}*{W}) doesn't match model ({self.img_size[0]}*{self.img_size[1]})." + x = self.proj(x).flatten(2).transpose(1, 2) # B Ph*Pw C + if self.norm is not None: + x = self.norm(x) + return x + + def flops(self): + Ho, Wo = self.patches_resolution + flops = Ho * Wo * self.embed_dim * self.in_chans * (self.patch_size[0] * self.patch_size[1]) + if self.norm is not None: + flops += Ho * Wo * self.embed_dim + return flops + + +class SwinTransformer(nn.Module): + r""" Swin Transformer + A PyTorch impl of : `Swin Transformer: Hierarchical Vision Transformer using Shifted Windows` - + https://arxiv.org/pdf/2103.14030 + Args: + img_size (int | tuple(int)): Input image size. Default 224 + patch_size (int | tuple(int)): Patch size. Default: 4 + in_chans (int): Number of input image channels. 
Default: 3 + num_classes (int): Number of classes for classification head. Default: 1000 + embed_dim (int): Patch embedding dimension. Default: 96 + depths (tuple(int)): Depth of each Swin Transformer layer. + num_heads (tuple(int)): Number of attention heads in different layers. + window_size (int): Window size. Default: 7 + mlp_ratio (float): Ratio of mlp hidden dim to embedding dim. Default: 4 + qkv_bias (bool): If True, add a learnable bias to query, key, value. Default: True + qk_scale (float): Override default qk scale of head_dim ** -0.5 if set. Default: None + drop_rate (float): Dropout rate. Default: 0 + attn_drop_rate (float): Attention dropout rate. Default: 0 + drop_path_rate (float): Stochastic depth rate. Default: 0.1 + norm_layer (nn.Module): Normalization layer. Default: nn.LayerNorm. + ape (bool): If True, add absolute position embedding to the patch embedding. Default: False + patch_norm (bool): If True, add normalization after patch embedding. Default: True + use_checkpoint (bool): Whether to use checkpointing to save memory. Default: False + """ + + def __init__(self, img_size=224, patch_size=4, in_chans=3, num_classes=1000, + embed_dim=96, depths=[2, 2, 6, 2], num_heads=[3, 6, 12, 24], + window_size=7, mlp_ratio=4., qkv_bias=True, qk_scale=None, + drop_rate=0., attn_drop_rate=0., drop_path_rate=0.1, + norm_layer=nn.LayerNorm, ape=False, patch_norm=True, + use_checkpoint=False, **kwargs): + super().__init__() + + self.num_classes = num_classes + self.num_layers = len(depths) + self.embed_dim = embed_dim + self.ape = ape + self.patch_norm = patch_norm + self.num_features = int(embed_dim * 2 ** (self.num_layers - 1)) + self.mlp_ratio = mlp_ratio + + # split image into non-overlapping patches + self.patch_embed = PatchEmbed( + img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim, + norm_layer=norm_layer if self.patch_norm else None) + num_patches = self.patch_embed.num_patches + patches_resolution = self.patch_embed.patches_resolution + self.patches_resolution = patches_resolution + + # absolute position embedding + if self.ape: + self.absolute_pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim)) + trunc_normal_(self.absolute_pos_embed, std=.02) + + self.pos_drop = nn.Dropout(p=drop_rate) + + # stochastic depth + dpr = [x.item() for x in torch.linspace(0, drop_path_rate, sum(depths))] # stochastic depth decay rule + + # build layers + self.layers = nn.ModuleList() + for i_layer in range(self.num_layers): + layer = BasicLayer(dim=int(embed_dim * 2 ** i_layer), + input_resolution=(patches_resolution[0] // (2 ** i_layer), + patches_resolution[1] // (2 ** i_layer)), + depth=depths[i_layer], + num_heads=num_heads[i_layer], + window_size=window_size, + mlp_ratio=self.mlp_ratio, + qkv_bias=qkv_bias, qk_scale=qk_scale, + drop=drop_rate, attn_drop=attn_drop_rate, + drop_path=dpr[sum(depths[:i_layer]):sum(depths[:i_layer + 1])], + norm_layer=norm_layer, + downsample=PatchMerging if (i_layer < self.num_layers - 1) else None, + use_checkpoint=use_checkpoint) + self.layers.append(layer) + + self.norm = norm_layer(self.num_features) + self.avgpool = nn.AdaptiveAvgPool1d(1) + self.head = nn.Linear(self.num_features, num_classes) if num_classes > 0 else nn.Identity() + + self.apply(self._init_weights) + + def _init_weights(self, m): + if isinstance(m, nn.Linear): + trunc_normal_(m.weight, std=.02) + if isinstance(m, nn.Linear) and m.bias is not None: + nn.init.constant_(m.bias, 0) + elif isinstance(m, nn.LayerNorm): + nn.init.constant_(m.bias, 0) + 
nn.init.constant_(m.weight, 1.0) + + @torch.jit.ignore + def no_weight_decay(self): + return {'absolute_pos_embed'} + + @torch.jit.ignore + def no_weight_decay_keywords(self): + return {'relative_position_bias_table'} + + def forward_features(self, x): + x = self.patch_embed(x) + if self.ape: + x = x + self.absolute_pos_embed + x = self.pos_drop(x) + attns = [] + for layer in self.layers: + x, attn = layer(x) + attns.append(attn) + # for attn in attns: + # print(attn.shape) + + x = self.norm(x) # B L C + x = self.avgpool(x.transpose(1, 2)) # B C 1 + x = torch.flatten(x, 1) + return x, attns[-1] + + def forward(self, x): + x, attn = self.forward_features(x) + x = self.head(x) + return x, attn + + def flops(self): + flops = 0 + flops += self.patch_embed.flops() + for i, layer in enumerate(self.layers): + flops += layer.flops() + flops += self.num_features * self.patches_resolution[0] * self.patches_resolution[1] // (2 ** self.num_layers) + flops += self.num_features * self.num_classes + return flops \ No newline at end of file diff --git a/final-project/model_zoo/ttach/Dockerfile.dev b/final-project/model_zoo/ttach/Dockerfile.dev new file mode 100644 index 0000000..910e272 --- /dev/null +++ b/final-project/model_zoo/ttach/Dockerfile.dev @@ -0,0 +1,11 @@ +FROM anibali/pytorch:no-cuda + +# install requirements +RUN pip install pytest + +# copy project +COPY . /project +WORKDIR /project + +# install project +RUN pip install . diff --git a/final-project/model_zoo/ttach/LICENSE b/final-project/model_zoo/ttach/LICENSE new file mode 100644 index 0000000..8f14f4e --- /dev/null +++ b/final-project/model_zoo/ttach/LICENSE @@ -0,0 +1,21 @@ +The MIT License + +Copyright (c) 2019, Pavel Yakubovskiy + +Permission is hereby granted, free of charge, to any person obtaining a copy +of this software and associated documentation files (the "Software"), to deal +in the Software without restriction, including without limitation the rights +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the Software is +furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in +all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. \ No newline at end of file diff --git a/final-project/model_zoo/ttach/MANIFEST.in b/final-project/model_zoo/ttach/MANIFEST.in new file mode 100644 index 0000000..d18a98d --- /dev/null +++ b/final-project/model_zoo/ttach/MANIFEST.in @@ -0,0 +1 @@ +include README.md LICENSE requirements.txt \ No newline at end of file diff --git a/final-project/model_zoo/ttach/README.md b/final-project/model_zoo/ttach/README.md new file mode 100644 index 0000000..2753901 --- /dev/null +++ b/final-project/model_zoo/ttach/README.md @@ -0,0 +1,125 @@ +# TTAch +Image Test Time Augmentation with PyTorch! + +Similar to what Data Augmentation is doing to the training set, the purpose of Test Time Augmentation is to perform random modifications to the test images. 
Thus, instead of showing the regular, “clean” images, only once to the trained model, we will show it the augmented images several times. We will then average the predictions of each corresponding image and take that as our final guess [[1](https://towardsdatascience.com/test-time-augmentation-tta-and-how-to-perform-it-with-keras-4ac19b67fb4d)]. +``` + Input + | # input batch of images + / / /|\ \ \ # apply augmentations (flips, rotation, scale, etc.) + | | | | | | | # pass augmented batches through model + | | | | | | | # reverse transformations for each batch of masks/labels + \ \ \ / / / # merge predictions (mean, max, gmean, etc.) + | # output batch of masks/labels + Output +``` +## Table of Contents +1. [Quick Start](#quick-start) +2. [Transforms](#transforms) +3. [Aliases](#aliases) +4. [Merge modes](#merge-modes) +5. [Installation](#installation) + +## Quick start + +##### Segmentation model wrapping [[docstring](ttach/wrappers.py#L8)]: +```python +import ttach as tta +tta_model = tta.SegmentationTTAWrapper(model, tta.aliases.d4_transform(), merge_mode='mean') +``` +##### Classification model wrapping [[docstring](ttach/wrappers.py#L52)]: +```python +tta_model = tta.ClassificationTTAWrapper(model, tta.aliases.five_crop_transform()) +``` + +##### Keypoints model wrapping [[docstring](ttach/wrappers.py#L96)]: +```python +tta_model = tta.KeypointsTTAWrapper(model, tta.aliases.flip_transform(), scaled=True) +``` +**Note**: the model must return keypoints in the format `torch([x1, y1, ..., xn, yn])` + +## Advanced Examples +##### Custom transform: +```python +# defined 2 * 2 * 3 * 3 = 36 augmentations ! +transforms = tta.Compose( + [ + tta.HorizontalFlip(), + tta.Rotate90(angles=[0, 180]), + tta.Scale(scales=[1, 2, 4]), + tta.Multiply(factors=[0.9, 1, 1.1]), + ] +) + +tta_model = tta.SegmentationTTAWrapper(model, transforms) +``` +##### Custom model (multi-input / multi-output) +```python +# Example how to process ONE batch on images with TTA +# Here `image`/`mask` are 4D tensors (B, C, H, W), `label` is 2D tensor (B, N) + +for transformer in transforms: # custom transforms or e.g. tta.aliases.d4_transform() + + # augment image + augmented_image = transformer.augment_image(image) + + # pass to model + model_output = model(augmented_image, another_input_data) + + # reverse augmentation for mask and label + deaug_mask = transformer.deaugment_mask(model_output['mask']) + deaug_label = transformer.deaugment_label(model_output['label']) + + # save results + labels.append(deaug_mask) + masks.append(deaug_label) + +# reduce results as you want, e.g mean/max/min +label = mean(labels) +mask = mean(masks) +``` + +## Transforms + +| Transform | Parameters | Values | +|----------------|:-------------------------:|:---------------------------------:| +| HorizontalFlip | - | - | +| VerticalFlip | - | - | +| Rotate90 | angles | List\[0, 90, 180, 270] | +| Scale | scales
interpolation | List\[float]<br>"nearest"/"linear"|
+| Resize | sizes<br>original_size<br>interpolation | List\[Tuple\[int, int]]<br>Tuple\[int,int]<br>"nearest"/"linear"|
+| Add | values | List\[float] |
+| Multiply | factors | List\[float] |
+| FiveCrops | crop_height<br>crop_width | int<br>
int | + +## Aliases + + - flip_transform (horizontal + vertical flips) + - hflip_transform (horizontal flip) + - d4_transform (flips + rotation 0, 90, 180, 270) + - multiscale_transform (scale transform, take scales as input parameter) + - five_crop_transform (corner crops + center crop) + - ten_crop_transform (five crops + five crops on horizontal flip) + +## Merge modes + - mean + - gmean (geometric mean) + - sum + - max + - min + - tsharpen ([temperature sharpen](https://www.kaggle.com/c/severstal-steel-defect-detection/discussion/107716#latest-624046) with t=0.5) + +## Installation +PyPI: +```bash +$ pip install ttach +``` +Source: +```bash +$ pip install git+https://github.com/qubvel/ttach +``` + +## Run tests + +```bash +docker build -f Dockerfile.dev -t ttach:dev . && docker run --rm ttach:dev pytest -p no:cacheprovider +``` diff --git a/final-project/model_zoo/ttach/__init__.py b/final-project/model_zoo/ttach/__init__.py new file mode 100644 index 0000000..f05f4f5 --- /dev/null +++ b/final-project/model_zoo/ttach/__init__.py @@ -0,0 +1 @@ +from .ttach import * diff --git a/final-project/model_zoo/ttach/requirements.txt b/final-project/model_zoo/ttach/requirements.txt new file mode 100644 index 0000000..e69de29 diff --git a/final-project/model_zoo/ttach/setup.py b/final-project/model_zoo/ttach/setup.py new file mode 100644 index 0000000..d2e953c --- /dev/null +++ b/final-project/model_zoo/ttach/setup.py @@ -0,0 +1,131 @@ +#!/usr/bin/env python +# -*- coding: utf-8 -*- + +# Note: To use the 'upload' functionality of this file, you must: +# $ pip install twine + +import io +import os +import sys +from shutil import rmtree + +from setuptools import find_packages, setup, Command + +# Package meta-data. +NAME = 'ttach' +DESCRIPTION = 'Images test time augmentation with PyTorch.' +URL = 'https://github.com/qubvel/ttach' +EMAIL = 'qubvel@gmail.com' +AUTHOR = 'Pavel Yakubovskiy' +REQUIRES_PYTHON = '>=3.0.0' +VERSION = None + +# The rest you shouldn't have to touch too much :) +# ------------------------------------------------ +# Except, perhaps the License and Trove Classifiers! +# If you do change the License, remember to change the Trove Classifier for that! + +here = os.path.abspath(os.path.dirname(__file__)) + +# What packages are required for this module to be executed? +try: + with open(os.path.join(here, 'requirements.txt'), encoding='utf-8') as f: + REQUIRED = f.read().split('\n') +except: + REQUIRED = [] + +# What packages are optional? +EXTRAS = { + 'test': ['pytest'] +} + +# Import the README and use it as the long-description. +# Note: this will only work if 'README.md' is present in your MANIFEST.in file! +try: + with io.open(os.path.join(here, 'README.md'), encoding='utf-8') as f: + long_description = '\n' + f.read() +except FileNotFoundError: + long_description = DESCRIPTION + +# Load the package's __version__.py module as a dictionary. +about = {} +if not VERSION: + with open(os.path.join(here, NAME, '__version__.py')) as f: + exec(f.read(), about) +else: + about['__version__'] = VERSION + + +class UploadCommand(Command): + """Support setup.py upload.""" + + description = 'Build and publish the package.' 
+ user_options = [] + + @staticmethod + def status(s): + """Prints things in bold.""" + print(s) + + def initialize_options(self): + pass + + def finalize_options(self): + pass + + def run(self): + try: + self.status('Removing previous builds...') + rmtree(os.path.join(here, 'dist')) + except OSError: + pass + + self.status('Building Source and Wheel (universal) distribution...') + os.system('{0} setup.py sdist bdist_wheel --universal'.format(sys.executable)) + + self.status('Uploading the package to PyPI via Twine...') + os.system('twine upload dist/*') + + self.status('Pushing git tags...') + os.system('git tag v{0}'.format(about['__version__'])) + os.system('git push --tags') + + sys.exit() + + +# Where the magic happens: +setup( + name=NAME, + version=about['__version__'], + description=DESCRIPTION, + long_description=long_description, + long_description_content_type='text/markdown', + author=AUTHOR, + author_email=EMAIL, + python_requires=REQUIRES_PYTHON, + url=URL, + packages=find_packages(exclude=('tests', 'docs', 'images')), + # If your package is a single module, use this instead of 'packages': + # py_modules=['mypackage'], + + # entry_points={ + # 'console_scripts': ['mycli=mymodule:cli'], + # }, + install_requires=REQUIRED, + extras_require=EXTRAS, + include_package_data=True, + license='MIT', + classifiers=[ + # Trove classifiers + # Full list: https://pypi.python.org/pypi?%3Aaction=list_classifiers + 'License :: OSI Approved :: MIT License', + 'Programming Language :: Python', + 'Programming Language :: Python :: 3', + 'Programming Language :: Python :: Implementation :: CPython', + 'Programming Language :: Python :: Implementation :: PyPy' + ], + # $ setup.py publish support. + cmdclass={ + 'upload': UploadCommand, + }, +) diff --git a/final-project/model_zoo/ttach/tests/test_base.py b/final-project/model_zoo/ttach/tests/test_base.py new file mode 100644 index 0000000..4646dac --- /dev/null +++ b/final-project/model_zoo/ttach/tests/test_base.py @@ -0,0 +1,48 @@ +import pytest +import torch +import ttach as tta + + +def test_compose_1(): + transform = tta.Compose( + [ + tta.HorizontalFlip(), + tta.VerticalFlip(), + tta.Rotate90(angles=[0, 90, 180, 270]), + tta.Scale(scales=[1, 2, 4], interpolation="nearest"), + ] + ) + + assert len(transform) == 2 * 2 * 4 * 3 # all combinations for aug parameters + + dummy_label = torch.ones(2).reshape(2, 1).float() + dummy_image = torch.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5).float() + dummy_model = lambda x: {"label": dummy_label, "mask": x} + + for augmenter in transform: + augmented_image = augmenter.augment_image(dummy_image) + model_output = dummy_model(augmented_image) + deaugmented_mask = augmenter.deaugment_mask(model_output["mask"]) + deaugmented_label = augmenter.deaugment_label(model_output["label"]) + assert torch.allclose(deaugmented_mask, dummy_image) + assert torch.allclose(deaugmented_label, dummy_label) + + +@pytest.mark.parametrize( + "case", + [ + ("mean", 0.5), + ("gmean", 0.0), + ("max", 1.0), + ("min", 0.0), + ("sum", 1.5), + ("tsharpen", 0.56903558), + ], +) +def test_merger(case): + merge_type, output = case + input = [1.0, 0.0, 0.5] + merger = tta.base.Merger(type=merge_type, n=len(input)) + for i in input: + merger.append(torch.tensor(i)) + assert torch.allclose(merger.result, torch.tensor(output)) diff --git a/final-project/model_zoo/ttach/tests/test_transforms.py b/final-project/model_zoo/ttach/tests/test_transforms.py new file mode 100644 index 0000000..4848791 --- /dev/null +++ 
b/final-project/model_zoo/ttach/tests/test_transforms.py @@ -0,0 +1,104 @@ +import pytest +import torch +import ttach as tta + + +@pytest.mark.parametrize( + "transform", + [ + tta.HorizontalFlip(), + tta.VerticalFlip(), + tta.Rotate90(angles=[0, 90, 180, 270]), + tta.Scale(scales=[1, 2, 4], interpolation="nearest"), + tta.Resize(sizes=[(4, 5), (8, 10)], original_size=(4, 5), interpolation="nearest") + ], +) +def test_aug_deaug_mask(transform): + a = torch.arange(20).reshape(1, 1, 4, 5).float() + for p in transform.params: + aug = transform.apply_aug_image(a, **{transform.pname: p}) + deaug = transform.apply_deaug_mask(aug, **{transform.pname: p}) + assert torch.allclose(a, deaug) + + +@pytest.mark.parametrize( + "transform", + [ + tta.HorizontalFlip(), + tta.VerticalFlip(), + tta.Rotate90(angles=[0, 90, 180, 270]), + tta.Scale(scales=[1, 2, 4], interpolation="nearest"), + tta.Add(values=[-1, 0, 1, 2]), + tta.Multiply(factors=[-1, 0, 1, 2]), + tta.FiveCrops(crop_height=3, crop_width=5), + tta.Resize(sizes=[(4, 5), (8, 10), (2, 2)], interpolation="nearest") + ], +) +def test_label_is_same(transform): + a = torch.arange(20).reshape(1, 1, 4, 5).float() + for p in transform.params: + aug = transform.apply_aug_image(a, **{transform.pname: p}) + deaug = transform.apply_deaug_label(aug, **{transform.pname: p}) + assert torch.allclose(aug, deaug) + + +@pytest.mark.parametrize( + "transform", + [ + tta.HorizontalFlip(), + tta.VerticalFlip() + ], +) +def test_flip_keypoints(transform): + keypoints = torch.tensor([[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9], [0.4, 0.3]]) + for p in transform.params: + aug = transform.apply_deaug_keypoints(keypoints.detach().clone(), **{transform.pname: p}) + deaug = transform.apply_deaug_keypoints(aug, **{transform.pname: p}) + assert torch.allclose(keypoints, deaug) + + +@pytest.mark.parametrize( + "transform", + [ + tta.Rotate90(angles=[0, 90, 180, 270]) + ], +) +def test_rotate90_keypoints(transform): + keypoints = torch.tensor([[0.1, 0.1], [0.1, 0.9], [0.9, 0.1], [0.9, 0.9], [0.4, 0.3]]) + for p in transform.params: + aug = transform.apply_deaug_keypoints(keypoints.detach().clone(), **{transform.pname: p}) + deaug = transform.apply_deaug_keypoints(aug, **{transform.pname: -p}) + assert torch.allclose(keypoints, deaug) + + +def test_add_transform(): + transform = tta.Add(values=[-1, 0, 1]) + a = torch.arange(20).reshape(1, 1, 4, 5).float() + for p in transform.params: + aug = transform.apply_aug_image(a, **{transform.pname: p}) + assert torch.allclose(aug, a + p) + + +def test_multiply_transform(): + transform = tta.Multiply(factors=[-1, 0, 1]) + a = torch.arange(20).reshape(1, 1, 4, 5).float() + for p in transform.params: + aug = transform.apply_aug_image(a, **{transform.pname: p}) + assert torch.allclose(aug, a * p) + + +def test_fivecrop_transform(): + transform = tta.FiveCrops(crop_height=1, crop_width=1) + a = torch.arange(25).reshape(1, 1, 5, 5).float() + output = [0, 20, 24, 4, 12] + for i, p in enumerate(transform.params): + aug = transform.apply_aug_image(a, **{transform.pname: p}) + assert aug.item() == output[i] + +# +# def test_resize_transform(): +# transform = tta.Resize(sizes=[(10, 10), (5, 5)], original_size=(5, 5)) +# a = torch.arange(25).reshape(1, 1, 5, 5).float() +# for i, p in enumerate(transform.params): +# aug = transform.apply_aug_image(a, **{transform.pname: p}) +# assert aug.item() == output[i] \ No newline at end of file diff --git a/final-project/model_zoo/ttach/ttach/__init__.py b/final-project/model_zoo/ttach/ttach/__init__.py new 
file mode 100644 index 0000000..30b43e8 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/__init__.py @@ -0,0 +1,14 @@ +from .wrappers import ( + SegmentationTTAWrapper, + ClassificationTTAWrapper, + KeypointsTTAWrapper +) +from .base import Compose + +from .transforms import ( + HorizontalFlip, VerticalFlip, Rotate90, Scale, Add, Multiply, FiveCrops, Resize +) + +from . import aliases + +from .__version__ import __version__ diff --git a/final-project/model_zoo/ttach/ttach/__version__.py b/final-project/model_zoo/ttach/ttach/__version__.py new file mode 100644 index 0000000..bd570e9 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/__version__.py @@ -0,0 +1,3 @@ +VERSION = (0, 0, 3) + +__version__ = '.'.join(map(str, VERSION)) diff --git a/final-project/model_zoo/ttach/ttach/aliases.py b/final-project/model_zoo/ttach/ttach/aliases.py new file mode 100644 index 0000000..ac44c4e --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/aliases.py @@ -0,0 +1,34 @@ +from .base import Compose +from . import transforms as tta + + +def flip_transform(): + return Compose([tta.HorizontalFlip(), tta.VerticalFlip()]) + + +def hflip_transform(): + return Compose([tta.HorizontalFlip()]) + + +def vflip_transform(): + return Compose([tta.VerticalFlip()]) + + +def d4_transform(): + return Compose( + [ + tta.HorizontalFlip(), + tta.Rotate90(angles=[0, 90, 180, 270]), + ] + ) + +def multiscale_transform(scales, interpolation="nearest"): + return Compose([tta.Scale(scales, interpolation=interpolation)]) + + +def five_crop_transform(crop_height, crop_width): + return Compose([tta.FiveCrops(crop_height, crop_width)]) + + +def ten_crop_transform(crop_height, crop_width): + return Compose([tta.HorizontalFlip(), tta.FiveCrops(crop_height, crop_width)]) diff --git a/final-project/model_zoo/ttach/ttach/base.py b/final-project/model_zoo/ttach/ttach/base.py new file mode 100644 index 0000000..e45b860 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/base.py @@ -0,0 +1,161 @@ +import itertools +from functools import partial +from typing import List, Optional, Union + +from . 
import functional as F + + +class BaseTransform: + identity_param = None + + def __init__( + self, + name: str, + params: Union[list, tuple], + ): + self.params = params + self.pname = name + + def apply_aug_image(self, image, *args, **params): + raise NotImplementedError + + def apply_deaug_mask(self, mask, *args, **params): + raise NotImplementedError + + def apply_deaug_label(self, label, *args, **params): + raise NotImplementedError + + def apply_deaug_keypoints(self, keypoints, *args, **params): + raise NotImplementedError + + +class ImageOnlyTransform(BaseTransform): + + def apply_deaug_mask(self, mask, *args, **params): + return mask + + def apply_deaug_label(self, label, *args, **params): + return label + + def apply_deaug_keypoints(self, keypoints, *args, **params): + return keypoints + + +class DualTransform(BaseTransform): + pass + + +class Chain: + + def __init__( + self, + functions: List[callable] + ): + self.functions = functions or [] + + def __call__(self, x): + for f in self.functions: + x = f(x) + return x + + +class Transformer: + def __init__( + self, + image_pipeline: Chain, + mask_pipeline: Chain, + label_pipeline: Chain, + keypoints_pipeline: Chain + ): + self.image_pipeline = image_pipeline + self.mask_pipeline = mask_pipeline + self.label_pipeline = label_pipeline + self.keypoints_pipeline = keypoints_pipeline + + def augment_image(self, image): + return self.image_pipeline(image) + + def deaugment_mask(self, mask): + return self.mask_pipeline(mask) + + def deaugment_label(self, label): + return self.label_pipeline(label) + + def deaugment_keypoints(self, keypoints): + return self.keypoints_pipeline(keypoints) + + +class Compose: + + def __init__( + self, + transforms: List[BaseTransform], + ): + self.aug_transforms = transforms + self.aug_transform_parameters = list(itertools.product(*[t.params for t in self.aug_transforms])) + self.deaug_transforms = transforms[::-1] + self.deaug_transform_parameters = [p[::-1] for p in self.aug_transform_parameters] + + def __iter__(self) -> Transformer: + for aug_params, deaug_params in zip(self.aug_transform_parameters, self.deaug_transform_parameters): + image_aug_chain = Chain([partial(t.apply_aug_image, **{t.pname: p}) + for t, p in zip(self.aug_transforms, aug_params)]) + mask_deaug_chain = Chain([partial(t.apply_deaug_mask, **{t.pname: p}) + for t, p in zip(self.deaug_transforms, deaug_params)]) + label_deaug_chain = Chain([partial(t.apply_deaug_label, **{t.pname: p}) + for t, p in zip(self.deaug_transforms, deaug_params)]) + keypoints_deaug_chain = Chain([partial(t.apply_deaug_keypoints, **{t.pname: p}) + for t, p in zip(self.deaug_transforms, deaug_params)]) + yield Transformer( + image_pipeline=image_aug_chain, + mask_pipeline=mask_deaug_chain, + label_pipeline=label_deaug_chain, + keypoints_pipeline=keypoints_deaug_chain + ) + + def __len__(self) -> int: + return len(self.aug_transform_parameters) + + +class Merger: + + def __init__( + self, + type: str = 'mean', + n: int = 1, + ): + + if type not in ['mean', 'gmean', 'sum', 'max', 'min', 'tsharpen']: + raise ValueError('Not correct merge type `{}`.'.format(type)) + + self.output = None + self.type = type + self.n = n + + def append(self, x): + + if self.type == 'tsharpen': + x = x ** 0.5 + + if self.output is None: + self.output = x + elif self.type in ['mean', 'sum', 'tsharpen']: + self.output = self.output + x + elif self.type == 'gmean': + self.output = self.output * x + elif self.type == 'max': + self.output = F.max(self.output, x) + elif self.type == 'min': + 
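+            # 'min' (like 'max' above) keeps a running element-wise extremum; the accumulating
+            # modes ('mean', 'sum', 'tsharpen' as sums, 'gmean' as a product) are only
+            # normalised later in the `result` property.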
self.output = F.min(self.output, x) + + @property + def result(self): + if self.type in ['sum', 'max', 'min']: + result = self.output + elif self.type in ['mean', 'tsharpen']: + result = self.output / self.n + elif self.type in ['gmean']: + result = self.output ** (1 / self.n) + else: + raise ValueError('Not correct merge type `{}`.'.format(self.type)) + return result diff --git a/final-project/model_zoo/ttach/ttach/functional.py b/final-project/model_zoo/ttach/ttach/functional.py new file mode 100644 index 0000000..400b7d7 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/functional.py @@ -0,0 +1,136 @@ +import torch +import torch.nn.functional as F + + +def rot90(x, k=1): + """rotate batch of images by 90 degrees k times""" + return torch.rot90(x, k, (2, 3)) + + +def hflip(x): + """flip batch of images horizontally""" + return x.flip(3) + + +def vflip(x): + """flip batch of images vertically""" + return x.flip(2) + + +def sum(x1, x2): + """sum of two tensors""" + return x1 + x2 + + +def add(x, value): + """add value to tensor""" + return x + value + + +def max(x1, x2): + """compare 2 tensors and take max values""" + return torch.max(x1, x2) + + +def min(x1, x2): + """compare 2 tensors and take min values""" + return torch.min(x1, x2) + + +def multiply(x, factor): + """multiply tensor by factor""" + return x * factor + + +def scale(x, scale_factor, interpolation="nearest", align_corners=None): + """scale batch of images by `scale_factor` with given interpolation mode""" + h, w = x.shape[2:] + new_h = int(h * scale_factor) + new_w = int(w * scale_factor) + return F.interpolate( + x, size=(new_h, new_w), mode=interpolation, align_corners=align_corners + ) + + +def resize(x, size, interpolation="nearest", align_corners=None): + """resize batch of images to given spatial size with given interpolation mode""" + return F.interpolate(x, size=size, mode=interpolation, align_corners=align_corners) + + +def crop(x, x_min=None, x_max=None, y_min=None, y_max=None): + """perform crop on batch of images""" + return x[:, :, y_min:y_max, x_min:x_max] + + +def crop_lt(x, crop_h, crop_w): + """crop left top corner""" + return x[:, :, 0:crop_h, 0:crop_w] + + +def crop_lb(x, crop_h, crop_w): + """crop left bottom corner""" + return x[:, :, -crop_h:, 0:crop_w] + + +def crop_rt(x, crop_h, crop_w): + """crop right top corner""" + return x[:, :, 0:crop_h, -crop_w:] + + +def crop_rb(x, crop_h, crop_w): + """crop right bottom corner""" + return x[:, :, -crop_h:, -crop_w:] + + +def center_crop(x, crop_h, crop_w): + """make center crop""" + + center_h = x.shape[2] // 2 + center_w = x.shape[3] // 2 + half_crop_h = crop_h // 2 + half_crop_w = crop_w // 2 + + y_min = center_h - half_crop_h + y_max = center_h + half_crop_h + crop_h % 2 + x_min = center_w - half_crop_w + x_max = center_w + half_crop_w + crop_w % 2 + + return x[:, :, y_min:y_max, x_min:x_max] + + +def _disassemble_keypoints(keypoints): + x = keypoints[..., 0] + y = keypoints[..., 1] + return x, y + + +def _assemble_keypoints(x, y): + return torch.stack([x, y], dim=-1) + + +def keypoints_hflip(keypoints): + x, y = _disassemble_keypoints(keypoints) + return _assemble_keypoints(1. - x, y) + + +def keypoints_vflip(keypoints): + x, y = _disassemble_keypoints(keypoints) + return _assemble_keypoints(x, 1. - y) + + +def keypoints_rot90(keypoints, k=1): + + if k not in {0, 1, 2, 3}: + raise ValueError("Parameter k must be in [0:3]") + if k == 0: + return keypoints + x, y = _disassemble_keypoints(keypoints) + + if k == 1: + xy = [y, 1. 
- x] + elif k == 2: + xy = [1. - x, 1. - y] + elif k == 3: + xy = [1. - y, x] + + return _assemble_keypoints(*xy) diff --git a/final-project/model_zoo/ttach/ttach/transforms.py b/final-project/model_zoo/ttach/ttach/transforms.py new file mode 100644 index 0000000..0ab6444 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/transforms.py @@ -0,0 +1,265 @@ +from functools import partial +from typing import Optional, List, Union, Tuple +from . import functional as F +from .base import DualTransform, ImageOnlyTransform + + +class HorizontalFlip(DualTransform): + """Flip images horizontally (left->right)""" + + identity_param = False + + def __init__(self): + super().__init__("apply", [False, True]) + + def apply_aug_image(self, image, apply=False, **kwargs): + if apply: + image = F.hflip(image) + return image + + def apply_deaug_mask(self, mask, apply=False, **kwargs): + if apply: + mask = F.hflip(mask) + return mask + + def apply_deaug_label(self, label, apply=False, **kwargs): + return label + + def apply_deaug_keypoints(self, keypoints, apply=False, **kwargs): + if apply: + keypoints = F.keypoints_hflip(keypoints) + return keypoints + + +class VerticalFlip(DualTransform): + """Flip images vertically (up->down)""" + + identity_param = False + + def __init__(self): + super().__init__("apply", [False, True]) + + def apply_aug_image(self, image, apply=False, **kwargs): + if apply: + image = F.vflip(image) + return image + + def apply_deaug_mask(self, mask, apply=False, **kwargs): + if apply: + mask = F.vflip(mask) + return mask + + def apply_deaug_label(self, label, apply=False, **kwargs): + return label + + def apply_deaug_keypoints(self, keypoints, apply=False, **kwargs): + if apply: + keypoints = F.keypoints_vflip(keypoints) + return keypoints + + +class Rotate90(DualTransform): + """Rotate images 0/90/180/270 degrees + + Args: + angles (list): angles to rotate images + """ + + identity_param = 0 + + def __init__(self, angles: List[int]): + if self.identity_param not in angles: + angles = [self.identity_param] + list(angles) + + super().__init__("angle", angles) + + def apply_aug_image(self, image, angle=0, **kwargs): + k = angle // 90 if angle >= 0 else (angle + 360) // 90 + return F.rot90(image, k) + + def apply_deaug_mask(self, mask, angle=0, **kwargs): + return self.apply_aug_image(mask, -angle) + + def apply_deaug_label(self, label, angle=0, **kwargs): + return label + + def apply_deaug_keypoints(self, keypoints, angle=0, **kwargs): + angle *= -1 + k = angle // 90 if angle >= 0 else (angle + 360) // 90 + return F.keypoints_rot90(keypoints, k=k) + + +class Scale(DualTransform): + """Scale images + + Args: + scales (List[Union[int, float]]): scale factors for spatial image dimensions + interpolation (str): one of "nearest"/"lenear" (see more in torch.nn.interpolate) + align_corners (bool): see more in torch.nn.interpolate + """ + + identity_param = 1 + + def __init__( + self, + scales: List[Union[int, float]], + interpolation: str = "nearest", + align_corners: Optional[bool] = None, + ): + if self.identity_param not in scales: + scales = [self.identity_param] + list(scales) + self.interpolation = interpolation + self.align_corners = align_corners + + super().__init__("scale", scales) + + def apply_aug_image(self, image, scale=1, **kwargs): + if scale != self.identity_param: + image = F.scale( + image, + scale, + interpolation=self.interpolation, + align_corners=self.align_corners, + ) + return image + + def apply_deaug_mask(self, mask, scale=1, **kwargs): + if scale != 
self.identity_param: + mask = F.scale( + mask, + 1 / scale, + interpolation=self.interpolation, + align_corners=self.align_corners, + ) + return mask + + def apply_deaug_label(self, label, scale=1, **kwargs): + return label + + def apply_deaug_keypoints(self, keypoints, scale=1, **kwargs): + return keypoints + + +class Resize(DualTransform): + """Resize images + + Args: + sizes (List[Tuple[int, int]): scale factors for spatial image dimensions + original_size Tuple(int, int): optional, image original size for deaugmenting mask + interpolation (str): one of "nearest"/"lenear" (see more in torch.nn.interpolate) + align_corners (bool): see more in torch.nn.interpolate + """ + + def __init__( + self, + sizes: List[Tuple[int, int]], + original_size: Tuple[int, int] = None, + interpolation: str = "nearest", + align_corners: Optional[bool] = None, + ): + if original_size is not None and original_size not in sizes: + sizes = [original_size] + list(sizes) + self.interpolation = interpolation + self.align_corners = align_corners + self.original_size = original_size + + super().__init__("size", sizes) + + def apply_aug_image(self, image, size, **kwargs): + if size != self.original_size: + image = F.resize( + image, + size, + interpolation=self.interpolation, + align_corners=self.align_corners, + ) + return image + + def apply_deaug_mask(self, mask, size, **kwargs): + if self.original_size is None: + raise ValueError( + "Provide original image size to make mask backward transformation" + ) + if size != self.original_size: + mask = F.resize( + mask, + self.original_size, + interpolation=self.interpolation, + align_corners=self.align_corners, + ) + return mask + + def apply_deaug_label(self, label, size=1, **kwargs): + return label + + def apply_deaug_keypoints(self, keypoints, size=1, **kwargs): + return keypoints + + +class Add(ImageOnlyTransform): + """Add value to images + + Args: + values (List[float]): values to add to each pixel + """ + + identity_param = 0 + + def __init__(self, values: List[float]): + + if self.identity_param not in values: + values = [self.identity_param] + list(values) + super().__init__("value", values) + + def apply_aug_image(self, image, value=0, **kwargs): + if value != self.identity_param: + image = F.add(image, value) + return image + + +class Multiply(ImageOnlyTransform): + """Multiply images by factor + + Args: + factors (List[float]): factor to multiply each pixel by + """ + + identity_param = 1 + + def __init__(self, factors: List[float]): + if self.identity_param not in factors: + factors = [self.identity_param] + list(factors) + super().__init__("factor", factors) + + def apply_aug_image(self, image, factor=1, **kwargs): + if factor != self.identity_param: + image = F.multiply(image, factor) + return image + + +class FiveCrops(ImageOnlyTransform): + """Makes 4 crops for each corner + center crop + + Args: + crop_height (int): crop height in pixels + crop_width (int): crop width in pixels + """ + + def __init__(self, crop_height, crop_width): + crop_functions = ( + partial(F.crop_lt, crop_h=crop_height, crop_w=crop_width), + partial(F.crop_lb, crop_h=crop_height, crop_w=crop_width), + partial(F.crop_rb, crop_h=crop_height, crop_w=crop_width), + partial(F.crop_rt, crop_h=crop_height, crop_w=crop_width), + partial(F.center_crop, crop_h=crop_height, crop_w=crop_width), + ) + super().__init__("crop_fn", crop_functions) + + def apply_aug_image(self, image, crop_fn=None, **kwargs): + return crop_fn(image) + + def apply_deaug_mask(self, mask, **kwargs): + raise 
ValueError("`FiveCrop` augmentation is not suitable for mask!") + + def apply_deaug_keypoints(self, keypoints, **kwargs): + raise ValueError("`FiveCrop` augmentation is not suitable for keypoints!") diff --git a/final-project/model_zoo/ttach/ttach/wrappers.py b/final-project/model_zoo/ttach/ttach/wrappers.py new file mode 100644 index 0000000..ab19c71 --- /dev/null +++ b/final-project/model_zoo/ttach/ttach/wrappers.py @@ -0,0 +1,157 @@ +import torch +import torch.nn as nn +from typing import Optional, Mapping, Union, Tuple + +from .base import Merger, Compose + + +class SegmentationTTAWrapper(nn.Module): + """Wrap PyTorch nn.Module (segmentation model) with test time augmentation transforms + + Args: + model (torch.nn.Module): segmentation model with single input and single output + (.forward(x) should return either torch.Tensor or Mapping[str, torch.Tensor]) + transforms (ttach.Compose): composition of test time transforms + merge_mode (str): method to merge augmented predictions mean/gmean/max/min/sum/tsharpen + output_mask_key (str): if model output is `dict`, specify which key belong to `mask` + """ + + def __init__( + self, + model: nn.Module, + transforms: Compose, + merge_mode: str = "mean", + output_mask_key: Optional[str] = None, + ): + super().__init__() + self.model = model + self.transforms = transforms + self.merge_mode = merge_mode + self.output_key = output_mask_key + + def forward( + self, image: torch.Tensor, *args + ) -> Union[torch.Tensor, Mapping[str, torch.Tensor]]: + merger = Merger(type=self.merge_mode, n=len(self.transforms)) + + for transformer in self.transforms: + augmented_image = transformer.augment_image(image) + augmented_output = self.model(augmented_image, *args) + if self.output_key is not None: + augmented_output = augmented_output[self.output_key] + deaugmented_output = transformer.deaugment_mask(augmented_output) + merger.append(deaugmented_output) + + result = merger.result + if self.output_key is not None: + result = {self.output_key: result} + + return result + + +class ClassificationTTAWrapper(nn.Module): + """Wrap PyTorch nn.Module (classification model) with test time augmentation transforms + + Args: + model (torch.nn.Module): classification model with single input and single output + (.forward(x) should return either torch.Tensor or Mapping[str, torch.Tensor]) + transforms (ttach.Compose): composition of test time transforms + merge_mode (str): method to merge augmented predictions mean/gmean/max/min/sum/tsharpen + output_label_key (str): if model output is `dict`, specify which key belong to `label` + """ + + def __init__( + self, + model: nn.Module, + transforms: Compose, + merge_mode: str = "mean", + output_label_key: Optional[str] = None, + ): + super().__init__() + self.model = model + self.transforms = transforms + self.merge_mode = merge_mode + self.output_key = output_label_key + + def forward( + self, image: torch.Tensor, *args + ) -> Union[torch.Tensor, Mapping[str, torch.Tensor]]: + merger = Merger(type=self.merge_mode, n=len(self.transforms)) + + for transformer in self.transforms: + augmented_image = transformer.augment_image(image) + augmented_output = self.model(augmented_image, *args) + if self.output_key is not None: + augmented_output = augmented_output[self.output_key] + deaugmented_output = transformer.deaugment_label(augmented_output) + merger.append(deaugmented_output) + + result = merger.result + if self.output_key is not None: + result = {self.output_key: result} + + return result + + +class 
KeypointsTTAWrapper(nn.Module): + """Wrap PyTorch nn.Module (keypoints model) with test time augmentation transforms + + Args: + model (torch.nn.Module): keypoints model with single input and single output + in format [x1,y1, x2, y2, ..., xn, yn] + (.forward(x) should return either torch.Tensor or Mapping[str, torch.Tensor]) + transforms (ttach.Compose): composition of test time transforms + merge_mode (str): method to merge augmented predictions mean/gmean/max/min/sum/tsharpen + output_keypoints_key (str): if model output is `dict`, specify which key belong to `label` + scaled (bool): True if model return x, y scaled values in [0, 1], else False + + """ + + def __init__( + self, + model: nn.Module, + transforms: Compose, + merge_mode: str = "mean", + output_keypoints_key: Optional[str] = None, + scaled: bool = False, + ): + super().__init__() + self.model = model + self.transforms = transforms + self.merge_mode = merge_mode + self.output_key = output_keypoints_key + self.scaled = scaled + + def forward( + self, image: torch.Tensor, *args + ) -> Union[torch.Tensor, Mapping[str, torch.Tensor]]: + merger = Merger(type=self.merge_mode, n=len(self.transforms)) + size = image.size() + batch_size, image_height, image_width = size[0], size[2], size[3] + + for transformer in self.transforms: + augmented_image = transformer.augment_image(image) + augmented_output = self.model(augmented_image, *args) + + if self.output_key is not None: + augmented_output = augmented_output[self.output_key] + + augmented_output = augmented_output.reshape(batch_size, -1, 2) + if not self.scaled: + augmented_output[..., 0] /= image_width + augmented_output[..., 1] /= image_height + + deaugmented_output = transformer.deaugment_keypoints(augmented_output) + merger.append(deaugmented_output) + + result = merger.result + + if not self.scaled: + result[..., 0] *= image_width + result[..., 1] *= image_height + result = result.reshape(batch_size, -1) + + if self.output_key is not None: + result = {self.output_key: result} + + return result diff --git a/final-project/model_zoo/vgg16.py b/final-project/model_zoo/vgg16.py new file mode 100644 index 0000000..010d9ec --- /dev/null +++ b/final-project/model_zoo/vgg16.py @@ -0,0 +1,22 @@ +import torchvision +import torch +import torch.nn as nn +import torch.nn.functional as F +from torchvision.models import VGG +import numpy as np + + +class VGG16(nn.Module): + def __init__(self,num_class=1000): + super().__init__() + self.vgg16 = torch.hub.load('pytorch/vision:v0.5.0', 'vgg16_bn', pretrained=True) + # fix imageNet layer + self.vgg16.classifier[6] = nn.Linear(4096,num_class) + print(type(self.vgg16.classifier)) + def forward(self,image): + output = self.vgg16.features.forward(image) + output = self.vgg16.avgpool(output) + output = output.view(output.shape[0],-1) + features = self.vgg16.classifier[0:6](output) + output = self.vgg16.classifier[6](features) + return output \ No newline at end of file diff --git a/final-project/poster.pdf b/final-project/poster.pdf new file mode 100644 index 0000000..e80b5b4 Binary files /dev/null and b/final-project/poster.pdf differ diff --git a/final-project/poster.pptx b/final-project/poster.pptx new file mode 100644 index 0000000..72bc288 Binary files /dev/null and b/final-project/poster.pptx differ diff --git a/final-project/requirements.txt b/final-project/requirements.txt new file mode 100644 index 0000000..6d71fdb --- /dev/null +++ b/final-project/requirements.txt @@ -0,0 +1,42 @@ +fvcore +kaggle +certifi==2021.10.8 +charset-normalizer==2.0.7 
+click==8.0.3 +cycler==0.10.0 +filelock==3.4.0 +huggingface-hub==0.1.2 +idna==3.3 +imageio==2.9.0 +joblib==0.17.0 +kiwisolver==1.3.1 +matplotlib==3.4.3 +networkx==2.6.3 +numpy==1.21.2 +opencv-python==4.5.3.56 +packaging==21.3 +pandas==1.1.3 +Pillow==8.4.0 +pyparsing==2.4.7 +python-dateutil==2.8.1 +pytorch-pretrained-vit==0.0.7 +pytz==2020.1 +PyWavelets==1.2.0 +PyYAML==6.0 +regex==2021.11.10 +requests==2.26.0 +sacremoses==0.0.46 +scikit-image==0.18.1 +scikit-learn==0.23.2 +scipy==1.6.2 +six==1.16.0 +threadpoolctl==3.0.0 +tifffile==2021.11.2 +timm==0.4.12 +tokenizers==0.10.3 +torch==1.10.0 +torchvision==0.11.1 +tqdm==4.62.3 +transformers==4.12.5 +typing_extensions==4.0.0 +urllib3==1.26.7 diff --git a/final-project/test_TTA.sh b/final-project/test_TTA.sh new file mode 100644 index 0000000..816b668 --- /dev/null +++ b/final-project/test_TTA.sh @@ -0,0 +1,22 @@ +#ResNest269 +python3 test_template_TTA.py --img_size 320 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_main_track.csv --model_path $2 --model_type RESNEST269 --load $2/33.pth --valid_mat $2/33_test_tta_main.mat --kaggle_csv_log $2/33_test_tta_main.log +python3 test_template_TTA.py --img_size 320 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_freq_track.csv --model_path $2 --model_type RESNEST269 --load $2/33.pth --valid_mat $2/33_test_tta_freq.mat --kaggle_csv_log $2/33_test_tta_freq.log +python3 test_template_TTA.py --img_size 320 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_comm_track.csv --model_path $2 --model_type RESNEST269 --load $2/33.pth --valid_mat $2/33_test_tta_comm.mat --kaggle_csv_log $2/33_test_tta_comm.log +python3 test_template_TTA.py --img_size 320 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_rare_track.csv --model_path $2 --model_type RESNEST269 --load $2/33.pth --valid_mat $2/33_test_tta_rare.mat --kaggle_csv_log $2/33_test_tta_rare.log + +#Swin +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_main_track.csv --model_path $2 --model_type SWIN --load $2/40.pth --valid_mat $2/40_test_tta_main.mat --kaggle_csv_log $2/40_test_tta_main.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_freq_track.csv --model_path $2 --model_type SWIN --load $2/40.pth --valid_mat $2/40_test_tta_freq.mat --kaggle_csv_log $2/40_test_tta_freq.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_comm_track.csv --model_path $2 --model_type SWIN --load $2/40.pth --valid_mat $2/40_test_tta_comm.mat --kaggle_csv_log $2/40_test_tta_comm.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_rare_track.csv --model_path $2 --model_type SWIN --load $2/40.pth --valid_mat $2/40_test_tta_rare.mat --kaggle_csv_log $2/40_test_tta_rare.log + +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_main_track.csv --model_path $2 --model_type SWIN --load $2/22.pth --valid_mat $2/22_test_tta_main.mat --kaggle_csv_log $2/22_test_tta_main.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST 
--test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_freq_track.csv --model_path $2 --model_type SWIN --load $2/22.pth --valid_mat $2/22_test_tta_freq.mat --kaggle_csv_log $2/22_test_tta_freq.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_comm_track.csv --model_path $2 --model_type SWIN --load $2/22.pth --valid_mat $2/22_test_tta_comm.mat --kaggle_csv_log $2/22_test_tta_comm.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_rare_track.csv --model_path $2 --model_type SWIN --load $2/22.pth --valid_mat $2/22_test_tta_rare.mat --kaggle_csv_log $2/22_test_tta_rare.log + +#Swin BBN +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_main_track.csv --model_path $2 --model_type SWIN_BBN --load $2/38.pth --valid_mat $2/38_test_tta_main.mat --kaggle_csv_log $2/38_test_tta_main.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_freq_track.csv --model_path $2 --model_type SWIN_BBN --load $2/38.pth --valid_mat $2/38_test_tta_freq.mat --kaggle_csv_log $2/38_test_tta_freq.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_comm_track.csv --model_path $2 --model_type SWIN_BBN --load $2/38.pth --valid_mat $2/38_test_tta_comm.mat --kaggle_csv_log $2/38_test_tta_comm.log +python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_rare_track.csv --model_path $2 --model_type SWIN_BBN --load $2/38.pth --valid_mat $2/38_test_tta_rare.mat --kaggle_csv_log $2/38_test_tta_rare.log \ No newline at end of file diff --git a/final-project/test_ensemble.sh b/final-project/test_ensemble.sh new file mode 100644 index 0000000..bf71e3b --- /dev/null +++ b/final-project/test_ensemble.sh @@ -0,0 +1,5 @@ +#Ensemble test +python3 test_template_ensemble.py --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_main_track.csv --kaggle_csv_log $2/main.log --load_model_mat_1 $2/40_test_tta_main.mat --model_weight_1 1.3 --load_model_mat_2 $2/33_test_tta_main.mat --model_weight_2 1.5 --load_model_mat_3 $2/38_test_tta_main.mat --model_weight_3 1 --load_model_mat_4 $2/22_test_tta_main.mat --model_weight_4 0.8 +python3 test_template_ensemble.py --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_freq_track.csv --kaggle_csv_log $2/freq.log --load_model_mat_1 $2/40_test_tta_freq.mat --model_weight_1 1.3 --load_model_mat_2 $2/33_test_tta_freq.mat --model_weight_2 1.5 --load_model_mat_3 $2/38_test_tta_freq.mat --model_weight_3 1 --load_model_mat_4 $2/22_test_tta_freq.mat --model_weight_4 0.8 +python3 test_template_ensemble.py --mode TEST --test_data_dir $1/test --test_data_csv $1/testcase/sample_submission_comm_track.csv --kaggle_csv_log $2/comm.log --load_model_mat_1 $2/40_test_tta_comm.mat --model_weight_1 1.3 --load_model_mat_2 $2/33_test_tta_comm.mat --model_weight_2 1.5 --load_model_mat_3 $2/38_test_tta_comm.mat --model_weight_3 1 --load_model_mat_4 $2/22_test_tta_comm.mat --model_weight_4 0.8 +python3 test_template_ensemble.py --mode TEST --test_data_dir $1/test --test_data_csv 
$1/testcase/sample_submission_rare_track.csv --kaggle_csv_log $2/rare.log --load_model_mat_1 $2/40_test_tta_rare.mat --model_weight_1 1.3 --load_model_mat_2 $2/33_test_tta_rare.mat --model_weight_2 1.5 --load_model_mat_3 $2/38_test_tta_rare.mat --model_weight_3 1 --load_model_mat_4 $2/22_test_tta_rare.mat --model_weight_4 0.8 diff --git a/final-project/test_kaggle.sh b/final-project/test_kaggle.sh new file mode 100644 index 0000000..8961afb --- /dev/null +++ b/final-project/test_kaggle.sh @@ -0,0 +1,17 @@ +# Generate kaggle_csv_log and submit to kaggle +# bash test_kaggle.sh $1 +# $1 : model_path (e.g., baseline/) +# $2 : submit message in kaggle (e.g., ViT_B_16_imagenet1k) + +# Freq track +python3 test_template.py -test_data_csv food_data/testcase/sample_submission_freq_track.csv -kaggle_csv_log $1/freq.log -load_model_path $1 +kaggle competitions submit -c dlcv-fall-2021-final-challenge-3-freq-track -f $1/freq.log -m $2 +# Main track +python3 test_template.py -test_data_csv food_data/testcase/sample_submission_main_track.csv -kaggle_csv_log $1/main.log -load_model_path $1 +kaggle competitions submit -c dlcv-fall-2021-final-challenge-3 -f $1/main.log -m $2 +# Common track +python3 test_template.py -test_data_csv food_data/testcase/sample_submission_comm_track.csv -kaggle_csv_log $1/comm.log -load_model_path $1 +kaggle competitions submit -c dlcv-fall-2021-final-challenge-3-comm-track -f $1/comm.log -m $2 +# Rare track +python3 test_template.py -test_data_csv food_data/testcase/sample_submission_rare_track.csv -kaggle_csv_log $1/rare.log -load_model_path $1 +kaggle competitions submit -c dlcv-fall-2021-final-challenge-3-rare-track -f $1/rare.log -m $2 diff --git a/final-project/test_template.py b/final-project/test_template.py new file mode 100644 index 0000000..77d5196 --- /dev/null +++ b/final-project/test_template.py @@ -0,0 +1,64 @@ +import os +import argparse +import torch +import torch.optim as optim +import warnings +import argparse +# our module +from model_zoo import vgg16 +#from model_zoo.swin.swin_transformer import get_swin +#from model_zoo.pytorch_resnest.resnest.torch import resnest50, resnest101, resnest200, resnest269 +from base.dataset import FoodTestDataset,ChunkSampler +from base.tester import BaseTester +from util import * +if __name__=='__main__': + parser = argparse.ArgumentParser() + parser.add_argument("-img_size", "--img_size", default=224,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=16,type=int , help='') + parser.add_argument("-test_data_dir","--test_data_dir", default = "food_data/test",type=str, help ="Testing image directory") + parser.add_argument("-test_data_csv","--test_data_csv", default = "food_data/testcase/sample_submission_rare_track.csv",type=str, help ="Testcase csv") + parser.add_argument("-load_model_path", "--load_model_path",default="baseline",type=str , help='') + parser.add_argument("-kaggle_csv_log", "--kaggle_csv_log",default="baseline/log.csv",type=str , help='') + args = parser.parse_args() + ####################### + # Environment setting + ####################### + device = model_setting() + fix_seeds(87) + ############## + # Dataset + ############## + test_dataset = FoodTestDataset(args.test_data_csv,args.test_data_dir,img_size=args.img_size) + test_loader = torch.utils.data.DataLoader(test_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + ############## + # Model + ############## + + # TODO define ours' model + model = vgg16.VGG16(num_class=1000) + # model = 
get_swin(ckpt='./model_zoo/swin/swin_base_patch4_window7_224.pth') + # model = resnest50(pretrained=False) + + # load model from [load_model_path]/model_best.pth + model.load_state_dict(torch.load(os.path.join(args.load_model_path, "model_best.pth"))) + + ############## + # Trainer + ############## + tester = BaseTester( + device = device, + model = model, + test_loader = test_loader, + load_model_path = args.load_model_path, + kaggle_file= os.path.join(args.kaggle_csv_log)) + tester.test() + + + + + + + diff --git a/final-project/test_template_TTA.py b/final-project/test_template_TTA.py new file mode 100644 index 0000000..c74d09e --- /dev/null +++ b/final-project/test_template_TTA.py @@ -0,0 +1,138 @@ +import os +import argparse +import torch +import torch.optim as optim + +from torchvision import transforms +# our module +#from model_zoo import vgg16 +from model_zoo.swin.swin_transformer import get_swin +from model_zoo.swin.swin_transformer_bbn import get_swin_bbn +from model_zoo.pytorch_resnest.resnest.torch import resnest269 +from base.dataset import FoodTestDataset, FoodDataset +from base.tester import BaseTester +from util import * +#from model_zoo.BBN.resnet import bbn_res50 +from model_zoo.BBN.network import BNNetwork +#from model_zoo.BBN.combiner import Combiner +# From tester +import model_zoo.ttach as tta +if __name__=='__main__': + parser = argparse.ArgumentParser() + parser.add_argument("-img_size", "--img_size", default=224,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=16,type=int , help='') + parser.add_argument("-test_data_dir","--test_data_dir", default = "food_data/test",type=str, help ="Testing image directory") + parser.add_argument("-test_data_csv","--test_data_csv", default = "food_data/testcase/sample_submission_rare_track.csv",type=str, help ="Testcase csv") + parser.add_argument("-load", "--load",default="",type=str , help='') + parser.add_argument("-model_path", "--model_path",default="BBN_RESNET",type=str , help='') + parser.add_argument("-model_type", "--model_type",default="RESNEST269",type=str , help='') + parser.add_argument("-valid_mat", "--valid_mat",default="",type=str , help='') + parser.add_argument("-kaggle_csv_log", "--kaggle_csv_log",default="BBN_RESNET/log.csv",type=str , help='') + parser.add_argument("-mode", "--mode",default="TEST",type=str , help='') # TEST or VALID + args = parser.parse_args() + ####################### + # Environment setting + ####################### + device = model_setting() + fix_seeds(87) + ############## + # Dataset + ############## + if args.mode == "TEST" : + test_dataset = FoodTestDataset(args.test_data_csv,args.test_data_dir,img_size=args.img_size) + test_loader = torch.utils.data.DataLoader(test_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + val_loader = None + elif args.mode == "VALID": + test_loader = None + val_dataset = FoodDataset("food_data/val",img_size=args.img_size,mode = "val") + val_loader = torch.utils.data.DataLoader(val_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + else: + print("Wrong Flag QQ") + assert(False) + + ############## + # Model + ############## + + # TODO define ours' model + # model = vgg16.VGG16(num_class=1000) + #model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + #model = resnest269(pretrained=False) + #model = get_swin_bbn(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + #model = BNNetwork(backbone_model=model,num_classes=1000,mode="swin") # 
Support swin/ResNet/ViT + if args.model_type == "RESNEST269": + model = resnest269(pretrained=False) + elif args.model_type == "SWIN": + model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + elif args.model_type == "SWIN_BBN": + model = get_swin_bbn(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + model = BNNetwork(backbone_model=model,num_classes=1000,mode="swin") # Support swin/ResNet/ViT + else: + print("Wrong Model type QQ") + assert(False) + + if args.load: + model.load_state_dict(torch.load(args.load)) + print("model loaded from {}".format(args.load)) + + #BBN + ResNet50 + #model = bbn_res50( + # cfg = None, + # pretrain=False, + # pretrained_model="/home/r09021/DLCV_110FALL/final-project-challenge-3-no_qq_no_life/model_zoo/BNN/resnet50-19c8e357.pth", + # last_layer_stride=2 + # + #) + #model = BNNetwork(backbone_model=model,num_classes=1000,mode = "ResNet50") + #model = torch.nn.DataParallel(model) #CUDA_VISIBLE_DEVICES + #model.load_state_dict(torch.load(os.path.join(args.load_model_path, "model_best.pth"))) + + ################################# + # TTA method + # TODO define our own TTA metric + ################################# + Food_Aug = tta.Compose( + [ + tta.HorizontalFlip(), + #tta.VerticalFlip(), + #tta.Rotate90(angles=[0,90,180,270]), + #tta.FiveCrops(int(args.img_size*0.8),int(args.img_size*0.8)), + #tta.Resize(sizes=(args.img_size,args.img_size) ), + ] + ) + tta_model = tta.ClassificationTTAWrapper(model = model, + transforms = Food_Aug, + merge_mode="mean") + ############## + # Trainer + ############## + tester = BaseTester( + device = device, + model = tta_model, + test_loader = test_loader, + val_loader = val_loader, + load_model_path = args.model_path, + mat_file= os.path.join(args.valid_mat), + kaggle_file= os.path.join(args.kaggle_csv_log), + criterion = torch.nn.CrossEntropyLoss()) + + # Genertate final testing files to kaggle or just generate validation files + if args.mode == "TEST" : + tester.test() + elif args.mode == "VALID": + tester.valid_and_savemat() + else: + print("Wrong Flag QQ") + assert(False) + + + + + + diff --git a/final-project/test_template_ensemble.py b/final-project/test_template_ensemble.py new file mode 100644 index 0000000..4ab40c0 --- /dev/null +++ b/final-project/test_template_ensemble.py @@ -0,0 +1,134 @@ +import os +import argparse +import torch +from scipy.io import loadmat +from base.dataset import FoodTestDataset, FoodDataset +import csv +import pandas as pd +from util import * +if __name__=='__main__': + parser = argparse.ArgumentParser() + parser.add_argument("-test_data_csv","--test_data_csv", default = "food_data/testcase/sample_submission_rare_track.csv",type=str, help ="Testcase csv") + parser.add_argument("-test_data_dir","--test_data_dir", default = "food_data/test",type=str, help ="Testing image directory") + parser.add_argument("-img_size", "--img_size", default=384,type=int , help='') + parser.add_argument("-load_model_mat_1", "--load_model_mat_1",default="",type=str , help='') + parser.add_argument("-model_weight_1", "--model_weight_1",default=1.0,type=float , help='') + parser.add_argument("-load_model_mat_2", "--load_model_mat_2",default="",type=str , help='') + parser.add_argument("-model_weight_2", "--model_weight_2",default=1.0,type=float , help='') + parser.add_argument("-load_model_mat_3", "--load_model_mat_3",default="",type=str , help='') + parser.add_argument("-model_weight_3", "--model_weight_3",default=1.0,type=float , help='') + 
parser.add_argument("-load_model_mat_4", "--load_model_mat_4",default="",type=str , help='') + parser.add_argument("-model_weight_4", "--model_weight_4",default=1.0,type=float , help='') + parser.add_argument("-load_model_mat_5", "--load_model_mat_5",default="",type=str , help='') + parser.add_argument("-model_weight_5", "--model_weight_5",default=1.0,type=float , help='') + parser.add_argument("-kaggle_csv_log", "--kaggle_csv_log",default="baseline/log.csv",type=str , help='') + parser.add_argument("-mode", "--mode", default="TEST", type=str , help='') # TEST or VALID + args = parser.parse_args() + + ############## + # Ensemble Model + ############## + model1_valid = False + model2_valid = False + model3_valid = False + model4_valid = False + model5_valid = False + if args.load_model_mat_1: + mat1 = loadmat(args.load_model_mat_1) + print(args.load_model_mat_1) + print(args.model_weight_1) + out1_softmax = mat1['out_softmax'] + model1_valid = True + + if args.load_model_mat_2: + mat2 = loadmat(args.load_model_mat_2) + print(args.load_model_mat_2) + print(args.model_weight_2) + out2_softmax = mat2['out_softmax'] + model2_valid = True + + if args.load_model_mat_3: + mat3 = loadmat(args.load_model_mat_3) + print(args.load_model_mat_3) + print(args.model_weight_3) + out3_softmax = mat3['out_softmax'] + model3_valid = True + + if args.load_model_mat_4: + mat4 = loadmat(args.load_model_mat_4) + print(args.load_model_mat_4) + print(args.model_weight_4) + out4_softmax = mat4['out_softmax'] + model4_valid = True + + if args.load_model_mat_5: + mat5 = loadmat(args.load_model_mat_5) + print(args.load_model_mat_5) + print(args.model_weight_5) + out5_softmax = mat5['out_softmax'] + model5_valid = True + #print(out1_softmax.shape[0]) + + out_softmax = 0 + if model1_valid: + out_softmax += out1_softmax * args.model_weight_1 + if model2_valid: + out_softmax += out2_softmax * args.model_weight_2 + if model3_valid: + out_softmax += out3_softmax * args.model_weight_3 + if model4_valid: + out_softmax += out4_softmax * args.model_weight_4 + if model5_valid: + out_softmax += out5_softmax * args.model_weight_5 + + if args.mode == "VALID" : + val_dataset = FoodDataset("food_data/val",img_size=args.img_size,mode = "val") + + label = np.squeeze(mat1['label']) + pred_label = np.argmax(out_softmax, axis=1) + val_acc = (label==pred_label).sum() + val_nums = len(label) + val_nums_freq = 0.0 + val_nums_common = 0.0 + val_nums_rare = 0.0 + val_acc_freq = 0.0 + val_acc_common = 0.0 + val_acc_rare = 0.0 + for i in range(len(label)): + if (val_dataset.freq_list[label[i]] == 0): + val_acc_freq += (label[i]==pred_label[i]).item() + val_nums_freq += 1 + elif (val_dataset.freq_list[label[i]] == 1): + val_acc_common += (label[i]==pred_label[i]).item() + val_nums_common += 1 + else: + val_acc_rare += (label[i]==pred_label[i]).item() + val_nums_rare += 1 + print("Validation accuracy (main) : {:5f}".format(val_acc/val_nums)) + print("Val_acc {:d} Val_nums {:d} (main)".format(int(val_acc),int(val_nums)) ) + val_acc_rate_freq = val_acc_freq/val_nums_freq + val_acc_rate_common = val_acc_common/val_nums_common + val_acc_rate_rare = val_acc_rare/val_nums_rare + print("Validation accuracy (freq) : {:5f}".format(val_acc_rate_freq)) + print("Val_acc {:d} Val_nums {:d} (freq)".format(int(val_acc_freq),int(val_nums_freq)) ) + print("Validation accuracy (common) : {:5f}".format(val_acc_rate_common)) + print("Val_acc {:d} Val_nums {:d} (common)".format(int(val_acc_common),int(val_nums_common)) ) + print("Validation accuracy (rare) : 
{:5f}".format(val_acc_rate_rare)) + print("Val_acc {:d} Val_nums {:d} (rare)".format(int(val_acc_rare),int(val_nums_rare))) + + elif args.mode == "TEST": + test_dataset = FoodTestDataset(args.test_data_csv,args.test_data_dir,img_size=args.img_size) + + pred_label = np.argmax(out_softmax, axis=1) + image_ids = ["{:06d}".format(i) for i in test_dataset.data_df.image_id] + df = pd.DataFrame({"image_id": image_ids, 'label': pred_label}) + df.to_csv(args.kaggle_csv_log, index=False) + print("===> File saved as {}".format(args.kaggle_csv_log)) + + else: + print("Wrong Flag QQ") + assert(False) + + + + diff --git a/final-project/train.sh b/final-project/train.sh new file mode 100644 index 0000000..885767b --- /dev/null +++ b/final-project/train.sh @@ -0,0 +1,16 @@ +#ResNest269 +#python3 train_template.py --img_size 320 --lr 1e-5 --batch_size 4 --max_epoch 3 --train_data_dir $1/train --val_data_dir $1/val --model_type RESNEST269 --model_path 33_RESNEST269_stage1 +#python3 train_template_LT.py --img_size 320 --lr 1e-5 --batch_size 4 --max_epoch 4 --LT_EXP REVERSE --train_data_dir $1 --model_type RESNEST269 --model_path checkpoints --load 33_RESNEST269_stage1/model_best.pth +#mv -f checkpoints/model_best.pth checkpoints/33.pth + +#Swin +#python3 train_template.py --img_size 384 --lr 1e-5 --batch_size 4 --max_epoch 10 --train_data_dir $1/train --val_data_dir $1/val --model_type SWIN --model_path 22_40_SWIN_stage1 +#python3 train_template_LT.py --img_size 384 --lr 1e-5 -batch_size 4 --max_epoch 1 -LT_EXP REVERSE --train_data_dir $1 --model_type SWIN --model_path checkpoints --load 22_40_SWIN_stage1/model_best.pth +#mv -f checkpoints/model_best.pth checkpoints/22.pth +#python3 train_template_LT.py --img_size 384 --lr 1e-5 --batch_size 4 --max_epoch 14 --gradaccum_size 16 --train_data_dir $1 --model_type SWIN --param_fix MLP --LT_EXP REVERSE --model_path checkpoints --load 22_40_SWIN_stage1/model_best.pth +#mv -f checkpoints/model_best.pth checkpoints/40.pth + +#Swin BBN +#python3 train_template.py --img_size 384 --lr 1e-5 --batch_size 4 --max_epoch 1--train_data_dir $1/train --val_data_dir $1/val --model_type SWIN --model_path 38_SWIN_stage1 +#python3 train_template_BBN.py --img_size 384 --lr 1e-5 --batch_size 2 --max_epoch 10 --gradaccum_size 60 --train_data_dir $1 --model_path checkpoints --load 38_SWIN_stage1/model_best.pth +#mv -f checkpoints/model_best.pth checkpoints/38.pth \ No newline at end of file diff --git a/final-project/train_template.py b/final-project/train_template.py new file mode 100644 index 0000000..4c6b3de --- /dev/null +++ b/final-project/train_template.py @@ -0,0 +1,103 @@ +import os +import argparse +import torch +import torch.optim as optim +import warnings +# our module +#from model_zoo import vgg16 +#from model_zoo.pytorch_pretrained_vit import ViT +from model_zoo.swin.swin_transformer import get_swin +from model_zoo.pytorch_resnest.resnest.torch import resnest269 +from base.trainer import BaseTrainer +from base.dataset import FoodDataset,ChunkSampler,P1_Dataset +from util import * + +if __name__=='__main__': + parser = argparse.ArgumentParser() + # training related argument + parser.add_argument("-cont", "--cont",action="store_true", help='') + parser.add_argument("-lr", "--lr", default=1e-6,type=float , help='') + parser.add_argument("-period", "--period", default=20,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=8,type=int , help='') + parser.add_argument("-gradaccum_size", "--gradaccum_size", default=1,type=int , help='') + 
parser.add_argument("-load", "--load",default="",type=str , help='') + parser.add_argument("-model_path", "--model_path",default="baseline",type=str , help='') + parser.add_argument("-model_type", "--model_type",default="RESNEST269",type=str , help='') + parser.add_argument("-max_epoch", "--max_epoch",default=100,type=int, help='') + # data related argument + parser.add_argument("-img_size", "--img_size", default=384,type=int , help='') + parser.add_argument("-train_data_dir","--train_data_dir", default = "food_data/train",type=str, help ="Training images directory") + parser.add_argument("-val_data_dir","--val_data_dir", default = "food_data/val",type=str, help ="Validation images directory") + args = parser.parse_args() + ####################### + # Environment setting + ####################### + device = model_setting() + fix_seeds(87) + os.makedirs(args.model_path, exist_ok=True) + ############## + # Dataset + ############## + #train_dataset = P1_Dataset("hw1_data/train_50",val_mode=False) + #val_dataset = P1_Dataset("hw1_data/val_50",val_mode=True) + train_dataset = FoodDataset(args.train_data_dir,img_size=args.img_size,mode = "train") + train_loader = torch.utils.data.DataLoader(train_dataset, + batch_size=args.batch_size, + shuffle=True, + num_workers=8) + #sampler=ChunkSampler(1024, 512)) + val_dataset = FoodDataset(args.val_data_dir,img_size=args.img_size,mode = "val") + val_loader = torch.utils.data.DataLoader(val_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + #sampler=ChunkSampler(512, 0)) + ############## + # Model + ############## + + # TODO define ours' model,schedular + # model = ViT(model_name, pretrained=True,num_classes=1000,image_size=384) + # ResNeSt50 + # model = resnest50(pretrained=False) + # model.load_state_dict(torch.load('./model_zoo/pytorch_resnest/resnest50_v1.pth')) + # Swin Tranformer + # model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + if args.model_type == "RESNEST269": + model = resnest269(pretrained=True) + elif args.model_type == "SWIN": + model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + elif args.model_type == "SWIN_BBN": + model = get_swin_bbn(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + model = BNNetwork(backbone_model=model,num_classes=1000,mode="swin") # Support swin/ResNet/ViT + else: + print("Wrong Model type QQ") + assert(False) + + if args.load: + model.load_state_dict(torch.load(args.load)) + print("model loaded from {}".format(args.load)) + + optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr) #,weight_decay=0.01 + criterion = torch.nn.CrossEntropyLoss() + lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2) + + ############## + # Trainer + ############## + trainer = BaseTrainer( + device = device, + model = model, + optimizer = optimizer, + scheduler = None, + MAX_EPOCH = args.max_epoch, + criterion = criterion, + train_loader = train_loader, + val_loader = val_loader, + model_path = args.model_path, + lr = args.lr, + batch_size = args.batch_size, + gradaccum_size = args.gradaccum_size, + save_period = 10) + trainer.train() + diff --git a/final-project/train_template_BBN.py b/final-project/train_template_BBN.py new file mode 100644 index 0000000..5eec577 --- /dev/null +++ b/final-project/train_template_BBN.py @@ -0,0 +1,140 @@ +from logging import Logger +import os +import argparse +import torch +import torch.optim as optim +import warnings +# our module +#from model_zoo import vgg16 +#from 
model_zoo.pytorch_pretrained_vit import ViT +#from model_zoo.swin.swin_transformer import get_swin +from model_zoo.swin.swin_transformer_bbn import get_swin_bbn +#from model_zoo.BBN.resnet import bbn_res50 +from model_zoo.BBN.network import BNNetwork +from model_zoo.BBN.combiner import Combiner +from base.trainer import BaseTrainer,BBNTrainer +from base.dataset import FoodLTDataLoader,FoodDataset,ChunkSampler,P1_Dataset +from base.loss import LDAMLoss +from util import * +if __name__=='__main__': + parser = argparse.ArgumentParser() + # training related argument + parser.add_argument("-cont", "--cont",action="store_true", help='') + parser.add_argument("-lr", "--lr", default=1e-5,type=float , help='') + parser.add_argument("-period", "--period", default=1,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=4,type=int , help='') + parser.add_argument("-gradaccum_size", "--gradaccum_size", default=20,type=int , help='') + parser.add_argument("-load", "--load",default="",type=str , help='') + parser.add_argument("-model_path", "--model_path",default="BBN_RESNET_UNIFORM",type=str , help='') + parser.add_argument("-max_epoch", "--max_epoch",default=10,type=int, help='') + # data related argument + parser.add_argument("-img_size", "--img_size", default=384,type=int , help='') + parser.add_argument("-train_data_dir","--train_data_dir", default = "food_data",type=str, help ="Training images directory") + parser.add_argument("-val_data_dir","--val_data_dir", default = "",type=str, help ="Validation images directory") + # experiment argument + parser.add_argument("-CL_FLAG","--CL_FLAG", default = "UNIFORM",type=str, help ="Conventional Learning Branch") #BALANCED/UNIFORM + args = parser.parse_args() + ####################### + # Environment setting + ####################### + device = model_setting() + fix_seeds(87) + os.makedirs(args.model_path, exist_ok=True) + ############## + # Dataset + ############## + ''' + train_dataset = P1_Dataset("hw1_data/train_50",img_size=args.img_size,val_mode=False) + val_dataset = P1_Dataset("hw1_data/val_50",img_size=args.img_size,val_mode=True) + #train_dataset = FoodDataset(args.train_data_dir,img_size=args.img_size,mode = "train") + train_loader = torch.utils.data.DataLoader(train_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8, + sampler=ChunkSampler(1024, 512)) + + #val_dataset = FoodDataset(args.val_data_dir,img_size=args.img_size,mode = "val") + val_loader = torch.utils.data.DataLoader(val_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8, + sampler=ChunkSampler(512, 0)) + train_loader_reverse = val_loader + + ''' + train_loader = FoodLTDataLoader(data_dir=args.train_data_dir, + img_size=args.img_size, + batch_size=args.batch_size, + shuffle=True, + num_workers=8, + training=True, + balanced= (args.CL_FLAG != "UNIFORM"), + reversed= False, + retain_epoch_size=True) + train_loader_reverse = FoodLTDataLoader(data_dir=args.train_data_dir, + img_size=args.img_size, + batch_size=args.batch_size, + shuffle=True, + num_workers=8, + training=True, + balanced= False, + reversed= True, + retain_epoch_size=True) + val_loader = train_loader.split_validation() + + ############## + # Model + ############## + + # TODO define ours' model,schedular + #model = ViT(model_name, pretrained=True,num_classes=1000,image_size=384) + # ResNeSt50 + #model = resnest50(pretrained=False) + #model.load_state_dict(torch.load('./model_zoo/pytorch_resnest/model_best.pth')) + # BBN + ResNet50 + #model = bbn_res50( + # 
cfg = None, + # pretrain=True, + # pretrained_model="/home/r09021/DLCV_110FALL/final-project-challenge-3-no_qq_no_life/model_zoo/BBN/resnet50-19c8e357.pth", + # last_layer_stride=2 + #) + # + # Swin Tranformer +BBN + model = get_swin_bbn(ckpt=args.load) + model = BNNetwork(backbone_model=model,num_classes=1000,mode="swin") # Support swin/ResNet/ViT + combiner = Combiner(MaxEpoch=args.max_epoch, + model_type = model._get_name, + device = device) + #if args.load: + # model.load_state_dict(torch.load(args.load)) + # print("model loaded from {}".format(args.load)) + + optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr) #,weight_decay=0.01 + criterion = torch.nn.CrossEntropyLoss() + lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2) + + ############## + # Trainer + ############## + trainer = BBNTrainer( + device = device, + model = model, + combiner= combiner, + optimizer = optimizer, + scheduler = None, + MAX_EPOCH = args.max_epoch, + criterion = criterion, + train_loader = train_loader, + train_loader_reverse = train_loader_reverse, + val_loader = val_loader, + model_path = args.model_path, + lr = args.lr, + batch_size = args.batch_size, + gradaccum_size = args.gradaccum_size, + save_period = args.period) + + trainer.train() + #trainer._valid(epoch=0) + #trainer._valid_separate(epoch=0,branch=0) + #trainer._valid_separate(epoch=0,branch=1) + diff --git a/final-project/train_template_LT.py b/final-project/train_template_LT.py new file mode 100644 index 0000000..b4e1e22 --- /dev/null +++ b/final-project/train_template_LT.py @@ -0,0 +1,114 @@ +import os +import argparse +import torch +import torch.optim as optim +import warnings +# our module +#from model_zoo import vgg16 +#from model_zoo.pytorch_pretrained_vit import ViT +from model_zoo.swin.swin_transformer import get_swin +from model_zoo.pytorch_resnest.resnest.torch import resnest269 +from base.trainer import BaseTrainer +from base.dataset import FoodLTDataLoader,FoodDataset,ChunkSampler,P1_Dataset +from base.loss import LDAMLoss +from util import * +if __name__=='__main__': + parser = argparse.ArgumentParser() + # training related argument + parser.add_argument("-cont", "--cont",action="store_true", help='') + parser.add_argument("-lr", "--lr", default=1e-5,type=float , help='') + parser.add_argument("-period", "--period", default=20,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=64,type=int , help='') + parser.add_argument("-gradaccum_size", "--gradaccum_size", default=1,type=int , help='') + parser.add_argument("-load", "--load",default="",type=str , help='') + parser.add_argument("-model_path", "--model_path",default="resenet_RESAMPLE",type=str , help='') + parser.add_argument("-param_fix", "--param_fix",default="",type=str , help='') + parser.add_argument("-model_type", "--model_type",default="RESNEST269",type=str , help='') + parser.add_argument("-max_epoch", "--max_epoch",default=50,type=int, help='') + # data related argument + parser.add_argument("-img_size", "--img_size", default=224,type=int , help='') + parser.add_argument("-train_data_dir","--train_data_dir", default = "food_data",type=str, help ="Training images directory") + parser.add_argument("-val_data_dir","--val_data_dir", default = "",type=str, help ="Validation images directory") + # experiment related flags + parser.add_argument("-LT_EXP", "--LT_EXP",default="LDAM",type=str , help='') + args = parser.parse_args() + ####################### + # Environment setting + ####################### + device = 
model_setting() + fix_seeds(87) + os.makedirs(args.model_path, exist_ok=True) + ############## + # Dataset + ############## + train_loader = FoodLTDataLoader(data_dir=args.train_data_dir, + img_size=args.img_size, + batch_size=args.batch_size, + shuffle=True, + num_workers=8, + training=True, + balanced= (args.LT_EXP == "RESAMPLE"), + reversed= (args.LT_EXP == "REVERSE"), + retain_epoch_size=True) + val_loader = train_loader.split_validation() + ############## + # Model + ############## + + # TODO define ours' model,schedular + # model = ViT(model_name, pretrained=True,num_classes=1000,image_size=384) + # ResNeSt50 + # model = resnest50(pretrained=False) + # model.load_state_dict(torch.load('./model_zoo/pytorch_resnest/resnest50_v1.pth')) + # Swin Tranformer + # model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + if args.model_type == "RESNEST269": + model = resnest269(pretrained=False) + elif args.model_type == "SWIN": + model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + if args.param_fix == "MLP": + for name, param in model.named_parameters(): + if not name.startswith('head'): + param.requires_grad = False # Fix Feat + elif args.model_type == "SWIN_BBN": + model = get_swin_bbn(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + model = BNNetwork(backbone_model=model,num_classes=1000,mode="swin") # Support swin/ResNet/ViT + else: + print("Wrong Model type QQ") + assert(False) + + if args.load: + model.load_state_dict(torch.load(args.load)) + print("model loaded from {}".format(args.load)) + + optimizer = torch.optim.AdamW(model.parameters(), lr=args.lr) #,weight_decay=0.01 + + if args.LT_EXP == "LDAM": + REWEIGHT = LDAMLoss(cls_num_list=train_loader.cls_num_list).to(device=device) + criterion = REWEIGHT.forward + elif args.LT_EXP in ["RESAMPLE", "REVERSE"]: + criterion = torch.nn.CrossEntropyLoss() + else: + print("Wrong Flag QQ") + assert(False) + lr_scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2) + + ############## + # Trainer + ############## + trainer = BaseTrainer( + device = device, + model = model, + optimizer = optimizer, + scheduler = None, + MAX_EPOCH = args.max_epoch, + criterion = criterion, + train_loader = train_loader, + val_loader = val_loader, + model_path = args.model_path, + lr = args.lr, + batch_size = args.batch_size, + gradaccum_size = args.gradaccum_size, + save_period = 10) + trainer.train() + diff --git a/final-project/tsne.py b/final-project/tsne.py new file mode 100644 index 0000000..eff10be --- /dev/null +++ b/final-project/tsne.py @@ -0,0 +1,199 @@ +###################################################################################################### +# compute the inter output (activation layer) of specific labels (num of samples, feature dim) +# compute the tsne, output the projected vectors X_emb: (num of sample, 2), label: (num of sample,) +# draw +###################################################################################################### +import os +import torch +import argparse +from sklearn.manifold import TSNE +from tqdm import tqdm +import numpy as np +import pandas as pd +import matplotlib.pyplot as plt +import matplotlib as mpl +# mpl.rcParams['font.sans-serif'] = ['SimHei'] +# mpl.rcParams['font.serif'] = ['SimHei'] +plt.rcParams['font.sans-serif'] = ['SimHei'] +import seaborn as sns +sns.set_style("darkgrid",{"font.sans-serif":['simhei', 'Arial']}) + +from model_zoo.swin.swin_transformer_vis import get_swin +from base_vis.dataset 
import FoodDataset,ChunkSampler,P1_Dataset +from util import * + +def compute_embedding(val_loader, model, args): + + activation = {} + def get_activation(name): + def hook(model, input, output): + activation[name] = output.detach() + return hook + + model.layers[3].blocks[1].mlp.fc1.register_forward_hook(get_activation('layers[3].blocks.mlp.fc1')) + inter_outputs = [] + labels = [] + predicts = [] + print('Computing activation...') + model.eval() + with torch.no_grad(): + for data, label in tqdm(val_loader): + data = data.to(device) + output, _ = model(data) + _,pred_label=torch.max(output,1) + inter_output = activation['layers[3].blocks.mlp.fc1'] + inter_output = inter_output.view(data.shape[0], -1).detach().cpu().numpy().astype('float64') + inter_outputs.append(inter_output) + predicts.append(pred_label.view(-1, 1).cpu().numpy()) + labels.append(label.view(-1, 1).numpy()) + + inter_outputs = np.vstack(inter_outputs) + labels = np.vstack(labels).squeeze(-1) + predicts = np.vstack(predicts).squeeze(-1) + + np.save(os.path.join(args.model_path, 'inter_output_5class.npy'), inter_outputs) + np.save(os.path.join(args.model_path, 'label_5class.npy'), labels) + np.save(os.path.join(args.model_path, 'predict_5class.npy'), predicts) + + print(inter_outputs.shape, labels.shape) + + print('Computing TSNE...') + X_embedded = TSNE(n_components=2, learning_rate='auto', init='random').fit_transform(inter_outputs) + print(X_embedded.shape) + np.save(os.path.join(args.model_path, 'X_embedded_5class.npy'), X_embedded) + +def plot_tsne(args, class_list): + fin = open('../final-project-challenge-3-no_qq_no_life/food_data/label2name.txt', encoding='utf8') + lines = fin.readlines() + fin.close() + id2name = {} + for line in lines: + label, freq, name = line.split() + if freq == 'f': + _freq = 'frequent' + elif freq == 'c': + _freq = 'common' + else: + _freq = 'rare' + id2name[int(label)] = (_freq, name) + + name_order = [] + for c in class_list: + name_order.append(id2name[c][1]) + name_order.append('other') + name_order.append('dumb') + + label = np.load(os.path.join(args.model_path, 'label_5class.npy')) + names_label = [] + freqs_label = [] + for l in label: + names_label.append(id2name[l][1]) + freqs_label.append(id2name[l][0]) + + + predict = np.load(os.path.join(args.model_path, 'predict_5class.npy')) + names_pred = [] + freqs_pred = [] + for l in predict: + if id2name[l][1] not in name_order: + names_pred.append('other') + else: + names_pred.append(id2name[l][1]) + freqs_pred.append(id2name[l][0]) + + X_embedded = np.load(os.path.join(args.model_path, 'X_embedded_5class.npy')) + print('Plotting label...') + df = pd.DataFrame(zip(list(X_embedded[:, 0]), list(X_embedded[:, 1]), list(label), names_label, freqs_label), columns=['x', 'y', 'label', 'name', 'frequency']) + # df = truncate(df, raw_class_list, num_per_class) + + palette = sns.color_palette("tab20", 12) #Choosing color + palette = dict(zip(name_order, palette)) + + g = sns.scatterplot(data=df, x="x", y="y", hue='name', style='frequency', palette=palette, legend='full') + g.legend(loc='center left', bbox_to_anchor=(1, 0.5)) + # figure = g.get_figure() + # figure.savefig(os.path.join(args.model_path, 'tsne_raw&confuse.png'), bbox_inches='tight') + ax = plt.gca() + plt.xlim([-22, 22]) + plt.ylim([-25, 25]) + ax.grid(True) + ax.set_xticklabels([]) + ax.set_yticklabels([]) + # plt.axis('off') + plt.savefig(os.path.join(args.model_path, 'tsne_raw&confuse_50_label_5class.png'), bbox_inches='tight') + + plt.clf() + plt.cla() + + print('Plotting 
predict...') + palette = sns.color_palette("tab20", 20) #Choosing color + palette = dict(zip(name_order, palette)) + df = pd.DataFrame(zip(list(X_embedded[:, 0]), list(X_embedded[:, 1]), list(predict), names_pred, freqs_pred), columns=['x', 'y', 'label', 'name', 'frequency']) + # df = truncate(df, raw_class_list, num_per_class) + + g = sns.scatterplot(data=df, x="x", y="y", hue='name', style='frequency', palette=palette, legend='full') + g.legend(loc='center left', bbox_to_anchor=(1, 0.5)) + # figure = g.get_figure() + # figure.savefig(os.path.join(args.model_path, 'tsne_raw&confuse.png'), bbox_inches='tight') + ax = plt.gca() + plt.xlim([-22, 22]) + plt.ylim([-25, 25]) + ax.grid(True) + ax.set_xticklabels([]) + ax.set_yticklabels([]) + plt.savefig(os.path.join(args.model_path, 'tsne_raw&confuse_50_pred_5class.png'), bbox_inches='tight') + + + +def truncate(df, raw_class_list, num_per_class): + df_truncate = pd.DataFrame(columns=['x', 'y', 'label']) + + for c in raw_class_list: + df_temp = df.loc[df['label']==c] + if len(df_temp) > num_per_class: + df_temp = df_temp.iloc[:num_per_class] + df_truncate = df_truncate.append(df_temp) + print('Original df len: {}, truncated df len: {}'.format(len(df), len(df_truncate))) + return df_truncate +if __name__ == '__main__': + # print(model) + # layers[3].blocks.mlp.fc1 + parser = argparse.ArgumentParser() + parser.add_argument("-load", "--load",default='',type=str , help='') + parser.add_argument("-model_path", "--model_path",default="baseline",type=str , help='') + + parser.add_argument("-img_size", "--img_size", default=384,type=int , help='') + parser.add_argument("-batch_size", "--batch_size", default=4,type=int , help='') + parser.add_argument("-val_data_dir","--val_data_dir", default = "../final-project-challenge-3-no_qq_no_life/food_data/val",type=str, help ="Validation images directory") + args = parser.parse_args() + + device = model_setting() + fix_seeds(87) + + raw_class_list = [558, 925, 945, 827, 880, 800, 929, 633, 515, 326][:5] + confuse_class_list = [610, 294, 485, 866, 88, 759, 809, 297, 936, 33][:5] + num_per_class = 50 + + class_list = [] + for r, c in zip(raw_class_list, confuse_class_list): + class_list.append(r) + class_list.append(c) + + val_dataset = FoodDataset(args.val_data_dir,img_size=args.img_size,mode = "val", class_list=class_list, num_per_class=num_per_class) + val_loader = torch.utils.data.DataLoader(val_dataset, + batch_size=args.batch_size, + shuffle=False, + num_workers=8) + + model = get_swin(ckpt='./model_zoo/swin/swin_large_patch4_window12_384_22kto1k.pth') + # print(model) + if args.load: + model.load_state_dict(torch.load(args.load)) + print("model loaded from {}".format(args.load)) + model.to(device) + + # compute_embedding(val_loader, model, args) + plot_tsne(args, class_list) + + + diff --git a/final-project/util.py b/final-project/util.py new file mode 100644 index 0000000..53ff0d3 --- /dev/null +++ b/final-project/util.py @@ -0,0 +1,46 @@ +import logging +import time +from datetime import datetime +import torch +import random +import numpy as np +def fix_seeds(seed): + torch.manual_seed(seed) + if torch.cuda.is_available(): + torch.cuda.manual_seed(seed) + torch.cuda.manual_seed_all(seed) # if you are using multi-GPU. + np.random.seed(seed) # Numpy module. + random.seed(seed) # Python random module. 
+ torch.backends.cudnn.benchmark = False # For PyTorch reproducibility + torch.backends.cudnn.deterministic = True +def model_setting(): + # model setting + use_cuda=torch.cuda.is_available() + if use_cuda: + device=torch.device('cuda') + else: + device=torch.device('cpu') + return device +def gen_logger(log_file_name): + # set up logging to file - see previous section for more details + logging.basicConfig(level=logging.DEBUG, + format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s', + datefmt='%m-%d %H:%M', + filename=log_file_name, + filemode='w') + # define a Handler which writes INFO messages or higher to the sys.stderr + console = logging.StreamHandler() + console.setLevel(logging.INFO) + # set a format which is simpler for console use + formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s') + # tell the handler to use this format + console.setFormatter(formatter) + # add the handler to the root logger + logging.getLogger().addHandler(console) + logger = logging.getLogger(str(datetime.now().strftime("%Y:%H:%M:%S"))) + # quiet PIL log + logging.getLogger('PIL').setLevel(logging.WARNING) + return logger +if __name__ == "__main__": + L = gen_logger("./test.log") + L.info("{} QQ is my boss QQ".format(87)) diff --git a/final-project/valid_TTA.sh b/final-project/valid_TTA.sh new file mode 100644 index 0000000..e648c98 --- /dev/null +++ b/final-project/valid_TTA.sh @@ -0,0 +1,10 @@ +#ResNest269 +#python3 test_template_TTA.py --img_size 320 --batch_size 8 --mode VALID --model_path models --load models/33.pth --valid_mat models/33_valid_tta.mat --kaggle_csv_log models/33_valid_tta.log + +#Swin +#python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode VALID --model_path models --load models/40.pth --valid_mat models/40_valid_tta.mat --kaggle_csv_log models/40_valid_tta.log + +#python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode VALID --model_path models --load models/22.pth --valid_mat models/22_valid_tta.mat --kaggle_csv_log models/22_valid_tta.log + +#Swin BBN +#python3 test_template_TTA.py --img_size 384 --batch_size 8 --mode VALID --model_path models --load models/38.pth --valid_mat models/38_valid_tta.mat --kaggle_csv_log models/38_valid_tta.log \ No newline at end of file
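
The four commands in test_ensemble.sh all drive the same weighted soft-voting step in test_template_ensemble.py: each model's TTA pass stores an `out_softmax` array (one 1000-way softmax row per test image) in a `.mat` file, and the final label is the argmax of the weighted sum of those arrays. A minimal sketch of that step, assuming the `.mat` files produced by test_template_TTA.py and the main-track weights hard-coded in test_ensemble.sh:

```
# Weighted soft-voting over saved softmax outputs (sketch; mirrors test_template_ensemble.py).
import numpy as np
from scipy.io import loadmat

mats = ["40_test_tta_main.mat", "33_test_tta_main.mat", "38_test_tta_main.mat", "22_test_tta_main.mat"]
weights = [1.3, 1.5, 1.0, 0.8]

# Each .mat stores 'out_softmax' with shape (num_test_images, 1000).
ensemble = sum(w * loadmat(m)["out_softmax"] for m, w in zip(mats, weights))
pred_label = np.argmax(ensemble, axis=1)  # one predicted class id per test image
```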
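test_template_TTA.py wraps whichever backbone it loads in ttach's ClassificationTTAWrapper. In the committed configuration only HorizontalFlip is enabled (the other transforms are left commented out) and the per-view predictions are merged with the arithmetic mean. A minimal usage sketch under those assumptions:

```
# TTA wrapping as done in test_template_TTA.py (sketch): horizontal flip only, mean merge.
import torch
import model_zoo.ttach as tta  # vendored copy of https://github.com/qubvel/ttach

def wrap_with_tta(model: torch.nn.Module) -> torch.nn.Module:
    food_aug = tta.Compose([tta.HorizontalFlip()])
    return tta.ClassificationTTAWrapper(model=model, transforms=food_aug, merge_mode="mean")
```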
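The train.sh recipes lean on --gradaccum_size (16 for the Swin classifier fine-tune, 60 for Swin BBN) to emulate a large effective batch size while keeping batch_size at 2-4 per step. The accumulation itself happens inside base/trainer.py, which is not part of this diff; the loop below is only an illustrative sketch of the usual pattern, with hypothetical names.

```
# Illustrative gradient-accumulation loop (hypothetical; base/trainer.py is not shown in this diff).
import torch

def train_one_epoch(model, loader, optimizer, criterion, device, gradaccum_size=16):
    model.train()
    optimizer.zero_grad()
    for step, (images, labels) in enumerate(loader):
        images, labels = images.to(device), labels.to(device)
        # Scale the loss so the accumulated gradient matches one large-batch update.
        loss = criterion(model(images), labels) / gradaccum_size
        loss.backward()
        if (step + 1) % gradaccum_size == 0:
            optimizer.step()      # one optimizer update every gradaccum_size mini-batches
            optimizer.zero_grad()
```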
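train_template_LT.py switches between balanced (RESAMPLE) and reversed (REVERSE) sampling through FoodLTDataLoader, whose implementation is also outside this diff. One common way to get both behaviours is a WeightedRandomSampler whose per-sample weights come from inverse class frequency; the helper below is a hypothetical sketch of that idea, not the project's actual loader.

```
# Hypothetical frequency-based resampling sketch (FoodLTDataLoader itself is not in this diff).
# power=1.0 gives a class-balanced sampler; power=2.0 makes a class's sampling probability
# proportional to its inverse frequency, i.e. a "reversed" sampler that over-samples rare classes.
import numpy as np
import torch
from torch.utils.data import WeightedRandomSampler

def make_lt_sampler(targets, power=1.0):
    targets = np.asarray(targets)            # one class id per training image
    class_count = np.bincount(targets)       # images per class
    sample_weight = (1.0 / class_count[targets]) ** power
    return WeightedRandomSampler(torch.as_tensor(sample_weight, dtype=torch.double),
                                 num_samples=len(targets), replacement=True)
```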