diff --git a/README.md b/README.md
index d47b7c8f..b0064510 100644
--- a/README.md
+++ b/README.md
@@ -8,11 +8,23 @@
 The pre-trained models of backbone networks can be found here:
 - [SE-ResNet50](https://github.com/HiKapok/TF_Se_ResNe_t)
 - [SE-ResNeXt50](https://github.com/HiKapok/TF_Se_ResNe_t)
+## Introduction
+
 The main goal of this competition is to detect the keypoints of clothing images collected from Alibaba's e-commerce platforms. There are tens of thousands of images in five categories: blouse, outwear, trousers, skirt, dress. The keypoints for each category are defined as follows.
 ![](demos/outline.jpg "The Keypoints for Each Category")
-All the codes was writen by myself and tested under TensorFlow 1.6, Python 3.5, Ubuntu 16.04. I tried to use the latest possible TensorFlow's best practice paradigm, like [tf.estimator](https://www.tensorflow.org/api_docs/python/tf/estimator) and [tf.layers](https://www.tensorflow.org/api_docs/python/tf/layers). Almost none py_func was used in my codes to maximize the performance. Augumentations like flip, rotate, random crop, color distort were used to reduce overfit. The current performance of the model is ~0.4% in Normalized Error and got to ~20th-place in the second stage of the competition.
+Almost all the code was written by myself and tested under TensorFlow 1.6, Python 3.5, Ubuntu 16.04. I tried to follow the latest TensorFlow best-practice paradigms, like [tf.estimator](https://www.tensorflow.org/api_docs/python/tf/estimator) and [tf.layers](https://www.tensorflow.org/api_docs/python/tf/layers). Almost no py_func is used in the code, to maximize performance. Augmentations like flipping, rotation, random cropping and color distortion were used to reduce overfitting. The current model reaches ~0.4% Normalized Error, which placed ~20th in the second stage of the competition.
+
+About the model:
+
+- DetNet is better, performing almost the same as SEResNeXt, while SEResNet showed little improvement over ResNet
+- Forcing the loss of invisible keypoints to zero gave better performance
+- OHKM (online hard keypoint mining) is useful
+- Gaussian blur on the predicted heatmaps hurts, but Gaussian blur on the target heatmaps of the lower-level predictions helps
+- Ensembling the heatmaps of flipped images is worse than ensembling the final predictions of flipped images, and applying a one-quarter offset correction to the merged predictions is also useful (see the sketch after this list)
+- Cascaded prediction over the whole network can eliminate the need for a separate clothes-detection network as well as for larger input images
+- The native hourglass model performed worst here but still has great potential; see the top solutions [here](http://human-pose.mpi-inf.mpg.de/#results)
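+
+To make the flip-ensemble note concrete, here is a minimal NumPy sketch of how the predictions from the original and the left-right flipped pass can be merged with the one-quarter offset correction (shapes and names are illustrative only; the in-graph TensorFlow version lives in eval_all_cpn_onepass.py):
+
+```python
+import numpy as np
+
+def merge_flipped_predictions(pred_xy, pred_xy_flipped, eps=1e-3):
+    """Combine keypoints from the original and the flipped test pass.
+
+    Both arrays have shape [num_joints, 2]; the flipped predictions are
+    assumed to be already mirrored back into the original image frame
+    (x -> width - 1 - x plus left/right joint remapping). Instead of
+    averaging heatmaps, each keypoint is moved a fixed quarter step from
+    the first prediction towards the second one.
+    """
+    delta = pred_xy_flipped - pred_xy
+    dist = np.sqrt((delta ** 2).sum(axis=-1, keepdims=True))
+    # If the two passes already agree (dist ~ 0), keep the first prediction.
+    step = np.where(dist < eps, 0.0, 0.25 / np.maximum(dist, eps))
+    return pred_xy + delta * step
+```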
 There are still other ways to further improve the performance but I didn't try them in this competition because of their limitations in applications, for example:
@@ -25,10 +37,56 @@
 If you find it useful for your research or competitions, any contribution or star to this repo is welcome.
-By the way, I'm looking for one computer vision related job recently. I'm very looking forward to your contact if you are interested in.
+## Usage
+- Download the [fashionAI Dataset](https://tianchi.aliyun.com/competition/information.htm?spm=5176.11165261.5678.2.34b72ec5iFguTn&raceId=231648&_lang=en_US) and reorganize the directory as follows:
+ ```
+ DATA_DIR/
+ |->train_0/
+ | |->Annotations/
+ | | |->annotations.csv
+ | |->Images/
+ | | |->blouse
+ | | |->...
+ |->train_1/
+ | |->Annotations/
+ | | |->annotations.csv
+ | |->Images/
+ | | |->blouse
+ | | |->...
+ |->...
+ |->test_0/
+ | |->test.csv
+ | |->Images/
+ | | |->blouse
+ | | |->...
+ ```
+ DATA_DIR is the root path of your local copy of the fashionAI Dataset.
+ - train_0 -> [update] warm_up_train_20180222.tar
+ - train_1 -> fashionAI_key_points_train_20180227.tar.gz
+ - train_2 -> fashionAI_key_points_test_a_20180227.tar
+ - train_3 -> fashionAI_key_points_test_b_20180418.tgz
+ - test_0 -> round2_fashionAI_key_points_test_a_20180426.tar
+ - test_1 -> round2_fashionAI_key_points_test_b_20180601.tar
+
+- Set your local dataset path in [config.py](https://github.com/HiKapok/tf.fashionAI/blob/e90c5b0072338fa638c56ae788f7146d3f36cb1f/config.py#L20).
+- Create one folder named 'model' under the root path of your codes, download all the pre-trained weights of the backbone networks and put them into sub-folders named 'resnet50', 'seresnet50' and 'seresnext50'. Then start training (set RECORDS_DATA_DIR and TEST_RECORDS_DATA_DIR according to your [config.py](https://github.com/HiKapok/tf.fashionAI/blob/e90c5b0072338fa638c56ae788f7146d3f36cb1f/config.py#L20)):
+ ```sh
+ python train_detxt_cpn_onebyone.py --run_on_cloud=False --data_dir=RECORDS_DATA_DIR
+ python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=detnext50_cpn --data_dir=TEST_RECORDS_DATA_DIR
+ ```
+ Submitting the generated 'detnext50_cpn_sub.csv' will give you ~0.0427.
+ ```sh
+ python train_senet_cpn_onebyone.py --run_on_cloud=False --data_dir=RECORDS_DATA_DIR
+ python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=seresnext50_cpn --data_dir=TEST_RECORDS_DATA_DIR
+ ```
+ Submitting the generated 'seresnext50_cpn_sub.csv' will give you ~0.0424.
+
+ Copy both 'detnext50_cpn_sub.csv' and 'seresnext50_cpn_sub.csv' to a new folder and modify the path and filename in [ensemble_from_csv.py](https://github.com/HiKapok/tf.fashionAI/blob/e90c5b0072338fa638c56ae788f7146d3f36cb1f/ensemble_from_csv.py#L27), then run 'python ensemble_from_csv.py'; submitting the generated 'ensmeble.csv' will give you ~0.0407 (a distilled sketch of this averaging follows this list).
+- Training deeper backbone networks will give better results (+0.001).
+- Training the hourglass model follows almost the same procedure as above but gave inferior performance.
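+
+The ensembling step itself boils down to averaging the two submissions' keypoint coordinates one by one. Below is a distilled pandas sketch of that idea, assuming the fashionAI submission layout (an image id and a category column followed by one 'x_y_v' cell per keypoint, with v == -1 marking an absent keypoint) and that both files list images in the same order; the actual script is ensemble_from_csv.py:
+
+```python
+import pandas as pd
+
+def average_cell(a, b):
+    # Each cell looks like 'x_y_v'; v == -1 marks a keypoint that was not predicted.
+    ax, ay, av = (int(t) for t in a.split('_'))
+    bx, by, bv = (int(t) for t in b.split('_'))
+    if av == -1 or bv == -1:
+        return a if av != -1 else b  # keep whichever prediction exists
+    return '{}_{}_{}'.format((ax + bx) // 2, (ay + by) // 2, av)
+
+df_a = pd.read_csv('detnext50_cpn_sub.csv', encoding='utf-8')
+df_b = pd.read_csv('seresnext50_cpn_sub.csv', encoding='utf-8')
+
+merged = df_a.copy()
+for col in df_a.columns[2:]:  # keypoint columns follow image id and category
+    merged[col] = [average_cell(a, b) for a, b in zip(df_a[col], df_b[col])]
+merged.to_csv('ensmeble.csv', encoding='utf-8', index=False)  # name as stated above
+```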
 ## ##
-Some Detection Results:
+Some Detection Results (stage one):
 - Cascaded Pyramid Network:
diff --git a/ensemble_from_csv.py b/ensemble_from_csv.py
index b003dc99..a44a001e 100644
--- a/ensemble_from_csv.py
+++ b/ensemble_from_csv.py
@@ -32,7 +32,7 @@
 # 'sub_2_hg_4_256_64-half_epoch.csv',
 # 'sub_2_hg_8_256_64_v1-half_epoch.csv']#['cpn_2_320_160_1e-3.csv', 'sub_2_hg_4_256_64.csv', 'sub_2_cpn_320_100_1e-3.csv', 'sub_2_hg_8_256_64.csv']
-ensemble_subs = ['sext_cpn_flip.csv', 'detxt_cpn_flip.csv']
+ensemble_subs = ['large_seresnext_cpn_sub.csv', 'large_detnext_cpn_sub.csv']

 def parse_comma_list(args):
diff --git a/eval_all_cpn_onepass.py b/eval_all_cpn_onepass.py
index a3eec0d9..8ed922fe 100644
--- a/eval_all_cpn_onepass.py
+++ b/eval_all_cpn_onepass.py
@@ -112,9 +112,9 @@
     'seresnet50_cpn': {'backbone': seresnet_cpn.cascaded_pyramid_net, 'logs_sub_dir': 'logs_se_cpn'},
     'seresnext50_cpn': {'backbone': seresnet_cpn.xt_cascaded_pyramid_net, 'logs_sub_dir': 'logs_sext_cpn'},
     'detnext50_cpn': {'backbone': detxt_cpn.cascaded_pyramid_net, 'logs_sub_dir': 'logs_detxt_cpn'},
-    'large_seresnext_cpn': {'backbone': lambda inputs, output_channals, heatmap_size, istraining, data_format : seresnet_cpn.xt_cascaded_pyramid_net(inputs, output_channals, heatmap_size, istraining, data_format, net_depth=50),
+    'large_seresnext_cpn': {'backbone': lambda inputs, output_channals, heatmap_size, istraining, data_format : seresnet_cpn.xt_cascaded_pyramid_net(inputs, output_channals, heatmap_size, istraining, data_format, net_depth=101),
                             'logs_sub_dir': 'logs_large_sext_cpn'},
-    'large_detnext_cpn': {'backbone': lambda inputs, output_channals, heatmap_size, istraining, data_format : detxt_cpn.cascaded_pyramid_net(inputs, output_channals, heatmap_size, istraining, data_format, net_depth=50),
+    'large_detnext_cpn': {'backbone': lambda inputs, output_channals, heatmap_size, istraining, data_format : detxt_cpn.cascaded_pyramid_net(inputs, output_channals, heatmap_size, istraining, data_format, net_depth=101),
                           'logs_sub_dir': 'logs_large_detxt_cpn'},
     'head_seresnext50_cpn': {'backbone': seresnet_cpn.head_xt_cascaded_pyramid_net, 'logs_sub_dir': 'logs_head_sext_cpn'},
     }
@@ -164,7 +164,7 @@ def save_image_with_heatmap(image, height, width, heatmap_size, heatmap, predict
         imsave(os.path.join(config.EVAL_DEBUG_DIR, file_name), img.astype(np.uint8))
     return save_image_with_heatmap.counter

-def get_keypoint(image, predictions, heatmap_size, height, width, category, clip_at_zero=True, data_format='channels_last', name=None):
+def get_keypoint(image, predictions, heatmap_size, height, width, category, clip_at_zero=False, data_format='channels_last', name=None):
     # expand_border = 10
     # pad_pred = tf.pad(predictions, tf.constant([[0, 0], [0, 0], [expand_border, expand_border], [expand_border, expand_border]]),
     #                   mode='CONSTANT', name='pred_padding', constant_values=0)
@@ -242,7 +242,7 @@ def keypoint_model_fn(features, labels, mode, params):
     if params['data_format'] == 'channels_last':
         pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))]

-    pred_x_first_stage, pred_y_first_stage = get_keypoint(image, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format'])
+    pred_x_first_stage, pred_y_first_stage = get_keypoint(image,
pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) else: # test augumentation on the fly if params['data_format'] == 'channels_last': @@ -270,8 +270,8 @@ def cond_flip(heatmap_ind): pred_outputs = [tf.split(_, 2) for _ in pred_outputs] pred_outputs_1 = [_[0] for _ in pred_outputs] pred_outputs_2 = [_[1] for _ in pred_outputs] - pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) - pred_x_first_stage2, pred_y_first_stage2 = get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) + pred_x_first_stage2, pred_y_first_stage2 = get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) dist = tf.pow(tf.pow(pred_x_first_stage1 - pred_x_first_stage2, 2.) + tf.pow(pred_y_first_stage1 - pred_y_first_stage2, 2.), .5) @@ -318,7 +318,7 @@ def cond_flip(heatmap_ind): if params['data_format'] == 'channels_last': pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] - pred_x, pred_y = get_keypoint(image, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + pred_x, pred_y = get_keypoint(image, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) else: # test augumentation on the fly with tf.name_scope("refine_prediction"): @@ -347,8 +347,9 @@ def cond_flip(heatmap_ind): pred_outputs = [tf.split(_, 2) for _ in pred_outputs] pred_outputs_1 = [_[0] for _ in pred_outputs] pred_outputs_2 = [_[1] for _ in pred_outputs] - pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) - pred_x_first_stage2, pred_y_first_stage2 = get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + #pred_outputs_1[-1] = tf.Print(pred_outputs_1[-1], [pred_outputs_1[-1]], summarize=10000) + pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) + pred_x_first_stage2, pred_y_first_stage2 = 
get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=False, data_format=params['data_format']) dist = tf.pow(tf.pow(pred_x_first_stage1 - pred_x_first_stage2, 2.) + tf.pow(pred_y_first_stage1 - pred_y_first_stage2, 2.), .5) @@ -435,17 +436,17 @@ def main(_): #Images/blouse/ab669925e96490ec698af976586f0b2f.jpg df.loc[cur_record] = [filename, m] + temp_list cur_record = cur_record + 1 - df.to_csv('./{}.csv'.format(m), encoding='utf-8', index=False) + df.to_csv('./{}_{}.csv'.format(FLAGS.backbone.strip(), m), encoding='utf-8', index=False) # merge dataframe - df_list = [pd.read_csv('./{}.csv'.format(model_to_eval[0]), encoding='utf-8')] + df_list = [pd.read_csv('./{}_{}.csv'.format(FLAGS.backbone.strip(), model_to_eval[0]), encoding='utf-8')] for m in model_to_eval[1:]: if m == '': continue - df_list.append(pd.read_csv('./{}.csv'.format(m), encoding='utf-8')) - pd.concat(df_list, ignore_index=True).to_csv('./sub.csv', encoding='utf-8', index=False) + df_list.append(pd.read_csv('./{}_{}.csv'.format(FLAGS.backbone.strip(), m), encoding='utf-8')) + pd.concat(df_list, ignore_index=True).to_csv('./{}_sub.csv'.format(FLAGS.backbone.strip()), encoding='utf-8', index=False) if FLAGS.run_on_cloud: - tf.gfile.Copy('./sub.csv', os.path.join(full_model_dir, 'sub.csv'), overwrite=True) + tf.gfile.Copy('./{}_sub.csv'.format(FLAGS.backbone.strip()), os.path.join(full_model_dir, '{}_sub.csv'.format(FLAGS.backbone.strip())), overwrite=True) if __name__ == '__main__': tf.logging.set_verbosity(tf.logging.INFO) diff --git a/eval_hg_subnet.py b/eval_hg_subnet.py index 77a21cce..a2c125ca 100644 --- a/eval_hg_subnet.py +++ b/eval_hg_subnet.py @@ -44,7 +44,7 @@ 'gpu_memory_fraction', 1., 'GPU memory fraction to use.') # scaffold related configuration tf.app.flags.DEFINE_string( - 'data_dir', '../Datasets/tfrecords_test',# tfrecords_test_stage1_b tfrecords_test + 'data_dir', '../Datasets/tfrecords_test_stage1_b',# tfrecords_test_stage1_b tfrecords_test 'The directory where the dataset input data is stored.') tf.app.flags.DEFINE_string( 'dataset_name', '{}_*.tfrecord', 'The pattern of the dataset name to load.') @@ -85,9 +85,6 @@ tf.app.flags.DEFINE_string( 'checkpoint_path', None, 'The path to a checkpoint from which to fine-tune.') -tf.app.flags.DEFINE_string( - 'coarse_pred_path', None, - 'The path to a pred csv file from which to crop the input image for finer prediction.') tf.app.flags.DEFINE_boolean( 'flip_on_test', False, 'Wether we will average predictions of left-right fliped image.') @@ -105,53 +102,11 @@ #--model_scope=blouse --checkpoint_path=./logs/blouse FLAGS = tf.app.flags.FLAGS -def preprocessing_fn(org_image, file_name, shape): - pd_df = None - if FLAGS.coarse_pred_path is not None: - if tf.gfile.Exists(FLAGS.coarse_pred_path): - tf.logging.info('Finetuning Prediction From {}.'.format(FLAGS.coarse_pred_path)) - tf.gfile.Copy(FLAGS.coarse_pred_path, './__coarse_pred.csv', overwrite=True) - pd_df = pd.read_csv('./__coarse_pred.csv', encoding='utf-8') - - all_filenames = [] - all_xmin = [] - all_ymin = [] - all_xmax = [] - all_ymax = [] - - all_values = pd_df.values.tolist() - for records in all_values: - all_filenames.append(records[0].encode('utf8')) - xmin = 2000 - ymin = 2000 - xmax = -1 - ymax = -1 - for kp in records[2:]: - keypoint_info = kp.strip().split('_') - if int(keypoint_info[2]) == -1: - continue - xmin = min(xmin, int(keypoint_info[0])) - ymin = min(ymin, 
int(keypoint_info[1])) - xmax = max(xmax, int(keypoint_info[0])) - ymax = max(ymax, int(keypoint_info[1])) - all_xmin.append(xmin) - all_ymin.append(ymin) - all_xmax.append(xmax) - all_ymax.append(ymax) - #print(all_filenames, all_xmin, all_ymin, all_xmax, all_ymax) - xmin_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(all_filenames, dtype=tf.string), tf.constant(all_xmin, dtype=tf.int64)), -1) - ymin_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(all_filenames, dtype=tf.string), tf.constant(all_ymin, dtype=tf.int64)), -1) - xmax_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(all_filenames, dtype=tf.string), tf.constant(all_xmax, dtype=tf.int64)), -1) - ymax_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(all_filenames, dtype=tf.string), tf.constant(all_ymax, dtype=tf.int64)), -1) - pd_df = [xmin_table, ymin_table, xmax_table, ymax_table] - #pred_item['file_name'].encode('utf8') - - #lnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.global_norm_key, dtype=tf.int64), tf.constant(config.global_norm_lvalues, dtype=tf.int64)), 0) - return preprocessing.preprocess_for_test(org_image, file_name, shape, FLAGS.train_image_size, FLAGS.train_image_size, data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), bbox_border=FLAGS.bbox_border, heatmap_sigma=FLAGS.heatmap_sigma, heatmap_size=FLAGS.heatmap_size, pred_df=pd_df) def input_pipeline(model_scope=FLAGS.model_scope): #preprocessing_fn = lambda org_image, shape: preprocessing.preprocess_for_test(org_image, shape, FLAGS.train_image_size, FLAGS.train_image_size, data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), bbox_border=FLAGS.bbox_border, heatmap_sigma=FLAGS.heatmap_sigma, heatmap_size=FLAGS.heatmap_size) + preprocessing_fn = lambda org_image, file_name, shape: preprocessing.preprocess_for_test_raw_output(org_image, file_name, shape, FLAGS.train_image_size, FLAGS.train_image_size, data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), bbox_border=FLAGS.bbox_border, heatmap_sigma=FLAGS.heatmap_sigma, heatmap_size=FLAGS.heatmap_size) - images, shape, file_name, classid, offsets = dataset.slim_test_get_split(FLAGS.data_dir, preprocessing_fn, FLAGS.num_readers, FLAGS.num_preprocessing_threads, file_pattern=FLAGS.dataset_name, category=(model_scope if 'all' not in model_scope else '*'), reader=None) + images, shape, file_name, classid, offsets = dataset.slim_test_get_split(FLAGS.data_dir, None, FLAGS.num_readers, FLAGS.num_preprocessing_threads, file_pattern=FLAGS.dataset_name, category=(model_scope if 'all' not in model_scope else '*'), reader=None, dynamic_pad=True) return {'images': images, 'shape': shape, 'classid': classid, 'file_name': file_name, 'pred_offsets': offsets} @@ -316,49 +271,138 @@ def keypoint_model_fn(features, labels, mode, params): file_name = tf.identity(file_name, name='current_file') + image = preprocessing.preprocess_for_test_raw_output(features, params['train_image_size'], params['train_image_size'], data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), scope='first_stage') + if not params['flip_on_test']: - with tf.variable_scope(params['model_scope'], default_name=None, values=[features], reuse=tf.AUTO_REUSE): - pred_outputs = hg.create_model(features, params['num_stacks'], params['feats_channals'], + with 
tf.variable_scope(params['model_scope'], default_name=None, values=[image], reuse=tf.AUTO_REUSE): + pred_outputs = hg.create_model(image, params['num_stacks'], params['feats_channals'], config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['num_modules'], (mode == tf.estimator.ModeKeys.TRAIN), params['data_format']) if params['data_format'] == 'channels_last': pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + + pred_x_first_stage, pred_y_first_stage = get_keypoint(image, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) else: # test augumentation on the fly if params['data_format'] == 'channels_last': - double_features = tf.reshape(tf.stack([features, tf.map_fn(tf.image.flip_left_right, features, back_prop=False)], axis = 1), [-1, params['train_image_size'], params['train_image_size'], 3]) + double_features = tf.reshape(tf.stack([image, tf.map_fn(tf.image.flip_left_right, image, back_prop=False)], axis = 1), [-1, params['train_image_size'], params['train_image_size'], 3]) else: - double_features = tf.reshape(tf.stack([features, tf.transpose(tf.map_fn(tf.image.flip_left_right, tf.transpose(features, [0, 2, 3, 1], name='nchw2nhwc'), back_prop=False), [0, 3, 1, 2], name='nhwc2nchw')], axis = 1), [-1, 3, params['train_image_size'], params['train_image_size']]) + double_features = tf.reshape(tf.stack([image, tf.transpose(tf.map_fn(tf.image.flip_left_right, tf.transpose(image, [0, 2, 3, 1], name='nchw2nhwc'), back_prop=False), [0, 3, 1, 2], name='nhwc2nchw')], axis = 1), [-1, 3, params['train_image_size'], params['train_image_size']]) num_joints = config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')] with tf.variable_scope(params['model_scope'], default_name=None, values=[double_features], reuse=tf.AUTO_REUSE): pred_outputs = hg.create_model(double_features, params['num_stacks'], params['feats_channals'], - num_joints, params['num_modules'], + config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['num_modules'], (mode == tf.estimator.ModeKeys.TRAIN), params['data_format']) if params['data_format'] == 'channels_last': pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] - # [[0, 0, 0, ..], [1, 1, 1, ...], ...] - row_indices = tf.tile(tf.reshape(tf.range(tf.shape(double_features)[0]), [-1, 1]), [1, num_joints]) - # [[0, 1, 2, ...], [1, 0, 2, ...], [0, 1, 2], [1, 0, 2], ...] - col_indices = tf.reshape(tf.tile(tf.reshape(tf.stack([tf.range(num_joints), tf.constant(config.left_right_remap[(params['model_scope'] if 'all' not in params['model_scope'] else '*')])], axis=0), [-1]), [tf.shape(features)[0]]), [-1, num_joints]) - # [[[0, 0], [0, 1], [0, 2], ...], [[1, 1], [1, 0], [1, 2], ...], [[2, 0], [2, 1], [2, 2], ...], ...] 
+ row_indices = tf.tile(tf.reshape(tf.stack([tf.range(0, tf.shape(double_features)[0], delta=2), tf.range(1, tf.shape(double_features)[0], delta=2)], axis=0), [-1, 1]), [1, num_joints]) + col_indices = tf.reshape(tf.tile(tf.reshape(tf.stack([tf.range(num_joints), tf.constant(config.left_right_remap[(params['model_scope'] if 'all' not in params['model_scope'] else '*')])], axis=0), [2, -1]), [1, tf.shape(features)[0]]), [-1, num_joints]) flip_indices=tf.stack([row_indices, col_indices], axis=-1) #flip_indices = tf.Print(flip_indices, [flip_indices], summarize=500) pred_outputs = [tf.gather_nd(pred_outputs[ind], flip_indices, name='gather_nd_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] def cond_flip(heatmap_ind): - return tf.cond(heatmap_ind[1] < 1, lambda : heatmap_ind[0], lambda : tf.transpose(tf.image.flip_left_right(tf.transpose(heatmap_ind[0], [1, 2, 0], name='pred_nchw2nhwc')), [2, 0, 1], name='pred_nhwc2nchw')) - # all the heatmap of the fliped image should also be fliped back, a little complicated - pred_outputs = [tf.map_fn(cond_flip, [pred_outputs[ind], tf.tile(tf.reshape(tf.range(2), [-1]), [tf.shape(features)[0]])], dtype=tf.float32, parallel_iterations=10, back_prop=True, swap_memory=False, infer_shape=True, name='map_fn_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] - # average predictions of left_reight_fliped image - segment_indices = tf.reshape(tf.tile(tf.reshape(tf.range(tf.shape(features)[0]), [-1, 1]), [1, 2]), [-1]) - pred_outputs = [tf.segment_mean(pred_outputs[ind], segment_indices, name='segment_mean_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + return tf.cond(heatmap_ind[1] < tf.shape(features)[0], lambda : heatmap_ind[0], lambda : tf.transpose(tf.image.flip_left_right(tf.transpose(heatmap_ind[0], [1, 2, 0], name='pred_nchw2nhwc')), [2, 0, 1], name='pred_nhwc2nchw')) + # all the heatmap of the fliped image should also be fliped back + pred_outputs = [tf.map_fn(cond_flip, [pred_outputs[ind], tf.range(tf.shape(double_features)[0])], dtype=tf.float32, parallel_iterations=10, back_prop=True, swap_memory=False, infer_shape=True, name='map_fn_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + pred_outputs = [tf.split(_, 2) for _ in pred_outputs] + pred_outputs_1 = [_[0] for _ in pred_outputs] + pred_outputs_2 = [_[1] for _ in pred_outputs] + pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + pred_x_first_stage2, pred_y_first_stage2 = get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + + dist = tf.pow(tf.pow(pred_x_first_stage1 - pred_x_first_stage2, 2.) 
+ tf.pow(pred_y_first_stage1 - pred_y_first_stage2, 2.), .5) + + pred_x_first_stage = tf.where(dist < 1e-3, pred_x_first_stage1, pred_x_first_stage1 + (pred_x_first_stage2 - pred_x_first_stage1) * 0.25 / dist) + pred_y_first_stage = tf.where(dist < 1e-3, pred_y_first_stage1, pred_y_first_stage1 + (pred_y_first_stage2 - pred_y_first_stage1) * 0.25 / dist) + + xmin = tf.cast(tf.reduce_min(pred_x_first_stage), tf.int64) + xmax = tf.cast(tf.reduce_max(pred_x_first_stage), tf.int64) + ymin = tf.cast(tf.reduce_min(pred_y_first_stage), tf.int64) + ymax = tf.cast(tf.reduce_max(pred_y_first_stage), tf.int64) + + xmin, ymin, xmax, ymax = xmin - 100, ymin - 80, xmax + 100, ymax + 80 + + xmin = tf.clip_by_value(xmin, 0, shape[0][1][0]-1) + ymin = tf.clip_by_value(ymin, 0, shape[0][0][0]-1) + xmax = tf.clip_by_value(xmax, 0, shape[0][1][0]-1) + ymax = tf.clip_by_value(ymax, 0, shape[0][0][0]-1) + + bbox_h = ymax - ymin + bbox_w = xmax - xmin + areas = bbox_h * bbox_w + + offsets=tf.stack([xmin, ymin], axis=0) + crop_shape = tf.stack([bbox_h, bbox_w, shape[0][2][0]], axis=0) + + ymin, xmin, bbox_h, bbox_w = tf.cast(ymin, tf.int32), tf.cast(xmin, tf.int32), tf.cast(bbox_h, tf.int32), tf.cast(bbox_w, tf.int32) + + single_image = tf.squeeze(features, [0]) + crop_image = tf.image.crop_to_bounding_box(single_image, ymin, xmin, bbox_h, bbox_w) + crop_image = tf.expand_dims(crop_image, 0) - pred_x, pred_y = get_keypoint(features, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + image, shape, offsets = tf.cond(areas > 0, lambda : (crop_image, crop_shape, offsets), + lambda : (features, shape, tf.constant([0, 0], tf.int64))) + offsets.set_shape([2]) + offsets = tf.to_float(offsets) + shape = tf.reshape(shape, [1, 3]) - predictions = {'pred_x': pred_x + pred_offsets[:, 0], 'pred_y': pred_y + pred_offsets[:, 1], 'file_name': file_name} + image = preprocessing.preprocess_for_test_raw_output(image, params['train_image_size'], params['train_image_size'], data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), scope='second_stage') + + if not params['flip_on_test']: + with tf.variable_scope(params['model_scope'], default_name=None, values=[image], reuse=True): + pred_outputs = hg.create_model(image, params['num_stacks'], params['feats_channals'], + config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['num_modules'], + (mode == tf.estimator.ModeKeys.TRAIN), params['data_format']) + with tf.name_scope("refine_prediction"): + if params['data_format'] == 'channels_last': + pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + + pred_x, pred_y = get_keypoint(image, pred_outputs[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + else: + # test augumentation on the fly + with tf.name_scope("refine_prediction"): + if params['data_format'] == 'channels_last': + double_features = tf.reshape(tf.stack([image, tf.map_fn(tf.image.flip_left_right, image, back_prop=False)], axis = 1), [-1, params['train_image_size'], params['train_image_size'], 3]) + else: + double_features = tf.reshape(tf.stack([image, tf.transpose(tf.map_fn(tf.image.flip_left_right, tf.transpose(image, [0, 2, 3, 1], name='nchw2nhwc'), 
back_prop=False), [0, 3, 1, 2], name='nhwc2nchw')], axis = 1), [-1, 3, params['train_image_size'], params['train_image_size']]) + + num_joints = config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')] + with tf.variable_scope(params['model_scope'], default_name=None, values=[double_features], reuse=True): + pred_outputs = hg.create_model(double_features, params['num_stacks'], params['feats_channals'], + config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['num_modules'], + (mode == tf.estimator.ModeKeys.TRAIN), params['data_format']) + with tf.name_scope("refine_prediction"): + if params['data_format'] == 'channels_last': + pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + row_indices = tf.tile(tf.reshape(tf.stack([tf.range(0, tf.shape(double_features)[0], delta=2), tf.range(1, tf.shape(double_features)[0], delta=2)], axis=0), [-1, 1]), [1, num_joints]) + col_indices = tf.reshape(tf.tile(tf.reshape(tf.stack([tf.range(num_joints), tf.constant(config.left_right_remap[(params['model_scope'] if 'all' not in params['model_scope'] else '*')])], axis=0), [2, -1]), [1, tf.shape(features)[0]]), [-1, num_joints]) + flip_indices=tf.stack([row_indices, col_indices], axis=-1) + + #flip_indices = tf.Print(flip_indices, [flip_indices], summarize=500) + pred_outputs = [tf.gather_nd(pred_outputs[ind], flip_indices, name='gather_nd_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + + def cond_flip(heatmap_ind): + return tf.cond(heatmap_ind[1] < tf.shape(features)[0], lambda : heatmap_ind[0], lambda : tf.transpose(tf.image.flip_left_right(tf.transpose(heatmap_ind[0], [1, 2, 0], name='pred_nchw2nhwc')), [2, 0, 1], name='pred_nhwc2nchw')) + # all the heatmap of the fliped image should also be fliped back + pred_outputs = [tf.map_fn(cond_flip, [pred_outputs[ind], tf.range(tf.shape(double_features)[0])], dtype=tf.float32, parallel_iterations=10, back_prop=True, swap_memory=False, infer_shape=True, name='map_fn_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] + pred_outputs = [tf.split(_, 2) for _ in pred_outputs] + pred_outputs_1 = [_[0] for _ in pred_outputs] + pred_outputs_2 = [_[1] for _ in pred_outputs] + pred_x_first_stage1, pred_y_first_stage1 = get_keypoint(image, pred_outputs_1[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + pred_x_first_stage2, pred_y_first_stage2 = get_keypoint(image, pred_outputs_2[-1], params['heatmap_size'], shape[0][0], shape[0][1], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) + + dist = tf.pow(tf.pow(pred_x_first_stage1 - pred_x_first_stage2, 2.) 
+ tf.pow(pred_y_first_stage1 - pred_y_first_stage2, 2.), .5) + + pred_x = tf.where(dist < 1e-3, pred_x_first_stage1, pred_x_first_stage1 + (pred_x_first_stage2 - pred_x_first_stage1) * 0.25 / dist) + pred_y = tf.where(dist < 1e-3, pred_y_first_stage1, pred_y_first_stage1 + (pred_y_first_stage2 - pred_y_first_stage1) * 0.25 / dist) + # for var in tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES):#TRAINABLE_VARIABLES): + # print(var.op.name) + + predictions = {'pred_x': pred_x + offsets[0], 'pred_y': pred_y + offsets[1], 'file_name': file_name} if mode == tf.estimator.ModeKeys.PREDICT: return tf.estimator.EstimatorSpec( diff --git a/eval_script.sh b/eval_script.sh deleted file mode 100644 index b3f75370..00000000 --- a/eval_script.sh +++ /dev/null @@ -1,16 +0,0 @@ -#! /bin/bash - -# export CUDA_VISIBLE_DEVICES='0' -# source /home/kapok/pyenv35/bin/activate -# cd /media/rs/0E06CD1706CD0127/Kapok/Chi/fashionAI/Codes -python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=seresnext50_cpn -python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=detnext50_cpn -python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=large_seresnext_cpn --train_image_size=512 --heatmap_size=128 -python eval_all_cpn_onepass.py --run_on_cloud=False --backbone=large_detnext_cpn --train_image_size=512 --heatmap_size=128 - -# for training -python train_senet_cpn_onebyone.py --run_on_cloud=False -python train_detxt_cpn_onebyone.py --run_on_cloud=False -python train_large_xt_cpn_onebyone.py --run_on_cloud=False --backbone=detxt -python train_large_xt_cpn_onebyone.py --run_on_cloud=False --backbone=sext - diff --git a/preprocessing/dataset.py b/preprocessing/dataset.py index d8c490db..553f8bee 100644 --- a/preprocessing/dataset.py +++ b/preprocessing/dataset.py @@ -28,7 +28,7 @@ # blouse_0000.tfrecord # {}_????_val.tfrecord #category = * -def slim_get_split(dataset_dir, image_preprocessing_fn, batch_size, num_readers, num_preprocessing_threads, num_epochs=None, is_training=True, category='blouse', file_pattern='{}_????', reader=None): +def slim_get_split(dataset_dir, image_preprocessing_fn, batch_size, num_readers, num_preprocessing_threads, num_epochs=None, is_training=True, category='blouse', file_pattern='{}_????', reader=None, return_keypoints=False): # Allowing None in the signature so that dataset_factory can use the default. 
if reader is None: reader = tf.TFRecordReader @@ -97,10 +97,15 @@ def slim_get_split(dataset_dir, image_preprocessing_fn, batch_size, num_readers, key_x, key_y, key_v, key_id, key_gid = tf.gather(key_x, gather_ind), tf.gather(key_y, gather_ind), tf.gather(key_v, gather_ind), tf.gather(key_id, gather_ind), tf.gather(key_gid, gather_ind) shape = tf.stack([height, width, channels], axis=0) - image, targets, new_key_v, isvalid, norm_value = image_preprocessing_fn(org_image, classid, shape, key_x, key_y, key_v) - batch_input = tf.train.batch([image, shape, - classid, targets, new_key_v, isvalid, norm_value], + if not return_keypoints: + image, targets, new_key_v, isvalid, norm_value = image_preprocessing_fn(org_image, classid, shape, key_x, key_y, key_v) + batch_list = [image, shape, classid, targets, new_key_v, isvalid, norm_value] + else: + image, targets, new_key_x, new_key_y, new_key_v, isvalid, norm_value = image_preprocessing_fn(org_image, classid, shape, key_x, key_y, key_v) + batch_list = [image, shape, classid, targets, new_key_x, new_key_y, new_key_v, isvalid, norm_value] + + batch_input = tf.train.batch(batch_list, #classid, key_x, key_y, key_v, key_id, key_gid], dynamic_pad=False,#(not is_training), batch_size = batch_size, diff --git a/preprocessing/preprocessing.py b/preprocessing/preprocessing.py index d4533fc7..b34caacc 100644 --- a/preprocessing/preprocessing.py +++ b/preprocessing/preprocessing.py @@ -766,6 +766,7 @@ def preprocess_for_train(image, data_format, category, bbox_border, heatmap_sigma, heatmap_size, + return_keypoints=False, resize_side_min=_RESIZE_SIDE_MIN, resize_side_max=_RESIZE_SIDE_MAX, fast_mode=False, @@ -894,7 +895,10 @@ def preprocess_for_train(image, if data_format == 'NCHW': distorted_image = tf.transpose(distorted_image, perm=(2, 0, 1)) - return distorted_image, targets, new_key_v, isvalid, norm_value + if not return_keypoints: + return distorted_image, targets, new_key_v, isvalid, norm_value + else: + return distorted_image, targets, new_key_x, new_key_y, new_key_v, isvalid, norm_value def preprocess_for_train_v0(image, @@ -906,6 +910,7 @@ def preprocess_for_train_v0(image, data_format, category, bbox_border, heatmap_sigma, heatmap_size, + return_keypoints=False, resize_side_min=_RESIZE_SIDE_MIN, resize_side_max=_RESIZE_SIDE_MAX, fast_mode=True, @@ -1208,6 +1213,7 @@ def preprocess_image(image, classid, shape, output_height, output_width, data_format='NCHW', category='*', bbox_border=25., heatmap_sigma=1., heatmap_size=64, + return_keypoints=False, resize_side_min=_RESIZE_SIDE_MIN, resize_side_max=_RESIZE_SIDE_MAX): """Preprocesses the given image. 
@@ -1231,7 +1237,7 @@ def preprocess_image(image, classid, shape, output_height, output_width, """ if is_training: return preprocess_for_train(image, classid, shape, output_height, output_width, key_x, key_y, key_v, norm_table, data_format, - category, bbox_border, heatmap_sigma, heatmap_size, resize_side_min, resize_side_max) + category, bbox_border, heatmap_sigma, heatmap_size, return_keypoints, resize_side_min, resize_side_max) else: return preprocess_for_eval(image, classid, shape, output_height, output_width, key_x, key_y, key_v, norm_table, data_format, category, bbox_border, heatmap_sigma, heatmap_size, min(output_height, output_width)) diff --git a/tf_replicate_model_fn.py b/tf_replicate_model_fn.py index 129004a3..c9153c93 100644 --- a/tf_replicate_model_fn.py +++ b/tf_replicate_model_fn.py @@ -780,7 +780,7 @@ def _predict_spec(tower_specs, aggregation_device): def _concat_tensor_dicts(*tensor_dicts): return { name: array_ops.concat(tensors, axis=0, name=name) - for name, tensors in six.iteritems(_dict_concat(*tensor_dicts))tf_replicate_model_fn.py + for name, tensors in six.iteritems(_dict_concat(*tensor_dicts)) } diff --git a/train_cpn_onebyone.py b/train_cpn_onebyone.py index 1b59d4ad..424b9407 100644 --- a/train_cpn_onebyone.py +++ b/train_cpn_onebyone.py @@ -559,8 +559,8 @@ def main(_): detail_params = { 'blouse': { 'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'), - 'train_epochs': 40, - 'epochs_per_eval': 15, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'blouse', @@ -571,8 +571,8 @@ def main(_): }, 'dress': { 'model_dir' : os.path.join(FLAGS.model_dir, 'dress'), - 'train_epochs': 40, - 'epochs_per_eval': 15, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'dress', @@ -583,8 +583,8 @@ def main(_): }, 'outwear': { 'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'), - 'train_epochs': 40, - 'epochs_per_eval': 15, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'outwear', @@ -595,8 +595,8 @@ def main(_): }, 'skirt': { 'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'), - 'train_epochs': 40, - 'epochs_per_eval': 15, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'skirt', @@ -607,8 +607,8 @@ def main(_): }, 'trousers': { 'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'), - 'train_epochs': 40, - 'epochs_per_eval': 15, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'trousers', diff --git a/train_detnet_cpn_onebyone.py b/train_detnet_cpn_onebyone.py index ae9b315d..d355f8f0 100644 --- a/train_detnet_cpn_onebyone.py +++ b/train_detnet_cpn_onebyone.py @@ -559,8 +559,8 @@ def main(_): detail_params = { 'blouse': { 'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'blouse', @@ -571,8 +571,8 @@ def main(_): }, 'dress': { 'model_dir' : os.path.join(FLAGS.model_dir, 'dress'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'dress', @@ -583,8 +583,8 @@ def main(_): }, 'outwear': { 
'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'outwear', @@ -595,8 +595,8 @@ def main(_): }, 'skirt': { 'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'skirt', @@ -607,8 +607,8 @@ def main(_): }, 'trousers': { 'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'trousers', diff --git a/train_detxt_cpn_onebyone.py b/train_detxt_cpn_onebyone.py index 91859b50..04d8a7fa 100644 --- a/train_detxt_cpn_onebyone.py +++ b/train_detxt_cpn_onebyone.py @@ -559,8 +559,8 @@ def main(_): detail_params = { 'blouse': { 'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'blouse', @@ -571,8 +571,8 @@ def main(_): }, 'dress': { 'model_dir' : os.path.join(FLAGS.model_dir, 'dress'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'dress', @@ -583,8 +583,8 @@ def main(_): }, 'outwear': { 'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'outwear', @@ -595,8 +595,8 @@ def main(_): }, 'skirt': { 'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'skirt', @@ -607,8 +607,8 @@ def main(_): }, 'trousers': { 'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'), - 'train_epochs': 40, - 'epochs_per_eval': 16, + 'train_epochs': 28, + 'epochs_per_eval': 7, 'lr_decay_factors': '1, 0.5, 0.1', 'decay_boundaries': '10, 20', 'model_scope': 'trousers', @@ -620,13 +620,6 @@ def main(_): } model_to_train = [s.strip() for s in FLAGS.model_to_train.split(',')] - # import datetime - # import time - # while True: - # time.sleep(1600) - # if '8' in datetime.datetime.now().time().strftime('%H'): - # break - for m in model_to_train: sub_loop(keypoint_model_fn, m, detail_params[m]['model_dir'], run_config, detail_params[m]['train_epochs'], detail_params[m]['epochs_per_eval'], detail_params[m]['lr_decay_factors'], detail_params[m]['decay_boundaries'], detail_params[m]['checkpoint_path'], detail_params[m]['checkpoint_exclude_scopes'], detail_params[m]['checkpoint_model_scope'], detail_params[m]['ignore_missing_vars']) diff --git a/train_head_senet_cpn_onebyone.py b/train_head_senet_cpn_onebyone.py deleted file mode 100644 index 56fe8b0e..00000000 --- a/train_head_senet_cpn_onebyone.py +++ /dev/null @@ -1,640 +0,0 @@ -# Copyright 2018 Changan Wang - -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at - -# http://www.apache.org/licenses/LICENSE-2.0 - -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# ============================================================================= -from __future__ import absolute_import -from __future__ import division -from __future__ import print_function - -import os -import sys -import numpy as np -#from scipy.misc import imread, imsave, imshow, imresize -import tensorflow as tf - -from net import seresnet_cpn as cpn -from utility import train_helper -from utility import mertric - -from preprocessing import preprocessing -from preprocessing import dataset -import config - -# hardware related configuration -tf.app.flags.DEFINE_integer( - 'num_readers', 16,#16 - 'The number of parallel readers that read data from the dataset.') -tf.app.flags.DEFINE_integer( - 'num_preprocessing_threads', 48,#48 - 'The number of threads used to create the batches.') -tf.app.flags.DEFINE_integer( - 'num_cpu_threads', 0, - 'The number of cpu cores used to train.') -tf.app.flags.DEFINE_float( - 'gpu_memory_fraction', 1., 'GPU memory fraction to use.') -# scaffold related configuration -tf.app.flags.DEFINE_string( - 'data_dir', '../Datasets/tfrecords',#'/media/rs/0E06CD1706CD0127/Kapok/Chi/Datasets/tfrecords', - 'The directory where the dataset input data is stored.') -tf.app.flags.DEFINE_string( - 'dataset_name', '{}_????', 'The pattern of the dataset name to load.') -tf.app.flags.DEFINE_string( - 'model_dir', './logs_head_sext_cpn/', - 'The parent directory where the model will be stored.') -tf.app.flags.DEFINE_integer( - 'log_every_n_steps', 10, - 'The frequency with which logs are print.') -tf.app.flags.DEFINE_integer( - 'save_summary_steps', 100, - 'The frequency with which summaries are saved, in seconds.') -tf.app.flags.DEFINE_integer( - 'save_checkpoints_secs', 3600, - 'The frequency with which the model is saved, in seconds.') -# model related configuration -tf.app.flags.DEFINE_integer( - 'train_image_size', 384, - 'The size of the input image for the model to use.') -tf.app.flags.DEFINE_integer( - 'heatmap_size', 96, - 'The size of the output heatmap of the model.') -tf.app.flags.DEFINE_float( - 'heatmap_sigma', 1., - 'The sigma of Gaussian which generate the target heatmap.') -tf.app.flags.DEFINE_float( - 'bbox_border', 25., - 'The nearest distance of the crop border to al keypoints.') -tf.app.flags.DEFINE_integer( - 'train_epochs', 50, - 'The number of epochs to use for training.') -tf.app.flags.DEFINE_integer( - 'epochs_per_eval', 20, - 'The number of training epochs to run between evaluations.') -tf.app.flags.DEFINE_integer( - 'batch_size', 10, - 'Batch size for training and evaluation.') -tf.app.flags.DEFINE_integer( - 'xt_batch_size', 10, - 'Batch size for training and evaluation.') -tf.app.flags.DEFINE_boolean( - 'use_ohkm', True, - 'Wether we will use the ohkm for hard keypoints.') -tf.app.flags.DEFINE_string( - 'data_format', 'channels_first', # 'channels_first' or 'channels_last' - 'A flag to override the data format used in the model. channels_first ' - 'provides a performance boost on GPU but is not always compatible ' - 'with CPU. 
If left unspecified, the data format will be chosen ' - 'automatically based on whether TensorFlow was built for CPU or GPU.') -# optimizer related configuration -tf.app.flags.DEFINE_integer( - 'tf_random_seed', 20180417, 'Random seed for TensorFlow initializers.') -tf.app.flags.DEFINE_float( - 'weight_decay', 1e-5, 'The weight decay on the model weights.') -tf.app.flags.DEFINE_float( - 'mse_weight', 1., 'The weight decay on the model weights.') -tf.app.flags.DEFINE_float( - 'momentum', 0.9, - 'The momentum for the MomentumOptimizer and RMSPropOptimizer.') -tf.app.flags.DEFINE_float('learning_rate', 1e-4, 'Initial learning rate.')#1e-3 -tf.app.flags.DEFINE_float( - 'end_learning_rate', 0.000001, - 'The minimal end learning rate used by a polynomial decay learning rate.') -tf.app.flags.DEFINE_float( - 'warmup_learning_rate', 0.00001, - 'The start warm-up learning rate to avoid NAN.') -tf.app.flags.DEFINE_integer( - 'warmup_steps', 100, - 'The total steps to warm-up.') -# for learning rate piecewise_constant decay -tf.app.flags.DEFINE_string( - 'decay_boundaries', '2, 3', - 'Learning rate decay boundaries by global_step (comma-separated list).') -tf.app.flags.DEFINE_string( - 'lr_decay_factors', '1, 0.5, 0.1', - 'The values of learning_rate decay factor for each segment between boundaries (comma-separated list).') -# checkpoint related configuration -tf.app.flags.DEFINE_string( - 'checkpoint_path', './model', - 'The path to a checkpoint from which to fine-tune.') -tf.app.flags.DEFINE_string( - 'checkpoint_model_scope', '', - 'Model scope in the checkpoint. None if the same as the trained model.') -tf.app.flags.DEFINE_string( - #'blouse', 'dress', 'outwear', 'skirt', 'trousers', 'all' - 'model_scope', None, - 'Model scope name used to replace the name_scope in checkpoint.') -tf.app.flags.DEFINE_string( - 'checkpoint_exclude_scopes', None, - 'Comma-separated list of scopes of variables to exclude when restoring from a checkpoint.') -tf.app.flags.DEFINE_boolean( - 'ignore_missing_vars', True, - 'When restoring a checkpoint would ignore missing variables.') -tf.app.flags.DEFINE_boolean( - 'run_on_cloud', True, - 'Wether we will train on cloud.') -tf.app.flags.DEFINE_boolean( - 'seq_train', False, - 'Wether we will train a sequence model.') -tf.app.flags.DEFINE_string( - 'model_to_train', 'blouse, dress, outwear, skirt, trousers', #'all, blouse, dress, outwear, skirt, trousers', 'skirt, dress, outwear, trousers', - 'The sub-model to train (comma-separated list).') - -FLAGS = tf.app.flags.FLAGS -#--model_scope=blouse --checkpoint_path=./logs/all --data_format=channels_last --batch_size=1 -def input_pipeline(is_training=True, model_scope=FLAGS.model_scope, num_epochs=FLAGS.epochs_per_eval): - if 'all' in model_scope: - lnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.global_norm_key, dtype=tf.int64), - tf.constant(config.global_norm_lvalues, dtype=tf.int64)), 0) - rnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.global_norm_key, dtype=tf.int64), - tf.constant(config.global_norm_rvalues, dtype=tf.int64)), 1) - else: - lnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.local_norm_key, dtype=tf.int64), - tf.constant(config.local_norm_lvalues, dtype=tf.int64)), 0) - rnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.local_norm_key, dtype=tf.int64), - 
tf.constant(config.local_norm_rvalues, dtype=tf.int64)), 1) - - preprocessing_fn = lambda org_image, classid, shape, key_x, key_y, key_v: preprocessing.preprocess_image(org_image, classid, shape, FLAGS.train_image_size, FLAGS.train_image_size, key_x, key_y, key_v, (lnorm_table, rnorm_table), is_training=is_training, data_format=('NCHW' if FLAGS.data_format=='channels_first' else 'NHWC'), category=(model_scope if 'all' not in model_scope else '*'), bbox_border=FLAGS.bbox_border, heatmap_sigma=FLAGS.heatmap_sigma, heatmap_size=FLAGS.heatmap_size) - - images, shape, classid, targets, key_v, isvalid, norm_value = dataset.slim_get_split(FLAGS.data_dir, preprocessing_fn, FLAGS.batch_size, FLAGS.num_readers, FLAGS.num_preprocessing_threads, num_epochs=num_epochs, is_training=is_training, file_pattern=FLAGS.dataset_name, category=(model_scope if 'all' not in model_scope else '*'), reader=None) - - return images, {'targets': targets, 'key_v': key_v, 'shape': shape, 'classid': classid, 'isvalid': isvalid, 'norm_value': norm_value} - -if config.PRED_DEBUG: - from scipy.misc import imread, imsave, imshow, imresize - def save_image_with_heatmap(image, height, width, heatmap_size, targets, pred_heatmap, indR, indG, indB): - if not hasattr(save_image_with_heatmap, "counter"): - save_image_with_heatmap.counter = 0 # it doesn't exist yet, so initialize it - save_image_with_heatmap.counter += 1 - - img_to_save = np.array(image.tolist()) + 128 - #print(img_to_save.shape) - - img_to_save = img_to_save.astype(np.uint8) - - heatmap0 = np.sum(targets[indR, ...], axis=0).astype(np.uint8) - heatmap1 = np.sum(targets[indG, ...], axis=0).astype(np.uint8) - heatmap2 = np.sum(targets[indB, ...], axis=0).astype(np.uint8) if len(indB) > 0 else np.zeros((heatmap_size, heatmap_size), dtype=np.float32) - - img_to_save = imresize(img_to_save, (height, width), interp='lanczos') - heatmap0 = imresize(heatmap0, (height, width), interp='lanczos') - heatmap1 = imresize(heatmap1, (height, width), interp='lanczos') - heatmap2 = imresize(heatmap2, (height, width), interp='lanczos') - - img_to_save = img_to_save/2 - img_to_save[:,:,0] = np.clip((img_to_save[:,:,0] + heatmap0 + heatmap2), 0, 255) - img_to_save[:,:,1] = np.clip((img_to_save[:,:,1] + heatmap1 + heatmap2), 0, 255) - #img_to_save[:,:,2] = np.clip((img_to_save[:,:,2]/4. 
+ heatmap2), 0, 255) - file_name = 'targets_{}.jpg'.format(save_image_with_heatmap.counter) - imsave(os.path.join(config.DEBUG_DIR, file_name), img_to_save.astype(np.uint8)) - - pred_heatmap = np.array(pred_heatmap.tolist()) - #print(pred_heatmap.shape) - for ind in range(pred_heatmap.shape[0]): - img = pred_heatmap[ind] - img = img - img.min() - img *= 255.0/img.max() - file_name = 'heatmap_{}_{}.jpg'.format(save_image_with_heatmap.counter, ind) - imsave(os.path.join(config.DEBUG_DIR, file_name), img.astype(np.uint8)) - return save_image_with_heatmap.counter - -def get_keypoint(image, targets, predictions, heatmap_size, height, width, category, clip_at_zero=True, data_format='channels_last', name=None): - predictions = tf.reshape(predictions, [1, -1, heatmap_size*heatmap_size]) - - pred_max = tf.reduce_max(predictions, axis=-1) - pred_indices = tf.argmax(predictions, axis=-1) - pred_x, pred_y = tf.cast(tf.floormod(pred_indices, heatmap_size), tf.float32), tf.cast(tf.floordiv(pred_indices, heatmap_size), tf.float32) - - width, height = tf.cast(width, tf.float32), tf.cast(height, tf.float32) - pred_x, pred_y = pred_x * width / tf.cast(heatmap_size, tf.float32), pred_y * height / tf.cast(heatmap_size, tf.float32) - - if clip_at_zero: - pred_x, pred_y = pred_x * tf.cast(pred_max>0, tf.float32), pred_y * tf.cast(pred_max>0, tf.float32) - pred_x = pred_x * tf.cast(pred_max>0, tf.float32) + tf.cast(pred_max<=0, tf.float32) * (width / 2.) - pred_y = pred_y * tf.cast(pred_max>0, tf.float32) + tf.cast(pred_max<=0, tf.float32) * (height / 2.) - - if config.PRED_DEBUG: - pred_indices_ = tf.squeeze(pred_indices) - image_ = tf.squeeze(image) * 255. - pred_heatmap = tf.one_hot(pred_indices_, heatmap_size*heatmap_size, on_value=1., off_value=0., axis=-1, dtype=tf.float32) - - pred_heatmap = tf.reshape(pred_heatmap, [-1, heatmap_size, heatmap_size]) - if data_format == 'channels_first': - image_ = tf.transpose(image_, perm=(1, 2, 0)) - save_image_op = tf.py_func(save_image_with_heatmap, - [image_, height, width, - heatmap_size, - tf.reshape(pred_heatmap * 255., [-1, heatmap_size, heatmap_size]), - tf.reshape(predictions, [-1, heatmap_size, heatmap_size]), - config.left_right_group_map[category][0], - config.left_right_group_map[category][1], - config.left_right_group_map[category][2]], - tf.int64, stateful=True) - with tf.control_dependencies([save_image_op]): - pred_x, pred_y = pred_x * 1., pred_y * 1. - return pred_x, pred_y - -def gaussian_blur(inputs, inputs_filters, sigma, data_format, name=None): - with tf.name_scope(name, "gaussian_blur", [inputs]): - data_format_ = 'NHWC' if data_format=='channels_last' else 'NCHW' - if data_format_ == 'NHWC': - inputs = tf.transpose(inputs, [0, 2, 3, 1]) - ksize = int(6 * sigma + 1.) - x = tf.expand_dims(tf.range(ksize, delta=1, dtype=tf.float32), axis=1) - y = tf.transpose(x, [1, 0]) - kernel_matrix = tf.exp(- ((x - ksize/2.) ** 2 + (y - ksize/2.) 
** 2) / (2 * sigma ** 2)) - #print(kernel_matrix) - kernel_filter = tf.reshape(kernel_matrix, [ksize, ksize, 1, 1]) - kernel_filter = tf.tile(kernel_filter, [1, 1, inputs_filters, 1]) - #kernel_filter = tf.transpose(kernel_filter, [1, 0, 2, 3]) - outputs = tf.nn.depthwise_conv2d(inputs, kernel_filter, strides=[1, 1, 1, 1], padding='SAME', data_format=data_format_, name='blur') - if data_format_ == 'NHWC': - outputs = tf.transpose(outputs, [0, 3, 1, 2]) - return outputs - -def keypoint_model_fn(features, labels, mode, params): - targets = labels['targets'] - shape = labels['shape'] - classid = labels['classid'] - key_v = labels['key_v'] - isvalid = labels['isvalid'] - norm_value = labels['norm_value'] - - cur_batch_size = tf.shape(features)[0] - #features= tf.ones_like(features) - - with tf.variable_scope(params['model_scope'], default_name=None, values=[features], reuse=tf.AUTO_REUSE): - pred_outputs = cpn.head_xt_cascaded_pyramid_net(features, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['heatmap_size'], (mode == tf.estimator.ModeKeys.TRAIN), params['data_format']) - - if params['data_format'] == 'channels_last': - pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] - - score_map = pred_outputs[-1] - - pred_x, pred_y = get_keypoint(features, targets, score_map, params['heatmap_size'], params['train_image_size'], params['train_image_size'], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format']) - - # this is important!!! - targets = 255. * targets - blur_list = [1., 1.37, 1.73, 2.4, None]#[1., 1.5, 2., 3., None] - #blur_list = [None, None, None, None, None] - - targets_list = [] - for sigma in blur_list: - if sigma is None: - targets_list.append(targets) - else: - # always channels first foe targets - targets_list.append(gaussian_blur(targets, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], sigma, params['data_format'], 'blur_{}'.format(sigma))) - - # print(key_v) - #targets = tf.reshape(255.*tf.one_hot(tf.ones_like(key_v,tf.int64)*(params['heatmap_size']*params['heatmap_size']//2+params['heatmap_size']), params['heatmap_size']*params['heatmap_size']), [cur_batch_size,-1,params['heatmap_size'],params['heatmap_size']]) - #norm_value = tf.ones_like(norm_value) - # score_map = tf.reshape(tf.one_hot(tf.ones_like(key_v,tf.int64)*(31*64+31), params['heatmap_size']*params['heatmap_size']), [cur_batch_size,-1,params['heatmap_size'],params['heatmap_size']]) - - #with tf.control_dependencies([pred_x, pred_y]): - ne_mertric = mertric.normalized_error(targets, score_map, norm_value, key_v, isvalid, - cur_batch_size, - config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], - params['heatmap_size'], - params['train_image_size']) - - # last_pred_mse = tf.metrics.mean_squared_error(score_map, targets, - # weights=1.0 / tf.cast(cur_batch_size, tf.float32), - # name='last_pred_mse') - # filter all invisible keypoint maybe better for this task - # all_visible = tf.logical_and(key_v>0, isvalid>0) - # targets_list = [tf.boolean_mask(targets_list[ind], all_visible) for ind in list(range(len(targets_list)))] - # pred_outputs = [tf.boolean_mask(pred_outputs[ind], all_visible, name='boolean_mask_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] - all_visible = 
-def keypoint_model_fn(features, labels, mode, params):
-    targets = labels['targets']
-    shape = labels['shape']
-    classid = labels['classid']
-    key_v = labels['key_v']
-    isvalid = labels['isvalid']
-    norm_value = labels['norm_value']
-
-    cur_batch_size = tf.shape(features)[0]
-    #features= tf.ones_like(features)
-
-    with tf.variable_scope(params['model_scope'], default_name=None, values=[features], reuse=tf.AUTO_REUSE):
-        pred_outputs = cpn.head_xt_cascaded_pyramid_net(features, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['heatmap_size'], (mode == tf.estimator.ModeKeys.TRAIN), params['data_format'])
-
-    if params['data_format'] == 'channels_last':
-        pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))]
-
-    score_map = pred_outputs[-1]
-
-    pred_x, pred_y = get_keypoint(features, targets, score_map, params['heatmap_size'], params['train_image_size'], params['train_image_size'], (params['model_scope'] if 'all' not in params['model_scope'] else '*'), clip_at_zero=True, data_format=params['data_format'])
-
-    # this is important!!!
-    targets = 255. * targets
-    blur_list = [1., 1.37, 1.73, 2.4, None]#[1., 1.5, 2., 3., None]
-    #blur_list = [None, None, None, None, None]
-
-    targets_list = []
-    for sigma in blur_list:
-        if sigma is None:
-            targets_list.append(targets)
-        else:
-            # always channels first foe targets
-            targets_list.append(gaussian_blur(targets, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], sigma, params['data_format'], 'blur_{}'.format(sigma)))
-
-    # print(key_v)
-    #targets = tf.reshape(255.*tf.one_hot(tf.ones_like(key_v,tf.int64)*(params['heatmap_size']*params['heatmap_size']//2+params['heatmap_size']), params['heatmap_size']*params['heatmap_size']), [cur_batch_size,-1,params['heatmap_size'],params['heatmap_size']])
-    #norm_value = tf.ones_like(norm_value)
-    # score_map = tf.reshape(tf.one_hot(tf.ones_like(key_v,tf.int64)*(31*64+31), params['heatmap_size']*params['heatmap_size']), [cur_batch_size,-1,params['heatmap_size'],params['heatmap_size']])
-
-    #with tf.control_dependencies([pred_x, pred_y]):
-    ne_mertric = mertric.normalized_error(targets, score_map, norm_value, key_v, isvalid,
-                                          cur_batch_size,
-                                          config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')],
-                                          params['heatmap_size'],
-                                          params['train_image_size'])
-
-    # last_pred_mse = tf.metrics.mean_squared_error(score_map, targets,
-    #                                               weights=1.0 / tf.cast(cur_batch_size, tf.float32),
-    #                                               name='last_pred_mse')
-    # filter all invisible keypoint maybe better for this task
-    # all_visible = tf.logical_and(key_v>0, isvalid>0)
-    # targets_list = [tf.boolean_mask(targets_list[ind], all_visible) for ind in list(range(len(targets_list)))]
-    # pred_outputs = [tf.boolean_mask(pred_outputs[ind], all_visible, name='boolean_mask_{}'.format(ind)) for ind in list(range(len(pred_outputs)))]
-    all_visible = tf.expand_dims(tf.expand_dims(tf.cast(tf.logical_and(key_v>0, isvalid>0), tf.float32), axis=-1), axis=-1)
-    targets_list = [targets_list[ind] * all_visible for ind in list(range(len(targets_list)))]
-    pred_outputs = [pred_outputs[ind] * all_visible for ind in list(range(len(pred_outputs)))]
-
-    sq_diff = tf.reduce_sum(tf.squared_difference(targets, pred_outputs[-1]), axis=-1)
-    last_pred_mse = tf.metrics.mean_absolute_error(sq_diff, tf.zeros_like(sq_diff), name='last_pred_mse')
-
-    metrics = {'normalized_error': ne_mertric, 'last_pred_mse':last_pred_mse}
-    predictions = {'normalized_error': ne_mertric[1]}
-    ne_mertric = tf.identity(ne_mertric[1], name='ne_mertric')
-
-    base_learning_rate = params['learning_rate']
-    mse_loss_list = []
-    if params['use_ohkm']:
-        base_learning_rate = 1. * base_learning_rate
-        for pred_ind in list(range(len(pred_outputs) - 1)):
-            mse_loss_list.append(0.5 * tf.losses.mean_squared_error(targets_list[pred_ind], pred_outputs[pred_ind],
-                                                                    weights=1.0 / tf.cast(cur_batch_size, tf.float32),
-                                                                    scope='loss_{}'.format(pred_ind),
-                                                                    loss_collection=None,#tf.GraphKeys.LOSSES,
-                                                                    # mean all elements of all pixels in all batch
-                                                                    reduction=tf.losses.Reduction.MEAN))# SUM, SUM_OVER_BATCH_SIZE, default mean by all elements
-
-        temp_loss = tf.reduce_mean(tf.reshape(tf.losses.mean_squared_error(targets_list[-1], pred_outputs[-1], weights=1.0, loss_collection=None, reduction=tf.losses.Reduction.NONE), [cur_batch_size, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], -1]), axis=-1)
-
-        num_topk = config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')] // 2
-        gather_col = tf.nn.top_k(temp_loss, k=num_topk, sorted=True)[1]
-        gather_row = tf.reshape(tf.tile(tf.reshape(tf.range(cur_batch_size), [-1, 1]), [1, num_topk]), [-1, 1])
-        gather_indcies = tf.stop_gradient(tf.stack([gather_row, tf.reshape(gather_col, [-1, 1])], axis=-1))
-
-        select_targets = tf.gather_nd(targets_list[-1], gather_indcies)
-        select_heatmap = tf.gather_nd(pred_outputs[-1], gather_indcies)
-
-        mse_loss_list.append(tf.losses.mean_squared_error(select_targets, select_heatmap,
-                                                          weights=1.0 / tf.cast(cur_batch_size, tf.float32),
-                                                          scope='loss_{}'.format(len(pred_outputs) - 1),
-                                                          loss_collection=None,#tf.GraphKeys.LOSSES,
-                                                          # mean all elements of all pixels in all batch
-                                                          reduction=tf.losses.Reduction.MEAN))
-    else:
-        for pred_ind in list(range(len(pred_outputs))):
-            mse_loss_list.append(tf.losses.mean_squared_error(targets_list[pred_ind], pred_outputs[pred_ind],
-                                                              weights=1.0 / tf.cast(cur_batch_size, tf.float32),
-                                                              scope='loss_{}'.format(pred_ind),
-                                                              loss_collection=None,#tf.GraphKeys.LOSSES,
-                                                              # mean all elements of all pixels in all batch
-                                                              reduction=tf.losses.Reduction.MEAN))# SUM, SUM_OVER_BATCH_SIZE, default mean by all elements
-
-    mse_loss = tf.multiply(params['mse_weight'], tf.add_n(mse_loss_list), name='mse_loss')
-    tf.summary.scalar('mse', mse_loss)
-    tf.losses.add_loss(mse_loss)
-
-    # bce_loss_list = []
-    # for pred_ind in list(range(len(pred_outputs))):
-    #     bce_loss_list.append(tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=pred_outputs[pred_ind], labels=targets_list[pred_ind]/255., name='loss_{}'.format(pred_ind)), name='loss_mean_{}'.format(pred_ind)))
-
-    # mse_loss = tf.multiply(params['mse_weight'] / params['num_stacks'], tf.add_n(bce_loss_list), name='mse_loss')
-    # tf.summary.scalar('mse', mse_loss)
-    # tf.losses.add_loss(mse_loss)
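The `use_ohkm` branch above is online hard keypoint mining: the lower pyramid stages are trained on every joint (at half weight), while the final stage averages the loss over only the `num_joints // 2` joints with the largest per-joint MSE in each sample. A minimal NumPy sketch of that selection rule, ignoring the `stop_gradient`/`gather_nd` bookkeeping of the TF version:

```python
import numpy as np

def ohkm_loss(pred, target, k):
    # pred, target: (batch, num_joints, heatmap_pixels) final-stage heatmaps.
    per_joint = np.mean((pred - target) ** 2, axis=-1)   # (batch, num_joints) MSE
    hardest = np.sort(per_joint, axis=-1)[:, -k:]        # k largest losses per sample
    return hardest.mean()

# e.g. k = num_joints // 2, matching num_topk above.
```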
-    # Add weight decay to the loss. We exclude the batch norm variables because
-    # doing so leads to a small improvement in accuracy.
-    loss = mse_loss + params['weight_decay'] * tf.add_n([tf.nn.l2_loss(v) for v in tf.trainable_variables() if 'batch_normalization' not in v.name])
-    total_loss = tf.identity(loss, name='total_loss')
-    tf.summary.scalar('loss', total_loss)
-
-    if mode == tf.estimator.ModeKeys.EVAL:
-        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, predictions=predictions, eval_metric_ops=metrics)
-
-    if mode == tf.estimator.ModeKeys.TRAIN:
-        global_step = tf.train.get_or_create_global_step()
-
-        lr_values = [params['warmup_learning_rate']] + [base_learning_rate * decay for decay in params['lr_decay_factors']]
-        learning_rate = tf.train.piecewise_constant(tf.cast(global_step, tf.int32),
-                                                    [params['warmup_steps']] + [int(float(ep)*params['steps_per_epoch']) for ep in params['decay_boundaries']],
-                                                    lr_values)
-        truncated_learning_rate = tf.maximum(learning_rate, tf.constant(params['end_learning_rate'], dtype=learning_rate.dtype), name='learning_rate')
-        tf.summary.scalar('lr', truncated_learning_rate)
-
-        optimizer = tf.train.MomentumOptimizer(learning_rate=truncated_learning_rate,
-                                               momentum=params['momentum'])
-
-        # Batch norm requires update_ops to be added as a train_op dependency.
-        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
-        with tf.control_dependencies(update_ops):
-            train_op = optimizer.minimize(loss, global_step)
-    else:
-        train_op = None
-
-    return tf.estimator.EstimatorSpec(
-        mode=mode,
-        predictions=predictions,
-        loss=loss,
-        train_op=train_op,
-        eval_metric_ops=metrics,
-        scaffold=tf.train.Scaffold(init_fn=train_helper.get_init_fn_for_scaffold_(params['checkpoint_path'], params['model_dir'], params['checkpoint_exclude_scopes'], params['model_scope'], params['checkpoint_model_scope'], params['ignore_missing_vars'])))
-
-def parse_comma_list(args):
-    return [float(s.strip()) for s in args.split(',')]
-
-def sub_loop(model_fn, model_scope, model_dir, run_config, train_epochs, epochs_per_eval, lr_decay_factors, decay_boundaries, checkpoint_path=None, checkpoint_exclude_scopes='', checkpoint_model_scope='', ignore_missing_vars=True):
-    steps_per_epoch = config.split_size[(model_scope if 'all' not in model_scope else '*')]['train'] // FLAGS.batch_size
-    fashionAI = tf.estimator.Estimator(
-        model_fn=model_fn, model_dir=model_dir, config=run_config,
-        params={
-            'checkpoint_path': checkpoint_path,
-            'model_dir': model_dir,
-            'checkpoint_exclude_scopes': checkpoint_exclude_scopes,
-            'model_scope': model_scope,
-            'checkpoint_model_scope': checkpoint_model_scope,
-            'ignore_missing_vars': ignore_missing_vars,
-            'train_image_size': FLAGS.train_image_size,
-            'heatmap_size': FLAGS.heatmap_size,
-            'data_format': FLAGS.data_format,
-            'steps_per_epoch': steps_per_epoch,
-            'use_ohkm': FLAGS.use_ohkm,
-            'batch_size': FLAGS.batch_size,
-            'weight_decay': FLAGS.weight_decay,
-            'mse_weight': FLAGS.mse_weight,
-            'momentum': FLAGS.momentum,
-            'learning_rate': FLAGS.learning_rate,
-            'end_learning_rate': FLAGS.end_learning_rate,
-            'warmup_learning_rate': FLAGS.warmup_learning_rate,
-            'warmup_steps': FLAGS.warmup_steps,
-            'decay_boundaries': parse_comma_list(decay_boundaries),
-            'lr_decay_factors': parse_comma_list(lr_decay_factors),
-        })
-
-    tf.gfile.MakeDirs(model_dir)
-    tf.logging.info('Starting to train model {}.'.format(model_scope))
-    for _ in range(train_epochs // epochs_per_eval):
-        tensors_to_log = {
-            'lr': 'learning_rate',
-            'loss': 'total_loss',
-            'mse': 'mse_loss',
-            'ne': 'ne_mertric',
-        }
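The TRAIN branch above builds a piecewise-constant schedule with a warm-up phase: `tf.train.piecewise_constant` takes n boundaries and n+1 values, the epoch boundaries are converted to global steps via `steps_per_epoch`, and the result is clipped from below at `end_learning_rate`. A plain-Python sketch of the same lookup, with names of my own choosing:

```python
def lr_at_step(step, boundaries, values, end_lr):
    # values has exactly one more entry than boundaries, as in
    # tf.train.piecewise_constant: values[i] applies while step < boundaries[i].
    for boundary, value in zip(boundaries, values):
        if step < boundary:
            return max(value, end_lr)
    return max(values[-1], end_lr)

# With lr_decay_factors = '1, 0.5, 0.1' and decay_boundaries = '10, 20':
#   boundaries = [warmup_steps, 10 * steps_per_epoch, 20 * steps_per_epoch]
#   values     = [warmup_lr, base_lr, 0.5 * base_lr, 0.1 * base_lr]
```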
-
-        logging_hook = tf.train.LoggingTensorHook(tensors=tensors_to_log, every_n_iter=FLAGS.log_every_n_steps, formatter=lambda dicts: '{}:'.format(model_scope) + (', '.join(['%s=%.6f' % (k, v) for k, v in dicts.items()])))
-
-        tf.logging.info('Starting a training cycle.')
-        fashionAI.train(input_fn=lambda : input_pipeline(True, model_scope, epochs_per_eval), hooks=[logging_hook], max_steps=(steps_per_epoch*train_epochs))
-
-        tf.logging.info('Starting to evaluate.')
-        eval_results = fashionAI.evaluate(input_fn=lambda : input_pipeline(False, model_scope, 1))
-        tf.logging.info(eval_results)
-    tf.logging.info('Finished model {}.'.format(model_scope))
-
-def main(_):
-    # Using the Winograd non-fused algorithms provides a small performance boost.
-    os.environ['TF_ENABLE_WINOGRAD_NONFUSED'] = '1'
-
-    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction = FLAGS.gpu_memory_fraction)
-    sess_config = tf.ConfigProto(allow_soft_placement = True, log_device_placement = False, intra_op_parallelism_threads = FLAGS.num_cpu_threads, inter_op_parallelism_threads = FLAGS.num_cpu_threads, gpu_options = gpu_options)
-
-    # Set up a RunConfig to only save checkpoints once per training cycle.
-    run_config = tf.estimator.RunConfig().replace(
-        save_checkpoints_secs=FLAGS.save_checkpoints_secs).replace(
-        save_checkpoints_steps=None).replace(
-        save_summary_steps=FLAGS.save_summary_steps).replace(
-        keep_checkpoint_max=5).replace(
-        tf_random_seed=FLAGS.tf_random_seed).replace(
-        log_step_count_steps=FLAGS.log_every_n_steps).replace(
-        session_config=sess_config)
-
-    if FLAGS.seq_train:
-        detail_params = {
-            'all': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'all'),
-                'train_epochs': 6,
-                'epochs_per_eval': 4,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '3, 4',
-                'model_scope': 'all',
-                'checkpoint_path': None,
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': '',
-                'ignore_missing_vars': True,
-            },
-            'blouse': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'),
-                'train_epochs': 50,
-                'epochs_per_eval': 30,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '15, 30',
-                'model_scope': 'blouse',
-                'checkpoint_path': os.path.join(FLAGS.model_dir, 'all'),
-                'checkpoint_model_scope': 'all',
-                'checkpoint_exclude_scopes': 'blouse/feature_pyramid/conv_heatmap, blouse/global_net/conv_heatmap',
-                'ignore_missing_vars': True,
-            },
-            'dress': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'dress'),
-                'train_epochs': 50,
-                'epochs_per_eval': 30,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '15, 30',
-                'model_scope': 'dress',
-                'checkpoint_path': os.path.join(FLAGS.model_dir, 'all'),
-                'checkpoint_model_scope': 'all',
-                'checkpoint_exclude_scopes': 'dress/feature_pyramid/conv_heatmap, dress/global_net/conv_heatmap',
-                'ignore_missing_vars': True,
-            },
-            'outwear': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'),
-                'train_epochs': 50,
-                'epochs_per_eval': 30,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '15, 30',
-                'model_scope': 'outwear',
-                'checkpoint_path': os.path.join(FLAGS.model_dir, 'all'),
-                'checkpoint_model_scope': 'all',
-                'checkpoint_exclude_scopes': 'outwear/feature_pyramid/conv_heatmap, outwear/global_net/conv_heatmap',
-                'ignore_missing_vars': True,
-            },
-            'skirt': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'),
-                'train_epochs': 50,
-                'epochs_per_eval': 30,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '15, 30',
-                'model_scope': 'skirt',
-                'checkpoint_path': os.path.join(FLAGS.model_dir, 'all'),
-                'checkpoint_model_scope': 'all',
-                'checkpoint_exclude_scopes': 'skirt/feature_pyramid/conv_heatmap, skirt/global_net/conv_heatmap',
-                'ignore_missing_vars': True,
-            },
-            'trousers': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'),
-                'train_epochs': 50,
-                'epochs_per_eval': 30,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '15, 30',
-                'model_scope': 'trousers',
-                'checkpoint_path': os.path.join(FLAGS.model_dir, 'all'),
-                'checkpoint_model_scope': 'all',
-                'checkpoint_exclude_scopes': 'trousers/feature_pyramid/conv_heatmap, trousers/global_net/conv_heatmap',
-                'ignore_missing_vars': True,
-            },
-        }
-    else:
-        detail_params = {
-            'blouse': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'),
-                'train_epochs': 40,
-                'epochs_per_eval': 15,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '10, 20',
-                'model_scope': 'blouse',
-                'checkpoint_path': os.path.join(FLAGS.data_dir, 'seresnext50') if FLAGS.run_on_cloud else os.path.join(FLAGS.checkpoint_path, 'seresnext50'),
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': 'blouse/feature_pyramid, blouse/global_net',
-                'ignore_missing_vars': True,
-            },
-            'dress': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'dress'),
-                'train_epochs': 40,
-                'epochs_per_eval': 15,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '10, 20',
-                'model_scope': 'dress',
-                'checkpoint_path': os.path.join(FLAGS.data_dir, 'seresnext50') if FLAGS.run_on_cloud else os.path.join(FLAGS.checkpoint_path, 'seresnext50'),
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': 'dress/feature_pyramid, dress/global_net',
-                'ignore_missing_vars': True,
-            },
-            'outwear': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'),
-                'train_epochs': 40,
-                'epochs_per_eval': 15,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '10, 20',
-                'model_scope': 'outwear',
-                'checkpoint_path': os.path.join(FLAGS.data_dir, 'seresnext50') if FLAGS.run_on_cloud else os.path.join(FLAGS.checkpoint_path, 'seresnext50'),
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': 'outwear/feature_pyramid, outwear/global_net',
-                'ignore_missing_vars': True,
-            },
-            'skirt': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'),
-                'train_epochs': 40,
-                'epochs_per_eval': 15,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '10, 20',
-                'model_scope': 'skirt',
-                'checkpoint_path': os.path.join(FLAGS.data_dir, 'seresnext50') if FLAGS.run_on_cloud else os.path.join(FLAGS.checkpoint_path, 'seresnext50'),
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': 'skirt/feature_pyramid, skirt/global_net',
-                'ignore_missing_vars': True,
-            },
-            'trousers': {
-                'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'),
-                'train_epochs': 40,
-                'epochs_per_eval': 15,
-                'lr_decay_factors': '1, 0.5, 0.1',
-                'decay_boundaries': '10, 20',
-                'model_scope': 'trousers',
-                'checkpoint_path': os.path.join(FLAGS.data_dir, 'seresnext50') if FLAGS.run_on_cloud else os.path.join(FLAGS.checkpoint_path, 'seresnext50'),
-                'checkpoint_model_scope': '',
-                'checkpoint_exclude_scopes': 'trousers/feature_pyramid, trousers/global_net',
-                'ignore_missing_vars': True,
-            },
-        }
-    model_to_train = [s.strip() for s in FLAGS.model_to_train.split(',')]
-
-    # import datetime
-    # import time
-    # while True:
-    #     time.sleep(1600)
-    #     if '8' in datetime.datetime.now().time().strftime('%H'):
-    #         break
-    for m in model_to_train:
-        sub_loop(keypoint_model_fn, m, detail_params[m]['model_dir'], run_config, detail_params[m]['train_epochs'], detail_params[m]['epochs_per_eval'], detail_params[m]['lr_decay_factors'], detail_params[m]['decay_boundaries'], detail_params[m]['checkpoint_path'], detail_params[m]['checkpoint_exclude_scopes'], detail_params[m]['checkpoint_model_scope'], detail_params[m]['ignore_missing_vars'])
-
-if __name__ == '__main__':
-    tf.logging.set_verbosity(tf.logging.INFO)
-    tf.app.run()
-
-# 0.045620328343892416
-# blouse: 0.04301484169892338
-# dress: 0.04210286934923448
-# outwear: 0.04589965355198962
-# skirt: 0.056256085847705986
-# trousers: 0.05055153938503116
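Throughout these scripts the per-category models are warm-started from either a backbone checkpoint or the 'all' model, with `checkpoint_exclude_scopes` keeping the heatmap heads (whose joint counts differ per category) out of the restore. I have not reproduced `train_helper.get_init_fn_for_scaffold_` here, but the scope filtering it presumably performs looks roughly like this hypothetical helper, for illustration only:

```python
def vars_to_restore(model_variables, checkpoint_exclude_scopes):
    # Drop any variable whose name falls under an excluded scope so that
    # the per-category heads are re-initialised rather than restored.
    exclusions = [s.strip() for s in checkpoint_exclude_scopes.split(',') if s.strip()]
    return [v for v in model_variables
            if not any(v.name.startswith(scope) for scope in exclusions)]

# e.g. excluding 'blouse/feature_pyramid/conv_heatmap' keeps the backbone and
# pyramid weights restorable while the blouse head starts from scratch.
```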
""" + if not FLAGS.multi_gpu: + return 0 + from tensorflow.python.client import device_lib local_device_protos = device_lib.list_local_devices() num_gpus = sum([1 for d in local_device_protos if d.device_type == 'GPU']) if not num_gpus: raise ValueError('Multi-GPU mode was specified, but no GPUs ' - 'were found. To use CPU, run without --multi_gpu.') + 'were found. To use CPU, run without --multi_gpu=False.') remainder = batch_size % num_gpus if remainder: @@ -180,7 +183,7 @@ def validate_batch_size_for_multi_gpu(batch_size): raise ValueError(err) return num_gpus -def input_pipeline(is_training=True, model_scope=FLAGS.model_scope, num_epochs=FLAGS.epochs_per_eval): +def input_pipeline(is_training=True, model_scope=FLAGS.model_scope, num_epochs=None): if 'all' in model_scope: lnorm_table = tf.contrib.lookup.HashTable(tf.contrib.lookup.KeyValueTensorInitializer(tf.constant(config.global_norm_key, dtype=tf.int64), tf.constant(config.global_norm_lvalues, dtype=tf.int64)), 0) @@ -306,8 +309,6 @@ def keypoint_model_fn(features, labels, mode, params): with tf.variable_scope(params['model_scope'], default_name=None, values=[features], reuse=tf.AUTO_REUSE): pred_outputs = backbone_(features, config.class_num_joints[(params['model_scope'] if 'all' not in params['model_scope'] else '*')], params['heatmap_size'], (mode == tf.estimator.ModeKeys.TRAIN), params['data_format'], net_depth=params['net_depth']) - #print(pred_outputs) - if params['data_format'] == 'channels_last': pred_outputs = [tf.transpose(pred_outputs[ind], [0, 3, 1, 2], name='outputs_trans_{}'.format(ind)) for ind in list(range(len(pred_outputs)))] @@ -455,7 +456,7 @@ def sub_loop(model_fn, model_scope, model_dir, run_config, train_epochs, epochs_ _replicate_model_fn = tf_replicate_model_fn.replicate_model_fn(model_fn, loss_reduction=tf.losses.Reduction.MEAN) fashionAI = tf.estimator.Estimator( - model_fn=_replicate_model_fn, model_dir=model_dir, config=run_config, + model_fn=_replicate_model_fn, model_dir=model_dir, config=run_config.replace(save_checkpoints_steps=2*steps_per_epoch), params={ 'checkpoint_path': checkpoint_path, 'model_dir': model_dir, @@ -510,8 +511,8 @@ def main(_): # Set up a RunConfig to only save checkpoints once per training cycle. 
@@ -510,8 +511,8 @@ def main(_):
     # Set up a RunConfig to only save checkpoints once per training cycle.
     run_config = tf.estimator.RunConfig().replace(
-        save_checkpoints_secs=FLAGS.save_checkpoints_secs).replace(
-        save_checkpoints_steps=None).replace(
+        save_checkpoints_secs=None).replace(
+        save_checkpoints_steps=FLAGS.save_checkpoints_steps).replace(
         save_summary_steps=FLAGS.save_summary_steps).replace(
         keep_checkpoint_max=5).replace(
         tf_random_seed=FLAGS.tf_random_seed).replace(
         log_step_count_steps=FLAGS.log_every_n_steps).replace(
         session_config=sess_config)
@@ -524,10 +525,10 @@ def main(_):
     detail_params = {
         'blouse': {
             'model_dir' : os.path.join(full_model_dir, 'blouse'),
-            'train_epochs': 40,
-            'epochs_per_eval': 16,
+            'train_epochs': 25,
+            'epochs_per_eval': 5,
             'lr_decay_factors': '1, 0.5, 0.1',
-            'decay_boundaries': '10, 20',
+            'decay_boundaries': '15, 20',
             'model_scope': 'blouse',
             'checkpoint_path': os.path.join(FLAGS.data_dir, FLAGS.cloud_checkpoint_path.format(FLAGS.net_depth)) if FLAGS.run_on_cloud else FLAGS.checkpoint_path.format(FLAGS.net_depth),
             'checkpoint_model_scope': '',
@@ -536,10 +537,10 @@ def main(_):
         },
         'dress': {
             'model_dir' : os.path.join(full_model_dir, 'dress'),
-            'train_epochs': 40,
-            'epochs_per_eval': 16,
+            'train_epochs': 25,
+            'epochs_per_eval': 5,
             'lr_decay_factors': '1, 0.5, 0.1',
-            'decay_boundaries': '10, 20',
+            'decay_boundaries': '15, 20',
             'model_scope': 'dress',
             'checkpoint_path': os.path.join(FLAGS.data_dir, FLAGS.cloud_checkpoint_path.format(FLAGS.net_depth)) if FLAGS.run_on_cloud else FLAGS.checkpoint_path.format(FLAGS.net_depth),
             'checkpoint_model_scope': '',
@@ -548,10 +549,10 @@ def main(_):
         },
         'outwear': {
             'model_dir' : os.path.join(full_model_dir, 'outwear'),
-            'train_epochs': 40,
-            'epochs_per_eval': 16,
+            'train_epochs': 25,
+            'epochs_per_eval': 5,
             'lr_decay_factors': '1, 0.5, 0.1',
-            'decay_boundaries': '10, 20',
+            'decay_boundaries': '15, 20',
             'model_scope': 'outwear',
             'checkpoint_path': os.path.join(FLAGS.data_dir, FLAGS.cloud_checkpoint_path.format(FLAGS.net_depth)) if FLAGS.run_on_cloud else FLAGS.checkpoint_path.format(FLAGS.net_depth),
             'checkpoint_model_scope': '',
@@ -560,10 +561,10 @@ def main(_):
         },
         'skirt': {
             'model_dir' : os.path.join(full_model_dir, 'skirt'),
-            'train_epochs': 40,
-            'epochs_per_eval': 16,
+            'train_epochs': 25,
+            'epochs_per_eval': 5,
             'lr_decay_factors': '1, 0.5, 0.1',
-            'decay_boundaries': '10, 20',
+            'decay_boundaries': '15, 20',
             'model_scope': 'skirt',
             'checkpoint_path': os.path.join(FLAGS.data_dir, FLAGS.cloud_checkpoint_path.format(FLAGS.net_depth)) if FLAGS.run_on_cloud else FLAGS.checkpoint_path.format(FLAGS.net_depth),
             'checkpoint_model_scope': '',
@@ -572,10 +573,10 @@ def main(_):
         },
         'trousers': {
             'model_dir' : os.path.join(full_model_dir, 'trousers'),
-            'train_epochs': 40,
-            'epochs_per_eval': 16,
+            'train_epochs': 25,
+            'epochs_per_eval': 5,
             'lr_decay_factors': '1, 0.5, 0.1',
-            'decay_boundaries': '10, 20',
+            'decay_boundaries': '15, 20',
             'model_scope': 'trousers',
             'checkpoint_path': os.path.join(FLAGS.data_dir, FLAGS.cloud_checkpoint_path.format(FLAGS.net_depth)) if FLAGS.run_on_cloud else FLAGS.checkpoint_path.format(FLAGS.net_depth),
             'checkpoint_model_scope': '',
diff --git a/train_senet_cpn_onebyone.py b/train_senet_cpn_onebyone.py
index c92177eb..7b00339f 100644
--- a/train_senet_cpn_onebyone.py
+++ b/train_senet_cpn_onebyone.py
@@ -564,8 +564,8 @@ def main(_):
     detail_params = {
         'blouse': {
             'model_dir' : os.path.join(FLAGS.model_dir, 'blouse'),
-            'train_epochs': 40,
-            'epochs_per_eval': 15,
+            'train_epochs': 28,
+            'epochs_per_eval': 7,
             'lr_decay_factors': '1, 0.5, 0.1',
             'decay_boundaries': '10, 20',
             'model_scope': 'blouse',
@@ -576,8 +576,8 @@ def main(_):
         },
         'dress': {
             'model_dir' : os.path.join(FLAGS.model_dir, 'dress'),
-            'train_epochs': 40,
-            'epochs_per_eval': 15,
+            'train_epochs': 28,
+            'epochs_per_eval': 7,
             'lr_decay_factors': '1, 0.5, 0.1',
             'decay_boundaries': '10, 20',
             'model_scope': 'dress',
@@ -588,8 +588,8 @@ def main(_):
         },
         'outwear': {
             'model_dir' : os.path.join(FLAGS.model_dir, 'outwear'),
-            'train_epochs': 40,
-            'epochs_per_eval': 15,
+            'train_epochs': 28,
+            'epochs_per_eval': 7,
             'lr_decay_factors': '1, 0.5, 0.1',
             'decay_boundaries': '10, 20',
             'model_scope': 'outwear',
@@ -600,8 +600,8 @@ def main(_):
         },
         'skirt': {
             'model_dir' : os.path.join(FLAGS.model_dir, 'skirt'),
-            'train_epochs': 40,
-            'epochs_per_eval': 15,
+            'train_epochs': 28,
+            'epochs_per_eval': 7,
             'lr_decay_factors': '1, 0.5, 0.1',
             'decay_boundaries': '10, 20',
             'model_scope': 'skirt',
@@ -612,8 +612,8 @@ def main(_):
         },
         'trousers': {
             'model_dir' : os.path.join(FLAGS.model_dir, 'trousers'),
-            'train_epochs': 40,
-            'epochs_per_eval': 15,
+            'train_epochs': 28,
+            'epochs_per_eval': 7,
             'lr_decay_factors': '1, 0.5, 0.1',
             'decay_boundaries': '10, 20',
             'model_scope': 'trousers',
@@ -625,13 +625,6 @@ def main(_):
         }
     model_to_train = [s.strip() for s in FLAGS.model_to_train.split(',')]
 
-    # import datetime
-    # import time
-    # while True:
-    #     time.sleep(1600)
-    #     if '8' in datetime.datetime.now().time().strftime('%H'):
-    #         break
-
     for m in model_to_train:
         sub_loop(keypoint_model_fn, m, detail_params[m]['model_dir'], run_config, detail_params[m]['train_epochs'], detail_params[m]['epochs_per_eval'], detail_params[m]['lr_decay_factors'], detail_params[m]['decay_boundaries'], detail_params[m]['checkpoint_path'], detail_params[m]['checkpoint_exclude_scopes'], detail_params[m]['checkpoint_model_scope'], detail_params[m]['ignore_missing_vars'])
diff --git a/utility/mertric.py b/utility/mertric.py
index 06309e12..aa6b4e0c 100644
--- a/utility/mertric.py
+++ b/utility/mertric.py
@@ -89,8 +89,8 @@ def normalized_error(targets, predictions, norm_value, visible, isvalid,
     dist = tf.boolean_mask(dist, tf.logical_and(visible>0, isvalid>0))
     #dist = dist * tf.cast(tf.logical_and(visible>0, isvalid>0), tf.float32)
 
-    update_total_op = state_ops.assign(total, math_ops.reduce_sum(dist))#assign_add
-    update_count_op = state_ops.assign(count, tf.cast(tf.shape(dist)[0], tf.float32))#assign_add
+    update_total_op = state_ops.assign(total, math_ops.reduce_sum(dist))#assign_add #assign
+    update_count_op = state_ops.assign(count, tf.cast(tf.shape(dist)[0], tf.float32))#assign_add #assign
 
     mean_t = _safe_div(total, count, 'value')
     update_op = _safe_div(update_total_op, update_count_op, 'update_op')
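Finally, `normalized_error` in utility/mertric.py is the competition metric: keypoint distances divided by a per-image normalisation length, averaged over keypoints that are both visible and valid. (Note the update ops use `assign` rather than `assign_add`, so the reported value reflects the most recent batch rather than an accumulated total.) A NumPy sketch of the computation, assuming `(batch, num_joints, 2)` coordinate arrays:

```python
import numpy as np

def normalized_error(pred_xy, gt_xy, norm_value, visible, isvalid):
    # pred_xy, gt_xy: (batch, num_joints, 2); norm_value: (batch,) array;
    # visible, isvalid: (batch, num_joints) flags.
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1) / norm_value[:, None]
    mask = (visible > 0) & (isvalid > 0)          # only scored keypoints count
    return dist[mask].sum() / max(mask.sum(), 1)  # mean over counted keypoints
```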