At the beginning of training, the GPU memory occupied was very large. #3596

gjxgjxgjxgjx · 2020-08-21T09:03:54Z

At the beginning of training, GPU memory usage is very high (about 10 GB). Once training stabilizes, it drops to about 7 GB. Because of that initial peak, I can't use a larger batch size. Is there any solution? My GPU has 11 GB of memory.

yhcao6 · 2020-08-21T10:51:19Z

Could you provide your training config and command? I will give it a try.

gjxgjxgjxgjx · 2020-08-21T13:20:57Z

I modified the neck by adding some convolutions to the FPN. All other settings are the same as in the FCOS config.
@yhcao6

_base_ = [
    '../_base_/default_runtime.py'
]

# model settings
find_unused_parameters=True
model = dict(
    type='FCOS',
    pretrained='checkpoints/state_dict_93.98.pth',
    backbone=dict(
        type='GhostNet',
    ),
    neck=dict(
        type='GhostFPN',
        in_channels=[160, 160, 160, 160],
        out_channels=256,
        start_level=1,
        add_extra_convs=True,
        extra_convs_on_inputs=False,  # use P5
        num_outs=5,
        relu_before_extra_convs=True),
    bbox_head=dict(
        type='FCOSHead',
        num_classes=20,
        in_channels=256,
        stacked_convs=4,
        feat_channels=256,
        strides=[8, 16, 32, 64, 128],
        norm_cfg=None,
        loss_cls=dict(
            type='FocalLoss',
            use_sigmoid=True,
            gamma=2.0,
            alpha=0.25,
            loss_weight=1.0),
        loss_bbox=dict(type='IoULoss', loss_weight=1.0),
        loss_centerness=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0)))

# training and testing settings
train_cfg = dict(
    assigner=dict(
        type='MaxIoUAssigner',
        pos_iou_thr=0.5,
        neg_iou_thr=0.4,
        min_pos_iou=0,
        ignore_iof_thr=-1),
    allowed_border=-1,
    pos_weight=-1,
    debug=False)
test_cfg = dict(
    nms_pre=1000,
    min_bbox_size=0,
    score_thr=0.05,
    nms=dict(type='nms', iou_thr=0.5),
    max_per_img=100)

# dataset settings
dataset_type = 'VOCDataset'
data_root = 'data/VOCdevkit/'
img_norm_cfg = dict(
    mean=[102.9801, 115.9465, 122.7717], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1000, 600), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1000, 600),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]

data = dict(
    imgs_per_gpu=1,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=[
            data_root + 'VOC2007/ImageSets/Main/trainval.txt',
            data_root + 'VOC2012/ImageSets/Main/trainval.txt'
        ],
        img_prefix=[data_root + 'VOC2007/', data_root + 'VOC2012/'],
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'VOC2007/ImageSets/Main/test.txt',
        img_prefix=data_root + 'VOC2007/',
        pipeline=test_pipeline))

# optimizer
optimizer = dict(
    type='SGD',
    lr=0.002,
    momentum=0.9,
    weight_decay=0.0001,
    paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(grad_clip=None)

# learning policy
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=500,
    warmup_ratio=0.3333333333333333,
    step=[14, 24, 30])

# runtime settings
total_epochs = 36

hellock assigned yhcao6 Aug 26, 2020

hhaAndroid added the community discussion label Apr 19, 2021

FANGAreNotGnu pushed a commit to FANGAreNotGnu/mmdetection that referenced this issue Oct 23, 2023

Add predict_children (open-mmlab#3596)

b54ad05
