
CFG Parameters in the different layers

Alexey edited this page Jul 9, 2020 · 5 revisions

Image processing [N x C x H x W]:

  • [convolutional] - convolutional layer

    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)

    • filters=64 - number of kernel-filters (1 by default)

    • size=3 - kernel size of the filter (1 by default)

    • groups = 32 - number of groups for grouped (depth-wise) convolution (1 by default)

    • stride=1 - stride (offset step) of kernel filter (1 by default)

    • padding=1 - size of padding (0 by default)

    • pad=1 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (0 by default)

    • dilation=1 - size of dilation (1 by default)

    • activation=leaky - activation function after convolution: logistic (by default), loggy, relu, elu, selu, relie, plse, hardtan, lhtan, linear, ramp, leaky, tanh, stair, relu6, swish, mish
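
The way size, stride, pad, padding and dilation interact can be sketched with the textbook convolution output-size formula. This is a minimal illustration, not Darknet's code: the function name conv_output_size is hypothetical, and Darknet's own handling of dilation when it computes output sizes may differ from the standard formula used here.

```python
def conv_output_size(in_size, size=1, stride=1, pad=0, padding=0, dilation=1):
    """Spatial output size of a convolution along one axis (H or W)."""
    # As described above: pad=1 overrides padding= with size/2 (integer division).
    if pad:
        padding = size // 2
    # Textbook arithmetic (assumption): a dilated kernel spans
    # dilation * (size - 1) + 1 input elements.
    effective_size = dilation * (size - 1) + 1
    return (in_size + 2 * padding - effective_size) // stride + 1


same_size = conv_output_size(416, size=3, stride=1, pad=1)   # 3x3, stride 1
downsampled = conv_output_size(416, size=3, stride=2, pad=1)  # 3x3, stride 2
pointwise = conv_output_size(13, size=1, stride=1)            # 1x1 convolution
print(same_size, downsampled, pointwise)
```

With pad=1 and stride=1 a 3x3 kernel keeps the resolution, while stride=2 halves it.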


  • [activation] - separate activation layer

    • activation=leaky - activation function: linear (by default), loggy, relu, elu, selu, relie, plse, hardtan, lhtan, ramp, leaky, tanh, stair

  • [batchnorm] - separate Batch-normalization layer

  • [maxpool] - max-pooling layer (the maximum value)

    • size=2 - size of max-pooling kernel

    • stride=2 - stride (offset step) of max-pooling kernel


  • [avgpool] - average pooling layer: input W x H x C -> output 1 x 1 x C

  • [shortcut] - residual connection (ResNet)

    • from=-3,-5 - relative layer numbers; performs element-wise addition of several layers: the previous layer and the layers specified in the from= parameter

    • weights_type=per_feature - weights are applied in the shortcut: y[i] = w1*layer1[i] + w2*layer2[i] ...

      • per_feature - 1 weight per layer/feature
      • per_channel - 1 weight per channel
      • none - weights are not used (by default)
    • weights_normalization=softmax - normalization applied to the shortcut weights

      • softmax - softmax normalization
      • relu - relu normalization
      • none - no weights normalization, i.e. unbounded weights (by default)
    • activation=linear - activation function after shortcut/residual connection (linear by default)
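
The weighted shortcut y[i] = w1*layer1[i] + w2*layer2[i] ... can be sketched as follows for the per_feature case, assuming one scalar weight per input layer. The function names and the eps value are hypothetical, and the exact form of the relu normalization (clipped weights divided by their sum) is an assumption rather than a statement about Darknet's implementation.

```python
import math


def normalize_weights(weights, mode="none", eps=1e-4):
    """Normalize shortcut weights according to weights_normalization."""
    if mode == "softmax":
        m = max(weights)  # subtract the maximum for numerical stability
        exps = [math.exp(w - m) for w in weights]
        total = sum(exps)
        return [e / total for e in exps]
    if mode == "relu":
        # Assumption: negative weights are clipped to 0, then divided by the sum.
        clipped = [max(w, 0.0) for w in weights]
        total = sum(clipped) + eps
        return [c / total for c in clipped]
    return list(weights)  # "none": weights are used as they are


def weighted_shortcut(layers, weights, mode="none"):
    """layers: equally sized flat lists; weights: one scalar per layer."""
    w = normalize_weights(weights, mode)
    return [
        sum(wk * layer[i] for wk, layer in zip(w, layers))
        for i in range(len(layers[0]))
    ]


previous = [1.0, 2.0]
skipped = [3.0, 4.0]
print(weighted_shortcut([previous, skipped], [0.0, 0.0], mode="softmax"))
```

With equal weights and softmax normalization, each layer contributes one half, so the result is the element-wise average of the two inputs.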


  • [upsample] - upsample layer (increase W x H resolution of input by duplicating elements)

    • stride=2 - factor for increasing both Width and Height (new_w = w*stride, new_h = h*stride)
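
Upsampling by duplicating elements can be illustrated on a single channel. This is a small sketch with a hypothetical function name, operating on nested Python lists rather than on Darknet's flat arrays.

```python
def upsample_nearest(channel, stride=2):
    """channel: list of rows (H x W). Returns (H*stride) x (W*stride) rows."""
    out = []
    for row in channel:
        # Repeat every element `stride` times along the width...
        wide = [v for v in row for _ in range(stride)]
        # ...and repeat the widened row `stride` times along the height.
        out.extend(list(wide) for _ in range(stride))
    return out


small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small, stride=2)
for row in big:
    print(row)
```

For a 2 x 2 input and stride=2 the result is 4 x 4, matching new_w = w*stride and new_h = h*stride.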

  • [scale_channels] - scales channels (SE: squeeze-and-excitation blocks) or (ASFF: adaptively spatial feature fusion) - it multiplies elements of one layer by elements of another layer

    • from=-3 - relative layer number, performs multiplication of all elements of channel N from layer -3, by one element of channel N from the previous layer -1 (i.e. for(int i=0; i < b*c*h*w; ++i) output[i] = from_layer[i] * previous_layer[i/(w*h)]; )

    • scale_wh=0 - SE-layer (previous layer 1x1xC), scale_wh=1 - ASFF-layer (previous layer WxHx1)

    • activation=linear - activation function after scale_channels-layer (linear by default)
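
The loop quoted for from= translates directly into Python. The sketch below covers only the SE case (scale_wh=0), where the previous layer holds one value per channel; the function name is hypothetical.

```python
def scale_channels(from_layer, previous_layer, w, h):
    """Multiply each element of from_layer by the scale of its channel.

    from_layer: flat list of b*c*h*w values (channel-major within a batch).
    previous_layer: flat list of b*c values (a 1 x 1 x C tensor per batch).
    """
    return [x * previous_layer[i // (w * h)] for i, x in enumerate(from_layer)]


# One image, two channels, each of size 2 x 1.
features = [1, 2, 3, 4]
scales = [10, 100]
print(scale_channels(features, scales, w=2, h=1))
```

Every element of channel 0 is multiplied by scales[0] and every element of channel 1 by scales[1].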


  • [sam] - Spatial Attention Module (SAM) - it multiplies elements of one layer by elements of another layer

    • from=-3 - relative layer number (this and previous layers should be the same size WxHxC)

  • [reorg3d] - reorg layer (resize W x H x C)

    • stride=2 - if reverse=0, the input will be resized to W/2 x H/2 x C*4; if reverse=1, then to W*2 x H*2 x C/4 (1 by default)

    • reverse=1 - if 0, then decrease WxH; if 1, then increase WxH (0 by default)
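
The shape change performed by [reorg3d] can be sketched as a small helper. The text above only states the stride=2 case; generalizing the factor 4 to stride*stride is an assumption, and the function name is hypothetical.

```python
def reorg_shape(w, h, c, stride=1, reverse=0):
    """Return the (W, H, C) shape after a reorg with the given stride."""
    factor = stride * stride  # assumption: 4 for stride=2, as in the text
    if reverse:
        if c % factor:
            raise ValueError("C must be divisible by stride*stride")
        return w * stride, h * stride, c // factor
    if w % stride or h % stride:
        raise ValueError("W and H must be divisible by stride")
    return w // stride, h // stride, c * factor


print(reorg_shape(26, 26, 64, stride=2, reverse=0))
print(reorg_shape(13, 13, 256, stride=2, reverse=1))
```

The total number of elements W*H*C is the same before and after the layer; only the layout changes.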


  • [reorg] - OLD reorg layer from Yolo v2 - has incorrect logic (resize W x H x C) - deprecated

    • stride=2 - if reverse=0, the input will be resized to W/2 x H/2 x C*4; if reverse=1, then to W*2 x H*2 x C/4 (1 by default)

    • reverse=1 - if 0, then decrease WxH; if 1, then increase WxH (0 by default)


  • [route] - concatenation layer, Concat for several input-layers, or Identity for one input-layer

    • layers = -1, 61 - layers that will be concatenated, output: W x H x (C_layer_1 + C_layer_2)
      • if index < 0, then it is relative layer number (-1 means previous layer)
      • if index >= 0, then it is absolute layer number
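
How relative and absolute indexes in layers= resolve, and how many output channels the concatenation has, can be sketched as follows. The function name and the list of per-layer channel counts are hypothetical inputs used only for illustration.

```python
def resolve_route(layers, current_index, channels):
    """Resolve layers= entries and compute the concatenated channel count.

    layers: values from the layers= parameter.
    current_index: absolute index of the [route] layer itself.
    channels: output channel count of every earlier layer, by absolute index.
    """
    # Negative entries are relative to the [route] layer, others are absolute.
    absolute = [current_index + i if i < 0 else i for i in layers]
    return absolute, sum(channels[i] for i in absolute)


# Five earlier layers (indexes 0..4); the [route] layer is at index 5.
channels = [32, 64, 128, 64, 256]
print(resolve_route([-1, 2], current_index=5, channels=channels))
```

Here -1 resolves to layer 4 and 2 stays layer 2, so the output has 256 + 128 channels.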

  • [yolo] - detection layer for Yolo v3 / v4

    • mask = 3,4,5 - indexes of anchors which are used in this [yolo]-layer

    • anchors = 10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326 - initial sizes of bounding boxes that will be adjusted

    • num=9 - total number of anchors

    • classes=80 - number of classes of objects which can be detected

    • ignore_thresh = .7 - keeps duplicated detections if IoU(detect, truth) > ignore_thresh, which will be fused during NMS (used only during training)

    • truth_thresh = 1 - adjusts duplicated detections if IoU(detect, truth) > truth_thresh, which will be fused during NMS (used only during training)

    • jitter=.3 - randomly crops and resizes images, changing the aspect ratio from x(1 - 2*jitter) to x(1 + 2*jitter) (this data augmentation parameter is read only from the last layer)

    • random=1 - randomly resizes the network every 10 iterations, from 1/1.4 to 1.4 (this data augmentation parameter is read only from the last layer)

    • resize=1.5 - randomly resizes the image in the range from 1/1.5x to 1.5x

    • max=200 - maximum number of objects per image during training

    • counters_per_class=100,10,1000 - number of objects per class in Training dataset to eliminate the imbalance

    • label_smooth_eps=0.1 - label smoothing

    • scale_x_y=1.05 - eliminate grid sensitivity

    • iou_thresh=0.2 - use many anchors per object if IoU(Obj, Anchor) > 0.2

    • iou_loss=mse - IoU-loss: mse, giou, diou, ciou

    • iou_normalizer=0.07 - normalizer for delta-IoU

    • cls_normalizer=1.0 - normalizer for delta-Objectness

    • max_delta=5 - limits delta for each entry

  • parameters for tracking if contrastive learning is used:

    • track_history_size = 5 - find similarity over the 5 previous frames [1 - inf)

    • sim_thresh = 0.8 - similarity threshold to consider an object on two frames the same (0.0 to 1.0)

    • dets_for_show = 2 - number of frames with this object before it is shown [0 - inf)

    • dets_for_track = 8 - number of frames with this object before it is tracked [0 - inf)

    • track_ciou_norm = 0.3 - take into account CIoU (0.0 to 1.0)
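
The relationship between anchors, num and mask in a [yolo] layer can be sketched as follows. The helper names are hypothetical, and the sketch assumes integer anchor sizes written as width,height pairs, as in the example values above.

```python
def parse_anchors(text):
    """Turn the anchors= string into a list of (width, height) pairs."""
    values = [int(v) for v in text.split(",")]  # int() ignores surrounding spaces
    return list(zip(values[0::2], values[1::2]))


def anchors_for_layer(anchors, mask):
    """Select the anchors that a [yolo] layer uses, by the indexes in mask=."""
    return [anchors[i] for i in mask]


anchors = parse_anchors(
    "10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326"
)
selected = anchors_for_layer(anchors, [3, 4, 5])
print(len(anchors))  # should match num=
print(selected)
```

With mask = 3,4,5 the layer works with the three medium-sized anchors out of the nine listed.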


  • [crnn] - convolutional RNN-layer (recurrent)

    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)

    • size=1 - convolutional kernel size of the filter (1 by default)

    • pad=0 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (0 by default)

    • output = 1024 - number of kernel-filters in one output convolutional layer (1 by default)

    • hidden=1024 - number of kernel-filters in two (input and hidden) convolutional layers (1 by default)

    • activation=leaky - activation function for each of 3 convolutional-layers in the [crnn]-layer (logistic by default)


  • [conv_lstm] - convolutional LSTM-layer (recurrent)

    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)

    • size=3 - convolutional kernel size of the filter (1 by default)

    • padding=1 - convolutional size of padding (0 by default)

    • pad=1 - if 1, padding = size/2 is used; if 0, the value of the padding= parameter is used (by default)

    • stride=1 - convolutional stride (offset step) of kernel filter (1 by default)

    • dilation=1 - convolutional size of dilation (1 by default)

    • output=256 - number of kernel-filters in each of 8 or 11 convolutional layers (1 by default)

    • groups=4 - number of groups for grouped (depth-wise) convolution (1 by default)

    • state_constrain=512 - constrains LSTM-state values to [-512; +512] after each inference (time_steps*32 by default)

    • peephole=0 - if 1, Peephole connections are used (3 additional conv-layers); if 0, they are not (1 by default)

    • bottleneck=0 - if 1, the reduced optimal version of the conv-lstm layer is used

    • activation=leaky - activation function for each of 8 or 11 convolutional-layers in the [conv_lstm]-layer (linear by default)

    • lstm_activation=tanh - activation for G (gate: g = tanh(wg + ug)) and C (memory cell: h = o * tanh(c))
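
The effect of state_constrain can be illustrated as a simple clamp of the state values. This is only a sketch of the constraint itself, with a hypothetical function name; which internal buffers Darknet clamps is not specified here.

```python
def constrain_state(state, state_constrain):
    """Clamp every state value to [-state_constrain, +state_constrain]."""
    return [max(-state_constrain, min(state_constrain, v)) for v in state]


cell_state = [-1000.0, -3.5, 0.0, 42.0, 700.0]
print(constrain_state(cell_state, 512))
```

Values inside the range pass through unchanged; only the outliers are limited.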

Figure: Detailed architecture of the peephole LSTM


Free-form data processing [Inputs]:

  • [connected] - fully connected layer
    • output=256 - number of outputs (1 by default); the number of connections is equal to inputs*outputs
    • activation=leaky - activation after layer (logistic by default)
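
A fully connected layer and its inputs*outputs connection count can be sketched as below. The function name is hypothetical, the bias term is part of the sketch's assumptions, and the activation is left out for brevity.

```python
def connected_forward(inputs, weights, biases):
    """weights: one row of len(inputs) values per output; biases: one per output."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


inputs = [1.0, 2.0, 3.0]
weights = [[1.0, 0.0, 0.0],   # output 0 copies input 0
           [1.0, 1.0, 1.0]]   # output 1 sums all inputs
biases = [0.5, 0.0]

outputs = connected_forward(inputs, weights, biases)
connections = len(inputs) * len(weights)  # inputs*outputs
print(outputs, connections)
```

With 3 inputs and output=2 the layer has 6 connections.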

  • [dropout] - dropout layer
    • probability=0.5 - dropout probability - what part of inputs will be zeroed (0.5 = 50% by default)

    • dropblock=1 - use as DropBlock

    • dropblock_size_abs=7 - size of DropBlock in pixels 7x7
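
Plain dropout with a given probability can be sketched as follows. The function name is hypothetical, and rescaling the surviving inputs by 1/(1 - probability) (so-called inverted dropout) is an assumption of this sketch; DropBlock, which zeroes contiguous regions, is not shown.

```python
import random


def dropout(inputs, probability=0.5, rng=None):
    """Zero each input with the given probability (training-time behaviour).

    Requires probability < 1. Surviving values are rescaled so that the
    expected sum stays the same (assumption: inverted dropout).
    """
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - probability)
    return [0.0 if rng.random() < probability else x * scale for x in inputs]


rng = random.Random(0)  # fixed seed so the example is repeatable
dropped = dropout([1.0] * 1000, probability=0.5, rng=rng)
print(sum(1 for v in dropped if v == 0.0), "of", len(dropped), "inputs zeroed")
```

With probability=0.5 roughly half of the inputs are zeroed and the rest are doubled.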


  • [softmax] - SoftMax CE (cross entropy) layer - Categorical cross-entropy for multi-class classification

  • [contrastive] - Contrastive loss layer for Supervised and Unsupervised learning (requires [net] contrastive=1 and, optionally, [net] unsupervised=1)

    • yolo_layer= -2 - index (absolute or relative) of the related [yolo] layer

    • classes=1000 - number of classes

    • temperature=1.0 - temperature

    • cls_normalizer=1.0 - normalizer for delta-Objectness

    • max_delta=5 - limits delta for each entry


  • [cost] - cost layer calculates (linear)Delta and (squared)Loss
    • type=sse - cost type: sse (L2), masked, smooth (smooth-L1) (SSE by default)
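
The "(linear) Delta and (squared) Loss" of the sse cost type can be sketched like this. The function name is hypothetical, and the sign convention delta = truth - prediction is an assumption of the sketch.

```python
def sse_cost(predictions, truth):
    """Return (delta, loss) for the sum-of-squared-errors cost."""
    # Delta is linear in the error (assumed sign: truth - prediction)...
    delta = [t - p for p, t in zip(predictions, truth)]
    # ...while the loss is the sum of the squared errors.
    loss = sum(d * d for d in delta)
    return delta, loss


delta, loss = sse_cost([1.0, 2.0], [2.0, 4.0])
print(delta, loss)
```

The delta is what gets propagated backwards, while the loss is the scalar reported during training.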

  • [rnn] - fully connected RNN-layer (recurrent)
    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
    • output = 1024 - number of outputs in one connected layer (1 by default)
    • hidden=1024 - number of outputs in two (input and hidden) connected layers (1 by default)
    • activation=leaky - activation after layer (logistic by default)

  • [lstm] - fully connected LSTM-layer (recurrent)
    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
    • output = 1024 - number of outputs in all connected layers (1 by default)

  • [gru] - fully connected GRU-layer (recurrent)
    • batch_normalize=1 - if 1, batch normalization is used; if 0, it is not (0 by default)
    • output = 1024 - number of outputs in all connected layers (1 by default)