consistent error: caught signal (SIGSEGV) #383

Closed
yunfeilu-dev opened this issue Jul 15, 2022 · 9 comments

@yunfeilu-dev

Hi all,

  • Fluentbit version: amazon/aws-for-fluent-bit:2.21.0
  • Deployment mode: Daemon set
  • Programming language: .NET
  • Log format: JSON

I have an issue when running Fluent Bit in my EKS cluster. It consistently shows a caught signal (SIGSEGV) error, and the exit code of the fluent-bit container is 139.

Here is the error log from fluent-bit pod:

[2022/07/14 13:15:55] [debug] [filter:modify:modify.4] Input map size 5 elements, output map size 6 elements
[2022/07/14 13:15:55] [debug] [input:tail:tail.1] inode=54655592 events: IN_MODIFY
[2022/07/14 13:15:55] [debug] [filter:modify:modify.4] Input map size 6 elements, output map size 7 elements
[2022/07/14 13:15:56] [debug] [upstream] KA connection #29 to kinesis.ap-south-1.amazonaws.com:443 has been disconnected by the remote service
[2022/07/14 13:15:56] [debug] [socket] could not validate socket status for #29 (don't worry)
[2022/07/14 13:15:58] [debug] [input:tail:tail.0] scanning path /var/log/containers/saicappsaibotgatewayhttpapihost*default*
[2022/07/14 13:15:58] [debug] [input:tail:tail.0] scan_blog add(): dismissed: /var/log/containers/saicappsaibotgatewayhttpapihost-784cf7bc56-4qk8z_default_saicappsaibotgatewayhttpapihost-f3752e7b7eff45466ccc9723407dab2efb6bf2db8cc262a3e056ecd80c04c543.log, inode 157463288
[2022/07/14 13:15:58] [debug] [input:tail:tail.0] 0 new files found on path '/var/log/containers/saicappsaibotgatewayhttpapihost*default*'
[2022/07/14 13:15:58] [debug] [storage] [cio file] synced at: tail.1/1-1657804554.722565011.flb
[2022/07/14 13:15:58] [debug] [task] created task=0x7f3c3fd56720 id=0 OK
[2022/07/14 13:15:58] [debug] [output:kinesis_streams:kinesis_streams.1] Sending 33 records
[2022/07/14 13:15:58] [debug] [output:kinesis_streams:kinesis_streams.1] Sending log records to stream LogHub-EKS-Cluster-PodLog-Pipeline-2a1cd-Stream790BDEE4-QMoFe5ACaOsa
[2022/07/14 13:15:58] [debug] [input:tail:tail.1] scanning path /var/log/containers/saicappsaibotgatewayidentityserver*default*
[2022/07/14 13:15:58] [debug] [input:tail:tail.1] scan_blog add(): dismissed: /var/log/containers/saicappsaibotgatewayidentityserver-6596dc44d9-9t99g_default_saicappsaibotgatewayidentityserver-8ac181cc0f5612ec9f3fe3445215a0fdfff5b42eb20826e771ce3e86da44bb2a.log, inode 54655592
[2022/07/14 13:15:58] [debug] [input:tail:tail.1] 0 new files found on path '/var/log/containers/saicappsaibotgatewayidentityserver*default*'
[2022/07/14 13:15:58] [debug] [http_client] not using http_proxy for header
[2022/07/14 13:15:58] [debug] [aws_credentials] Requesting credentials from the EKS provider..
[2022/07/14 13:15:58] [debug] [upstream] KA connection #28 to kinesis.ap-south-1.amazonaws.com:443 is now available
[2022/07/14 13:15:58] [debug] [output:kinesis_streams:kinesis_streams.1] PutRecords http status=200
[2022/07/14 13:15:58] [debug] [output:kinesis_streams:kinesis_streams.1] Sent events to LogHub-EKS-Cluster-PodLog-Pipeline-2a1cd-Stream790BDEE4-QMoFe5ACaOsa
[2022/07/14 13:15:58] [debug] [output:kinesis_streams:kinesis_streams.1] Processed 33 records, sent 33 to LogHub-EKS-Cluster-PodLog-Pipeline-2a1cd-Stream790BDEE4-QMoFe5ACaOsa
[2022/07/14 13:15:58] [debug] [out coro] cb_destroy coro_id=1
[2022/07/14 13:15:58] [debug] [task] destroy task=0x7f3c3fd56720 (task_id=0)
[2022/07/14 13:16:00] [debug] [input:tail:tail.0] inode=157463288 events: IN_MODIFY
[2022/07/14 13:16:00] [engine] caught signal (SIGSEGV)

Here is the YAML for fluent-bit and its config:

---
apiVersion: v1
kind: Namespace
metadata:
  name: logging

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/LogHub-EKS-LogAgent-Role-1415ebe4f6a84d3688c2241e48d40911

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - nonResourceURLs:
      - /metrics
    verbs:
      - get
  - apiGroups: [""]
    resources:
      - namespaces
      - pods
      - pods/logs
      - nodes
      - nodes/proxy
    verbs: ["get", "list", "watch"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: fluent-bit-read
subjects:
- kind: ServiceAccount
  name: fluent-bit
  namespace: logging

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
  labels:
    k8s-app: fluent-bit
data:
  # Configuration files: server, input, filters and output
  # ======================================================
  uniform-time-format.lua: |
    function cb_print(tag, timestamp, record)
        record['time'] = string.format(
            '%s.%sZ',
            os.date('%Y-%m-%dT%H:%M:%S', timestamp['sec']),
            string.sub(string.format('%06d', timestamp['nsec']), 1, 6)
        )
        return 2, timestamp, record
    end
    
  fluent-bit.conf: |
    [SERVICE]
        Flush                       5
        Daemon                      off
        Log_level                   Debug
        Http_server                 On
        Http_listen                 0.0.0.0
        Http_port                   2022
        Parsers_File                parsers.conf
        storage.path                /var/fluent-bit/state/flb-storage/
        storage.sync                normal
        storage.checksum            off
        storage.backlog.mem_limit   5M
    
    [INPUT]
        Name                tail
        Tag                 kube.var.log.containers.5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/appsaibotgatewayhttpapihost*default*
        Path_Key            file_name
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container-5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.db
        DB.locking          True
        Docker_Mode         On
        
        Mem_Buf_Limit       50MB
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      True

    [OUTPUT]
        Name                kinesis_streams
        Match               kube.var.log.containers.5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.*
        Region              ap-south-1
        Stream              LogHub-EKS-Cluster-PodLog-Pipeline-cc484-Stream790BDEE4-t1L0rHxo66pM
        Retry_Limit         False



    [FILTER]
        Name                parser
        Match               kube.var.log.containers.5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.*
        Key_Name            log
        Parser              json_5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7

    [FILTER]
        Name                kubernetes
        Match               kube.var.log.containers.5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.*
        Kube_Tag_Prefix     kube.var.log.containers.5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7.cc48429c-2035-471d-a7d4-e033187373b5.

        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token

        Merge_Log           On
        Merge_Log_Trim      On
        Merge_Log_Key       log_processed

        Buffer_Size         512k
        Use_Kubelet         True
        Kubelet_Port        10250

    [INPUT]
        Name                tail
        Tag                 kube.var.log.containers.a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.*
        Exclude_Path        /var/log/containers/cloudwatch-agent*, /var/log/containers/fluent-bit*, /var/log/containers/aws-node*, /var/log/containers/kube-proxy*
        Path                /var/log/containers/appsaibotgatewayidentityserver*default*
        Path_Key            file_name
        Parser              docker
        DB                  /var/fluent-bit/state/flb_container-a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.db
        DB.locking          True
        Docker_Mode         On
        
        Mem_Buf_Limit       50MB
        Refresh_Interval    10
        Rotate_Wait         30
        storage.type        filesystem
        Read_from_Head      True

    [OUTPUT]
        Name                kinesis_streams
        Match               kube.var.log.containers.a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.*
        Region              ap-south-1
        Stream              LogHub-EKS-Cluster-PodLog-Pipeline-2a1cd-Stream790BDEE4-QMoFe5ACaOsa
        Retry_Limit         False



    [FILTER]
        Name                parser
        Match               kube.var.log.containers.a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.*
        Key_Name            log
        Parser              json_a9622cf2-c350-4796-ad90-7d74963ed565

    [FILTER]
        Name                kubernetes
        Match               kube.var.log.containers.a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.*
        Kube_Tag_Prefix     kube.var.log.containers.a9622cf2-c350-4796-ad90-7d74963ed565.2a1cd39a-b9b1-40ca-a10d-3d2a74849ce2.

        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token

        Merge_Log           On
        Merge_Log_Trim      On
        Merge_Log_Key       log_processed

        Buffer_Size         512k
        Use_Kubelet         True
        Kubelet_Port        10250


    [FILTER]
        Name                modify
        Match               *
        Set                 cluster ${CLUSTER_NAME}

    [FILTER]
        Name                lua
        Match               *
        time_as_table       on
        script              uniform-time-format.lua
        call                cb_print
    

  parsers.conf: |
    [PARSER]
        Name   json
        Format json
        Time_Key time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ

    [PARSER]
        Name         docker
        Format       json
        Time_Key     container_log_time
        Time_Format  %Y-%m-%dT%H:%M:%S.%LZ
        Time_Keep    On

    [PARSER]
        Name        cri_regex
        Format      regex
        Regex       ^(?<container_log_time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$      
        Time_Key    container_log_time
        Time_Format %Y-%m-%dT%H:%M:%S.%LZ
        Time_Keep    On        


    [PARSER]
        Name        json_5aa94922-a0e1-4aaa-9adc-1a3ef6957fb7
        Format      json
        
        Time_Key    time
        Time_Format ""

    [PARSER]
        Name        json_a9622cf2-c350-4796-ad90-7d74963ed565
        Format      json
        
        Time_Key    time
        Time_Format ""



---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    app.kubernetes.io/name: fluent-bit-logging
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "2022"
    prometheus.io/path: /api/v1/metrics/prometheus
spec:
  selector:
    matchLabels:
      app: fluent-bit-logging
  updateStrategy:
        type: RollingUpdate    
  template:
    metadata:
      labels:
        app: fluent-bit-logging
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      containers:
      - name: fluent-bit
        image: amazon/aws-for-fluent-bit:2.21.0
        imagePullPolicy: Always
        env:
          - name: CLUSTER_NAME
            value: "voice-eks-cluster-2022-Jan-05"
        ports:
          - containerPort: 2022
        # command: ["/fluent-bit/bin/fluent-bit", "-c"]
        # args:
        # - /fluent-bit/etc/fluent-bit.conf
        resources:
            limits:
              memory: 500Mi
            requests:
              cpu: 100m
              memory: 300Mi
        volumeMounts:
        #reference volume name
        - name: fluentbitstate
          mountPath: /var/fluent-bit/state  
        - name: var-log
          mountPath: /var/log
        - name: var-lib-docker-containers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/    
          readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      #define volume name  
      - name: fluentbitstate
        hostPath:
          path: /var/fluent-bit/state    
      - name: var-log
        hostPath:
          path: /var/log
      - name: var-lib-docker-containers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config 
      serviceAccountName: fluent-bit
      tolerations:
      - key: node-role.kubernetes.io/master
        operator: Exists
        effect: NoSchedule
      - operator: "Exists"
        effect: "NoExecute"
      - operator: "Exists"
        effect: "NoSchedule"

I have modified the memory buffer limit in the fluent-bit config and the container's memory request, but neither helped.

@PettitWesley (Contributor)

I suspect this is caused by running an older version that has a bug; we do not have reports of SIGSEGV with our latest and current stable tags: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#downgrading-or-upgrading-your-version
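
For the DaemonSet posted above, upgrading is just a change to the image tag. A minimal sketch, assuming you want to track the stable tag (you can also pin an explicit newer version):

      containers:
      - name: fluent-bit
        # previously amazon/aws-for-fluent-bit:2.21.0; the stable tag points to
        # the most recent release we have designated as stable
        image: amazon/aws-for-fluent-bit:stable
        imagePullPolicy: Always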

@PettitWesley (Contributor)

Fluentbit version: amazon/aws-for-fluent-bit:2.21.0

The release you are on is from October; it is very old:
https://github.com/aws/aws-for-fluent-bit/releases/tag/v2.21.0

It has known bugs: #269

As noted here, you can search old issues by version: https://github.com/aws/aws-for-fluent-bit/blob/mainline/troubleshooting/debugging.md#searching-old-issues

@PettitWesley (Contributor)

We believe this crash report is likely the same as the one described here: fluent/fluent-bit#5753 (comment)

The fix will be released in 2.28.1 #418

@aws-patrickc

This bug seems to persist on the latest tag (2.29.0). "log_router" has 256MB of RAM and still exits with 139.

@PettitWesley (Contributor)

@aws-patrickc Can you please share the Fluent Bit configuration and task definition that you used when you encountered the issue? Thanks!

@jensenak

I believe I'm also encountering this same problem running the latest tag (2.29) as a sidecar to a Fargate service. My Container Definition is as follows:

{
    "essential" = true
    "image"     = "amazon/aws-for-fluent-bit:latest"
    "name"      = "log_router"
    "firelensConfiguration" = {
        "type" = "fluentbit"
        "options" = {
            "enable-ecs-log-metadata" = "true",
            "config-file-type"        = "file",
            "config-file-value"       = "/fluent-bit/configs/parse-json.conf"
        }
    }
},

This was working previously in a Task Definition that had 512M memory. However, since updating versions recently, it has been exiting with code 139 (SIGSEGV) consistently, even when memory was increased to 2048M for the task group.

Not sure if that's helpful.

@PettitWesley (Contributor)

@jensenak Apologies for the issue you are experiencing. Can you please open a new issue and include this info, the Fluent Bit log output, and the version you are using?

@jensenak

Sorry, I wasn't trying to raise a new issue. I just switched from latest to stable and everything is working for me. I was only trying to provide a little extra detail in case it helped you troubleshoot this problem. I don't have any logs of Fluent Bit failures. I see now that this issue is actually about 2.21, so apologies if my comment was off-topic.
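
For completeness, the only change on my side was the image tag in the container definition I posted earlier, roughly this (everything else unchanged):

    "image" = "amazon/aws-for-fluent-bit:stable"   # was "amazon/aws-for-fluent-bit:latest"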

@PettitWesley (Contributor)

@jensenak Thanks, that's why I asked for a new issue. We do not and cannot patch old versions.

Everyone affected by this issue, please upgrade to our latest or latest stable. If you still see SIGSEGV, please open a new issue. Thanks!

https://github.com/aws/aws-for-fluent-bit#using-the-stable-tag
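
If you are running the DaemonSet from the original post, one way to roll the new image out is the following sketch (adjust the namespace, DaemonSet, and container names to match your deployment):

    # point the fluent-bit container at the stable image and wait for the rollout
    kubectl -n logging set image daemonset/fluent-bit fluent-bit=amazon/aws-for-fluent-bit:stable
    kubectl -n logging rollout status daemonset/fluent-bit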
