Knime Business Hub - Deploy Failed

Hello everybody,

I have a problem with the installation of KNIME Business Hub…
I have been trying to understand the problem for two days, but a lot of modules are in error and I can’t identify the initial culprit.
I have tried to reinstall from scratch twice following the installation guide, but the result is always the same.
I have attached the support bundle with all the logs here: https://drive.google.com/file/d/1P09E0f9LoCQjIiCGUkYzCJgcgDPMGepj/view?usp=drive_link

The server is on Microsoft Azure with 16 vCPUs and 32 GB of RAM, and the firewall is configured according to the specification.

Here are a few screenshots of my server:



get pods:


keycloak and state-persistor are in “CrashLoopBackOff” and restart in a loop.

Events for keycloak-0:

- kind: Event
  apiVersion: v1
  metadata:
    name: keycloak-0.1772e72382cc88c2
    namespace: knime
    uid: 762ef33c-085e-4ca8-b306-e20d325e522f
    resourceVersion: '98551'
    creationTimestamp: '2023-07-18T13:33:50Z'
    managedFields:
      - manager: kubelet
        operation: Update
        apiVersion: v1
        time: '2023-07-18T13:33:50Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:count': {}
          'f:firstTimestamp': {}
          'f:involvedObject': {}
          'f:lastTimestamp': {}
          'f:message': {}
          'f:reason': {}
          'f:source':
            'f:component': {}
            'f:host': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: keycloak-0
    uid: 55c83124-9864-4903-9b5b-6c57abe668d9
    apiVersion: v1
    resourceVersion: '46242'
    fieldPath: 'spec.containers{keycloak}'
  reason: Pulling
  message: 'Pulling image "quay.io/keycloak/keycloak:19.0.3-legacy"'
  source:
    component: kubelet
    host: knime-hub-server
  firstTimestamp: '2023-07-18T07:55:49Z'
  lastTimestamp: '2023-07-18T13:33:50Z'
  count: 68
  type: Normal
  eventTime: null
  reportingComponent: ''
  reportingInstance: ''
- kind: Event
  apiVersion: v1
  metadata:
    name: keycloak-0.1772e744b2cd9b91
    namespace: knime
    uid: 7c9d9fb3-92da-4969-8c43-5da5a653acc1
    resourceVersion: '105041'
    creationTimestamp: '2023-07-18T07:58:12Z'
    managedFields:
      - manager: kubelet
        operation: Update
        apiVersion: v1
        time: '2023-07-18T07:58:12Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:count': {}
          'f:firstTimestamp': {}
          'f:involvedObject': {}
          'f:lastTimestamp': {}
          'f:message': {}
          'f:reason': {}
          'f:source':
            'f:component': {}
            'f:host': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: keycloak-0
    uid: 55c83124-9864-4903-9b5b-6c57abe668d9
    apiVersion: v1
    resourceVersion: '46242'
    fieldPath: 'spec.containers{keycloak}'
  reason: BackOff
  message: Back-off restarting failed container
  source:
    component: kubelet
    host: knime-hub-server
  firstTimestamp: '2023-07-18T07:58:12Z'
  lastTimestamp: '2023-07-18T14:18:49Z'
  count: 1589
  type: Warning
  eventTime: null
  reportingComponent: ''
  reportingInstance: ''

Thanks for your help!

Hello BaptMann,

hope you are doing well!
Oh no, we are sorry to hear that the installation has not been smooth.

I just requested access to your support bundle so I can check on that in detail.

I checked a bit based on the information provided in your request.
Two things come to mind:

  1. Is a valid license uploaded?
  2. Did you run the command “sudo sysctl fs.inotify.max_user_instances=8192”?
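For reference, a minimal sketch of applying that sysctl and persisting it across reboots (the drop-in file name 99-knime.conf is an assumption, not something prescribed by the installation guide):

```shell
# Apply the inotify limit for the running session
sudo sysctl fs.inotify.max_user_instances=8192
# Persist it across reboots via a sysctl drop-in (file name is arbitrary)
echo "fs.inotify.max_user_instances=8192" | sudo tee /etc/sysctl.d/99-knime.conf
sudo sysctl --system
```

Without the drop-in file, the setting is lost on the next reboot, which can make a previously working installation start failing again.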

Looking forward to getting access to the support bundle for more in-depth analysis.

Best regards,
Alex

Hello Alex,
Thank you for your response,

Yes, I’ve uploaded a valid license provided by KNIME, and yes, I ran this command before my second installation.

Kind regards,

Thank you for your patience.
We checked the support bundle and found the following.

Pod “keycloak-0” is in a failed state because the Postgres database has an issue.
I checked the postgres-operator-755d955486-rdvwk pod, which shows as running, but its logs do show errors.

This blocks keycloak-0 and, down the line, a lot of other pods from working as intended.

We suggest restarting the pods with these two calls.
After that, the pods should restart and perform correctly.
Please check the statuses a bit later; if you still encounter issues, please provide another support bundle after performing these actions.

kubectl -n knime delete pod knime-postgres-cluster-0

kubectl -n knime delete pod postgres-operator-755d955486-rdvwk
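To verify that the deleted pods were recreated and came up healthy, one possible check (pod names taken from the commands above):

```shell
# List the pods and their current status after the deletion
kubectl -n knime get pods
# If knime-postgres-cluster-0 keeps failing, inspect the previous container's logs
kubectl -n knime logs knime-postgres-cluster-0 --previous
```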

Thank you and best regards,
Alex.


Thanks! Now postgres-operator-XXX is running, but knime-postgres-cluster-0 is in Pending state with the error “1 Insufficient cpu, 1 Insufficient memory”.

Latest support bundle: https://drive.google.com/file/d/1uUxYsMvjX-a-We3BXwU5lAsPPrXuJwZT/view?usp=drive_link

Events for knime-postgres-cluster-0:

- kind: Event
  apiVersion: v1
  metadata:
    name: knime-postgres-cluster-0.17733be6adcae6b0
    namespace: knime
    uid: 216538a0-2380-41e1-8c79-cb5871dddd3e
    resourceVersion: '185171'
    creationTimestamp: '2023-07-19T09:49:07Z'
    managedFields:
      - manager: kube-scheduler
        operation: Update
        apiVersion: events.k8s.io/v1
        time: '2023-07-19T09:50:09Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:action': {}
          'f:eventTime': {}
          'f:note': {}
          'f:reason': {}
          'f:regarding': {}
          'f:reportingController': {}
          'f:reportingInstance': {}
          'f:series':
            .: {}
            'f:count': {}
            'f:lastObservedTime': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: knime-postgres-cluster-0
    uid: 98879ab6-a4f0-40a7-9fc9-2cd6d1b82dd5
    apiVersion: v1
    resourceVersion: '133606'
  reason: FailedScheduling
  message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
  source: {}
  firstTimestamp: null
  lastTimestamp: null
  type: Warning
  eventTime: '2023-07-19T09:49:07.138331Z'
  series:
    count: 260
    lastObservedTime: '2023-07-19T14:12:07.446772Z'
  action: Scheduling
  reportingComponent: default-scheduler
  reportingInstance: default-scheduler-Knime-Hub-Server
- kind: Event
  apiVersion: v1
  metadata:
    name: knime-postgres-cluster-0.17734a4df7f4c4ac
    namespace: knime
    uid: b64e3538-491a-4663-b7e6-02a601785dbe
    resourceVersion: '184306'
    creationTimestamp: '2023-07-19T14:13:03Z'
    managedFields:
      - manager: kube-scheduler
        operation: Update
        apiVersion: events.k8s.io/v1
        time: '2023-07-19T14:13:03Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:action': {}
          'f:eventTime': {}
          'f:note': {}
          'f:reason': {}
          'f:regarding': {}
          'f:reportingController': {}
          'f:reportingInstance': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: knime-postgres-cluster-0
    uid: 98879ab6-a4f0-40a7-9fc9-2cd6d1b82dd5
    apiVersion: v1
    resourceVersion: '184304'
  reason: FailedScheduling
  message: 'skip schedule deleting pod: knime/knime-postgres-cluster-0'
  source: {}
  firstTimestamp: null
  lastTimestamp: null
  type: Warning
  eventTime: '2023-07-19T14:13:03.927011Z'
  action: Scheduling
  reportingComponent: default-scheduler
  reportingInstance: default-scheduler-Knime-Hub-Server
- kind: Event
  apiVersion: v1
  metadata:
    name: knime-postgres-cluster-0.17734a4dfb8faa1f
    namespace: knime
    uid: e2288dc9-9184-48f1-8cbf-74ae1b3da2c7
    resourceVersion: '184309'
    creationTimestamp: '2023-07-19T14:13:03Z'
    managedFields:
      - manager: kube-scheduler
        operation: Update
        apiVersion: events.k8s.io/v1
        time: '2023-07-19T14:13:03Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:action': {}
          'f:eventTime': {}
          'f:note': {}
          'f:reason': {}
          'f:regarding': {}
          'f:reportingController': {}
          'f:reportingInstance': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: knime-postgres-cluster-0
    uid: 089be399-cdd8-4017-b551-d2fc0e76afef
    apiVersion: v1
    resourceVersion: '184308'
  reason: FailedScheduling
  message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
  source: {}
  firstTimestamp: null
  lastTimestamp: null
  type: Warning
  eventTime: '2023-07-19T14:13:03.987496Z'
  action: Scheduling
  reportingComponent: default-scheduler
  reportingInstance: default-scheduler-Knime-Hub-Server
- kind: Event
  apiVersion: v1
  metadata:
    name: knime-postgres-cluster-0.17734a5478545845
    namespace: knime
    uid: bce20cba-b3a5-4ec3-9192-7bf0f8376775
    resourceVersion: '185172'
    creationTimestamp: '2023-07-19T14:13:31Z'
    managedFields:
      - manager: kube-scheduler
        operation: Update
        apiVersion: events.k8s.io/v1
        time: '2023-07-19T14:14:37Z'
        fieldsType: FieldsV1
        fieldsV1:
          'f:action': {}
          'f:eventTime': {}
          'f:note': {}
          'f:reason': {}
          'f:regarding': {}
          'f:reportingController': {}
          'f:reportingInstance': {}
          'f:series':
            .: {}
            'f:count': {}
            'f:lastObservedTime': {}
          'f:type': {}
  involvedObject:
    kind: Pod
    namespace: knime
    name: knime-postgres-cluster-0
    uid: 089be399-cdd8-4017-b551-d2fc0e76afef
    apiVersion: v1
    resourceVersion: '184312'
  reason: FailedScheduling
  message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
  source: {}
  firstTimestamp: null
  lastTimestamp: null
  type: Warning
  eventTime: '2023-07-19T14:13:31.850564Z'
  series:
    count: 5
    lastObservedTime: '2023-07-19T14:18:37.453612Z'
  action: Scheduling
  reportingComponent: default-scheduler
  reportingInstance: default-scheduler-Knime-Hub-Server

Hi BaptMann,

thank you for the update and the new support bundle.
We checked the new bundle.

The reason the deployment gets stuck with
message: '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.'
seems to be that the pod knime-postgres-cluster-0 has all of the server’s resources assigned to it.

If we run the command kubectl describe pod knime-postgres-cluster-0 -n knime, we get back the container resource limits and requests:

Containers:
  postgres:
    Image:       registry.opensource.zalan.do/acid/spilo-12:1.6-p2
    Ports:       8008/TCP, 5432/TCP, 8080/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Limits:
      cpu:     16
      memory:  32Gi
    Requests:
      cpu:     16
      memory:  32Gi

Running kubectl describe node knime-hub-server shows how the cluster’s resources are allocated and used. This also shows that all resources are assigned to this one pod, which blocks anything else from running, as there is no memory left.
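For example, the allocation summary can be pulled out of that output directly ("Allocated resources" is the section heading kubectl prints):

```shell
# Show how much CPU/memory pods have requested vs. the node's capacity
kubectl describe node knime-hub-server | grep -A 10 "Allocated resources"
```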

Usually, this pod only gets these limits:

    Limits:
      cpu:     1
      memory:  1Gi
    Requests:
      cpu:     1
      memory:  1Gi

Was this value changed by hand?
If yes, we suggest editing the YAML configuration using this command:
kubectl edit pod knime-postgres-cluster-0

You can edit the limits to our suggested default of 1 CPU core and 1 Gi of memory, as shown above.
After another start, the message '0/1 nodes are available: 1 Insufficient cpu, 1 Insufficient memory.' should not appear again.
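For reference, the corresponding resources stanza in the pod spec should look roughly like this after the edit (values per the default above):

```yaml
resources:
  limits:
    cpu: "1"
    memory: 1Gi
  requests:
    cpu: "1"
    memory: 1Gi
```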

If this configuration was never touched, please let us know that too.

Thank you very much for your effort and collaboration.

Best regards,
Alex


Many thanks Alex,

I made the modification, and now knime-postgres-cluster-0 is running!
It was my mistake! In the configuration I specified the server’s resources (CPU and RAM), but I put them in the PostgreSQL section!! :sob:
Sorry for that…

Just a small modification to the command:

kubectl edit pod -n knime knime-postgres-cluster-0

And I needed to use this command to apply the modification:

kubectl replace -f /tmp/kubectl-edit-4077169692.yaml --force

After a few minutes, keycloak-0 also switched to the Running state.

Thank you for your time :pray:


Sorry, I’m back.

All the KNIME pods are running and the config is fixed, but I still get the “deployment failed” error while deploying version 1.5, with a different error in the logs.

This is the hub pod:

This is the support bundle:
https://drive.google.com/file/d/1N0CuczcvWlQmeHPQ3z0ObmfNGhBZesG7/view?usp=drive_link

Thanks for your help!

Hey BaptMann,

no problem, welcome back!

I checked the logs of the CrashLoop pods, and interestingly they all fail with the same error:
“PKIX path building failed”

This indicates an issue with or surrounding the certificate.
Can you please check your settings on this page?
Please do not upload this information as a picture to the thread, as it might contain confidential data.

I will also investigate this further. Looking forward to your feedback.

Have a nice weekend and best regards,
Alex.

This is my config:


(For the moment it is a self-signed key and certificate.)

Maybe I need to enable Custom CA?

Hello BaptMann,

thank you for your patience.
We discussed this installation in a wider context with the team.
Neal from our Partner Pod will reach out to you directly to discuss open questions regarding the installation of the KNIME Business Hub.

The error you are getting definitely indicates issues with the certificate setup.
Can you please go over these points?

  1. Try disabling TLS and check whether the KNIME Hub installation works fine. After a successful installation, we can try to add the certificates and troubleshoot further.

  2. Can you check whether cert.pem contains the full chain, including all the intermediate and root certificates?

  3. Who is the signing authority of the certificate? If you are using your own organization to sign the certificates, you may need to enable the Custom CA option and then upload the intermediate and root CAs as part of “Enable Custom CA”.
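For point 2, one way to list every certificate contained in the bundle (the file name cert.pem is taken from above):

```shell
# Print subject/issuer for each certificate in the PEM bundle;
# a full chain shows the leaf, each intermediate, and (optionally) the root.
openssl crl2pkcs7 -nocrl -certfile cert.pem | openssl pkcs7 -print_certs -noout
```

If only one subject/issuer pair appears, the intermediates are missing, which is a common cause of “PKIX path building failed”.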

Best regards,
Alex.

Thank you, Alex, and all your team.

The installation completed without TLS, but I can’t access the other subdomains (the connection was reset).
I tried to add the key and cert files after the installation, but it doesn’t work…

My two files respectively contain the keys and certificates for mydomain.com, hub.mydomain.com and *.hub.mydomain.com.
For the moment I don’t have a custom CA.
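As a sanity check on such a key/cert pair, one way to confirm they actually belong together (the file names key.pem and cert.pem are assumptions) is to compare the public-key digests:

```shell
# The certificate and private key match only if their public keys are identical
openssl x509 -in cert.pem -noout -pubkey | openssl md5
openssl pkey -in key.pem -pubout | openssl md5
# The two digests printed above must be equal
```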

I will go over this with Neal this week.

Kind regards,

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.