Questions about usage tracking in KNIME AP

Hi Team,

I just realized my KNIME Analytics Platform sends telemetry to the KNIME infrastructure:

DEBUG HttpConnection:692 - Open connection to stats.knime.com:443
DEBUG header:70 - >> "POST /store/rest/usage/v1/01-7f905a0da8df26e5 HTTP/1.1[\r][\n]"
DEBUG HttpMethodBase:1352 - Adding Host request header
DEBUG header:70 - >> "Authorization: Basic ###############[\r][\n]"
DEBUG header:70 - >> "User-Agent: Jakarta Commons-HttpClient/3.1[\r][\n]"
DEBUG header:70 - >> "Host: stats.knime.com[\r][\n]"
DEBUG header:70 - >> "Content-Length: 793[\r][\n]"
DEBUG header:70 - >> "[\r][\n]"
DEBUG content:70 - >> "[\n]"
DEBUG content:70 - >> "{[\n]"
DEBUG content:70 - >> "    "version": "4.1.2.v202003050920",[\n]"
DEBUG content:70 - >> "    "created": "2020-03-14 09:50",[\n]"
DEBUG content:70 - >> "    "nodestats": {[\n]"
DEBUG content:70 - >> "        "nodes": [[\n]"
DEBUG content:70 - >> "            {[\n]"
DEBUG content:70 - >> "                "id": "org.knime.base.node.io.tablecreator.TableCreator2NodeFactory#Table Creator",[\n]"
DEBUG content:70 - >> "                "nrexecs": 1,[\n]"
DEBUG content:70 - >> "                "nrfails": 0,[\n]"
DEBUG content:70 - >> "                "exectime": 292,[\n]"
DEBUG content:70 - >> "                "nrcreated": 0,[\n]"
DEBUG content:70 - >> "                "successor": "n/a"[\n]"
DEBUG content:70 - >> "            }[\n]"
DEBUG content:70 - >> "        ],[\n]"
DEBUG content:70 - >> "        "metaNodes": {[\n]"
DEBUG content:70 - >> "        },[\n]"
DEBUG content:70 - >> "        "wrappedNodes": {[\n]"
DEBUG content:70 - >> "        }[\n]"
DEBUG content:70 - >> "    },[\n]"
DEBUG content:70 - >> "    "uptime": 2966,[\n]"
DEBUG content:70 - >> "    "workflowsOpened": 1,[\n]"
DEBUG content:70 - >> "    "remoteWorkflowsOpened": 0,[\n]"
DEBUG content:70 - >> "    "workflowsImported": 0,[\n]"
DEBUG content:70 - >> "    "workflowsExported": 0,[\n]"
DEBUG content:70 - >> "    "launches": 1,[\n]"
DEBUG content:70 - >> "    "lastApplicationID": "org.knime.product.KNIME_BATCH_APPLICATION",[\n]"
DEBUG content:70 - >> "    "timeSinceLastStart": 2966,[\n]"
DEBUG content:70 - >> "    "crashes": 0,[\n]"
DEBUG content:70 - >> "    "properlyShutDown": true[\n]"
DEBUG content:84 - >> "}"
DEBUG EntityEnclosingMethod:508 - Request body sent
DEBUG header:70 - << "HTTP/1.1 200 [\r][\n]"
DEBUG header:70 - << "HTTP/1.1 200 [\r][\n]"
DEBUG header:70 - << "Date: Sat, 14 Mar 2020 09:50:15 GMT[\r][\n]"
DEBUG header:70 - << "Server: Apache[\r][\n]"
DEBUG header:70 - << "Strict-Transport-Security: max-age=15768000; includeSubDomains; preload[\r][\n]"
DEBUG header:70 - << "Content-Length: 0[\r][\n]"
DEBUG header:70 - << "[\r][\n]"
DEBUG NodeTimer$GlobalNodeStats:639 - Successfully sent node usage stats to server

In the KNIME FAQ, I found a section about what is actually sent (BTW: it might be worth updating the list, as I see a few more items in the request than are listed in the FAQ section). To me it looks like sending telemetry is enabled by default and requires an opt-out.
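
To compare the payload against the FAQ list, the JSON body can be reassembled from the wire log. The following is only a minimal sketch, assuming the log lines follow the Commons HttpClient wire format shown above; the function names `extract_json_body` and `top_level_fields` are hypothetical helpers, not part of KNIME.

```python
import json
import re

# Matches outgoing body lines of the HttpClient wire log, e.g.
#   DEBUG content:70 - >> "    "uptime": 2966,[\n]"
CONTENT_LINE = re.compile(r'^DEBUG content:\d+ - >> "(.*)"$')


def extract_json_body(log_text):
    """Reassemble and parse the JSON request body from wire-log text."""
    parts = []
    for line in log_text.splitlines():
        match = CONTENT_LINE.match(line.strip())
        if match:
            # Drop the [\r] / [\n] markers the wire log prints for line ends.
            body = match.group(1).replace("[\\r]", "").replace("[\\n]", "")
            parts.append(body)
    return json.loads("\n".join(parts))


def top_level_fields(payload):
    """Return the sorted names of the top-level fields in the payload."""
    return sorted(payload)


if __name__ == "__main__":
    # Shortened excerpt of the log above, used only as sample input.
    sample = r'''
DEBUG header:70 - >> "Content-Length: 793[\r][\n]"
DEBUG content:70 - >> "{[\n]"
DEBUG content:70 - >> "    "version": "4.1.2.v202003050920",[\n]"
DEBUG content:70 - >> "    "launches": 1,[\n]"
DEBUG content:70 - >> "    "crashes": 0,[\n]"
DEBUG content:70 - >> "    "properlyShutDown": true[\n]"
DEBUG content:84 - >> "}"
'''
    print(top_level_fields(extract_json_body(sample)))
```

The resulting field names can then be checked one by one against the list in the FAQ.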

  1. How do I disable the usage tracking? Is there a way to do this via knime.ini or startup parameters (e.g., for my KNIME Server executors)?
  2. How do I globally disable this for all of my employees? Instead of updating each and every installation, might it be sufficient to block stats.knime.com in our firewall, or will this cause any harm? Do you keep a list of telemetry servers?
  3. Do you report workflow and node statistics for “private” nodes and workflows as well? Is there a way to exclude individual features, nodes and workflows from reporting?
  4. What is the ID 01-7f905a0da8df26e5 derived from? Is this some machine or installation ID that stays constant during KNIME sessions?
  5. Do you track public IP addresses on the stats server side to be able to correlate those statistics?
  6. Have you thought about making this process more transparent, e.g., asking the user when starting KNIME for the first time instead of requiring an opt-out?

I hope you are able to help me out here. Always happy for a bit of transparency when it comes to usage data.

Best regards,
Daniel

Any comments or feedback on this?

Best regards,
Daniel

Hi @danielesser -

Sorry for the delayed response here. Let me ask internally for some additional info.

I can already say one thing (well, two). Regarding your point (6): KNIME Analytics Platform does ask at first startup whether it may send anonymized usage data, and I think we are pretty transparent about what gets collected (and why: the Node Recommendation Engine relies on this data). But you are right, we have recently made a few small adjustments (for instance, also recording the number of crashes and unsuccessful node executions to get a feel for stability) which aren’t listed in our FAQ. We will update this.

Cheers, Michael

Hi @ScottF and @berthold,

Thanks for getting back to me. Looking forward to getting some more answers regarding questions (1) to (5).

Regarding (6): This is only partially true. The KNIME AP GUI asks for permission to send usage information on first start, yes. Unfortunately, this is not true for the batch execution. Downloading a vanilla KNIME and running a workflow using application org.knime.product.KNIME_BATCH_APPLICATION in a fresh workspace leads to usage data being sent without asking for permission (see log in my initial post). Here, the default seems to be “send always”.

I haven’t checked this for my KNIME Server Executors as I don’t have access to them right now, but as this is technically a headless KNIME AP, I am afraid the behavior could be similar. But I am very happy to be proven wrong!

I appreciate your effort regarding transparency and keeping the FAQ up-to-date about what is actually sent. Indeed, I realized this has already happened :+1:

Best regards,
Daniel

I just updated the FAQ about this, to cover the most recent entries as well.
As @berthold mentioned, when you start KNIME for the first time you are asked whether you want to send those files, and you can always deactivate this via the preferences. We use these files to train our Workflow Coach and use the randomized ID to filter out duplicate files.
The KNIME Server Executor does not send these files; we deactivated this completely.
In general, for the usage files and all data you submit to us, we follow our Privacy Terms.

Iris

Thanks everybody for your answers. Let me repeat and rephrase the still open questions according to what has already been answered:

  1. Is there a way to deactivate tracking via knime.ini or startup parameters? For example, the batch executor sends tracking information by default, and I would prefer to deactivate this without the hassle of preference files.

  2. How do I globally disable tracking for all installations of my employees? Might it be sufficient to block stats.knime.com in our firewall, or will this cause any harm? Do you keep a list of telemetry servers? Are there any other possibilities?

  3. Do you report workflow and node statistics for “private” nodes and workflows as well? Is there a way to exclude individual features, nodes and workflows from reporting, in case I generally agree with sending data but want to protect my intellectual property?

  4. What is the ID 01-7f905a0da8df26e5 derived from? Is this some machine or installation ID that stays constant during KNIME sessions? @Iris Randomized according to what? On installation, on every AP start? On every workflow execution? Are there any hardware/computer parameters that go into this ID?

  5. Do you track public IP addresses on the stats server side to be able to correlate those statistics? @Iris Unfortunately, the privacy terms are not really helpful here, as they explicitly cover only PII gathered on the website and forum and contain hardly any information about (a) the data sent from the software itself and (b) the purpose for which this data is gathered.

Best regards,
Daniel

I appreciate your efforts to provide some clarity and transparency regarding usage tracking in KNIME.

As stated in my previous comment, the Privacy Terms and FAQ still leave a lot of room for interpretation and the implementation does not seem to be consistent (e.g., KNIME Batch). Besides, I did not get any answers regarding the existence of technical means to deactivate tracking on a company level for a larger group of employees. If you don’t mind, please review the still open questions.

This brings another question to mind: what is KNIME’s official point of contact for these kinds of questions, as well as technical ones, especially for customers with Cloud deployments? Do you provide other entry points besides the forum that allow tracking of requests and maybe guarantee at least some SLAs?

Best regards,
Daniel

Hi Daniel,

sure, happy to help.

  1. In this case you need to deactivate it with the preferences file. Please note that you should not use the Batch Executor on the KNIME Server; there we have a dedicated scheduler that takes care of this.
  2. That is possible with the Server Managed Customization. Blocking the stats page works as well, but in that case users also cannot access the Workflow Coach.
  3. All installed nodes are reported. If you do not want to send us this, please deactivate the sending.
  4. It is a randomized installation ID, generated on the first start of a fresh installation. We use it to keep only the most recent file for each user. No hardware or computer parameters go into it; the ID is completely random.
  5. For this one (and the others, if more information is needed), may I propose a call? I will send you a message with my email address. I would like to understand what your concerns are here.
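
The scheme described in point 4 can be illustrated roughly as follows. This is a minimal sketch and not KNIME's actual code: the function names, the file used to store the ID, the handling of the `01-` prefix, and the use of a `created` field to order reports are all assumptions made for illustration.

```python
import secrets
from pathlib import Path


def load_or_create_installation_id(id_file):
    """Return the stored ID, creating a random one on first start."""
    path = Path(id_file)
    if path.exists():
        return path.read_text(encoding="utf-8").strip()
    # 16 random hex characters; no hardware or user data goes into it.
    # The "01-" prefix only mimics the look of the ID in the log (assumption).
    new_id = "01-" + secrets.token_hex(8)
    path.write_text(new_id, encoding="utf-8")
    return new_id


def keep_most_recent(reports):
    """Keep only the newest report per installation ID.

    Each report is a dict with the hypothetical keys "id" and "created"
    ("YYYY-MM-DD HH:MM", which sorts correctly as a plain string).
    """
    latest = {}
    for report in reports:
        current = latest.get(report["id"])
        if current is None or report["created"] > current["created"]:
            latest[report["id"]] = report
    return latest


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        id_file = Path(tmp) / "installation.id"  # hypothetical location
        first = load_or_create_installation_id(id_file)
        second = load_or_create_installation_id(id_file)
        print(first == second)  # the ID stays constant across starts
```

Under these assumptions the ID stays constant for one installation but carries no information about the machine it was generated on.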

If you have, for example, an AWS Server, you can find the support information here: https://docs.knime.com/2019-12/aws_marketplace_server_guide/index.html#support

In short: Server Medium and Large have access to our additional KNIME Server Support via the support@knime.com email address. We reply within 48 hours.
If you need additional support or SLAs, we can work with you on this as well.

Best, Iris

Thanks @Iris. That’s very helpful!

Let me go through the documents. The Server Managed Customization in particular sounds very promising. I will get back to you regarding your mail.

Best regards,
Daniel
