OAuth2 Authenticator losing token after a period of time

Hi Folks

We are using the OAuth2 Authenticator to access Google APIs.
However, we notice that after a period of time (approximately 1 hour?) the token is lost and we are unable to get a refresh token.

This causes our data transfers to fail midway.

To my knowledge, the token from Google doesn’t have a time expiry.

I can’t use the Palladian OAuth2 connector, as it has no blue box connector to the Google Analytics Connector.

Two things: first, can the bug be fixed at some point, please? Second, is there a way to create a blue connector?

Gavin

Hi @Gavin_Attard,

Have you checked the Console for any errors?

Best
Mike


Yes, the issue is not there.

The connector does not even make the call; it fails because it has an empty refresh_token, which ought to be coming from the OAuth2 Authenticator. For some reason, after an hour or so that value is no longer available, so the GA node fails and spits out the error without ever making the request.
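For context, this is roughly the exchange that has to happen under the hood once the short-lived access token expires: the stored refresh_token gets traded for a new access token at Google's token endpoint. A minimal sketch, assuming plain HTTP via the requests library; the client ID, secret and token values here are placeholders, not what the node actually stores.

# Minimal sketch of the standard OAuth2 refresh flow against Google's token
# endpoint -- roughly what has to happen once the ~1 hour access token expires.
# The client ID, client secret and refresh token passed in are placeholders.
import requests

TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def refresh_access_token(client_id, client_secret, refresh_token):
    """Exchange the long-lived refresh token for a fresh short-lived access token."""
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={
            "client_id": client_id,
            "client_secret": client_secret,
            "refresh_token": refresh_token,
            "grant_type": "refresh_token",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]  # typically valid for ~3600 seconds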

kr

Thanks for the additional details. Maybe I missed something in your initial post. Did you try the Google Auth node, and have you updated KNIME as well as all extensions to the most recent version?


Yep all up to date

We are not using the Google Authenticator, as the KNIME project is maxed out for our purposes.

So we use OAuth to connect to our own endpoint, due to an issue we ran into with the Google Auth tool's custom endpoint, detailed here: URGENT ISSUE | GA Authentication | Interactive | quota exceeded

That is quite interesting. I never ran into any API limitations querying GA using the Google API, even though I pretty much exploited the API, extracting raw data on a per-event basis (each marked with a Unix timestamp) with 50k results per query over a time span of years. So with 1,000 queries of 50k results each I managed to get 50,000,000 (50 million) results and didn’t even scratch the quota limit.

It rather feels, reading the post you linked, that the query might be inefficient. But I’m straying away from the topic.

Though, even if you use a custom endpoint, the data still needs to come from a Google API, doesn’t it?

I roughly recall that the Google API got unintentionally exhausted by peers at a company I once worked for. But that whole topic is so complex that, without direct access to anything, it is virtually impossible to make any recommendation. Rest assured, I do feel your pain.

That reminds me that I slapped my forehead some time ago when GA4 was announced, as I wrote an article about quota limitations. It might be that one or another:

https://scandiweb.com/blog/ga4-api-quota-apocalypse-and-how-to-survive-it/

Maybe you can spot something there that explains the struggles?

PS: Did you consider connecting your GA4 property to Google BigQuery?


This is for GA UA - Data Archiving.

Same as you, I don’t usually have quota issues; however, this time it’s 96 views * 365 days * 4 years (I know, the client insisted) * 8 queries * an average of 4.5 pages per query ≈ 5M requests.

Quota limit is 50K per day per project :smiley:
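Spelled out (the ~100-day figure is my own back-of-the-envelope extrapolation from those numbers):

# Back-of-the-envelope check of the numbers above (all inputs from this thread).
views, days, years = 96, 365, 4
queries, avg_pages = 8, 4.5

total_requests = views * days * years * queries * avg_pages
print(total_requests)                              # 5_045_760 -> the "5M requests"

daily_quota_per_project = 50_000
print(total_requests / daily_quota_per_project)    # ~101 project-days at the default quota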

Gonna try another tack on the other thread, if some kind soul from KNIME can let me know what localhost URL they use :smiley:

I believe this is too application-specific, as the limitation is presumably imposed by the API, not KNIME. Since you mentioned UA, I recall having read an article about backing up the data recently, too.

Did you, by any chance, read one or both of the following support articles?

About your math resulting in 5 million queries: 50k queries * 50k results (max) per query = 2,500,000,000 results / day

That should work out. I suppose only specific data is required, like sales, but not everything, correct?

On another note, you said you are using OAuth. When I try using interactive with the Google Auth node, I see:

That reminded me of the issue I mentioned earlier, where someone else consumed all the credits. Having your own project in the Google Dev Console with your own credentials would help. However, I might have gotten something wrong in our exchange as, like I mentioned, it’s pretty complex without seeing anything.

PS: I believe I was able to reproduce the issue and recall the cause. Since the OAuth method bundles all app-specific requests (difficult to describe) amongst everyone using that method in KNIME, they all count towards that one project.

Several years ago I faced that very same issue but cannot locate the post from back then. Though my memory is quite vivid, so I have high confidence that this is the cause.

To resolve it you really need to create your own project, credentials, etc., which is … challenging.

ERROR Google Analytics Query (legacy) 4:1406     Execute failed: 403 Forbidden
GET https://analytics.googleapis.com/analytics/v3/management/segments
{
  "code": 403,
  "details": [
    {
      "@type": "type.googleapis.com/google.rpc.ErrorInfo",
      "reason": "RATE_LIMIT_EXCEEDED",
      "domain": "googleapis.com",
      "metadata": {
        "quota_limit_value": "50000",
        "quota_metric": "analytics.googleapis.com/default",
        "service": "analytics.googleapis.com",
        "quota_location": "global",
        "quota_limit": "defaultPerDayPerProject",
        "consumer": "projects/868899538500"
      }
    },
    {
      "@type": "type.googleapis.com/google.rpc.Help",
      "links": [
        {
          "description": "Request a higher quota limit.",
          "url": "https://cloud.google.com/docs/quota#requesting_higher_quota"
        }
      ]
    }
  ],
  "errors": [
    {
      "domain": "usageLimits",
      "message": "Quota exceeded for quota metric 'Queries' and limit 'Queries per day' of service 'analytics.googleapis.com' for consumer 'project_number:868899538500'.",
      "reason": "rateLimitExceeded"
    }
  ],
  "message": "Quota exceeded for quota metric 'Queries' and limit 'Queries per day' of service 'analytics.googleapis.com' for consumer 'project_number:868899538500'.",
  "status": "PERMISSION_DENIED"
}

Adding to this, I noticed the new Google Analytics Connector node might only support GA4, so try using the legacy node.

Here is a brief guideline on what to do (I just did it myself again, to a greater extent); a minimal code sketch of the resulting API call follows the list:

  1. Create a project in the Google Cloud Console and enable the “Google Analytics API”
  2. Navigate to “Credentials”
  3. Create a service account and copy its email (for UA), set the permissions and create a p12 key
  4. Add the service account’s email to all the Google Universal Analytics profiles in question
  5. Configure the Authenticator and Connector nodes.
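Once the service account’s email has been added to the profiles, the same credentials also work outside of KNIME. A minimal sketch against the Core Reporting API v3, assuming a JSON key file and the google-api-python-client / google-auth libraries; the key file name, view ID and metrics are placeholders:

# Minimal sketch: query Universal Analytics (Core Reporting API v3) with the
# service-account credentials created above. The key file name, view ID and
# metrics below are placeholders.
from google.oauth2 import service_account
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/analytics.readonly"]

credentials = service_account.Credentials.from_service_account_file(
    "service-account-key.json", scopes=SCOPES
)
analytics = build("analytics", "v3", credentials=credentials)

response = analytics.data().ga().get(
    ids="ga:123456789",              # UA view (profile) ID
    start_date="2022-01-01",
    end_date="2022-01-01",           # single-day query helps avoid sampling
    metrics="ga:sessions,ga:pageviews",
    dimensions="ga:date",
    max_results=10000,
).execute()

for row in response.get("rows", []):
    print(row)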

Dumb question about the timeframe your client desires: chances are that data got deleted based on the chosen data retention setting.


Ya, setting up our own credentials was not the issue; that was the easy bit.

We have a few of them to help us get the data we need in good time (15 credentials, used 5 at a time with 2 parallel query nodes, gives us a nice run rate with no errors, and we are able to download a year’s data at a time relatively reasonably). However, one query is generating a lot of data, which is taking longer than the OAuth refresh token timeout. We are archiving top-level metrics at a daily level (non-ranged, so 1 query per day for each master query, which avoids sampling).
Also events and pageviews with several other metrics, so it’s quite a data haul.

The issue is that when you set up a credential you need to specify a redirect_url; this way you can do the auth just like the Interactive method on Google Auth.

I wanted to use our credentials with the Google Auth tool, but we don’t know what redirect_url was set in the KNIME Google Cloud project (they hardcoded this value into the tool, you see).

If I use the OAuth2 connector, it all works, as I can set the redirect_url to whatever I configured in Google Cloud; however, the issue is the timing out or loss of the refresh token, which doesn’t happen with the Google Auth, presumably because they factored in that it doesn’t expire.

My temporary solution is to halve the query date range to half years rather than yearly, which is a faff…

In case you’re curious :smiley:

Sometimes when you are having these conversations in the forum you end up giving yourself clues as to the solution.

In this case I realised the redirect_url is specified in the auth URL call from the node.
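In case it helps anyone else: the consent URL opened for the interactive flow carries the redirect_uri as a plain query parameter, roughly like this (client ID, port, callback path and scope are placeholders, not the values KNIME actually uses):

# Illustration of why the redirect_url shows up in the node's auth call: the
# consent URL for the interactive OAuth flow carries redirect_uri as a plain
# query parameter. Client ID, port, path and scope below are placeholders.
from urllib.parse import urlencode

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"

params = {
    "client_id": "1234567890-example.apps.googleusercontent.com",
    "redirect_uri": "http://localhost:8080/callback",   # must match the Cloud project's credential
    "response_type": "code",
    "scope": "https://www.googleapis.com/auth/analytics.readonly",
    "access_type": "offline",   # request a refresh token as well
    "prompt": "consent",
}

print(f"{AUTH_ENDPOINT}?{urlencode(params)}")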

So I got it now and can align my projects :smiley: happy days


Glad to have been of any help. Just to add my two cents about how I approached peak query performance (with redundancy in case of failure); a rough sketch of the daily loop follows the list:

  • Loop per day for a max. of 10k results
  • Use Try-Catch for failover in conjunction with a variable loop condition end
  • Append the current query results to a temporary CSV
  • Parallelize for each profile ID, query and day (in chunks)
  • It is also possible to create a “pre-query” to determine the number of results and further increase parallelism using the start index
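A rough sketch of that daily loop with start-index paging and CSV appending (view ID, metrics, dimensions and file handling are placeholders; analytics is assumed to be a googleapiclient “analytics”, “v3” service object as in the earlier sketch):

# Rough sketch of the daily extraction loop described above: one single-day
# query per view, paged via start_index, results appended to a CSV so a failed
# run can resume. View ID, metrics, dimensions and file name are placeholders.
import csv
from datetime import timedelta

MAX_RESULTS = 10_000  # per-request cap of the Core Reporting API v3

def daterange(start, end):
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)

def archive_view(analytics, view_id, start, end, out_csv):
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        for day in daterange(start, end):
            day_str = day.isoformat()
            start_index = 1
            while True:  # page through that day's results
                resp = analytics.data().ga().get(
                    ids=f"ga:{view_id}",
                    start_date=day_str,
                    end_date=day_str,            # single day -> avoids sampling
                    metrics="ga:totalEvents",
                    dimensions="ga:date,ga:eventCategory,ga:eventAction",
                    max_results=MAX_RESULTS,
                    start_index=start_index,
                ).execute()
                rows = resp.get("rows", [])
                writer.writerows(rows)
                if len(rows) < MAX_RESULTS:
                    break                        # last page for this day
                start_index += MAX_RESULTS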

Since I pass a hit timestamp to all events / hits, I use that to ensure sampling isn’t triggered when extracting data.

Here is my approach (the majority of it). I haven’t had the chance to update the nodes, which I usually do xD


PS: My former calculation was wrong. It wasn’t 50k * 50k but 50k * 10k … still 500,000,000 results, which is plenty. It also seems to be possible to apply for a higher quota.

About your workflow: personally, I’d use the API credentials (as JSON or p12), as you can manage them programmatically, which drastically streamlines your workflow. The Parallel Chunk nodes I’d use with caution, as on many occasions they caused instability (also within KNIME).

In regards to the redirect URL: if you unauthorize KNIME and authorize it again, open the browser console and enable persistent logging. That way you should be able to get the redirect URL.

Cheers


Haha, nice, yes, seems like we hit on similar approaches. The 120Feet GA connector does that.

Looks like the GA Authenticator keeps asking for a different redirect. I’ll have to settle for half-year chunks.
