Tracking GTM.js in GitHub - This is what happens

March 6, 2019

See the pros and cons of trying to monitor your live GTM snippet using GitHub.

Maybe you have already wondered: wouldn’t it be nice to track the changes of any GTM container out there over time? For example, to spy on your competitors and their tracking endeavors?

If you have ever taken a closer look at the JavaScript file that is loaded when you implement Google Tag Manager, you can anticipate that this is possible in principle but comes with certain disadvantages in terms of readability.

Nonetheless, this article will walk you through

  1. The Basics

  2. The Code: a GitHub repo incl. a Node script that crawls and processes GTM .js files, ready to run as a Google Cloud Function

  3. The Example: screenshots to understand the pros and cons

Here we go.

The Basics

A Google Tag Manager implementation works by having the client load the designated library file, including the respective container’s specifics. For example, the German grocery store real.de currently loads gtm.js?id=GTM-THPRGJ8.

This file is basically split into two parts. It starts with a data object which contains all the container’s tags, variables, and triggers. What follows then is the minified GTM library.

At this point we can already conclude: for our change tracking, the library code is not interesting. Only the data object with the container details needs to be monitored.

The data object looks like this:

    var data = {
      "resource": {
        "version": "XXX",
        "macros": [{
          ...
        }],
        "tags": [{
          ...
        }],
        "predicates": [{
          ...
        }],
        "rules": [{
          ...
        }]
      },
      "runtime": [
        [], []
      ]
    }

That is, the data object holds the currently published version number and four relevant arrays. The macros array contains all variables, the tags array contains (you guessed it) the tags, and predicates plus rules together reflect the trigger logic.
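To monitor only the data object, it first has to be separated from the library code that follows it. The snippet below is a minimal sketch of that step, not necessarily how the repo does it. It assumes that the object literal after `var data` is valid JSON, and the function name `extractDataObject` as well as the sample values are made up for illustration.

```javascript
// Sketch: pull the `var data = {...}` object out of a gtm.js source string.
// Assumption: the object literal is valid JSON, as the excerpt above suggests.
function extractDataObject(source) {
  const marker = source.indexOf('var data');
  if (marker === -1) throw new Error('no "var data" found');
  const start = source.indexOf('{', marker);
  if (start === -1) throw new Error('no object literal after "var data"');

  let depth = 0;
  let inString = false;
  for (let i = start; i < source.length; i++) {
    const ch = source[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      // The brace that brings us back to depth 0 closes the data object.
      if (depth === 0) return JSON.parse(source.slice(start, i + 1));
    }
  }
  throw new Error('unbalanced braces in data object');
}

// Tiny stand-in for a real gtm.js file: data object first, library code after.
const sample = `
var data = {
"resource": {
  "version":"380",
  "macros":[{"function":"__k","vtp_name":"_gaexp"}],
  "tags":[],
  "predicates":[{"function":"_eq","arg0":["macro",0],"arg1":"a}b"}],
  "rules":[]
},
"runtime":[[],[]]
};
(function(){ /* minified library would follow here */ })();
`;

const data = extractDataObject(sample);
console.log(data.resource.version, data.resource.macros.length); // prints: 380 1
```

Counting braces while skipping string contents avoids cutting the object short when a value happens to contain a curly brace. If a real file turns out not to be strict JSON, the `JSON.parse` call is the part that would need to be replaced.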

What is important to note here, since it bloats our GitHub tracking, is that GTM simply uses the sequential array index to reference elements across the different arrays. For instance, in a predicate the input variable might be referenced as ["macro",76], where 76 is the respective index in the macros array.

It follows that whenever a macro is added or removed, every reference to a macro positioned after it in the array shifts between two commits in our tracking repo. We will see what this means in the screenshot section of this article.
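To make the effect concrete, here is a toy model. It is not GTM’s real compiler: the `compile` function and the variable names are hypothetical and only mimic the index-based referencing described above.

```javascript
// Toy model, not GTM's real compiler: variables are identified by their array
// index, and predicates point at them through ["macro", index].
function compile(variableNames, conditions) {
  const macros = variableNames.map((name) => ({ vtp_name: name }));
  const predicates = conditions.map(({ variable, equals }) => ({
    arg0: ['macro', variableNames.indexOf(variable)],
    arg1: equals,
  }));
  return { macros, predicates };
}

const conditions = [
  { variable: 'pageType', equals: 'checkout' },
  { variable: 'userStatus', equals: 'loggedIn' },
];

const before = compile(['pageType', 'userStatus'], conditions);
// A new variable lands in the middle of the list; the conditions are untouched.
const after = compile(['pageType', '_gaexp', 'userStatus'], conditions);

console.log(JSON.stringify(before.predicates[1].arg0)); // ["macro",1]
console.log(JSON.stringify(after.predicates[1].arg0));  // ["macro",2]
```

The second condition did not change at all, yet its compiled reference did. In a diff, this shows up as a modified line.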

The Code

The code can be found in the repo’s master branch. There’s also a tracking branch which contains the examples I set up.

To get your own container monitoring running, follow one of the two setups below.

If you want to run it locally:

  • Clone the repo and push it to your own GitHub account

  • npm install

  • Modify config.json: add your targets and GitHub details

  • Set GITHUB_AUTH env variable to your GitHub personal access token

  • node index.js

If you want to run it in the cloud:

  • Clone the repo and make it available in your GitHub account

  • Link your GitHub repo to Google Cloud Source

  • Modify config.json: add your targets and GitHub details

  • Create a Google Cloud Function based on the Google Cloud Source repo with Pub/Sub as the trigger, and add your GitHub personal access token as the env variable GITHUB_AUTH

  • Set up Google Cloud Scheduler to run the function as often as you want via Pub/Sub

This is how you set up a Google Cloud Function linked to a Google Cloud Source repo and use “launch” as the topic for the Pub/Sub service.
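On the code side, a Pub/Sub-triggered function is a handler that receives the message and a context object. The following is a rough sketch of such an entry point, not the repo’s actual index.js. The config shape, the names `makeHandler`, `trackGtm` and `processTarget`, and the stubbed crawl step are all assumptions made for illustration.

```javascript
// Hypothetical config shape; the real config.json in the repo may look different.
const config = {
  targets: [{ name: 'real.de', gtmId: 'GTM-THPRGJ8' }],
};

// Public URL under which a container's gtm.js file is served.
function gtmUrl(gtmId) {
  return `https://www.googletagmanager.com/gtm.js?id=${encodeURIComponent(gtmId)}`;
}

// Builds a Pub/Sub background function. `processTarget` is a placeholder for
// the actual crawl-and-commit step; it is injected so it can be stubbed.
function makeHandler(targets, processTarget) {
  return async function trackGtm(message, context) {
    // Pub/Sub delivers the payload base64-encoded in message.data (may be empty).
    const payload = message && message.data
      ? Buffer.from(message.data, 'base64').toString()
      : '';
    const results = [];
    for (const target of targets) {
      const enriched = { ...target, url: gtmUrl(target.gtmId) };
      results.push(await processTarget(enriched, payload));
    }
    return results;
  };
}

// Stubbed crawl step that only returns the URL it would fetch.
const trackGtm = makeHandler(config.targets, async (target) => target.url);
// In the function's index.js you would export it: exports.trackGtm = trackGtm;
```

Targets are processed one after another to keep the sketch simple. A real crawl step would fetch the URL, extract the data object and commit it using the token from the GITHUB_AUTH env variable.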

The Example

Having finalized the code, I immediately added some demo websites to the config.json file and started to crawl. Luckily, after one day I could already see some changes in the container of the German grocery chain real.de.

I will use the screenshots from this example to explore exactly what you will see when you monitor GTM.js changes.

The obvious: Different version numbers

As you can see below, my function didn’t run often enough, so I missed version 379. The latest commit shows the difference between version 378 and version 380 of the container.

I picked this example because it shows how difficult the data object is to read. As you can see, the format in which Google stores the container details differs from, for example, what you get when you export a container from your account.

With the help of the Tag Manager documentation and some reasoning I can work out that this is a new 1st party cookie variable with the cookie name _gaexp. Further, looking up _gaexp on Google I learn that this is most likely an indicator of real.de doing some Optimize 360 stuff.

The bloat

I already mentioned it above. Because real.de added a new macro (the _gaexp cookie variable) and this new macro ended up in the middle of the macros array, all subsequent array elements now have different indexes. That’s why our commit is full of reference changes which, however, don’t have any particular meaning for us.
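One possible way to reduce this noise, which is not part of the repo’s code, would be to rewrite the references before committing. The sketch below replaces every ["macro", index] reference with a short hash of the referenced macro’s content. The function name `stabilizeMacroRefs` and the two sample containers are hypothetical.

```javascript
const crypto = require('crypto');

// Sketch of one possible mitigation; this is not part of the repo's code.
// Every ["macro", index] reference is rewritten to ["macro", "<content hash>"],
// so a reference only changes in the diff when the macro it points to changes.
function stabilizeMacroRefs(resource) {
  const labels = new Map();

  // Hash of a macro's content, with its own nested references rewritten first.
  function labelFor(index) {
    if (!labels.has(index)) {
      const content = JSON.stringify(rewrite(resource.macros[index] ?? null));
      const hash = crypto.createHash('sha1').update(content).digest('hex');
      labels.set(index, hash.slice(0, 8));
    }
    return labels.get(index);
  }

  function isMacroRef(value) {
    return Array.isArray(value) && value.length === 2 &&
      value[0] === 'macro' && Number.isInteger(value[1]);
  }

  // Recursively copies the structure, swapping index references for labels.
  function rewrite(value) {
    if (isMacroRef(value)) return ['macro', labelFor(value[1])];
    if (Array.isArray(value)) return value.map(rewrite);
    if (value && typeof value === 'object') {
      return Object.fromEntries(
        Object.entries(value).map(([key, inner]) => [key, rewrite(inner)])
      );
    }
    return value;
  }

  return rewrite(resource);
}

// Two made-up versions: v2 has a new macro inserted in the middle.
const v1 = {
  macros: [{ vtp_name: 'pageType' }, { vtp_name: 'userStatus' }],
  predicates: [{ arg0: ['macro', 1], arg1: 'loggedIn' }],
};
const v2 = {
  macros: [{ vtp_name: 'pageType' }, { vtp_name: '_gaexp' }, { vtp_name: 'userStatus' }],
  predicates: [{ arg0: ['macro', 2], arg1: 'loggedIn' }],
};

const same = JSON.stringify(stabilizeMacroRefs(v1).predicates) ===
  JSON.stringify(stabilizeMacroRefs(v2).predicates);
console.log(same); // true
```

The trade-off is that the committed file no longer matches what Google serves, and two macros with identical content would share a label. Whether that is acceptable depends on what you want to read from the diffs.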

The funny: Watch people correct their typos

It happens to all of us :)

If you’re a practitioner, I hope this article gave you some fun. If you find it actually useful, you can for example add GitHub Actions or extend the code to get (Slack) notifications whenever container modifications are found.
