March 6, 2019
Tracking GTM.js in GitHub - This is what happens
See the pros and cons of trying to monitor your live GTM snippet using GitHub.
Maybe you have already wondered: wouldn’t it be nice to track changes to any GTM container out there over time? For example, to spy on your competitors and their tracking endeavors?
If you have already taken a closer look at the JavaScript file that is loaded when you implement Google Tag Manager, you can anticipate that while this is possible, it comes with certain disadvantages in terms of readability.
Nonetheless, this article will provide you with:
The Basics
The Code: a GitHub repo incl. a Node script that crawls and processes GTM .js files, ready to run as a Google Cloud Function
The Example: screenshots to understand the pros and cons
Here we go.
The Basics
A Google Tag Manager implementation works by having the client load the designated library file, which includes the respective container’s specifics. For example, the German grocery chain real.de currently loads gtm.js?id=GTM-THPRGJ8.
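If you want to look at such a file yourself, a minimal download sketch could look like the following. It assumes Node.js 18+ (for the global fetch) and the public https://www.googletagmanager.com/gtm.js endpoint. The function names gtmUrl and fetchGtmJs are hypothetical and not taken from the repo.

```javascript
// Sketch: download the gtm.js file for a given container ID.
// Assumes Node.js 18+ (global fetch). Helper names are hypothetical.
const GTM_BASE_URL = "https://www.googletagmanager.com/gtm.js";

// Build the gtm.js URL for a container ID such as "GTM-THPRGJ8".
function gtmUrl(containerId) {
  if (!/^GTM-[A-Z0-9]+$/.test(containerId)) {
    throw new Error(`Not a GTM container ID: ${containerId}`);
  }
  const url = new URL(GTM_BASE_URL);
  url.searchParams.set("id", containerId);
  return url.toString();
}

// Fetch the raw JavaScript source of the container as one string.
async function fetchGtmJs(containerId) {
  const res = await fetch(gtmUrl(containerId));
  if (!res.ok) {
    throw new Error(`HTTP ${res.status} for ${containerId}`);
  }
  return res.text();
}
```

For example, fetchGtmJs("GTM-THPRGJ8").then((src) => console.log(src.slice(0, 200))) prints the beginning of the file.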
This file is basically split into two parts. It starts with a data object that contains all of the container’s tags, variables, and triggers. What follows is the minified GTM library.
At this point we can already conclude that the library code is not interesting for our change tracking. Only the data object with the container details needs to be monitored.
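Since only the data object matters, the first processing step is to cut it out of the file. The sketch below shows one way to do that, not necessarily the repo's way. It looks for the "var data = " marker and scans for the matching closing brace, skipping over string literals so that braces inside strings are ignored. It assumes the object is JSON-compatible, so it can be parsed afterwards. extractDataObject is a hypothetical name.

```javascript
// Sketch: cut the `var data = {...}` object out of a gtm.js source string.
// Assumption: the object literal is JSON-compatible (quoted keys, no functions);
// if JSON.parse rejects it, you can still diff the extracted raw text.
function extractDataObject(gtmSource) {
  const marker = "var data = ";
  const start = gtmSource.indexOf(marker);
  if (start === -1) throw new Error("data object not found");
  const open = gtmSource.indexOf("{", start + marker.length);
  if (open === -1) throw new Error("data object has no opening brace");

  let depth = 0;
  let quote = null; // quote character of the string we are inside, if any
  for (let i = open; i < gtmSource.length; i++) {
    const ch = gtmSource[i];
    if (quote) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === quote) quote = null; // string ends
    } else if (ch === '"' || ch === "'") {
      quote = ch; // string starts
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      depth--;
      if (depth === 0) return gtmSource.slice(open, i + 1);
    }
  }
  throw new Error("unbalanced braces in data object");
}
```

JSON.parse(extractDataObject(src)) then gives you an object whose resource property holds the arrays described below.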
The data object looks like this:
var data = {
  "resource": {
    "version": "XXX",
    "macros": [{
      ...
    }],
    "tags": [{
      ...
    }],
    "predicates": [{
      ...
    }],
    "rules": [{
      ...
    }]
  },
  "runtime": [
    [], []
  ]
}
That is, the data object holds the currently published version number and four relevant arrays. The macros array contains all variables, the tags array contains, yes, the tags, and predicates plus rules reflect the trigger logic.
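Once the object is parsed, a compact summary of the version number and the size of the four arrays could serve, for example, as a commit message. Here is a minimal sketch. summarizeContainer is a hypothetical helper, and its comments only restate the mapping described above.

```javascript
// Sketch: summarize a parsed GTM data object.
// Assumes `data` has the shape shown above, with a `resource` property.
function summarizeContainer(data) {
  const r = data.resource ?? {};
  // Length of an array property, or 0 if it is missing.
  const count = (key) => (Array.isArray(r[key]) ? r[key].length : 0);
  return {
    version: r.version,
    macros: count("macros"), // variables
    tags: count("tags"),
    predicates: count("predicates"), // trigger logic, part 1
    rules: count("rules"), // trigger logic, part 2
  };
}
```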
What is important to note here, since it bloats our GitHub tracking, is that GTM simply uses the sequential array index to cross-reference the different elements. For instance, a predicate might reference its input variable as ["macro",76], where 76 is the index of that variable in the macros array.
It follows that whenever a macro is added or removed, the references to every macro positioned after it in the array shift between two commits in our tracking repo. We will see in the screenshot section of this article what I mean.
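To make the effect concrete: a reference like ["macro",76] only means "whatever sits at index 76". One possible mitigation, which is not part of the repo, is to replace each such reference with a short hash of the referenced macro's content before committing. Inserting a macro elsewhere then leaves unrelated references untouched. The sketch below assumes references always have the exact form ["macro", index]. resolveMacroRefs and macroHash are hypothetical names.

```javascript
const crypto = require("crypto");

// Short, content-based identifier for a macro object.
// Assumes identical macros serialize with the same key order.
function macroHash(macro) {
  return crypto.createHash("sha1").update(JSON.stringify(macro)).digest("hex").slice(0, 8);
}

// Recursively replace ["macro", n] references with "macro#<hash>".
// Macros that themselves reference other macros are hashed as-is,
// i.e. their own inner references are not resolved.
function resolveMacroRefs(value, macros) {
  if (Array.isArray(value)) {
    if (value.length === 2 && value[0] === "macro" && Number.isInteger(value[1])) {
      const target = macros[value[1]];
      return target === undefined ? value : `macro#${macroHash(target)}`;
    }
    return value.map((v) => resolveMacroRefs(v, macros));
  }
  if (value && typeof value === "object") {
    const out = {};
    for (const [k, v] of Object.entries(value)) out[k] = resolveMacroRefs(v, macros);
    return out;
  }
  return value;
}
```

You would apply this to the tags and predicates arrays, passing resource.macros as the lookup table, before writing the snapshot. A macro that references other macros by index still changes its hash when those indices shift, so this reduces the noise rather than removing it.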
The Code
The code can be found in the repo’s master branch. There’s also a tracking branch which contains the examples I set up.
To get your own container monitoring running, follow these steps:
If run locally:
Clone the repo and push it to your own GitHub account
npm install
Modify config.json > Add your targets and GitHub details
Set the GITHUB_AUTH env variable to your GitHub personal access token
node index.js
If run in the cloud:
Clone the repo and make it available in your GitHub account
Link your GitHub repo to Google Cloud Source
Modify config.json > Add your targets and GitHub details
Create a Google Cloud Function based on the Google Cloud Source repo with Pub/Sub as trigger, and add your GitHub personal access token as the env variable GITHUB_AUTH
Set up Google Cloud Scheduler to run the function via Pub/Sub as often as you want
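In both setups, every crawl ends with a commit to your GitHub repo. As an illustration, here is a sketch that commits through GitHub's REST endpoint for creating or updating file contents (PUT /repos/{owner}/{repo}/contents/{path}), using the token from GITHUB_AUTH. It shows how such a commit could be made and does not describe the repo's code. The owner, repo, and path values are placeholders, and buildCommitRequest and commitSnapshot are hypothetical names.

```javascript
// Sketch: build a GitHub "create or update file contents" request.
// `sha` is the blob SHA of the existing file; GitHub requires it when
// updating rather than creating the file.
function buildCommitRequest({ owner, repo, path, content, message, sha, branch, token }) {
  const encodedPath = path.split("/").map(encodeURIComponent).join("/");
  const body = {
    message,
    content: Buffer.from(content, "utf8").toString("base64"), // API expects base64
  };
  if (sha) body.sha = sha;
  if (branch) body.branch = branch;
  return {
    url: `https://api.github.com/repos/${owner}/${repo}/contents/${encodedPath}`,
    init: {
      method: "PUT",
      headers: {
        Accept: "application/vnd.github+json",
        Authorization: `Bearer ${token}`,
        "User-Agent": "gtm-tracker-sketch",
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}

// Send the request (Node 18+), reading the token from GITHUB_AUTH.
async function commitSnapshot(opts) {
  const token = process.env.GITHUB_AUTH;
  if (!token) throw new Error("GITHUB_AUTH is not set");
  const { url, init } = buildCommitRequest({ ...opts, token });
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`GitHub API returned ${res.status}`);
  return res.json();
}
```

To get the sha of an existing snapshot file, send a GET request to the same contents URL before updating it.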
The Example
Having finalized the code, I immediately added some demo websites to the config.json file and started crawling. Luckily, after just one day I could already see some changes in the container of the German grocery chain real.de.
I will use the screenshots from this example to explore exactly what you will see when you monitor GTM.js changes.
The obvious: Different version numbers
As you can see below, my function didn’t run often enough, so I missed version 379. The latest commit shows the difference between version 378 and version 380 of the container.
The actually interesting: Here, a new 1st party cookie variable
I chose this example because it shows how hard the data object is to read. As you can see, the format in which Google stores the container details differs from, for example, what you get when you export a container from your account.
With the help of the Tag Manager documentation and some reasoning, I can tell that this is a new 1st party cookie variable with the cookie name _gaexp. Further, searching Google for _gaexp tells me that this is most likely an indicator that real.de is running Optimize 360 experiments.
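Spotting such variables can be automated. The sketch below lists 1st party cookie variables and reports the ones that are new between two snapshots. It assumes that in gtm.js these macros carry "function": "__k" and store the cookie name in "vtp_name". That is an assumption you should verify against your own container. cookieVariables and newCookieVariables are hypothetical names.

```javascript
// Sketch: list cookie names read by 1st party cookie variables.
// Assumption: such macros have "function": "__k" and the cookie name in "vtp_name".
function cookieVariables(data) {
  const macros = data?.resource?.macros ?? [];
  return macros
    .map((macro, index) => ({ index, macro }))
    .filter(({ macro }) => macro.function === "__k")
    .map(({ index, macro }) => ({ index, cookie: macro.vtp_name }));
}

// Cookie variables present in the new snapshot but not in the old one.
function newCookieVariables(oldData, newData) {
  const before = new Set(cookieVariables(oldData).map((c) => c.cookie));
  return cookieVariables(newData).filter((c) => !before.has(c.cookie));
}
```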
The bloat
I already mentioned it above. Because real.de added a new macro (the _gaexp cookie variable) and this new macro ended up in the middle of the macros array, all subsequent array elements now have different indices. That’s why our commit is full of reference changes, which, however, carry no particular meaning for us.
The funny: Watch people correct their typos
It happens to all of us :)
If you’re a practitioner, I hope you had some fun with this article. If you actually find it useful, you could, for example, add GitHub Actions or extend the code to get (Slack) notifications whenever container modifications are found.
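For the Slack variant, a minimal sketch could post to a Slack incoming webhook, which accepts a JSON body with a text field. SLACK_WEBHOOK_URL is a hypothetical env variable, and the helpers buildSlackMessage and notifySlack are not part of the repo. Node.js 18+ is assumed for fetch.

```javascript
// Sketch: build a Slack message describing a container change.
// commitUrl is optional; <url|label> is Slack's link syntax.
function buildSlackMessage(containerId, oldVersion, newVersion, commitUrl) {
  let text = `GTM container ${containerId} changed: version ${oldVersion} → ${newVersion}`;
  if (commitUrl) text += `\n<${commitUrl}|View diff on GitHub>`;
  return { text };
}

// Post the message to an incoming webhook; returns false if no URL is configured.
async function notifySlack(message, webhookUrl = process.env.SLACK_WEBHOOK_URL) {
  if (!webhookUrl) return false; // notifications disabled
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(message),
  });
  return res.ok;
}
```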