How to transmit OpenLiteSpeed log files to Matomo?

#1
I want to take the access log files from OpenLiteSpeed and feed them to Matomo for tracking purposes. This way I do not depend on tracking scripts on the page, which can be blocked by security-aware browsers. I am trying to decide on the best approach.

With my previous provider I got a new access log every day and read it with a PHP script triggered by a cron job. In OpenLiteSpeed, however, the logs are not rotated every day. And it would actually be even better to transfer this data in real time. What is the best way to do that?
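If you stay with the cron approach, one way to cope with a log that is not rotated daily is to remember the byte offset reached on the previous run and read only what was appended since. A minimal Node.js sketch, assuming the log is a plain append-only file; `readNewLines` and the `state` object are hypothetical names, and persisting `state.offset` between runs (for example in a small JSON file) is left to the caller:

```javascript
const fs = require("fs");

// Hypothetical helper: return the complete lines appended to `path` since
// the byte offset stored in `state.offset`, and advance that offset.
function readNewLines(path, state) {
  const { size } = fs.statSync(path);
  // A smaller file than last time suggests rotation/truncation: start over.
  if (size < state.offset) state.offset = 0;
  if (size === state.offset) return [];

  const fd = fs.openSync(path, "r");
  try {
    const buf = Buffer.alloc(size - state.offset);
    const n = fs.readSync(fd, buf, 0, buf.length, state.offset);
    if (n === 0) return [];
    // Only consume up to the last newline; a half-written final line
    // stays in the file and is picked up on the next run.
    const lastNewline = buf.lastIndexOf(0x0a, n - 1);
    if (lastNewline === -1) return [];
    state.offset += lastNewline + 1;
    return buf
      .toString("utf8", 0, lastNewline)
      .split("\n")
      .filter((line) => line.length > 0);
  } finally {
    fs.closeSync(fd);
  }
}
```

If the file shrinks (after a rotation or truncation), the sketch starts again from the beginning, so lines written to the old file between the last run and the rotation would be missed.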

I have seen that there is an option in the configuration to use a piped logger, but I can't find any documentation for it, let alone a tutorial on how to set it up. I also got a suggestion from ChatGPT to create a FIFO log file and use inotifywait to trigger my PHP script for every new log entry. This actually looks like a fairly simple way to get real-time access data transferred into Matomo.
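Both ideas end up feeding log lines to a long-running process. As I understand the OpenLiteSpeed docs, a piped logger writes each access log line to the external program's standard input; a FIFO opened for reading also simply blocks until the next line arrives, so a line-oriented reader may not need inotifywait at all. A minimal Node.js sketch, assuming the access log uses the Apache "combined" format (check your log format setting); `parseCombinedLogLine` and `consumeLog` are hypothetical names:

```javascript
const readline = require("readline");

// Apache "combined" format (assumed):
// %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
const COMBINED =
  /^(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+)(?: [^"]*)?" (\d{3}) (\S+) "((?:[^"\\]|\\.)*)" "((?:[^"\\]|\\.)*)"/;

// Turn one log line into an object, or null if it does not match.
function parseCombinedLogLine(line) {
  const m = COMBINED.exec(line);
  if (!m) return null;
  return {
    ip: m[1],
    user: m[2] === "-" ? null : m[2],
    time: m[3], // e.g. "10/Oct/2023:13:55:36 +0200"
    method: m[4],
    path: m[5],
    status: Number(m[6]),
    bytes: m[7] === "-" ? 0 : Number(m[7]),
    referrer: m[8] === "-" ? null : m[8],
    userAgent: m[9],
  };
}

// Read lines from a stream (stdin for a piped logger, or a FIFO) and
// pass every successfully parsed entry to `onEntry`.
function consumeLog(input, onEntry) {
  const rl = readline.createInterface({ input, crlfDelay: Infinity });
  rl.on("line", (line) => {
    const entry = parseCombinedLogLine(line);
    if (entry) onEntry(entry);
  });
  return rl;
}
```

As the piped logger program you would call something like `consumeLog(process.stdin, (entry) => console.log(entry.path));`. For a FIFO, passing `fs.createReadStream("/path/to/fifo")` (placeholder path) instead of `process.stdin` should work the same way.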

Does anybody have feedback on what would be the best approach?
 

LiteCache

Active Member
#2
Does anybody have feedback on what would be the best approach?
There are two alternatives that I use with Matomo myself. First, both are better than using the access_log, and second, neither can be blocked by the usual browser plugins.

1. Matomo PHP Tracker Client
Easy to integrate into any application.
https://github.com/matomo-org/matomo-php-tracker/
https://developer.matomo.org/api-reference/PHP-Matomo-Tracker
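The PHP tracker is a wrapper around Matomo's HTTP Tracking API, and that same API is what a log-reading script would call for each page view. A rough JavaScript sketch of one such request, assuming a log entry object with `ip`, `time` (combined-log timestamp), `path`, `referrer` and `userAgent` fields; `buildTrackingUrl` and `logTimeToUnix` are hypothetical helpers. Overriding the visitor IP (`cip`) and setting a `cdt` older than 24 hours both require a `token_auth`:

```javascript
const MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
                 Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// "10/Oct/2023:13:55:36 +0200" -> UNIX timestamp (UTC), as `cdt` expects.
function logTimeToUnix(t) {
  const m = /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(t);
  if (!m || !(m[2] in MONTHS)) throw new Error(`Unrecognised log time: ${t}`);
  // Treat the wall-clock time as UTC, then subtract the zone offset.
  const wall = Date.UTC(+m[3], MONTHS[m[2]], +m[1], +m[4], +m[5], +m[6]);
  const offsetMin = (m[7] === "-" ? -1 : 1) * (Number(m[8]) * 60 + Number(m[9]));
  return Math.floor((wall - offsetMin * 60000) / 1000);
}

// Build a Tracking API URL for one page view taken from the log.
function buildTrackingUrl(matomoBase, siteId, siteOrigin, entry, tokenAuth) {
  const params = new URLSearchParams({
    idsite: String(siteId),
    rec: "1",        // required for the request to be recorded
    apiv: "1",
    send_image: "0", // answer with 204 instead of a GIF
    url: siteOrigin + entry.path,
    ua: entry.userAgent,
    cip: entry.ip,   // needs token_auth
    cdt: String(logTimeToUnix(entry.time)),
    token_auth: tokenAuth,
  });
  if (entry.referrer) params.set("urlref", entry.referrer);
  return `${matomoBase}/matomo.php?${params}`;
}
```

Because `token_auth` is a secret, sending these parameters as a POST body to matomo.php is preferable to putting them in a URL that may end up in logs; Matomo also supports bulk tracking for sending many requests at once.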

2. Modified JavaScript Tracking Code
Browser plugins that block Matomo only check the Matomo file names: matomo.js, piwik.js, matomo.php. It is therefore very easy to get around this primitive protection. Use this adapted JS tracker code for your site:

JavaScript:
<!-- Matomo -->
<script>
  var _paq = window._paq = window._paq || [];
  _paq.push(["setRequestMethod","POST"]); // POST instead of GET
  /* tracker methods like "setCustomDimension" should be called before "trackPageView" */
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  (function() {
    var u="https://www.domain.com/[matomo_directory]/"; // directory name should not be matomo or piwik
    _paq.push(['setTrackerUrl', u+'mtbp']);
    _paq.push(['setSiteId', '1']);
    var d=document, g=d.createElement('script'), s=d.getElementsByTagName('script')[0];
    g.async=true; g.src=u+'mtbj'; s.parentNode.insertBefore(g,s);
  })();
</script>
<!-- End Matomo Code -->
Add these rewrite rules to the .htaccess file in the Matomo directory:

Apache config:
RewriteEngine On
# Map the neutral names to the real Matomo files
RewriteRule ^mtbj$ matomo.js [L]
RewriteRule ^mtbp$ matomo.php [L]
Check it out at: https://www.cachecrawler.com/
 
#3
Thanks for the input. That trick of modifying the script is really interesting, and I will use it on some sites. I guess the PHP tracker will not work when page caching is enabled, since cached pages are served without running PHP, so that is a real limitation. I would still like to find a way to ingest the log files.
 

LiteCache

#4
There is an improved version of the tracking code that bypasses the optimization function of the cache plugin and enables better loading of matomo.js. I use this improved tracking code myself.
 
#5
Yes, if you use a tracking script, I think your modified script is pretty good. But any script has some issues. Besides the blocking issue, which you have greatly improved here, you need to take extra precautions to make the tracking script GDPR compliant, and I want to avoid cookie banners at any cost. Also, adding a tracking script affects performance. All of these problems can be avoided by reading the log files.
 

LiteCache

#6
And I want to avoid the use of cookie banners at any cost.
Use the DNT (Do Not Track) header to automatically control whether a cookie banner is displayed. Matomo supports it.
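A small sketch of the DNT idea, assuming the browser exposes the signal via `navigator.doNotTrack` (some older browsers used `navigator.msDoNotTrack` or `window.doNotTrack`). `setDoNotTrack` is a Matomo JS tracker method; `dntEnabled`, `applyDoNotTrack` and the `showCookieBanner` flag are hypothetical:

```javascript
// True if the browser signals Do Not Track ("1"; some old browsers used "yes").
function dntEnabled(nav, win) {
  const v = (nav && (nav.doNotTrack || nav.msDoNotTrack)) || (win && win.doNotTrack);
  return v === "1" || v === "yes";
}

// Ask matomo.js to respect DNT and report whether a (hypothetical)
// consent banner still needs to be shown for this visitor.
function applyDoNotTrack(paq, nav, win) {
  // With this setting the tracker itself skips DNT visitors.
  paq.push(["setDoNotTrack", true]);
  return { showCookieBanner: !dntEnabled(nav, win) };
}
```

In the page this would be called as `applyDoNotTrack(window._paq, navigator, window)` before `_paq.push(['trackPageView'])`.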

Also, adding a tracking script affects performance. All of these problems can be avoided by reading the log files.
As I said, my modified tracker code is based on the default tracker code, but there is also a much improved version of it. This improved version doesn't affect performance. Check https://www.cachecrawler.com with any speed test page.

All of these problems can be avoided by reading the log files.
Yes, but using the access_log for Matomo is expensive and causes high load on every import, because the access_log contains a lot of data that Matomo must ignore. Also, you can only analyse the number of requests and nothing else. Not really valuable...
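The filtering cost mentioned here can at least be kept low by discarding non-page requests before anything is sent to Matomo. A rough sketch, assuming parsed entries with `method`, `status`, `path` and `userAgent` fields; `isPageView` is a hypothetical name, and the extension list and bot pattern are illustrative, not exhaustive:

```javascript
// Requests for these file types are assets, not page views (illustrative list).
const STATIC_EXT = /\.(?:css|js|mjs|map|png|jpe?g|gif|webp|avif|svg|ico|woff2?|ttf|eot|mp4|webm|pdf|xml|txt)$/i;
// Crawlers and tools that identify themselves in the User-Agent.
const BOT_UA = /bot|crawl|spider|slurp|curl|wget|python-requests|headless/i;

function isPageView(entry) {
  if (entry.method !== "GET") return false;
  if (entry.status !== 200) return false;
  const pathOnly = entry.path.split("?")[0]; // ignore the query string
  if (STATIC_EXT.test(pathOnly)) return false;
  if (BOT_UA.test(entry.userAgent || "")) return false;
  return true;
}
```

User-agent matching only catches bots that identify themselves, so some automated traffic will still get through.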
 
#7
So this is the improved version?


JavaScript:
<script>var _paq = window._paq = window._paq || [];
_paq.push(["setRequestMethod","POST"]);
_paq.push(["trackPageView"]);
_paq.push(["enableLinkTracking"]);
_paq.push(["enableHeartBeatTimer", 5]);
var ab = "//www.cachecrawler.com/pwk/";
_paq.push(["setTrackerUrl", ab + "mtbp"]);
_paq.push(["setSiteId", "1"]);
if (localStorage.getItem("hasTag") !== null) {
    const userId = localStorage.getItem("hasTag");
    _paq.push(['setUserId', userId]);
}
var ib = function() {
    var m = document.createElement("script");
    m.src = "https://www.cachecrawler.com/pwk/mtbj";
    m.setAttribute("fetchPriority", "low");
    var n = document.getElementsByTagName("head")[0];
    n.parentNode.insertBefore(m, n);
};
var sbe = requestAnimationFrame || mozRequestAnimationFrame || webkitRequestAnimationFrame || msRequestAnimationFrame;
if (sbe) raf(ib);else window.addEventListener("load", ib);
</script>
It seems this code does more than just page tracking. Can you explain in a bit more detail what the difference is and how to implement it?
 

LiteCache

#8
It seems this code does more than just page tracking. Can you explain in a bit more detail what the difference is and how to implement it?
This additional code makes loading the Matomo script "magic". Use the Inspector tab in the browser dev console and look at the very top of the HTML output, just after the <html> tag. This additional code achieves real deferred loading, because the default method using the defer attribute parses the code before the document is completely ready, and this blocks HTML rendering. The default defer method with the defer attribute is not bad, but every CMS uses it incorrectly.
 
#9
I know JavaScript, but I do not understand what you are talking about. How can JavaScript defer JavaScript rendering? If you mean loading matomo.js, it might, but in both examples that script is injected by JavaScript. So please explain what all this additional code does and what makes it superior.
 
#11
Well, I have already seen that it does more than just delay the script. What is the part that checks local storage for a user ID?

Code:
if (localStorage.getItem("hasTag") !== null) {
    // If there's a value, retrieve it and set it as the user ID for Matomo tracking.
    const userId = localStorage.getItem("hasTag");
    _paq.push(['setUserId', userId]);
}
I guess that part can be left out, right?

I also wonder whether it is intentional that no protocol is specified in the URL passed to setTrackerUrl.
I also wonder whether requestAnimationFrame really brings much benefit over the load event. Speaking of which, I wonder whether this line is right:
if (sbe) raf(ib);

I think it should be:
if (sbe) sbe(ib);

because raf is not defined as far as I can see.
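For reference, a sketch of the loader from post #7 with the `raf` call corrected, written with `window`/`document` passed in as parameters so the logic can be tried outside a browser. It leaves out the optional localStorage user ID part, keeps the `_paq` configuration lines as they were (not shown), and appends the script to `<head>` instead of inserting it before `<head>`. `requestAnimationFrame` runs its callback before the next repaint, which can be earlier than the `load` event. `loadMatomoDeferred` is a hypothetical name:

```javascript
function loadMatomoDeferred(win, doc, src) {
  const inject = function () {
    const s = doc.createElement("script");
    s.async = true;
    s.src = src;
    s.setAttribute("fetchpriority", "low");
    doc.head.appendChild(s);
  };
  // Reading the names as properties of `win` avoids a ReferenceError
  // in browsers that lack one of the prefixed variants.
  const raf = win.requestAnimationFrame || win.mozRequestAnimationFrame ||
              win.webkitRequestAnimationFrame || win.msRequestAnimationFrame;
  if (raf) {
    raf.call(win, inject); // post #7 called an undefined `raf` here
  } else {
    win.addEventListener("load", inject);
  }
}
```

In the page: `loadMatomoDeferred(window, document, "https://www.domain.com/[matomo_directory]/mtbj");`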

Let me know where you got that script from and whether it is documented anywhere. Anyway, it is certainly an interesting approach, and I will consider adapting my tracking scripts one way or another.

I still want to pursue reading the log files, which is definitely the most foolproof way of tracking my visitors.
 
#12
I found the following script on the Matomo website that will defer loading:


Code:
var _paq = window._paq = window._paq || [];
_paq.push(["trackPageView"]);
_paq.push(["enableLinkTracking"]);

function embedTrackingCode() {
   var u="https://your.piwik.domain/";
   _paq.push(["setTrackerUrl", u+"piwik.php"]);
   _paq.push(["setSiteId", "1"]);

   var d=document, g=d.createElement("script"), s=d.getElementsByTagName("script")[0]; g.type="text/javascript";
   g.defer=true; g.async=true; g.src=u+"piwik.js"; s.parentNode.insertBefore(g,s);   
}

if (window.addEventListener) {
    window.addEventListener("load", embedTrackingCode, false);
} else if (window.attachEvent) {
    window.attachEvent("onload",embedTrackingCode);
} else {
    embedTrackingCode();
}
I guess it is a reliable way to wait for the load event and keep the tracking script from interfering with page load. Of course, if somebody leaves the page before that, I will lose that visit, but if somebody does not wait about 3 seconds, I do not need to count them as a page visit anyway.

Later on I will look into the reading of the log files.
 