Analytics CLI
The analytics CLI is python code. One of the reasons behind the choice of python is to try to stay accessible to most code-aware people.
There are good reasons you might want to contribute the code (it's always a good reason anyway, but still). For instance, you want the CLI to be able to process log records from an app that is not yet supported. This page will try to get you on the right track.
If you're still having trouble figuring your way, please reach out, either by the github issues system or on geOrchestra community channels.
The code is located in the analytics repo, in the analytics-cli folder.
Setup a development environment
cf https://github.com/georchestra/analytics/blob/main/analytics-cli/README.md
CLI's entrypoint
The entrypoint is the __main__.py.
It uses the click library to handle the interactions with the user.
Depending on the command that was called, the AccessLogProcessor
module will rely on one of the log_parsers modules.
Log parsers
cf. log_parsers
Each of them implement the BaseLogParser class or at least the AbstractLogParser class.
The log parser loads its corresponding configuration from the config file. It then parses the log records, being on the timescaledb table, from a log file, or to generate fake ones.
For each log record, it tries to identify which app is behind this record. This is done in the _get_app_processor
function and is probably the trickiest part.
On most cases, the app name is given by the first word in the path of the logged request. For instance, the path
/geoserver/admin/wms will be explicit enough to tell this is the Geoserver app. In that case, the function above will load the eponym
module in app_processors
if present. It will look for the lower-cased name, so geoserver.py in this case.
In some cases, the app name does not match the path. For instance, the dataapi is served under /data. In this case,
the correspondence needs to be explicitly provided in the config file, in the apps_mapping section:
apps_mapping:
"/data/": dataapi
In the geoserver case, the mapping is implicit. In the second case, it is explicit.
There is another possible situation, where collecting log records from apps served outside of geOrchestra and at the root path. For instance a mapserver instance, served on a given subdomain, at the root of the server (let's say for the sake of this discussion at https://ms.georchestra.org/). In this situation, there is no way to guess which app this might be. To support this, we need to
- enable the multidomains support for tha apps_mapping:
support_multiple_dn: true. When activated, the mapping expects the domain to prefix the app path. A painful consequence is that any implicit mapping will be disabled and all mappings will have to be explicitly declared in the config. - declare the mappings in the config:
support_multiple_dn: true apps_mapping: "demo.georchestra.org/geoserver/": geoserver "demo.georchestra.org/data/": dataapi "ms.georchestra.org/": mapserver
App processors
For now, apart from the dataapi app processor, all the implementations are about OGC records. geoserver, mapserver,
mapproxy rely on the more generic ogcserver module and implement specific features for each of their respective
software.
You can look at the dataapi processor for a very simple use-case. A tutorial covers its implementation.
The app processors add an extra-level of parsing to extract the app-specific information. This information is ultimately
stored in the request_details field.
Implementing a new app processor
You can have a look at the tutorial: dataapi tutorial.
You can also have a look at how the OGC apps are covered.
All app processors have to implement the abstract.py module.
The file name needs to be lowercase, so that it can be properly detected. And if necessary, an app_mapping entry
needs to provide the mapping between the path and the module's name. Then, it should be recognized automatically when
matching log records are met.
For now, the code still needs to be included/compiled in the analytics-cli package. There is no plugin support as of now.