How to add a new app processor
What are app processors?
App processors (log processors) are the classes that implement app-specific logic when processing the logs.
They are located in analytics-cli/src/georchestra_analytics_cli/access_logs/app_processors and implement the AbstractLogProcessor interface defined in analytics-cli/src/georchestra_analytics_cli/access_logs/log_processors/abstract.py.
A generic app processor is provided. It does very little.
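To make the shape of this contract concrete, here is a minimal sketch of an abstract base class and a do-nothing generic processor. The AbstractLogProcessor name comes from the project, but the single `process(record)` method, the dict-based record and the `GenericLogProcessor` class name are hypothetical; check abstract.py for the methods you actually have to implement.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AbstractLogProcessor(ABC):
    """Hypothetical shape of the interface; the real one is defined in abstract.py."""

    @abstractmethod
    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        """Return the app-specific attributes extracted from one access-log record."""


class GenericLogProcessor(AbstractLogProcessor):
    """Fallback processor (hypothetical name): extracts nothing app-specific."""

    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        # No app-specific knowledge: return an empty set of attributes.
        return {}
```

Declaring `process` as an abstract method means Python refuses to instantiate a subclass that forgets to implement it.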
The Geoserver app processor extracts the Geoserver query elements, mostly from the query parameters but also from the URL path (e.g. the workspace can be given either in the params or in the path), so that the parameters are reported in a unified way.
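This path/params unification can be sketched as follows, assuming the usual Geoserver conventions: virtual-service URLs such as /geoserver/<workspace>/wms, and workspace-prefixed layer names such as topp:states in the layers or typeName parameters. The `extract_workspace` helper and its parameter list are hypothetical and are not the processor's actual code.

```python
from typing import Mapping, Optional

# OWS endpoints that can follow a workspace in a virtual-service URL (assumption).
OWS_ENDPOINTS = {"ows", "wms", "wfs", "wcs", "wps"}
# Lowercased query parameters that may carry workspace-prefixed layer names (assumption).
LAYER_PARAMS = ("layers", "typename", "typenames", "query_layers")


def extract_workspace(path: str, params: Mapping[str, str]) -> Optional[str]:
    """Return the workspace found in the URL path, else in the layer-name parameters."""
    segments = [s for s in path.split("/") if s]
    # Virtual-service URL: /geoserver/<workspace>/<service>
    if len(segments) >= 3 and segments[0] == "geoserver" and segments[2].lower() in OWS_ENDPOINTS:
        return segments[1]
    # Global service: workspace prefix in the layer name, e.g. typeName=topp:states
    lowered = {key.lower(): value for key, value in params.items()}
    for key in LAYER_PARAMS:
        first_layer = lowered.get(key, "").split(",")[0]
        if ":" in first_layer:
            return first_layer.split(":", 1)[0]
    return None
```

Both `/geoserver/topp/wms?LAYERS=states` and `/geoserver/ows?typeName=topp:states` then yield the same workspace, `topp`.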
The dataapi app processor is a very simple one, matching the https://github.com/georchestra/data-api project.
This tutorial follows the implementation of the dataapi app processor.
Here are the main steps to consider:
- configure logging: make sure that the log records you want to process are passed through the log chain. This is configured at the gateway/security proxy level. Once this step works, you should see the log records arrive in the analytics.opentelemetry_buffer table of the TimescaleDB database.
- implement the corresponding app processor in the CLI. Once this works, running the CLI should process the log records and write them to the analytics.access_logs table of the TimescaleDB database.
- configure some materialized views to exploit the processed data. These are the views that Superset will use.
- configure the Superset dashboards.
How does the CLI identify which app processor to use for a given log record?
Available app processors are discovered automatically and lazy-loaded when needed, based on the path of the request (see the parse function of log_parsers.BaseLogParser).
The CLI looks at the first level of the request's path (e.g. geoserver/, data/, geonetwork/, ...). By default, this path is expected to match the name of the corresponding processor. Since an app is sometimes served on an alternative path, a path-to-processor mapping can be defined in the configuration.
For instance, the data-api is served on the data path, but its app processor is named dataapi, to be more explicit. We therefore have to declare this mapping in the configuration.
See the "Configure the mapping path / processor's name" step below.
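A minimal sketch of this lookup follows. It assumes that apps_mapping is loaded as a plain dict and that the processors are importable as georchestra_analytics_cli.access_logs.app_processors.<name> (inferred from the src layout). The function names are hypothetical; the real logic lives in log_parsers.BaseLogParser.

```python
import importlib
from functools import lru_cache
from types import ModuleType
from typing import Mapping

# Assumed package path, inferred from analytics-cli/src/georchestra_analytics_cli/...
APP_PROCESSORS_PACKAGE = "georchestra_analytics_cli.access_logs.app_processors"


def resolve_app_name(path: str, apps_mapping: Mapping[str, str]) -> str:
    """Map the first level of the request path to an app processor name (hypothetical helper)."""
    first_level = path.lstrip("/").split("/", 1)[0].split("?", 1)[0]
    # Paths that already match a processor name need no entry in apps_mapping.
    return apps_mapping.get(first_level, first_level)


def processor_module_path(app_name: str) -> str:
    """Dotted path of the module assumed to implement the given processor."""
    return f"{APP_PROCESSORS_PACKAGE}.{app_name}"


@lru_cache(maxsize=None)
def load_processor_module(app_name: str) -> ModuleType:
    """Import the processor module lazily, the first time it is needed."""
    return importlib.import_module(processor_module_path(app_name))


apps_mapping = {"data": "dataapi"}
print(resolve_app_name("/data/ogcapi/collections", apps_mapping))   # → dataapi
print(resolve_app_name("/geoserver/ows?service=WMS", apps_mapping))  # → geoserver
```

With `lru_cache`, each processor module is imported at most once, and only when a matching request is first seen.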
Adding an app processor for the dataapi
Configure the logging in the gateway/security proxy
In the Gateway logging configuration, log entries are filtered. If you don't adapt the configuration, the log records for your application will most likely never reach the CLI.
See the configuration instructions regarding the Gateway.
For instance, for the dataapi, you could use the following filter:
logging:
  accesslog:
    enabled: true
    info:
      - .*/(?:ows|ogc|wms|wfs|wcs|wps)(?:/.*|\?.*)?$
      - .*/ogcapi/.*$
(the ogcapi pattern matches the dataapi requests)
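Before restarting the gateway, you can check which request URIs the patterns above would keep. This sketch assumes that the gateway matches each pattern against the whole request URI, path plus query string (Java `String.matches` semantics); Python's `re.fullmatch` behaves the same way for these two patterns. The `is_logged` helper is hypothetical.

```python
import re

# The two `info` patterns from the gateway configuration above.
INFO_PATTERNS = [
    re.compile(r".*/(?:ows|ogc|wms|wfs|wcs|wps)(?:/.*|\?.*)?$"),
    re.compile(r".*/ogcapi/.*$"),
]


def is_logged(uri: str) -> bool:
    """True if the whole URI matches at least one of the patterns."""
    return any(pattern.fullmatch(uri) for pattern in INFO_PATTERNS)


for uri in (
    "/geoserver/ows?service=WMS&request=GetMap",
    "/data/ogcapi/collections/roads/items?f=json",
    "/geonetwork/srv/api/records",
):
    print(uri, is_logged(uri))
```

A URI for which `is_logged` returns False would be filtered out before reaching Vector, so its app processor would never see it.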
Restart the gateway and make sure that data-api queries flow through the log chain: to Vector, then into the database, in the opentelemetry_buffer table, from which the CLI consumes the records when it runs.
Configure the mapping path / processor's name
In the configuration file, add the following section:
# apps_mapping is a dict of app_path:app_name, used to determine which *software* is running for a given path.
# Apps whose path matches the app name don't have to be listed.
# App names have to match the names used in this project's app_processors subpackage.
apps_mapping:
  data: dataapi
We will add more configuration later in this tutorial.
Implement the module
Create a Python file called analytics-cli/src/georchestra_analytics_cli/access_logs/app_processors/dataapi.py. It must implement the AbstractLogProcessor interface. Use the existing dataapi processor implementation as a reference.
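As an illustration, here is a hypothetical sketch of what the app-specific part of a dataapi processor could extract from an OGC API Features request: the collection name and the output format. It assumes that requests look like /data/ogcapi/collections/<collection>/items?f=geojson and that the record exposes the request URI under a `path` key. The class does not subclass the real AbstractLogProcessor and its method name is a guess, so the existing dataapi.py remains the reference.

```python
from typing import Any, Dict
from urllib.parse import parse_qs, urlsplit


class DataApiLogProcessor:
    """Hypothetical sketch; the real dataapi.py implements AbstractLogProcessor."""

    def process(self, record: Dict[str, Any]) -> Dict[str, Any]:
        url = urlsplit(record["path"])  # assumed record key
        segments = [s for s in url.path.split("/") if s]
        params = {k.lower(): v[0] for k, v in parse_qs(url.query).items()}
        details: Dict[str, Any] = {}
        # OGC API Features layout: .../collections/{collectionId}/items
        if "collections" in segments:
            i = segments.index("collections")
            if i + 1 < len(segments):
                details["collection"] = segments[i + 1]
            if i + 2 < len(segments) and segments[i + 2] == "items":
                details["request"] = "items"
        # Output format requested through the `f` query parameter (assumption)
        if "f" in params:
            details["format"] = params["f"].lower()
        return details
```

Lowercasing the format keeps it comparable with the lowercase keys of the download_formats configuration.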
Add some app-specific configuration
For each app processor, the Config module can load some configuration that is specific to the app, under the app_processors: key. We add a configuration block for the dataapi that gives human-friendly labels to the downloadable formats and aligns them with the ones used in Geoserver:
app_processors:
  ...
  dataapi:
    download_formats:
      vector:
        # List of formats with a human-friendly label. Keys (format names) should match the outputFormats
        # supported by WFS GetFeature, but *in lowercase*
        shapefile: "Shapefile"
        json: "JSON"
        geojson: "GeoJSON"
        csv: "CSV"
        ooxml: "Excel"
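At processing time, a requested format then only needs a case-insensitive lookup in this block. The helper below is a hypothetical sketch that copies the vector formats from the configuration above into a plain dict; how the dataapi processor actually reads its app_processors settings from the Config module may differ.

```python
from typing import Mapping, Optional

# Vector formats copied from the download_formats block above.
VECTOR_FORMATS = {
    "shapefile": "Shapefile",
    "json": "JSON",
    "geojson": "GeoJSON",
    "csv": "CSV",
    "ooxml": "Excel",
}


def format_label(requested: str, formats: Mapping[str, str] = VECTOR_FORMATS) -> Optional[str]:
    """Human-friendly label for a requested format, matched case-insensitively (hypothetical helper)."""
    # Keys are stored in lowercase, so normalise the request the same way.
    return formats.get(requested.strip().lower())
```

Returning None for unknown formats lets the caller decide whether to keep the raw value or drop it.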
Testing your module
You should add some unit tests: add a test file in the analytics-cli/tests directory.
You can look at the dataapi test file for inspiration.
You can run it with tox, or directly with pytest tests/test_dataapi_log_processor.py from the analytics-cli directory.
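A test file could look like the sketch below. To keep it runnable on its own, the helper under test (`extract_collection`) is defined inline and is hypothetical; in the real tests/test_dataapi_log_processor.py you would import the processor from the dataapi module instead. pytest discovers the `test_*` functions without any extra import.

```python
from typing import Optional
from urllib.parse import urlsplit


def extract_collection(path: str) -> Optional[str]:
    """Hypothetical helper: collection name from an OGC API path, if any."""
    segments = [s for s in urlsplit(path).path.split("/") if s]
    if "collections" in segments:
        i = segments.index("collections")
        if i + 1 < len(segments):
            return segments[i + 1]
    return None


def test_extract_collection_from_items_request():
    assert extract_collection("/data/ogcapi/collections/roads/items?f=json") == "roads"


def test_extract_collection_missing():
    assert extract_collection("/data/ogcapi/conformance") is None
    assert extract_collection("/data/ogcapi/collections") is None
```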
Configure materialized views
As for Geoserver, you can configure some continuous aggregates in your TimescaleDB database to simplify and speed up the exploitation of the log records matching your app.
For instance, you can load the DDL from https://github.com/georchestra/analytics/blob/main/config/analytics/db.entrypoints.d/102_analytics_dataapi_views.sql.
Configure the Superset dashboards
See the Superset documentation.