Server-side Map Clustering

Mar 17, 2021

Background

Marker clustering is used to display a large number of markers on a map. It is a visualization technique that helps the analyst make sense of large datasets on a map.

Google Maps offers marker clustering, which is performed on the client side. The spatial information needs to be sent to the client to perform the clustering, which is also how Deck.gl implements it. This could pose issues if the data is sensitive or too large to be transferred over mobile broadband.

An alternative solution is to move the clustering to the server side and send only the results to the client for viewing. This solution, served over the WMS protocol, trades away some interactivity but does not require sending the raw data, has lower development complexity and time, and can be optimized using caching techniques.

Guide

1. Generate random points (100K points)

We will generate 100K dummy data points for the demo. To make the dummy data realistic, the points are generated over a reference geometry (a map of Singapore) obtained from data.gov.sg. Generating points against the reference geometry ensures that the dummy points fall on top of the reference polygons. The geometry from data.gov.sg includes part of Malaysia, which can be removed using QGIS.

SQL to create table and generate dummy points
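As a minimal sketch of this step (not necessarily the exact SQL used for the demo), the points could be generated with ST_GeneratePoints. The table and column names follow the queries later in this post; the EPSG:4326 SRID and the index name are assumptions.

```sql
-- Sketch only: assumes the reference polygons live in a table "singapore"
-- with a geometry column "geom" in EPSG:4326 (assumption).
CREATE TABLE cluster_locations (
    id   serial PRIMARY KEY,
    geom geometry(Point, 4326)
);

-- Merge the reference polygons into one geometry, scatter 100K random
-- points inside it, and split the resulting MultiPoint into single rows.
INSERT INTO cluster_locations (geom)
SELECT (ST_Dump(ST_GeneratePoints(s.geom, 100000))).geom
FROM (SELECT ST_Union(geom) AS geom FROM singapore) AS s;

-- A spatial index speeds up the spatial joins used for clustering.
CREATE INDEX cluster_locations_geom_idx
    ON cluster_locations USING GIST (geom);
```

ST_GeneratePoints returns a single MultiPoint, so ST_Dump is used to turn it into one row per point.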

View (Without clustering)

At island-wide view, we cannot differentiate the points.

Zooming in, we can observe the individual points displayed almost distinctly. However, overlapping points make it hard to determine the magnitude and absolute count within an area.

It would therefore be more helpful to the analyst if the visualization could cluster the points spatially. The analyst could further perform a time-series analysis if the spatial boundaries remain the same. Summarizing data against a fixed grid is therefore one common way of aggregating it. Grid cells of equal size allow for fair comparison across areas. While a grid does not take the existing territorial landscape into account, it is an unbiased aggregation to start with.

2.1 Clustering SQL

The first thing we need is to generate a grid of cells over a bounding area. I am using the ST_SquareGrid() function in PostGIS (note that this function is only available in version 3.1 or higher). I can determine the map bounds using ST_Envelope on existing GIS features.

SELECT ST_AsText(ST_Envelope(geom)) FROM (SELECT ST_Union(geom) AS geom FROM singapore) AS merged;

Next, we perform a spatial join between the points and the grid features and summarize the total number of points within each grid cell.
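One way the grid generation and the spatial join could be combined is sketched below. The cell size of 0.01 is an assumption (roughly 1.1 km if the data is in EPSG:4326 degrees), and the output column name point_count is illustrative.

```sql
-- Sketch only: cell size 0.01 assumes coordinates in degrees (EPSG:4326).
WITH bounds AS (
    -- Bounding box of the reference geometry
    SELECT ST_Envelope(ST_Union(geom)) AS geom
    FROM singapore
),
grid AS (
    -- ST_SquareGrid returns one row (geom, i, j) per cell covering the bounds
    SELECT sq.i, sq.j, sq.geom
    FROM bounds b
    CROSS JOIN LATERAL ST_SquareGrid(0.01, b.geom) AS sq
)
-- Count the points falling in each cell; empty cells are dropped
-- by the inner join.
SELECT g.geom, COUNT(p.id) AS point_count
FROM grid g
JOIN cluster_locations p ON ST_Intersects(p.geom, g.geom)
GROUP BY g.i, g.j, g.geom;
```

With ST_Intersects, a point lying exactly on a shared cell edge is counted in both neighbouring cells, which is usually negligible for randomly placed points.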

Cluster by Grid

The number represents the total count of points within the grid cell. We can also visualize the count via color interpolation. The size of the grid cells could also be dynamic, depending on the zoom level of the map. This would allow the analyst to investigate the source of a concentration within a cell as they zoom into the map.

If the preferred choice of aggregation is to use pre-determined boundaries, the SQL can be customized to fit. GeoServer allows the SQL to be customized via its administrative web interface, which makes development easy.

One such example would be the electoral boundaries, which again were obtained from data.gov.sg.
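A zoom-dependent cell size could be sketched as follows. The zoom value, the formula (one eighth of a web-map tile width in degrees) and the params CTE are all illustrative assumptions; in practice the zoom would have to be passed in by the map server or client.

```sql
-- Sketch only: the cell size halves with every zoom level.
-- Assumes EPSG:4326 data and a hypothetical zoom value of 12.
WITH params AS (
    SELECT 12 AS zoom
),
bounds AS (
    SELECT ST_Envelope(ST_Union(geom)) AS geom
    FROM singapore
),
grid AS (
    SELECT sq.i, sq.j, sq.geom
    FROM params z
    CROSS JOIN bounds b
    -- 360 / 2^zoom is the width of one tile in degrees; each tile is
    -- split into 8 cells per side (arbitrary choice).
    CROSS JOIN LATERAL ST_SquareGrid(360.0 / (2 ^ z.zoom) / 8, b.geom) AS sq
)
SELECT g.geom, COUNT(p.id) AS point_count
FROM grid g
JOIN cluster_locations p ON ST_Intersects(p.geom, g.geom)
GROUP BY g.i, g.j, g.geom;
```

Tying the cell size to discrete zoom levels keeps the set of possible grids small, which should make the results easier to cache.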

SELECT B.geom, COUNT(A.id) FROM electoral B LEFT JOIN cluster_locations A ON ST_Intersects(A.geom, B.geom) GROUP BY B.geom

Joining the aggregation boundaries with all 100K points on the fly requires a large amount of processing. The entire join took approximately 30 seconds to complete, and in the case of WMS, each pan/zoom/identify triggers the SQL. Such an experience is neither practical nor user-friendly.

Caching via a materialized view provides near-instant results.

CREATE MATERIALIZED VIEW electoral_count AS SELECT B.geom, COUNT(A.id) FROM electoral B LEFT JOIN cluster_locations A ON A.electoral_id = B.id GROUP BY B.geom;
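The materialized view joins on an electoral_id column, so the expensive spatial test only has to run once, when that column is filled in. A possible way to populate it and keep the cache fresh is sketched below; the integer type of electoral.id and the index name are assumptions.

```sql
-- Sketch only: assumes electoral.id is an integer key.
ALTER TABLE cluster_locations
    ADD COLUMN IF NOT EXISTS electoral_id integer;

-- Run the spatial test once and store the matching boundary per point.
UPDATE cluster_locations A
SET electoral_id = B.id
FROM electoral B
WHERE ST_Intersects(A.geom, B.geom);

-- Index the cached geometries so WMS bounding-box requests stay fast.
CREATE INDEX electoral_count_geom_idx
    ON electoral_count USING GIST (geom);

-- Re-run the stored query whenever the underlying points change.
REFRESH MATERIALIZED VIEW electoral_count;
```

A materialized view is not updated automatically, so the refresh needs to be scheduled or triggered after data loads.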

Conclusion

Server-side clustering offers a solution to clustering/aggregation of data with a lightweight client and low network consumption. The implementation is easy and fast, but it requires a backend setup of GeoServer and PostGIS. All in all, it is one possible solution, depending on the requirements and use case.