🤖 AI Summary
This paper addresses the problem of compactly representing large-scale trajectory data (e.g., GPS traces). Given an input trajectory of complexity *n*, the goal is to select the minimum number of representative polygonal curves, each of complexity at most *l*, such that every point on the input lies on a subtrajectory that has Fréchet distance at most Δ to some representative curve. The authors propose an approach based on geometric set cover that supports representative curves with several edges, whereas the earlier algorithm of Brüning et al. (2022) only produces line segments. The new algorithm returns *O(k log n)* representatives, compared with *O(kl log(kl))* for the earlier one, where *k* is the optimal cover size. It guarantees a coverage radius of 11Δ and runs in Õ(*l²n⁴ + kln⁴*) time. Experiments on ocean current and full-body motion data suggest that the method is a useful tool for analyzing large spatio-temporal data sets.
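Logarithmic factors such as *O(k log n)* are the classic signature of greedy set cover: repeatedly choosing the candidate that covers the most still-uncovered elements uses at most about ln(N) + 1 times the optimal number of sets on a universe of N elements. The sketch below is that textbook greedy algorithm on a hypothetical, already discretized instance (the candidate names and point IDs are made up). It is a generic illustration only; we do not assume the paper uses this exact procedure, and its real candidates and coverage tests are geometric.

```python
from collections.abc import Hashable
from typing import Dict, List, Set


def greedy_set_cover(universe: Set[Hashable],
                     candidates: Dict[str, Set[Hashable]]) -> List[str]:
    """Textbook greedy set cover (illustrative, not the paper's algorithm).

    At each step, pick the candidate covering the most uncovered elements.
    Ties are broken by the iteration order of `candidates`.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        if not candidates:
            raise ValueError("no candidates available")
        # Candidate with the largest number of still-uncovered elements.
        best = max(candidates, key=lambda name: len(candidates[name] & uncovered))
        gain = candidates[best] & uncovered
        if not gain:
            raise ValueError("candidates cannot cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical instance: points 0..9 of a discretized trajectory, and the
# subsets of points each candidate representative would cover.
candidates = {"A": {0, 1, 2, 3}, "B": {3, 4, 5}, "C": {5, 6, 7, 8, 9}, "D": {0, 4, 8}}
print(greedy_set_cover(set(range(10)), candidates))  # ['C', 'A', 'B']
```

Greedy first takes C (five new points), then A (four), and finally B for the remaining point 4, where B and D tie and dictionary order decides.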
📝 Abstract
Clustering trajectories is a central challenge when faced with large amounts of movement data such as GPS data. We study a clustering problem that can be stated as a geometric set cover problem: Given a polygonal curve of complexity $n$, find the smallest number $k$ of representative trajectories of complexity at most $l$ such that any point on the input trajectories lies on a subtrajectory of the input that has Fréchet distance at most $\Delta$ to one of the representative trajectories. In previous work, Brüning et al. (2022) developed a bicriteria approximation algorithm that returns a set of curves of size $O(kl\log(kl))$ which covers the input with a radius of $11\Delta$ in time $\widetilde{O}((kl)^2 n + kl n^3)$, where $k$ is the smallest number of curves of complexity $l$ needed to cover the input with a radius of $\Delta$. The representative trajectories computed by this algorithm are always line segments. In applications, however, one is usually interested in more complex representative curves that consist of several edges. We present a new approach, building on previous work, that computes a set of curves of size $O(k\log n)$ in time $\widetilde{O}(l^2 n^4 + kl n^4)$ with the same distance guarantee of $11\Delta$, where each representative curve may have complexity up to the given parameter $l$. We conduct experiments on tracking data of ocean currents and full-body motion data, suggesting that the approach is a valid tool for analyzing large spatio-temporal data sets.
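The coverage condition can be made concrete with a small brute-force check. This is a simplified sketch under stated assumptions: it uses the discrete Fréchet distance (the standard dynamic program over vertex pairs) in place of the continuous Fréchet distance the problem is defined with, and it tests only input vertices and vertex-bounded subtrajectories rather than every point on the curve. The names `discrete_frechet` and `covered_vertices` are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, float]


def discrete_frechet(p: Sequence[Point], q: Sequence[Point]) -> float:
    """Discrete Frechet distance via dynamic programming (a stand-in for
    the continuous Frechet distance used in the problem statement)."""
    n, m = len(p), len(q)
    if n == 0 or m == 0:
        raise ValueError("curves must be non-empty")
    ca: List[List[float]] = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                # Best of the three predecessor couplings, then this pair.
                ca[i][j] = max(min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1]), d)
    return ca[n - 1][m - 1]


def covered_vertices(traj: Sequence[Point], rep: Sequence[Point],
                     delta: float) -> Set[int]:
    """Indices of input vertices lying on some vertex-bounded subtrajectory
    traj[a..b] within discrete Frechet distance `delta` of `rep`."""
    covered: Set[int] = set()
    n = len(traj)
    for a in range(n):
        for b in range(a, n):
            if discrete_frechet(traj[a:b + 1], rep) <= delta:
                covered.update(range(a, b + 1))
    return covered


traj = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 5.0), (4.0, 5.0)]
rep = [(0.0, 0.5), (1.0, 0.5), (2.0, 0.5)]
print(discrete_frechet(traj[:3], rep))          # 0.5
print(sorted(covered_vertices(traj, rep, 1.0)))  # [0, 1, 2]
```

With a horizontal representative offset by 0.5, the first three input vertices are covered at Δ = 1, while the two vertices near y = 5 are not. Because the discrete Fréchet distance is never smaller than the continuous one between the same two polylines, every vertex this check reports is also covered under the continuous definition, though it may miss some. The loop over all O(n²) subtrajectories only illustrates the definition and is not meant to match the stated running times.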