🤖 AI Summary
This study addresses the color frequency reporting problem: store $n$ colored points so that, for a query region such as an axis-aligned box or a dominance range, all distinct colors in the region can be reported together with their frequencies. The goal is strictly output-sensitive query time $O(f(n) + k)$, where $k$ is the number of colors in the query region. For points in two dimensions and every $s \in \{2, \dots, n\}$, the work presents a simple data structure of size $O(ns \log_s n)$ that answers queries in $O(\log n + k \log_s n)$ time. It also establishes a lower bound in the arithmetic model for the weighted variant, which shows that the structure is near-optimal, and extends these results to higher dimensions. In addition, it gives a transformation that reduces the space usage, and an algorithm that answers a batch of dominance queries in two dimensions using only linear working space.
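To make the role of the parameter $s$ concrete, the two extreme settings below follow directly from the stated bounds. This is a worked illustration only, and it assumes $\varepsilon > 0$ is a fixed constant, so that $\log_s n \le 1/\varepsilon = O(1)$ whenever $s \ge n^{\varepsilon}$:

$$
\begin{aligned}
s = 2 &: \quad \text{space } O(n \log n), \quad \text{query } O(\log n + k \log n),\\
s = \lceil n^{\varepsilon} \rceil &: \quad \text{space } O(n^{1+\varepsilon}), \quad \text{query } O(\log n + k).
\end{aligned}
$$

Under that assumption, the second setting reaches the strictly output-sensitive form $O(f(n) + k)$ at the cost of polynomially more space.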
📝 Abstract
Given a set of $n$ colored points $P \subset \mathbb{R}^d$, we wish to store $P$ such that, given some query region $Q$, we can efficiently report the colors of the points appearing in the query region, along with their frequencies. This is the \emph{color frequency reporting} problem. We study the case where query regions $Q$ are axis-aligned boxes or dominance ranges. If $Q$ contains $k$ colors, the main goal is to achieve ``strictly output sensitive'' query time $O(f(n) + k)$. Firstly, we show that, for every $s \in \{ 2, \dots, n \}$, there exists a simple $O(ns\log_s n)$ size data structure for points in $\mathbb{R}^2$ that allows frequency reporting queries in $O(\log n + k\log_s n)$ time. Secondly, we give a lower bound for the weighted version of the problem in the arithmetic model of computation, proving that with $O(m)$ space one cannot achieve query times better than $\Omega\left(\varphi\frac{\log (n / \varphi)}{\log (m / n)}\right)$, where $\varphi$ is the number of possible colors. This means that our data structure is near-optimal. We extend these results to higher dimensions as well. Thirdly, we present a transformation that allows us to reduce the space usage of the aforementioned data structure to $O(n(s \varphi)^\varepsilon \log_s n)$. Finally, we give an $O(n^{1+\varepsilon} + m \log n + K)$-time algorithm that can answer $m$ dominance queries in $\mathbb{R}^2$ with total output complexity $K$, while using only linear working space.
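To pin down what a color frequency reporting query returns, the following is a minimal brute-force sketch in Python. It is not the data structure from the paper: it scans all $n$ points per query, so it only serves as a reference for the problem definition. The function names, the closed box boundaries, and the reading of a dominance range as $(-\infty, q_1] \times \dots \times (-\infty, q_d]$ are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def color_frequencies_in_box(
    points: Sequence[Point],
    colors: Sequence[Hashable],
    lo: Point,
    hi: Point,
) -> Dict[Hashable, int]:
    """Brute-force baseline: scan every point, O(n * d) time per query.

    Assumes the query box is closed: lo[i] <= x[i] <= hi[i] in every dimension.
    """
    counts: Counter = Counter()
    for p, c in zip(points, colors):
        if all(l <= x <= h for l, x, h in zip(lo, p, hi)):
            counts[c] += 1
    return dict(counts)


def color_frequencies_dominance(
    points: Sequence[Point],
    colors: Sequence[Hashable],
    q: Point,
) -> Dict[Hashable, int]:
    """Dominance query, assumed here to mean all points p with p[i] <= q[i]."""
    lo = tuple(float("-inf") for _ in q)
    return color_frequencies_in_box(points, colors, lo, q)


# Hypothetical example data: five colored points in the plane.
points = [(1, 1), (2, 3), (4, 2), (5, 5), (3, 4)]
colors = ["red", "blue", "red", "green", "blue"]

box_result = color_frequencies_in_box(points, colors, (1, 1), (4, 4))
dom_result = color_frequencies_dominance(points, colors, (2, 3))
print(box_result)
print(dom_result)
```

In this example the box $[1,4] \times [1,4]$ contains $k = 2$ colors, each with frequency 2. A strictly output-sensitive structure would aim to spend time proportional to $k$ plus a term $f(n)$, rather than scanning all points as this baseline does.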