Synthesizing Scoring Functions for Rankings Using Symbolic Gradient Descent

📅 2024-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of automatically synthesizing concise linear scoring functions from a given relation and a ranking of its tuples, without prior knowledge of the ranking function. The authors formulate the synthesis task as a Mixed-Integer Linear Program (MILP) that minimizes position-based error and supports customizable weight constraints. Although MILP is NP-hard in general, modern solvers use advanced heuristics to reason holistically about the entire program, whereas a traditional tree-based PTIME algorithm decomposes the problem into many linear programs (LPs) solved in isolation. To further improve scalability, the paper introduces Symbolic Gradient Descent (Sym-GD), an approximation algorithm that exploits problem structure to find local minima of the error function more quickly. Experiments demonstrate speedups of orders of magnitude over the PTIME baseline on realistic datasets, producing linear scoring functions that are both more accurate and inherently interpretable.

📝 Abstract
Given a relation and a ranking of its tuples, but no information about the ranking function, we are interested in synthesizing simple scoring functions that reproduce the ranking. Our system RankHow identifies linear scoring functions that minimize position-based error, while supporting flexible constraints on their weights. It is based on a new formulation as a mixed-integer linear program (MILP). While MILP is NP-hard in general, we show that RankHow is orders of magnitude faster than a tree-based algorithm that guarantees polynomial time complexity (PTIME) in the number of input tuples by reducing the MILP problem to many linear programs (LPs). We hypothesize that this is caused by two properties: First, the PTIME algorithm is equivalent to a naive evaluation strategy for the MILP program. Second, MILP solvers rely on advanced heuristics to reason holistically about the entire program, while the PTIME algorithm solves many sub-problems in isolation. To further improve RankHow's scalability, we propose a novel approximation technique called symbolic gradient descent (Sym-GD). It exploits problem structure to more quickly find local minima of the error function. Experiments demonstrate that RankHow can solve realistic problems, finding more accurate linear scoring functions than the state of the art.
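To make the problem setting concrete, here is a toy sketch (not the paper's RankHow system) of the two core notions the abstract relies on: the ranking induced by a candidate linear scoring function, and a position-based error between that ranking and the given one. The error measure used below is Spearman's footrule (sum of absolute position displacements), chosen for illustration; the paper's exact error definition may differ.

```python
def induced_ranking(tuples, w):
    """Rank tuple indices by descending linear score w . t."""
    scores = [sum(wi * xi for wi, xi in zip(w, t)) for t in tuples]
    return sorted(range(len(tuples)), key=lambda i: -scores[i])

def position_error(target, candidate):
    """Sum of absolute position displacements between two rankings
    over the same tuple ids (Spearman's footrule)."""
    pos = {tid: p for p, tid in enumerate(candidate)}
    return sum(abs(p - pos[tid]) for p, tid in enumerate(target))

# Three tuples over two attributes; the target ranking puts tuple 0 first.
tuples = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0)]
target = [0, 1, 2]
w = (1.0, 0.0)  # score by the first attribute only
assert induced_ranking(tuples, w) == target        # reproduces the ranking
print(position_error(target, induced_ranking(tuples, (0.0, 1.0))))  # 4
```

Synthesis then means searching the weight space for a `w` that drives this error to zero (or as low as possible) subject to constraints on the weights.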
Problem

Research questions and friction points this paper is trying to address.

Synthesizing simple scoring functions to reproduce given rankings
Minimizing position-based error with linear scoring functions
Improving scalability using symbolic gradient descent technique
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses symbolic gradient descent for approximation
Formulates problem as mixed-integer linear program
Outperforms a PTIME baseline that reduces the MILP to many LPs
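The gradient-descent flavor of Sym-GD can be conveyed with a naive sketch. Note that the position-based error is piecewise constant in the weights (the induced ranking only changes when scores cross), so numeric gradients are uninformative; the paper's Sym-GD reasons symbolically about this structure, which the discrete local search below does not implement. It merely illustrates descending to a local minimum of the error by perturbing one weight at a time; all names and the step size are illustrative.

```python
import itertools

def rank_by(tuples, w):
    """Ranking induced by the linear scoring function w."""
    scores = [sum(a * b for a, b in zip(w, t)) for t in tuples]
    return sorted(range(len(tuples)), key=lambda i: -scores[i])

def footrule(target, cand):
    """Position-based error: sum of absolute position displacements."""
    pos = {t: p for p, t in enumerate(cand)}
    return sum(abs(p - pos[t]) for p, t in enumerate(target))

def local_search(tuples, target, w, step=2.0, rounds=50):
    """Greedy first-improvement search over per-weight perturbations."""
    best = footrule(target, rank_by(tuples, w))
    for _ in range(rounds):
        improved = False
        for i, delta in itertools.product(range(len(w)), (step, -step)):
            cand = list(w)
            cand[i] += delta
            err = footrule(target, rank_by(tuples, cand))
            if err < best:
                w, best, improved = tuple(cand), err, True
        if not improved:  # local minimum of the error function
            break
    return w, best

tuples = [(1.0, 4.0), (2.0, 3.0), (3.0, 1.0)]
target = [2, 1, 0]                       # tuple 2 should rank first
w, err = local_search(tuples, target, (0.0, 1.0))
print(err)  # 0 on this toy instance
```

Unlike this fixed-step search, a symbolic approach can determine exactly which weight changes flip which tuple comparisons, avoiding blind step-size tuning.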
Zixuan Chen
Northeastern University, Boston, USA
P. Manolios
Northeastern University, Boston, USA
Mirek Riedewald
Associate Professor of Computer Science, Northeastern University
databases, MapReduce, large-scale data management, data mining, big data