news | Nan Tang

Jun 16, 2025	[VLDB 2025] Three papers “Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation”, “Data Imputation with Limited Data Redundancy Using Data Lakes”, “AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework”, and a tutorial “Natural Language to SQL: State of the Art and Open Problems” were accepted.
May 16, 2025	[KDD 2025] Paper “NL2SQL-BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation” was accepted by KDD 2025 (Datasets and Benchmarks Track).
May 1, 2025	[ICML 2025] Paper “Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search” was accepted by ICML 2025 (poster).
Apr 30, 2025	[IJCAI 2025] Paper “RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition” was accepted by IJCAI 2025.
Apr 2, 2025	[AIED 2025] Paper “Automatic Modeling and Analysis of Students’ Problem-Solving Handwriting Trajectories” was accepted by The 26th International Conference on Artificial Intelligence in Education.
Mar 16, 2025	[SIGMOD 2025] Paper “Automatic Database Configuration Debugging using Retrieval-Augmented Language Models” and demo “Andromeda: Debugging Database Performance Issues with Retrieval-Augmented Large Language Models” were accepted.
Jan 18, 2025	[CHI 2025] Paper “Augmenting Realistic Charts with Virtual Overlays” was accepted by CHI 2025.
Sep 26, 2024	[NeurIPS 2024] Two papers, “Are Large Language Models Good Statisticians?” and “CRAG - Comprehensive RAG Benchmark”, were accepted by NeurIPS 2024 Datasets and Benchmarks Track.
Sep 20, 2024	[EMNLP 2024] Two papers, “MAR: Matching-Augmented Reasoning for Enhancing Visual-based Entity Question Answering” (main) and “ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering” (findings), were accepted.
Jul 15, 2024	[VLDB 2024] Six papers, (1) “MisDetect: Iterative Mislabel Detection using Early Loss”, (2) “LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes”, (3) “Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL”, (4) “Are Large Language Models a Good Replacement of Taxonomies?”, (5) “HAIChart: Human and AI Paired Visualization System”, (6) “The Dawn of Natural Language to SQL: Are We Fully Ready?”, and two demos, (i) “Retrieval-Based Tabular Data Cleaning Using LLMs and Data Lake”, (ii) “LakeCompass: An End-to-End System for Table Maintenance, Search and Analysis in Data Lakes”, were accepted.
Mar 18, 2024	[SIGMOD 2024] Paper “Controllable Tabular Data Synthesis Using Diffusion Models” and two demos, “IDE: A System for Iterative Mislabel Detection” and “ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions”, were accepted.
Mar 10, 2024	[ICDE 2024] Two papers, “Mitigating Data Scarcity in Supervised Machine Learning through Reinforcement Learning Guided Data Generation” and “Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration”, were accepted.
Mar 8, 2024	[KDD Cup 2024] Our proposal “CRAG–Comprehensive RAG Benchmark and Challenge”, co-hosted with Meta Reality Lab, was accepted.
Dec 16, 2023	[2024 SIGMOD Research Highlight Award] Paper “Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration”.
Nov 16, 2023	[Best of SIGMOD 2023] Paper “GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data”.
Oct 7, 2023	[CIDR 2024] Paper “VerifAI: Verified Generative AI” was accepted.
Oct 1, 2023	Ph.D. positions available for 2024 spring/fall