Nan Tang

I am an associate professor in the Data Science and Analytics Thrust, Information Hub, at the Hong Kong University of Science and Technology (Guangzhou). I also hold an affiliated position at the Hong Kong University of Science and Technology, on the Clear Water Bay campus in Hong Kong.

Before joining HKUST(GZ), I worked as a senior scientist at Qatar Computing Research Institute, a visiting scientist at MIT CSAIL, a research fellow at the University of Edinburgh, a scientific staff member at CWI (the national research institute for mathematics and computer science in the Netherlands), and a visiting scholar at the University of Waterloo.

I direct the Data Intelligence and Analytics Lab (DIAL), which focuses on finding good data and smart analytics, both of which are fundamental to data management, data science, and artificial intelligence.

Retrieval-based language models using multi-modal data lakes. Data lakes have become increasingly popular for many organizations. Given a natural language question, retrieving datasets (e.g., text, tables, graphs) and reasoning with language models are key for business intelligence.
Good data for AI (a.k.a. data-centric AI). For most machine learning practitioners, the success of machine learning projects heavily depends on whether we can find good data for model training.
AI for good data. Data scientists spend at least 80% of their time on data preparation. Machine learning models can help address diverse data preparation challenges.
Visualization. Data visualization is important to data analytics. I am working on automatic visualization, visualization recommendation, chat-to-story, chat-to-video, and visualization using AR/VR devices.

Office: E3 601
E-mail: nantang (at) hkust-gz.edu.cn
Call: (+86)-20-88330888

news

Jun 16, 2025	[VLDB 2025] Three papers “Weak-to-Strong Prompts with Lightweight-to-Powerful LLMs for High-Accuracy, Low-Cost, and Explainable Data Transformation”, “Data Imputation with Limited Data Redundancy Using Data Lakes”, “AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework”, and a tutorial “Natural Language to SQL: State of the Art and Open Problems” were accepted.
May 16, 2025	[KDD 2025] Paper “NL2SQL-BUGs: A Benchmark for Detecting Semantic Errors in NL2SQL Translation” was accepted by KDD 2025 (Datasets and Benchmarks Track).
May 1, 2025	[ICML 2025] Paper “Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search” was accepted by ICML 2025 (poster).
Apr 30, 2025	[IJCAI 2025] Paper “RAMer: Reconstruction-based Adversarial Model for Multi-party Multi-modal Multi-label Emotion Recognition” was accepted by IJCAI 2025.
Apr 2, 2025	[AIED 2025] Paper “Automatic Modeling and Analysis of Students’ Problem-Solving Handwriting Trajectories” was accepted by The 26th International Conference on Artificial Intelligence in Education.