Nan Tang
I am an associate professor in the Data Science and Analytics Thrust, Information Hub, at the Hong Kong University of Science and Technology (Guangzhou). I also hold an affiliated position at the Hong Kong University of Science and Technology's Clear Water Bay campus in Hong Kong.
Before joining HKUST(GZ), I worked as a senior scientist at the Qatar Computing Research Institute, a visiting scientist at MIT CSAIL, a research fellow at the University of Edinburgh, a scientific staff member at CWI (the national research institute for mathematics and computer science in the Netherlands), and a visiting scholar at the University of Waterloo.
I direct the Data Intelligence Lab, which focuses on finding good data and developing smart analytics, both fundamental to data management, data science, and artificial intelligence.
- Retrieval-based language models using multi-modal data lakes. Data lakes have become increasingly popular across organizations. Given a natural-language question, retrieving the relevant datasets (e.g., text, tables, graphs) and reasoning over them with language models is key to business intelligence.
- Good data for AI (a.k.a. data-centric AI). For most machine learning practitioners, the success of a machine learning project depends heavily on finding good data for model training.
- AI for good data. Data scientists spend at least 80% of their time on data preparation. Machine learning models can help address diverse data preparation challenges.
- Visualization. Data visualization is important to data analytics. I am working on automatic visualization, visualization recommendation, chat-to-story, chat-to-video, and visualization using AR/VR devices.
Office: E3 601
E-mail: nantang (at) hkust-gz.edu.cn
Phone: (+86)-20-88330888
news
Date | News |
---|---|
Mar 18, 2024 | [SIGMOD 2024] Paper “Controllable Tabular Data Synthesis Using Diffusion Models” and two demos, “IDE: A System for Iterative Mislabel Detection” and “ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions”, were accepted. |
Mar 16, 2024 | [VLDB 2024] Three papers, “MisDetect: Iterative Mislabel Detection using Early Loss”, “LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes”, and “Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL”, were accepted. |
Mar 10, 2024 | [ICDE 2024] Two papers, “Mitigating Data Scarcity in Supervised Machine Learning through Reinforcement Learning Guided Data Generation” and “Cost-Effective In-Context Learning for Entity Resolution: A Design Space Exploration”, were accepted. |
Mar 8, 2024 | [KDD Cup 2024] Our proposal “CRAG–Comprehensive RAG Benchmark and Challenge”, co-hosted with Meta Reality Labs, was accepted. |
Dec 16, 2023 | [2023 SIGMOD Research Highlight Award] Paper “Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration”; [Best of SIGMOD 2023] Paper “GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data”. |