Nan Tang


I am an associate professor in the Data Science and Analytics Thrust, Information Hub, Hong Kong University of Science and Technology (Guangzhou). I also hold an affiliated position at the Hong Kong University of Science and Technology, Clear Water Bay campus, Hong Kong.

Before joining HKUST(GZ), I worked as a senior scientist at the Qatar Computing Research Institute, a visiting scientist at MIT CSAIL, a research fellow at the University of Edinburgh, a scientific staff member at CWI (the national research institute for mathematics and computer science in the Netherlands), and a visiting scholar at the University of Waterloo.

I direct the Data Intelligence Lab, which focuses on finding good data and building smart analytics, both fundamental to data management, data science, and artificial intelligence. Our main research directions are:

  • Retrieval-based language models using multi-modal data lakes. Data lakes have become increasingly popular in many organizations. Given a natural language question, retrieving relevant datasets (e.g., text, tables, graphs) and reasoning over them with language models are key to business intelligence.
  • Good data for AI (a.k.a. data-centric AI). For most machine learning practitioners, the success of a machine learning project depends heavily on finding good data for model training.
  • AI for good data. Data scientists are commonly reported to spend around 80% of their time on data preparation. Machine learning models can help address many of these data preparation challenges.
  • Visualization. Data visualization is central to data analytics. I work on automatic visualization, visualization recommendation, chat-to-story, chat-to-video, and visualization with AR/VR devices.

Office: E3 601
E-mail: nantang (at) hkust-gz.edu.cn
Phone: (+86)-20-88330888

news

Nov 2, 2024 :pencil: [SIGMOD 2025] Paper “Automatic Database Configuration Debugging using Retrieval-Augmented Language Models” was accepted.
Sep 26, 2024 :pencil: [NeurIPS 2024] Two papers, “Are Large Language Models Good Statisticians?” and “CRAG - Comprehensive RAG Benchmark”, were accepted by NeurIPS 2024 Datasets and Benchmarks Track.
Sep 20, 2024 :pencil: [EMNLP 2024] Two papers, “MAR: Matching-Augmented Reasoning for Enhancing Visual-based Entity Question Answering” (main) and “ChartInsights: Evaluating Multimodal Large Language Models for Low-Level Chart Question Answering” (findings), were accepted.
Jul 15, 2024 :pencil: [VLDB 2024] Six papers, (1) “MisDetect: Iterative Mislabel Detection using Early Loss”, (2) “LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes”, (3) “Combining Small Language Models and Large Language Models for Zero-Shot NL2SQL”, (4) “Are Large Language Models a Good Replacement of Taxonomies?”, (5) “HAIChart: Human and AI Paired Visualization System”, (6) “The Dawn of Natural Language to SQL: Are We Fully Ready?”, and two demos, (i) “Retrieval-Based Tabular Data Cleaning Using LLMs and Data Lake”, (ii) “LakeCompass: An End-to-End System for Table Maintenance, Search and Analysis in Data Lakes”, were accepted.
Mar 18, 2024 :pencil: [SIGMOD 2024] Paper “Controllable Tabular Data Synthesis Using Diffusion Models” and two demos, “IDE: A System for Iterative Mislabel Detection” and “ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions”, were accepted.