Publications

Publications by category in reverse chronological order.

2024

  1. Controllable Tabular Data Synthesis Using Diffusion Models
    Tongyu Liu, Ju Fan, Nan Tang, Guoliang Li, and 1 more author
    Proc. ACM Manag. Data, 2024
  2. MisDetect: Iterative Mislabel Detection using Early Loss
    Yuhao Deng, Chengliang Chai, Lei Cao, Nan Tang, and 4 more authors
    Proc. VLDB Endow., 2024
  3. LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes
    Yuhao Deng, Chengliang Chai, Lei Cao, Qin Yuan, and 14 more authors
    Proc. VLDB Endow., 2024
  4. Unicorn: A Unified Multi-Tasking Matching Model
    Ju Fan, Jianhong Tu, Guoliang Li, Peng Wang, and 4 more authors
    SIGMOD Rec., 2024
  5. Tabular data synthesis with generative adversarial networks: design space and optimizations
    Tongyu Liu, Ju Fan, Guoliang Li, Nan Tang, and 1 more author
    VLDB J., 2024
  6. VerifAI: Verified Generative AI
    Nan Tang, Chenyu Yang, Ju Fan, Lei Cao, and 2 more authors
    In 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, CA, USA, January 14-17, 2024, 2024
  7. ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions
    Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, and 4 more authors
    In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024, 2024
  8. IDE: A System for Iterative Mislabel Detection
    Yuhao Deng, Qiyan Deng, Chengliang Chai, Lei Cao, and 5 more authors
    In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024, 2024

2023

  1. Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration
    Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, and 4 more authors
    Proc. ACM Manag. Data, 2023
  2. Learned Data-aware Image Representations of Line Charts for Similarity Search
    Yuyu Luo, Yihui Zhou, Nan Tang, Guoliang Li, and 2 more authors
    Proc. ACM Manag. Data, 2023
  3. HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation
    Sibei Chen, Nan Tang, Ju Fan, Xuemi Yan, and 3 more authors
    Proc. ACM Manag. Data, 2023
  4. Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning
    Zihui Gu, Ju Fan, Nan Tang, Lei Cao, and 3 more authors
    Proc. ACM Manag. Data, 2023
  5. GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data
    Chengliang Chai, Jiabin Liu, Nan Tang, Ju Fan, and 4 more authors
    Proc. ACM Manag. Data, 2023
  6. Road-Aware Indexing for Trajectory Range Queries
    Yong Wang, Kaiyu Li, Guoliang Li, and Nan Tang
    IEEE Trans. Knowl. Data Eng., 2023
  7. HOFD: An Outdated Fact Detector for Knowledge Bases
    Shuang Hao, Chengliang Chai, Guoliang Li, Nan Tang, and 2 more authors
    IEEE Trans. Knowl. Data Eng., 2023
  8. Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes
    Zui Chen, Zihui Gu, Lei Cao, Ju Fan, and 2 more authors
    In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11, 2023, 2023
  9. Efficient Coreset Selection with Cluster-based Methods
    Chengliang Chai, Jiayi Wang, Nan Tang, Ye Yuan, and 3 more authors
    In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023, 2023
  10. Demystifying Artificial Intelligence for Data Preparation
    Chengliang Chai, Nan Tang, Ju Fan, and Yuyu Luo
    In Companion of the 2023 International Conference on Management of Data, SIGMOD/PODS 2023, Seattle, WA, USA, June 18-23, 2023, 2023
  11. Pay "Attention" to Chart Images for What You Read on Text
    Chenyu Yang, Ruixue Fan, Nan Tang, Meihui Zhang, and 3 more authors
    In Companion of the 2023 International Conference on Management of Data, SIGMOD/PODS 2023, Seattle, WA, USA, June 18-23, 2023, 2023
  12. RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes
    Mohammad Shahmeer Ahmad, Zan Ahmad Naeem, Mohamed Y. Eltabakh, Mourad Ouzzani, and 1 more author
    CoRR, 2023
  13. ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions
    Sibei Chen, Hanbing Liu, Weiting Jin, Xiangyu Sun, and 4 more authors
    CoRR, 2023
  14. Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation
    Zihui Gu, Ju Fan, Nan Tang, Songyue Zhang, and 6 more authors
    CoRR, 2023
  15. VerifAI: Verified Generative AI
    Nan Tang, Chenyu Yang, Ju Fan, and Lei Cao
    CoRR, 2023
  16. SEED: Simple, Efficient, and Effective Data Management via Large Language Models
    Zui Chen, Lei Cao, Sam Madden, Ju Fan, and 6 more authors
    CoRR, 2023

2022

  1. AlphaQO: Robust Learned Query Optimizer
    Xiang Yu, Chengliang Chai, Xinning Zhang, Nan Tang, and 2 more authors
    Int. J. Softw. Informatics, 2022
  2. Preface
    Guoliang Li, Nan Tang, and Chengliang Chai
    J. Comput. Sci. Technol., 2022
  3. Selective Data Acquisition in the Wild for Model Charging
    Chengliang Chai, Jiabin Liu, Nan Tang, Guoliang Li, and 1 more author
    Proc. VLDB Endow., 2022
  4. DADER: Hands-Off Entity Resolution with Domain Adaptation
    Jianhong Tu, Xiaoyue Han, Ju Fan, Nan Tang, and 3 more authors
    Proc. VLDB Endow., 2022
  5. Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning
    Jiayi Wang, Chengliang Chai, Nan Tang, Jiabin Liu, and 1 more author
    Proc. VLDB Endow., 2022
  6. Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks
    Jinfeng Peng, Derong Shen, Nan Tang, Tieying Liu, and 4 more authors
    Proc. VLDB Endow., 2022
  7. Steerable Self-Driving Data Visualization
    Yuyu Luo, Xuedi Qin, Chengliang Chai, Nan Tang, and 2 more authors
    IEEE Trans. Knowl. Data Eng., 2022
  8. Natural Language to Visualization by Neural Machine Translation
    Yuyu Luo, Nan Tang, Guoliang Li, Jiawei Tang, and 2 more authors
    IEEE Trans. Vis. Comput. Graph., 2022
  9. Interactively discovering and ranking desired tuples by data exploration
    Xuedi Qin, Chengliang Chai, Yuyu Luo, Tianyu Zhao, and 5 more authors
    VLDB J., 2022
  10. PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training
    Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, and 2 more authors
    In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, 2022
  11. Synthesizing Privacy Preserving Entity Resolution Datasets
    Xuedi Qin, Chengliang Chai, Nan Tang, Jian Li, and 3 more authors
    In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022, 2022
  12. Feature Augmentation with Reinforcement Learning
    Jiabin Liu, Chengliang Chai, Yuyu Luo, Yin Lou, and 2 more authors
    In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022, 2022
  13. Domain Adaptation for Deep Entity Resolution
    Jianhong Tu, Ju Fan, Nan Tang, Peng Wang, and 4 more authors
    In SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022, 2022
  14. PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training
    Zihui Gu, Ju Fan, Nan Tang, Preslav Nakov, and 2 more authors
    CoRR, 2022

2021

  1. Adaptive Data Augmentation for Supervised Learning over Missing Data
    Tongyu Liu, Ju Fan, Yinqing Luo, Nan Tang, and 2 more authors
    Proc. VLDB Endow., 2021
  2. RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation
    Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, and 4 more authors
    Proc. VLDB Endow., 2021
  3. Deep Learning for Blocking in Entity Matching: A Design Space Exploration
    Saravanan Thirumuruganathan, Han Li, Nan Tang, Mourad Ouzzani, and 4 more authors
    Proc. VLDB Endow., 2021
  4. Automatic Data Acquisition for Deep Learning
    Jiabin Liu, Fu Zhu, Chengliang Chai, Yuyu Luo, and 1 more author
    Proc. VLDB Endow., 2021
  5. Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation
    Ji Sun, Jintao Zhang, Zhaoyan Sun, Guoliang Li, and 1 more author
    Proc. VLDB Endow., 2021
  6. Mis-categorized entities detection
    Shuang Hao, Nan Tang, Guoliang Li, Jianhua Feng, and 1 more author
    VLDB J., 2021
  7. Ranking Desired Tuples by Database Exploration
    Xuedi Qin, Chengliang Chai, Yuyu Luo, Tianyu Zhao, and 5 more authors
    In 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19-22, 2021, 2021
  8. Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks
    Yuyu Luo, Nan Tang, Guoliang Li, Chengliang Chai, and 2 more authors
    In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, 2021
  9. Learned Cardinality Estimation for Similarity Queries
    Ji Sun, Guoliang Li, and Nan Tang
    In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021, 2021

2020

  1. DeepEye: A Data Science System for Monitoring and Exploring COVID-19 Data
    Yuyu Luo, Nan Tang, Guoliang Li, Wenbo Li, and 2 more authors
    IEEE Data Eng. Bull., 2020
  2. Deductive optimization of relational data storage
    John K. Feser, Sam Madden, Nan Tang, and Armando Solar-Lezama
    Proc. ACM Program. Lang., 2020
  3. Pattern Functional Dependencies for Data Cleaning
    Abdulhakim Ali Qahtan, Nan Tang, Mourad Ouzzani, Yang Cao, and 1 more author
    Proc. VLDB Endow., 2020
  4. VisClean: Interactive Cleaning for Progressive Visualization
    Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, and 1 more author
    Proc. VLDB Endow., 2020
  5. DeepTrack: Monitoring and Exploring Spatio-Temporal Data - A Case of Tracking COVID-19 -
    Yuyu Luo, Wenbo Li, Tianyu Zhao, Xiang Yu, and 3 more authors
    Proc. VLDB Endow., 2020
  6. Debugging Large-Scale Data Science Pipelines using Dagger
    El Kindi Rezig, Ashrita Brahmaroutu, Nesime Tatbul, Mourad Ouzzani, and 4 more authors
    Proc. VLDB Endow., 2020
  7. Making data visualization more efficient and effective: a survey
    Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li
    VLDB J., 2020
  8. Dagger: A Data (not code) Debugger
    El Kindi Rezig, Lei Cao, Giovanni Simonini, Maxime Schoemans, and 4 more authors
    In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings, 2020
  9. Data Curation with Deep Learning
    Saravanan Thirumuruganathan, Nan Tang, Mourad Ouzzani, and AnHai Doan
    In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020, 2020
  10. Interactive Cleaning for Progressive Visualization through Composite Questions
    Yuyu Luo, Chengliang Chai, Xuedi Qin, Nan Tang, and 1 more author
    In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, 2020
  11. Reinforcement Learning with Tree-LSTM for Join Order Selection
    Xiang Yu, Guoliang Li, Chengliang Chai, and Nan Tang
    In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, 2020
  12. Outdated Fact Detection in Knowledge Bases
    Shuang Hao, Chengliang Chai, Guoliang Li, Nan Tang, and 2 more authors
    In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020, 2020
  13. Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries
    Xuedi Qin, Chengliang Chai, Yuyu Luo, Nan Tang, and 1 more author
    In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, 2020
  14. CoClean: Collaborative Data Cleaning
    Mashaal Musleh, Mourad Ouzzani, Nan Tang, and AnHai Doan
    In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020, 2020
  15. Relational Pretrained Transformers towards Democratizing Data Preparation [Vision]
    Nan Tang, Ju Fan, Fangyi Li, Jianhong Tu, and 4 more authors
    CoRR, 2020

2019

  1. Querying Shortest Paths on Time Dependent Road Networks
    Yong Wang, Guoliang Li, and Nan Tang
    Proc. VLDB Endow., 2019
  2. Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics
    El Kindi Rezig, Lei Cao, Michael Stonebraker, Giovanni Simonini, and 5 more authors
    Proc. VLDB Endow., 2019
  3. Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries
    Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, and 4 more authors
    ACM Trans. Database Syst., 2019
  4. Unsupervised String Transformation Learning for Entity Consolidation
    Dong Deng, Wenbo Tao, Ziawasch Abedjan, Ahmed K. Elmagarmid, and 6 more authors
    In 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019, 2019
  5. Explaining Entity Resolution Predictions: Where are we and What needs to be done?
    Saravanan Thirumuruganathan, Mourad Ouzzani, and Nan Tang
    In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019, 2019
  6. Raha: A Configuration-Free Error Detection System
    Mohammad Mahdavi, Ziawasch Abedjan, Raul Castro Fernandez, Samuel Madden, and 3 more authors
    In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, 2019
  7. ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies
    Abdulhakim Ali Qahtan, Nan Tang, Mourad Ouzzani, Yang Cao, and 1 more author
    In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, 2019
  8. Towards Democratizing Relational Data Visualization
    Nan Tang, Eugene Wu, and Guoliang Li
    In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019, 2019
  9. Data civilizer: end-to-end support for data discovery, integration, and cleaning
    Mourad Ouzzani, Nan Tang, and Raul Castro Fernandez
    In Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019
  10. Deductive Optimization of Relational Data Storage
    John K. Feser, Samuel Madden, Nan Tang, and Armando Solar-Lezama
    CoRR, 2019
  11. Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation
    Ji Sun, Dong Deng, Ihab F. Ilyas, Guoliang Li, and 4 more authors
    CoRR, 2019
  12. Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries
    Sibo Wang, Renchi Yang, Runhui Wang, Xiaokui Xiao, and 4 more authors
    CoRR, 2019
  13. Dataset-On-Demand: Automatic View Search and Presentation for Data Discovery
    Raul Castro Fernandez, Nan Tang, Mourad Ouzzani, Michael Stonebraker, and 1 more author
    CoRR, 2019

2018

  1. DeepEye: An automatic big data visualization framework
    Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li
    Big Data Min. Anal., 2018
  2. RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! -
    Divy Agrawal, Sanjay Chawla, Bertty Contreras-Rojas, Ahmed K. Elmagarmid, and 11 more authors
    Proc. VLDB Endow., 2018
  3. Distributed Representations of Tuples for Entity Resolution
    Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq R. Joty, Mourad Ouzzani, and 1 more author
    Proc. VLDB Endow., 2018
  4. Distilling relations using knowledge bases
    Shuang Hao, Nan Tang, Guoliang Li, Jian Li, and 1 more author
    VLDB J., 2018
  5. DeepEye: Visualizing Your Data by Keyword Search
    Xuedi Qin, Yuyu Luo, Nan Tang, and Guoliang Li
    In Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018, 2018
  6. DeepEye: Towards Automatic Data Visualization
    Yuyu Luo, Xuedi Qin, Nan Tang, and Guoliang Li
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  7. Discovering Mis-Categorized Entities
    Shuang Hao, Nan Tang, Guoliang Li, and Jianhua Feng
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  8. Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery
    Raul Castro Fernandez, Essam Mansour, Abdulhakim Ali Qahtan, Ahmed K. Elmagarmid, and 5 more authors
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  9. Building Data Civilizer Pipelines with an Advanced Workflow Engine
    Essam Mansour, Dong Deng, Raul Castro Fernandez, Abdulhakim Ali Qahtan, and 8 more authors
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  10. Cleaning Your Wrong Google Scholar Entries
    Shuang Hao, Yi Xu, Nan Tang, Guoliang Li, and 1 more author
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  11. FAHES: Detecting Disguised Missing Values
    Abdulhakim Ali Qahtan, Ahmed K. Elmagarmid, Mourad Ouzzani, and Nan Tang
    In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018, 2018
  12. FAHES: A Robust Disguised Missing Values Detector
    Abdulhakim Ali Qahtan, Ahmed K. Elmagarmid, Raul Castro Fernandez, Mourad Ouzzani, and 1 more author
    In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018, 2018
  13. DeepEye: Creating Good Data Visualizations by Keyword Search
    Yuyu Luo, Xuedi Qin, Nan Tang, Guoliang Li, and 1 more author
    In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018, 2018
  14. Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation
    Saravanan Thirumuruganathan, Nan Tang, and Mourad Ouzzani
    CoRR, 2018
  15. Reuse and Adaptation for Entity Resolution through Transfer Learning
    Saravanan Thirumuruganathan, Shameem Ahamed Puthiya Parambath, Mourad Ouzzani, Nan Tang, and 1 more author
    CoRR, 2018

2017

  1. Dependable Data Repairing with Fixing Rules
    Jiannan Wang and Nan Tang
    ACM J. Data Inf. Qual., 2017
  2. Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085)
    Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, and 4 more authors
    Proc. VLDB Endow., 2017
  3. Synthesizing Entity Matching Rules by Examples
    Rohit Singh, Venkata Vamsikrishna Meduri, Ahmed K. Elmagarmid, Samuel Madden, and 4 more authors
    Proc. VLDB Endow., 2017
  4. A Novel Cost-Based Model for Data Repairing
    Shuang Hao, Nan Tang, Guoliang Li, Jian He, and 2 more authors
    IEEE Trans. Knowl. Data Eng., 2017
  5. Fast and scalable inequality joins
    Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, and 4 more authors
    VLDB J., 2017
  6. The Data Civilizer System
    Dong Deng, Raul Castro Fernandez, Ziawasch Abedjan, Sibo Wang, and 6 more authors
    In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017
  7. A Novel Cost-Based Model for Data Repairing
    Shuang Hao, Nan Tang, Guoliang Li, Jian He, and 2 more authors
    In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017, 2017
  8. Cleaning Relations Using Knowledge Bases
    Shuang Hao, Nan Tang, Guoliang Li, and Jian Li
    In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017, 2017
  9. Interactive Data Repairing: the FALCON Dive
    Enzo Veltri, Donatello Santoro, Giansalvatore Mecca, Paolo Papotti, and 3 more authors
    In Proceedings of the 25th Italian Symposium on Advanced Database Systems, Squillace Lido (Catanzaro), Italy, June 25-29, 2017, 2017
  10. UGuide: User-Guided Discovery of FD-Detectable Errors
    Saravanan Thirumuruganathan, Laure Berti-Équille, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, and 1 more author
    In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, 2017
  11. Generating Concise Entity Matching Rules
    Rohit Singh, Venkata Vamsikrishna Meduri, Ahmed K. Elmagarmid, Samuel Madden, and 4 more authors
    In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, 2017
  12. A Demo of the Data Civilizer System
    Raul Castro Fernandez, Dong Deng, Essam Mansour, Abdulhakim Ali Qahtan, and 8 more authors
    In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017, 2017
  13. Entity Consolidation: The Golden Record Problem
    Dong Deng, Wenbo Tao, Ziawasch Abedjan, Ahmed K. Elmagarmid, and 5 more authors
    CoRR, 2017
  14. DeepER - Deep Entity Resolution
    Muhammad Ebraheem, Saravanan Thirumuruganathan, Shafiq R. Joty, Mourad Ouzzani, and 1 more author
    CoRR, 2017

2016

  1. Detecting Data Errors: Where are we and what needs to be done?
    Ziawasch Abedjan, Xu Chu, Dong Deng, Raul Castro Fernandez, and 5 more authors
    Proc. VLDB Endow., 2016
  2. Road to Freedom in Big Data Analytics
    Divy Agrawal, Sanjay Chawla, Ahmed K. Elmagarmid, Zoi Kaoudi, and 5 more authors
    In Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016, 2016
  3. Interactive and Deterministic Data Cleaning
    Jian He, Enzo Veltri, Donatello Santoro, Guoliang Li, and 3 more authors
    In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, 2016
  4. Graph Stream Summarization: From Big Bang to Big Crunch
    Nan Tang, Qing Chen, and Prasenjit Mitra
    In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, 2016
  5. Rheem: Enabling Multi-Platform Task Execution
    Divy Agrawal, Mouhamadou Lamine Ba, Laure Berti-Équille, Sanjay Chawla, and 11 more authors
    In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016, 2016

2015

  1. KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing
    Xu Chu, Mourad Ouzzani, John Morcos, Ihab F. Ilyas, and 3 more authors
    Proc. VLDB Endow., 2015
  2. Lightning Fast and Space Efficient Inequality Joins
    Zuhair Khayyat, William Lucia, Meghna Singh, Mourad Ouzzani, and 4 more authors
    Proc. VLDB Endow., 2015
  3. Proof positive and negative in data cleaning
    Matteo Interlandi and Nan Tang
    In 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015, 2015
  4. Big RDF data cleaning
    Nan Tang
    In 31st IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2015, Seoul, South Korea, April 13-17, 2015, 2015
  5. BigDansing: A System for Big Data Cleansing
    Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, and 5 more authors
    In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, 2015
  6. KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing
    Xu Chu, John Morcos, Ihab F. Ilyas, Mourad Ouzzani, and 3 more authors
    In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015, 2015
  7. On Summarizing Graph Streams
    Nan Tang, Qing Chen, and Prasenjit Mitra
    CoRR, 2015

2014

  1. Interaction between Record Matching and Data Repairing
    Wenfei Fan, Shuai Ma, Nan Tang, and Wenyuan Yu
    ACM J. Data Inf. Qual., 2014
  2. Conflict resolution with data currency and consistency
    Wenfei Fan, Floris Geerts, Nan Tang, and Wenyuan Yu
    ACM J. Data Inf. Qual., 2014
  3. Incremental Detection of Inconsistencies in Distributed Data
    Wenfei Fan, Jianzhong Li, Nan Tang, and Wenyuan Yu
    IEEE Trans. Knowl. Data Eng., 2014
  4. Big Data Cleaning
    Nan Tang
    In Web Technologies and Applications - 16th Asia-Pacific Web Conference, APWeb 2014, Changsha, China, September 5-7, 2014. Proceedings, 2014
  5. Towards dependable data repairing with fixing rules
    Jiannan Wang and Nan Tang
    In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, 2014
  6. NADEEF/ER: generic and interactive entity resolution
    Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, Jorge-Arnulfo Quiané-Ruiz, and 2 more authors
    In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014, 2014

2013

  1. NADEEF: A Generalized Data Cleaning System
    Amr Ebaid, Ahmed K. Elmagarmid, Ihab F. Ilyas, Mourad Ouzzani, and 3 more authors
    Proc. VLDB Endow., 2013
  2. Data Quality Problems beyond Consistency and Deduplication
    Wenfei Fan, Floris Geerts, Shuai Ma, Nan Tang, and 1 more author
    In In Search of Elegance in the Theory and Practice of Computation - Essays Dedicated to Peter Buneman, 2013
  3. Inferring data currency and consistency for conflict resolution
    Wenfei Fan, Floris Geerts, Nan Tang, and Wenyuan Yu
    In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013, 2013
  4. NADEEF: a commodity data cleaning system
    Michele Dallachiesa, Amr Ebaid, Ahmed Eldawy, Ahmed K. Elmagarmid, and 3 more authors
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013, 2013

2012

  1. Adding regular expressions to graph reachability and pattern queries
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    Frontiers Comput. Sci., 2012
  2. The data analytics group at the Qatar Computing Research Institute
    George Beskales, Gautam Das, Ahmed K. Elmagarmid, Ihab F. Ilyas, and 5 more authors
    SIGMOD Rec., 2012
  3. Towards certain fixes with editing rules and master data
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    VLDB J., 2012
  4. Incremental Detection of Inconsistencies in Distributed Data
    Wenfei Fan, Jianzhong Li, Nan Tang, and Wenyuan Yu
    In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012, 2012

2011

  1. CerFix: A System for Cleaning Data with Certain Fixes
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    Proc. VLDB Endow., 2011
  2. Adding regular expressions to graph reachability and pattern queries
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, 2011
  3. Interaction between record matching and data repairing
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011, 2011

2010

  1. Towards Certain Fixes with Editing Rules and Master Data
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 1 more author
    Proc. VLDB Endow., 2010
  2. Graph Pattern Matching: From Intractable to Polynomial Time
    Wenfei Fan, Jianzhong Li, Shuai Ma, Nan Tang, and 2 more authors
    Proc. VLDB Endow., 2010
  3. Projective Distribution of XQuery with Updates
    Ying Zhang, Nan Tang, and Peter A. Boncz
    IEEE Trans. Knowl. Data Eng., 2010

2009

  1. Space-economical partial gram indices for exact substring matching
    Nan Tang, Lefteris Sidirourgos, and Peter A. Boncz
    In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009, 2009
  2. Materialized View Selection in XML Databases
    Nan Tang, Jeffrey Xu Yu, Hao Tang, M. Tamer Özsu, and 1 more author
    In Database Systems for Advanced Applications, 14th International Conference, DASFAA 2009, Brisbane, Australia, April 21-23, 2009. Proceedings, 2009
  3. Efficient Distribution of Full-Fledged XQuery
    Ying Zhang, Nan Tang, and Peter A. Boncz
    In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, 2009

2008

  1. Fast XML Structural Join Algorithms by Partitioning
    Nan Tang, Jeffrey Xu Yu, Kam-Fai Wong, and Jianxin Li
    J. Res. Pract. Inf. Technol., 2008
  2. Multiple Materialized View Selection for XPath Query Rewriting
    Nan Tang, Jeffrey Xu Yu, M. Tamer Özsu, Byron Choi, and 1 more author
    In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, 2008
  3. Hierarchical Indexing Approach to Support XPath Queries
    Nan Tang, Jeffrey Xu Yu, M. Tamer Özsu, and Kam-Fai Wong
    In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, 2008

2007

  1. Efficient Xpath query processing in native XML databases
    Nan Tang
    Chinese University of Hong Kong, Hong Kong, 2007

2006

  1. Answering XML Queries Using Path-Based Indexes: A Survey
    Kam-Fai Wong, Jeffrey Xu Yu, and Nan Tang
    World Wide Web, 2006
  2. Fast Reachability Query Processing
    Jiefeng Cheng, Jeffrey Xu Yu, and Nan Tang
    In Database Systems for Advanced Applications, 11th International Conference, DASFAA 2006, Singapore, April 12-15, 2006, Proceedings, 2006
  3. Fast Structural Join with a Location Function
    Nan Tang, Jeffrey Xu Yu, Kam-Fai Wong, and Haifeng Jiang
    In Database Systems for Advanced Applications, 11th International Conference, DASFAA 2006, Singapore, April 12-15, 2006, Proceedings, 2006

2005

  1. Accelerating XML Structural Join by Partitioning
    Nan Tang, Jeffrey Xu Yu, Kam-Fai Wong, Kevin Lü, and 1 more author
    In Database and Expert Systems Applications, 16th International Conference, DEXA 2005, Copenhagen, Denmark, August 22-26, 2005, Proceedings, 2005
  2. WIN: An Efficient Data Placement Strategy for Parallel XML Databases
    Nan Tang, Guoren Wang, Jeffrey Xu Yu, Kam-Fai Wong, and 1 more author
    In 11th International Conference on Parallel and Distributed Systems, ICPADS 2005, Fukuoka, Japan, July 20-22, 2005, 2005

2004

  1. Answering XML Twig Queries with Automata
    Bing Sun, Bo Zhou, Nan Tang, Guoren Wang, and 2 more authors
    In Advanced Web Technologies and Applications, 6th Asia-Pacific Web Conference, APWeb 2004, Hangzhou, China, April 14-17, 2004, Proceedings, 2004

2003

  1. Data Placement and Query Processing Based on RPE Parallelisms
    Yaxin Yu, Guoren Wang, Ge Yu, Gang Wu, and 2 more authors
    In 27th International Computer Software and Applications Conference (COMPSAC 2003): Design and Assessment of Trustworthy Software-Based Systems, 3-6 November 2003, Dallas, TX, USA, Proceedings, 2003