Publications
Publications by year, in reverse chronological order.
2024
- Controllable Tabular Data Synthesis Using Diffusion Models. Proc. ACM Manag. Data, 2024
- MisDetect: Iterative Mislabel Detection using Early Loss. Proc. VLDB Endow., 2024
- LakeBench: A Benchmark for Discovering Joinable and Unionable Tables in Data Lakes. Proc. VLDB Endow., 2024
- Unicorn: A Unified Multi-Tasking Matching Model. SIGMOD Rec., 2024
- Tabular data synthesis with generative adversarial networks: design space and optimizations. VLDB J., 2024
- VerifAI: Verified Generative AI. In 14th Conference on Innovative Data Systems Research, CIDR 2024, Chaminade, CA, USA, January 14-17, 2024
- ChatPipe: Orchestrating Data Preparation Pipelines by Optimizing Human-ChatGPT Interactions. In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024
- IDE: A System for Iterative Mislabel Detection. In Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024
2023
- Unicorn: A Unified Multi-tasking Model for Supporting Matching Tasks in Data Integration. Proc. ACM Manag. Data, 2023
- Learned Data-aware Image Representations of Line Charts for Similarity Search. Proc. ACM Manag. Data, 2023
- HAIPipe: Combining Human-generated and Machine-generated Pipelines for Data Preparation. Proc. ACM Manag. Data, 2023
- Few-shot Text-to-SQL Translation using Structure and Content Prompt Learning. Proc. ACM Manag. Data, 2023
- GoodCore: Data-effective and Data-efficient Machine Learning through Coreset Selection over Incomplete Data. Proc. ACM Manag. Data, 2023
- Road-Aware Indexing for Trajectory Range Queries. IEEE Trans. Knowl. Data Eng., 2023
- HOFD: An Outdated Fact Detector for Knowledge Bases. IEEE Trans. Knowl. Data Eng., 2023
- Symphony: Towards Natural Language Query Answering over Multi-modal Data Lakes. In 13th Conference on Innovative Data Systems Research, CIDR 2023, Amsterdam, The Netherlands, January 8-11, 2023
- Efficient Coreset Selection with Cluster-based Methods. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2023, Long Beach, CA, USA, August 6-10, 2023
- Demystifying Artificial Intelligence for Data Preparation. In Companion of the 2023 International Conference on Management of Data, SIGMOD/PODS 2023, Seattle, WA, USA, June 18-23, 2023
- Pay "Attention" to Chart Images for What You Read on Text. In Companion of the 2023 International Conference on Management of Data, SIGMOD/PODS 2023, Seattle, WA, USA, June 18-23, 2023
- RetClean: Retrieval-Based Data Cleaning Using Foundation Models and Data Lakes. CoRR, 2023
- ChatPipe: Orchestrating Data Preparation Program by Optimizing Human-ChatGPT Interactions. CoRR, 2023
- Interleaving Pre-Trained Language Models and Large Language Models for Zero-Shot NL2SQL Generation. CoRR, 2023
- VerifAI: Verified Generative AI. CoRR, 2023
- SEED: Simple, Efficient, and Effective Data Management via Large Language Models. CoRR, 2023
2022
- AlphaQO: Robust Learned Query Optimizer. Int. J. Softw. Informatics, 2022
- Preface. J. Comput. Sci. Technol., 2022
- Selective Data Acquisition in the Wild for Model Charging. Proc. VLDB Endow., 2022
- DADER: Hands-Off Entity Resolution with Domain Adaptation. Proc. VLDB Endow., 2022
- Coresets over Multiple Tables for Feature-rich and Data-efficient Machine Learning. Proc. VLDB Endow., 2022
- Self-supervised and Interpretable Data Cleaning with Sequence Generative Adversarial Networks. Proc. VLDB Endow., 2022
- Steerable Self-Driving Data Visualization. IEEE Trans. Knowl. Data Eng., 2022
- Natural Language to Visualization by Neural Machine Translation. IEEE Trans. Vis. Comput. Graph., 2022
- Interactively discovering and ranking desired tuples by data exploration. VLDB J., 2022
- PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022
- Synthesizing Privacy Preserving Entity Resolution Datasets. In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022
- Feature Augmentation with Reinforcement Learning. In 38th IEEE International Conference on Data Engineering, ICDE 2022, Kuala Lumpur, Malaysia, May 9-12, 2022
- Domain Adaptation for Deep Entity Resolution. In SIGMOD ’22: International Conference on Management of Data, Philadelphia, PA, USA, June 12 - 17, 2022
- PASTA: Table-Operations Aware Fact Verification via Sentence-Table Cloze Pre-training. CoRR, 2022
2021
- Adaptive Data Augmentation for Supervised Learning over Missing Data. Proc. VLDB Endow., 2021
- RPT: Relational Pre-trained Transformer Is Almost All You Need towards Democratizing Data Preparation. Proc. VLDB Endow., 2021
- Deep Learning for Blocking in Entity Matching: A Design Space Exploration. Proc. VLDB Endow., 2021
- Automatic Data Acquisition for Deep Learning. Proc. VLDB Endow., 2021
- Learned Cardinality Estimation: A Design Space Exploration and A Comparative Evaluation. Proc. VLDB Endow., 2021
- Mis-categorized entities detection. VLDB J., 2021
- Ranking Desired Tuples by Database Exploration. In 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19-22, 2021
- Synthesizing Natural Language to Visualization (NL2VIS) Benchmarks from NL2SQL Benchmarks. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021
- Learned Cardinality Estimation for Similarity Queries. In SIGMOD ’21: International Conference on Management of Data, Virtual Event, China, June 20-25, 2021
2020
- DeepEye: A Data Science System for Monitoring and Exploring COVID-19 Data. IEEE Data Eng. Bull., 2020
- Deductive optimization of relational data storage. Proc. ACM Program. Lang., 2020
- Pattern Functional Dependencies for Data Cleaning. Proc. VLDB Endow., 2020
- VisClean: Interactive Cleaning for Progressive Visualization. Proc. VLDB Endow., 2020
- DeepTrack: Monitoring and Exploring Spatio-Temporal Data - A Case of Tracking COVID-19. Proc. VLDB Endow., 2020
- Debugging Large-Scale Data Science Pipelines using Dagger. Proc. VLDB Endow., 2020
- Making data visualization more efficient and effective: a survey. VLDB J., 2020
- Dagger: A Data (not code) Debugger. In 10th Conference on Innovative Data Systems Research, CIDR 2020, Amsterdam, The Netherlands, January 12-15, 2020, Online Proceedings, 2020
- Data Curation with Deep Learning. In Proceedings of the 23rd International Conference on Extending Database Technology, EDBT 2020, Copenhagen, Denmark, March 30 - April 02, 2020
- Interactive Cleaning for Progressive Visualization through Composite Questions. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020
- Reinforcement Learning with Tree-LSTM for Join Order Selection. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020
- Outdated Fact Detection in Knowledge Bases. In 36th IEEE International Conference on Data Engineering, ICDE 2020, Dallas, TX, USA, April 20-24, 2020
- Interactively Discovering and Ranking Desired Tuples without Writing SQL Queries. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020
- CoClean: Collaborative Data Cleaning. In Proceedings of the 2020 International Conference on Management of Data, SIGMOD Conference 2020, online conference [Portland, OR, USA], June 14-19, 2020
- Relational Pretrained Transformers towards Democratizing Data Preparation [Vision]. CoRR, 2020
2019
- Querying Shortest Paths on Time Dependent Road Networks. Proc. VLDB Endow., 2019
- Data Civilizer 2.0: A Holistic Framework for Data Preparation and Analytics. Proc. VLDB Endow., 2019
- Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries. ACM Trans. Database Syst., 2019
- Unsupervised String Transformation Learning for Entity Consolidation. In 35th IEEE International Conference on Data Engineering, ICDE 2019, Macao, China, April 8-11, 2019
- Explaining Entity Resolution Predictions: Where are we and What needs to be done? In Proceedings of the Workshop on Human-In-the-Loop Data Analytics, HILDA@SIGMOD 2019, Amsterdam, The Netherlands, July 5, 2019
- Raha: A Configuration-Free Error Detection System. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019
- ANMAT: Automatic Knowledge Discovery and Error Detection through Pattern Functional Dependencies. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019
- Towards Democratizing Relational Data Visualization. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD Conference 2019, Amsterdam, The Netherlands, June 30 - July 5, 2019
- Data civilizer: end-to-end support for data discovery, integration, and cleaning. In Making Databases Work: the Pragmatic Wisdom of Michael Stonebraker, 2019
- Deductive Optimization of Relational Data Storage. CoRR, 2019
- Technical Report: Optimizing Human Involvement for Entity Matching and Consolidation. CoRR, 2019
- Efficient Algorithms for Approximate Single-Source Personalized PageRank Queries. CoRR, 2019
- Dataset-On-Demand: Automatic View Search and Presentation for Data Discovery. CoRR, 2019
2018
- DeepEye: An automatic big data visualization framework. Big Data Min. Anal., 2018
- RHEEM: Enabling Cross-Platform Data Processing - May The Big Data Be With You! Proc. VLDB Endow., 2018
- Distributed Representations of Tuples for Entity Resolution. Proc. VLDB Endow., 2018
- Distilling relations using knowledge bases. VLDB J., 2018
- DeepEye: Visualizing Your Data by Keyword Search. In Proceedings of the 21st International Conference on Extending Database Technology, EDBT 2018, Vienna, Austria, March 26-29, 2018
- DeepEye: Towards Automatic Data Visualization. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- Discovering Mis-Categorized Entities. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- Seeping Semantics: Linking Datasets Using Word Embeddings for Data Discovery. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- Building Data Civilizer Pipelines with an Advanced Workflow Engine. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- Cleaning Your Wrong Google Scholar Entries. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- FAHES: Detecting Disguised Missing Values. In 34th IEEE International Conference on Data Engineering, ICDE 2018, Paris, France, April 16-19, 2018
- FAHES: A Robust Disguised Missing Values Detector. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD 2018, London, UK, August 19-23, 2018
- DeepEye: Creating Good Data Visualizations by Keyword Search. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10-15, 2018
- Data Curation with Deep Learning [Vision]: Towards Self Driving Data Curation. CoRR, 2018
- Reuse and Adaptation for Entity Resolution through Transfer Learning. CoRR, 2018
2017
- Dependable Data Repairing with Fixing Rules. ACM J. Data Inf. Qual., 2017
- Errata for "Lightning Fast and Space Efficient Inequality Joins" (PVLDB 8(13): 2074-2085). Proc. VLDB Endow., 2017
- Synthesizing Entity Matching Rules by Examples. Proc. VLDB Endow., 2017
- A Novel Cost-Based Model for Data Repairing. IEEE Trans. Knowl. Data Eng., 2017
- Fast and scalable inequality joins. VLDB J., 2017
- The Data Civilizer System. In 8th Biennial Conference on Innovative Data Systems Research, CIDR 2017, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings, 2017
- A Novel Cost-Based Model for Data Repairing. In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017
- Cleaning Relations Using Knowledge Bases. In 33rd IEEE International Conference on Data Engineering, ICDE 2017, San Diego, CA, USA, April 19-22, 2017
- Interactive Data Repairing: the FALCON Dive. In Proceedings of the 25th Italian Symposium on Advanced Database Systems, Squillace Lido (Catanzaro), Italy, June 25-29, 2017
- UGuide: User-Guided Discovery of FD-Detectable Errors. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017
- Generating Concise Entity Matching Rules. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017
- A Demo of the Data Civilizer System. In Proceedings of the 2017 ACM International Conference on Management of Data, SIGMOD Conference 2017, Chicago, IL, USA, May 14-19, 2017
- Entity Consolidation: The Golden Record Problem. CoRR, 2017
- DeepER - Deep Entity Resolution. CoRR, 2017
2016
- Detecting Data Errors: Where are we and what needs to be done? Proc. VLDB Endow., 2016
- Road to Freedom in Big Data Analytics. In Proceedings of the 19th International Conference on Extending Database Technology, EDBT 2016, Bordeaux, France, March 15-16, 2016
- Interactive and Deterministic Data Cleaning. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016
- Graph Stream Summarization: From Big Bang to Big Crunch. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016
- Rheem: Enabling Multi-Platform Task Execution. In Proceedings of the 2016 International Conference on Management of Data, SIGMOD Conference 2016, San Francisco, CA, USA, June 26 - July 01, 2016
2015
- KATARA: Reliable Data Cleaning with Knowledge Bases and Crowdsourcing. Proc. VLDB Endow., 2015
- Lightning Fast and Space Efficient Inequality Joins. Proc. VLDB Endow., 2015
- Proof positive and negative in data cleaning. In 31st IEEE International Conference on Data Engineering, ICDE 2015, Seoul, South Korea, April 13-17, 2015
- Big RDF data cleaning. In 31st IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2015, Seoul, South Korea, April 13-17, 2015
- BigDansing: A System for Big Data Cleansing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015
- KATARA: A Data Cleaning System Powered by Knowledge Bases and Crowdsourcing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, Melbourne, Victoria, Australia, May 31 - June 4, 2015
- On Summarizing Graph Streams. CoRR, 2015
2014
- Interaction between Record Matching and Data Repairing. ACM J. Data Inf. Qual., 2014
- Conflict resolution with data currency and consistency. ACM J. Data Inf. Qual., 2014
- Incremental Detection of Inconsistencies in Distributed Data. IEEE Trans. Knowl. Data Eng., 2014
- Big Data Cleaning. In Web Technologies and Applications - 16th Asia-Pacific Web Conference, APWeb 2014, Changsha, China, September 5-7, 2014. Proceedings, 2014
- Towards dependable data repairing with fixing rules. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014
- NADEEF/ER: generic and interactive entity resolution. In International Conference on Management of Data, SIGMOD 2014, Snowbird, UT, USA, June 22-27, 2014
2013
- NADEEF: A Generalized Data Cleaning System. Proc. VLDB Endow., 2013
- Data Quality Problems beyond Consistency and Deduplication. In In Search of Elegance in the Theory and Practice of Computation - Essays Dedicated to Peter Buneman, 2013
- Inferring data currency and consistency for conflict resolution. In 29th IEEE International Conference on Data Engineering, ICDE 2013, Brisbane, Australia, April 8-12, 2013
- NADEEF: a commodity data cleaning system. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2013, New York, NY, USA, June 22-27, 2013
2012
- Adding regular expressions to graph reachability and pattern queries. Frontiers Comput. Sci., 2012
- The data analytics group at the Qatar Computing Research Institute. SIGMOD Rec., 2012
- Towards certain fixes with editing rules and master data. VLDB J., 2012
- Incremental Detection of Inconsistencies in Distributed Data. In IEEE 28th International Conference on Data Engineering (ICDE 2012), Washington, DC, USA (Arlington, Virginia), 1-5 April, 2012
2011
- CerFix: A System for Cleaning Data with Certain Fixes. Proc. VLDB Endow., 2011
- Adding regular expressions to graph reachability and pattern queries. In Proceedings of the 27th International Conference on Data Engineering, ICDE 2011, April 11-16, 2011, Hannover, Germany, 2011
- Interaction between record matching and data repairing. In Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD 2011, Athens, Greece, June 12-16, 2011
2010
- Towards Certain Fixes with Editing Rules and Master Data. Proc. VLDB Endow., 2010
- Graph Pattern Matching: From Intractable to Polynomial Time. Proc. VLDB Endow., 2010
- Projective Distribution of XQuery with Updates. IEEE Trans. Knowl. Data Eng., 2010
2009
- Space-economical partial gram indices for exact substring matching. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, CIKM 2009, Hong Kong, China, November 2-6, 2009
- Materialized View Selection in XML Databases. In Database Systems for Advanced Applications, 14th International Conference, DASFAA 2009, Brisbane, Australia, April 21-23, 2009. Proceedings, 2009
- Efficient Distribution of Full-Fledged XQuery. In Proceedings of the 25th International Conference on Data Engineering, ICDE 2009, March 29 2009 - April 2 2009, Shanghai, China, 2009
2008
- Fast XML Structural Join Algorithms by Partitioning. J. Res. Pract. Inf. Technol., 2008
- Multiple Materialized View Selection for XPath Query Rewriting. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, 2008
- Hierarchical Indexing Approach to Support XPath Queries. In Proceedings of the 24th International Conference on Data Engineering, ICDE 2008, April 7-12, 2008, Cancún, Mexico, 2008
2007
- Efficient XPath query processing in native XML databases. Chinese University of Hong Kong, Hong Kong, 2007
2006
- Answering XML Queries Using Path-Based Indexes: A Survey. World Wide Web, 2006
- Fast Reachability Query Processing. In Database Systems for Advanced Applications, 11th International Conference, DASFAA 2006, Singapore, April 12-15, 2006, Proceedings, 2006
- Fast Structural Join with a Location Function. In Database Systems for Advanced Applications, 11th International Conference, DASFAA 2006, Singapore, April 12-15, 2006, Proceedings, 2006
2005
- Accelerating XML Structural Join by Partitioning. In Database and Expert Systems Applications, 16th International Conference, DEXA 2005, Copenhagen, Denmark, August 22-26, 2005, Proceedings, 2005
- WIN: An Efficient Data Placement Strategy for Parallel XML Databases. In 11th International Conference on Parallel and Distributed Systems, ICPADS 2005, Fukuoka, Japan, July 20-22, 2005
2004
- Answering XML Twig Queries with Automata. In Advanced Web Technologies and Applications, 6th Asia-Pacific Web Conference, APWeb 2004, Hangzhou, China, April 14-17, 2004, Proceedings, 2004
2003
- Data Placement and Query Processing Based on RPE Parallelisms. In 27th International Computer Software and Applications Conference (COMPSAC 2003): Design and Assessment of Trustworthy Software-Based Systems, 3-6 November 2003, Dallas, TX, USA, Proceedings, 2003