Spring 2024 Reading List – Data and Information Systems

The written exam of the DAIS Qual Exam in Spring 2024 will be held on Monday, Feb. 26, 2024, from 1pm to 5pm in room 0220 Siebel Center, which is in the basement of Siebel Center (floor map: https://facilityaccessmaps.fs.illinois.edu/archibus/schema/ab-products/essential/workplace/?blId=0563&flId=00).

This reading list consists of multiple topic sections, each containing 2-3 papers. The questions in the written exam will be based on the papers listed here, with 1-2 questions related to each section. If a section has two papers, you can usually expect one question related to that section; if a section has three papers, you can usually expect two. You only need to answer four questions in the exam, so there is no need to read every paper. Instead, browse through the list, identify the 8-10 papers that you are most familiar with or most comfortable reading, and focus on reading and digesting those. You will likely find some sections closer to your interests or background than others, and you can concentrate on the papers in the few sections closest to your research interests.

Section 1

Yunyi Zhang, Minhao Jiang, Yu Meng, Yu Zhang, Jiawei Han, “PIEClass: Weakly-Supervised Text Classification with Prompting and Noise-Robust Iterative Ensemble Training”, in EMNLP’23 http://hanj.cs.illinois.edu/pdf/emnlp23_yyzhang.pdf

Jiacheng Li, Ming Wang, Jin Li, Jinmiao Fu, Xin Shen, Jingbo Shang, Julian J. McAuley: “Text Is All You Need: Learning Language Representations for Sequential Recommendation”, KDD 2023, https://dl.acm.org/doi/pdf/10.1145/3580305.3599519

Yu Zhang, Yunyi Zhang, Yanzhen Shen, Yu Deng, Lucian Popa, Larisa Shwartz, ChengXiang Zhai, and Jiawei Han, “Seed-Guided Fine-Grained Entity Typing in Science and Engineering Domains”, AAAI’24 https://hanj.cs.illinois.edu/pdf/aaai24_yzhang.pdf

Section 2

1. Martin Bichler and Soeren Merting. Randomized scheduling mechanisms: Assigning course seats in a fair and efficient way. Production and Operations Management, 30(10):3540–3559, 2021. https://onlinelibrary.wiley.com/doi/10.1111/poms.13449

2. David M Sommer, Liwei Song, Sameer Wagh, and Prateek Mittal. Athena: Probabilistic verification of machine unlearning. Proc. Privacy Enhancing Technol, 3:268–290, 2022. https://pdfs.semanticscholar.org/0b7d/657748db0f9475d2ba6027795e9cb6e054fd.pdf

3. Yixin Wang, Dawen Liang, Laurent Charlin, and David M. Blei. Causal inference for recommender systems. In Proceedings of the 14th ACM Conference on Recommender Systems, RecSys ’20, pages 426–431, New York, NY, USA, 2020. Association for Computing Machinery. https://dl.acm.org/doi/10.1145/3383313.3412225

Section 3

Khoi Pham and Kushal Kafle and Zhe Lin and Zhihong Ding and Scott Cohen and Quan Tran and Abhinav Shrivastava. 2021. Learning to Predict Visual Attributes in the Wild. Proc. CVPR2021. https://arxiv.org/abs/2106.09707

Nirat Saini and Khoi Pham and Abhinav Shrivastava. 2022. Disentangling Visual Embeddings for Attributes and Objects. Proc. CVPR2022. https://arxiv.org/abs/2205.08536

Hyundo Lee, Inwoo Hwang, Hyunsung Go, Won-Seok Choi, Kibeom Kim, Byoung-Tak Zhang. 2023. Learning Geometry-Aware Representations by Sketching. Proc. CVPR2023. https://arxiv.org/abs/2304.08204

Section 4

Chockchowwat, Supawit, Wenjie Liu, and Yongjoo Park. “AirIndex: Versatile Index Tuning Through Data and Storage.” SIGMOD’24. https://arxiv.org/pdf/2306.14395.pdf

Zhaoheng Li, Pranav Gor, Rahul Prabhu, Hui Yu, Yuzhou Mao, Yongjoo Park. “ElasticNotebook: Enabling Live Migration for Computational Notebooks.” PVLDB’23. https://arxiv.org/pdf/2309.11083.pdf

Section 5

Lin, Z., S. Trivedi, and J. Sun. 2022. “Conformal Prediction with Temporal Quantile Adjustments.” arXiv Preprint arXiv:2205.09940. http://arxiv.org/abs/2205.09940.

Yang, Chaoqi, M. Brandon Westover, and Jimeng Sun. 2023. “BIOT: Cross-Data Biosignal Learning in the Wild.” arXiv [eess.SP]. arXiv. http://arxiv.org/abs/2305.10351.

Theodorou, Brandon, Cao Xiao, and Jimeng Sun. 2023. “Author Correction: Synthesize High-Dimensional Longitudinal Electronic Health Records via Hierarchical Autoregressive Language Model.” Nature Communications 14 (1): 7586. https://www.nature.com/articles/s41467-023-41093-0

Section 6

Daniel Kang, John Guibas, Peter D. Bailis, Tatsunori Hashimoto, Matei Zaharia. TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data. SIGMOD 2022

Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia. Model Assertions for Monitoring and Improving ML Models. MLSys 2020.

Section 7

Zhu, Yutao, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Zhicheng Dou, and Ji-Rong Wen. “Large language models for information retrieval: A survey.” arXiv preprint arXiv:2308.07107 (2023). https://arxiv.org/pdf/2308.07107.pdf

Shuai Wang, Harrisen Scells, Bevan Koopman, and Guido Zuccon. 2023. Can ChatGPT Write a Good Boolean Query for Systematic Review Literature Search? In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’23). Association for Computing Machinery, New York, NY, USA, 1426–1436. https://doi.org/10.1145/3539618.3591703

Section 8

Thomas N. Kipf, Max Welling: Semi-Supervised Classification with Graph Convolutional Networks, 2016, https://arxiv.org/abs/1609.02907

RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network. J Kang, Y Zhu, Y Xia, J Luo, H Tong. Proceedings of the ACM Web Conference 2022, 1214-1225, https://dl.acm.org/doi/abs/10.1145/3485447.3512169

Y Yan, Y Chen, H Chen, M Xu, M Das, H Yang, H Tong: From Trainable Negative Depth to Edge Heterophily in Graphs. NeurIPS 2023

Section 9

Outliers in the ABCD Random Graph Model with Community Structure (ABCD+o). Kaminski, B., Pralat, P., and Theberge, F. (2023)

How New Ideas Diffuse in Science (2023). Cheng, M., Smith, S., and McFarland, D. American Sociological Review

Section 10

DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei Hwu. In The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. PDF file: https://arxiv.org/abs/2205.10479

Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, Denny Zhou: Self-Consistency Improves Chain of Thought Reasoning in Language Models. ICLR 2023. https://arxiv.org/abs/2203.11171

Section 11

R. Deb, Y. Ban, S. Zuo, J. He, and A. Banerjee, Contextual Bandits with Online Neural Regression, International Conference on Learning Representations (ICLR), 2024.

W. Nie, B. Guo, Y. Huang, C. Xiao, A. Vahdat, A. Anandkumar, Diffusion Models for Adversarial Purification, ICML, 2022.

H. Shah, K. Tamuly, A. Raghunathan, P. Jain, P. Netrapalli, The Pitfalls of Simplicity Bias in Neural Networks, NeurIPS 2020.

Section 12

W. Bao, H. Wang, J. Wu, and J. He. Optimizing the Collaboration Structure in Cross-silo Federated Learning. ICML 2023

J. Wu, W. Bao, E.A. Ainsworth, and J. He. Personalized Federated Learning with Parameter Propagation. KDD 2023

N. Gruver, M. Finzi, S. Qiu, and A.G. Wilson. Large Language Models Are Zero-Shot Time Series Forecasters. NeurIPS 2023

Section 13

Uthsav Chitra and Christopher Musco. “Analyzing the impact of filter bubbles on social network polarization.” In Proceedings of the 13th International Conference on Web Search and Data Mining (WSDM), pp. 115-123. 2020.

Corrado Monti, Giuseppe Manco, Cigdem Aslay, and Francesco Bonchi. “Learning ideological embeddings from information cascades.” In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, pp. 1325-1334. 2021.

Luceri, Luca, Valeria Pantè, Keith Burghardt, and Emilio Ferrara. “Unmasking the web of deceit: Uncovering coordinated activity to expose information operations on twitter.” WWW 2024