Fall 2025 DAIS Qual Reading List

The written portion of the Fall 2025 DAIS Qual Exam will be held on Monday, September 29, 2025, from 1:00 pm to 5:00 pm in Room 2244, Siebel Center.

This reading list consists of multiple topic sections, each containing two or three papers. The questions in the written exam will be based on the papers listed here, with one or two questions per section. A section with two papers will usually have one question in the exam, while a section with three papers will usually have two. You only need to answer four questions in the exam, so there is no need to read every paper. Instead, browse the list, identify up to four sections whose papers you are most familiar with or most comfortable reading, and focus on reading and digesting those papers. You will likely find some sections closer to your interests or background than others, so concentrate on the few sections closest to your research interests. We will ask all qual exam participants to submit the four sections they have chosen, and we will ensure that the exam includes questions based on those selected sections.

Section 1

  • Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora. FetchSGD: Communication-Efficient Federated Learning with Sketching, ICML, 2020 https://arxiv.org/abs/2007.07682
  • M. Shrivastava, B. Isik, Q. Li, S. Koyejo, and A. Banerjee, Sketching for Distributed Deep Learning: A Sharper Analysis, NeurIPS, 2024. https://openreview.net/pdf?id=0G0VpMjKyV

Section 2

  • Anirudh Ajith, Mengzhou Xia, Alexis Chevalier, Tanya Goyal, Danqi Chen, Tianyu Gao: LitSearch: A Retrieval Benchmark for Scientific Literature Search. EMNLP 2024: 15068-15083 https://arxiv.org/abs/2407.18940
  • G-Retriever: Retrieval-Augmented Generation for Textual Graph Understanding and Question Answering. NeurIPS 2024. https://arxiv.org/abs/2402.07630

Section 3

  • Andy Zeng, Maria Attarian, Brian Ichter, Krzysztof Choromanski, Adrian Wong, Stefan Welker, Federico Tombari, Aveek Purohit, Michael Ryoo, Vikas Sindhwani, Johnny Lee, Vincent Vanhoucke, and Pete Florence. 2022. Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language. Proc. ICLR 2023. https://arxiv.org/abs/2204.00598
  • Zane Durante, Qiuyuan Huang, Naoki Wake, Ran Gong, Jae Sung Park, Bidipta Sarkar, Rohan Taori, Yusuke Noda, Demetri Terzopoulos, Yejin Choi, Katsushi Ikeuchi, Hoi Vo, Li Fei-Fei, and Jianfeng Gao. 2024. Agent AI: Surveying the Horizons of Multimodal Interaction. arXiv. https://arxiv.org/pdf/2401.03568
  • Yilun Du and Igor Mordatch. 2019. Implicit Generation and Generalization in Energy-Based Models. Proc. NeurIPS 2019. https://arxiv.org/abs/1903.08689

Section 4

Section 5

Section 6

Section 7

  • Wang, Z. et al. A foundation model for human-AI collaboration in medical literature mining. arXiv [cs.CL] (2025). https://arxiv.org/abs/2501.16255
  • Wang, Hanyin, Chufan Gao, Bolun Liu, Qiping Xu, Guleid Hussein, Mohamad El Labban, Kingsley Iheasirim, Hariprasad Korsapati, Chuck Outcalt, and Jimeng Sun. 2024. "Adapting Open-Source Large Language Models for Cost-Effective, Expert-Level Clinical Note Generation with On-Policy Reinforcement Learning." arXiv [cs.CL]. http://arxiv.org/abs/2405.00715

Section 8

  • Xiao Lin, Zhichen Zeng, Tianxin Wei, Zhining Liu, Yuzhong Chen, Hanghang Tong: CATS: Mitigating Correlation Shift for Multivariate Time Series Classification. CoRR abs/2504.04283 (2025). https://arxiv.org/abs/2504.04283
  • Qi Yu, Zhichen Zeng, Yuchen Yan, Lei Ying, R. Srikant, Hanghang Tong: Joint Optimal Transport and Embedding for Network Alignment. WWW 2025: 2064-2075. https://arxiv.org/abs/2502.19334
  • Peter Halmos, Julian Gold, Xinhao Liu, Benjamin J. Raphael: Hierarchical Refinement: Optimal Transport to Infinity and Beyond. ICML 2025. https://arxiv.org/abs/2503.03025

Section 9

  • ChengXiang Zhai. 2024. Large Language Models and Future of Information Retrieval: Opportunities and Challenges. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 481–490. https://doi.org/10.1145/3626772.3657848 
  • ChengXiang Zhai. 2025. Information Retrieval for Artificial General Intelligence: A New Perspective of Information Retrieval Research. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25). Association for Computing Machinery, New York, NY, USA, 3876–3886. https://doi.org/10.1145/3726302.3730349

Section 10

  • Bao, H. and Teplitsky, M. (2024). A simulation-based analysis of the impact of rhetorical citations in science. Nature Communications. https://doi.org/10.1038/s41467-023-44249-0
  • Touwen et al. (2024). Learning the mechanisms of network growth. Scientific Reports. https://doi.org/10.1038/s41598-024-61940-4
  • Liang et al. (2024). Can Large Language Models Provide Useful Feedback on Research Papers? A Large-Scale Empirical Analysis. NEJM AI. https://doi.org/10.1056/AIoa2400196

Section 11

  • Priyanka Kargupta, Ishika Agarwal, Tal August, Jiawei Han, “Tree-of-Debate: Multi-Persona Debate Trees Elicit Critical Thinking for Scientific Comparative Analysis”, Proc. 2025 Annual Meeting of the Association for Computational Linguistics (ACL 2025), Vienna, Austria, July 2025 https://arxiv.org/abs/2502.14767  
  • Bowen Jin, Hansi Zeng, Zhenrui Yue, Jinsung Yoon, Sercan O Arik, Dong Wang, Hamed Zamani, and Jiawei Han, “Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning”, in Proc. 2025 Conf. on Language Modeling (COLM’2025), Montreal, Canada, Oct. 2025 https://arxiv.org/abs/2503.09516v4

Section 12

  • Kevin Muyuan Xia, Yushu Pan, and Elias Bareinboim. Neural causal models for counterfactual identification and estimation. In The Eleventh International Conference on Learning Representations, 2023. https://arxiv.org/pdf/2210.00035
  • An Zhang, Fangfu Liu, Wenchang Ma, Zhibo Cai, Xiang Wang, and Tat-Seng Chua. Boosting causal discovery via adaptive sample reweighting. In The Eleventh International Conference on Learning Representations, 2023. https://arxiv.org/pdf/2303.03187 

Section 13

  • Feng, Tao, Yexin Wu, Guanyu Lin, and Jiaxuan You. “Graph World Model.” arXiv preprint arXiv:2507.10539 (ICML 2025). https://arxiv.org/pdf/2507.10539
  • Yu, Haofei, Zhaochen Hong, Zirui Cheng, Kunlun Zhu, Keyang Xuan, Jinwei Yao, Tao Feng, and Jiaxuan You. "ResearchTown: Simulator of Human Research Community." (ICML 2025). https://arxiv.org/pdf/2412.17767

Data and Information Systems
Email: yongjoo@g.illinois.edu