The written exam for the Fall 2025 DAIS Qual Exam will be held on Monday, March 2, 2026, from 1pm to 5pm in Siebel Center. The room is still to be decided.
This reading list consists of multiple topic sections, each containing 2-3 papers. The questions in the written exam will be based on the papers listed here, with 1-2 questions per section: a section with two papers will usually have one question, and a section with three papers will usually have two. You only need to answer four questions in the exam, so there is no need to read every paper. Instead, browse the list, identify up to four sections whose papers are closest to your research interests or that you are most familiar with or most comfortable reading, and focus on reading and digesting those papers. We will ask all qual exam participants to submit the four sections they have chosen, and we will ensure that questions are designed based on those selected sections.
Section 1
- Manuel Faysse, Hugues Sibille, Tony Wu, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo: ColPali: Efficient Document Retrieval with Vision Language Models. ICLR 2025.
- Omar Khattab, Matei Zaharia: ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. SIGIR 2020: 39-48.
- Jonas Geiping, Sean McLeish, Neel Jain, John Kirchenbauer, Siddharth Singh, Brian R. Bartoldson, Bhavya Kailkhura, Abhinav Bhatele, Tom Goldstein: Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.
Section 2
- J. Zou, Y. Ban, Z. Li, Y. Qi, R. Qiu, L. Yang, and J. He. Transformer Copilot: Learning from The Mistake Log in LLM Fine-tuning. NeurIPS 2025. https://openreview.net/pdf?id=MRvxlTlkNQ
- J. Zou, L. Yang, J. Gu, J. Qiu, K. Shen, J. He, and M. Wang. Trajectory-aware PRMs for Long CoT Reasoning. NeurIPS 2025. https://openreview.net/pdf/88d9323f85c36e29c52ce3a3cae948c2b2598eb2.pdf
- Jingyuan Wang, Yankai Chen, Zhonghang Li, Chao Huang. LightReasoner: Can Small Language Models Teach Large Language Models Reasoning? https://arxiv.org/pdf/2510.07962
Section 3
- Zihan Qiu et al., 2025. Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink-Free. NeurIPS 2025 Best Paper Award. https://arxiv.org/abs/2505.06708
- Revanth Gangi Reddy, Tanay Dixit, Jiaxin Qin, Cheng Qian, Daniel Lee, Jiawei Han, Kevin Small, Xing Fan, Ruhi Sarikaya and Heng Ji. 2026. WiNELL: Wikipedia Never-Ending Updating with LLM Agents. Proc. of the ACM Web Conference 2026 (WWW 2026). https://arxiv.org/abs/2508.03728
- Ranjan Sapkota, Yang Cao, Konstantinos I. Roumeliotis, Manoj Karkee. 2025. Vision-Language-Action Models: Concepts, Progress, Applications and Challenges. CVPR 2025. https://arxiv.org/abs/2505.04769
Section 4
- Su, C., et al. (2025) Diffusion Models for Time Series Forecasting: A Survey https://arxiv.org/abs/2507.14507
- Bonnaire, T., et al. (2025) Why Diffusion Models Don’t Memorize: The Role of Implicit Dynamical Regularization in Training. NeurIPS 2025 Best Paper. https://arxiv.org/abs/2505.17638
- Xu, M., et al. (2025) LSM-2: Learning from Incomplete Wearable Sensor Data. https://arxiv.org/abs/2506.05321v1
Section 5
- Velloso, E., & Hornbæk, K. (2025, April). Theorising in HCI using Causal Models. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (pp. 1-17). https://dl.acm.org/doi/10.1145/3706598.3713789
- Shen, H., Clark, N., & Mitra, T. (2025, November). Mind the Value-Action Gap: Do LLMs Act in Alignment with Their Values? In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (pp. 3097-3118). https://aclanthology.org/2025.emnlp-main.154/
- Zou, H., Wang, P., Yan, Z., Sun, T., & Xiao, Z. (2024). Can LLM “self-report”?: Evaluating the validity of self-report scales in measuring personality design in LLM-based chatbots. In Proceedings of the 2025 Conference on Language Modeling (COLM). https://arxiv.org/abs/2412.00207
Section 6
- Jiang, P. et al. DeepRetrieval: Hacking real search engines and retrievers with Large Language Models via reinforcement learning. arXiv [cs.IR] (2025) https://arxiv.org/abs/2503.00223
- Wu, J., Cross, A. & Sun, J. RDMA: Cost effective agent-driven rare disease discovery within electronic health record systems. arXiv [cs.LG] (2025) https://arxiv.org/abs/2507.15867
- Wang, H. et al. Reinforcement learning for out-of-distribution reasoning in LLMs: An empirical study on Diagnosis-Related Group coding. arXiv [cs.LG] (2025) https://arxiv.org/abs/2505.21908
Section 7
- Z. Liu, et al: Breaking Silos: Adaptive Model Fusion Unlocks Better Time Series Forecasting. ICML 2025. https://openreview.net/pdf?id=sdFRCRk1pP
- Z. Xu et al: Fine-Grained Graph Rationalization. CIKM 2025: 3708-3719. https://dl.acm.org/doi/10.1145/3746252.3761307
- Z. Xu et al: How to make LLMs strong node classifiers?, arXiv preprint arXiv:2410.02296. https://arxiv.org/pdf/2410.02296
Section 8
- Graph World Model (https://arxiv.org/abs/2507.10539)
- ResearchTown: Simulator of Human Research Community (https://arxiv.org/abs/2412.17767)
- GraphRouter: A Graph-based Router for LLM Selections (https://arxiv.org/abs/2410.03834)
Section 9
- ChengXiang Zhai. 2024. Large Language Models and Future of Information Retrieval: Opportunities and Challenges. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 481–490. https://doi.org/10.1145/3626772.3657848
- ChengXiang Zhai. 2025. Information Retrieval for Artificial General Intelligence: A New Perspective of Information Retrieval Research. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’25). Association for Computing Machinery, New York, NY, USA, 3876–3886. https://doi.org/10.1145/3726302.3730349
- Gao, Yunfan, Yun Xiong, Meng Wang, and Haofen Wang. “Modular RAG: Transforming RAG Systems into LEGO-like Reconfigurable Frameworks.” arXiv preprint arXiv:2407.21059 (2024). https://arxiv.org/abs/2407.21059
Section 10
- Basu et al. (2024) Covering a Graph with Dense Subgraph Families, via Triangle-Rich Sets. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. https://dx.doi.org/10.1145/3627673.3679578
- Chopra et al. (2025) On the Limits of Agency in Agent-based Models. In AAMAS ’25: Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, pp. 500-509. https://dl.acm.org/doi/10.5555/3709347.3743565
Section 11
- Siru Ouyang, Jun Yan, I-Hung Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long Le, Samira Daruki, Xiangru Tang, Vishy Tirumalashetty, George Lee, Mahsan Rofouei, Hangfei Lin, Jiawei Han, Chen-Yu Lee, Tomas Pfister, “ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory”, in Proc. 2026 Int. Conf. on Learning Representations (ICLR 2026) https://arxiv.org/abs/2509.25140
- Pengcheng Jiang, Xueqiang Xu, Jiacheng Lin, Jinfeng Xiao, Zifeng Wang, Jimeng Sun, Jiawei Han, “s3: You Don’t Need That Much Data to Train a Search Agent via RL”, in Proc. 2025 Conf. on Empirical Methods in Natural Language Processing (EMNLP’2025) https://arxiv.org/abs/2505.14146
Section 12
- Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First (https://www.cidrdb.org/papers/2026/p32-liu.pdf)
- Pervasive Annotation Errors Break Text-to-SQL Benchmarks and Leaderboards (https://arxiv.org/abs/2601.08778)
Section 13
- Li, Zhaoheng, et al. “Sieve: Effective filtered vector search with collection of indexes.” arXiv preprint arXiv:2507.11907 (2025). https://arxiv.org/pdf/2507.11907
- Fang, Hanxi, et al. “Enhancing Computational Notebooks with Code+Data Space Versioning.” Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 2025. https://dl.acm.org/doi/pdf/10.1145/3706598.3714141
Section 14
- Zhijing Jin, Yuen Chen, Felix Leeb, Luigi Gresele, Ojasv Kamal, Zhiheng Lyu, Kevin Blin, Fernando Gonzalez Adauto, Max Kleiman-Weiner, Mrinmaya Sachan, et al. CLadder: Assessing causal reasoning in language models. Advances in Neural Information Processing Systems, 36:31038–31065, 2023. (https://arxiv.org/pdf/2312.04350.pdf)
- An Zhang, Fangfu Liu, Wenchang Ma, Zhibo Cai, Xiang Wang, and Tat-Seng Chua. Boosting differentiable causal discovery via adaptive sample reweighting. In The Eleventh International Conference on Learning Representations, 2023. (https://openreview.net/pdf?id=LNpMtk15AS4)
Section 15
- M. Shrivastava, B. Isik, Q. Li, S. Koyejo, and A. Banerjee, Sketching for Distributed Deep Learning: A Sharper Analysis, NeurIPS, 2024. https://openreview.net/pdf?id=0G0VpMjKyV
- Zhijie Chen, Qiaobo Li, and Arindam Banerjee, Sketched Adaptive Distributed Deep Learning: A Sharp Convergence Analysis, NeurIPS, 2025. https://openreview.net/forum?id=XIeE8jbM4K