The written exam of the DAIS Qual Exam in Fall 2024 will be held on Friday, Oct. 4, 2024, from 3pm to 7pm in Room 0220 Siebel Center (tentative), in the basement of the building (floor map: https://facilityaccessmaps.fs.illinois.edu/archibus/schema/ab-products/essential/workplace/?blId=0563&flId=00).
This reading list is organized into topic sections, each containing two or three papers. The questions in the written exam will be based on these papers, with one or two questions per section: a section with two papers will usually have one question, and a section with three papers will usually have two. You only need to answer four questions in the exam, so there is no need to read every paper. Instead, browse the list, identify up to four sections whose papers you are most familiar with or most comfortable reading, and focus on reading and digesting those papers. You will likely find some sections closer to your interests or background than others, so concentrate on the few sections that are closest to your research interests.
Section 1
- Mengting Wan, Tara Safavi, Sujay Kumar Jauhar, Yujin Kim, Scott Counts, Jennifer Neville, Siddharth Suri, Chirag Shah, Ryen W. White, Longqi Yang, Reid Andersen, Georg Buscher, Dhruv Joshi, Nagu Rangan, “TnT-LLM: Text Mining at Scale with Large Language Models”, in KDD 2024, https://dl.acm.org/doi/pdf/10.1145/3637528.3671647
- Siru Ouyang, Jiaxin Huang, Pranav Pillai, Yunyi Zhang, Yu Zhang, Jiawei Han, “Ontology Enrichment for Effective Fine-grained Entity Typing”, in KDD 2024, https://doi.org/10.48550/arXiv.2310.07795
Section 2
- Wang, Yixin and David M. Blei (2019). “The Blessings of Multiple Causes”. In: Journal of the American Statistical Association 114.528, pages 1574–1596 (https://doi.org/10.1080/01621459.2019.1686987)
- Karthika Mohan and Judea Pearl. Graphical models for processing missing data. Journal of the American Statistical Association, 116(534):1023–1037, 2021. (https://doi.org/10.1080/01621459.2021.1874961)
Section 3
- Tianheng Cheng, Lin Song, Yixiao Ge, Wenyu Liu, Xinggang Wang, Ying Shan. 2024. YOLO-World: Real-Time Open-Vocabulary Object Detection. Proc. CVPR 2024. https://arxiv.org/abs/2401.17270
- Matthew Kowal, Achal Dave, Rares Ambrus, Adrien Gaidon, Konstantinos G. Derpanis, Pavel Tokmakov. 2024. Understanding Video Transformers via Universal Concept Discovery. Proc. CVPR 2024. https://arxiv.org/abs/2401.10831
- OpenAI. 2024. GPT-4V Technical Report. https://openai.com/index/gpt-4-research/
Section 4
- Zhaoheng Li, Supawit Chockchowwat, Ribhav Sahu, Areet Sheth, Yongjoo Park. “Kishu: Time-Traveling for Computational Notebooks.” arXiv 2024. https://arxiv.org/abs/2406.13856
- Devin Petersohn, Stephen Macke, Doris Xin, William Ma, Doris Lee, Xiangxi Mo, Joseph E. Gonzalez, Joseph M. Hellerstein, Anthony D. Joseph, Aditya Parameswaran. “Towards scalable dataframe systems.” PVLDB’20 https://arxiv.org/pdf/2001.00888
Section 5
- Jiang, Pengcheng, Cao Xiao, Zifeng Wang, Parminder Bhatia, Jimeng Sun, and Jiawei Han. 2024. “TriSum: Learning Summarization Ability from Large Language Models with Structured Rationale.” arXiv preprint arXiv:2403.10351. http://arxiv.org/abs/2403.10351
- Wang, Hanyin, Chufan Gao, Christopher Dantona, Bryan Hull, and Jimeng Sun. 2024. “DRG-LLaMA: Tuning LLaMA Model to Predict Diagnosis-Related Group for Hospitalized Patients.” npj Digital Medicine 7 (1): 16.
Section 6
- Daniel Kang, John Guibas, Peter D. Bailis, Tatsunori Hashimoto, Matei Zaharia. TASTI: Semantic Indexes for Machine Learning-based Queries over Unstructured Data. SIGMOD 2022
- Daniel Kang, Deepti Raghavan, Peter Bailis, Matei Zaharia. Model Assertions for Monitoring and Improving ML Models. MLSys 2020.
- Liana Patel, Siddharth Jha, Carlos Guestrin, Matei Zaharia. LOTUS: Enabling Semantic Queries with LLMs Over Tables of Unstructured and Structured Data.
Section 7
- Florin Cuconasu, Giovanni Trappolini, Federico Siciliano, Simone Filice, Cesare Campagnano, Yoelle Maarek, Nicola Tonellotto, and Fabrizio Silvestri. 2024. The Power of Noise: Redefining Retrieval for RAG Systems. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 719–729. https://doi.org/10.1145/3626772.3657834
- Alireza Salemi, Surya Kallumadi, and Hamed Zamani. 2024. Optimization Methods for Personalizing Large Language Models through Retrieval Augmentation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 752–762. https://doi.org/10.1145/3626772.3657783
Section 8
- Lyu, Hanjia, et al. “LLM-Rec: Personalized Recommendation via Prompting Large Language Models.” arXiv preprint arXiv:2307.15780 (2023). https://arxiv.org/abs/2307.15780
- Ye, Ruosong, et al. “Language is all a graph needs.” Findings of the Association for Computational Linguistics: EACL 2024. 2024. https://arxiv.org/abs/2308.07134
Section 9
- Outliers in the ABCD Random Graph Model with Community Structure (ABCD+o). Kaminski, B., Pralat, P., and Theberge, F. (2023)
- How New Ideas Diffuse in Science (2023). Cheng, M., Smith, S., and McFarland, D. American Sociological Review
Section 10
- Nishant Balepur, Jie Huang, Kevin Chen-Chuan Chang: Expository Text Generation: Imitate, Retrieve, Paraphrase. EMNLP 2023: 11896-11919. https://arxiv.org/abs/2305.03276
- DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei Hwu. In The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. https://arxiv.org/abs/2205.10479
- Zhihong Shao, Yeyun Gong, Yelong Shen, Minlie Huang, Nan Duan, and Weizhu Chen. 2023. Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9248–9274, Singapore. Association for Computational Linguistics. https://aclanthology.org/2023.findings-emnlp.620/
Section 11
- Daniel Rothchild, Ashwinee Panda, Enayat Ullah, Nikita Ivkin, Ion Stoica, Vladimir Braverman, Joseph Gonzalez, Raman Arora, FetchSGD: Communication-Efficient Federated Learning with Sketching, International Conference on Machine Learning (ICML), 2020.
- Yingxue Zhou, Steven Wu, and Arindam Banerjee, Bypassing the Ambient Dimension: Private SGD with Gradient Subspace Identification, International Conference on Learning Representations (ICLR), 2021.