DAIS Qual Time & Reading List (Fall 2022) – Data and Information Systems

The written portion of the Fall 2022 DAIS Qual Exam will be held on Monday, Oct. 3, 2022, from 1pm to 5pm in Siebel Center room 3401.

This reading list consists of multiple topic sections, each containing 2-3 papers. The questions in the written exam will be based on the papers listed here, with 1-2 questions per section: a section with two papers will usually yield one question, and a section with three papers will usually yield two.

Section 1

Alistair Moffat, Joel Mackenzie, Paul Thomas, and Leif Azzopardi. 2022. A Flexible Framework for Offline Effectiveness Metrics. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 578–587. https://doi.org/10.1145/3477495.3531924 PDF file: http://www.library.illinois.edu/proxy/go.php?url=https://doi.org/10.1145/3477495.3531924
Fernando Diaz and Andres Ferraro. 2022. Offline Retrieval Evaluation Without Evaluation Metrics. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 599–609. https://doi.org/10.1145/3477495.3532033 PDF file: http://www.library.illinois.edu/proxy/go.php?url=https://doi.org/10.1145/3477495.3532033

Section 2

Ahmed Alaa and Mihaela van der Schaar. 2020. Discriminative Jackknife: Quantifying Uncertainty in Deep Learning via Higher-Order Influence Functions. In ICML. https://arxiv.org/abs/2007.13481
Jian Kang, Qinghai Zhou, Hanghang Tong: JuryGCN: Quantifying Jackknife Uncertainty on Graph Convolutional Networks. KDD 2022: 742-752. http://jiank2.web.illinois.edu/files/kdd22/kang22jurygcn.pdf
Luca Franceschi, Mathias Niepert, Massimiliano Pontil, and Xiao He. 2019. Learning discrete structures for graph neural networks. In ICML. PMLR, 1972–1982. https://arxiv.org/abs/1903.11960

Section 3

Ziad Obermeyer, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. Dissecting racial bias in an algorithm used to manage the health of populations. Science, 366(6464):447–453, 2019. (DOI: 10.1126/science.aax2342)
Rediet Abebe, Solon Barocas, Jon Kleinberg, Karen Levy, Manish Raghavan, and David G. Robinson. Roles for computing in social change. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, FAT* ’20, pages 252–260, New York, NY, USA, 2020. Association for Computing Machinery. (https://dl.acm.org/doi/abs/10.1145/3351095.3372871)

Section 4

Tamari, Ronen and Shani, Chen and Hope, Tom and Petruck, Miriam R L and Abend, Omri and Shahaf, Dafna. 2020. Language (Re)modelling: Towards Embodied Language Understanding. ACL 2020. https://aclanthology.org/2020.acl-main.559
Kolluru, Keshav and Mohammed, Muqeeth and Mittal, Shubham and Chakrabarti, Soumen and Mausam. 2022. Alignment-Augmented Consistent Translation for Multilingual Open Information Extraction. ACL 2022. https://aclanthology.org/2022.acl-long.179
Sundriyal, Megha and Malhotra, Ganeshan and Akhtar, Md Shad and Sengupta, Shubhashis and Fano, Andrew and Chakraborty, Tanmoy. 2022. Document Retrieval and Claim Verification to Mitigate COVID-19 Misinformation. ACL 2022. https://aclanthology.org/2022.constraint-1.8

Section 5

Yu Meng, Yunyi Zhang, Jiaxin Huang, Yu Zhang and Jiawei Han, “Topic Discovery via Latent Space Clustering of Language Model Embeddings”, in Proc. The ACM Web Conf. 2022 (WWW’22), April 2022
Jiaming Shen, Yunyi Zhang, Heng Ji and Jiawei Han, “Corpus-based Open-Domain Event Type Induction”, in Proc. 2021 Conf. on Empirical Methods in Natural Language Processing (EMNLP’21), Nov. 2021

Section 6

Fu, Tianfan, Cao Xiao, Cheng Qian, Lucas M. Glass, and Jimeng Sun. 2021. “Probabilistic and Dynamic Molecule-Disease Interaction Modeling for Drug Discovery.” In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 404–14. KDD ’21. https://dl.acm.org/doi/pdf/10.1145/3447548.3467286
Fu, Tianfan, Kexin Huang, Cao Xiao, Lucas M. Glass, and Jimeng Sun. 2022. “HINT: Hierarchical Interaction Network for Trial Outcome Prediction Leveraging Web Data.” Cell Patterns; also available on arXiv [cs.CY]. http://arxiv.org/abs/2102.04252
Huang, Kexin, Cao Xiao, Lucas M. Glass, and Jimeng Sun. 2020. “MolTrans: Molecular Interaction Transformer for Drug–target Interaction Prediction.” Bioinformatics, October. https://doi.org/10.1093/bioinformatics/btaa880

Section 7

Chockchowwat, Supawit, Chaitanya Sood, and Yongjoo Park. “Airphant: Cloud-oriented Document Indexing.” 2022 IEEE 38th International Conference on Data Engineering (ICDE). IEEE, 2022. https://arxiv.org/pdf/2112.13323.pdf
Chockchowwat, Supawit, Wenjie Liu, and Yongjoo Park. “Automatically Finding Optimal Index Structure.” arXiv preprint arXiv:2208.03823 (2022). https://arxiv.org/pdf/2208.03823.pdf

Section 8

J. Negrea, M. Haghifam, G. K. Dziugaite, A. Khisti, and D. M. Roy (2019), “Information-theoretic generalization bounds for SGLD via data-dependent estimates,” NeurIPS, 2019. https://papers.nips.cc/paper/2019/file/05ae14d7ae387b93370d142d82220f1b-Paper.pdf
A. Banerjee, T. Chen, X. Li, and Y. Zhou (2022), “Stability Based Generalization Bounds for Exponential Family Langevin Dynamics,” ICML, 2022. https://proceedings.mlr.press/v162/banerjee22a/banerjee22a.pdf

Section 9

Zhuangdi Zhu, Junyuan Hong, Jiayu Zhou: Data-Free Knowledge Distillation for Heterogeneous Federated Learning. ICML 2021: 12878-12889
Jun Wu, Jingrui He: Domain Adaptation with Dynamic Open-Set Targets. KDD 2022: 2039-2049
Ekdeep Singh Lubana, Chi Ian Tang, Fahim Kawsar, Robert P. Dick, Akhil Mathur: Orchestra: Unsupervised Federated Learning via Globally Consistent Clustering. ICML 2022: 14461-14484

Section 10

Hoyeop Lee, Jinbae Im, Seongwon Jang, Hyunsouk Cho, Sehee Chung: MeLU: Meta-Learned User Preference Estimator for Cold-Start Recommendation. KDD 2019: 1073-1082 PDF: https://arxiv.org/abs/1908.00413
Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020 PDF file: https://arxiv.org/abs/2005.11401
Abram Handler, Brendan T. O’Connor: Relational Summarization for Corpus Analysis. NAACL-HLT 2018: 1760-1769 PDF file: https://aclanthology.org/N18-1159.pdf