Spring 2023 – Data and Information Systems

The written exam of the DAIS Qual Exam in Spring 2023 will be held on Monday, Feb. 20, 2023, from 1pm to 5pm in room 2407 Siebel Center.

This reading list consists of multiple topic sections, each containing 2-3 papers. The questions in the written exam will be based on the papers listed here, with 1-2 questions related to each section. If a section has two papers, you can usually expect one question on that section in the qual exam; if a section has three papers, you can usually expect two. You only need to answer four of those questions in the exam, so there is no need to read every paper. Instead, browse through the list, identify the 8-10 papers that you are most familiar with or most comfortable reading, and focus on reading and digesting those. You will likely find some sections closer to your interests or background than others, and you can concentrate on the few sections that are closest to your research interests.

Section 1

Yu Zhang, Yunyi Zhang, Martin Michalski, Yucheng Jiang, Yu Meng, and Jiawei Han, “Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts”, in Proc. 2023 ACM Int. Conf. on Web Search and Data Mining (WSDM’23), Feb. 2023

Yizhu Jiao, Sha Li, Yiqing Xie, Ming Zhong, Heng Ji and Jiawei Han, “Open-Vocabulary Argument Role Prediction for Event Extraction”, in Proc. 2022 Conf. on Empirical Methods in Natural Language Processing (EMNLP’22), Dec. 2022

Yu Meng, Jiaxin Huang, Yu Zhang, Jiawei Han, “Generating Training Data with Language Models: Towards Zero-Shot Language Understanding”, in Proc. 2022 Conf. on Neural Information Processing Systems (NeurIPS’22), Nov. 2022

Section 2

Chen, C., Sun, F., Zhang, M., and Ding, B. Recommendation unlearning. In Proceedings of the ACM Web Conference 2022 (New York, NY, USA, 2022), WWW ’22, Association for Computing Machinery, pp. 2768–2777. (https://dl.acm.org/doi/10.1145/3485447.3511997)

Zhang, Y., Feng, F., He, X., Wei, T., Song, C., Ling, G., and Zhang, Y. Causal intervention for leveraging popularity bias in recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2021), SIGIR ’21, Association for Computing Machinery, pp. 11–20. (https://dl.acm.org/doi/10.1145/3404835.3462875)

Karimi, A.-H., Schölkopf, B., and Valera, I. Algorithmic recourse: From counterfactual explanations to interventions. In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (New York, NY, USA, 2021), FAccT ’21, Association for Computing Machinery, pp. 353–362. (https://dl.acm.org/doi/10.1145/3442188.3445899)

Section 3

Brown et al. 2020. Language Models are Few-Shot Learners. NeurIPS2020. https://arxiv.org/abs/2005.14165

Masahiro Kaneko and Danushka Bollegala. 2021. Debiasing Pre-trained Contextualised Embeddings. Proc. EACL2021. https://aclanthology.org/2021.eacl-main.107.pdf

Yue Guo, Yi Yang, Ahmed Abbasi. 2022. Auto-Debias: Debiasing Masked Language Models with Automated Biased Prompts. ACL2022. https://aclanthology.org/2022.acl-long.72/

Section 4

Vartak, Manasi, et al. “Mistique: A system to store and query model intermediates for model diagnosis.” Proceedings of the 2018 International Conference on Management of Data. 2018. https://www-cs.stanford.edu/~matei/papers/2018/sigmod_mistique.pdf

Li, Feifei, et al. “Wander join: Online aggregation via random walks.” Proceedings of the 2016 International Conference on Management of Data. 2016. http://www.cs.utah.edu/~lifeifei/papers/wanderjoin.pdf

Petersohn, Devin, et al. “Towards Scalable Dataframe Systems.” Proceedings of the VLDB Endowment 13.11. http://www.vldb.org/pvldb/vol13/p2033-petersohn.pdf

Section 5

Zifeng Wang and Jimeng Sun. TransTab: Learning Transferable Tabular Transformers Across Tables. NeurIPS 2022.

Zhen Lin, Shubhendu Trivedi, and Jimeng Sun. Conformal Prediction with Temporal Quantile Adjustments. NeurIPS 2022.

Tianfan Fu*, Wenhao Gao*, Connor W. Coley, Jimeng Sun. Reinforced Genetic Algorithm for Structure-based Drug Design. NeurIPS 2022.

Section 6

Arjovsky, Martin, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. “Invariant risk minimization.” https://openreview.net/forum?id=BOz47Bq–NB

Susan Athey, Mohsen Bayati, Nikolay Doudchenko, Guido Imbens & Khashayar Khosravi (2021) Matrix Completion Methods for Causal Panel Data Models, Journal of the American Statistical Association, 116:536, 1716-1730, https://www.tandfonline.com/doi/full/10.1080/01621459.2021.1891924

Amjad, Muhammad, Devavrat Shah, and Dennis Shen. “Robust synthetic control.” The Journal of Machine Learning Research 19, no. 1 (2018): 802-852. https://www.jmlr.org/papers/volume19/17-777/17-777.pdf

Section 7

Min et al. Rethinking the Role of Demonstrations: What Makes In-Context Learning Work? EMNLP 2022.

Malkin, Limisiewicz and Stanovsky. A Balanced Data Approach for Evaluating Cross-Lingual Transfer: Mapping the Linguistic Blood Bank. NAACL 2022.

Chakrabarty, Choi and Shwartz. It’s not Rocket Science: Interpreting Figurative Language in Narratives. TACL 2022.

Section 8

Chen Xu, Piji Li, Wei Wang, Haoran Yang, Siyun Wang, and Chuangbai Xiao. 2022. COSPLAY: Concept Set Guided Personalized Dialogue Generation Across Both Party Personas. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 201–211. https://www.library.illinois.edu/proxy/go.php?url=https://doi.org/10.1145/3477495.3531957

Wenqiang Lei, Yao Zhang, Feifan Song, Hongru Liang, Jiaxin Mao, Jiancheng Lv, Zhenglu Yang, and Tat-Seng Chua. 2022. Interacting with Non-Cooperative User: A New Paradigm for Proactive Dialogue Policy. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22). Association for Computing Machinery, New York, NY, USA, 212–222. https://www.library.illinois.edu/proxy/go.php?url=https://doi.org/10.1145/3477495.3532001

Section 9

Zhe Xu, Boxin Du, Hanghang Tong: Graph Sanitation with Application to Node Classification. WWW 2022: 1136-1147. (https://arxiv.org/abs/2105.09384)

Wei Jin, Yao Ma, Xiaorui Liu, Xianfeng Tang, Suhang Wang, and Jiliang Tang. 2020. Graph Structure Learning for Robust Graph Neural Networks. In SIGKDD. ACM, 66–74. (https://arxiv.org/abs/2005.10203)

Section 10

Park, M., Leahey, E. & Funk, R.J. (2023) Papers and patents are becoming less disruptive over time. Nature 613, 138–144. https://doi.org/10.1038/s41586-022-05543-x [relevant preceding paper DOI: 10.1038/s41586-019-0941-9]

Fontana, M., Iori, M., Montobbio, F., and Sinatra, R. (2020) New and atypical combinations: An assessment of novelty and interdisciplinarity. Research Policy, vol. 49, issue 7. https://doi.org/10.1016/j.respol.2020.104063 [relevant preceding papers are (i) https://doi.org/10.1162/qss_a_00007 (ii) 10.1126/science.12404]

Section 11

Wang, B., Wang, X., Tao, T., Zhang, Q., & Xu, J. (2020). Neural Question Generation with Answer Pivot. Proceedings of the AAAI Conference on Artificial Intelligence, 34(05), 9138-9145. PDF file: https://ojs.aaai.org/index.php/AAAI/article/view/6449

DEER: Descriptive Knowledge Graph for Explaining Entity Relationships. Jie Huang, Kerui Zhu, Kevin Chen-Chuan Chang, Jinjun Xiong, Wen-mei Hwu. In The 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022. PDF file: https://arxiv.org/abs/2205.10479

Patrick S. H. Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. NeurIPS 2020. PDF file: https://arxiv.org/abs/2005.11401

Section 12

Y. Zhou, X. Li, and A. Banerjee, Noisy Truncated SGD: Optimization and Generalization, SIAM International Conference on Data Mining (SDM), 2022 https://arxiv.org/abs/2103.00075

A. Banerjee, T. Chen, X. Li, Y. Zhou, Stability Based Generalization Bounds for Exponential Family Langevin Dynamics, International Conference on Machine Learning (ICML), 2022. https://arxiv.org/abs/2201.03064