Bill Ingram
Professional Experience:
  • 2018– Assistant Dean and Director for IT (Assistant Professor), University Libraries, Virginia Tech
  • 2016–2018 Director for IT Services (Lecturer), University Libraries, Virginia Tech
  • 2014–2016 Manager, Scholarly Communication and Repository Services, Library, University of Illinois at Urbana-Champaign (UIUC)
  • 2010–2014 Research Programmer, Library, UIUC
  • 2008–2010 Visiting Research Programmer, Library, UIUC
  • 2005–2007 Assistant Director for Programs, Rare Book School, University of Virginia
Participation in Professional Organizations
  • 2021 Member, Program Committee, ACM/IEEE Joint Conference on Digital Libraries in 2021 (JCDL ’21), Urbana-Champaign, Illinois.
  • 2020 Member, Reviewing Committee, ETD2020 Virtual Conference, Al Ain, Abu Dhabi, United Arab Emirates
  • 2020 Member, Program Committee, ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20), Wuhan, China.
  • 2020 Subreviewer, 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2020), Xi'an, China.
  • 2019– Member, Board of Directors, Networked Digital Library of Theses and Dissertations (NDLTD).
  • 2018 Member, Program Committee, ACM/IEEE Joint Conference on Digital Libraries in 2018 (JCDL ’18), Fort Worth, Texas
  • 2018– Representative, Coalition for Networked Information (CNI)
  • 2015–2016 Member, DSpace Leadership Committee, DuraSpace
  • 2014–2016 Member, CSV on the Web, W3C Working Group
Education:
  • Ph.D. (in progress), Computer Science, Virginia Tech, 2024 (expected)
  • M.S., Library and Information Science, University of Illinois at Urbana-Champaign, 2008
  • B.A., Cognitive Science, University of Virginia, 2005
Research:

My research explores the application of computational methods and techniques to large-scale digital collections held by libraries and archives. Specifically, I work within the fields of digital libraries and information retrieval to mine knowledge from electronic theses and dissertations.

Dissertation:

Despite the huge number of Electronic Theses and Dissertations (ETDs) publicly available online, research done by graduate students is insufficiently utilized. We lack the computational models, tools, and services for discovering and accessing the knowledge buried in these long documents. ETDs contain novel ideas and findings that make a significant contribution to the students' subject areas. They often contain extensive bibliographies and literature reviews, as well as useful graphs, figures, and tables. Much important knowledge and scientific data lie hidden in ETDs, but we need better tools to mine the content and facilitate the identification, discovery, and reuse of these important components.

To address this problem, this project develops sophisticated textual analytics, natural language processing, and information extraction methods to identify and extract key components of ETDs containing important knowledge that would otherwise remain buried in these long documents. We investigate techniques and build predictive models to automatically classify and summarize these extracted components. In doing so, we aim to answer the following fundamental research questions: (1) How can we effectively identify and extract key parts of ETDs such as chapters, literature reviews, bibliographies, graphs, tables, and figures? (2) How can we develop effective classification and summarization services for ETDs at the chapter level? (3) How can we use these services to enrich the user experience for digital libraries of ETDs?

Text analytics presents a novel way to connect text with language understanding. By investigating analytical methods for extracting, classifying, and summarizing the knowledge contained in ETDs, our research demonstrates how intensive computational analysis of digital collections can provide more effective access to book-length documents, increase the impact of graduate research, and help libraries meet the evolving needs of the communities they serve.

Problem Areas:
  • We don't know how to provide effective access to digital books and other book-length documents in digital libraries.
  • Research done by graduate students is insufficiently utilized.
  • Research libraries aren't meeting the needs of the communities they serve.

Despite the huge number of books held in digital libraries, there is a lack of computational models, tools, and services for discovering and accessing the knowledge they contain. Current models are limited to basic metadata and full-text search. We need better tools to mine the knowledge and scientific data buried inside books and other book-length documents, like theses and dissertations. The theses and dissertations that graduate students write are deposited in digital libraries, but this work is not read or cited as much as shorter forms of research output, like journal articles and conference proceedings. More generally, the needs of students, researchers, and others in the academic community are quickly evolving due to rapid advancements in digital technology. Libraries struggle to evolve alongside the communities they serve.

Sponsored Projects:
  • Ensuring Scholarly Access to Government Records and Archives (1910-07229) to support a convening of experts to address machine-learning techniques to enhance public access to government records. Andrew W. Mellon Foundation. 2020. $44,000. PI: William A. Ingram.
  • Opening Books and the National Corpus of Graduate Research (LG-37-19-0078-19) to bring computational access to book-length documents, through a research and piloting effort employing Electronic Theses and Dissertations (ETDs). Institute of Museum and Library Services, National Leadership Grants for Libraries. 2019. $505,214. PI: William A. Ingram. Co-PIs: Edward A. Fox and Jian Wu.

Papers in refereed journals

  • Liuqing Li, Jack Geissinger, William A. Ingram, Edward A. Fox. "Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning." Data and Information Management, ISSN:2543-9251, 4(1): 18-43, March 24, 2020, 10.2478/dim-2020-0003, https://content.sciendo.com/downloadpdf/journals/dim/4/1/article-p18.xml
  • William A. Ingram, Bipasha Banerjee, and Edward A. Fox. "Summarizing ETDs with deep learning." Cadernos BAD 1 (2019): 46-52 https://www.bad.pt/publicacoes/index.php/cadernos/article/view/2014
  • Colleen Fallaw, Elise Dunham, Elizabeth Wickes, Dena Strong, Ayla Stein, Qian Zhang, Kyle Rimkus, William A. Ingram, and Heidi J. Imker. 2016. "Overly Honest Data Repository Development." The Code4Lib Journal, no. 34, http://journal.code4lib.org/articles/11980
  • Thomas Habing, Janet Eke, Matthew A. Cordial, William Ingram, and Robert Manaster. 2009. Developments in Digital Preservation at the University of Illinois: The Hub and Spoke Architecture for Supporting Repository Interoperability and Emerging Preservation Standards. Library Trends 57, 3 (May 2009), 556–579. https://doi.org/10.1353/lib.0.0052

Papers and posters presented at professional meetings

  • Aman Ahuja, William A. Ingram (presenting), Chenyu Mao, Chongyu He, Jianchi Wei, Edward A. Fox. "Analyzing and Navigating ETDs Using Topic Models." Paper presented at the 25th International Symposium on Electronic Theses and Dissertations. Sept 7-9, 2022. Novi Sad, Serbia. https://etd2022.uns.ac.rs/
  • Bipasha Banerjee (presenting), William A. Ingram, Jian Wu, and Edward A. Fox. "Applications of Mining ETDs." Paper presented at the 24th International Symposium on Electronic Theses and Dissertations. Nov 15-17, 2021. Virtual. https://doi.org/10.26226/morressier.614c9b8c87a68d83cb5d59b2
  • William A. Ingram, Sylvester A. Johnson, and Pamela Wright. "Applications of Mining ETDs." Paper presented at the 24th International Symposium on Electronic Theses and Dissertations. Nov 15-17, 2021. Virtual. https://doi.org/10.26226/morressier.614c9b8c87a68d83cb5d59b2
  • Yinlin Chen and William A. Ingram. Why and How We Went Serverless, and How You Can Too. CNI: Coalition for Networked Information Spring 2021 Membership Meeting, March 15–21, 2021. Virtual. https://www.cni.org/topics/digital-curation/why-and-how-we-went-serverless-and-how-you-can-too
  • William A. Ingram. Mining ETDs for Trends in Graduate Research. CNI: Coalition for Networked Information Fall 2020 Membership Meeting, November 12, 2020. Virtual. https://www.cni.org/topics/electronic-theses-dissertations-etds/mining-etds-for-trends-in-graduate-research
  • William A. Ingram and Edward A. Fox (co-presenting). Preparing code and data for computational reproducibility, ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20), half-day during 1-5 August, Wuhan, China
  • Edward Fox and William Ingram (co-presenting). Introduction to Digital Libraries, ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20), half-day during 1-5 August, Wuhan, China
  • William A. Ingram. Bringing Computational Access to Book-length Documents Via an ETD Pilot. CNI: Coalition for Networked Information Fall 2019 Membership Meeting. December 9-10, 2019. Washington, DC. https://www.cni.org/topics/electronic-theses-dissertations-etds/bringing-computational-access-to-book-length-documents-via-an-etd-pilot
  • William A. Ingram (presenting), Bipasha Banerjee, and Edward A. Fox. "Summarizing ETDs with Deep Learning." Paper presented at the 22nd International Symposium on Electronic Theses and Dissertations. November 6-8, 2019. Porto, Portugal
  • William A. Ingram and Edward A. Fox (co-presenting). "Preparing code and data for computational reproducibility: a hands-on workshop." 22nd International Symposium on Electronic Theses and Dissertations. November 6-8, 2019. Porto, Portugal
  • Nushrat Khan and William A. Ingram. 2015. System Development for Automatic Ingestion of Large Amount of Data and Associated Metadata using REST API—Scope of DSpace. Poster presentation at the Digital Library Federation 2015 Forum. Vancouver, BC. http://hdl.handle.net/2142/88928
  • Thomas Habing, Howard Ding, William A. Ingram (presenting), Robert Ferrer. 2012. "Fedora Akubra Storage Plugin for the Dell DX Object Storage Platform." Presented at the 7th International Conference on Open Repositories. Edinburgh, Scotland
  • Sarah Shreeves and William A. Ingram (co-presenting). 2010. "BibApp 1.0 and Beyond: Developing a Piece of the Scholarly Communication Toolkit." Presented at the 5th International Conference on Open Repositories. Madrid, Spain
  • Thomas Habing, Myung-Ja Han, Patricia Hswe, William A. Ingram, and Robert Manaster (all co-presenting). "Repository Interoperability and Preservation: The Hub and Spoke Framework." Presented at the 2009 Digital Library Federation Spring Forum. Raleigh, NC.
  • William A. Ingram. "Hub and Spoke Tool Suite." 2009 PREMIS Implementation Fair. San Francisco, CA.
  • Thomas Habing and William A. Ingram (co-presenting). "Preservation Metadata Implementation Scenarios: The Hub and Spoke Tool Suite." 2009 Digital Preservation Metadata Workshop. Urbana, IL.
  • William A. Ingram. 2009. Invited Guest Lecture. LIS 590 MD, Metadata in Theory & Practice. Instructor: Timothy Cole. Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign.

Other papers and reports

  • Naman Ahuja, Ritesh Bansal, William A. Ingram, Palakh Jude, Sampanna Kahu, and Xinyue Wang. "Big Data Text Summarization: Using Deep Learning to Summarize Theses and Dissertations." http://hdl.handle.net/10919/86406
  • John Aromando, Bipasha Banerjee, William A. Ingram, Palakh Jude, and Sampanna Kahu. 2020. "Classification and extraction of information from ETD documents." http://hdl.handle.net/10919/96645

Papers in refereed conference proceedings

  • Lamia Salsabil, Jian Wu, Muntabir Hasan Choudhury, William A. Ingram, Edward A. Fox, Sarah M. Rajtmajer, and C. Lee Giles. 2022. A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software. In Companion Proceedings of the Web Conference 2022 (WWW ’22 Companion), April 25–29, 2022, Virtual Event, Lyon, France. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3487553.3524658
  • Sami Uddin, Bipasha Banerjee, Jian Wu, William A. Ingram, and Edward A. Fox. 2021. Building A Large Collection of Multi-domain Electronic Theses and Dissertations. In 2021 IEEE International Conference on Big Data (Big Data), 6043–6045. https://doi.org/10.1109/BigData52589.2021.9672058
  • Muntabir Hasan Choudhury, Himarsha R. Jayanetti, William A. Ingram, Jian Wu, and Edward A. Fox. 2021. Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2021. (JCDL ’21). Association for Computing Machinery, New York, NY, USA, 565–566. https://doi.org/10.1109/JCDL52503.2021.00066
  • Sampanna Yashwant Kahu, William A. Ingram, Jian Wu, Edward A. Fox. 2021. ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2021. (JCDL ’21). Association for Computing Machinery, New York, NY, USA, 565–566. https://doi.org/10.1109/JCDL52503.2021.00030
  • William A. Ingram and Edward A. Fox. 2020. Preparing Code and Data for Computational Reproducibility. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20). Association for Computing Machinery, New York, NY, USA, 565–566. https://doi.org/10.1145/3383583.3398714
  • Edward A. Fox and William A. Ingram. 2020. Introduction to Digital Libraries. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20). Association for Computing Machinery, New York, NY, USA, 567–568. https://doi.org/10.1145/3383583.3398501
  • James Tuttle, Yinlin Chen, Tingting Jiang, Lee Hunter, Andrea Waldren, Soumik Ghosh, and William A. Ingram. 2020. Multi-tenancy Cloud Access and Preservation: Virginia Tech Digital Libraries Platform. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20). Association for Computing Machinery, New York, NY, USA, 557–558. https://doi.org/10.1145/3383583.3398624
  • Muntabir Hasan Choudhury, Jian Wu, William A. Ingram, and Edward A. Fox. 2020. A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020 (JCDL ’20). Association for Computing Machinery, New York, NY, USA, 515–516. https://doi.org/10.1145/3383583.3398590