Analysis of Costume Core Vocabulary and Historical Descriptions of Costume Artifacts Using Natural Language Processing

Co-PIs: Chreston Miller (Library), Dina Smith-Glaviana (Apparel, Housing, and Resource Management), Julia Spencer (Library), & Wen Nie Ng (Library)

Proposal

This proposed research project addresses the inconsistency in terminology and the limitations of the written word within the field of historic costume.

The historic descriptions were originally typed on hundreds of physical note cards, which library co-PI Spencer has scanned into digital format (e.g., JPEGs). Library co-PI Miller provides an Optical Character Recognition (OCR) service to convert the scanned note cards into digital text, and library co-PI Ng will oversee quality control (QC) of the OCR results. Using the OCR output of the historic descriptions together with the contemporary descriptions newly created with Costume Core, Miller will oversee the use of natural language processing (NLP) techniques to carry out the language analysis.

To accomplish these goals, two undergraduate research assistants will be hired: one to catalog items using Costume Core, and one to perform QC on the note cards whose text was extracted with OCR. Both will work under the direction of subject-matter expert co-PI Smith-Glaviana, who will consult with library co-PI Ng on the data needed for the costume collection.
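
One possible way to focus the QC review of OCR output is to flag tokens that do not appear in a reference wordlist, so the reviewer checks those first. The following is a minimal Python sketch under that assumption, not the project's actual QC procedure; the function `flag_suspect_tokens`, the sample wordlist, and the sample card text are all hypothetical.

```python
import re

# Hypothetical reference wordlist. In practice it might combine a general
# English dictionary with Costume Core terms; this is NOT an excerpt of Costume Core.
KNOWN_WORDS = {"silk", "dress", "with", "lace", "collar", "and",
               "bodice", "cotton", "sleeves", "long"}


def flag_suspect_tokens(ocr_text, known_words):
    """Return (position, token) pairs for tokens not found in known_words.

    An unknown token is not necessarily an OCR error (it may be a rare term),
    so the result is a list of candidates for human review, not corrections.
    """
    # Keep letters and digits together so errors like "1ace" stay intact.
    tokens = re.findall(r"[A-Za-z0-9]+", ocr_text)
    return [(i, tok) for i, tok in enumerate(tokens)
            if tok.lower() not in known_words]


if __name__ == "__main__":
    card_text = "Silk dress with 1ace collar and long sleeves"
    print(flag_suspect_tokens(card_text, KNOWN_WORDS))
```

Here the hypothetical misread "1ace" (for "lace") is the only token flagged. A wordlist check is cheap and transparent, but it cannot catch an OCR error that happens to produce another valid word.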

We anticipate two main deliverables. The first is a larger number of items re-described using Costume Core, which adds value to the collection. The second is an NLP solution for automatically mapping the free-form text of historic descriptions to the terms of a controlled vocabulary.
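
As a rough illustration of what mapping free-form text to a controlled vocabulary involves, the sketch below uses simple character-level fuzzy matching from Python's standard `difflib` module as a baseline. It is an assumption for illustration, not the NLP approach the project will necessarily adopt; the function `map_to_vocabulary`, the `cutoff` value, and the sample vocabulary terms are hypothetical and are not taken from Costume Core.

```python
import difflib

# Hypothetical sample of controlled-vocabulary terms. A real system would
# load the published Costume Core terms instead.
VOCABULARY = ["bodice", "petticoat", "sleeve", "collar", "bustle", "train"]


def map_to_vocabulary(description, vocabulary, cutoff=0.8):
    """Map each word of a free-text description to its closest vocabulary term.

    Similarity is difflib's SequenceMatcher ratio, so this catches spelling
    variants, plurals, and minor OCR noise, but not true synonyms.
    """
    mapping = {}
    for raw in description.lower().split():
        word = raw.strip(".,;:()\"'")  # drop surrounding punctuation
        if not word:
            continue
        # Best single match whose similarity is at least `cutoff`, if any.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            mapping[word] = matches[0]
    return mapping


if __name__ == "__main__":
    text = "Boddice with lace collar, long sleeves and a short train."
    print(map_to_vocabulary(text, VOCABULARY))
```

With these sample terms, the variant spelling "boddice" maps to "bodice" and "sleeves" maps to "sleeve", while words with no close vocabulary term are left unmapped. A fuller solution would likely add synonym lists or semantic similarity, since historic descriptions may use period terms that share no spelling with the controlled vocabulary.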