Surfacing Women in Smithsonian History

January 2022 | By Dr. Elizabeth Harmon, Digital Curator, Smithsonian Libraries and Archives, in collaboration with Dr. Lynn Cherny, Artist in Residence at Google Arts & Culture Lab

Developing machine learning tools to help Smithsonian curators uncover the history and contributions of women in science

Women have always contributed to the sciences. In fact, some of the Smithsonian’s most exciting early scientific research was undertaken by women. Before the turn of the twentieth century, Louisa Bernie Gallaher experimented with scientific photography at the Smithsonian’s United States National Museum, Mary Jane Rathbun was well on her way to becoming an expert on Crustacea while working in the Smithsonian’s Invertebrate Zoology Department, and Erminnie Adele Platt Smith completed field research for the Bureau of American Ethnology. To their contemporaries, these women of science were known as experts. Newspaper articles about their work and Smithsonian annual reports provide evidence of this.
Despite their achievements and contributions, though, women have remained below the surface in Smithsonian history. While collection records and stories exist in abundance about the important early contributions of men in leadership at the Smithsonian, it’s much harder to locate records and stories about women’s work. This is not unusual. Historians have documented how, as scientific and archival labor professionalized in the early twentieth century, emerging scientific and cultural institutions prioritized the hiring of white men and the creation of historical records about their work and achievements. As a result, today there are barriers to surfacing stories about women’s lives and work in collection records. For example, women’s personal and professional papers are harder to find in archives and collection records. Women are not always identified in photographic records. Women were likely to work as contractors, or even as volunteers, rather than full-time staff. And, perhaps most challenging of all, women’s names change in written records because of marriage, the inclusion of titles such as Miss and Mrs., and the use of a husband’s name.

The Smithsonian American Women’s History Initiative, Because of Her Story, is creating, disseminating, and amplifying the historical record of the accomplishments of American women, including the women who have worked at the Smithsonian. While we may not have personal or professional papers in our archives that tell complete stories about the work of women in Smithsonian history, we do have over 16.9 million records in our online collections, many of which contain traces of these women’s work. By mining our collections metadata, which include information about the scientific specimens, museum objects, artworks, library volumes, and archival materials in our collections, we are piecing together the history of women’s work in the sciences at the Smithsonian.
We have also launched Smithsonian Open Access, allowing people around the world to download, share, and reuse millions of items from the Smithsonian’s online collections, all for free, right now, without restrictions. So far, we have released over 3 million 2D and 3D digital items, and nearly two centuries of collections data that the public can access at scale with a new public API and GitHub repository. By making our collections more open, we are creating new research opportunities, many of which can surface stories about women in American history.

In collaboration with Google Arts & Culture, lead sponsor of the Smithsonian’s Open Access launch, the Smithsonian has been using machine learning to mine its collections metadata to uncover stories about women in Smithsonian history, particularly women in science. Using a structured data set containing information about women who have worked in science at the Smithsonian, Google ran machine learning algorithms to identify “named entities” (such as people, places, or dates) in both the Smithsonian’s collections metadata and the texts of Smithsonian annual reports. Using the named entities, data scientists created a network view to allow relationships between entities to surface in visual form. Smithsonian curators could then browse among the “nodes” in the network and see who is connected to whom in the collections metadata. 
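
The network view described above can be illustrated with a minimal sketch. It assumes that named entities have already been extracted from each record, and it links two people whenever they appear in the same record. The function names and the sample records are hypothetical and chosen only for illustration; the article does not describe how the actual network was built.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(records):
    """Map each pair of entities to the number of records naming both."""
    edges = defaultdict(int)
    for entities in records:
        # Deduplicate and sort so each pair is counted once per record
        # and (a, b) is never stored separately from (b, a).
        for a, b in combinations(sorted(set(entities)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def neighbors(edges, name):
    """Return the entities linked to `name`, strongest connection first."""
    linked = {}
    for (a, b), weight in edges.items():
        if a == name:
            linked[b] = weight
        elif b == name:
            linked[a] = weight
    return sorted(linked.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical input: one list of already-extracted person names per record.
sample_records = [
    ["Mary Jane Rathbun", "Serena Katherine Dandridge"],
    ["Mary Jane Rathbun", "Harriet Richardson Searle"],
    ["Mary Jane Rathbun", "Serena Katherine Dandridge"],
]

network = build_cooccurrence_network(sample_records)
print(neighbors(network, "Mary Jane Rathbun"))
```

Browsing the "nodes" then amounts to calling `neighbors` for a person of interest and following the strongest links first.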

Collections relating to Mary Jane Rathbun, for example, surfaced taxonomy cards in Smithsonian records that detail a collecting trip Rathbun took with Smithsonian colleague and scientific illustrator Serena Katherine Dandridge in 1911. The cards also revealed that another Smithsonian colleague, Dr. Harriet Richardson Searle, identified some of the specimens they brought back to the Smithsonian.

Another algorithm identified the most likely binary gender of each named person, based either on the historical probability that a first name referred to a man or a woman, or on the presence of a title such as Miss or Mrs. The metadata for subsets of the Smithsonian records, such as “Invertebrate Zoology,” were loaded into BigQuery tables with this extra name and gender information, and based on these tables, Data Studio interactive reports allowed curators to search the data for women’s names in the collection.
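
A rough sketch of this kind of name-based heuristic follows. It checks for a title first and falls back to a first-name lookup. Everything in it is an assumption made for illustration: the lookup table holds placeholder values rather than real historical statistics, the handling of "Mr", "Dr", and "Prof" is not mentioned in the article, and the 0.9 threshold is arbitrary.

```python
# Hypothetical lookup: share of historical records in which a first name
# referred to a woman. Placeholder values, not real statistics.
FEMALE_SHARE_BY_FIRST_NAME = {
    "mary": 0.99,
    "louisa": 0.99,
    "leslie": 0.50,
    "john": 0.01,
}

# Titles that settle the question on their own ("mr" is an added assumption).
TITLE_GENDER = {"miss": "female", "mrs": "female", "mr": "male"}
# Titles that carry no gender information and are skipped.
NEUTRAL_TITLES = {"dr", "prof"}


def infer_gender(name, threshold=0.9):
    """Return 'female', 'male', or 'unknown' for a personal name."""
    tokens = [t.strip(".,").lower() for t in name.split()]
    tokens = [t for t in tokens if t]
    while tokens and tokens[0] in NEUTRAL_TITLES:
        tokens.pop(0)
    if not tokens:
        return "unknown"
    # A gendered title wins, even when it is followed by a husband's name.
    if tokens[0] in TITLE_GENDER:
        return TITLE_GENDER[tokens[0]]
    share = FEMALE_SHARE_BY_FIRST_NAME.get(tokens[0])
    if share is None:
        return "unknown"
    if share >= threshold:
        return "female"
    if share <= 1 - threshold:
        return "male"
    return "unknown"


for example in ["Mrs. John Smith", "Mary Jane Rathbun", "M. J. Rathbun"]:
    print(example, "->", infer_gender(example))
```

Checking the title before the first name matters for records such as "Mrs. John Smith", where the first name belongs to a husband. Names recorded only as initials stay "unknown" rather than being guessed.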

Finally, a clustering algorithm was applied to the images in the collections metadata to expose the breadth and diversity of Smithsonian collections. Clusters found include, for example, collections of handwritten record cards, pictures of scientists with microscopes, photographs of field expeditions, shards of pottery in the anthropology collection, and examples of coins and paper money. Within these images, curators could search for women’s names and find historical images of the women in science who collected and recorded items documented in the Smithsonian’s records.
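
The article does not say which clustering algorithm or image features were used. As one possible illustration, the sketch below runs a small k-means over hypothetical two-dimensional feature vectors, standing in for whatever numeric representation of each image a real pipeline would compute. The feature values and labels are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Tiny k-means: return a cluster index for every point."""
    # Deterministic start: the first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments


# Hypothetical image features and labels, invented for this example.
labels = ["record card A", "microscope photo A", "record card B", "microscope photo B"]
features = [(0.10, 0.20), (0.90, 0.80), (0.15, 0.10), (0.85, 0.90)]

clusters = kmeans(features, k=2)
for label, cluster in zip(labels, clusters):
    print(cluster, label)
```

With these toy values the two record cards fall into one cluster and the two microscope photos into the other, which is the kind of grouping that lets a curator scan one visual category at a time.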
The work that the Smithsonian and Google Arts & Culture completed for the Smithsonian Open Access launch will help Smithsonian curators and data scientists to provide at-scale analysis and visualizations showing the representation of women across nearly two centuries of cultural data in order to surface new stories we can share with the world.