Project History & Objectives
The American Soldier in World War II Project was initiated in 2015, but it was inspired by a discovery six years earlier, when Project Director Ed Gitre first encountered the handwritten commentaries that are at the heart of this project.
The National Archives and Records Administration, or NARA, in College Park, Maryland, holds what was then the sole copy of the surviving responses, as mentioned in About the Surveys and Data. Though unclassified, the collection was non-circulating. The only way to read the soldiers' commentaries was to visit the archives and read them on a microfilm reader.
A historian of the interdisciplinary social and behavioral sciences and modern American culture, Dr. Gitre recognized that he had discovered something quite special after requesting the 44 microfilm reels for himself and reading a small sample of commentaries. Especially moving were the raw free responses of aggrieved Black soldiers serving in a segregated military. Then and there, he knew the collection deserved a much wider audience.
2015-2018: Planning and Transcribing
Dr. Gitre began planning in 2015 after arriving at Virginia Tech as a visiting assistant professor. He had two microfilm reels digitized so that students enrolled in his World War II courses could transcribe and tag them using Omeka and Incite, a custom-built transcription and tagging plugin designed by Virginia Tech’s Crowd Intelligence Lab under the supervision of the lab's director, Dr. Kurt Luther.
The exercise introduced students to the digital humanities while also offering them an intimate and unfiltered perspective on this global conflict. Dr. Gitre's students rose to the challenge of deciphering and tagging. They were surprised, moved, and amused by what they discovered in the process. The university recognized Dr. Gitre's innovative use of technology in teaching in 2017 with an xCaliber Award.
Based on the success of this student-sourcing, the National Endowment for the Humanities (NEH) awarded Virginia Tech a $50,000 Division of Preservation and Access planning grant (PW-253766-17) in 2017, with Dr. Gitre as Principal Investigator. The objective was to prepare an implementation plan for transcribing all of the soldiers' free responses and reuniting them with the quantitative survey datasets held at both NARA and the Roper Center for Public Opinion Research (“The American Soldier Collaborative Digital Archive,” 2017).
From Student-sourcing to Crowdsourcing
Committed to seeing the entire microfilm collection transcribed and annotated, Dr. Gitre and the Crowd Intelligence Lab designed and launched during the planning year a transcription initiative on the planet’s most popular crowdsourcing platform, Zooniverse.org, which already hosted several war-related transcription projects. To jumpstart the effort, NARA’s Digital Engagement Division digitized the entire collection, containing some 65,000 pages.
With over a million volunteers in 2018, the Zooniverse platform showed great promise for attracting a wide range of volunteers who share an interest in World War II, often because a relative had served or because they had served in uniform themselves.
On the anniversary of VE (Victory in Europe) Day, May 8, 2018, The American Soldier officially launched its transcription drive with a daylong "transcribathon."
The project held additional transcribathons to coincide with other World War II-related anniversaries. These events were held at Virginia Tech Libraries' Digital Humanities hub, the Atheneum, and at other locations across the country, including:
- Buffalo and Erie County Library in New York State
- Clifton Community Library
- College of the Holy Cross’ Dinand Library
- Georgia Institute of Technology’s Digital Integrative Liberal Arts Center
- Graceland University
- Purdue University's Humanities, Social Sciences and Education Library
- The College at Brockport’s Drake Memorial Library
- The University of North Carolina at Chapel Hill’s Digital History Lab
- Veterans Employment Training Service Inc. (V.E.T.S. Inc.)
- Westfield State University
People on the Virginia Tech campus in Blacksburg, VA, transcribe text from images at The American Soldier in WWII Veterans' Week transcribathon in 2019.
In 2019, the NEH Division of Preservation and Access awarded the project a second, $350,000 implementation grant (PW-264049-19) to sustain these transcription efforts, modernize NARA’s data files in a joint endeavor with the University of Virginia’s Biocomplexity Institute, and, ultimately, build this open-access website.
Other institutions contributed as well. The Social Science Research Council (SSRC) provided additional assistance, helping to secure a Creative Commons license to redistribute the four Stouffer-edited volumes available on this site, in cooperation with VT Publishing, HathiTrust Digital Library, and Princeton University Press. The George C. Marshall Research Library supplied copies of its public-domain high-resolution scans of What the Soldier Thinks.
The SSRC, NARA, and other contributing organizations helped as well by promoting the transcription drive and transcribathon events, with NARA's Citizen Historian Hub hosting a transcribathon itself.
In the course of deciphering tens of thousands of images of handwritten responses, the community of Zooniverse citizen-archivists and citizen-historians asked and answered questions of one another and of the project researchers on the discussion boards in the project’s Talk section. Explore discussions and comments on the following boards.*
- Notes — A board for comments on individual survey pages where transcribers shared interesting responses, asked for help deciphering a word or phrase, or added tags to help identify common themes.
- General Discussion — A board where transcribers discussed what they learned from these handwritten documents about the soldiers’ experiences and about the history and events of World War II. It was meant for lessons learned from the project broadly rather than for discussion of a specific page or response.
- Questions for the Research Team — A board for asking project experts about the Army Research Branch survey questions and other points of historical or military interest.
* Zooniverse account required.
To improve website performance and to draw insights from the army's survey data, Dr. Gitre collaborated with data scientists at Virginia Tech and the University of Virginia Biocomplexity Institute. Toward these efforts, in 2020 the project received $15,000 in additional financial support from Virginia Tech’s Data & Decisions Destination Area.
Student teams from Virginia Tech’s Computational Modeling and Data Analytics program and summer interns from the University of Virginia’s Data Science for the Public Good Young Scholar’s Program explored, tested, and implemented several natural language processing techniques, including a new Google algorithm known as BERT. To determine the order in which free responses are presented on this site, we have assigned an algorithm-generated "interest score" to each soldier commentary. Those scored as most interesting appear atop the list.
Funding from the Data & Decisions Destination Area also supported survey data cleaning and restructuring, undertaken by Virginia Tech Libraries' DataBridge lab. In addition to relabeling and restructuring the quantitative and qualitative survey data for this site, DataBridge provided raw datasets for download along with new frequencies files. Virginia Tech Libraries' Data Services assisted as well with the cleaning and restructuring of the soldiers' free responses, while Virginia Tech Publishing facilitated a Creative Commons licensing agreement for The American Soldier volumes, supported and hosted transcribathons, and copyedited website content.
Many of the free responses that had been microfilmed after the war were likely already reproductions, as paper documents were often photographed to film before being shuttled between overseas theaters and the US as a cost- and space-saving measure. Here is one example.
Following the example of other transcription projects, the team originally intended to have each document transcribed three times. We would then apply a Jaccard algorithm to select as the best transcription the one that agreed most with the other two. The initial results were good, but we thought they could be better, so we decided to add another round to the transcription drive.
This fourth round further improved the output, yet after another review of the data we added a final quality-control measure. Over twenty of our lead transcribers from around the globe volunteered to manually review every handwritten commentary using a purpose-built Omeka transcription platform, designed and managed by Virginia Tech Computer Scientist Xavier Pleimling and his Crowd Intelligence Lab mentor, Dr. Luther.
The project's lead transcribers compared the four submitted transcriptions with the one selected by the Jaccard algorithm and added final corrections, all in an effort to provide the most accurate transcriptions possible.
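The selection step described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes that agreement is measured as Jaccard similarity over sets of lowercased word tokens, and the function names (tokens, jaccard, select_best) and the sample sentences are hypothetical. The project may have tokenized or scored differently.

```python
import re


def tokens(text: str) -> set[str]:
    """Return the set of lowercased word tokens in a transcription.

    Assumption: word-level tokens; the source does not say how the
    project tokenized its transcriptions.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def select_best(transcriptions: list[str]) -> str:
    """Pick the transcription that agrees most with all the others.

    Each candidate is scored by the sum of its Jaccard similarities to
    every other candidate; the highest total wins (ties go to the first).
    """
    if not transcriptions:
        raise ValueError("at least one transcription is required")
    token_sets = [tokens(t) for t in transcriptions]

    def agreement(i: int) -> float:
        return sum(
            jaccard(token_sets[i], token_sets[j])
            for j in range(len(token_sets))
            if j != i
        )

    best_index = max(range(len(transcriptions)), key=agreement)
    return transcriptions[best_index]


# Hypothetical example: three volunteers transcribe the same line.
candidates = [
    "we need more mail from home",
    "we need more mall from home",
    "we need more mail from hone",
]
best = select_best(candidates)
```

In this example the first candidate shares five of seven distinct words with each of the other two, while the other two share only four of eight with each other, so the first is selected. The same function works unchanged when a fourth transcription is added.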
Over the course of the entire transcription drive, nearly 7,200 citizen-archivists and citizen-historians submitted over a quarter-million transcriptions.
During this implementation phase, Dr. Gitre initiated another interdisciplinary collaboration, with Virginia Tech's Center for Human-Computer Interaction (CHCI). CHCI graduate student Lee Lisle designed a virtual reality (VR) prototype for displaying large document sets in an immersive environment, called Immersive Space to Think (IST). A selection of soldier commentaries from Survey 32, the army's first race relations survey, was incorporated into IST testing and refinement, which has resulted in two refereed IEEE VR conference papers, “Evaluating the Benefits of the Immersive Space to Think” and “Sensemaking Strategies with Immersive Space to Think.” Virginia Tech in 2021 recognized Dr. Gitre's transdisciplinary teaching and mentoring and public engagement with a Diggs Teaching Scholar Award.
In 2020, Dr. Kurt Piehler, a historian of World War II at Florida State University and a project advisory board member, began discussions with Dr. Gitre about creating a public museum exhibit, centered on the soldiers' free responses, to engage and expand audiences. This interactive exhibit would combine physical artifacts from World War II with a VR/AR tool, similar to IST, for visitors to explore and read free responses within an immersive environment. To prototype this VR/AR technology and advance the next phase of the project, Virginia Tech's Institute for Creativity, Arts, and Technology awarded the project a $25,000 Major SEAD grant.
Early prototyping of a VR/AR system for displaying soldiers' free responses and other World War II-related media in an immersive environment.
For more on the project's transdisciplinary collaborations and future directions, read Suzanne Irby's “Researchers Are Pulling Movements out of Microfilm with Digital History”.
A Note of Gratitude
The enhancements of artificial intelligence and the generous support of an assortment of institutions have made this digital archive and publication possible. But wide public access to this unparalleled collection of soldiers’ reflections on World War II and their military service would not have been possible without the deep commitment of the project’s volunteer citizen-archivists and citizen-historians. This is especially true of the project’s lead transcribers, some of whom personally transcribed many thousands of pages in addition to manually reviewing the final transcriptions one by one. To them, The American Soldier in World War II team and the users of this site owe a tremendous debt of gratitude.
Read more about the Project Team & Partners, including the lead transcribers.
Want to tell us what you think of the site? Found an error or issue? Give us your feedback.