Automating Historical Source Transcription with Record Linkage Techniques. Work in progress on the 1950 census for Norway
Thorvaldsen Gunnar

Doctor of History

Professor at the University of Tromsø and UrFU

620142, Russia, Sverdlovskaya oblast', g. Ekaterinburg, ul. Chapaeva, 17, of. 48

zabolotnykh@mail.com

Abstract.

The article addresses the transcription of handwritten material from the 1950 Norwegian Population Census, which comprises 801,000 scanned double-sided questionnaires. Optical character recognition programs have been improving for over four decades, and researchers now aim to extend similar techniques to handwritten historical source material. The article reviews studies of handwritten text recognition carried out by the Center of Historical Documents at the University of Tromsø and considers how various text recognition techniques can be applied to nominative sources. Because individual handwritten characters are difficult to distinguish and separate, whole words are clustered mathematically according to image similarity or looked up in sources that have been transcribed earlier. After recognition quality control, the software uses the line numbers to place the information from the transcribed cells, which then becomes part of the census database. Special software has also been developed to process handwritten numerical codes, data on occupations and education, etc. The methods presented improve the quality of handwritten text transcription and can be used to recognize nominative source records in Russia, for instance parish registers and vital records. The main remaining goals are to find methods and algorithms that optimally link different variables and to streamline interactive proofreading.
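The word-clustering idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the project's actual software: the cosine similarity measure, the greedy assignment strategy, the 0.9 threshold, the toy 3x3 bitmaps, the example labels, and all function names are assumptions chosen for illustration. Each word image joins the first cluster whose representative looks similar enough, and a label typed once for a representative is then copied to every member of its cluster.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length pixel vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_word_images(images, threshold=0.9):
    """Greedy clustering (an assumed, simplified strategy).

    Each image joins the first cluster whose representative (its first
    member) has similarity >= threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `images`.
    """
    clusters = []
    for idx, img in enumerate(images):
        for members in clusters:
            if cosine_similarity(images[members[0]], img) >= threshold:
                members.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def propagate_labels(clusters, representative_labels):
    """Copy the label given to each cluster representative to all members."""
    labels = {}
    for members, label in zip(clusters, representative_labels):
        for idx in members:
            labels[idx] = label
    return labels


# Invented toy data: 3x3 "word images" flattened to 9 pixel intensities.
images = [
    [1, 1, 1, 0, 0, 0, 0, 0, 0],    # pattern A
    [0, 0, 0, 0, 0, 0, 1, 1, 1],    # pattern B
    [1, 1, 1, 0, 0, 0, 0, 0.1, 0],  # noisy copy of pattern A
    [0, 0.1, 0, 0, 0, 0, 1, 1, 1],  # noisy copy of pattern B
]

clusters = cluster_word_images(images)
# Hypothetical labels a transcriber might type once per cluster.
labels = propagate_labels(clusters, ["gift", "ugift"])
```

In this sketch the operator transcribes two representatives instead of four images; with real census cells the saving would depend on how often the same word recurs and on how well the similarity measure tolerates differences between hands.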

Keywords: graphical user interface, Historical Population Register, neural network, deep learning, OCR, transcription, record linkage, Norwegian population census, databases, handwriting

DOI: 10.7256/2585-7797.2018.1.25686

Article was received: 11-03-2018

Review date: 12-03-2018

Publish date: 21-04-2018


This article is written in Russian. The full text is available in Russian.

References
1. Kliatskine V. et al. A structured method for the recognition of complex historical tables // History and Computing. 1997. Vol. 9 (1-3). P. 58-77.
2. Madhvanath S. et al. Reading handwritten US census forms // Proceedings of the Third International Conference on Document Analysis and Recognition. Vol. 1. IEEE Computer Society, 1995. P. 82-85.
3. Thorvaldsen G. et al. A Tale of Two Transcriptions. Machine-Assisted Transcription of Historical Sources // Historical Life Course Studies. 2015. Vol. 2 (1). P. 1-19.
4. Torval'dsen G. T. Nominativnye istochniki v kontekste vsemirnoi istorii perepisei: Rossiya i Zapad // Izvestiya Ural'skogo federal'nogo universiteta. Seriya 2. Gumanitarnye nauki. 2016. T. 18. No. 3 (154). S. 9-28.
5. Krizhevsky A., Sutskever I., Hinton G. E. Imagenet classification with deep convolutional neural networks // Advances in neural information processing systems 25. NIPS, 2012.
6. Glavatskaya E. M., Torval'dsen G. T. Etno-religioznaya i demograficheskaya dinamika v gornoi Evrazii v kontse XIX - nachale XX vv.: proekt sozdaniya Registra naseleniya Urala // Informatsionnyi byulleten' assotsiatsii Istoriya i komp'yuter. 2016. No. 45 (spetsvypusk). S. 251-254.