

Developing full-text search

During the development of this first version of the arbeiter-zeitung.at archive, various approaches to optical character recognition (OCR) and, more importantly, different search and presentation solutions for a full-text newspaper archive were created and evaluated. The full-text issue presented below shows what a highly developed search and presentation tool can provide. (As full text is the subject of this test, it is stripped of other standard navigation tools such as page turning or zoom.) The example shows the issue of June 22, 1978, the day after Austria's historic victory over Germany at the Soccer World Cup in Argentina. As the AZ is in German, the following names or words might be useful for testing the efficiency of the full-text search: Krankl (striker in Austria's soccer team), Atom, Carter.

Try the full text search:

Or view a particular page:

Page 01, Page 02, Page 03, Page 04, Page 05, Page 06, Page 07, Page 09, Page 10, Page 11, Page 12, Page 13, Page 14, Page 15, Page 16

As the most common way of finding information on the internet, full-text search will become a vital function even in retro-digitized newspaper archives. With this function integrated, newspaper archives will be more powerful than general-purpose search engines, because they can combine a query on content with a query on a period of time. Still, OCR is not yet accurate enough to present the recognized plain text as the result of a search request. A very common technique to avoid showing defective text is to use the OCR text for the search only and to present the page image as the result. That way, OCR errors affect only the accuracy of the search and not the quality of the text the reader sees.
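
The technique described above can be illustrated with a minimal Python sketch. It is not the implementation used for arbeiter-zeitung.at: the `Page` record, its field names, the `search` function, the file paths and the sample OCR strings are all hypothetical and invented for illustration. The sketch matches the query against the OCR text, restricts the hits to a period of time, and returns only the page images.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    """One scanned newspaper page (all field names are hypothetical)."""
    issue_date: date
    number: int
    image_path: str  # scan that is shown to the reader
    ocr_text: str    # possibly faulty OCR output, used for matching only


def search(pages, term, start, end):
    """Return the image paths of pages whose OCR text contains `term`
    and whose issue date lies between `start` and `end` (inclusive)."""
    needle = term.casefold()
    return [
        p.image_path
        for p in pages
        if start <= p.issue_date <= end and needle in p.ocr_text.casefold()
    ]


# Invented sample data; the OCR strings are illustrative, not real output.
PAGES = [
    Page(date(1978, 6, 22), 1, "1978-06-22/page01.jpg", "Krankl schiesst Oesterreich zum Sieg"),
    Page(date(1978, 6, 22), 2, "1978-06-22/page02.jpg", "Carter und das Atom"),
    Page(date(1978, 6, 23), 1, "1978-06-23/page01.jpg", "Krankl im Interview"),
]

hits = search(PAGES, "krankl", date(1978, 6, 22), date(1978, 6, 22))
print(hits)  # ['1978-06-22/page01.jpg']
```

Because only `image_path` is returned, a misrecognized word can cause a page to be missed, but it can never appear as garbled text in the result.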

Another difficulty for OCR in connection with newspapers is their complex layout, which makes the automated separation of individual articles challenging if not impossible. On the other hand, forgoing the separation of articles lowers the quality and usability of the search. Either way, the OCR process is costly and time-consuming. Kaltenbrunner Medienberatung and scharf_net will therefore keep researching and developing creative solutions for highly efficient and user-friendly full-text search in digitised newspapers and for their presentation online, and not only in the interest of arbeiter-zeitung.at.