The availability of digital cameras and the possibility of taking photos at no cost have led to a growing number of digital photos online and on private computers. The sheer volume of data makes approaches that support users in managing their photos necessary. As the automatic understanding of photo content is still an unsolved task, metadata is needed to support administrative tasks such as search, or photo work such as the generation of photo books. This meta-information either describes the depicted scene textually or indicates how good or interesting a photo is.
In this thesis, an approach for creating meta-information without additional effort for the user is investigated. Eye tracking data is used to measure human visual attention, and this attention is analyzed with the objective of creating information in the form of metadata. The gaze paths of users working with photos are recorded, for example, while they are searching for photos or simply viewing photo collections.
Eye tracking hardware has developed rapidly in recent years. Because of falling prices for sensor hardware such as cameras and growing competition on the eye tracker market, prices are falling and usability is increasing. It can be assumed that eye tracking technology will soon be available in everyday devices such as laptops or mobile phones. Exploiting data that is recorded in the background while the user performs daily tasks with photos has great potential to generate information without additional effort for the user.
The first part of this work deals with the labeling of image regions by means of gaze data in order to describe the depicted scenes in detail. Labeling takes place by assigning object names to specific photo regions. In total, three experiments were conducted to investigate the quality of these assignments in different contexts. In the first experiment, users decided whether a given object could be seen in a photo by pressing a button. In the second experiment, participants searched for specific photos in an image search application. In the third experiment, gaze data was collected from users playing a game in which the task was to classify photos according to given categories. The results of the experiments showed that gaze-based region labeling outperforms baseline approaches in various contexts. In the second part, the most important photos in a collection are identified by means of visual attention in order to create individual photo selections. Users freely viewed the photos of a collection without any specific instruction on what to fixate, while their gaze paths were recorded. By comparing gaze-based and baseline photo selections to manually created selections, the value of eye tracking data for identifying important photos is shown. In the analysis, the characteristics of gaze data have to be considered, for example, that the data can be inaccurate and ambiguous. Aggregating gaze data collected from several users is one suggested approach for dealing with this kind of data.
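The idea of labeling a region by aggregating gaze data from several users can be illustrated with a minimal sketch. It assumes that candidate regions are axis-aligned boxes and that the region with the highest accumulated fixation duration receives the object name. The names `Fixation`, `Region`, and `label_region` are hypothetical, and the gaze measure is an illustrative choice, not necessarily the one evaluated in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fixation:
    x: float            # horizontal position in image pixels
    y: float            # vertical position in image pixels
    duration_ms: float  # how long the gaze rested at this point


@dataclass(frozen=True)
class Region:
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        return (self.left <= fix.x <= self.right
                and self.top <= fix.y <= self.bottom)


def label_region(regions, fixations_per_user):
    """Return the name of the region that attracted the most gaze.

    Fixation durations are summed over all users (the aggregation step),
    so a single user's inaccurate fixation carries less weight.
    Returns None if no fixation falls into any region.
    """
    totals = {region.name: 0.0 for region in regions}
    for fixations in fixations_per_user:
        for fix in fixations:
            for region in regions:
                if region.contains(fix):
                    totals[region.name] += fix.duration_ms
    if not any(totals.values()):
        return None
    return max(totals, key=totals.get)


# Hypothetical example: two candidate regions, gaze data from two users.
regions = [Region("left_half", 0, 0, 100, 100),
           Region("right_half", 101, 0, 200, 100)]
users = [
    [Fixation(50, 50, 300), Fixation(150, 50, 200)],
    [Fixation(160, 40, 250)],
]
target = label_region(regions, users)  # region to receive the object name
```

Summing over users is only one way to aggregate; weighting users equally or using fixation counts instead of durations would be equally plausible variants under these assumptions.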
The results of the experiments show the value of gaze data as a source of information. It makes it possible to benefit from human abilities in areas where algorithms still do not perform satisfactorily.
Presenting questions before the material or after it creates different reading situations. To adapt to these situations, readers may apply appropriate reading strategies. Reading strategies induced by the location of the question have been explored intensively in the context of text comprehension. (1) However, little is known about whether text plays the same role as pictures when readers apply different reading strategies. To answer this research question, three reading strategies are experimentally manipulated by displaying the question before or after the combined text and picture materials: (a) unguided processing of text and pictures without a question; (b) information gathering to answer questions after prior experience with the text and pictures; (c) comprehending text and pictures to solve questions that are known in advance. (2) In addition, it is an open question whether readers prefer text or pictures when the questions differ in difficulty. (3) Furthermore, it is still uncertain whether students from the higher school tier (Gymnasium) place more emphasis on text or on pictures than students from the lower school tier (Realschule). (4) Finally, it has rarely been examined whether students in higher grades are better able than students in lower grades to apply reading strategies in text processing and picture processing.
Two experiments were conducted to investigate the use of text and pictures from the perspectives of task orientation, question difficulty, school tier, and grade. For a 2x2(x2x2x2) mixed design using eye tracking, participants were recruited from grade 5 (N = 72) and grade 8 (N = 72). In Experiment 1, thirty-six 5th graders were recruited from the higher tier (Gymnasium) and thirty-six from the lower tier (Realschule). In Experiment 2, thirty-six 8th graders were recruited from the higher tier and thirty-six from the lower tier. Participants were asked to comprehend materials combining text and pictures and to answer the questions. A Tobii XL60 eye tracker recorded their eye movements and their answers to the questions. Eye tracking indicators such as accumulated fixation duration, time to first fixation, and transitions between different Areas of Interest were analyzed and reported. The results reveal that students process text differently from pictures when they follow different reading strategies. (1) Consistent with Hypothesis 1, students mainly use text to construct their mental model in unguided, spontaneous processing of text and pictures. They seem to rely mainly on pictures as external representations when trying to answer questions after prior experience with the material. They emphasize both text and pictures when questions are presented before the material. (2) Inconsistent with Hypothesis 2, students tend to place more emphasis on both text and pictures as question difficulty increases. However, the increase in focus is greater for pictures than for text when the presented question is difficult. (3) Contrary to Hypothesis 3, higher tier students did not differ from lower tier students in text processing. However, students from the higher tier attend more to pictures than students from the lower tier. (4) Contrary to Hypothesis 4, 8th graders outperform 5th graders mainly in text processing. Only a subtle difference is found between 5th graders and 8th graders in picture processing.
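The three eye tracking indicators named above can be computed from a time-ordered fixation sequence in which each fixation has already been mapped to an Area of Interest (AOI). The following is a minimal sketch under stated assumptions: the function name `aoi_indicators`, the tuple layout, and the AOI names are hypothetical; times are measured from stimulus onset; and fixations outside any AOI are assumed to have been filtered out beforehand. It does not reproduce the analysis software used in the study.

```python
from collections import Counter


def aoi_indicators(fixations):
    """Compute three common AOI indicators.

    fixations: list of (aoi_name, start_ms, duration_ms) tuples,
               ordered by start time.
    Returns a tuple of three dicts:
      - accumulated fixation duration per AOI (ms)
      - time to first fixation per AOI (ms from stimulus onset)
      - number of transitions per ordered AOI pair (from_aoi, to_aoi)
    """
    accumulated = {}
    first_fixation = {}
    transitions = Counter()
    previous_aoi = None

    for aoi, start_ms, duration_ms in fixations:
        accumulated[aoi] = accumulated.get(aoi, 0) + duration_ms
        # Only the earliest start time per AOI is kept.
        first_fixation.setdefault(aoi, start_ms)
        # A transition is a change of AOI between consecutive fixations.
        if previous_aoi is not None and aoi != previous_aoi:
            transitions[(previous_aoi, aoi)] += 1
        previous_aoi = aoi

    return accumulated, first_fixation, dict(transitions)


# Hypothetical scan path over three AOIs.
scan_path = [
    ("text", 0, 200),
    ("text", 220, 180),
    ("picture", 420, 300),
    ("text", 740, 150),
    ("question", 900, 250),
]
accumulated, first_fixation, transitions = aoi_indicators(scan_path)
```

For this scan path, the text AOI accumulates 530 ms across three fixations, the picture AOI is first fixated at 420 ms, and there is one transition each for text to picture, picture to text, and text to question.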
To sum up, text processing differs from picture processing when different reading strategies are applied. In line with the Integrative Model of Text and Picture Comprehension by Schnotz (2014), text is likely to play a major part in guiding the processing of meaning, or general reading, whereas pictures are used as external representations for information retrieval, or selective reading. When the question is difficult, pictures are emphasized because of their advantages in visualizing the internal structure of information. Compared to lower tier students (poorer problem solvers), higher tier students (good problem solvers) show their advantage in comprehending pictures rather than text. Eighth graders are more efficient than 5th graders in text processing rather than in picture processing. The findings also suggest that, in designing school curricula, more attention should be paid in the future to students’ competence in picture comprehension and text-picture integration.