Improving the Usability of E-Book Readers

by Eva Siegenthaler, Pascal Wurtz, Rudolf Groner

Peer-reviewed Article

pp. 25-38

Abstract

The use of e-book readers (e-readers or electronic-readers) has become increasingly widespread. An e-reader should meet two important requirements: adequate legibility and good usability. In our study, we investigated these two requirements of e-reader design. Within the framework of a multifunctional approach, we combined eye tracking with other usability testing methods. We tested five electronic reading devices and one classic paper book. The results suggested that e-readers with e-ink technology provided legibility that was comparable to classic paper books. However, our study also showed that the current e-reader generation has large deficits with respect to usability. Users were unable to use e-readers intuitively and without problems. We found significant differences between the different brands of e-book readers. Interestingly, we found dissociations between objective eye-tracking data and subjective user data, stressing the importance of multi-method approaches.

Practitioner’s Take Away

The following list summarizes the takeaways that practitioners can get from this article:

E-readers are not yet accepted as a replacement for a classic paper book.
One of the primary reasons for this is poor usability.
Another reason is that users expect more functions from an electronic reading device. This problem illustrates a challenge for future e-reader development; more functions should be integrated in such a way that they are usable intuitively.
The legibility of the current e-reader generation is good; e-ink technology enables a reading process that is very similar to the reading process for classic paper books.
For people with visual impairment, e-readers have the advantage of providing an opportunity to adjust the font size.
Differences between subjective interview data and objective eye-movement data underline the importance of combining different methods in usability testing.

Introduction

An e-book (electronic book) is an electronic version of a printed book. E-books can be read using an e-book reader (or e-reader) which is a digital device that displays electronic text. E-books can also be read on a personal computer (PC), mobile phone, or personal digital assistant (PDA); however, a specialized software application (e.g., Stanza) is necessary to read an e-book on a PC, mobile phone, or PDA. Our study focused on the e-book reader device, not the e-reader software for PCs, mobile phones, or PDAs. E-book reader devices are becoming more and more widely used and are subject to rapid development. Figure 1 shows the mutual relationships between the devices and the content.

Figure 1. E-books can be read either on a specialized reading device (e-book reader/e-reader) or on any general purpose device such as a PC, PDA, or mobile phone that allows e-book reading through a software application.

The idea of using a digital device to read books is not new; it has existed for almost as long as interactive (end-user) computing (Golovchinsky, 2008). The first devices for e-reading were prototyped in the late 1960s by Alan Kay and later embodied in several generations of devices (Apple Newton, the Rocket eBook, and the Amazon Kindle). These generations of devices have been driven by innovations in device technology (e.g., displays, batteries, CPUs) rather than through evolving user needs.

Around the year 2000, there was great interest in reading content on specialized e-reading devices. Several companies (e.g., Franklin, Hanlin, Hiebook, Rocket eBook) released specialized e-reading devices (e-readers). At the same time, Microsoft released an Adobe software-only tool for reading on PCs. During this time, online stores for purchasing titles (e-books) were created. The latest generation of e-readers includes, among others, the Sony Reader, the Amazon Kindle, and the iRex iLiad. This generation of e-readers is equipped with a different display technology. The active LCD displays have been replaced by “e-ink” technology that reduces the power consumption, thereby increasing the battery life and reducing the weight of the device. Last year’s market shows an expanding interest in e-readers. It seems that the e-book (and e-readers) is the most important development in the world of literature since the Gutenberg press (Rao, 2003). Consequently, there is a great need for empirical research on the usability of e-readers examining the extent to which the devices are capable of replacing the classic paper book and how their additional capabilities should be implemented so that users can use them intuitively.

Usability of E-Readers

The question of whether e-readers will, at least partly, supersede the classic book largely depends on good usability. To define this variable, we applied ISO 9241-11, which provides the most widely accepted definition. The standard describes how these qualities can be defined as part of a quality system and considers usability to be good when a tool can be used effectively and efficiently and when the user is satisfied.

Previous studies found that users have problems handling the current e-reader generation (Lam, P., Lam, S.L., Lam, J., & McNaught, 2009; McDowell & Twal, 2009; Thompson, 2009). Lam et al. found that students judged the enjoyment of the e-book reading process as low. Thompson compared six e-readers and concluded that the Sony Digital Reader was the most user-friendly solution for consumer-level use, but she added that studying on an electronic reader is less attractive than studying text using a classic paper book. In McDowell and Twal’s study, students used the Kindle Reader during one semester. They found that the majority of the students were not satisfied with the navigation.

In a self-experiment, Jakob Nielsen (2009) tested the new Amazon e-reader Kindle 2. He read one half of a book on the Kindle 2 and read the classic paper book for the other half. He concluded that the Kindle 2 was well suited for linear texts. He found no difference in the reading speed between the Kindle 2 and the classic paper book. He noted problems in navigation and criticized the navigation of Kindle 2 as non-intuitive; therefore, reading non-linear texts (like newspapers) is not comfortable. Nielsen saw advantages in the use of e-readers, like less weight to carry around or big fonts, and saw benefits in equal-to-print legibility and multidevice integration (2009).

Except for the studies mentioned above, there is little experimental work on the usability of e-readers. One reason could be that e-reader companies do not publish the results of their own laboratory studies; another reason could be that an e-reader as a commercial device is a relatively new phenomenon. However, despite the few experimental studies on the usability of e-readers, there is a great potential for improving the usability of the next e-reader generation (Siegenthaler, Wurtz, & Groner, 2010).

The usage of e-books has potential in different areas like the fields of education or publication of newspapers and literature. In comparison to classic paper books, e-books have some essential advantages. E-books can be easily updated for correcting errors and adding information. Additionally, full text search functions help users quickly find a passage or keyword in a book. E-books can be annotated without compromising the original work. They can be hyper-linked for easier access to additional information, and they may allow the addition of multimedia like still images, moving images, and sound. Last but not least, they make reading accessible to persons with disabilities, because text can be re-sized for the visually impaired and be read aloud using speech synthesis. However, for the satisfactory usage of all these functions, good usability is necessary.

E-Reading

Studies about e-reading based on reading processes are rare. There are several studies that compare reading between classic paper books and TFT screens (e.g., Creed, Dennis, & Newstead, 1987; Gould & Grischkowsky, 1984). Early studies found many qualitative and quantitative differences between paper and screen reading (for a review, see Dillon, 1992). In the meantime, digital displays have become more sophisticated and are increasingly present in everyday life. A recent study (Van de Velde & von Grünau, 2003) suggested that the differences between media (e.g., print vs. screen) have decreased but emphasized that reading behavior also depends on moderating variables like computer experience or the task to be performed. It cannot be taken for granted that reading on an e-reader is the same as reading a classic paper book or reading on a computer screen. The size of the device, the scalable font size, and the new e-ink technology make it different in comparison to reading a classic paper book or to reading from a screen. However, the robustness against bright ambient light provided by e-paper technology and the improvement in screen quality make reading from these devices increasingly acceptable, and the advantages in terms of low weight, small dimensions, and freedom of movement are evident.

Colors can serve as a factor for information transmission. Traditional print books are most frequently monochromatic, while the technology of e-books has virtually no limits in that regard, although currently available e-ink technology is mainly monochromatic. Future e-readers could offer different color combinations of text and background. Some studies showed that the new e-ink technology can have a positive effect on the reading process (Siegenthaler, Wurtz, & Groner, 2010). However, other results showed that usability problems can negatively affect reading with e-reading devices such as e-readers and palm handhelds (Marshall & Ruotolo, 2002; Siegenthaler, Wurtz, & Groner, 2010). Additional research investigating the reading process with e-reading devices is necessary.

In this study, we compared the usability and legibility of different e-reading devices to the classic paper book.

Methods

The following sections discuss the participants, apparatus, devices, and procedures used in this study.

Participants

Ten participants, 5 male and 5 female, were tested. Their ages ranged from 16 to 71 years (mean = 42 years). They were selected to represent the full range of possible e-reader users (with respect to subjective media experience, education, and subjective reading time per week). The average subjective media experience was 3.7 (scale from 1 to 6), and the average subjective reading time per week was 6.25 hours. Four participants had a high-school degree, two participants had a university degree, and four participants had completed an apprenticeship. All participants reported normal or corrected-to-normal vision and had no previous experience with e-readers. Participants gave written informed consent prior to participation. The study was performed in accordance with the latest Declaration of Helsinki (WMA: Ethical Principles for Medical Research Involving Human Subjects, 2008).

Apparatus

Eye movements were recorded with an infrared video eye-tracking device (Tobii X120 Eye Tracker, Tobii Technology, Danderyd, Sweden). The system has a sampling rate of 120 Hz and a spatial accuracy of 0.5 degrees of visual angle. For each participant and trial, the system was recalibrated to assure best possible accuracy. Participants were allowed to move their head within a range of approximately 30 × 20 × 30 cm.

Devices

Five e-readers and one classic paper book were chosen for this study. The selection criterion was their availability in the Swiss market in June 2009. The five e-readers are listed below:

iRex iLiad
Sony PRS-505
BeBook
Ectaco jetBook®
Bookeen Cybook Gen

Figure 2 shows the e-readers used in the study.

Figure 2. The e-readers used in this study: (a) iRex iLiad, (b) Sony PRS-505, (c) BeBook, (d) Ectaco jetBook, and (e) Bookeen Cybook Gen.

The text material used in this study was the first chapter of a German-language novel (Querschläger, Silvia Roth, 2008). Every participant read the first 12 pages of the novel for legibility assessment.

Procedure

When the participants arrived at the laboratory, they were instructed about the aim and course of the experiment. The experiment started with a first legibility test. Participants had to read a segment of the text on all reading devices (five e-readers and one classic paper book) while their eye movements were recorded. The reading devices were presented in randomized order, and the experimenter adjusted the font to the size most convenient for the participant (this was, of course, not possible for the classic book). Prior to each reading trial, a 9-point calibration procedure was performed to ensure the best possible measurement quality. Immediately after the first legibility test, participants were interviewed, and they had to rate each reading device and state their subjective preferences.
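
The randomized presentation order described above could be generated as in the following minimal sketch. The article does not state how the randomization was carried out, so the function name, the participant numbering, and the use of a seeded generator are assumptions made only for illustration; the device labels are the short names used in the Results section.

```python
import random

# Short device labels as used in the Results section of this article.
DEVICES = ["iRex", "Bookeen", "BeBook", "Sony", "Ectaco", "classic paper book"]


def presentation_orders(participant_ids, seed=None):
    """Return one independently shuffled device order per participant.

    Hypothetical helper: the seed only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    return {pid: rng.sample(DEVICES, k=len(DEVICES)) for pid in participant_ids}


if __name__ == "__main__":
    # Ten participants, numbered 1 to 10 (the numbering is an assumption).
    for pid, order in presentation_orders(range(1, 11), seed=2009).items():
        print(pid, order)
```

Drawing a fresh order for every participant, rather than shuffling once, is what spreads possible order effects across devices.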

Following the first legibility test, a two-hour usability-based task session took place. Participants had to perform small tasks on each of the six devices and rate the devices on different scales. Each participant had to perform the tasks and fill out separate questionnaires consecutively for each reading device. In this phase, participants worked individually because we wanted to create a general situation that was close to reality. The sequence of e-readers was randomized to control for order effects. Participants were allowed to use the user manuals and other documentation supplied by the manufacturers in the original device package. Participants had to complete the following five usability tasks:

Open a book.
Increase font size.
Open a text in horizontal format.
Open an audio-file.
Open a picture.

For each task, participants had to answer whether they managed the task successfully or not. The time it took to complete the task was not noted because participants worked individually. After the small usability tasks were completed, participants were asked to rate (6-point Likert scale) the devices on the following criteria:

Design
Navigation
Orientation
Functionality
Handiness

After completing the usability tasks and the ratings, participants filled out a questionnaire about usability and acceptance of the device based on Huang, Wei, Yu, Kuo’s (2006) questionnaire.

After the usability test, participants had a short break where refreshments were supplied. After the break, a second legibility test was administered. This second test was identical to the first legibility test with the exception that different text segments and different orders of e-reading devices were employed. Finally, participants were interviewed and asked to give subjective judgments in the form of ratings on a Likert scale with numbers matching the grading system used by Swiss schools: 1 (very bad), 2 (bad), 3 (fail), 4 (pass), 5 (good), and 6 (very good).

Results

The following sections discuss the reading performance, results of usability ratings, and comparison between subjective user data and objective eye-movement data.

Reading Performance

We analyzed reading performance and eye-movement data (for a detailed eye-movement analysis, see Siegenthaler, Wurtz & Groner, 2010). Analysis of reading speed was based on the time codes of the video recordings. The start and stop times of reading and page-turns were coded for later statistical analysis.

Statistical analysis was performed using F-statistics based on a repeated measures ANOVA with the within factor “reading device” (iRex, Bookeen, BeBook, Sony, Ectaco, classic paper book). In cases of unequal variances within the groups, Friedman-tests (using χ2-statistics) were employed.
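
As a rough illustration of the Friedman test mentioned above, the sketch below computes the χ2 statistic from per-participant ranks. The function names and the sample ratings are hypothetical and not taken from the study, and the sketch omits the tie correction that statistical packages usually apply. With six reading devices, the statistic has 6 − 1 = 5 degrees of freedom, which matches the χ2(5) values reported in this article.

```python
def average_ranks(values):
    """Rank values from 1 to k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_chi_square(rows):
    """Friedman statistic without tie correction.

    rows: one list per participant, holding one measurement per device.
    """
    n = len(rows)      # number of participants
    k = len(rows[0])   # number of devices (within-subject conditions)
    rank_sums = [0.0] * k
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)


if __name__ == "__main__":
    # Hypothetical ratings: 3 participants x 6 devices (not study data).
    ratings = [
        [5, 3, 4, 4, 2, 6],
        [4, 2, 3, 5, 1, 6],
        [5, 3, 3, 4, 2, 5],
    ]
    print("chi-square:", friedman_chi_square(ratings), "df:", len(ratings[0]) - 1)
```

Because the test works on ranks within each participant, it does not rely on equal variances across devices, which is the situation in which it was used here.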

Reading time

No significant effects were found for total reading duration, F(5,40) = 0.857, p = .518. The time needed to read the text did not differ between the different devices.

Reading speed

We measured reading speed as the number of words read per minute. No significant effect was found for reading speed between the different devices, F(5,40) = 1.113, p = .369.
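
A minimal sketch of the words-per-minute computation, assuming the coded start and stop times are available in seconds. The function name and the optional `excluded_s` parameter (for time such as page-turns that one might want to leave out) are hypothetical; the article does not state whether page-turn time was subtracted.

```python
def words_per_minute(word_count, start_s, stop_s, excluded_s=0.0):
    """Reading speed from coded video time stamps (all times in seconds).

    excluded_s is a hypothetical option for non-reading time to leave out.
    """
    duration_s = (stop_s - start_s) - excluded_s
    if duration_s <= 0:
        raise ValueError("reading duration must be positive")
    return word_count * 60.0 / duration_s


if __name__ == "__main__":
    # Illustrative numbers only: 300 words read between 0 s and 90 s.
    print(words_per_minute(300, 0.0, 90.0))
```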

Total page-turn duration

The time needed for each page-turn (where no reading takes place) was assessed, i.e., the period of time between the last reading fixation on the bottom of a page and the first reading fixation on the top of the next page. By summing up those durations, we measured the total time needed for page-turns. The means of total page-turn durations differed significantly between the reading devices, χ2 (5) = 22.016, p < .01.

Proportion of time spent for page-turns

The proportion of time spent turning the pages, i.e., the ratio of the time needed to turn the pages to the total reading time, is shown in Table 1. The Friedman test revealed significant differences between the e-readers, χ2(5) = 19.857, p < .01.

Table 1. Mean Proportion (in %) and Standard Deviations (SD) of Time Spent for Page-Turns
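
The two page-turn measures can be sketched as follows, assuming each page is summarized by the time of its first and last reading fixation. The data layout and function names are hypothetical, and the sketch assumes that total reading time runs from the first fixation on the first page to the last fixation on the last page, including page-turns.

```python
def page_turn_durations(pages):
    """Gaps between the last fixation on one page and the first on the next.

    pages: list of (first_fixation_s, last_fixation_s) tuples in reading order.
    """
    return [nxt[0] - cur[1] for cur, nxt in zip(pages, pages[1:])]


def page_turn_proportion(pages):
    """Total page-turn time as a percentage of total reading time."""
    total_reading_s = pages[-1][1] - pages[0][0]
    total_turn_s = sum(page_turn_durations(pages))
    return 100.0 * total_turn_s / total_reading_s


if __name__ == "__main__":
    # Illustrative time stamps in seconds (not study data).
    pages = [(0.0, 48.0), (50.0, 98.0), (100.0, 200.0)]
    print(page_turn_durations(pages), page_turn_proportion(pages))
```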

Mean fixation duration

Visual fixation duration is a well established indicator of the difficulty of perceptual and/or cognitive processing (Just & Carpenter, 1980; Menz & Groner, 1982). The mean duration of visual fixations differed significantly between the reading devices, χ2(5) = 25.063, p < .01. Figure 3 shows differences in means and standard deviations of the fixation durations.

Figure 3. Mean fixation durations (ms) for reading on the different devices, with bars indicating one standard deviation. A single asterisk indicates a significant difference of p < .05 to the device with the shortest mean fixation duration (iRex); double asterisks indicate a significant difference of p < .01.

Number of letters per fixation

The mean number of letters read per fixation was significantly affected by the type of book, χ2(5) = 14.460, p < .05. In the classic paper book, one fixation covered the largest number of letters. Table 2 shows the numbers of letters per fixation depending on the reading device. Note that the paper book had the smallest font.

Table 2. Mean Number of Letters per Fixation and Standard Deviations (SD)

Results of the Usability Ratings

The following sections describe the usability analysis based on the usability tasks and the ratings of the participants after they had tried to solve the tasks. Statistical analysis was performed using a Friedman-test with the reading devices as a within factor (iRex, Bookeen, BeBook, Sony, Ectaco, classic paper book).

Success rate of the usability tasks

After participants had solved the usability tasks, they had to report whether they solved the task successfully or not. Table 3 shows the success rates of the five usability tasks.

Table 3. Percentage of Success Rates for the Five Usability Tasks¹

Subjective Usability ratings

Design

The question was, “How do you like the design?” The ratings (on a 1–6 Likert scale) of the design of the reading devices differed significantly between devices, χ2(5) = 20.388, p < .01. Table 4 shows the results.

Table 4. Mean Rating and Standard Deviations (SD) for Design on a Likert Scale from 1 (Very Bad) to 6 (Very Good)

Navigation

Participants judged the navigation on a Likert scale from 1 (very bad) to 6 (very good). The question was, “How do you judge the navigation?” We found significant differences between the devices, χ2(5) = 25.064, p < .01. As a control question, we asked, “How are you getting along with the reading device?” and replicated the above result, χ2(5) = 29.411, p < .01. Table 5 shows the results and ranks.

Table 5. Mean Ratings and Standard Deviations (SD) for Navigation, Likert Scale from 1 (Very Bad) to 6 (Very Good)

Functionality

The question, “How do you judge the different functions/applications (like reading, listening to music, storage of pictures…) of the reading device?” was judged on a Likert scale from 1 (very bad) to 6 (very good). We found significant differences in the functionality of the reading devices, χ2(5) = 19.265, p < .01. Table 6 shows results and ranks for functionality.

Table 6. Mean Rating and Standard Deviation (SD) of Functionality Ratings on a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)

Handiness

The question was, “How handy do you rate the reading device?” Participants judged handiness on a Likert-scale ranging from 1 (very bad) to 6 (very good). We found significant differences in handiness of the reading devices, χ2(5) = 15.111, p < .05. Table 7 shows the results and ranks.

Table 7. Mean Rating and Standard Deviations (SD) for Handiness on a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)

Usability ratings based on a questionnaire

After participants performed the requested tasks in the usability test, they filled out the usability questionnaire by Huang et al. (2006). The questionnaire originally resulted in a 10-point Likert scale that, for comparison with the other rating scales, was transformed into a 6-point Likert scale. The usability ratings differed significantly between the different reading devices, χ2(5) = 26.667, p < .01. Table 8 shows the mean usability ratings.

Table 8. Mean Usability Ratings (Based on the Questionnaire by Huang et al. 2006) and Standard Deviations (SD) Transformed to a Likert Scale Ranging from 1 (Very Bad) to 6 (Very Good)
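
The article does not specify how the questionnaire scores were transformed. One common choice is a linear rescaling, sketched below under the assumption that the original scores ran from 1 to 10 and the target scale from 1 (very bad) to 6 (very good). The function name and default bounds are hypothetical.

```python
def rescale(value, old_min=1.0, old_max=10.0, new_min=1.0, new_max=6.0):
    """Linearly map a score from [old_min, old_max] onto [new_min, new_max]."""
    if not old_min <= value <= old_max:
        raise ValueError("value lies outside the original scale")
    fraction = (value - old_min) / (old_max - old_min)
    return new_min + fraction * (new_max - new_min)


if __name__ == "__main__":
    # The end points and the midpoint of the assumed 1-10 scale.
    for score in (1, 5.5, 10):
        print(score, "->", rescale(score))
```

A linear mapping keeps the rank order of the devices unchanged, so a rank-based test such as the Friedman test gives the same result before and after the transformation.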

Comparison Between Subjective User Data and Objective Eye Movement Data

In the multifunctional approach employed in our analysis, different usability methods were combined. In the second legibility test, we found a dissociation, that is, a discrepancy between perception (eye tracking) and evaluation (interviews). The results of the first legibility test showed a significant correlation between preference and legibility (r = .356, p < .01); the second legibility test showed a dissociation (r = –.002) between preference and legibility. In the first session, 60% of the participants preferred the iRex device to read with; however, after having used all devices, only 30% still preferred it. Figure 4 shows the comparison between subjective and objective measures for the two tests: a similar distribution between interview data and eye-movement data in Test 1 (left side of Figure 4) and a dissociation in Test 2 (right side).

Figure 4. The two diagrams in the upper half show the percentage of “favorite to read with” as judged by participants. The lower diagrams show the eye-movement data as the percentage of participants who had the shortest visual fixations when reading on the device compared to the other devices.

This result suggests that subjective appraisal by participants is prone to bias: When users are asked to rate legibility, their judgment is biased by their overall impression of the device, including usability. Test setup and procedure (like the order of tasks and questions) influence the participant’s appraisal.
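
The correlation between preference and legibility reported above is a Pearson coefficient r. The sketch below shows how such a coefficient is computed from two paired lists. How preference and legibility were coded as numbers is not described in this article, so the function name and the sample values are purely hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long number lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical paired scores (not study data).
    preference = [6, 4, 5, 3, 2, 5]
    legibility = [5, 4, 6, 3, 3, 4]
    print(round(pearson_r(preference, legibility), 3))
```

A value near zero, as in the second legibility test, means that knowing a participant's stated preference says almost nothing about the objective measure.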

Recommendations

For consumers who want to buy an e-reader, we recommend the following:

Think about the purpose for which you want to use an e-reader.
Choose the e-reader model depending on the required functions.
Before you buy an e-reader, test it and check whether you can handle the device.

For practitioners doing usability tests, we recommend the following:

Do not rely exclusively on subjective interview data.
Use a combination of objective and subjective data.
Carefully analyze the differences between subjective and objective data.
If you do a usability test in an area where reading is involved, apply eye tracking.

Conclusion

Our study shows that the legibility of the current e-reader generation is comparable to that of classic paper books. E-ink technology enables a reading process that is very similar to the reading process of classic paper books. In our experiment, participants had the possibility of choosing the font size they liked. We found that older people in particular prefer a big font size, and the eye-tracking data show that participants had significantly shorter fixations on the e-readers compared to the classic paper book, which is an indication of better legibility. In conclusion, we can say that in some situations, e-readers, with the option of changing font size, can have better legibility than classic paper books. The option of adjusting the font size could also enlarge the user group. For instance, people with low vision may benefit from the possibility of increasing font size on e-readers.

It should also be mentioned that the possibility of adjusting font size probably influenced the eye-movement data. Because reading distance was constant and font size in the classic paper book was very small, some participants (especially older participants) reported problems with reading. This may have contributed to longer fixations in the classic book. Smaller font size may also be related to the number of letters per fixation. A single fixation covers more letters when the font size is smaller. One of the disadvantages when reading on an e-reader is the time spent with page-turns. Page-turns take a longer time with current e-ink technology because the building up of the display takes some seconds. Another reason for the time loss due to page turning is the larger font size. Increasing font size inevitably results in a higher number of pages and, as a consequence, more page-turns. Although larger font size can negatively affect some eye-movement parameters, the possibility of adjusting font size will lead to better usability and also to better legibility.

The results of this study showed a remarkable deficit in the usability of the current e-reader generation. Participants had great problems using the e-readers. This is a crucial problem because perceived usability can influence subjective legibility ratings. In other words, if a person is not able to use a reading device efficiently, then he or she does not like reading with it. Future e-reader generations should have a more intuitive design. Users expect more from such a device than only displaying text. Further, e-reader generations may incorporate additional functionality like wireless local area network (WLAN) compatibility, the possibility to highlight text sections, a comment function, or a fast search function. To integrate such functions and make them intuitively usable will be a challenge for e-reader developers.

The finding that the usability of a device has an effect on legibility judgments points to a methodological problem. In the first legibility test, where participants had no experience with the e-reading devices, the participants’ acceptance ratings were in close agreement with the efficiency of their eye-movement data. Thus, participants were able to judge which device was best for reading. However, after a period of self-regulated handling of the device, during the second legibility test, eye-movement data and the interview data showed a discrepancy. Between the first legibility test and the second legibility test, the participants had the opportunity to use the devices and had to give their usability ratings. In this session, participants encountered several problems during the interaction with the respective reading devices. They became disappointed or frustrated when they were not able to perform the exercises successfully. This outcome influenced their judgments in the second test session. After the second test session, the participants gave the classic paper book the highest legibility rating. Objective eye-movement data remained constant, because eye-movement behavior is predominantly controlled by automatic and unconscious processes, which are not likely to be changed or manipulated deliberately and are only partially accessible to awareness (Albert & Tedesco, 2010). This discrepancy between subjective rating data and objective performance measures is an important cue for the interpretation of the results; it shows that the usability of a reading device is at least as important as its legibility.

References

Albert, W., & Tedesco, D. (2010). Reliability of self-reported awareness measures based on eye tracking. Journal of Usability Studies, 5(2), 50-64.
Creed, A., Dennis, I., & Newstead, S. (1987). Proof-reading on VDUs. Behaviour & Information Technology, 6(1), 3-13.
Dillon, A. (1992). Reading from the paper versus screens: a critical review of the empirical literature. Ergonomics, 35(10), 1297–1326.
Golovchinsky, G. (2008). Reading in the office. In Proceeding of the 2008 ACM Workshop on Research Advances in Large Digital Book Repositories (pp. 21-24). New York, NY. ACM Press.
Gould, J. D., & Grischkowsky, N. (1984). Doing the same work with hard copy and with cathode-ray tube (CRT) computer terminals. Human Factors, 26(3), 323-337.
Huang, S-M., Wei, C-W., Yu, P-T., & Kuo, T-Y. (2006). An empirical investigation on learners’ acceptance of e-learning for public unemployment vocational training. International Journal of Innovation and Learning, 3(2), 174-185.
Just, M. A., & Carpenter, P. A. (1980). A theory of reading: From eye fixations to comprehension. Psychological Review, 87, 329-354.
Lam, P., Lam, S. L., Lam, J., & McNaught, C. (2009). Usability and usefulness of eBooks on PPCs: How students’ opinions vary over time. Australasian Journal of Educational Technology, 25(1), 30-44.
Marshall, C. C., & Ruotolo, C. (2002). Reading-in-the-small: a study of reading on small form factor devices. Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries (pp. 58-64). San Jose, CA. ACM Press.
McDowell, M., & Twal, R. (2009, September 17). Integrating Amazon Kindle: A Seton Hall University pilot program. Retrieved on September 17, 2009, from Educause Resources http://www.educause.edu/Resources/IntegratingAmazonKindleASetonH/163808 .
Menz, Ch., & Groner, R. (1982). The analysis of some componential skills of reading acquisition. In R. Groner & P. Fraisse (Eds.), Cognition and Eye Movements (pp. 169-178). Amsterdam: Elsevier North Holland.
Nielsen, J. (2009, March 9). Kindle 2 usability review. Retrieved on November 16, 2009, from http://www.useit.com/alertbox/kindle-usability-review.html
Rao, S. S. (2003). Electronic books: A review and evaluation. Library Hi Tech, 21(1), 85-93.
Siegenthaler, E., Wurtz, P., & Groner, R. (2010). Comparing reading processes on e-ink displays and printed paper (submitted for publication).
Thompson, C. (2009). Digital readers: Fact and fiction. Proceedings of the IATUL Conference in Leuven. Available at: http://www.iatul.org/doclibrary/public/Conf_Proceedings/2009/Thompson-text.pdf
Van de Velde, C., & von Grünau, M. (2003). Tracking eye movements while reading: Printing press versus the cathode ray tube. Perception 32 ECVP Abstract Supplement.

Published in: Volume 6, Issue 1