Generalized User Experience Questionnaire (UEQ-G): Holistic Measurement of Multimodal UX  

Peer-reviewed Article

pp. 75-103. Download full article (PDF): https://uxpajournal.org/wp-content/uploads/sites/7/2024/02/JUX_Boothe_Feb-2024.pdf

Abstract

The User Experience Questionnaire (UEQ) is a commonly used tool for measuring product experience. This study extends the UEQ to measure multimodal experiences that span both product and service experiences. Currently, no questionnaire measures holistic user experience, including pragmatic and hedonic qualities, across both product and service experiences. Through three study phases, we created and tested the Generalized User Experience Questionnaire (UEQ-G). First, we generalized language from the UEQ and tested it in the UEQ’s original, product experience context. Second, the UEQ-G was applied to controlled service experiences in which conditions were artificially manipulated across traditional UEQ factors. Third, we applied the UEQ-G in the field to experiences that contained both product and service experiences within the same scenario.

No significant differences were observed between the UEQ and UEQ-G during the first phase, and the UEQ-G detected differences between high and low conditions for all expected factors except one during the second phase. During the third phase, many expected correlations were found among UEQ-G factors and those from other well-established tools; however, a few expected correlations were not observed. This study found the UEQ-G to be as valid and reliable as its predecessor, the UEQ, in product experience scenarios. Although additional study is required, the UEQ-G showed great potential for evaluating service experience scenarios and for evaluating multimodal experiences in the field. With that additional study, the UEQ-G could be the first tool of its type for assessing holistic user experience across various multimodal experiences.

Keywords

User Experience Questionnaire, UEQ, UEQ-G, experience evaluation, holistic experience, multimodal experience

Introduction

Purpose

In an age in which users’ expectations of their digital and non-digital experiences are paramount, providing top-notch experiences is important to retain current customers, capture competitors’ unsatisfied customers, and maximize employee morale and productivity; not to mention, it is a way to make the world a little more pleasant. Whether the experience is digital, non-digital, or multimodal, a thoughtful and effective user experience is a basic requirement for any user-centric design. But how can improvements be made without effective measurement?

Defining UX

ISO 9241-210 defines user experience as the “user’s perceptions and responses that result from the use and/or anticipated use of a system, product or service” and specifically identifies brand image, presentation, functionality, system performance, interactive behavior, and assistive capabilities as factors determining UX (ISO 9241-210:2019, 2019). 

ISO 9241-210 goes on to define user interface (UI) as “all components of an interactive system (software or hardware) that provide information and controls for the user to accomplish specific tasks with the interactive system” (ISO 9241-210:2019, 2019). Because the ISO definition of UX specifically identifies interactive behavior—a factor that seemingly subsumes the entire UI definition—as just one of many UX factors, it is reasonable to conclude that a UI is but one factor in determining the resulting UX.

Another important distinction between UX and a UI is that while a UI can be directly designed, a UX cannot. UX is rather the resulting perception a user has based on a series of interactions with a system, product, or service, and it is only those points of interaction that can be designed (Rogers et al., 2011-b). Therefore, the term “user experience design” can be misleading as it really describes the process of designing those points of interaction that impact the experience rather than the experience itself. 

Confusion can be further exacerbated by the often-misunderstood relationship between UX and usability. Usability is “the extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use” (ISO 9241-210:2019, 2019). Usability accounts for only a portion of the factors that determine a resulting UX. However, unlike a UI, which is merely one contributing factor, usability is indispensable: it is virtually impossible to have a UX if usability is absent. Usability is “fundamental” to UX, and UX is “inextricably linked” to usability; therefore, the two should be evaluated in concert (Rogers et al., 2011-b). Those elements of UX that are not related to usability are captured within the concept of hedonic quality (Laugwitz et al., 2008).

Measuring UX

UX can be measured using various techniques such as interviews, focus groups, questionnaires, direct observation in the field, direct observation in a controlled environment, and indirect observation (Rogers et al., 2011-a). This study focuses on the single technique of standardized questionnaires, which have the benefits of being widely distributable and highly consistent. These benefits grant the ability to collect a large number of responses and compare results from one test to those of other user experiences, to benchmarks, or to previous versions of the same UX. However, questionnaires do not allow clarification and often do not lead to deep understanding. For research, it may be advantageous to employ a methodological triangulation strategy, utilizing the questionnaire in conjunction with other techniques (Rogers et al., 2011-a).

As discussed previously, the established, complete definition of UX encompasses system, product, and service experiences as well as elements of usability and hedonic quality. However, no existing questionnaire attempts to assess this complete UX. As shown in Table 1, we reviewed 31 UX questionnaires on two dimensions: scope (product/system versus service) and assessed qualities (usability versus hedonic). As Table 1 shows, 15 of the 31 assessment tools apply to service experience. Of those 15, only one, the Customer Experience Index (CXi or CX Index™), assesses both usability and hedonic quality. However, the CXi is only applicable to an overarching service or brand experience and does not include specific product or service experience assessments. Furthermore, the CXi is administered by Forrester Research, which does not allow complete visibility into their process (Forrester Research, Inc., n.d.).

In addition to the CXi, five other assessment tools measure both usability and hedonic quality but only for product or system experiences and not for service experiences. Those tools include the AttrakDiff2™ (Hassenzahl et al., 2003), Software Usability Measurement Inventory (SUMI) (Kirakowski & Corbett, 1993), Standardized User Experience Percentile Rank Questionnaire (SUPR-Q®) (Sauro, 2015), User Experience Questionnaire (UEQ) (Laugwitz et al., 2008), and Website Quality (WEBQUAL™) (Loiacono et al., 2002). Both the SUPR-Q and WEBQUAL are highly specialized for evaluating websites, so expanding their use to assess other types of products, not to mention service experiences, would be challenging. The AttrakDiff2, SUMI, and UEQ could all be modified to include service experience assessment with seemingly little effort. However, with three usability factors and three hedonic quality factors, the UEQ has a more balanced approach to assessing traditional usability and hedonic qualities than the AttrakDiff2, which includes only one usability factor, and the SUMI, which includes only one hedonic quality factor (Hassenzahl et al., 2003; Kirakowski & Corbett, 1993; Laugwitz et al., 2008). Therefore, the UEQ provided the best opportunity to create an assessment tool that can measure product, system, and service experiences on dimensions of usability and hedonic quality.

Table 1. Inventory and Brief Evaluation of UX Questionnaire Assessment Tools (Based on Tool Scope and Qualities Assessed)

| Tool | Applies to Product/System | Applies to Service | Assesses Usability | Assesses Hedonic Quality | Source |
|---|---|---|---|---|---|
| After-Scenario Questionnaire | Yes | Yes | Yes | No | Lewis, 1995 |
| American Customer Satisfaction Index | Yes | Yes | No | Yes | American Customer Satisfaction Index LLC, n.d. |
| AttrakDiff2 | Yes | No | Yes | Yes | Hassenzahl et al., 2003 |
| Computer System Usability Questionnaire | Yes | No | Yes | No | Lewis, 1995 |
| Customer Effort Score | Yes | Yes | Yes | No | Dixon et al., 2010 |
| Customer Experience Index | No | Yes | Yes | Yes | Forrester Research, Inc., n.d. |
| Customer Satisfaction | Yes | Yes | No | Yes | Bendle et al., 2016 |
| Emotional Metric Outcomes | Yes | Yes | No | Yes | Lewis & Mayes, 2014 |
| Information Satisfaction | Yes | No | No | Yes | Lascu & Clow, 2008 |
| Intranet Satisfaction Questionnaire | Yes | No | Yes | No | Bargas-Avila et al., 2009 |
| NASA Task Load Index | Yes | Yes | No | No | Hart & Staveland, 1988 |
| Net Promoter Score® | Yes | Yes | No | No | Reichheld, 2003 |
| Post-Study Usability Questionnaire | Yes | No | Yes | No | Lewis, 1992 |
| Practical Usability Rating by Experts | Yes | Yes | Yes | No | Rohrer et al., 2016 |
| Questionnaire for User Interaction Satisfaction (QUIS™) | Yes | No | Yes | No | Chin et al., 1988 |
| SERVPERF | No | Yes | No | No | Cronin Jr. & Taylor, 1994 |
| SERVQUAL | No | Yes | No | No | Parasuraman et al., 1988 |
| Single Ease Question | Yes | Yes | Yes | No | Sauro & Dumas, 2009 |
| Software Usability Measurement Inventory | Yes | No | Yes | Yes | Kirakowski & Corbett, 1993 |
| Standardized User Experience Percentile Rank Questionnaire | Yes | No | Yes | Yes | Sauro, 2015 |
| Subjective Mental Effort Question | Yes | Yes | Yes | No | Zijlstra & van Doorn, 1985 |
| System Usability Scale | Yes | Yes | Yes | No | Brooke, 1996 |
| Usability Magnitude Estimation | Yes | Yes | Yes | No | McGee, 2003 |
| Usability Metric for User Experience | Yes | No | Yes | No | Finstad, 2010 |
| Usability Metric for User Experience Lite | Yes | No | Yes | No | Lewis, 2013 |
| Usefulness, Satisfaction, and Ease-of-Use | Yes | Yes | Yes | No | Lund, 2001 |
| User Experience Questionnaire | Yes | No | Yes | Yes | Laugwitz et al., 2008 |
| Web Quality | Yes | No | Yes | No | Aladwani & Palvia, 2002 |
| Website Analysis and Measurement Inventory | Yes | No | Yes | No | Kirakowski & Cierlik, 1998 |
| Website Quality | Yes | No | Yes | Yes | Loiacono et al., 2002 |
| Website Usability | Yes | No | Yes | No | Wang & Senecal, 2007 |

User Experience Questionnaire (UEQ)

The UEQ is an efficient product experience evaluation tool to supplement traditional expert evaluations and usability testing. It was originally created in German in 2006 (Laugwitz et al., 2006) and translated into English in 2008 (Laugwitz et al., 2008). Based on the UX framework proposed by Hassenzahl (2001), the aim of the UEQ was to measure the holistic UX including the aspects of perceived ergonomic and hedonic quality as well as overall perceived attractiveness (Laugwitz et al., 2008).

UEQ Method

The UEQ can be administered via paper and pencil or online survey. Participants are presented with 26 seven-point semantic differentials, with opposite attributes on either end of the scale. Based on their initial perceptions, participants rate their opinion for a product by checking a value within a scale. For half of the questions, the positive attribute is presented at the beginning of the differential, and for the other half it is flipped, providing the negative attribute first (Schrepp, 2019).

Raw data from the UEQ are analyzed by first transforming the data to a -3 to +3 scale, with -3 indicating that the participant associated the product experience most closely with the negative attribute and +3 indicating the opposite. For each participant, the item scores comprising each of the six UEQ factors are averaged. Each average factor score is then averaged across all participants to obtain six separate factor scores for the product of interest. Those six scores can be rolled up into attractiveness, pragmatic quality, and hedonic quality as shown in Figure 1. Table 2 shows each of the UEQ adjective pairs along with the factor to which each pair contributes. A single overall UEQ score cannot be obtained without following a separate method, such as the KPI extension (Hinderks et al., 2019). Additional analyses of the correlation of individual items with each factor, as well as the variance of responses within the participant pool, are suggested for a standard UEQ analysis. There are also recommended methods for eliminating data from analysis based on inconsistent participant responses (Schrepp, 2019).
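
Because the scoring above is purely arithmetic, it can be scripted in a few lines. The sketch below is a minimal illustration, assuming raw responses arrive as a participants-by-items matrix of 1-7 ratings; the item indices and the set of reversed items are placeholders for illustration, not the official UEQ item order (see Table 2 for the actual adjective pairs).

```python
import numpy as np

# Illustrative item -> factor map; indices are placeholders, not the
# official UEQ item order (see Table 2 for the contributing pairs).
FACTOR_ITEMS = {
    "attractiveness": [0, 1, 2, 3, 4, 5],
    "perspicuity": [6, 7, 8, 9],
    "efficiency": [10, 11, 12, 13],
    "dependability": [14, 15, 16, 17],
    "stimulation": [18, 19, 20, 21],
    "novelty": [22, 23, 24, 25],
}
# Items whose positive attribute appears first must be flipped so that
# +3 always denotes the positive pole (which items these are is assumed).
REVERSED_ITEMS = [1, 4, 7, 10, 13, 16, 19, 22, 25]

def ueq_factor_scores(raw):
    """raw: (n_participants, 26) array of 1-7 item responses."""
    scored = np.asarray(raw, dtype=float) - 4.0   # map 1..7 onto -3..+3
    scored[:, REVERSED_ITEMS] *= -1.0             # re-orient reversed items
    # Average items within each factor per participant, then across participants.
    return {factor: scored[:, idx].mean(axis=1).mean()
            for factor, idx in FACTOR_ITEMS.items()}
```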

A hierarchy diagram showing the underlying structure of UEQ factors. "Attractive" is at the top with two items below it: pragmatic quality and hedonic quality. Under "Pragmatic Quality" are three items: perspicuity, efficiency, and dependability. Under "Hedonic Quality" are two items: stimulation and novelty.

Figure 1. The assumed underlying structure of the UEQ factors (Schrepp et al., 2017).

Table 2. UEQ Individual Adjective Pairs and Their Respective Contributing Factors (Laugwitz et al., 2008)

| UEQ Adjective Pair | UEQ Factor |
|---|---|
| annoying/enjoyable | Attractiveness |
| attractive/unattractive | Attractiveness |
| friendly/unfriendly | Attractiveness |
| good/bad | Attractiveness |
| unlikable/pleasing | Attractiveness |
| unpleasant/pleasant | Attractiveness |
| meets expectations/does not meet expectations | Dependability |
| obstructive/supportive | Dependability |
| secure/not secure | Dependability |
| unpredictable/predictable | Dependability |
| fast/slow | Efficiency |
| impractical/practical | Efficiency |
| inefficient/efficient | Efficiency |
| organized/cluttered | Efficiency |
| conservative/innovative | Novelty |
| creative/dull | Novelty |
| inventive/conventional | Novelty |
| usual/leading edge | Novelty |
| clear/confusing | Perspicuity |
| complicated/easy | Perspicuity |
| easy to learn/difficult to learn | Perspicuity |
| not understandable/understandable | Perspicuity |
| boring/exciting | Stimulation |
| motivating/demotivating | Stimulation |
| not interesting/interesting | Stimulation |
| valuable/inferior | Stimulation |

Research Overview

The next three sections describe three separate studies designed to develop and validate a robust questionnaire that measures both pragmatic and hedonic user experience qualities across product and service experiences. The first study (phase 1) takes the initial step by generalizing product-centric language from the UEQ so that the language is applicable to both product and service experiences. Furthermore, the first study examines the new, generalized version of the UEQ, the Generalized User Experience Questionnaire (UEQ-G), within the original, product experience context for which the UEQ was designed. The second study (phase 2) then tests the UEQ-G in a series of controlled service scenarios designed to test each factor of the UEQ-G separately. Finally, the third study (phase 3) applies the UEQ-G in the field to scenarios that include both product and service modalities within the same experience. Each of the following studies complied with the American Psychological Association Code of Ethics and was approved by the Institutional Review Board at Mississippi State University (IRB-20-312 & IRB-20-315). Informed consent was obtained from each participant.

Phase 1: Generalizing the UEQ to Measure Product and Service Experience

This study sought to generalize the language used in the UEQ, thereby creating a generalized user experience questionnaire (UEQ-G). Additionally, this study tested the new UEQ-G alongside the original UEQ in product scenarios in which the UEQ is known to be effective. The goal of this study was not to test the UEQ-G in a novel situation but to first confirm that it continued to perform in its original context after being generalized. 

Methodology

Expert Revision Process

We formed a panel of nine expert UX professionals, including researchers, designers, and information architects, all with significant industry experience. We asked the panel to identify which words or phrases in the current UEQ were product-oriented rather than generalizable to a more holistic UX. Each professional independently reviewed the UEQ and highlighted specific words or phrases that they believed fit that description. Then, the panel discussed all the highlighted portions and collectively agreed on generalized replacement terms or phrases. The instructions for the original UEQ (format modified for this study) are shown in Figure 2; the modified instructions are shown in Figure 3, with italics and red font indicating the changes. Additionally, the “easy to learn/difficult to learn” adjective pair was revised to “easy to grasp/difficult to grasp.” Because the expert revision process resulted in only a few small changes, there were early expectations that the revised UEQ would perform well when tested in its original context.


Figure 2. Screenshot of original UEQ instructions (formatted specifically for this study).


Figure 3. Screenshot of revised UEQ-G instructions (formatted specifically for this study) using italics and red font to show the changes from the original UEQ.

Participants

A human intelligence task (HIT) was posted on Amazon® Mechanical Turk™ with an incentive of $5.00 for successful HIT completion. To be eligible, Mechanical Turk workers had to be in the United States, have a HIT approval rating of at least 50%, have completed at least 50 HITs, have normal or corrected-to-normal vision, have full-color vision, be planning to use an Apple® iPhone® for the study, and have no experience with the Weather Puppy™ or Kelley Blue Book® mobile apps. An estimated 20 minutes were required for each participant to complete the HIT. As a first step, participants had to pass a screener questionnaire. The screener was attempted by 2,956 workers. Of those, 913 workers successfully passed, and 408 chose to begin the study. The 408 participants comprised 234 females (57.35%), 172 males (42.16%), and 2 individuals (0.49%) who preferred not to answer the question regarding gender. Participants ranged from 18 to 70 years of age with an average age of 33.48 years (SD = 10.31). Only 268 participants successfully completed the study. Furthermore, for each participant, the difference between the highest and lowest individual adjective pair values in each factor was calculated. Forty participants had a difference greater than three for more than two factors, and they were removed from analysis due to a suspected lack of participant thoughtfulness, in alignment with previous recommendations (Schrepp, 2019). Another 4 participants had data that were less than 1.5 times the interquartile range below the first quartile; therefore, they were identified as outliers and removed as well. In all, data from 224 participants were used in this study. Based on an a priori power analysis with α = 0.05, d = 0.50, and β = 0.10, a sample size of 99 for each population was necessary to detect a mid-sized effect for a Mann-Whitney U Test. With a sample size greatly exceeding that recommended by the power analysis, a failure to detect a difference was unlikely to be due to an insufficient sample size.

Procedure

After consenting to participate and completing demographic questionnaires, participants were asked to complete four study tasks within one of the two randomly assigned apps (Weather Puppy or Kelley Blue Book) and then complete either the UEQ or UEQ-G as well as the AttrakDiff2 in a SurveyMonkey® survey. 

For the Weather Puppy app, participants were asked to complete the following tasks:

  1. Download and open the “Weather Puppy Forecast + Radar” app from the App Store.
  2. Using the app, find and review the chance of rain over the next few hours in your current city.
  3. Using the app, find the current temperature in London, England.
  4. Change your puppy theme within the app.

For the Kelley Blue Book app, participants were asked to complete the following tasks:

  1. Download and open the Kelley Blue Book app from the App Store.
  2. Using the app, find the fair market value of any used car you want.
  3. Using the app, find the nearest Honda™ car dealer to your current location.
  4. Using the app, find copyright and trademark information for Kelley Blue Book.

Mobile apps were chosen for testing due to their availability and recent studies confirming the effectiveness of the UEQ for evaluating mobile apps (Sabukunze & Arakaza, 2021; Hartono et al., 2022). After the first app, participants were asked to complete four more study tasks in the other app and then complete the same questionnaire set as they did for the first app. Aside from the screening process via Mechanical Turk, all study activities were conducted on participants’ mobile phones.

Results

Data Preparation

To begin, raw data from the UEQ, UEQ-G, and AttrakDiff2 were analyzed by first transforming the data to a -3 to +3 scale, with -3 indicating the participant associated the product experience most closely with the negative attribute and +3 indicating the opposite. Factor scores were calculated for each participant by averaging item scores within each respective UEQ, UEQ-G, and AttrakDiff2 factor. For each participant, the difference between the highest and lowest individual adjective pair values in each factor was calculated. In addition to the outliers previously mentioned, any participant who incorrectly answered an attention check question was removed from the analysis. Attention check questions were embedded toward the middle of the UEQ, UEQ-G, and AttrakDiff2. The questions simply asked the participant to “select the 3rd option” and were formatted identically to the surrounding items. Finally, any participant who had a factor score greater than 1.5 times the interquartile range above the third quartile, or less than 1.5 times the interquartile range below the first quartile for a given factor, was labeled as an outlier and removed from analysis.
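
Both screening rules described here are mechanical and easy to automate. The following sketch assumes each participant’s item scores are already on the -3 to +3 scale and grouped by factor; it mirrors the spread rule attributed to Schrepp (2019) and the 1.5 × IQR fences, with the thresholds left as parameters so they can match either phase of the study.

```python
import numpy as np

def lacks_thoughtfulness(items_by_factor, max_spread=3, max_flagged=2):
    """Flag a participant whose best-worst item gap within a factor
    exceeds `max_spread` in more than `max_flagged` factors."""
    flagged = sum(
        1 for items in items_by_factor.values()
        if max(items) - min(items) > max_spread
    )
    return flagged > max_flagged

def iqr_outlier_mask(factor_scores):
    """Boolean mask of factor scores outside Tukey's 1.5 * IQR fences."""
    scores = np.asarray(factor_scores, dtype=float)
    q1, q3 = np.percentile(scores, [25, 75])
    iqr = q3 - q1
    return (scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)
```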

UEQ and UEQ-G Individual Adjective Pair Score Comparisons

Table 3 shows the individual adjective pair score means and standard deviations for each UEQ version.

Table 3. Mean UEQ/UEQ-G Individual Adjective Pair Scores Organized by UEQ Version

| Individual UEQ/UEQ-G Adjective Pair | UEQ x̅ | UEQ s | UEQ n | UEQ-G x̅ | UEQ-G s | UEQ-G n |
|---|---|---|---|---|---|---|
| annoying/enjoyable | 0.64 | 1.82 | 268 | 0.77 | 1.80 | 180 |
| not understandable/understandable | 1.46 | 1.62 | 268 | 1.40 | 1.68 | 180 |
| dull/creative | 0.68 | 1.70 | 268 | 0.66 | 1.74 | 180 |
| difficult to learn/easy to learn (changed to “difficult to grasp/easy to grasp” for the UEQ-G) | 1.44 | 1.65 | 268 | 1.37 | 1.59 | 180 |
| inferior/valuable | 0.84 | 1.60 | 268 | 1.02 | 1.59 | 180 |
| boring/exciting | 0.31 | 1.62 | 268 | 0.52 | 1.63 | 180 |
| not interesting/interesting | 0.77 | 1.70 | 268 | 0.88 | 1.70 | 180 |
| unpredictable/predictable | 0.85 | 1.47 | 268 | 0.78 | 1.49 | 180 |
| slow/fast | 0.88 | 1.67 | 268 | 1.12 | 1.51 | 180 |
| conventional/inventive | 0.08 | 1.78 | 268 | 0.10 | 1.71 | 180 |
| obstructive/supportive | 0.84 | 1.55 | 268 | 0.79 | 1.60 | 180 |
| bad/good | 1.20 | 1.69 | 268 | 1.28 | 1.64 | 180 |
| complicated/easy | 1.00 | 1.72 | 268 | 1.06 | 1.64 | 180 |
| unlikable/pleasing | 0.98 | 1.70 | 268 | 1.04 | 1.71 | 180 |
| usual/leading edge | -0.32 | 1.64 | 268 | -0.23 | 1.58 | 180 |
| unpleasant/pleasant | 1.11 | 1.69 | 268 | 1.13 | 1.69 | 180 |
| not secure/secure | 0.85 | 1.44 | 268 | 0.94 | 1.41 | 180 |
| demotivating/motivating | 0.64 | 1.50 | 268 | 0.86 | 1.48 | 180 |
| does not meet expectations/meets expectations | 1.14 | 1.86 | 268 | 1.26 | 1.74 | 180 |
| inefficient/efficient | 0.91 | 1.77 | 268 | 0.97 | 1.76 | 180 |
| confusing/clear | 1.03 | 1.82 | 268 | 1.08 | 1.82 | 180 |
| impractical/practical | 1.13 | 1.63 | 268 | 1.18 | 1.66 | 180 |
| organized/cluttered | 0.86 | 1.86 | 268 | 0.92 | 1.80 | 180 |
| unattractive/attractive | 0.92 | 1.76 | 268 | 0.82 | 1.82 | 180 |
| unfriendly/friendly | 1.21 | 1.63 | 268 | 1.42 | 1.56 | 180 |
| conservative/innovative | 0.19 | 1.56 | 268 | 0.22 | 1.63 | 180 |

A series of Mann-Whitney U Tests with a Bonferroni-corrected alpha of 0.002 indicated that there was no statistically significant difference between any of the individual adjective pair scores due to UEQ versions. Table 4 shows the results of the Mann-Whitney U Tests.

Table 4. Mann-Whitney Results from Testing the Effect of UEQ Version on UEQ/UEQ-G Individual Adjective Pair Scores

| Individual UEQ/UEQ-G Adjective Pair | U | z | p |
|---|---|---|---|
| annoying/enjoyable | 23119.00 | -0.76 | .450 |
| not understandable/understandable | 23947.00 | -0.13 | .894 |
| dull/creative | 24082.50 | -0.03 | .977 |
| difficult to learn/easy to learn (changed to “difficult to grasp/easy to grasp” for the UEQ-G) | 23115.00 | -0.77 | .441 |
| inferior/valuable | 22211.00 | -1.45 | .147 |
| boring/exciting | 22286.00 | -1.39 | .165 |
| not interesting/interesting | 23213.50 | -0.69 | .493 |
| unpredictable/predictable | 23809.00 | -0.24 | .813 |
| slow/fast | 22435.50 | -1.28 | .200 |
| conventional/inventive | 23997.00 | -0.09 | .926 |
| obstructive/supportive | 23769.50 | -0.27 | .791 |
| bad/good | 23612.00 | -0.39 | .699 |
| complicated/easy | 23876.00 | -0.19 | .853 |
| unlikable/pleasing | 23411.50 | -0.54 | .591 |
| usual/leading edge | 23850.50 | -0.20 | .838 |
| unpleasant/pleasant | 23954.50 | -0.13 | .900 |
| not secure/secure | 23223.50 | -0.69 | .493 |
| demotivating/motivating | 22263.50 | -1.42 | .157 |
| does not meet expectations/meets expectations | 23558.50 | -0.43 | .668 |
| inefficient/efficient | 23640.50 | -0.36 | .716 |
| confusing/clear | 23641.50 | -0.36 | .716 |
| impractical/practical | 23434.00 | -0.52 | .602 |
| organized/cluttered | 23876.00 | -0.18 | .853 |
| unattractive/attractive | 23465.50 | -0.50 | .620 |
| unfriendly/friendly | 22351.50 | -1.35 | .177 |
| conservative/innovative | 23977.50 | -0.11 | .914 |
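
For readers replicating these comparisons, an equivalent test can be run with SciPy. The snippet below is a minimal sketch using randomly generated stand-in scores rather than the study data; it applies the same Bonferroni logic of dividing 0.05 by the number of tests (0.05/26, about 0.002, for the item-level tests, or 0.05/6, about 0.008, for the factor-level tests reported later).

```python
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
ueq = rng.integers(-3, 4, size=268)    # stand-in UEQ item scores (n = 268)
ueq_g = rng.integers(-3, 4, size=180)  # stand-in UEQ-G item scores (n = 180)

alpha = 0.05 / 26  # Bonferroni correction across 26 item-level tests (~0.002)
u, p = mannwhitneyu(ueq, ueq_g, alternative="two-sided")
print(f"U = {u:.2f}, p = {p:.3f}, significant at corrected alpha: {p < alpha}")
```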

UEQ and UEQ-G Factor Score Comparisons

Table 5 shows the factor score means and standard deviations for each UEQ version. 

Table 5. Mean UEQ/UEQ-G Factor Scores Organized by UEQ Version

| UEQ/UEQ-G Factor | UEQ x̅ | UEQ s | UEQ n | UEQ-G x̅ | UEQ-G s | UEQ-G n |
|---|---|---|---|---|---|---|
| Attractiveness | 1.01 | 1.56 | 268 | 1.08 | 1.54 | 180 |
| Perspicuity | 1.23 | 1.56 | 268 | 1.23 | 1.55 | 180 |
| Efficiency | 0.95 | 1.45 | 268 | 1.05 | 1.48 | 180 |
| Dependability | 0.92 | 1.26 | 268 | 0.94 | 1.27 | 180 |
| Stimulation | 0.64 | 1.41 | 268 | 0.82 | 1.38 | 180 |
| Novelty | 0.16 | 1.40 | 268 | 0.18 | 1.42 | 180 |

A series of Mann-Whitney U Tests with a Bonferroni-corrected alpha of 0.008 indicated that there was no statistically significant difference between any of the factor scores due to UEQ versions. Table 6 shows the results of the Mann-Whitney U Tests.

Table 6. Mann-Whitney Results from Testing the Effect of UEQ Version on UEQ/UEQ-G Factor Scores

| UEQ/UEQ-G Factor | U | z | p |
|---|---|---|---|
| Attractiveness | 23553.00 | -0.42 | .673 |
| Perspicuity | 24037.50 | -0.06 | .951 |
| Efficiency | 22948.00 | -0.87 | .382 |
| Dependability | 23690.00 | -0.32 | .748 |
| Stimulation | 22191.00 | -1.44 | .150 |
| Novelty | 24032.50 | -0.07 | .948 |

Internal Consistency of the UEQ Factors

Cronbach’s alpha (Cortina, 1993) was calculated for each of the UEQ factors. Alpha values of 0.70 and above are generally thought to show a high level of internal consistency (Landauer, 1997). Each factor was found to be highly internally consistent including attractiveness (6 items, α = .959), perspicuity (4 items, α = .935), efficiency (4 items, α = .854), dependability (4 items, α = .802), stimulation (4 items, α = .899), and novelty (4 items, α = .862).

Internal Consistency of the UEQ-G Factors

Cronbach’s alpha (Cortina, 1993) was calculated for each of the UEQ-G factors. Again, a value of 0.70 is thought to indicate a high level of internal consistency (Landauer, 1997). Each factor was found to be highly internally consistent including attractiveness (6 items, α = .954), perspicuity (4 items, α = .942), efficiency (4 items, α = .899), dependability (4 items, α = .831), stimulation (4 items, α = .883), and novelty (4 items, α = .876).
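
Cronbach’s alpha has a simple closed form, so values like these can be verified directly from the per-participant item scores. A minimal implementation, assuming an items matrix with one column per item of a single factor:

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_participants, k_items) score matrix."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()   # sum of item variances
    total_variance = x.sum(axis=1).var(ddof=1)     # variance of summed scores
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)
```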

Correlations Among UEQ/UEQ-G and AttrakDiff2 Factors

As no significant difference was found between UEQ and UEQ-G individual or factor scores, data from both were combined, and a Spearman correlation was performed to identify if there were significant relationships among combined UEQ/UEQ-G factors and AttrakDiff2 factors. Statistically significant relationships were found among all UEQ/UEQ-G and AttrakDiff2 factors with 446 degrees of freedom and p < .001 for all factor combinations. Table 7 shows the correlation coefficients for each factor combination. Except for the relationship between novelty and pragmatic quality, each of the correlation coefficients was greater than 0.50 and can be classified as strong positive correlations. Additionally, the novelty and pragmatic quality correlation coefficient was greater than 0.30 and can be classified as a moderate positive correlation (Cohen, 1977).

Table 7. Correlation Coefficients Among UEQ/UEQ-G Factor Scores and AttrakDiff2 Factor Scores

| UEQ/UEQ-G Factor | Attractiveness (AttrakDiff2) | Pragmatic Quality (AttrakDiff2) | Identification (AttrakDiff2) | Stimulation (AttrakDiff2) | Hedonic Quality (AttrakDiff2) |
|---|---|---|---|---|---|
| Attractiveness | .940 | .769 | .829 | .728 | .841 |
| Perspicuity | .741 | .872 | .689 | .484 | .624 |
| Efficiency | .777 | .845 | .789 | .499 | .681 |
| Dependability | .770 | .840 | .757 | .481 | .655 |
| Stimulation | .888 | .694 | .812 | .725 | .831 |
| Novelty | .724 | .434 | .669 | .861 | .840 |
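
A cross-questionnaire matrix like Table 7 can be reproduced with a rank-based correlation over the combined factor scores. The sketch below uses random stand-ins for the two score sets; the column names are examples rather than the full factor lists.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 448  # combined UEQ/UEQ-G sample in this phase

# Stand-in per-participant factor scores on the -3..+3 scale.
ueq = pd.DataFrame(rng.normal(1.0, 1.4, (n, 2)),
                   columns=["Attractiveness", "Novelty"])
attrak = pd.DataFrame(rng.normal(1.0, 1.4, (n, 2)),
                      columns=["Pragmatic Quality", "Hedonic Quality"])

rho = pd.concat([ueq, attrak], axis=1).corr(method="spearman")
print(rho.loc[ueq.columns, attrak.columns].round(3))  # UEQ rows x AttrakDiff2 cols
```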

Conclusion

In this study, participants were asked to interact with two mobile apps, and after experiencing each, they completed either the UEQ or the UEQ-G and the AttrakDiff2. Upon analyzing the participants’ responses to the questionnaires, no difference could be found between the UEQ and UEQ-G in either the individual component scores or the calculated factor scores. The inability to detect a difference between the two questionnaire versions, despite having sufficient sample size, indicates that the survey versions are likely to elicit the same participant responses for the same product experience. As there were only minor changes between the UEQ and UEQ-G, this finding is not surprising.

When the UEQ was originally created, correlations were found between some of its factors and the AttrakDiff2’s factors (Laugwitz et al., 2008). This study explored those correlations as well, and the same significant relationships with the AttrakDiff2 factors were found. These findings strengthen confidence in the original UEQ findings and indicate the similarity of the new UEQ-G to its predecessor. Using Cronbach’s alpha, this study also explored the internal consistency of the factors contained in both the UEQ and UEQ-G. The internal consistency of each factor across both versions of the UEQ was found to be quite high. Again, these findings strengthen confidence in both the existing UEQ and the new UEQ-G.

This study sought to introduce and begin to qualify the UEQ-G, a new, slightly modified version of the UEQ with language revisions that support service-based experience evaluations in addition to the product experience evaluations for which the legacy UEQ was designed. However, rather than immediately testing the UEQ-G in a novel, service experience scenario, this study tested it in a traditional, product experience scenario alongside the original UEQ. Results from this study indicate that the UEQ-G is an appropriate evaluation tool for the traditional scenarios for which the original UEQ was created.

Phase 2: Using the UEQ-G to Evaluate Controlled Non-Digital Service Experiences

Delivering thoughtful, refined experiences is important, but experiences can be complex, having both pragmatic and hedonic qualities including product experience and service experience modalities within the same scenario. The UEQ-G has the potential to aid in this challenge by measuring those complex experiences. However, a critical question must be answered to ensure the UEQ-G is able to meet these expectations: Is the UEQ-G sensitive, valid, and reliable when evaluating service experiences? The purpose of this study is to answer this question.

Methodology

Participants

A series of human intelligence tasks (HITs) was posted on Amazon Mechanical Turk with incentives ranging from $2.50 to $4.75 depending on the estimated time required to complete the HIT. To be eligible for the HITs, Mechanical Turk workers had to be in the United States, have a HIT approval rating of at least 50%, and have completed at least 50 HITs. Depending on the scenario, an estimated 10 to 19 minutes were required for each participant to complete their HIT. A total of 636 workers agreed to participate on the informed consent page before moving to the demographic questionnaire, which was completed by 632 participants. After the demographic questionnaire, 608 participants accessed their scenario videos; however, on the following page, only 325 participants correctly answered attention-check questions about the videos to move further into the study. Of those 325 participants, 47 did not complete the study, 30 did not answer embedded attention-check questions correctly, and 7 completed the study twice, leaving 241 participants. Furthermore, using the methods described by Lewis (1995) and Schrepp (2019), data from an additional 44 participants were removed from analysis due to a suspected lack of thoughtfulness. In all, data from 197 participants were used in this study. The 197 participants comprised 116 males (58.88%), 79 females (40.10%), and 2 (1.02%) who preferred not to answer the question regarding gender. Participants ranged from 18 to 69 years of age with an average age of 34.14 years (SD = 10.06).

Procedure

After consenting to participate and completing a brief demographic questionnaire, participants were asked to watch one of six scenarios. Each scenario was experienced in either a high or low condition in which the UEQ-G factor being tested was designed to be, respectively, high or low. Participants’ only tasks were to watch the experience recording, answer the attention-check question after the video, and complete the UEQ-G that followed. Experience videos were either already created and publicly available prior to this study, or they were created specifically for this study and made available publicly on YouTube™. Brief scenario descriptions can be found in the following section. Each experience video was embedded and viewed within a SurveyMonkey survey. On the survey page immediately following the video, participants were asked two multiple choice questions about the video to ensure they had watched it. After answering the video questions correctly, participants completed the UEQ-G.

Classroom Lecture with Varied Stimulation Level

Participants were asked to watch a lecture video and imagine that they had just entered a college classroom. In the low condition, a reduced stimulation level was achieved by using a lecture video on basic addition, a topic that any of the adult participants should have found dull. In the high condition, an elevated stimulation level was achieved by using a TED Talk™ video viewed nearly 42 million times, “How to speak so that people want to listen” (TED, 2013). With proven popularity, a universally applicable topic, and even an interactive portion of the lecture, the TED Talk video was much more stimulating for participants than the basic addition video.

Convenience Store Shopping with Varied Novelty Level

Participants were asked to watch recorded walkthroughs of convenience store shopping experiences. In the low condition, a reduced novelty level was achieved by showing a walkthrough video of a traditional convenience store. Participants watched as a customer entered the store and shopped. In the high condition, participants watched a walkthrough of an Amazon Go™ experience where customers were able to scan a QR code on their phone, select items to buy in the store, and leave without any additional interaction. Before watching the walkthrough, participants were asked to view a brief introductory video about Amazon Go.

Facilitated Group Discussion with Varied Dependability Level

Participants were asked to watch recordings of facilitated group discussions in which dependability was varied in low and high conditions by modifications to the facilitator’s behavior. In the low condition, the facilitator behaved erratically, providing no clear order or direction, changing topics frequently, staring at her phone during participant responses, and even leaving the room to take a phone call in the middle of the session. In the high condition, the facilitator introduced a single topic, presented an agenda at the beginning, and stayed focused on that topic the entire time. Additionally, she actively listened to what participants said and encouraged collaborative discussion among the group.

Photo Booth with Varied Efficiency Level

Participants were asked to watch recorded photo booth session experiences. For the low condition, reduced efficiency was achieved by utilizing an inefficient process in which photo booth props were cluttered across the room, the photographer provided little direction, and she was overly chatty. The photographer would take a single picture of a group on her phone, walk to the opposite side of a larger room to a Polaroid™ Lab Instant Printer, print the individual picture, and return to the photo-taking location across the room before repeating the process again for each group in the recording. Furthermore, the photographer stopped to check her email during the recorded scenario. For the high condition, increased efficiency was achieved by using a process which included the photographer taking each person’s picture with a Polaroid Now™ i-Type Instant Camera which immediately printed the photo. Additionally, props were organized and set up in a space beside the picture-taking backdrop, and the photographer gave clear direction and organized the groups into a line.

Boardgame Overview with Varied Perspicuity Level

Participants were asked to watch recorded boardgame overview videos. For the low condition, reduced perspicuity was achieved by having the participants watch an 8-minute video on how to play Kanban™ Automotive Revolution: Driver’s Edition, a highly complex boardgame with 89.13% (418/469) of boardgamegeek.com voters giving it a heavy complexity score, and 1.49% (7/469) giving it a light complexity score (Board Game Geek, n.d.-b). In the high condition, increased perspicuity was achieved by having the participants watch an overview video on Candy Land®, a much simpler boardgame with 1.29% (4/309) of boardgamegeek.com voters giving it a heavy complexity score, and 97.09% (300/309) giving it a light complexity score (Board Game Geek, n.d.-a).

Airport Lounge with Varied Attractiveness Level

Participants were asked to watch first-person recordings of airport lounges. For the low condition, reduced attractiveness was accomplished by having the participants watch a video of a barebones airport lounge in Uganda. The lounge featured tightly spaced seating, dated furniture, and minimal refreshments. In the high condition, increased attractiveness was accomplished by having the participants watch a video from a first-class lounge in Paris. The video featured upscale, modern styling; food and premium beverages; and private restrooms and relaxation areas.

Results

Data Preparation

Data from the UEQ-G were processed by converting each raw item score to a -3 to +3 scale, with -3 indicating the participant associated the experience most closely with the negative attribute and +3 indicating the opposite. Scores were then averaged within each factor for each individual participant. 

Detecting Differences in Stimulation

Table 8 shows the results of the series of Mann-Whitney U Tests that were performed to determine if the UEQ-G’s stimulation factor was significantly different between high and low conditions for any of the scenarios. Table 9 shows the means, standard deviations, and medians for stimulation scores organized by scenarios. 

The classroom lecture scenario was designed to be more stimulating in the high condition than in the low condition. However, as seen in the results table below, no difference was detected in the classroom scenario. Significant differences in stimulation levels were, however, found in the convenience store shopping, facilitated group discussion, and photo booth scenarios.

Table 8. Mann-Whitney Test Results for High and Low Condition Stimulation Score Comparisons 

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 110.50 | -0.503 | .312^a |
| Convenience Store Shopping | 14 | 19 | 46.00 | -3.184 | .001 |
| Facilitated Group Discussion | 16 | 15 | 34.00 | -3.405 | < .001 |
| Photo Booth | 17 | 18 | 91.00 | -2.055 | .041 |
| Boardgame Overview | 18 | 18 | 157.50 | -0.143 | .888 |
| Airport Lounge | 14 | 16 | 68.00 | -1.841 | .070 |

^a One-tailed Mann-Whitney test

Table 9. Mean, Standard Deviation, and Median Stimulation Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 1.51 | 1.07 | 1.75 | 0.94 | 1.85 | 1.75 | 0.57 |
| Convenience Store Shopping | 1.80 | 1.19 | 1.88 | 0.12 | 1.47 | 0.00 | 1.68 |
| Facilitated Group Discussion | 0.88 | 1.35 | 0.88 | -1.02 | 1.28 | -1.25 | 1.90 |
| Photo Booth | 1.07 | 1.46 | 1.00 | 0.10 | 1.23 | 0.25 | 0.97 |
| Boardgame Overview | 0.76 | 1.15 | 0.50 | 0.69 | 1.37 | 0.63 | 0.07 |
| Airport Lounge | 1.68 | 1.30 | 2.00 | 0.84 | 1.16 | 1.00 | 0.84 |

Detecting Differences in Novelty

Table 10 shows the results of the series of Mann-Whitney tests that were performed to determine if UEQ-G’s novelty factor was significantly different between high and low conditions for any of the scenarios. Table 11 shows the means, standard deviations, and medians for novelty scores organized by scenarios. 

The convenience store shopping scenario was designed to be more novel in the high condition than in the low condition, and the high condition was found to be significantly higher than the low condition. There was also a significant difference in novelty level in the facilitated group discussion scenario.

Table 10. Mann-Whitney Test Results for High and Low Condition Novelty Score Comparisons

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 113.00 | -0.405 | .705 |
| Convenience Store Shopping | 14 | 19 | 40.50 | -3.381 | < .001^a |
| Facilitated Group Discussion | 16 | 15 | 70.50 | -1.967 | .049 |
| Photo Booth | 17 | 18 | 127.00 | -0.862 | .405 |
| Boardgame Overview | 18 | 18 | 102.00 | -1.912 | .059 |
| Airport Lounge | 14 | 16 | 101.50 | -0.440 | .667 |

^a One-tailed Mann-Whitney test

Table 11. Mean, Standard Deviation, and Median Novelty Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 0.50 | 1.05 | 0.50 | 0.21 | 1.47 | 0.25 | 0.29 |
| Convenience Store Shopping | 1.09 | 1.50 | 0.50 | -0.86 | 1.39 | -1.00 | 1.95 |
| Facilitated Group Discussion | -0.05 | 0.46 | -0.13 | -0.85 | 1.40 | -0.75 | 0.80 |
| Photo Booth | -0.03 | 1.20 | 0.25 | -0.38 | 1.11 | 0.00 | 0.35 |
| Boardgame Overview | -0.29 | 1.06 | -0.38 | 0.40 | 1.02 | 0.00 | -0.69 |
| Airport Lounge | 0.14 | 1.39 | 0.13 | -0.11 | 0.69 | 0.00 | 0.25 |

Detecting Differences in Dependability

Table 12 shows the results of the series of Mann-Whitney tests that were performed to determine if UEQ-G’s dependability factor was significantly different between high and low conditions for any of the scenarios. Table 13 shows the means, standard deviations, and medians for dependability scores organized by scenarios. 

The facilitated group discussion scenario was designed to be more dependable in the high condition than in the low condition, and the high condition was found to be significantly higher than the low condition.

Table 12. Mann-Whitney Test Results for High and Low Condition Dependability Score Comparisons

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 100.50 | -0.887 | .383 |
| Convenience Store Shopping | 14 | 19 | 103.00 | -1.099 | .287 |
| Facilitated Group Discussion | 16 | 15 | 31.50 | -3.512 | < .001^a |
| Photo Booth | 17 | 18 | 109.00 | -1.456 | .153 |
| Boardgame Overview | 18 | 18 | 129.00 | -1.048 | .308 |
| Airport Lounge | 14 | 16 | 66.50 | -1.906 | .058 |

^a One-tailed Mann-Whitney test

Table 13. Mean, Standard Deviation, and Median Dependability Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 1.24 | 0.90 | 1.25 | 1.50 | 1.12 | 2.00 | -0.26 |
| Convenience Store Shopping | 1.55 | 1.07 | 1.63 | 1.12 | 0.81 | 1.25 | 0.43 |
| Facilitated Group Discussion | 1.00 | 1.09 | 1.00 | -0.77 | 1.26 | -0.50 | 1.77 |
| Photo Booth | 1.32 | 1.25 | 1.00 | 0.68 | 1.00 | 0.75 | 0.64 |
| Boardgame Overview | 1.26 | 0.99 | 1.13 | 0.83 | 1.25 | 0.88 | 0.43 |
| Airport Lounge | 1.66 | 0.72 | 1.63 | 0.97 | 0.91 | 1.00 | 0.69 |

Detecting Differences in Efficiency

Table 14 shows the results of the series of Mann-Whitney tests that were performed to determine if the UEQ-G’s efficiency factor was significantly different between high and low conditions for any of the scenarios. Table 15 shows the means, standard deviations, and medians for efficiency scores organized by scenarios. 

The photo booth scenario was designed to be more efficient in the high condition than in the low condition, and the high condition was found to be significantly higher than the low condition. There were also significant differences in efficiency levels in the facilitated group discussion and boardgame overview scenarios.

Table 14. Mann-Whitney Test Results for High and Low Condition Efficiency Score Comparisons

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 119.50 | -0.154 | .880 |
| Convenience Store Shopping | 14 | 19 | 99.00 | -1.245 | .226 |
| Facilitated Group Discussion | 16 | 15 | 32.50 | -3.469 | < .001 |
| Photo Booth | 17 | 18 | 91.50 | -2.036 | .021^a |
| Boardgame Overview | 18 | 18 | 93.50 | -2.174 | .029 |
| Airport Lounge | 14 | 16 | 70.00 | -1.760 | .085 |

^a One-tailed Mann-Whitney test

Table 15. Mean, Standard Deviation, and Median Efficiency Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 1.32 | 1.15 | 1.75 | 1.35 | 1.12 | 1.50 | -0.03 |
| Convenience Store Shopping | 1.54 | 1.13 | 1.50 | 1.01 | 1.08 | 1.00 | 0.53 |
| Facilitated Group Discussion | 1.19 | 0.96 | 1.50 | -0.42 | 1.23 | -0.25 | 1.61 |
| Photo Booth | 1.10 | 0.92 | 1.25 | 0.22 | 1.44 | 0.13 | 0.88 |
| Boardgame Overview | 1.40 | 0.93 | 1.50 | 0.44 | 1.45 | 0.25 | 0.96 |
| Airport Lounge | 1.50 | 1.02 | 1.88 | 0.69 | 1.09 | 0.50 | 0.81 |

Detecting Differences in Perspicuity

Table 16 shows the results of the series of Mann-Whitney tests that were performed to determine if UEQ-G’s perspicuity factor was significantly different between high and low conditions for any of the scenarios. Table 17 shows the means, standard deviations, and medians for perspicuity scores organized by scenarios. 

The boardgame overview scenario was designed to be more perspicuous in the high condition than in the low condition, and the high condition was found to be significantly higher than the low condition. There were also significant differences in perspicuity levels in the facilitated group discussion and airport lounge scenarios.

Table 16. Mann-Whitney Test Results for High and Low Condition Perspicuity Score Comparisons

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 98.50 | -0.971 | .343 |
| Convenience Store Shopping | 14 | 19 | 130.00 | -0.110 | .928 |
| Facilitated Group Discussion | 16 | 15 | 41.00 | -3.130 | .001 |
| Photo Booth | 17 | 18 | 120.50 | -1.078 | .287 |
| Boardgame Overview | 18 | 18 | 93.00 | -2.191 | .015^a |
| Airport Lounge | 14 | 16 | 40.00 | -3.004 | .002 |

^a One-tailed Mann-Whitney test

Table 17. Mean, Standard Deviation, and Median Perspicuity Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 1.71 | 1.09 | 2.00 | 1.94 | 1.30 | 2.75 | -0.23 |
| Convenience Store Shopping | 1.61 | 1.18 | 1.75 | 1.64 | 1.06 | 1.75 | -0.03 |
| Facilitated Group Discussion | 1.39 | 1.16 | 1.88 | -0.07 | 1.34 | 0.00 | 1.46 |
| Photo Booth | 1.93 | 0.91 | 2.00 | 1.61 | 0.91 | 1.88 | 0.32 |
| Boardgame Overview | 1.57 | 1.22 | 1.50 | 0.39 | 1.54 | 0.25 | 1.18 |
| Airport Lounge | 2.18 | 0.68 | 2.38 | 1.02 | 1.03 | 0.75 | 1.16 |

Detecting Differences in Attractiveness

Table 18 shows the results of the series of Mann-Whitney tests that were performed to determine if UEQ-G’s attractiveness factor was significantly different between high and low conditions for any of the scenarios. Table 19 shows the means, standard deviations, and medians for attractiveness scores organized by scenarios. 

The airport lounge scenario was designed to be more attractive in the high condition than in the low condition, and the high condition was found to be significantly higher than the low condition. There were also significant differences in attractiveness levels in the convenience store shopping, facilitated group discussion, and photo booth scenarios.

Table 18. Mann-Whitney Test Results for High and Low Condition Attractiveness Score Comparisons

| Scenario | N_H | N_L | U | z | p |
|---|---|---|---|---|---|
| Classroom Lecture | 19 | 13 | 98.50 | -0.963 | .343 |
| Convenience Store Shopping | 14 | 19 | 58.50 | -2.722 | .005 |
| Facilitated Group Discussion | 16 | 15 | 20.00 | -3.960 | < .001 |
| Photo Booth | 17 | 18 | 93.50 | -1.970 | .049 |
| Boardgame Overview | 18 | 18 | 135.00 | -0.856 | .406 |
| Airport Lounge | 14 | 16 | 56.50 | -2.315 | .010^a |

^a One-tailed Mann-Whitney test

Table 19. Mean, Standard Deviation, and Median Attractiveness Scores Organized by Scenario Condition

| Scenario | x̅_H | s_H | Mdn_H | x̅_L | s_L | Mdn_L | x̅_H - x̅_L |
|---|---|---|---|---|---|---|---|
| Classroom Lecture | 1.72 | 1.08 | 2.00 | 1.17 | 1.47 | 1.17 | 0.55 |
| Convenience Store Shopping | 1.89 | 1.06 | 2.42 | 0.58 | 1.23 | 0.00 | 1.31 |
| Facilitated Group Discussion | 1.27 | 1.21 | 1.42 | -1.09 | 1.33 | -1.33 | 2.36 |
| Photo Booth | 1.55 | 1.18 | 1.83 | 0.81 | 1.12 | 1.00 | 0.74 |
| Boardgame Overview | 1.26 | 1.05 | 1.17 | 0.79 | 1.29 | 0.83 | 0.47 |
| Airport Lounge | 1.98 | 1.16 | 2.25 | 0.97 | 1.12 | 1.09 | 1.01 |

Internal Consistency of UEQ-G Factors

Cronbach’s alpha was calculated for each of the UEQ-G factors. Most factors were found to be highly internally consistent, including attractiveness (6 items, α = .935), perspicuity (4 items, α = .857), efficiency (4 items, α = .775), dependability (4 items, α = .776), and stimulation (4 items, α = .908); novelty (4 items, α = .674) fell just below the 0.70 threshold.

Conclusion

The UEQ-G detected significant differences for the tested factor in all but one scenario, which indicates the sensitivity of all, and the validity of most, of the UEQ-G factors; even the factor tested in that one scenario was unexpectedly found to be significantly different in several other scenarios. In five of the six scenarios, the UEQ-G was able to detect a difference in the factor being tested (novelty, dependability, efficiency, perspicuity, and attractiveness), which indicates that those factors are both sensitive and valid. However, in one scenario (classroom lecture), the UEQ-G was not able to detect a difference in the tested factor (stimulation). Although the scenario conditions were designed to be different and were even validated with a small pilot group, there was no guarantee that the high and low conditions were sufficiently distinct to be detected in this study. Regardless, since a difference was not detected in the scenario in which stimulation was the tested factor, a conclusion cannot be reached about its validity. However, significantly different levels of stimulation were observed in three other scenarios, which indicates the stimulation factor does show some sensitivity. Additionally, significantly different levels were seen outside of the tested scenarios for most of the other factors too. Significant differences were seen for novelty in one additional scenario, efficiency in two additional scenarios, perspicuity in two additional scenarios, and attractiveness in three additional scenarios. The results from the stimulation scenario were unexpected, yet each of the UEQ-G factors was sensitive enough to detect a difference in at least one scenario with sample sizes that ranged from only 30 to 36 participants, depending on the scenario.

The reliability of the UEQ-G is also largely supported by the results of this study. As with the original UEQ study (Laugwitz et al., 2008), UEQ-G factor scores were generally found to be highly internally consistent, as indicated by Cronbach’s alpha.

This study tested the newly introduced UEQ-G, a questionnaire built to assess holistic user experience using language suitable to both product and service experiences, and the UEQ-G was shown to be an effective tool in controlled service experience scenarios. Each of the UEQ-G factors was found to be reliable and sensitive enough to detect differences in those scenarios, but only attractiveness, perspicuity, efficiency, dependability, and novelty could be shown to be valid for measuring what each was intended to measure. Additional research is necessary to ensure the stimulation factor is indeed valid in service experiences.

Phase 3: Using the UEQ-G to Evaluate Multimodal Experiences in the Field

Having shown that the UEQ-G is a reliable instrument for measuring both product and non-product experiences, this study took the final step in introducing the UEQ-G by exploring its use in experiences that include product and non-product modalities within the same scenario. Furthermore, this study explores the UEQ-G outside of controlled scenarios, exposing it to real situations in the field.

Methodology

Participants

Forty Mississippi State University (MSU) student participants were recruited from the MSU campus by posting and passing out flyers that offered a $25 Chick-fil-A® (CFA) credit as reimbursement and payment for participation in an approximately 1-hour study. As advertised on the flyers, participants were required to be at least 18 years of age, have a smartphone capable of downloading the CFA app, be licensed drivers, have access to a vehicle, and be willing to pick up food from CFA during the COVID-19 pandemic. Qualifications based on these criteria were validated during study scheduling as well. Of the 40 participants who completed the study, 5 did not answer an embedded attention check question correctly, leaving 35 participants. Furthermore, using the methods described by Lewis (1995) and Schrepp (2019), data from 1 additional participant were removed from analysis due to a suspected lack of thoughtfulness. In all, data from 34 participants were used in this study. Those 34 participants comprised 20 females (58.82%) and 14 males (41.18%); all participants answered the question regarding gender. Participants ranged from 18 to 28 years of age with an average age of 20.91 years (SD = 2.45).

Procedure

After responding to the flyer and scheduling a time to participate in the study, participants were asked to join a Zoom call at their scheduled session time. During the Zoom call, each participant was asked to complete a SurveyMonkey survey that included both informed consent and a brief demographic questionnaire. Then, each participant downloaded the CFA mobile app and received a credit of $10 on the app. Next, participants were asked to complete three tasks:

  1. Take the necessary steps to use the CFA mobile app to order food of your choosing for pickup.
  2. Drive to the restaurant and pick up the food you ordered.
  3. Return to debrief and receive CFA credit as reimbursement and payment.

Upon their return from picking up their orders, participants were asked to rejoin the Zoom call, at which point the researcher confirmed order retrieval and instructed each participant to complete a set of questionnaires in SurveyMonkey including the UEQ-G, After-Scenario Questionnaire (ASQ), Customer Satisfaction (CSAT), Emotional Metric Outcomes (EMO), and Likelihood to Recommend (LTR). Once the questionnaires were complete, each participant was given an additional $15 credit on their CFA app.

Results

Data Preparation

Raw data from the UEQ-G were analyzed by first transforming the data to a -3 to +3 scale, with -3 indicating the participant associated the experience most closely with the negative attribute and +3 indicating the opposite. UEQ-G factor scores were calculated for each participant by averaging question scores from the respective factor. Additionally, for each participant, perspicuity, efficiency, and dependability scores were averaged together to determine an overall pragmatic quality score. Stimulation and novelty scores were averaged together to determine an overall hedonic quality score. For each participant, the difference between the highest and lowest value in each factor was calculated, and any participant with more than a difference of three for more than three factors was removed from analysis due to a suspected lack of participant thoughtfulness per the previous recommendations (Schrepp, 2019). Overall ASQ scores were calculated for each participant by averaging each individual item score that was not marked as “not applicable.” As recommended, overall ASQ scores were still calculated for participants with up to one item marked as “not applicable” (Lewis, 1995). However, participants with more than one item marked as “not applicable” or any items skipped would have been removed from ASQ analysis, but no participants were removed for this reason. 

In addition to maintaining ordinal CSAT responses, individual participants were also placed into one of two categories: satisfied (participants who responded as “somewhat satisfied” or “extremely satisfied”) and not satisfied (participants who responded as “extremely dissatisfied,” “somewhat dissatisfied,” or “neither satisfied nor dissatisfied”). However, upon inspection of the data, all respondents were coded as “satisfied” based on their responses (100% CSAT); therefore, raw CSAT scores had to be used for correlation analysis to produce a meaningful result. EMO scores from the positive relationship affect (PRA) and positive personal affect (PPA) sections were recorded directly, whereas scores from the negative relationship affect (NRA) and negative personal affect (NPA) sections were reversed. LTR values were recorded directly as well, but individual participants were also placed into traditional NPS categories of detractor (participants scoring between 0 and 6), passive (participants scoring 7 or 8), and promoter (participants scoring 9 or 10).
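
These roll-up and categorization rules are simple enough to express directly. The sketch below assumes UEQ-G factor scores are already on the -3 to +3 scale and that ASQ items use None for “not applicable” responses; the function names are illustrative.

```python
from statistics import mean

def pragmatic_hedonic(f):
    """Roll six UEQ-G factor scores into pragmatic and hedonic quality."""
    pq = mean([f["perspicuity"], f["efficiency"], f["dependability"]])
    hq = mean([f["stimulation"], f["novelty"]])
    return pq, hq

def asq_overall(items):
    """Average ASQ items, tolerating at most one 'not applicable' (None)."""
    answered = [v for v in items if v is not None]
    if len(items) - len(answered) > 1:
        return None  # too many N/A responses; exclude from ASQ analysis
    return mean(answered)

def nps_category(ltr):
    """Map a 0-10 likelihood-to-recommend score to an NPS category."""
    if ltr <= 6:
        return "detractor"
    return "passive" if ltr <= 8 else "promoter"
```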

Descriptive Statistics

Means and standard deviations across the UEQ-G, ASQ, CSAT, EMO, and LTR scores are shown in Table 20. 

Table 20. Mean Scores for the UEQ-G, ASQ, CSAT, EMO, and LTR

| Score | Possible Range | x̅ | s | n |
|---|---|---|---|---|
| Attractiveness (UEQ-G) | -3 to 3 | 2.56 | 0.45 | 34 |
| Perspicuity (UEQ-G) | -3 to 3 | 2.74 | 0.47 | 34 |
| Efficiency (UEQ-G) | -3 to 3 | 2.59 | 0.50 | 34 |
| Dependability (UEQ-G) | -3 to 3 | 2.30 | 0.64 | 34 |
| Stimulation (UEQ-G) | -3 to 3 | 1.65 | 0.97 | 34 |
| Novelty (UEQ-G) | -3 to 3 | 0.92 | 1.20 | 34 |
| Pragmatic Quality (UEQ-G) | -3 to 3 | 2.54 | 0.39 | 34 |
| Hedonic Quality (UEQ-G) | -3 to 3 | 1.28 | 0.94 | 34 |
| ASQ | 1 to 7 | 6.80 | 0.39 | 34 |
| CSAT | 1 to 5 | 4.97 | 0.17 | 34 |
| PRA (EMO) | 0 to 10 | 8.27 | 1.63 | 34 |
| NRA (EMO) | 0 to 10 | 7.54 | 1.47 | 34 |
| PPA (EMO) | 0 to 10 | 9.21 | 1.01 | 34 |
| NPA (EMO) | 0 to 10 | 8.70 | 0.57 | 34 |
| Overall (EMO) | 0 to 10 | 8.43 | 0.99 | 34 |
| LTR | 0 to 10 | 9.38 | 1.10 | 34 |

Correlations Among UEQ-G Factors and ASQ, CSAT, EMO, and LTR

Table 21 provides a summarized, consolidated view of the correlation analyses, showing only those correlations that are both statistically significant and have a value of at least 0.4.

Table 21. Significant Correlations Among UEQ-G’s Factors and ASQ, CSAT, EMO, and LTR

Scores | A | P | E | D | S | N | PQ | HQ
ASQ | .488 | .500 | .451 | .451 | | | .563 |
CSAT | | | | | | | |
PRA (EMO) | .569 | | | | .728 | .495 | .424 | .675
NRA (EMO) | .482 | | .470 | .426 | .471 | | .470 |
PPA (EMO) | .765 | .431 | .443 | .463 | .676 | | .593 | .539
NPA (EMO) | .678 | | | | .596 | .439 | .496 | .579
Overall (EMO) | .654 | | .522 | .428 | .669 | | .551 | .589
LTR | .566 | | .419 | | .536 | | | .476

A = attractiveness; P = perspicuity; E = efficiency; D = dependability; S = stimulation; N = novelty; PQ = pragmatic quality; HQ = hedonic quality; PRA = positive relationship affect; NRA = negative relationship affect; PPA = positive personal affect; NPA = negative personal affect
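As a rough illustration of how such a consolidated view can be produced, the sketch below correlates each factor with each outcome score and keeps only coefficients meeting both criteria. Pearson’s r is assumed here, as the article does not state which correlation statistic was used, and the column names are placeholders.

```python
import pandas as pd
from scipy import stats

def significant_correlations(df: pd.DataFrame, factors: list, outcomes: list,
                             alpha: float = 0.05, floor: float = 0.4) -> pd.DataFrame:
    """Keep factor-outcome correlations that are statistically significant
    and at least `floor` in magnitude. Pearson's r is an assumption."""
    rows = []
    for outcome in outcomes:
        for factor in factors:
            r, p = stats.pearsonr(df[factor], df[outcome])
            if p < alpha and abs(r) >= floor:
                rows.append({"outcome": outcome, "factor": factor, "r": round(r, 3)})
    return pd.DataFrame(rows)
```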

Conclusion

The strongest correlations among UEQ-G factor scores and scores from the other questionnaires are shown in Table 21. These correlations provide clues about the implications of UEQ-G’s results. UEQ-G’s attractiveness, efficiency, and stimulation factors are moderately to strongly correlated with LTR, meaning that organizations interested in tracking and improving NPS should pay special attention to these UEQ-G factor scores. Running regular UEQ-G initiatives for key experiences alongside regular NPS rounds may also provide diagnostic information for NPS responses. For example, an organization with decreasing NPS scores may also see a decrease in UEQ-G efficiency scores for key experiences and conclude that the reduction in perceived efficiency is driving the lower NPS scores. Even greater diagnostic capability may be achieved by pairing the UEQ-G with an importance-performance analysis as described in recent literature (Hinderks et al., 2019).

UEQ-G’s attractiveness and stimulation factors are moderately to strongly correlated with each of the EMO factors. The consistent link between these UEQ-G factors and those of the EMO indicates that these UEQ-G factors fundamentally assess the emotional impact of an experience on an individual. Each factor can positively or negatively affect how individuals feel about their relationships with organizations as well as how they feel about themselves. As with LTR, the UEQ-G can serve as a diagnostic tool when used in conjunction with the EMO. Should an organization see a drop in EMO scores, it may look to trends in UEQ-G factor scores to identify the potential cause of that drop.

The correlation between UEQ-G’s perspicuity factor and EMO’s PPA factor indicates that a clearer experience will lead to a more positive personal affect. This finding supports a phenomenon UX professionals often observe in usability testing: Participants judge themselves, rather than the organization, based on how well they understand what they are testing (Anderson, 1981). Although there was also a correlation between perspicuity and NPA, it was weaker, indicating that while perspicuity can have either a positive or negative impact on personal affect, it has more potential to positively impact individuals. Notably, no significant relationship was indicated between perspicuity and PRA or NRA, again supporting the idea that individuals judge themselves rather than the organization when it comes to clarity and understanding.

UEQ-G’s novelty factor was found to have moderate, positive correlations with EMO’s PRA and NPA scores. This finding indicates that novelty presents an opportunity to positively impact the relationships individuals have with organizations; conversely, a lack of novelty can negatively impact how people see themselves.

UEQ-G’s efficiency and dependability factors were both found to be moderately correlated with EMO’s NRA and PPA. These relationships suggest that a lack of efficiency and dependability can negatively impact an individual’s view of an organization while having little impact on that individual personally, whereas the presence of both factors can positively impact the individual while having little impact on his or her view of the organization.

UEQ-G’s attractiveness factor was found to have a moderate to strong positive correlation with every other measured score except CSAT. This finding supports Hassenzahl’s framework (2001), which identifies attractiveness as the highest-level factor to which all other factors contribute.

Evidence was found that the UEQ-G measures holistic UX, including pragmatic and hedonic qualities, based on the relationships identified between UEQ-G factors and ASQ and EMO scores. UEQ-G’s pragmatic quality score was found to be positively correlated with the ASQ, which measures traditional usability elements (Tullis & Albert, 2013), and UEQ-G’s hedonic quality score was found to be positively correlated with the EMO, which measures emotional elements (Sauro & Lewis, 2016).

No statistically significant correlation was found between raw CSAT and any of the UEQ-G factors. CSAT responses were overwhelmingly positive, with a calculated CSAT of 100%, meaning that every participant marked either “somewhat satisfied” or “extremely satisfied.” Even the raw CSAT scores averaged 4.97 (SD = 0.17) out of 5.00, indicating that the CSAT scale was effectively maxed out by participants’ CFA experiences. With so little variance in responses, detecting a significant correlation was unlikely.

This study explored using the UEQ-G in the field for a scenario that combined product and service experiences within the same extended experience. The relationships identified between the UEQ-G and the other questionnaires’ factors indicate that the UEQ-G is capable of measuring holistic experience, including pragmatic and hedonic factors, within a multimodal experience, something no other questionnaire has demonstrated to date.

Additionally, several other important relationships between the UEQ-G and the established questionnaires were found. These additional relationships demonstrate the potential for the UEQ-G to supplement widely used tools such as the NPS by providing additional context for observed trends.

Recommendations

There are several valuable directions for future UEQ-G research. As mentioned, additional research is necessary to confirm that the stimulation factor is valid in these types of situations. Future work could also examine relationships between the UEQ-G and a larger variety of existing questionnaires and user experience metrics, apply the same questionnaires from this study in a larger variety of situations, and test the UEQ-G in additional service experience, or even traditional product experience, scenarios.

One step necessary to accomplish the UEQ-G’s original purpose is continued testing in scenarios that include both product and service experiences. Accordingly, future research could and should focus on exploring experiences that cross modalities, with a variety of anticipated values for each of the UEQ-G factors. Research comparing multimodal experiences would also be valuable.

Tips for Usability Practitioners

  • Consider using the UEQ-G to assess product user experiences. The UEQ-G performs as well as the original UEQ but has more generalized language.
  • Test the UEQ-G in service and multimodal user experience evaluations. The UEQ-G shows great potential in those scenarios and is the first tool of its type to measure extended multimodal experiences.
  • Try the UEQ-G alongside the NPS to gain greater insights into potential NPS score variations. The UEQ-G has a few factors that correlate with NPS scores.
  • Share UEQ-G findings and lessons learned through reputable, peer-reviewed publications such as the Journal of User Experience.

Acknowledgements

I would like to acknowledge the contributions of several people who helped make this research possible. Without hesitation, I must thank each of my co-authors, who have, of course, provided tremendous support and contributions to this work. I also thank Bailey Jose and her team of undergraduate researchers for leading on-campus data collection and participating in scenario creation. To my colleagues at BeyondTrust, thank you for working with me to accomplish the critical task of developing the Generalized User Experience Questionnaire. Thank you also to my employer, BeyondTrust (formerly Bomgar), for supporting my research from the very beginning. Finally, I cannot thank my wife, Emily, enough for the support she has given throughout this process. From proofreading and editing countless revisions to taking care of the kids and keeping the house running while I focused, she is just as responsible for the success of this work as I am.

References

Aladwani, A. M., & Palvia, P. C. (2002). Developing and validating an instrument for measuring user-perceived web quality. Information & Management, 39(6), 467–476.

American Customer Satisfaction Index LLC. (n.d.). The Science of Customer Satisfaction. Retrieved December 7, 2019, from https://www.theacsi.org/about-acsi/the-science-of-customer-satisfaction

Anderson, R. E. (1981, January). Some effects of considerate and inconsiderate systems. ACM SIGSOC Bulletin, 11–16.

Bargas-Avila, J. A., Lötscher, J., Orsini, S., & Opwis, K. (2009). Intranet satisfaction questionnaire: Development and validation of a questionnaire to measure user satisfaction with the intranet. Computers in Human Behavior, 25(6), 1241–1250.

Bendle, N. T., Farris, P. W., Pfeifer, P. E., & Reibstein, D. J. (2016). Share of hearts, minds, and markets. In Marketing metrics: The manager’s guide to measuring marketing performance (pp. 17–66). Pearson.

Board Game Geek. (n.d.-a.) Candy Land. Retrieved June 5, 2020, from https://boardgamegeek.com/boardgame/5048/candy-land

Board Game Geek. (n.d.-b.) Kanban: Driver’s Edition. Retrieved June 5, 2020, from https://boardgamegeek.com/boardgame/109276/kanban-drivers-edition

Brooke, J. (1996). SUS: A ‘quick and dirty’ usability scale. In Usability evaluation in industry (pp. 189–194). Taylor & Francis.

Chin, J. P., Diehl, V. A., & Norman, K. L. (1988). Development of an instrument measuring user satisfaction of the Human-Computer Interface. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 213–218.

Cohen, J. (1977). Statistical power analysis for the behavioral sciences. Elsevier Science.

Cortina, J. M. (1993). What is coefficient alpha? An examination of theory and applications. Journal of Applied Psychology, 78(1), 98–104.

Cronin Jr., J. J., & Taylor, S. A. (1994). SERVPERF versus SERVQUAL: Reconciling performance-based and perceptions-minus-expectations measurement of service quality. Journal of Marketing, 58(1), 125–131.

Dixon, M., Freeman, K., & Toman, N. (2010, July-August). Stop trying to delight your customers. Harvard Business Review, 88(7/8), 116–122.

Finstad, K. (2010). The usability metric for user experience. Interacting with Computers, 22(5), 323–327.

Forrester Research, Inc. (n.d.). CX Index. Retrieved December 7, 2019, from https://go.forrester.com/analytics/cx-index/

Hart, S. G., & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. Advances in Psychology, 52, 139–183.

Hartono, S., Shahudin, F., Widagdo, H. H., & Hendrawan, T. (2022). The analysis of online order mobile application using User Experience Questionnaire (a case study approach). 2022 International Conference on Information Management and Technology (ICIMTech), Semarang, Indonesia, 682–686. IEEE.

Hassenzahl, M. (2001). The effect of perceived hedonic quality on product appealingness. International Journal of Human-Computer Interaction, 13(4), 481–499.

Hassenzahl, M., Burmester, M., & Koller, F. (2003). AttrakDiff: Ein fragebogen zur messung wahrgenommener hedonischer und pragmatischer qualität. In Szwillus, G., Ziegler, J. (Eds.), Mensch & computer (pp. 187–196). Vieweg+Teubner Verlag.

Hinderks, A., Meiners, A.-L., Domínguez-Mayo, F. J., & Thomaschewski, J. (2019). Applying Importance-Performance Analysis (IPA) to interpret the results of the User Experience Questionnaire (UEQ). WEBIST 2019: 15th International Conference on Web Information Systems and Technologies, 388–395.

Hinderks, A., Schrepp, M., Mayo, F. J., Escalona, M. J., & Thomaschewski, J. (2019, July). Developing a UX KPI based on the user experience questionnaire. Computer Standards & Interfaces, 65, 38–44.

International Organization for Standardization. (2019). Ergonomics of human-system interaction – Part 210: Human-centered design for interactive systems (ISO Standard No. 9241-210:2019). Retrieved November 23, 2019, from https://www.iso.org/standard/77520.html?browse=tc

Kirakowski, J., & Cierlik, B. (1998). Measuring the usability of web sites. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 42(4), 424–428.

Kirakowski, J., & Corbett, M. (1993). SUMI: The Software Usability Measurement Inventory. British Journal of Educational Technology, 24(3), 210–212.

Landauer, T. K. (1997). Behavioral research methods in Human-Computer Interaction. In M. G. Helander, T. K. Landauer, & P. V. Prabhu (Eds.), Handbook of Human-Computer Interaction (pp. 203–227). North-Holland.

Lascu, D.-N., & Clow, K. E. (2008). Web site interaction satisfaction: Scale development considerations. Journal of Internet Commerce, 7(3), 359–378.

Laugwitz, B., Held, T., & Schrepp, M. (2008). Construction and evaluation of a User Experience Questionnaire. HCI and Usability for Education and Work: 4th Symposium of the Workgroup Human-Computer Interaction and Usability Engineering of the Austrian Computer Society, Graz, Austria.

Laugwitz, B., Schrepp, M., & Held, T. (2006). Konstruktion eines fragebogens zur messung der user experience von softwareprodukten. In Mensch und computer 2006: Mensch und Computer im Strukturwandel (pp. 125–134). Oldenbourg Verlag.

Lewis, J. R. (1992). Psychometric evaluation of the Post-Study System Usability Questionnaire: The PSSUQ. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 36(16), 1259–1260.

Lewis, J. R. (1995). IBM computer usability satisfaction questionnaires: Psychometric evaluation and instructions for use. International Journal of Human-Computer Interaction, 7(1), 57–78.

Lewis, J. R. (2013). Critical review of ‘The usability metric for user experience’. Interacting with Computers, 25(4), 320–324.

Lewis, J. R., & Mayes, D. K. (2014). Development and psychometric evaluation of the Emotional Metric Outcomes (EMO) questionnaire. International Journal of Human-Computer Interaction, 30(9), 685–702.

Loiacono, E. T., Watson, R. T., & Goodhue, D. L. (2002). WebQual: A measure of website quality. Marketing Theory and Applications, 13(3), 432–438.

Lund, A. M. (2001). Measuring usability with the USE questionnaire. Usability Interface, 8(2), 3–6.

McGee, M. (2003). Usability magnitude estimation. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 47(4), 691–695.

Parasuraman, A., Zeithaml, V. A., & Berry, L. L. (1988). SERVQUAL: A multiple-item scale for measuring consumer perceptions of service quality. Journal of Retailing, 64(1), 12–40.

Reichheld, F. F. (2003). The one number you need to grow. Harvard Business Review, 81(12), 46–55.

Rogers, Y., Sharp, H., & Preece, J. (2011-a). Data gathering. In Interaction design: Beyond Human-Computer Interaction (pp. 222–268). Wiley.

Rogers, Y., Sharp, H., & Preece, J. (2011-b). What is interaction design. In Interaction design: Beyond Human-Computer Interaction (pp. 1–34). Wiley.

Rohrer, C. P., Wendt, J., Sauro, J., Boyle, F., & Cole, S. (2016). Practical Usability Rating by Experts (PURE): A pragmatic approach for scoring product usability. Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems, 786–795.

Sabukunze, I. D., & Arakaza, A. (2021). User experience analysis on mobile application design using user experience questionnaire. Indonesian Journal of Information Systems (IJIS), 15–26.

Sauro, J. (2015). SUPR-Q: A comprehensive measure of the quality of the website user experience. Journal of Usability Studies, 10(2), 68–86.

Sauro, J., & Dumas, J. S. (2009). Comparison of three one-question, post-task usability questionnaires. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 1599–1608.

Sauro, J., & Lewis, J. R. (2016). Standardized usability questionnaires. In Quantifying the user experience: Practical statistics for user research (pp. 185–248). Morgan Kaufmann.

Schrepp, M. (2019, August 2). User experience questionnaire handbook. User Experience Questionnaire. Retrieved November 1, 2019, from ueq-online.org

Schrepp, M., Hinderks, A., & Thomaschewski, J. (2017). Construction of a benchmark for the User Experience Questionnaire (UEQ). International Journal of Interactive Multimedia and Artificial Intelligence, 4(4), 40–44.

TED. (2013, June 27). Julian Treasure: How to speak so that people want to listen [Video]. YouTube. https://www.youtube.com/watch?v=eIho2S0ZahI

Tullis, T., & Albert, B. (2013). Self-reported metrics. In Measuring the user experience: Collecting, analyzing, and presenting usability metrics (pp. 121–162). Morgan Kaufmann.

Wang, J., & Senecal, S. (2007). Measuring perceived website usability. Journal of Internet Commerce, 6(4), 97–112.

Zijlstra, F., & van Doorn, L. (1985). The construction of a scale to measure subjective effort. Delft University of Technology.