- Likert Scale Definition, Examples and Analysis
- Likert Scale Examples for Surveys
- How can you analyze data from a Likert scale?
- How to reference this article:
- Likert Scales vs. Slider Scales in commercial market research
- What is the Likert Scale?
- Criticism of Likert Scales in market research
- The Slider Scale, a valid alternative?
- Reliability and validity of Slider Scales
- Use of Likert Scales With Children
Likert Scale Definition, Examples and Analysis
Various kinds of rating scales have been developed to measure attitudes directly (i.e., the person knows their attitude is being studied). The most widely used is the Likert scale (Likert, 1932).
In its final form, the Likert scale is a five- (or seven-) point scale used to allow the individual to express how much they agree or disagree with a particular statement.
For example: “I believe that ecological questions are the most important issues facing human beings today.”
A Likert scale assumes that the strength/intensity of an attitude is linear, i.e. on a continuum from strongly agree to strongly disagree, and makes the assumption that attitudes can be measured.
For example, each of the five (or seven) responses is assigned a numerical value, which is used to measure the attitude under investigation.
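As an illustration of this coding step, here is a minimal sketch; the category wording and the 1–5 values are a common convention, not something prescribed by the scale itself:

```python
# A conventional coding of a 5-point agreement scale:
# 1 = strongly disagree ... 5 = strongly agree.
AGREEMENT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Hypothetical responses from three respondents to one item.
responses = ["Agree", "Strongly agree", "Neither agree nor disagree"]
coded = [AGREEMENT_CODES[r] for r in responses]
print(coded)  # [4, 5, 3]
```

Reversing the coding direction (5 = strongly disagree) is equally valid, as long as it is applied consistently across items.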
Likert Scale Examples for Surveys
In addition to measuring statements of agreement, Likert scales can measure other variations such as frequency, quality, importance, and likelihood.
Agreement:
- Strongly Agree
- Strongly Disagree

Importance:
- Very Important
- Moderately Important
- Slightly Important

Quality:
- Very Poor

Frequency:
- Almost Always True
- Usually True
- Occasionally True
- Usually Not True
- Almost Never True

Likelihood:
- Probably Not
- Definitely Not
How can you analyze data from a Likert scale?
The response categories in Likert scales have a rank order, but the intervals between values cannot be presumed equal.
Therefore, the mean (and standard deviation) are inappropriate for ordinal data (Jamieson, 2004).
Statistics you can use are:
• Summarize using a median or a mode (not a mean, as it is ordinal scale data); the mode is probably the most suitable for easy interpretation.
• Display the distribution of observations in a bar chart (it can’t be a histogram, because the data is not continuous).
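These two summaries can be computed directly from coded responses. The sketch below assumes the usual 1–5 coding and uses made-up data:

```python
from collections import Counter
from statistics import median, mode

# Hypothetical coded responses to one 5-point Likert item
# (1 = strongly disagree ... 5 = strongly agree).
responses = [4, 5, 3, 4, 2, 4, 5, 3, 4, 1]

print("median:", median(responses))   # the middle-ranked response
print("mode:", mode(responses))       # the most frequent response category
print("counts:", Counter(responses))  # per-category counts, ready for a bar chart
```

The `Counter` output gives the category frequencies that would be plotted as bars, one bar per response category, rather than as a histogram of a continuous variable.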
Likert scales have the advantage that they do not expect a simple yes/no answer from the respondent, but rather allow for degrees of opinion, and even no opinion at all.
Therefore quantitative data is obtained, which means that the data can be analyzed with relative ease.
However, as with all surveys, the validity of Likert scale attitude measurement can be compromised by social desirability.
This means that individuals may lie to put themselves in a positive light. For example, if a Likert scale was measuring discrimination, who would admit to being racist?
Offering anonymity on self-administered questionnaires should reduce social pressure, and thus may likewise reduce social desirability bias.
Paulhus (1984) found that more desirable personality characteristics were reported when people were asked to write their names, addresses, and telephone numbers on their questionnaires than when they were told not to put identifying information on the questionnaire.
How to reference this article:
McLeod, S. A. (2019, August 03). Likert scale. Simply Psychology. www.simplypsychology.org/likert-scale.html
Bowling, A. (1997). Research Methods in Health. Buckingham: Open University Press.
Burns, N., & Grove, S. K. (1997). The Practice of Nursing Research Conduct, Critique, & Utilization. Philadelphia: W.B. Saunders and Co.
Jamieson, S. (2004). Likert scales: How to (ab)use them. Medical Education, 38(12), 1217-1218.
Likert, R. (1932). A technique for the measurement of attitudes. Archives of Psychology, 140, 1–55.
Paulhus, D. L. (1984). Two-component models of socially desirable responding. Journal of Personality and Social Psychology, 46(3), 598.
Likert Scales vs. Slider Scales in commercial market research
The Likert Scale – in its various formats – is widely used, for instance in psychology and the social sciences, but also in commercial market research. Respondents may be asked about their attitudes, perceptions or evaluations of organisations, services or brands.
The use of Likert Scales, however, has come under scrutiny. It is argued that the traditional 5-point rating scales are boring, repetitive and overly long. The proposed alternative is the Slider Scale.
The question then is this: are Slider Scales really better than Likert Scales?
After I finished my PhD, I decided it was time to quit Academia and head for pastures new: The Exciting Universe Of Market Research. As a way of integrating myself in this new universe, I started reading various blogs and (quasi) scientific articles on market research and online surveying.
As CheckMarket is specialised in online surveys this seemed a fitting starting point.
However, to my initial surprise, it took me only a couple of blogs to reach the conclusion that Likert Scales – the same scales that are practically declared holy in my previous universe, the academic one – have come under some scrutiny.
It is even seriously questioned whether they should be retained or replaced by so-called ‘Slider Scales’. In this post, however, I will come to the defence of Likert Scales – if they actually need me defending them. I will argue that the case of Likert Scale vs.
Slider Scale rests on the design and the customer experience of online surveys, rather than on sound methodological arguments. Without a shadow of a doubt, slider scales have a more inviting design, but as it turns out, there are not too many methodological reasons to favour them over ‘boring’ Likert Scales.
What is the Likert Scale?
First things first. What is a Likert Scale? A Likert Scale, named after its developer, the psychologist Rensis Likert, is a scale that is designed to measure underlying attitudes or opinions. This scale usually consists of several items, so-called Likert Items, on which respondents can express their opinions.
Say, for instance, that we are trying to measure the underlying ‘general happiness’ of market researchers; then we would probably ask them several questions regarding their happiness in their private and professional life. These questions, such as ‘How happy are you with your current employment?’ or ‘How happy are you with your current family life?’, are the so-called Likert Items.
From these Likert Items, we would then construct the Likert Scale on the ‘general happiness of market researchers’.
Usually, respondents express their opinions by choosing one of the response alternatives from the response scale. This response scale can take various formats: on the one hand there is a distinction between worded and numerical formats, and on the other hand there is the distinction between the most common 5-, 7-, 10- or 11-point formats (Dawes, 2008).
Consequently, it seems quite logical that the 5-point format is usually worded (e.g. strongly disagree, disagree, neutral, agree, strongly agree), while a 10-point format is most often numerical.
After all, the gradation of (dis)agreement on a 10-point rating scale probably becomes too granular to easily express in words.
The Likert Scale – in its various formats – is widely used, for instance in psychology and the social sciences, but also in commercial market research (Dawes, 2008). In commercial market research, respondents may be asked about their attitudes, perceptions or evaluations of, amongst others, organisations, services or brands.
However, these Likert Scales are often used for measuring single-item issues, rather than for measuring an underlying attitude. Recalling our example of the ‘general happiness of market researchers’, this means that we use the Likert Scale for measuring how happy they are with their salary, rather than measuring how happy they are in general.
Even though this distinction is hardly ever made, we should be aware of it, and of the fact that when we are talking about ‘Likert Scales vs. Slider Scales’, we are actually talking about the 5-, 7- or 11-point rating scale of Likert Items rather than Likert Scales.
Nevertheless, for the sake of clarity I will consistently use Likert Scale, even if I am technically talking about the X-point rating scale.
Criticism of Likert Scales in market research
Going back to the original purpose for writing this post: it appears that the use of Likert Scales has come under some scrutiny in commercial market research. While this is not necessarily a bad thing, the various criticisms levelled at the use of traditional Likert Scales ought to be valid.
Let us look at the two most common criticisms that I have come across. First of all, it is argued that the traditional 5-point rating scales are quite boring, repetitive and certainly overly long.
Furthermore, a large battery of questions using 5-point rating scales might discourage respondents. Second, it is argued that respondents are forced into expressing an opinion that is not their real opinion because too few response alternatives are offered.
In short, it is argued that the 5-point Likert Scale is too blunt (a) to detect differences between items and (b) to precisely measure specific opinions, as the respondent’s true opinion can lie in between the answer categories.
While the first criticism is probably true, especially in commercial market research in which respondents have to be convinced to participate in surveys, there are various reasons to question the validity of the second.
The Slider Scale, a valid alternative?
The proposed alternative to the traditional Likert Scale is the (admittedly more attractive) Slider Scale.
These Slider Scales are basically Likert Scales with many more response categories; instead of selecting one of the response alternatives, respondents use a slider to position themselves on a certain question.
It is argued that this Flash-based alternative not only enables more interactivity in online surveys, but also enables respondents to indicate their opinions more precisely. Respondents have more response options; ergo, the results will be more finely grained.
This argument is then strengthened by some experimental findings showing that respondents seem to answer questions slightly differently when they are first confronted with a traditional Likert Scale and subsequently with a Slider Scale.
Nevertheless, I tend to argue that this case of ‘Likert Scales vs. Slider Scales’ is mainly about the design and interactivity of online surveys, i.e. about making them a more pleasing experience for respondents, rather than about sound methodological reasons. After all, there do not seem to be that many sound methodological arguments to favour Slider Scales. Let us look at some evidence.
First of all, quite a lot of scientific research has been done on the effect of the number of response alternatives on the psychometric properties of a scale, i.e. its reliability and validity.
The former refers to the consistency of the scale's measurements across various situations, and the latter refers to whether the scale actually measures what it set out to measure.
The recurring conclusion is that when the number of response alternatives is increased, the reliability and validity of the underlying factor increase as well. For instance, Lozano et al. (2008) have shown that both the reliability and validity of a Likert Scale decrease when the number of response options is reduced. Vice versa, if you increase the number of response options, the reliability and the validity increase. They conclude that the bare minimum for a Likert Scale should be four response categories.
Reliability and validity of Slider Scales
So, at first sight, this seems to favour the use of Slider Scales with more response alternatives. Nevertheless, the positive relationship between the number of response categories and the reliability and validity of the scale is not a linear one. Lozano et al. (2008) equally show that the increase in reliability and validity is very rapid at first, but tends to level off at about 7 response alternatives.
Beyond 11 response alternatives, there is hardly any gain in reliability and validity from increasing the number of response categories.
In short, from a psychometric point of view, the gains are scarce when including scales more finely graded than an 11-point scale, i.e. more than 11 response categories. They hardly improve either the scale's reliability or its validity (Dawes, 2008).
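Reliability in this sense is commonly estimated with Cronbach's alpha. The following is an illustrative sketch with made-up data, not the procedure used by Lozano et al. (2008):

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    `items` is a list of item-score lists, one list per Likert Item,
    with respondents in the same order in each list.
    """
    k = len(items)                      # number of items in the scale
    n = len(items[0])                   # number of respondents

    def var(xs):                        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    sum_item_var = sum(var(item) for item in items)
    totals = [sum(item[r] for item in items) for r in range(n)]
    return (k / (k - 1)) * (1 - sum_item_var / var(totals))

# Three hypothetical 5-point items answered by five respondents;
# scores rise together across respondents, so alpha is high.
scale = [[1, 2, 3, 4, 5], [2, 2, 3, 4, 4], [1, 3, 3, 5, 5]]
print(round(cronbach_alpha(scale), 2))  # 0.95
```

With real data the interesting comparison would be to rescore the same responses on coarser and finer category counts and watch how alpha changes, which is essentially the manipulation the cited studies report.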
Second, plenty of scientific research has shown that respondents use the meaning of the labels attached to some response categories when mapping judgments to response scales (Rohrmann, 2003; Wegner, Faulbaum, & Maag, 1982; Wildt & Mazis, 1978).
So, a larger number of response categories with few labelled points makes it harder for respondents to orientate themselves.
Intuitively, it makes sense that labelled response categories are less ambiguous to respondents than when only the end labels are provided as “respondents need to figure out the meaning of the intermediate response categories to determine the option that comes closest to expressing their opinion” (Weijters, Cabooter & Schillewaert, 2010).
If we project these findings on the Slider Scale, i.e. most often a scale with many response categories and few labelled points, then it is rather easy to see that the often lauded strong point of this scale type might not be so strong after all.
After all, it is commonly argued that Slider Scales offer respondents more response categories and as a result generate more precise data.
Nevertheless, selecting the response option that corresponds with their real opinion will be more challenging when respondents need to work out the meaning of each response category (De Leeuw, 1992; Krosnick, 1991), and this will only become more challenging as the number of response categories increases.
In short, while a Slider Scale is most definitely more attractive, interactive and consumer-friendly than the archetypical Likert Scale, the case of ‘Likert Scale vs. Slider Scale’ is almost certainly a discussion about design rather than about methodology. Personally, I would judge the case to be inadmissible.

Notes:
- For an overview of frequently used formats of Likert Scales in market research, I refer the reader to figure 1 in Weijters, B., Cabooter, E. & Schillewaert, N. (2010), The effect of rating scale format on response styles: The number of response categories and response category labels, International Journal of Research in Marketing, 27(3), 236-247.
- See for instance Lozano, L., Garcia-Cueto, E. & Muniz, J. (2008), Effect of the number of response categories on the reliability and validity of rating scales, Methodology, 4(2), 73-79.
- Note that here I am talking about the Likert Scale in its original purpose, i.e. measuring an underlying attitude via a scale.
Use of Likert Scales With Children
Objective: We investigated elementary school children’s ability to use a variety of Likert response formats to respond to concrete and abstract items.
Methods: 111 children, aged 6–13 years, responded to 2 physical tasks that required them to make objectively verifiable judgments, using a 5-point response format. Then, using 25 items, we ascertained the consistency between responses using a “gold standard” yes/no format and responses using 5-point Likert formats including numeric values, as well as word-based frequencies, similarities to self, and agreeability.
Results: All groups responded similarly to the physical tasks. For the 25 items, the use of numbers to signify agreement yielded low concordance with the yes/no answer format across age-groups. Formats using words provided higher, but not perfect, concordance for all groups.
Conclusions: Researchers and clinicians need to be aware of the limited understanding that children have of Likert response formats.
Keywords: abstract tasks; children; Likert scale
The use of Likert scales, which call for a graded response to a series of statements, is a common means of assessing people’s attitudes, values, internal states, and judgments about their own or others’ behaviors in both research and clinical practice.
Users include professionals such as pediatric psychologists and other health professionals who administer psychometric tests that use Likert scale formats in their research and their practice with children with medical conditions.
Since first described by Likert (1932), the range of variables assessed by these scales, as well as their scalar ranges, has proliferated. Further, the populations with which they have been used have expanded to include children as well as adults.
However, the degree to which such scalar formats yield valid data when used with children has not been well established. The aim in this article was to investigate this issue.
In his seminal article describing this response format, Likert (1932) reported highly satisfactory reliability data, which he claimed compared favorably with that obtained by other means.
Research using adult participants since then has typically confirmed that Likert format scales are generally reliable and valid instruments for the measurement of a range of attitudes and mood states.
They have also been found to yield data that approximate the probability density function thought to fit the data in question even if skewed (e.g., subjective well-being, Cummins, 1998).
This distribution is particularly important when measuring, for example, attitudes to stark issues, which require respondents to either agree or disagree. Overall, rt-type scales provide a useful and relatively simple method of obtaining data in the social sciences.
Recently, Likert scales have been used in a range of research projects and clinical settings in which children are the focus of study or treatment. Some examples of the different scales using Likert formats in research with children, including ages of the samples and response formats used, are presented in Table I.
Response scales typically vary from 3 to 5 response points. For example, the Children’s Impact of Traumatic Events Scale-Revised (Wolfe, 1996) is a 3-point response scale (very true; somewhat true; not true), as is the Strengths and Difficulties Questionnaire (Goodman, 1997: not true; somewhat true; certainly true).
The Social Anxiety Scale for Children-Revised is a 5-point scale, with items rated in terms of how much the item is “true” for the respondent (1 = not at all, 5 = all the time). Variations include dichotomous choices, for example, “Yes” or “No” responses to items about feelings or behavior (e.g., the Children’s Manifest Anxiety Scale and the Piers-Harris Children’s Self-Concept Scale), or the selection of one of three statements that best describe the respondent’s feelings over the past 2 weeks (Children’s Depression Inventory).
Responses to the Self-Perception Profile for Children are more complex in that it requires respondents to read two statements, choose the description that best fits them, and then choose whether the description is really true of them or sort of true of them.
Table I. Examples of Likert Scales Used in Research and Clinical Practice With Children
| Source | Scale | Construct | Ages (years) | Response format |
| --- | --- | --- | --- | --- |
| Goodman, 1997 | Self-report version of the Strengths and Difficulties Questionnaire | General psychological and behavioral problems | 4–16 | 3-point Likert |
| Harter, 1985 | Self-Perception Profile for Children | Self-esteem | 8–15 | 4-point Likert |
| Kovacs, 1992 | Children's Depression Inventory | Depression | 7–16 | 3-point Likert |
| La Greca & Stone, 1993 | Social Anxiety Scale for Children-Revised | Social anxiety | 9–13 | 5-point Likert |
| McCabe & Ricciardelli, 2002 | Children’s version of the Eating Attitude Test | Body image concerns, engagement in body change strategies | 8–11 | 5-point Likert |
| Mellor & Moore, 2003 | Questionnaire on Teacher Interaction | Perceived teacher style | 11–14 | 5-point Likert |
| Moore & Mellor, 2003 | Social Interaction Questionnaire | Social anxiety and peer relations | 11–14 | 4-point Likert |
| Piers & Harris, 1969 | Piers-Harris Children’s Self-Concept Scale | Self-esteem | 7–18 | Yes/no |
| Reynolds & Richmond, 1985 | Reynolds Children's Manifest Anxiety Scale | Anxiety | 6–19 | Yes/no |
| Wolfe, 1996 | Children’s Impact of Traumatic Events Scale-Revised | Posttraumatic stress | 8–16 | 3-point Likert |
| Reynolds & Kamphaus, 2004 | Behavior Assessment System for Children | Behavioral problems | 8–11 and 12–21 | Combination of true/false and 4-point frequency |
| Valla, Bergeron, Berube, Gaudet, & St-Georges, 1994 | Dominic-R and Terry questionnaires | DSM mental disorders | 6–11 | Yes/no |
In consideration of the capacity of children to respond to such scales, some authors have been careful in choosing item wording (e.g., the Piers-Harris Children’s Self-Concept Scale, where items are written at a second-grade reading level), or have reduced the number of response choices, for example, Wright and Asmundson (2003), who changed the original 5-point Likert scale response format for the Illness Attitudes Scale to a 3-point format to make it more easily understood by children.
Other authors have followed Tischer and Lang (1983) and substituted faces on which various degrees of happiness or sadness are depicted for written choice points (e.g., Mellor, McCabe, Ricciardelli, & Ball, 2004).
Despite these variations, little other consideration seems to have been given to the more fundamental issue of whether children actually have the capacity to respond to Likert scale formats in a way that accurately reflects their judgments, attitudes, or values.
Cognitive development literature would suggest that this matter is of critical importance. For example, Gelman and Baillargeon (1983) argued that younger children primarily think dichotomously. Thus, asking them to respond on a 5-point scale may be beyond their capacity.
With regard to content, Marsh (1986) examined a sample of children aged between 7 and 12 years and found that some children, specifically younger children and those with poor verbal skills, were less able to respond to negatively worded items.
Other researchers have tested children aged 5–12 years (Chambers & Craig, 1998; Chambers & Johnston, 2002) and 5–11 years (von Baeyer, Carlson, & Webb, 1997) and suggested that younger children have a tendency to endorse responses at the extreme end of scales when presented with items on a Likert scale, thus providing unrefined measures of the constructs under investigation. However, Chambers and Johnston (2002) did suggest that this may vary according to what is being assessed.
The importance of these findings is that, as described above, many scales administered to children are used to assess intangible theoretical constructs (including emotions) or subjective judgments about the self. These are different from judgments about matters having an objective accuracy (e.g., a number of objects, or people).
In Chambers and Johnston’s (2002) study, younger children were found to respond as accurately as older children to tasks involving judgments about physical objects, but used extremes in responding to questions about feelings.
This pattern was found with both 3-point and 5-point response scales, suggesting that simplifying the response format did not increase children’s capacity to use scales.
For a scale to produce reliable and valid data, it must accurately and consistently reflect the measured judgment, attitude, or value. Of critical importance to the use of Likert scales with children is whether an accurate or appropriate internal response will be elicited by the declarative statements.
The use of the Likert format assumes that the accurate and representative response has already been internally generated by the child, which may not necessarily be the case. Zeman, Cassano, Perry-Parrish, and Stegall (2006) noted that children’s emotional development shares a transactional relationship with their social, neurophysiological, cognitive, and language development.
Thus, any scale that uses a Likert format to assess feeling states may be confronted with issues of whether the states are differentiated internally by the child, as well as with the child's cognitive capacity.
The work of cognitive developmentalists such as Piaget (1954) would suggest that certain types of judgments should be harder for children in the stage of concrete operations (7–11 years of age), during which the child develops the capacity to make judgments and reason about the physical world, than in the subsequent stage of formal operations (11–16 years), in which the capacity to think in abstract terms (usually) evolves. Thus, it would seem that judgments about tangible/physical materials or their representations may be more amenable to assessment with Likert scales in younger children than judgments about intangible/abstract concepts such as internal feelings.

Furthermore, theorists focusing on working memory capacity (e.g., Barrouillet & Lepine, 2005) and on basic arithmetic proficiency (e.g., Haverty, Koedinger, Klahr, & Alibali, 2000) typically support such an age progression in abilities. However, others have found that a U-shape exists across ages 7–11 years on mathematical equivalence. For instance, McNeil (2007) found a decrease in performance between the ages of 7 and 9 years, which was reversed by age 11. Of course, children’s metacognitive development can also be enhanced, perhaps even earlier than the formal operations stage, as shown by White and Frederiksen (2005) in their manipulation of metacognitive abilities among fifth-grade children.
This study explored this issue by investigating children’s responses to Likert scale items requiring judgments about both physical and abstract concepts.
If children are unable to respond accurately to the objectively verifiable and manipulated physical events, then it could be argued that the Likert format cannot accurately assess their judgment about subjective and more abstract matters.
On the other hand, if they can respond with accuracy to questions about physical matters, it might be argued that they could have the capacity to use Likert scales in other realms.
In line with the findings of Chambers and Johnston (2002), we expected that older children would be able to use Likert formats in both domains, but that younger children’s ability would be limited to the concrete physical domain.
However, since Zeman, Klimes-Dougan, Cassano, and Adrian (2007) suggested that future research should focus on alternative response formats for assessing children with Likert scales, we also examined a number of alternative anchor points to establish which provides the optimal scale format for all children when abstract constructs are under investigation, in terms of their consistency with a “gold standard” yes/no response. We used yes/no as the gold standard because we believed that it provided the least ambiguity for the participants, as they were not required to respond in terms of degrees of agreement. While Fritzley et al. reported a “yes” bias to this format in a sample of 2–5-year-olds, this bias was found mainly among 2- and 3-year-olds. Other recent research by Rocha, Marche, and Briere (2013) supported the use of the yes/no format in older children. They argued that, according to fuzzy trace theory, some forms of multiple-choice questions should elicit higher error rates than yes/no questions.
The alternative anchor formats were numeric values (1–5), as well as word-based frequencies (e.g., never to regularly), similarities to self (e.g., not me at all, to very much me), and agreeability (strongly agree to strongly disagree). The rationale for selecting these different Likert anchor formats is that they are commonly used in various measures.
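One simple way to quantify consistency of this kind is to collapse each 5-point response to yes/no and compute the proportion of agreement with the yes/no gold standard. This is an illustrative sketch with made-up data, not the study's actual scoring procedure:

```python
def collapse(code, neutral=3):
    """Map a 1-5 Likert code to 'yes'/'no'; the neutral midpoint maps to None."""
    if code == neutral:
        return None
    return "yes" if code > neutral else "no"

def concordance(likert_codes, gold_yes_no):
    """Proportion of non-neutral Likert responses agreeing with yes/no answers."""
    pairs = [(collapse(c), g) for c, g in zip(likert_codes, gold_yes_no)]
    usable = [(c, g) for c, g in pairs if c is not None]
    return sum(c == g for c, g in usable) / len(usable)

# Hypothetical data: one child's 5-point codes for five items, and the same
# child's yes/no answers to the same items (the neutral 3 is excluded).
print(concordance([5, 4, 2, 3, 1], ["yes", "no", "no", "yes", "no"]))  # 0.75
```

Low concordance under one anchor format but not another would suggest, as the abstract reports for numeric anchors, that children are not mapping that format onto the same internal judgment.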
One hundred and eleven Anglo-Australian children aged between 6 and 13 years (M = 9.64 years, SD = 1.82) participated in the study. There were 59 girls and 52 boys in the sample.
All children were students at elementary schools in a regional city in the state of Victoria, Australia, and were tested on two or three occasions, 2 weeks apart.
The sample was divided into three age-groups: 6 and 7 years (