construct measurement in research

The process of creating the indicators is called scaling. For instance, in the SES index, isn’t income correlated with education and occupation, and if so, should we include one component only or all three components? Gilbert A. Churchill (1979), writing on measures of marketing constructs, defines measurements as “rules for assigning numbers to objects to represent quantities of attributes”. Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items. Research objectives typically call for the measurement of constructs. For instance, one can create a political typology of newspapers based on their orientation toward domestic and foreign policy, as expressed in their editorial columns, as shown in Figure 6.2. Operationalisation refers to the process of developing indicators or items for measuring these constructs. Note that many variables in social science research are qualitative, even when represented in a quantitative manner. Constructs exist at a higher level of abstraction than concepts. Because items appear equally throughout the entire 11-point range of the scale, this technique is called an equal-appearing scale. Note that some of these scales may include multiple items, but all of these items attempt to measure the same underlying dimension. Nominal scales merely offer names or labels for different attribute values. For instance, academic aptitude can be measured using two separate tests of students’ mathematical and verbal ability, and then combining these scores to create an overall measure for academic aptitude. Answering all of these questions is the key to measuring the prejudice construct correctly. Multidimensional constructs consist of two or more underlying dimensions. 
As with Thurstone’s method, the Likert method also starts with a clear definition of the construct of interest, and uses a set of experts to generate about 80 to 100 potential scale items. Scientific research requires operational definitions that define constructs in terms of how they will be empirically measured. Quantitative data can be analyzed using quantitative data analysis techniques, such as regression or structural equation modeling, while qualitative data require qualitative data analysis techniques, such as coding. If an employment status item is modified to allow for more than two possible values (e.g., unemployed, full-time, part-time, and retired), it is no longer binary, but still remains a nominal scaled item. Next, a matrix or table is created showing the judges’ responses to all candidate items. This is particularly the case with many social science constructs such as self-esteem, which are assumed to have a single dimension going from low to high. An instrument was developed based on a critical review of both the conceptualization and practice of this construct. Sophisticated transformations such as positive similarity transformations (e.g., multiplicative or logarithmic) are also allowed. Values of attributes may be quantitative (numeric) or qualitative (non-numeric). Measurement concerns the attributes of objects, not the objects themselves. This matrix is sorted in decreasing order from judges with more ‘yes’ at the top to those with fewer ‘yes’ at the bottom. A unidimensional scale measures constructs along a single scale, ranging from high to low. Because items appear equally throughout the entire 11-point range of the scale, this technique is called an equal-appearing scale. 
Perceived severity refers to an individual’s belief about the seriousness of contracting an illness or disease, or the severity of the consequences of leaving it untreated. An index differs from a scale in that a scale also aggregates measures, but those measures capture different dimensions, or the same dimension, of a single construct. Lastly, validate the index score using existing or new data. Understand that “scales”, as discussed in this section, are a little different from “rating scales” discussed in the previous section. A concept is an idea that is generalizable or agreed-upon by many people. For instance, students’ rankings in class say nothing about their actual GPAs or test scores, or how well they performed relative to one another. A typical example of a six-item Likert scale for the ‘employment self-esteem’ construct is shown in Table 6.3. For instance, you’ll need to decide how you will categorise occupations, particularly since some occupations may have changed with time (e.g., there were no web developers before the Internet). Scales and indexes generate ordinal measures of unidimensional constructs. These scales are called “ratio” scales because the ratios of two points on these measures are meaningful and interpretable. Construct measurement is of pivotal importance if we are to advance management research, and scholars must (at a minimum) demonstrate that (1) the measures employed plausibly capture the theoretical constructs and (2) the theoretical and empirical levels of analysis for the proposed construct match (Lawrence, 1997). For each item, compute the median and inter-quartile range (the difference between the 75th and the 25th percentile, a measure of dispersion), which are plotted on a histogram, as shown in Figure 6.1. The next chapter will examine how to evaluate the reliability and validity of the scales developed using the above approaches. First, you have to understand the fundamental ideas involved in measuring. 
Y indicates exceptions that prevent this matrix from being perfectly cumulative. Based on the four generic types of scales discussed above, we can create specific rating scales for social science research. However, in semantic differential scales, the statement remains constant, while the anchors (adjective pairs) change across items. Conceptualization is the mental process by which fuzzy and imprecise constructs (concepts) and their constituent components are defined in concrete and precise terms. All statistical methods are allowed. Examples include simple constructs such as a person’s weight, wind speed, and probably even complex constructs like self-esteem (if we conceptualise self-esteem as consisting of a single dimension, which of course, may be an unrealistic assumption). Should you use an odd or even number of attributes (i.e., do you wish to have a neutral or mid-point value)? In this vein, this paper (a) critically reviews the state of construct measurement in organizational strategy research … The median value of each scale item represents the weight to be used for aggregating the items into a composite scale score representing the construct of interest. An example is the temperature scale (in Fahrenheit or Celsius), where the difference between 30 and 40 degrees Fahrenheit is the same as that between 80 and 90 degrees Fahrenheit. As in the Likert scale, the overall scale score may be a summation of individual item scores. Each of these methods is discussed next. There are customary methods for defining and measuring constructs. For example, if religiosity is defined as a construct that measures how religious a person is, then attending religious services may be a reflective indicator of religiosity. 
We now have a scale which looks like a ruler, with one item or statement at each of the 11 points on the ruler (and weighted as such). Unidimensional scaling methods were developed during the first half of the twentieth century and were named after their creators. For example, male and female (or M and F, or 1 and 2) are two levels of the indicator ‘gender’. Unidimensional constructs are those that are expected to have a single underlying dimension. All measures of central tendencies, including geometric and harmonic means, are allowed for ratio scales, as are ratio measures, such as studentised range or coefficient of variation. The process of creating an index is similar to that of a scale. More formally, scaling is a branch of measurement that involves the construction of measures by associating qualitative judgments about unobservable constructs with quantitative, measurable metric units. Designed by Guttman (1950), the cumulative scaling method is based on Emory Bogardus’ social distance technique, which assumes that people’s willingness to participate in social relations with other people varies in degrees of intensity, and measures that intensity using a list of items arranged from ‘least intense’ to ‘most intense’. Based on this definition, potential scale items are generated to measure this construct. A scalogram analysis is used to examine how closely a set of items corresponds to the idea of cumulativeness. For instance, the word ‘prejudice’ conjures a certain image in our mind; however, we may struggle if we were asked to define exactly what the term meant. 
In this module, it will be assumed that all measures have an acceptable level of reliability and validity. In this work, we document the state of the art of measurement in strategic management research, and discuss the implications for interpreting the results of research in this field. Binary scales are nominal scales consisting of binary items that assume one of two possible values, such as yes or no, true or false, and so on. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. However, there may be a few exceptions, as shown in Table 6.6, and hence the scale is not entirely cumulative. For instance, if religiosity is defined as comprising a belief dimension, a devotional dimension, and a ritual dimension, then indicators chosen to measure each of these different dimensions will be considered formative indicators. Constructs are considered latent variables because they cannot be directly observed or measured. Based on this definition, potential scale items are generated to measure this construct. The Likert method assumes equal weights for all items, and hence, a respondent’s responses to each item can be summed to create a composite score for that respondent. Designed by Rensis Likert, this is a very popular rating scale for measuring ordinal data in social science research. A classic example in the natural sciences is the Mohs scale of mineral hardness, which characterizes the hardness of various minerals by their ability to scratch other minerals. In closing, scale (or index) construction in social science research is a complex process involving several key decisions. 
However, note that the numbers are only labels associated with respondents’ personal evaluation of their own satisfaction, and the underlying variable (satisfaction) is still qualitative even though we represented it in a quantitative manner. Each of the underlying dimensions in this case must be measured separately (for example, using different tests for mathematical and verbal ability), and the two scores can be combined, possibly in a weighted manner, to create an overall value for the academic aptitude construct. While some constructs in social science research, such as a person’s age, weight, or a firm’s size, may be easy to measure, other constructs, such as creativity, prejudice, or alienation, may be considerably harder to measure. Designed by Guttman (1950), the cumulative scaling method is based on Emory Bogardus’ social distance technique, which assumes that people’s willingness to participate in social relations with other people varies in degrees of intensity, and measures that intensity using a list of items arranged from “least intense” to “most intense”. Judges may include academics trained in the process of instrument construction or a random sample of respondents of interest (i.e., people who are familiar with the phenomenon). The process of creating the indicators is called scaling. The outcome of a scaling process is a scale, which is an empirical structure for measuring items or indicators of a given construct. Like previous scaling methods, the Guttman method also starts with a clear definition of the construct of interest, and then uses experts to develop a large set of candidate items. For instance, if an unobservable theoretical construct such as socioeconomic status is defined as the level of family income, it can be operationalized using an indicator that asks respondents the question: what is your annual family income? Rather, scaling is the formal process of developing scale items, before rating scales can be attached to those items. 
Binary scales are nominal scales consisting of binary items that assume one of two possible values, such as yes or no, true or false, and so on. In the classical model of test validity, construct validity is one of three main types of validity evidence, alongside content validity and criterion validity. Even if we assign unique numbers to each value, for instance 1 for male and 2 for female, the numbers don’t really mean anything (i.e., 1 is not less than or half of 2) and could easily have been represented non-numerically, such as M for male and F for female. In most research methods texts, construct validity is presented in the section on measurement. The initial pool of candidate items (ideally 80 to 100 items) should be worded in a similar manner, for instance, by framing them as statements to which respondents may agree or disagree (and not as questions or other things). For instance, is “compassion” the same thing as “empathy” or “sentimentality”? Given the high level of subjectivity and imprecision inherent in social science constructs, we tend to measure most of those constructs (except a few demographic constructs such as age, gender, education, and income) using multiple indicators. Answering all of these questions is the key to measuring the prejudice construct correctly. Unlike scales or indexes, typologies are multi-dimensional but include only nominal variables. The first decision to be made in operationalizing a construct is to decide on the intended level of measurement. For example, a firm of size zero means that it has no employees or revenues. 
The CPI is a measure of how much consumers have to pay for goods and services in general, and is divided into eight major categories (food and beverages, housing, apparel, transportation, healthcare, recreation, education and communication, and “other goods and services”), which are further subdivided into more than 200 smaller items. If you have a proposition stating that “compassion is positively related to empathy”, you cannot test that proposition unless you can conceptually separate empathy from compassion and then empirically measure these two very similar constructs correctly. In the end, the researcher’s judgment may be used to obtain a relatively small (say 10 to 15) set of items that have high item-to-total correlations and high discrimination (i.e., high -values). As an example, the construct “attitude toward immigrants” can be measured using five items shown in Table 6.5. These constructs can be measured using a single measure or test. Likert items allow for more granularity (more finely tuned response) than binary items, including whether respondents are neutral to the statement. In most research methods texts, construct validity is presented in the section on measurement. However, the scale does not indicate the actual hardness of these minerals or even provide a relative assessment of their hardness. The previous section discussed how we can measure respondents’ responses to predesigned items or indicators belonging to an underlying construct. A typical example of a six-item Likert scale for the “employment self-esteem” construct is shown in Table 6.3. Permissible statistics are chi-square and frequency distribution, and only a one-to-one (equality) transformation is allowed (e.g., 1 = Male, 2 = Female). 
Judges may include academics trained in the process of instrument construction or a random sample of respondents of interest (i.e., people who are familiar with the phenomenon). However, note that the numbers are only labels associated with respondents’ personal evaluation of their own satisfaction, and the underlying variable (satisfaction) is still qualitative even though we represented it in a quantitative manner. A key characteristic of a Likert scale is that even though the statements vary in different items or indicators, the anchors (“strongly disagree” to “strongly agree”) remain the same. This index is a combination of three constructs: income, education, and occupation. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. Unlike scales or indexes, typologies are multidimensional but include only nominal variables. In the context of survey research, a construct is the abstract idea, underlying theme, or subject matter that one wishes to measure using survey questions. The three approaches are similar in many respects, with the key differences being the rating of the scale items by judges and the statistical methods used to select the final items. Note that any item with reversed meaning from the original direction of the construct must be reverse coded (i.e., 1 becomes a 5, 2 becomes a 4, and so forth) before summating. Another example of an index is socio-economic status (SES), also called the Duncan socio-economic index (SEI). This method starts with a clear conceptual definition of the construct of interest. I see construct validity as the overarching quality with all of the other measurement … As in the Likert scale, the overall scale score may be a summation of individual item scores. 
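The reverse-coding and summation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed procedure: the function names, the item names (q1 to q6), the responses, and the choice of which items are negatively worded are all hypothetical, and a five-point response format is assumed.

```python
def reverse_code(score, scale_min=1, scale_max=5):
    """Flip a response so that 1 becomes 5, 2 becomes 4, and so forth."""
    return scale_min + scale_max - score


def summated_score(responses, reversed_items=(), scale_min=1, scale_max=5):
    """Sum one respondent's item responses after reverse coding.

    responses: dict mapping item name -> response on the rating scale.
    reversed_items: names of items worded against the construct's direction.
    """
    total = 0
    for item, score in responses.items():
        if not scale_min <= score <= scale_max:
            raise ValueError(f"{item}: {score} is outside {scale_min}-{scale_max}")
        if item in reversed_items:
            score = reverse_code(score, scale_min, scale_max)
        total += score
    return total


# Hypothetical respondent answering six items; q3 and q5 are assumed to be
# negatively worded and therefore reverse coded before summation.
respondent = {"q1": 4, "q2": 5, "q3": 2, "q4": 4, "q5": 1, "q6": 3}
total = summated_score(respondent, reversed_items={"q3", "q5"})
print(total)
```

Equal weighting is built into the plain sum, which matches the Likert assumption that every item contributes equally to the composite score.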
For instance, there may be certain tribes in the world who lack prejudice and who cannot even imagine what this concept entails. In almost all cases elementary definitional operations are performed. Hence, this method is called a summated scale. Based on the four generic types of scales discussed above, we can create specific rating scales for social science research. Using a complicated weighting scheme that takes into account the location and probability of purchase of each item, these prices are combined by analysts, which are then combined into an overall index score using a series of formulas and rules. Suppose a researcher is interested in measuring subjects' degrees of extraversion with a survey. A well-known example of an index is the consumer price index (CPI), which is computed every month by the Bureau of Labor Statistics of the U.S. Department of Labor. Most measurement in the natural sciences and engineering, such as mass, incline of a plane, and electric charge, employs ratio scales, as do some social science variables such as age, tenure in an organization, and firm size (measured as employee count or gross revenues). For instance, if we conceptualize a person’s academic aptitude as consisting of two dimensions – mathematical and verbal ability – then academic aptitude is a multidimensional construct. The statistical properties of these scales are shown in Table 6.1. A key characteristic of a Likert scale is that even though the statements vary in different items or indicators, the anchors (‘strongly disagree’ to ‘strongly agree’) remain the same. Ordinal scales can also use attribute labels (anchors) such as “bad”, “medium”, and “good”, or “strongly dissatisfied”, “somewhat dissatisfied”, “neutral”, “somewhat satisfied”, and “strongly satisfied”. How do you wish to label the scale attributes (especially for semantic differential scales)? 
Consider, for example, a recent attempt to operationalize the involvement construct (Zaichkowsky 1985). The IQ scale is also an interval scale, because the scale is designed such that the difference between IQ scores 100 and 110 is supposed to be the same as between 110 and 120, although we do not really know whether that is truly the case. A multi-dimensional typology of newspapers. For instance, we often use the word “prejudice” and the word conjures a certain image in our mind; however, we may struggle if we were asked to define exactly what the term meant. High quality quantitative dissertations are able to clearly bring together theory, constructs and variables. Broadly speaking, constructs are the building blocks of theories, helping to explain how and why certain phenomena behave the way that they do. Thurstone also created two additional methods of building unidimensional scales – the method of successive intervals and the method of paired comparisons – which are both very similar to the method of equal-appearing intervals, except for how judges are asked to rate the data. Far too often, management scholars resort to crude and often inappropriate measures of fundamental constructs in their research, an approach which calls into question the interpretation and validity of their findings. To understand how these items were derived, refer to the ‘Scaling’ section later on in this chapter. Income is measured in dollars, education in years or degrees achieved, and occupation is classified into categories or levels by status. These three approaches are similar in many respects, with the key differences being the rating of the scale items by judges and the statistical methods used to select the final items. 
A rating scale is used to capture the respondents’ reactions to a given item; for instance, a nominal scaled item captures a yes/no reaction and an interval scaled item captures a value between “strongly disagree” and “strongly agree.” Attaching a rating scale to a statement or instrument is not scaling. Be both creative and precise! Binary scales can also employ other values, such as male or female for gender, full-time or part-time for employment status, and so forth. For any conceptual definition of a construct, there will be many different operational definitions or ways of measuring it. The Kelvin temperature scale is also a ratio scale, in contrast to the Fahrenheit or Celsius scales, because the zero point on this scale (equalling −273.15 degrees Celsius) is not an arbitrary value but represents a state where the particles of matter at this temperature have zero kinetic energy. Hence, the name paired comparison method. Hence, statistical analyses may involve percentiles and non-parametric analysis, but more sophisticated techniques such as correlation, regression, and analysis of variance, are not appropriate. For each item, compute the median and inter-quartile range (the difference between the 75th and the 25th percentile – a measure of dispersion), which are plotted on a histogram, as shown in Figure 6.1. Permissible statistical analyses include all of those allowed for nominal and ordinal scales, plus correlation, regression, analysis of variance, and so on. However, SES index measurement has generated a lot of controversy and disagreement among researchers. Note that the satisfaction scale discussed earlier is not strictly an interval scale, because we cannot say whether the difference between “strongly satisfied” and “somewhat satisfied” is the same as that between “neutral” and “somewhat satisfied” or between “somewhat dissatisfied” and “strongly dissatisfied”. 
The first decision to be made in operationalising a construct is to decide on the intended level of measurement. The process of specifying the observable instances of a construct is often referred to as providing an operational definition for a construct. Since most scales employed in social science research are unidimensional, we will next examine three approaches for creating unidimensional scales. Construct validity is the most important, because it tells us whether we are able to correctly measure what we are supposed to measure. For instance, the operational definition of a construct such as temperature must specify whether we plan to measure temperature on the Celsius, Fahrenheit, or Kelvin scale. Some argue that the sophistication of the scaling methodology makes scales different from indexes, while others suggest that indexing methodology can be equally sophisticated. Permissible statistics are chi-square and frequency distribution, and only a one-to-one (equality) transformation is allowed (e.g., 1 = Male, 2 = Female). Quantitative data can be analysed using quantitative data analysis techniques, such as regression or structural equation modelling, while qualitative data require qualitative data analysis techniques, such as coding. However, in semantic differential scales, the statement remains constant, while the anchors (adjective pairs) change across items. However, the scale does not indicate the actual hardness of these minerals, or even provide a relative assessment of their hardness. Attaching a rating scale to a statement or instrument is not scaling. If someone says bad things about other racial groups, is that racial prejudice? Ordinal scales are those that measure rank-ordered data, such as the ranking of students in a class as first, second, third, and so forth, based on their grade point average or test scores. 
An index is a composite score derived from aggregating measures of multiple constructs (called components) using a set of rules and formulas. Interval scales are those where the values measured are not only rank-ordered, but are also equidistant from adjacent attributes. In practice, we seldom find a set of items that matches this cumulative pattern perfectly. Other less common scales are not discussed here. For instance, is ‘compassion’ the same thing as ‘empathy’ or ‘sentimentality’? These very different measures are combined to create an overall SES index score, using a weighted combination of ‘occupational education’ (percentage of people in that occupation who had one or more years of university education) and ‘occupational income’ (percentage of people in that occupation who earned more than a specific annual income). If women earn less than men for the same job, is that gender prejudice? Second, indexes often combine objectively measurable values such as prices or income, while scales are designed to assess subjective or judgmental constructs such as attitude, prejudice, or self-esteem. This index is a combination of three constructs: income, education, and occupation. A formative indicator is a measure that “forms” or contributes to an underlying construct. The Likert method assumes equal weights for all items, and hence, a respondent’s responses to each item can be summated to create a composite score for that respondent. A 20-item measure is proposed. The previous section discussed how to measure respondents’ responses to predesigned items or indicators belonging to an underlying construct. Finally, what procedure would you use to generate the scale items (e.g., Thurstone, Likert, or Guttman method) or index components? 
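The SES index described above is a weighted combination of two occupational percentages. The sketch below shows what such a rule might look like in Python. The equal weights and the percentages are hypothetical placeholders chosen only for illustration (the text does not give the actual weights), and `index_score` is an illustrative helper, not part of any published scoring procedure.

```python
def index_score(components, weights):
    """Weighted combination of component scores into a single index value."""
    if set(components) != set(weights):
        raise ValueError("every component needs exactly one weight")
    return sum(weights[name] * components[name] for name in components)


# Hypothetical occupation: percentage of people in it with one or more years
# of university education, and percentage earning above some income threshold.
occupation = {"occupational_education": 60.0, "occupational_income": 40.0}

# Assumed weights; a real index would take these from its published formula.
weights = {"occupational_education": 0.5, "occupational_income": 0.5}

score = index_score(occupation, weights)
print(score)
```

Keeping the weights in a separate mapping makes the "rule or formula" step explicit and easy to revise when the index is validated against new data.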
In “Construct Measurement in Organizational Strategy Research: A Critique and Proposal”, N. Venkatraman (Massachusetts Institute of Technology) and John H. Grant (University of Pittsburgh) argue that strategic management researchers have emphasized concept development but generally have ignored construct measurement issues. The conceptualization process is all the more important because of the imprecision, vagueness, and ambiguity of many social science constructs. It involves the operation to construct variables, and the development and application of instruments or tests to quantify these variables [Kimberlin & Winterstein, 2008]. Examples include simple constructs such as a person’s weight, wind speed, and probably even complex constructs like self-esteem (if we conceptualize self-esteem as consisting of a single dimension, which of course, may be an unrealistic assumption). Indicators may be reflective or formative. If you have a proposition stating that ‘compassion is positively related to empathy’, you cannot test that proposition unless you can conceptually separate empathy from compassion and then empirically measure these two very similar constructs correctly. There are two major issues that will be considered here. For instance, the operational definition of a construct such as temperature must specify whether we plan to measure temperature on the Celsius, Fahrenheit, or Kelvin scale. However, instead of relying entirely on statistical analysis for item selection, a better strategy may be to examine the candidate items at each level and select the statement that is the clearest and makes the most sense. Testing theories (i.e., theoretical propositions) requires measuring these constructs accurately, correctly, and in a scientific manner, before the strength of their relationships can be tested. Construct validity is "the degree to which a test measures what it claims, or purports, to be measuring." 
Nevertheless, indexes and scales are both essential tools in social science research. Note that the satisfaction scale discussed earlier is not strictly an interval scale, because we cannot say whether the difference between ‘strongly satisfied’ and ‘somewhat satisfied’ is the same as that between ‘neutral’ and ‘somewhat satisfied’ or between ‘somewhat dissatisfied’ and ‘strongly dissatisfied’. The appropriate measure of central tendency of a nominal scale is the mode, and neither the mean nor the median can be defined. The process of understanding what is included and what is excluded in the concept of prejudice is the conceptualisation process. Given the high level of subjectivity and imprecision inherent in social science constructs, we tend to measure most of those constructs (except a few demographic constructs such as age, gender, education, and income) using multiple indicators. Some of these decisions are: Should you use a scale, index, or typology? For example, a firm of size 10 employees is double that of a firm of size 5, and the same can be said for a firm of 10,000 employees relative to a different firm of 5,000 employees. The three most popular unidimensional scaling methods are: (1) Thurstone’s equal-appearing scaling, (2) Likert’s summative scaling, and (3) Guttman’s cumulative scaling. If someone says bad things about other racial groups, is that racial prejudice? The process of creating an index is similar to that of a scale. Christoph Fuchs and Adamantios Diamantopoulos, in a paper on single-item measures for construct measurement in management research (conceptual issues and application guidelines), observe that the use of single-item measures is the subject of increasing discussion in current business research. One important decision in conceptualising constructs is specifying whether they are unidimensional or multidimensional. 
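The point made above, that the mode is the appropriate measure of central tendency for a nominal scale, generalises to the other levels of measurement: the median becomes meaningful for ordinal data, the arithmetic mean for interval data, and the geometric mean for ratio data. The following Python sketch illustrates this with the standard-library `statistics` module. All data values are made up for illustration, and `statistics.geometric_mean` requires Python 3.8 or later.

```python
import statistics

# Nominal: labels only, so the mode is the only defined central tendency.
gender = ["M", "F", "F", "M", "F"]
nominal_centre = statistics.mode(gender)

# Ordinal: rank-ordered codes (1 = strongly dissatisfied ... 5 = strongly
# satisfied), so the median is also meaningful.
satisfaction = [1, 2, 2, 4, 5]
ordinal_centre = statistics.median(satisfaction)

# Interval: equidistant values (degrees Fahrenheit), so the mean is allowed.
temperatures_f = [30.0, 40.0, 80.0, 90.0]
interval_centre = statistics.mean(temperatures_f)

# Ratio: true zero point (employee counts), so the geometric mean is allowed.
firm_sizes = [5, 10, 20]
ratio_centre = statistics.geometric_mean(firm_sizes)

print(nominal_centre, ordinal_centre, interval_centre, ratio_centre)
```

Each level also permits the statistics of the levels before it, which is why the median could equally be reported for the temperature or firm-size data.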
Scales can be unidimensional or multidimensional, based on whether the underlying construct is unidimensional (e.g., weight, wind speed, firm size) or multidimensional (e.g., academic aptitude, intelligence). These items are generated by experts who know something about the construct being measured. Responses are obtained on a seven-point … Second, operationalize and measure each component. Our definition of such constructs is not based on any objective criterion, but rather on a shared (“inter-subjective”) agreement between our mental images (conceptions) of these constructs. While defining constructs such as prejudice or compassion, we must understand that sometimes these constructs are not real and cannot exist independently, but are simply imaginary creations in our mind. Likert scales are ordinal scales because the anchors are not necessarily equidistant, even though sometimes we treat them like interval scales. I think construct validity is close to the concept of sensitivity. Are there different levels of prejudice, such as high or low? Measurement refers to careful, deliberate observations of the real world and is the essence of empirical research. How many scale attributes should you use (e.g., 1–10; 1–7; −3 to +3)? This can be done by grouping items with a common median, and then selecting the item with the smallest inter-quartile range within each median group. Third, create a rule or formula for calculating the index score. The intelligence quotient (IQ) scale is also an interval scale, because the scale is designed such that the difference between IQ scores 100 and 110 is supposed to be the same as between 110 and 120 (although we do not really know whether that is truly the case).
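The median-based selection rule mentioned above (group candidate items by their median judge rating, then keep the item with the smallest inter-quartile range in each group) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the item names and ratings are invented, the function name `select_items_by_median` is hypothetical, and quartiles are computed with the "inclusive" method of the standard `statistics` module, which is one of several reasonable conventions.

```python
from collections import defaultdict
from statistics import median, quantiles


def select_items_by_median(ratings):
    """Pick one item per median group, preferring the smallest IQR.

    ratings: dict mapping an item label to the list of judge ratings
    (e.g., on a 1-11 scale). Ties on IQR are broken alphabetically
    by item label, which is an arbitrary choice made for this sketch.
    """
    groups = defaultdict(list)
    for item, scores in ratings.items():
        q1, _, q3 = quantiles(scores, n=4, method="inclusive")
        groups[median(scores)].append((q3 - q1, item))
    # For each median value, keep the item whose ratings are least spread out.
    return {med: min(candidates)[1] for med, candidates in sorted(groups.items())}


# Invented judge ratings for four hypothetical candidate items.
example_ratings = {
    "A": [2, 2, 2, 2, 2],
    "B": [1, 2, 2, 3, 4],
    "C": [9, 10, 10, 10, 11],
    "D": [8, 9, 10, 11, 11],
}
selected = select_items_by_median(example_ratings)
```

Items A and B share a median of 2, and C and D share a median of 10; within each pair, the item on which the judges agree more closely is retained.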
On the theory of scales of measurement. All measures of central tendency, including geometric and harmonic means, are allowed for ratio scales, as are ratio measures, such as studentized range or coefficient of variation. This process of measuring abstract concepts in concrete terms remains one of the most difficult tasks in empirical social science research. For instance, a ‘gender’ variable may have two attributes: male or female. grasping the inextricable link between scale validity and effective research [Thompson, 2003]. Each item in the above Guttman scale has a weight (not indicated above) which varies with the intensity of that item, and the weighted combination of each response is used as an aggregate measure of an observation. Construct validity is typically presented as one of many different types of validity (e.g., face validity, predictive validity, concurrent validity) that you might want to be sure your measures have. The process of regarding mental constructs as real is called reification, which is central to defining constructs and identifying measurable variables for measuring them. For instance, the word ‘prejudice’ conjures a certain image in our mind; however, we may struggle if we were asked to define exactly what the term meant. Each item in this scale is a binary item, and the total number of ‘yes’ responses indicated by a respondent (a value from zero to six) can be used as an overall measure of that person’s political activism. These items are then rated by judges on a 1 to 5 (or 1 to 7) rating scale as follows: 1 for strongly disagree with the concept, 2 for somewhat disagree with the concept, 3 for undecided, 4 for somewhat agree with the concept, and 5 for strongly agree with the concept. How do you wish to label the scale attributes (especially for semantic differential scales)?
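The two scoring rules described above for binary items (a simple count of ‘yes’ responses, and a weighted combination in which each item's weight reflects its intensity) can be illustrated with a short Python sketch. The weights below are purely hypothetical, since the text notes that the actual weights are not indicated, and the function names are invented for this example.

```python
def count_score(responses):
    """Unweighted score: the number of 'yes' (True) responses."""
    return sum(1 for answered_yes in responses if answered_yes)


def weighted_score(responses, weights):
    """Weighted score: sum of the weights of the items answered 'yes'."""
    if len(responses) != len(weights):
        raise ValueError("responses and weights must have the same length")
    return sum(w for answered_yes, w in zip(responses, weights) if answered_yes)


# One respondent's answers to six binary items, ordered by increasing intensity.
respondent = [True, True, True, False, False, False]
# Hypothetical intensity weights (an assumption made only for illustration).
hypothetical_weights = [1, 2, 3, 4, 5, 6]

simple = count_score(respondent)
weighted = weighted_score(respondent, hypothetical_weights)
```

With six items, the unweighted score ranges from zero to six, matching the range given in the text; the range of the weighted score depends entirely on the weights chosen.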
Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Levels of measurement, also called rating scales, refer to the values that an indicator can take (but say nothing about the indicator itself). Notice that in Likert scales, the statement changes but the anchors remain the same across items. As with Thurstone’s method, the Likert method also starts with a clear definition of the construct of interest and uses a set of experts to generate about 80 to 100 potential scale items. The central tendency measure of an ordinal scale can be its median or mode, and means are uninterpretable. Likewise, if you have a scale that asks respondents’ annual income using the following attributes (ranges): $0–10,000, $10,000–20,000, $20,000–30,000, and so forth, this is also an interval scale, because the mid-points of the ranges (i.e., $5,000, $15,000, $25,000, etc.) are equidistant from each other. Nominal scales merely offer names or labels for different attribute values. Interval scales allow us to examine ‘how much more’ is one attribute when compared to another, which is not possible with nominal or ordinal scales. Lastly, validate the index score using existing or new data. This process allows us to examine the closeness amongst these indicators as an assessment of their accuracy (reliability). A construct is an abstract idea inferred from specific instances that are thought to be related. In this chapter, we will examine the related processes of conceptualization and operationalization for creating measures of such constructs. Constructs: Constructs are measured with multiple variables.
In his seminal article titled ‘On the theory of scales of measurement’ published in Science in 1946,[1] psychologist Stanley Smith Stevens defined four generic types of rating scales for scientific measurements: nominal, ordinal, interval, and ratio scales. But how do we create the indicators themselves? These very different measures are combined to create an overall SES index score, using a weighted combination of “occupational education” (percentage of people in that occupation who had one or more years of college education) and “occupational income” (percentage of people in that occupation who earned more than a specific annual income). However, scales typically involve a set of similar items that use the same rating scale (such as a five-point Likert scale). In the context of survey research, a construct is the abstract idea, underlying theme, or subject matter that one wishes to measure using survey questions. This is a composite (multi-item) scale where respondents are asked to indicate their opinions or feelings toward a single statement using different pairs of adjectives framed as polar opposites. How will you rate your opinions on the following statements about immigrants? Examples include gender (two values: male or female), industry type (manufacturing, financial, agriculture, etc.). This is particularly the case with many social science constructs such as self-esteem, which are assumed to have a single dimension going from low to high. Louis Thurstone, one of the earliest and most famous scaling theorists, published a method of equal-appearing intervals in 1925.
Permissible statistical analyses include all of those allowed for nominal and ordinal scales, plus correlation, regression, analysis of variance, and so on. The statistical properties of these scales are shown in Table 6.1. The outcome of a scaling process is a scale, which is an empirical structure for measuring items or indicators of a given construct. A formative indicator is a measure that ‘forms’ or contributes to an underlying construct. Hence, the name paired comparison method. As an example, the construct ‘attitude toward immigrants’ can be measured using five items shown in Table 6.5. Once a theoretical construct is defined, exactly how do we measure it? A rating scale is used to capture the respondents’ reactions to a given item: for example, a nominal scaled item captures a yes/no reaction, and an ordinal scaled item captures a value between ‘strongly disagree’ and ‘strongly agree’. The Likert method, a unidimensional scaling method developed by Murphy and Likert (1938),[3] is quite possibly the most popular of the three scaling approaches described in this chapter. The resulting matrix will resemble Table 6.6. Note that some of these scales may include multiple items, but all of these items attempt to measure the same underlying dimension.
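The link between a scale's level of measurement and the central tendency measures it permits can be made concrete with a small lookup table in Python. This is a sketch, not a reproduction of Table 6.1: the text states that nominal scales admit only the mode, ordinal scales the median or mode, and ratio scales all measures including geometric and harmonic means; placing the arithmetic mean at the interval level follows the usual reading of Stevens' typology and is an assumption here. Ordinal values are assumed to be coded as numbers so that a median can be computed.

```python
from statistics import geometric_mean, harmonic_mean, mean, median, mode

# Central tendency measures treated as permissible at each level.
NOMINAL = {"mode": mode}
ORDINAL = {**NOMINAL, "median": median}
INTERVAL = {**ORDINAL, "mean": mean}
RATIO = {**INTERVAL, "geometric_mean": geometric_mean, "harmonic_mean": harmonic_mean}

PERMISSIBLE = {
    "nominal": NOMINAL,
    "ordinal": ORDINAL,
    "interval": INTERVAL,
    "ratio": RATIO,
}


def central_tendency(values, level):
    """Compute only the measures permitted for the given level of measurement."""
    if level not in PERMISSIBLE:
        raise ValueError(f"unknown level of measurement: {level!r}")
    return {name: fn(values) for name, fn in PERMISSIBLE[level].items()}


# Invented data: industry labels (nominal) and firm sizes in employees (ratio).
industry_summary = central_tendency(["finance", "retail", "finance"], "nominal")
firm_size_summary = central_tendency([5, 10, 10, 20], "ratio")
```

Requesting a mean for the nominal industry labels is simply not offered by the lookup, which mirrors the point that the mean and median are undefined for nominal data.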
Following this rating, specific items can be selected for the final scale in one of several ways: (1) by computing bivariate correlations between judges’ ratings of each item and the total item (created by summing all individual items for each respondent), and throwing out items with low (e.g., less than 0.60) item-to-total correlations, or (2) by averaging the rating for each item for the top quartile and the bottom quartile of judges, doing a t-test for the difference in means, and selecting items that have high t-values (i.e., those that discriminate best between the top and bottom quartile responses). Though this appears simple, there may be a lot of disagreement among judges on what components (constructs) should be included or excluded from an index. For instance, in the SES index, if income is correlated with education and occupation, should we include one component only or all three components? Reviewing the literature, using theories, and/or interviewing experts or key stakeholders may help resolve this issue. research or about the specific criteria that should be used to distinguish between formative and reflective indicator constructs.
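The first item-selection rule described above for Likert scaling (correlate each item with the summed total and discard items whose item-to-total correlation falls below a cut-off such as 0.60) might be sketched as follows. The ratings matrix is invented, the function names are hypothetical, and the total deliberately includes the item being tested, as the text describes; a "corrected" variant that excludes the item from its own total is a common alternative not shown here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def item_total_correlations(ratings):
    """ratings: one row per judge, one column per candidate item."""
    totals = [sum(row) for row in ratings]
    n_items = len(ratings[0])
    return [pearson([row[i] for row in ratings], totals) for i in range(n_items)]


def items_to_keep(ratings, threshold=0.60):
    """Indices of items whose item-to-total correlation meets the threshold."""
    correlations = item_total_correlations(ratings)
    return [i for i, r in enumerate(correlations) if r >= threshold]


# Invented ratings from four judges on three candidate items (1-5 scale).
example = [
    [1, 1, 5],
    [2, 2, 1],
    [4, 4, 4],
    [5, 5, 2],
]
kept = items_to_keep(example)
```

In this invented data, the first two items rise and fall with the total while the third does not, so only the first two survive the 0.60 cut-off.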