User-friendliness is extremely important for web questionnaires. The more user-friendly a web questionnaire is, the higher the chance a respondent will take the questionnaire seriously and complete it.
A short questionnaire increases the chance that all questions will be answered. This would imply that the answers should be presented as compactly as possible (by using drop-down menus instead of radio buttons, double columns of answers, etc.). According to Dillman (1999), however, high user-friendliness goes hand in hand with a lot of white space, which actually results in long lists of questions. Does making the questions compact have a negative effect on the quality of the response?
This document presents the results of a study carried out by the Eindhoven University of Technology. For this survey, members of the PanelClix panel were approached. The web questionnaire was programmed by Isiz. The subject of the research was whether the lay-out of the questions affects the quality of the results. The following variations were applied.
For inexperienced respondents it does make a difference whether the answers are given in the form of radio buttons or offered as drop-down menus. There are also significant differences in results when scaled questions are offered as radio buttons as opposed to sliding bars (or sliders). It makes a difference whether an answer is in the first column or in the second.
Both radio buttons and drop-down menus are often used in online surveys, and both controls can be used for simple multiple-choice questions. In this survey, we compared the two to find out whether they yield different answering patterns.
Because radio buttons take up more space than drop-down menus, various web questionnaires prefer to use drop-down menus. But is this a wise decision? Does it improve the data quality?
One could suggest that radio buttons are easier to fill in, but because drop-down menus are a well-known Windows control element, one would also expect little difference. The following figure shows an example of answer categories using drop-down menus or radio buttons. The important difference – which cannot be seen in the figure – is that with a drop-down menu there is no (or at most one) visible answer category. The other categories only become visible when the respondent clicks on the arrow.
In this survey, two different types of questions were tested: simple multiple-choice questions (simple lay-out) and a battery question consisting of various simple multiple-choice questions (complex lay-out).
For the simple lay-out, four questions were asked on Internet purchases. Respondents were asked about buying and selling behaviour, paying via the Internet and Internet fraud.
One of the possible questions was:
How often have you bought products or services via an online auction or shopping site, such as eBay, Marktplaats, ViaVia etc.?
Complex lay-out (battery question)
For the complex lay-out (battery question), the ‘position generator’ was used. This is a method often used in social sciences to map out the respondent’s ‘social network’ (Lin & Dumin, 1986; van der Gaag & Snijders, 2003). The question measured the number of contacts/acquaintances in various occupational groups. For a list (battery) of 30 occupations, the respondent was asked if they knew someone with that occupation, and how they were related to that person. Possible answers were: 0= No, 1= Yes, an acquaintance, 2=Yes, a friend, 3=Yes, a family member. A higher average score for all occupations indicated that this person has a more extensive social network.
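As an illustration, the scoring described above can be sketched in a few lines of Python. The data and helper names here are hypothetical; only the 0–3 coding scheme comes from the text.

```python
# Hypothetical sketch of the position-generator scoring described above.
# The 0-3 coding follows the scheme in the text; the data is invented.
ANSWER_CODES = {"no": 0, "acquaintance": 1, "friend": 2, "family": 3}

def network_score(responses):
    """Average the 0-3 codes over all occupations in the battery."""
    codes = [ANSWER_CODES[r] for r in responses]
    return sum(codes) / len(codes)

# One invented respondent answering a 5-occupation battery:
respondent = ["no", "acquaintance", "friend", "no", "family"]
print(network_score(respondent))  # → 1.2
```

A higher average over the full 30-occupation battery indicates a more extensive social network.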
The frequency table shows clear differences between the variations offered to the respondents. Here we show the results for one of our four questions:
A chi-squared test (a statistical test to determine whether two or more population distributions differ from each other) shows that we must reject the hypothesis that these four conditions lead to the same answers (p=0.001). The p-value is low not only for this question, but also for the three other simple lay-out questions (p=0.035/0.009/0.001).
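To make the test concrete, here is a minimal sketch of the Pearson chi-squared statistic for a condition-by-answer contingency table. The counts are invented, not the survey's data, and a real analysis would also derive the p-value from the chi-squared distribution.

```python
# Minimal Pearson chi-squared statistic for a contingency table.
# Rows = layout conditions, columns = answer categories (counts invented).

def chi2_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

# Two hypothetical conditions x three answer categories:
table = [[120, 60, 20], [100, 70, 30]]
print(round(chi2_statistic(table), 3))  # → 4.587
```

With (rows−1)·(columns−1) = 2 degrees of freedom, the statistic is then compared against the chi-squared distribution to obtain the p-value.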
If we look at the direction of the differences, we can see that the lowest score (1.97 average) is yielded by the drop-down menu, followed by the radio button (2.07), the reversed drop-down menu (2.15) and the reversed radio button (2.25). In other words: For both radio buttons and drop-down menus the categories at the top are selected more often (this has a bigger impact than whether or not a radio button is used). In addition, when a drop-down menu is used, the top answer categories are selected more often than when a radio button is used.
To get a feel for the size of the differences: if we look at the category ‘Never’, we can see that with the drop-down menu this category is selected 1.5 to 2% more often. This is, however, slightly too small to be statistically significant in a sample of this size. If the answer category ‘Never’ is at the top, it is clicked 4 to 5% more often than when it is at the bottom.
Within the group of respondents (N=756 to N=838 per variation), a subgroup can be defined which is mainly responsible for this deviation: respondents who have little experience in answering PanelClix questionnaires. The expectation is that this group, on average, also has less computer experience, but we have not tested that here. Among those who have participated in online surveys by PanelClix more than 4 times in the last 12 months (less than half of all participants), the differences are smaller; the difference for the ‘Never’ category actually disappears completely.
When analyzing the battery question, at first glance there seems to be no difference between the conditions: the average scores for the questions are nearly the same.
But when reliability values (Cronbach’s Alpha) are calculated for both conditions, there are differences:

Drop-down per row: Reliability = 1.70 (N=362)
Radio button per row: Reliability = 1.82 (N=575)
The conclusion is that measuring with radio buttons gives more reliable results. Even if we only look at the Yes/No answers and do not distinguish how well the respondent knows a person, the findings remain valid:
Drop-down per row: Average = 1.50, Reliability = 0.78
Radio button per row: Average = 1.50, Reliability = 0.87
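For readers who want to reproduce this kind of reliability figure, a minimal Cronbach’s Alpha calculation can be sketched as follows. The item scores here are invented, not the survey data.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's Alpha: k/(k-1) * (1 - sum(item variances) / variance(totals)).

    items: one list of scores per question, all of equal length
    (one score per respondent). Population variances are used here
    for simplicity; invented data, not the survey's.
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Three hypothetical battery items answered by four respondents (0-3 coding):
items = [
    [0, 1, 2, 3],
    [0, 1, 3, 3],
    [1, 1, 2, 3],
]
print(round(cronbach_alpha(items), 2))  # → 0.96
```

Values close to 1 indicate that the items measure the same underlying construct consistently.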
Based on the previous analyses, the following conclusions can be made:
With ratio, interval and ordinal scales, a bias towards ‘top answers’ can be counteracted by alternating the order of the answers per respondent. The software can then decide randomly whether the respondent sees the ascending or descending version of the question. This ensures that the bias is ‘averaged out’ over all respondents.
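The alternation described above can be sketched in a few lines; the scale labels are illustrative.

```python
import random

# Sketch: serve the answer scale in ascending or descending order,
# chosen at random per respondent, so a 'top answer' bias averages out.
SCALE = ["Never", "Rarely", "Sometimes", "Often", "Very often"]

def answer_order(rng=random):
    """Return the scale in ascending or descending order at random."""
    return SCALE if rng.random() < 0.5 else list(reversed(SCALE))

print(answer_order())
```

Over many respondents, both orders appear roughly equally often, so any positional preference cancels out in the aggregate.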
With nominal scales, the answers can be shown in completely random order. Alphabetical lists are often preferred because they make it easier for respondents to find their answer. With a relatively small number of answer categories this concern is less important, and randomization is actually recommended.
With battery questions, the preference is to show the answers using radio buttons rather than drop-down menus: the Cronbach’s Alpha value is significantly higher (i.e. the answers are more reliable).
When presenting scaled questions (e.g. very bad - very good), online surveys often make use of radio buttons. Some survey software also supports sliders: the respondent drags a marker from one side of the scale to the other (very bad to very good in this example). A slider is technically more challenging and can cause problems in certain browsers. Radio buttons are easier to implement in web questionnaires, but can the survey show a clear preference for one or the other? And is there a difference in the answers to scaled questions when radio buttons or sliders are used?
A slider gives a much more ‘accurate’ result than a radio button: on a 5-point scale of radio buttons the answer can be 1-5, whereas a slider can easily give a score between 0 and 100. In principle, one could expect that if the results of a slider are coded back to a 5-point scale, they would give the same answers. The following figure shows an example of such a slider.
In our survey, the slider was implemented as follows. The slider was always shown with the marker placed at the left. Respondents always had to at least click on the slider; otherwise the question would count as unanswered. After clicking on the slider, respondents could drag it to the right (and back) and select a position along the continuum.
In the survey, 16 similar questions were asked to find out whether there was a difference between scaled questions using a slider or a radio button. For reasons of legibility, only one of these 16 questions will be highlighted in this white paper.
3006 respondents were asked the following question:
How do you judge Dutch society with regard to the reaction of the Dutch government to the Iraq issue?

This question and the other 15 were about international issues and come from a survey by the Sociaal Cultureel Planbureau. The 16 questions were divided into two battery questions of 8 statements each. For the respondents, these were questions 9 and 14 of a longer survey.
In total, 1507 respondents answered the previous question using radio buttons (very bad, bad, neutral, good, very good) and 1499 using a slider. The slider showed the same five answer labels, but the respondent could also position the marker between them; the underlying scale ran from 0 to 100.
Below is a frequency table of the radio button answers:
If the slider answers are coded back to a 5-point scale, this gives the following table:
Instead of 13% and 1% in the extreme categories, we now get 29% and 4%. This is no coincidence: if we compare the other 15 questions in the same manner, we can, at a reasonable significance level, reject the hypothesis that the two question methods yield the same results in all 16 cases.
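A simple way to code 0-100 slider values back to a 5-point scale is equal-width bins. Note that the exact cut-offs used in the survey are not stated in this paper; the bins below are an assumption for illustration.

```python
# Sketch: recode a 0-100 slider value back to a 5-point scale.
# Equal-width bins are an assumption; the paper does not give its cut-offs.

def recode_slider(value):
    """Map a slider position in [0, 100] to categories 1-5."""
    if not 0 <= value <= 100:
        raise ValueError("slider value out of range")
    # 0-19 → 1, 20-39 → 2, 40-59 → 3, 60-79 → 4, 80-100 → 5
    return min(int(value // 20) + 1, 5)

print([recode_slider(v) for v in (0, 10, 35, 50, 99, 100)])  # → [1, 1, 2, 3, 5, 5]
```

The recoded values can then be tabulated alongside the radio-button frequencies for a like-for-like comparison.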
The battery questions are not very suitable for calculating a scale value, because they do not properly measure the same underlying construct. If we do so anyway, we see a slightly higher scale value for the slider: 0.65 for the radio buttons versus 0.71 for the sliders.
Sliders have several obvious disadvantages: answering takes more time and they offer a certain ‘pseudo-accuracy’. We cannot assume that someone who answers a question using a slider has a stronger preference than someone using a radio button. The sliders do, however, show a broader distribution, even when the slider answers are coded back to 5 categories. Based on our survey we cannot conclude what the ‘right’ answers are, but for now our preference is to offer sliders. The values found should be coded back to categories to prevent pseudo-accuracy.
In this survey, ‘stepless’ sliders were used. It is also possible to use sliders with steps (or ticks); the respondent can then no longer set the marker to an arbitrary position, only to one of the pre-set positions. The slider was built in such a way that it had to be moved before a value could be saved; in other words, if the slider was not moved, the question was marked as unanswered. It is unknown how stepless sliders and sliders with steps differ.
If the respondent does not have to scroll when answering a web questionnaire, the chance of quitting early is a lot smaller. This has been shown by several internal abort/exit analyses carried out by Isiz. The lay-out of the questions on the screen is a tricky issue: even though we know it is very important, it is hard to give general guidelines (see e.g. Dillman’s standard work, 1999). It is important to offer the questions in a clear and structured manner with a lot of white space, but if this means respondents have to scroll, the medicine may be worse than the disease.
According to this principle, a standard webpage (1024x768) can accommodate 15 answer lines, taking navigation controls, additional design items, toolbars etc. into account. If a question has more than 15 possible answers, the decision is often made to divide the answers over two or more columns. But what effect does this have on the answers selected? Does displaying answers in multiple columns influence the results? With a small number of answers, one would not expect it to matter whether an answer is in the first or the second column. In practice, however, it does seem to make a difference.
In this survey, we looked at the questions with multiple answers. These were questions of the following type:
What were the reasons why you have been using the Internet over the last 3 months? You can select more than one answer.

To communicate with friends | To download music
To look for information | To download videos
To play games | To shop online
To communicate professionally | Other reason, namely: ________
Respondents were able to tick as many boxes as they wanted. The answer categories here (and online) are given side by side (2x4) and not as a list.
In the survey, 8 different questions were asked using this lay-out, with variations in the left and right columns: one respondent would see ‘To communicate with friends’ and ‘To communicate professionally’ in the left column, while another respondent would see these in the right column.
The results can be analyzed as follows. Per question, we effectively have 7 answer categories which can be compared (we exclude ‘Other reason’ to simplify the comparison). There are 8 different questions, which gives a total of 8x7 = 56 answer categories. For these 56 categories, we can compare how often they were clicked in the left column versus the right column.
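The left-versus-right comparison for a single answer category can be sketched as follows; the counts are hypothetical.

```python
# Sketch: percentage-point difference between the share of respondents
# who ticked a category when it was shown left vs. right (counts invented).

def column_difference(left_clicks, right_clicks, n_left, n_right):
    """Share clicked when shown in the left column minus share clicked
    when shown in the right column, in percentage points."""
    return 100 * (left_clicks / n_left - right_clicks / n_right)

# One answer category shown to 400 respondents in each column position:
print(round(column_difference(140, 120, 400, 400), 1))  # → 5.0
```

Repeating this for all 56 categories yields the distribution of left-right differences discussed below.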
The difference between the left and right column in percent can be seen in the following table:
The differences vary quite a bit, but are always larger than 0, and the average difference is 3.5%. Most of the differences lie between 1.5 and 8.5%. In all cases analyzed, an answer in the left column was selected more often than the same answer in the right column.
For some applications, a difference of 1.5 to 8.5% may be acceptable; still, knowing that this deviation can occur may be a reason for caution. With 8 answers, the answers can also be shown as a single list (this option was not researched in this survey).
Alternatively, the answers can be randomized. With longer lists of e.g. brand names this has a negative effect on the user-friendliness of the questionnaire, but for lists where a respondent does not have to search for a specific name and alphabetical order is not necessary, we recommend randomization. This ensures that the preference for selecting answers from the left column is averaged out.
In this survey, the option to limit the number of answers selected was not used. It might be that if a respondent can only select a limited number of answers, they will think about their selections a bit more, which could lead to better results. The restriction can be quite broad; what matters most is the idea that the respondent can only select a limited number of answers.
Dillman, D.A. (1999). Mail and Internet Surveys: The Tailored Design Method. New York: Wiley.
Lin, N. & Dumin, M. (1986). Access to occupations through social ties. Social Networks, 8, 365-385.
Van der Gaag, M. & Snijders, T.A.B. (2003). A comparison of measures for individual social capital. http://www.xs4all.nl/~gaag/work/
The web questionnaire used during the survey can be found here: http://www.isiz.nl/whitepapers.
Simple lay-out – Drop-down menus
Simple lay-out – Radio buttons
Complex lay-out – Battery question – Drop-down menus
Complex lay-out – Battery question – Radio buttons