Measuring General Aviation Pilot Judgment Using a Situational Judgment Technique

Creation of a Situational Judgement Test for general aviation pilots

Hunter (2003) reported on two studies that were conducted to develop and evaluate a situational judgement test for general aviation pilots. This article summarises and reanalyses the results to offer a clear insight into the findings.

General aviation pilot testing

The first study used a paper-and-pencil test that was mailed to randomly selected participants, who answered the 51 questions and returned them. High-ranking pilots and industry officials had reviewed the test beforehand to set the scoring for the four alternatives per question, identifying the correct answer and then ranking the remaining alternatives by desirability. The results show that the scale has acceptable reliability, as measured by coefficient alpha, and a normal distribution centred on a mean score close to 50%. See the table below:

Test     Coefficient alpha   Mean   SD    Range   Mean %   Distribution
Test 1   .753                27.2   6.0   6-44    53.3%    Normal
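As a rough illustration of the reliability statistic reported above, coefficient (Cronbach's) alpha can be computed from a respondents-by-items score matrix. This is a minimal sketch using the standard formula; the function name `cronbach_alpha` and the toy data are assumptions for illustration and are not taken from Hunter (2003).

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Coefficient alpha for a list of respondents, each a list of item scores.

    Hypothetical helper: assumes every respondent answered every item.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # item scores, column by column
    item_variance_sum = sum(pvariance(col) for col in items)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Toy data (made up): 4 respondents, 3 items scored 1 = correct, 0 = incorrect.
toy = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
print(round(cronbach_alpha(toy), 3))  # 0.75
```

Population variances are used for both the items and the totals; using sample variances throughout would give the same ratio and therefore the same alpha.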

The second study used an online survey hosted on an FAA-sponsored website. The same 51-question test was used with the same scoring. The results were analysed in the same way, this time returning a coefficient alpha of 0.703. Twelve items with a point-biserial correlation below .20 were identified and eliminated, leaving a 39-item scale. The resulting figures showed that the refined test also had acceptable reliability. See the table below:

Test     Coefficient alpha   Mean   SD     Mean %   Distribution
Test 2   .747                23.5   5.32   60.2%    Normal
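The item-elimination step described above can be sketched as follows. The point-biserial correlation of a right/wrong item with the total score is simply the Pearson correlation between the two. Whether the original analysis removed each item from the total before correlating is not stated in this summary, so the sketch below assumes the uncorrected total; the function names and toy data are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def low_discrimination_items(responses, threshold=0.20):
    """Indices of items whose item-total correlation falls below the threshold.

    Assumption: the total score includes the item itself (uncorrected).
    """
    totals = [sum(row) for row in responses]
    return [
        index
        for index, column in enumerate(zip(*responses))
        if pearson(column, totals) < threshold
    ]


# Toy data (made up): the third item runs against the other two.
toy = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 0, 1],
    [1, 0, 0],
]
flagged = low_discrimination_items(toy)
print(flagged)
```

Dropping the flagged items and recomputing alpha on the remaining columns mirrors the refinement from 51 to 39 items described in the text.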

To demonstrate its validity, the study also compared scores across holders of different licences. Higher scores were associated with higher levels of licence and experience, as shown in the table below.

SJT score comparison by level of licence held

Pilot Type   Mean    SD
Student      22.56   5.51
Private      23.70   5.13
Commercial   24.36   5.59

Comparison of paper and pencil test and Internet based test

To compare the two tests, the study looked at the means, variances and correlations. In essence, these should be equal across parallel tests. Because the second test retained only 39 of the original 51 questions, the comparison was carried out on those 39 questions. As the table below shows, none of the differences were statistically significant.

                 Study 1 (N = 246)   Study 2 (N = 253)   Statistic    p
Mean             23.2                23.5                T = 0.5192   0.519
Variance         25.715              28.302              F = 1.1006   0.390
r (age)          .071                .100                Z = 0.2664   0.791
r (total time)   -.082               .002                Z = 1.037    0.299
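The kinds of comparison in the table above can be sketched from summary statistics alone. The code below is an illustrative approximation, not the procedure used in the original paper: it assumes an unpooled difference-of-means statistic with a normal approximation for its p-value, a simple larger-over-smaller variance ratio, and Fisher's r-to-z transformation for comparing two independent correlations. All function names are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value for a standard normal statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def mean_difference(m1, v1, n1, m2, v2, n2):
    """Unpooled difference-of-means statistic and a normal-approximation p-value."""
    stat = (m2 - m1) / sqrt(v1 / n1 + v2 / n2)
    return stat, two_sided_p(stat)


def variance_ratio(v1, v2):
    """F ratio with the larger variance in the numerator."""
    return max(v1, v2) / min(v1, v2)


def fisher_z_difference(r1, n1, r2, n2):
    """Z statistic and p-value for the difference between two independent correlations."""
    standard_error = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (atanh(r2) - atanh(r1)) / standard_error
    return z, two_sided_p(z)


# Summary values from the table; the Ns used for each correlation are assumed
# to be the study Ns, which may not match the Ns behind the reported Z values.
print(round(variance_ratio(25.715, 28.302), 4))  # 1.1006
print(mean_difference(23.2, 25.715, 246, 23.5, 28.302, 253))
print(fisher_z_difference(-0.082, 246, 0.002, 253))
```

The variance ratio reproduces the reported F value. The other two statistics depend on rounding of the means and on how many participants supplied age and flight-time data, so they should be read as approximations.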


The results of the study into the usefulness of Situational Judgement Tests show that the test has acceptable psychometric properties, in terms of internal consistency and distribution of scores, and that its construct validity is supported by its significant correlation with higher-level training and experience.


Research approach

Exploratory research into the creation of a situational judgement test for general aviation pilots.


For study 1, a random sample of 1000 private pilots was selected from FAA records for the Eastern, Northwest and Southwest FAA mountain regions; 246 usable responses were received and used for the study. The mean age of participants was 47 (SD = 13), the mean total flying time was 750 hours (SD = 1054) and the median total flying time was 400 hours. 96% of the sample were male; 98% held private licences and the remaining 2% held commercial licences but had comparable flight hours.

For study 2, a convenience sample was recruited via an FAA-sponsored website; in total, 467 pilots completed the study over a six-month period. The mean age of participants (of the 124 who answered this question) was 45.2 (SD = 13), the mean total flight time (of the 425 who answered this question) was 591 hours (SD = 1202) and the median total flight time was 210 hours. 253 participants (making up 56%) held a private licence and were used for the final study.


Study 1 was administered in paper-and-pencil format to the selected sample. In study 2, the test was administered to voluntary participants over the Internet. Both tests were constructed from the same 51 questions, which were drawn from a review of accident causal factors and from anecdotes about critical events provided by general aviation pilots. Five content areas were identified, each requiring the pilot to make an immediate safety-related decision:

1. Weather phenomena
2. Mechanical malfunctions
3. Biological crises - e.g. sick pilot or passenger
4. Social influences - e.g. passenger requests
5. Organisation - Employer or air traffic control requests

Each question had a scenario statement that set the scene, gave all necessary information and described the situation demanding immediate action. Four alternative solutions were written, all of them plausible and realistic but differing in their degree of risk. The questions and answers were reviewed by a group of senior pilots and flight instructors, who edited the items and narrowed down the questions, leaving only those appropriate to general aviation pilots. The questions and answers were then put to another group, this time consisting of subject matter experts and senior flight instructors, who ranked the risk of each answer according to what they would recommend to a private pilot with approximately 500 total flight hours.


The study uses a situational judgement technique (SJT) measurement method to assess the judgment of general aviation pilots. A pilot judgment test was administered in both study 1 and study 2 (the test being refined and its construct validity assessed from study 1). The 51 questions for both test 1 and test 2 were placed in random order, with detailed instructions on how the participant was to answer. For each question, the participant was told to select, in order, the course of action they would use first, second, third and fourth.
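To make the response format concrete, here is one way a participant's rankings could be turned into a total score. The exact scoring rule is not spelled out in this summary. The sketch assumes one point per item whenever the participant's first choice matches the alternative the expert panel rated best, which is consistent with totals reported out of 51 (or 39) items. The function name, option labels and data are hypothetical.

```python
def sjt_score(rankings, expert_best):
    """Count the items where the first-ranked option matches the expert-preferred one.

    rankings: one list per item, options ordered from first to fourth choice.
    expert_best: the expert-preferred option for each item (assumed scoring key).
    """
    return sum(
        1 for ranked, best in zip(rankings, expert_best) if ranked[0] == best
    )


# Toy example (made up): three items, options labelled A to D.
participant = [
    ["A", "B", "C", "D"],
    ["C", "A", "B", "D"],
    ["B", "D", "A", "C"],
]
key = ["A", "B", "B"]
print(sjt_score(participant, key))  # 2
```

A scoring rule that also credited the second to fourth choices would need the full expert ranking for each item rather than only the preferred option.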

Generalization potential

The sample only covered pilots from mountainous regions of the US, and participation was largely voluntary. In addition, the data are based on self-report and may therefore be inaccurate owing to respondent forgetfulness, bias or misinterpretation of questions.

1. Hunter, D. R. (2003). Measuring General Aviation Pilot Judgment Using a Situational Judgment Technique. The International Journal of Aviation Psychology, 13(4), 373–386.

