Are There Groups in My Bivariate Data (Multi-Modal Distribution of the Residuals)?
Example Data
Data for two groups (n = 16 per group). The “good” variables are continuously distributed across the pooled sample. The “bad” variables fall into two discontinuous clumps, one per group. For all four variables, the two group means differed reliably, ps < .001.
| ID | Group | X-good | Y-good | X-bad | Y-bad |
| 1 | 1 | 1.706869103 | 1.653548065 | 35.4431819 | 35.04978001 |
| 2 | 1 | 2.994848769 | 2.621083418 | 76.74094841 | 75.9303925 |
| 3 | 1 | 3.871574995 | 3.949357264 | 69.72873546 | 69.1680618 |
| 4 | 1 | 4.849016166 | 4.399096261 | 54.87897561 | 54.76085723 |
| 5 | 1 | 5.735704311 | 5.563367331 | 93.72286098 | 93.37277626 |
| 6 | 1 | 6.041889223 | 6.605389671 | 39.37971873 | 39.16544205 |
| 7 | 1 | 7.743283518 | 7.998757631 | 39.15042014 | 39.01157035 |
| 8 | 1 | 8.01002315 | 8.648685928 | 30.97638529 | 30.88125096 |
| 9 | 1 | 9.955827228 | 9.390534795 | 19.10974437 | 19.89232355 |
| 10 | 1 | 10.80471514 | 10.3187547 | 97.85356376 | 97.43721588 |
| 11 | 1 | 11.70123165 | 11.00740773 | 2.241225011 | 2.410262939 |
| 12 | 1 | 12.9626271 | 12.17204018 | 16.53634847 | 16.69128277 |
| 13 | 1 | 13.81794477 | 13.24690849 | 55.43216023 | 55.4163467 |
| 14 | 1 | 14.09956246 | 14.01424398 | 61.28500307 | 61.93372736 |
| 15 | 1 | 15.10141231 | 15.18160767 | 90.48044169 | 90.0038368 |
| 16 | 1 | 16.78028021 | 16.34330535 | 57.02527672 | 56.89114624 |
| 17 | 2 | 17.81312934 | 17.95453546 | 1030.92508 | 1031.477291 |
| 18 | 2 | 18.70411669 | 18.34138039 | 1038.239984 | 1037.378745 |
| 19 | 2 | 19.90809538 | 19.61610263 | 1038.994597 | 1039.110803 |
| 20 | 2 | 20.95524163 | 20.63941779 | 1032.535647 | 1032.360714 |
| 21 | 2 | 21.9334208 | 21.33697043 | 1049.91314 | 1050.65413 |
| 22 | 2 | 22.59794314 | 22.61091149 | 1017.558147 | 1017.731457 |
| 23 | 2 | 23.97309406 | 23.96813104 | 1091.987385 | 1092.315969 |
| 24 | 2 | 24.2262783 | 24.49048101 | 1067.472213 | 1067.336126 |
| 25 | 2 | 25.22527904 | 25.67679616 | 1096.43594 | 1096.966764 |
| 26 | 2 | 26.02644968 | 26.97367096 | 1010.930397 | 1010.777961 |
| 27 | 2 | 27.63384582 | 27.21110565 | 1082.695333 | 1082.802688 |
| 28 | 2 | 28.47988155 | 28.12804475 | 1040.542836 | 1039.787307 |
| 29 | 2 | 29.73885441 | 29.50044641 | 1000.484869 | 1000.582004 |
| 30 | 2 | 30.09549581 | 30.07568797 | 1035.205303 | 1035.531397 |
| 31 | 2 | 31.54291848 | 31.21369724 | 1024.196812 | 1024.813376 |
| 32 | 2 | 32.93191134 | 32.51255305 | 1009.511315 | 1009.672211 |
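The group mean differences reported for these variables could be checked with a two-sample t statistic. The sketch below is only an illustration: the function name `t_statistic` and the toy values are hypothetical, and the source does not say which test produced the reported p values. With equal group sizes, as here (n = 16 per group), the pooled and unpooled standard errors coincide, so the statistic itself does not depend on that choice.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(group_a, group_b):
    """Two-sample t statistic using the unpooled standard error.

    Hypothetical helper for illustration; it returns only the statistic,
    not a p value.
    """
    se = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Toy values (hypothetical, not taken from the table)
t = t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

To apply it to the example data, pass the 16 Group 1 values and the 16 Group 2 values of one variable as the two arguments.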
Scatterplots of Raw Data


Residuals for “Good” Data
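As a rough sketch of where unstandardized residuals come from, the code below fits an ordinary least-squares line and subtracts the predicted values from the observed ones. It assumes that X is the predictor (IV) and Y is the predicted variable (DV), which the source implies but does not state outright. The helper names `fit_line` and `residuals` are hypothetical, and only the first four “good” rows are used to keep the example short.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def residuals(x, y):
    """Observed y minus the value predicted by the fitted line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# IDs 1-4 of the "good" variables (assumption: X is the IV, Y is the DV)
x_good = [1.706869103, 2.994848769, 3.871574995, 4.849016166]
y_good = [1.653548065, 2.621083418, 3.949357264, 4.399096261]
res = residuals(x_good, y_good)
```

Residuals from a least-squares fit with an intercept always sum to zero, which is a quick sanity check on any implementation.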




Tests of Normality
| | Kolmogorov-Smirnov(a) | | | Shapiro-Wilk | | |
| | Statistic | df | Sig. | Statistic | df | Sig. |
| Unstandardized Residual | .141 | 32 | .105 | .947 | 32 | .115 |
a Lilliefors Significance Correction
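The Kolmogorov-Smirnov statistic in a table like this one is, as far as can be told, the largest vertical distance between the empirical distribution of the residuals and a normal curve that uses the sample's own mean and standard deviation. The sketch below computes that distance only. The function name `ks_normal_statistic` is hypothetical, the Lilliefors-corrected significance value is not computed (it needs special tables or simulation), and the sketch is not guaranteed to match the software's output digit for digit.

```python
from statistics import NormalDist, mean, stdev


def ks_normal_statistic(values):
    """Largest gap between the empirical CDF and a fitted normal CDF.

    The normal curve uses the sample mean and sample SD, which is the
    setting the Lilliefors correction is designed for.
    """
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # Compare the fitted CDF with the step just above and just below x
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


# Illustrative input: 20 evenly spaced normal quantiles (not the example data)
normal_like = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
d_normal = ks_normal_statistic(normal_like)
```

A small distance means the sample follows the fitted normal curve closely. It says nothing about whether the cases arrive in clumps along the predictor.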
Residuals for “Bad” Data (Dependent Variable Distributed Discontinuously)




Tests of Normality
| | Kolmogorov-Smirnov(a) | | | Shapiro-Wilk | | |
| | Statistic | df | Sig. | Statistic | df | Sig. |
| Unstandardized Residual | .108 | 32 | .200(*) | .979 | 32 | .779 |
* This is a lower bound of the true significance.
a Lilliefors Significance Correction
User-Defined Plots That Distinguish the “Good” From the “Bad” Data
Residuals as a Function of the Predictor (IV)
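A plot against the predictor exposes clumping because the cases pile up at separate places along the horizontal axis. One numeric companion to that visual check, offered here as an illustrative heuristic rather than the method used in this handout, is to compare the widest gap between adjacent predictor values with the full range. The function names `largest_gap` and `gap_ratio` are hypothetical. The values are the X-good and X-bad columns from the example data.

```python
def largest_gap(values):
    """Return (gap, lower, upper) for the widest gap between adjacent sorted values."""
    xs = sorted(values)
    return max((b - a, a, b) for a, b in zip(xs, xs[1:]))


def gap_ratio(values):
    """Widest adjacent gap as a fraction of the full range (between 0 and 1)."""
    xs = sorted(values)
    gap, _, _ = largest_gap(xs)
    return gap / (xs[-1] - xs[0])


x_good = [
    1.706869103, 2.994848769, 3.871574995, 4.849016166, 5.735704311,
    6.041889223, 7.743283518, 8.01002315, 9.955827228, 10.80471514,
    11.70123165, 12.9626271, 13.81794477, 14.09956246, 15.10141231,
    16.78028021, 17.81312934, 18.70411669, 19.90809538, 20.95524163,
    21.9334208, 22.59794314, 23.97309406, 24.2262783, 25.22527904,
    26.02644968, 27.63384582, 28.47988155, 29.73885441, 30.09549581,
    31.54291848, 32.93191134,
]
x_bad = [
    35.4431819, 76.74094841, 69.72873546, 54.87897561, 93.72286098,
    39.37971873, 39.15042014, 30.97638529, 19.10974437, 97.85356376,
    2.241225011, 16.53634847, 55.43216023, 61.28500307, 90.48044169,
    57.02527672, 1030.92508, 1038.239984, 1038.994597, 1032.535647,
    1049.91314, 1017.558147, 1091.987385, 1067.472213, 1096.43594,
    1010.930397, 1082.695333, 1040.542836, 1000.484869, 1035.205303,
    1024.196812, 1009.511315,
]

ratio_good = gap_ratio(x_good)  # small: values are spread evenly
ratio_bad = gap_ratio(x_bad)    # large: one gap covers most of the range
```

For X-bad, the widest gap runs from the largest Group 1 value to the smallest Group 2 value and covers most of the range, whereas no single gap dominates for X-good. Where to draw the line between “small” and “large” is a judgment call, so the plot remains the primary evidence.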


Residuals as a Function of the Predicted Variable (DV)


Residuals as a Function of the Predicted Value of the DV

Bivariate Normality
