Are There Groups in My Bivariate Data (Multi-Modal Distribution of the Residuals)?

Example Data

Data for two groups (n = 16 per group). The “good” variables are continuously distributed; the “bad” variables are distributed as discontinuous clumps. All four variables show reliable mean differences between the two groups, ps < .001.

ID  Group  X-good        Y-good        X-bad         Y-bad
1   1      1.706869103   1.653548065   35.4431819    35.04978001
2   1      2.994848769   2.621083418   76.74094841   75.9303925
3   1      3.871574995   3.949357264   69.72873546   69.1680618
4   1      4.849016166   4.399096261   54.87897561   54.76085723
5   1      5.735704311   5.563367331   93.72286098   93.37277626
6   1      6.041889223   6.605389671   39.37971873   39.16544205
7   1      7.743283518   7.998757631   39.15042014   39.01157035
8   1      8.01002315    8.648685928   30.97638529   30.88125096
9   1      9.955827228   9.390534795   19.10974437   19.89232355
10  1      10.80471514   10.3187547    97.85356376   97.43721588
11  1      11.70123165   11.00740773   2.241225011   2.410262939
12  1      12.9626271    12.17204018   16.53634847   16.69128277
13  1      13.81794477   13.24690849   55.43216023   55.4163467
14  1      14.09956246   14.01424398   61.28500307   61.93372736
15  1      15.10141231   15.18160767   90.48044169   90.0038368
16  1      16.78028021   16.34330535   57.02527672   56.89114624
17  2      17.81312934   17.95453546   1030.92508    1031.477291
18  2      18.70411669   18.34138039   1038.239984   1037.378745
19  2      19.90809538   19.61610263   1038.994597   1039.110803
20  2      20.95524163   20.63941779   1032.535647   1032.360714
21  2      21.9334208    21.33697043   1049.91314    1050.65413
22  2      22.59794314   22.61091149   1017.558147   1017.731457
23  2      23.97309406   23.96813104   1091.987385   1092.315969
24  2      24.2262783    24.49048101   1067.472213   1067.336126
25  2      25.22527904   25.67679616   1096.43594    1096.966764
26  2      26.02644968   26.97367096   1010.930397   1010.777961
27  2      27.63384582   27.21110565   1082.695333   1082.802688
28  2      28.47988155   28.12804475   1040.542836   1039.787307
29  2      29.73885441   29.50044641   1000.484869   1000.582004
30  2      30.09549581   30.07568797   1035.205303   1035.531397
31  2      31.54291848   31.21369724   1024.196812   1024.813376
32  2      32.93191134   32.51255305   1009.511315   1009.672211
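The normality tables below are for unstandardized residuals from regressing Y on X. As a minimal sketch, assuming a single ordinary-least-squares line fitted to all 32 cases with group membership ignored (consistent with df = 32 in those tables, but not stated in the source), the residuals and the per-group means behind the mean-difference claim above can be computed with the Python standard library. The variable and function names (X_GOOD, ols_residuals, group_means, and so on) are hypothetical.

```python
"""Pooled OLS residuals for the example data (a sketch; assumes Y is regressed on X)."""
from statistics import fmean

# Values copied from the example-data table; IDs 1-16 are group 1, IDs 17-32 are group 2.
X_GOOD = [1.706869103, 2.994848769, 3.871574995, 4.849016166, 5.735704311, 6.041889223,
          7.743283518, 8.01002315, 9.955827228, 10.80471514, 11.70123165, 12.9626271,
          13.81794477, 14.09956246, 15.10141231, 16.78028021, 17.81312934, 18.70411669,
          19.90809538, 20.95524163, 21.9334208, 22.59794314, 23.97309406, 24.2262783,
          25.22527904, 26.02644968, 27.63384582, 28.47988155, 29.73885441, 30.09549581,
          31.54291848, 32.93191134]
Y_GOOD = [1.653548065, 2.621083418, 3.949357264, 4.399096261, 5.563367331, 6.605389671,
          7.998757631, 8.648685928, 9.390534795, 10.3187547, 11.00740773, 12.17204018,
          13.24690849, 14.01424398, 15.18160767, 16.34330535, 17.95453546, 18.34138039,
          19.61610263, 20.63941779, 21.33697043, 22.61091149, 23.96813104, 24.49048101,
          25.67679616, 26.97367096, 27.21110565, 28.12804475, 29.50044641, 30.07568797,
          31.21369724, 32.51255305]
X_BAD = [35.4431819, 76.74094841, 69.72873546, 54.87897561, 93.72286098, 39.37971873,
         39.15042014, 30.97638529, 19.10974437, 97.85356376, 2.241225011, 16.53634847,
         55.43216023, 61.28500307, 90.48044169, 57.02527672, 1030.92508, 1038.239984,
         1038.994597, 1032.535647, 1049.91314, 1017.558147, 1091.987385, 1067.472213,
         1096.43594, 1010.930397, 1082.695333, 1040.542836, 1000.484869, 1035.205303,
         1024.196812, 1009.511315]
Y_BAD = [35.04978001, 75.9303925, 69.1680618, 54.76085723, 93.37277626, 39.16544205,
         39.01157035, 30.88125096, 19.89232355, 97.43721588, 2.410262939, 16.69128277,
         55.4163467, 61.93372736, 90.0038368, 56.89114624, 1031.477291, 1037.378745,
         1039.110803, 1032.360714, 1050.65413, 1017.731457, 1092.315969, 1067.336126,
         1096.966764, 1010.777961, 1082.802688, 1039.787307, 1000.582004, 1035.531397,
         1024.813376, 1009.672211]

N_PER_GROUP = 16  # n = 16 per group, as stated in the example-data description


def ols_residuals(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b, residuals)."""
    mean_x, mean_y = fmean(x), fmean(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    # Unstandardized residual = observed Y minus predicted Y.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return a, b, residuals


def group_means(values, n_per_group=N_PER_GROUP):
    """Mean of group 1 (first n_per_group cases) and of group 2 (the remaining cases)."""
    return fmean(values[:n_per_group]), fmean(values[n_per_group:])


if __name__ == "__main__":
    for label, x, y in (("good", X_GOOD, Y_GOOD), ("bad", X_BAD, Y_BAD)):
        a, b, res = ols_residuals(x, y)
        gx, gy = group_means(x), group_means(y)
        print(f"{label}: Y-hat = {a:.3f} + {b:.4f} X; "
              f"X means {gx[0]:.2f} vs {gx[1]:.2f}; Y means {gy[0]:.2f} vs {gy[1]:.2f}")
```

Fitting one line to both groups is the point of the exercise: the question is whether anything in that single set of residuals reveals that the cases come from two groups.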

Scatterplots of Raw Data

Residuals for “Good” Data

Tests of Normality

                           Kolmogorov-Smirnov(a)       Shapiro-Wilk
                           Statistic   df   Sig.       Statistic   df   Sig.
Unstandardized Residual    .141        32   .105       .947        32   .115

a  Lilliefors Significance Correction
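The Kolmogorov-Smirnov column uses the Lilliefors correction: the residuals are compared with a normal curve whose mean and SD are estimated from the residuals themselves, which makes the ordinary K-S p values too large. The sketch below computes D that way and, as an assumption of this example rather than the procedure SPSS uses, approximates the p value by Monte Carlo simulation instead of from Lilliefors' tables, so it is not guaranteed to reproduce the figures above. It does not implement Shapiro-Wilk (SciPy provides `scipy.stats.shapiro` for that). The function names are hypothetical.

```python
"""Lilliefors-style normality check: K-S D with estimated mean/SD, Monte Carlo p value."""
import random
from statistics import NormalDist, fmean, stdev


def ks_normal_statistic(values):
    """K-S distance between the sample's empirical CDF and a normal CDF whose
    mean and SD (n - 1 denominator) are estimated from the same sample."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    dist = NormalDist(fmean(values), stdev(values))
    d = 0.0
    for i, v in enumerate(sorted(values), start=1):
        cdf = dist.cdf(v)
        # The empirical CDF jumps from (i - 1)/n to i/n at the i-th ordered value.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def lilliefors_pvalue_mc(values, reps=2000, seed=0):
    """Return (D, p): p is the share of simulated normal samples of the same
    size whose D is at least as large as the observed D."""
    observed = ks_normal_statistic(values)
    rng = random.Random(seed)
    n = len(values)
    # D is unchanged by shifting and rescaling the sample (the mean and SD are
    # re-estimated each time), so simulating from N(0, 1) is enough.
    hits = sum(
        ks_normal_statistic([rng.gauss(0.0, 1.0) for _ in range(n)]) >= observed
        for _ in range(reps)
    )
    return observed, hits / reps
```

Passing a list of 32 residuals returns a (D, p) pair; with a finite number of replications the p value is itself an estimate, so it can differ slightly from run to run unless the seed is fixed.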

Residuals for “Bad” Data (Dependent Variable Distributed Discontinuously)

Tests of Normality

                           Kolmogorov-Smirnov(a)       Shapiro-Wilk
                           Statistic   df   Sig.       Statistic   df   Sig.
Unstandardized Residual    .108        32   .200(*)    .979        32   .779

*  This is a lower bound of the true significance.
a  Lilliefors Significance Correction

User-Defined Plots That Distinguish the “Good” From the “Bad” Data

Residuals as a Function of the Predictor (IV)

Residuals as a Function of the Observed Value of the DV

Residuals as a Function of the Predicted Value of the DV
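One hypothetical way to put a number on what these plots show is to measure the widest empty stretch along the horizontal axis. In the “bad” data the X values fall into two clumps (about 2 to 98 for group 1 and about 1000 to 1096 for group 2 in the table), so the predictor, the observed DV, and the predicted DV each leave a wide gap; in the “good” data the values are spread fairly evenly. The helper `largest_gap_fraction` below is an illustrative heuristic, not a standard test, and any cut-off for “too large” is an assumption.

```python
"""Gap heuristic for clumped data (hypothetical helper, not a standard test)."""

# X columns copied from the example-data table (IDs 1-32 in order).
X_GOOD = [1.706869103, 2.994848769, 3.871574995, 4.849016166, 5.735704311, 6.041889223,
          7.743283518, 8.01002315, 9.955827228, 10.80471514, 11.70123165, 12.9626271,
          13.81794477, 14.09956246, 15.10141231, 16.78028021, 17.81312934, 18.70411669,
          19.90809538, 20.95524163, 21.9334208, 22.59794314, 23.97309406, 24.2262783,
          25.22527904, 26.02644968, 27.63384582, 28.47988155, 29.73885441, 30.09549581,
          31.54291848, 32.93191134]
X_BAD = [35.4431819, 76.74094841, 69.72873546, 54.87897561, 93.72286098, 39.37971873,
         39.15042014, 30.97638529, 19.10974437, 97.85356376, 2.241225011, 16.53634847,
         55.43216023, 61.28500307, 90.48044169, 57.02527672, 1030.92508, 1038.239984,
         1038.994597, 1032.535647, 1049.91314, 1017.558147, 1091.987385, 1067.472213,
         1096.43594, 1010.930397, 1082.695333, 1040.542836, 1000.484869, 1035.205303,
         1024.196812, 1009.511315]


def largest_gap_fraction(values):
    """Widest gap between neighbouring sorted values, divided by the full range.

    Values near 0 mean the points are spread fairly evenly; values near 1 mean
    almost all of the range is one empty stretch between clumps."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)
    span = ordered[-1] - ordered[0]
    if span == 0:
        return 0.0
    widest = max(hi - lo for lo, hi in zip(ordered, ordered[1:]))
    return widest / span


if __name__ == "__main__":
    print(f"X-good: {largest_gap_fraction(X_GOOD):.3f}")
    print(f"X-bad:  {largest_gap_fraction(X_BAD):.3f}")
```

For these columns the fraction is about 0.06 for X-good and about 0.82 for X-bad. Because the predicted DV is a linear function of X with a positive slope here, the same gap fraction applies to the third plot's horizontal axis.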

Bivariate Normality