Best-subset analysis with size-correction: SPSS 12-13 script

 

Pavel Klimov©

 

Description. This VBA script generates all combinations, or a chosen subset of combinations, of the independent variables and runs either canonical variates analysis (CVA, also called discriminant function analysis) or logistic regression on each combination. The output is saved in the working directory and can be analyzed further in any spreadsheet application. You can modify this script for your own needs.

Installation.

1. Open SPSS ver. 12 or above.

2. Go to menu File, select Script.

3. Paste the content of this file.

4. Save the script.

You can test the script using this data file (Right-click: download link to disk)

 

Use of the script

 

1) Name your independent variables var00001, var00002, ..., var00010, ... (the SPSS default names)

2) Name your dependent variable as "depend" (no quotation marks)

3) Define the following variables:

Nvar  - Number of independent variables

LNTr  - Do logarithmic (base e) transformation? (True/False)

ExternValid - Perform external validation? (True/False)

SubSetMin - lower limit on subset size; enter 1 if you do not want to restrict the subset size

SubSetMax - upper limit on subset size; enter the number of your independent variables (equal to Nvar) if you do not want to restrict the subset size

4) If you have a large number of independent variables (>12), SPSS may run out of memory and terminate your analysis. To avoid this, the analysis can be divided into several smaller analyses, each performing a limited number of iterations (about 8000 on my computer). Define the following variables:

Prt=True activates this option; Prt=False turns it off

StartRange=1 - the iteration number (from 1 to n) at which the analysis starts*

EndRange=8000 - the iteration number after which the analysis stops and writes results to disk*

 

* These settings will perform 8000 analyses. To conduct the next 8000 analyses, set the variables again: StartRange=8001 and EndRange=16000
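
As a rough illustration of the numbers involved, the Python sketch below (it is not part of the SPSS script) enumerates variable subsets between SubSetMin and SubSetMax and splits the iterations into StartRange/EndRange batches of 8000. The function names and the enumeration order (by subset size, then in variable order) are assumptions made for this example; the script's own ordering may differ.

```python
from itertools import combinations, islice
from math import comb


def variable_names(nvar):
    """SPSS default names: var00001, var00002, ..."""
    return [f"var{i:05d}" for i in range(1, nvar + 1)]


def count_subsets(nvar, subset_min, subset_max):
    """Number of subsets whose size lies in [subset_min, subset_max]."""
    return sum(comb(nvar, k) for k in range(subset_min, subset_max + 1))


def iter_subsets(nvar, subset_min, subset_max):
    """Yield subsets by increasing size (assumed order, see lead-in)."""
    names = variable_names(nvar)
    for k in range(subset_min, subset_max + 1):
        yield from combinations(names, k)


def batch_ranges(total, batch_size=8000):
    """1-based, inclusive (StartRange, EndRange) pairs covering 1..total."""
    return [
        (start, min(start + batch_size - 1, total))
        for start in range(1, total + 1, batch_size)
    ]


def subsets_in_range(nvar, subset_min, subset_max, start_range, end_range):
    """Subsets number start_range..end_range (1-based, inclusive)."""
    subsets = iter_subsets(nvar, subset_min, subset_max)
    return list(islice(subsets, start_range - 1, end_range))


# With 14 variables and no size restriction there are 2**14 - 1 subsets.
total = count_subsets(14, 1, 14)
print(total)                # 16383
print(batch_ranges(total))  # [(1, 8000), (8001, 16000), (16001, 16383)]
```

With 14 variables, three runs are needed; the last one covers only the remaining 383 iterations.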

 

CVA

Define the following variables:

LR=False (tells the script to run CVA instead of Logistic regression)

If ExternValid=True, you must also define another variable: SelectSet

By default, SelectSet=vbCrLf & "/SELECT=val(0)", where "0" is the code for your analysis subset and "val" is the name of the variable that defines the internal and external datasets (it must be created in your data matrix)

 

Logistic regression

Define the following variables:

LR=True (tells the script to run Logistic regression instead of CVA)

If ExternValid=True, you must also define another variable: SelectSet

By default, SelectSet=vbCrLf & "/SELECT = val EQ 0", where "0" is the code for your analysis subset and "val" is the name of the variable that defines the internal and external datasets (it must be created in your data matrix)
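
The two default SelectSet values differ only in how the selection is written for each procedure. The Python sketch below (not part of the SPSS script) builds the same strings from the LR and ExternValid settings. The function name is hypothetical, and returning an empty string when ExternValid is False is an assumption; the script may handle that case differently.

```python
VB_CRLF = "\r\n"  # Python equivalent of VBA's vbCrLf


def select_set(lr, extern_valid, var="val", code=0):
    """Build the SelectSet fragment described above.

    lr           -- True for logistic regression, False for CVA
    extern_valid -- True if external validation is requested
    var, code    -- selection variable and the code of the analysis subset
    """
    if not extern_valid:
        # Assumption: no selection clause without external validation.
        return ""
    if lr:
        # Form given above for logistic regression.
        return f"{VB_CRLF}/SELECT = {var} EQ {code}"
    # Form given above for CVA.
    return f"{VB_CRLF}/SELECT={var}({code})"


print(repr(select_set(lr=False, extern_valid=True)))  # '\r\n/SELECT=val(0)'
print(repr(select_set(lr=True, extern_valid=True)))   # '\r\n/SELECT = val EQ 0'
```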

 

Output processing

The following VBA script processes your output, leaving only the variable names and the hit ratio value.

Installation:

1. Open MS Word

2. In the menu, select Tools-Macro-Macros, give the macro a name, and press "Create".

3. Paste the content of this file.

Use:

1. Open the SPSS output file as text.

2. Run the script: in the menu, select Tools-Macro-Macros and select the name of the macro you created during installation.

 

 

An outdated page that generates command syntax for datasets with a small number of independent variables can be found here.