Last modified: 2003 Nov 14
HOME

Greenland/Witte Gauss Programs: Descriptions

This page contains, verbatim, the descriptions of the Greenland/Witte GAUSS utilities, as found at http://darwin.cwru.edu/~witte/gauss/prg/index.html on 14 Nov 2003. The links are to archived versions of these utilities as of the same date, as is this link to the entire set as a .zip file. Please visit http://darwin.cwru.edu/%7Ewitte/gauss/ for possible updates.

afcc:
This procedure computes the attributable fraction and its variance from case-control data modeled by logistic regression, two ways: maximum likelihood (Greenland & Drescher, 1993); and Benichou & Gail, 1990. WARNING: ML METHOD ASSUMES THE FOLLOWING: 1) FIRST ncons COEFFICIENTS ARE INTERCEPTS. 2) INCOMING bcov MATRIX IS INVERSE-INFORMATION MATRIX, UNCORRECTED FOR CASE-CONTROL SAMPLING, as one would obtain from an ordinary logistic program (proc will correct intercept variances by subtracting 1/m1k + 1/m0k from them). 3) ASSUMES UNMATCHED SAMPLING.
afcoh:
This procedure computes the attributable fraction and its variance from cohort data modeled by logistic regression, in two ways: iafse is based on score-beta covariance formula from Cox & Hinkley, cov(mx,b) = (1,0,...,0) if intercept is first term in b, where mx is observed total no. cases. See Greenland & Drescher, Biometrics, 1983. WARNING: iafse assumes first column in x is constant! gafse does not assume this, and does not use score-beta covariance formula.
aftg:
This proc does the Robins test on br and bp.
bdiag:
This proc creates a block diagonal matrix from a matrix that is a column of blocks, which need not be square. See bdiagc.g or bdiagr.g for square blocks.
bdiagc:
This proc creates a block diagonal matrix from a matrix that is a column of square matrices.
bdiagr:
This proc creates a block diagonal matrix from a matrix that is a row of square blocks.
bootst:
This proc takes a bootstrap sample from the data matrix x using the proc rndsrep (random sampling with replacement).
categ:
This proc takes a matrix x and returns a matrix xc of the same size whose elements are the number of cutpoints in cutp that the corresponding element of x is greater than.
cdfbin:
This proc computes the exact lower tail of a binomial(p,n) distribution using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54.
cdfbinc:
This proc computes the exact upper tail of a binomial(p,n) distribution using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54.
cdfnt:
This proc computes the two-tailed area for the standard normal distribution.
cdfpois:
This proc computes the exact lower tail of a Poisson distribution using the chi-squared distribution (see, e.g., Ulm, AJE 1990;131:373-5)
cdfpoisc:
This proc computes the exact upper tail of a Poisson distribution using the chi-squared distribution (see, e.g., Brownlee)
cdftt:
This proc computes the two-tailed area for a t distribution
changept:
Test program: let v = 1 2 1 1 1 4 5 5 6; format /rdn 4,0; v'; { ic,lnt } = changept(v); ic'; lnt'; end;
cinvnorm:
This procedure finds the value that cuts off the upper p proportion of a standard normal distribution F(x), i.e., solves p = 1 - F(x) for x, using the Derenzo approximation (see Hoaglin, Am Statist 1989; 43:289). Absolute error less than .00013 for p > 10E-7. NOTE THAT GAUSS 3.2.13 & LATER HAS A EQUALLY ACCURATE PROCEDURE 1-CDFNI(x).
clrimp:
This procedure does conditional logistic regression on multiple-imputed 1:m matched case-control data, using Rubin's method. A stack of multiple-imputed covariate matrices suitable for input to this procedure may be generated from MNIMP.G; however -- MAKE SURE THAT THE ORIGINAL DATA ARE SORTED ON id, THE MATCHED-SET ID NUMBERS, BEFORE GENERATING THE IMPUTATIONS!
combin:
This proc computes the combinatorial coefficients n!./(k!.*(n-k)!)
condlr:
This program does conditional logistic regression for 1:m matched sets with variable matching ratios. WARNINGS:1) EACH MATCHED SET IS ASSUMED TO HAVE AT LEAST ONE CASE AND ONE CONTROL. 2) NO MISSING VALUES OR OFFSETS ARE ALLOWED. 3) THE INPUT VECTORS AND MATRICES ARE ASSUMED TO BE SORTED ON id, THE VECTOR OF MATCHED-SET ID NUMBERS, so that all subjects in a matched set should be contiguous in the analysis data. CONDLRM.G DOES NOT HAVE THESE RESTRICTIONS.
condlrm:
This program does conditional logistic regression for m:(n-m) matched sets with variable matching ratio.
condlrmp:
This program does conditional logistic regression for m:(n-m) matched sets with variable matching ratio, penalized by a linear constraint b = z*bs, where bs is unknown.
corrp:
This proc returns the matrix of all pairwise correlations between columns of x and columns of y. output: correlation matrix with columns of x corresponding to rows and columns of y corresponding to columns; thus, the returned matrix will have cols(x) rows and cols(y) columns.
counteq:
For each element of y, this proc counts the number of occurrences in x that equal the element. Equivalent to sumr(x'.eq y).
countge:
For each element of y, this proc counts the number of occurrences in x that are > or = the element. Equivalent to sumr(x'.ge y).
countle:
For each element of y, this proc counts the number of occurrences in x that are < or = the element. Equivalent to sumr(x'.le y).
countpt:
This proc counts the person time about each failure in failure-time data.
coxmod:
This proc fits Cox models using exponential regression; it optionally tabulates and graphs the corresponding baseline survival curve. NOTES: 1. If n subjects are observed an average of k intervals each, this proc will generate k*n row working data matrices. 2. For time-dependent covariates, a separate subject record must be created for each covariate shift for each subject, with entry time at the covariate shift. 3. Baseline hazard and survival refer to distribution of failure times when all covariates are zero. Therefore, BE SURE YOUR COVARIATES ARE CODED SO THAT ZERO IS A MEANINGFUL VALUE FOR EACH!
coxmodem:
This proc fits Cox models using the discrete-time Poisson approximation with a profile-likelihood algorithm based on the Nelson-Cox-Oakes approximation to the cumulative (discrete) hazard (p. 108 of Cox and Oakes). NOTES: 1. If n subjects are observed with k unique failure times, this proc will generate a k*n matrix, and several k*n row vectors. 2. For time-dependent covariates, a separate record must be created for each covariate shift for each subject. 3. Program does not calculate true deviances or loglikelihoods; those printed during iteration are for monitoring convergence and model comparisons only. They are not valid tests of fit. 4. Baseline hazard and survival refer to distribution of failure times when all covariates are zero. Therefore, BE SURE YOUR COVARIATES ARE CODED SO THAT ZERO IS A MEANINGFUL VALUE FOR EACH!
crosstab:
This program creates a crosstabulation matrix from a data matrix, and prints the matrix if variable names are supplied.
cumsumr:
This proc computes the cumulative sums of rows of a matrix.
datetime:
This proc prints the date and time in the form "day mo, yr, at hr:min"
dateunpk:
Test program: let d = 030709 150987 112089; dateunpk(d); end;
deleteb:
This proc approximates the delta-betas (observation-deleted betas MINUS fitted betas) for models linear in the natural parameter of a one-parameter exponential-family error distribution. for generalized-linear (in the natural parameter) regressions.
density:
This proc computes the Epanechnikov kernel density estimate for x
desc:
This proc computes some basic descriptive statistics for the distributions of the columns of x, including percentiles useful for analysis of Monte-Carlo simulation results. Will also supply histograms to accompany these statistics. See DESCRIBE.G for a simpler, faster running proc to describe actual data, including GAUSS datasets.
descmiss:
This proc computes basic descriptive statistics on a dataset with missing values supplied by user (not GAUSS codes). Missing values are counted and dropped from computations.
describe:
This proc computes basic descriptive statistics on a dataset that has no missing values.
diagm:
This proc creates a diagonal matrix with diagonal vector x.
diagnose:
This proc does studentized residual analysis for generalized-linear regessions with canonical links.
diagplot:
This proc does residual and deltabeta plots.
difrence:
This proc computes vertical (row) differences of any order m.
dim:
This proc prints and returns the dimensions of the matrix x
dltabeta:
This proc does delta-beta analysis.
ebind:
(missing)
ebrr:
This is a GAUSS procedure for empirical-Bayes or semi-Bayes estimation of correlated person-time rates, or of regression coefficients from multiplicative relative-risk models (e.g., logistic, Cox, or exponential-Poisson regression). It is based on generalizations of Morris's 1983 (JASA) formulas (see Greenland, Stat Med 1993).
elapdays:
This proc takes contiguous-element index date(s) and returns days elapsed from reference date(s) to the index date(s).
epank:
This proc computes the Epanechnikov kernel matrix for a vector of values using a fixed bandwidth (see Silverman, 1986).
erpolyp:
This procedure does polytomous ML exponential regression with likelihood penalized by a linear constraint vec(b) = z*bs, where bs is unknown.
erquick:
This proc computes ML exponential-Poisson regression coefficients - no outputs or special options - returns only b, bcov, cnv.
exact2x2:
Test program: data from Martin & Austin, Epidemiology 1991: output file = exact2x2.out reset; _midp = 1; _clevel = .90; let a = 2 6 3 8; let b = 2 4 0 4; let c = 4 2 1 1; let d = 6 0 1 1; let names = "Endo ca" whcr bmi; _rrt = 5.85; _rrt = 4.573; { ormh,lse,pv } = ormh2x2(a,b,c,d,names); { rrm,rrl,rru,plf,puf,pm } = exact2x2(a,b,c,d,names); let a = 0 0 0 0; _rrt = .4864; { ormh,lse,pv } = ormh2x2(a,b,c,d,names); _rrt = .5; { rrm,rrl,rru,plf,puf,pm } = exact2x2(a,b,c,d,names); _rrt = 2; { rrm,rrl,rru,plf,puf,pm } = exact2x2(c,d,a,b,names); end; testing numerical limitations: _oflow = 740;
exactpt:
Test program: data from Guess et al. AJE 1987, with added zero stratum: output file = exactpt.out reset; _clevel = .95; _midp = 1; _maxit = 100; let a = 1 2 1 1 2 0; let b = 0 1 0 2 0 0; let t1 = 152 131 142 158 193 200; t1 = t1/100; let t0 = 1 1 1 1 1 1; let names = "rect ca" "isoniazd" "age"; { rdmh,dse,rrmh,rse,chi } = mhrate(a,b,t1,t0,names); test upper M-H limit: _rrt = rrmh*exp(1.645*rse);
exactsmr:
Test program: format 10,6; call exactsmr(10,1,"SMR"); end;
expit:
This proc does the logistic transform on x
expitg:
This proc returns the generalized logistic transform of x given s, a matrix of nonzero proportions each row of which sums to less than 1 (useful for multinomial logistic modelling). NOTE: x and s' must be conformable for x.*s'
expreg:
This procedure does ML exponential-Poisson regression
expregp:
This procedure does ML exponential Poisson regression with likelihood penalized by a linear constraint b = z*bs, where bs is unknown.
exprisk:
Test program (see also EXPREG.TST): output file = exprisk.out reset; let y = 8 5 22 16; let n = 106 120 98 85; _clevel = .9; let x[4,2] = 1 0 0 0 1 1 0 1; t2 = 1000000; let bnames = tolbutam agegt55; let yname = death; call exprisk(x,y,n,0,1,0,bnames,yname); call expriskp(x,y,n,0,0,t2,0,1,0,bnames,yname,0); call exprisk(x~prodr(x),y,n,0,1,0,bnames|"tol*age",yname); call expriskp(x~prodr(x),y,n,0,0,t2,0,1,0,bnames|"TOL*AGE",yname,0); end;
expriskp:
This procedure does ML exponential risk regression with likelihood penalized by a linear constraint b = z*bs, where bs is unknown.
expspec:
This proc computes covariate-specific risk or rate ratios and differences from exponential (log-linear) regression output.
expstand:
This proc computes standardized rate ratios and differences from exponential-regression output.
firstdif:
This proc returns the matrix of first differences for the columns of x. Note that the returned matrix has one less row than x.
freq:
This proc does frequency counts on the columns of x. Column proportions will also be printed if only 1 count per column is specified,
freqmean:
This proc does frequency counts & group means on the columns of x.
glml:
This proc does Maximum-Likelihood estimation via Newton-Raphson for generalized-linear models. Separate record must be entered for each observed regressor-outcome category NOTE CAREFULLY: ll, dll1, and dll2 1) MUST be defined to handle vector inputs. 2) MUST be passed with the character "&" in front of their names. ALSO: 3) The first two moments of the y distribution should exist. 4) Each record must represent an independent observation.
groupdat:
indc:
This proc takes a matrix x and returns a matrix xind with the same number of rows, each column an indicator for whether the x value in that row of col k equals the corresponding value in codes. NINDC.G generates names for the columns of xind.
indc2:
This proc takes a matrix x and returns a matrix xind with the same number of rows, each column an indicator for whether the x value in that row of col k equals a particular value seen in that column. The indicators for a column are ordered to correspond to ascending values in the column. NINDC2.G generates names for the columns of xind.
index:
Test program: let listn = age sex "sex" state race income;
indinc:
This proc takes a matrix x and returns an incremental-indicator matrix xind with the same number of rows as x, and with a block of columns corresponding to each column of x. Each column in block k of xind indicates whether the x[.,k] value in that row is greater than or equal to a unique observed value of x[.,k]. The indicators for a column are ordered to correspond to ascending values in the column. NINDC2.G will generate names for the columns of xind.
inter:
Test program: x = seqa(1,1,12); let names = a; x = reshape(x,2,6); let names = abcdef bcdef cdef def ef f; xint = inter(x); namint = ninter(names); format /rdn 10,0; x; xint; $namint'; end;
invnorm:
Test: invnorm(.05|.95);
invt:
Test program: let p = .025 .5 .95 .975 .99 .995 .995 .975; let df = 8 4 3 3 3 4 3 2; t = invt(p,df); p~df~t~(1-cdftc(t,df)); end;
kapmeier:
This proc computes the Kaplan-Meier survival probability for the times t without ordering the times.
kapmu:
This proc computes the Kaplan-Meier survival probability for the times t without ordering the times.
kernmean:
This proc computes kernel (running weighted) means using the Epanechnikov kernel function.
liklim:
This proc searches for LR limits.
liklimp:
This proc searches for penalized LR limits. CURRENTLY WORKS ONLY FOR REGRESSORS WITH INDEPENDENT OR FLAT PRIOR Will not work for those regressors penalized by a linear constraint b = z*bs, where bs is unknown.
logit:
This proc takes the logit of a number p between 0 and 1
logitg:
This proc returns the generalized logit transform of the matrix p of nonzero proportions. Each row of p must sum to less than 1. (useful for multinomial logistic modelling)
logitr:
This proc calculates risks, risk ratios, and risk differences and their confidence limits from logistic coefficients and their covariances, for the indicated regressor vectors.
logrank:
This proc does unweighted (1 df), Prentice-Wilcoxon (1 df), and Robins (2 df) logrank trend tests on input data, and returns the test scores and variances.
logreg:
This procedure does ML logistic regression
logregp:
This procedure does ML logistic regression with likelihood penalized by a linear constraint b = z*bs, where bs is unknown.
logregs:
This procedure does kernel-smoothed logistic regression via local likelihood
lr2stage:
This procedure computes the Breslow-Cain pseudo-maximum likelihood estimates for logistic regression analysis of two-stage case-control data. Note that a constant term is required and automatically added to the model. See Breslow, N.E. and Cain, K.C. (1988), Biometrika, v. 75, p. 11-20.
lrimp:
This procedure does unconditional logistic regression on multiple-imputed data using Rubin's method. A stack of multiple-imputed covariate matrices suitable for input to this procedure (denoted by ximp below) may be generated from MNIMP.G.
lrordco:
This procedure does ordinal ML logistic regression using the cumulative-odds model.
lrordcr:
This procedure does ordinal ML logistic regression using the continuation-ratio model, via nested ordinary logistic regression.
lrordst:
This procedure does ordinal ML logistic regression using the stereotype model, via constrained polytomous logistic fitting. With fixed scores, this is equivalent to the adjacent-category logit model. WARNING: With estimated scores option (s = 0), convergence can be very slow and ordinary standard error estimates are invalid. To address this problem, a naive bootstrap option is provided (set nboot > 30).
lrpm:
This procedure does ML logistic regression with likelihood penalized by multiple linear constraints of the form b = z*bs, where bs is unknown.
lrpmp:
This procedure does polytomous ML logistic regression with likelihood penalized by MULTIPLE linear constraints of the form vec(b) = z*bs, where bs is unknown and the constraints are independent.
lrpoly:
This procedure does polytomous ML logistic regression
lrpolyp:
This procedure does polytomous ML logistic regression with likelihood penalized by a linear constraint vec(b) = z*bs, where bs is unknown.
lrpquick:
This proc does penalized-likelihood logistic regression - no outputs or special options - returns only b,bcov,bs,vs,t2,cnv.
lrpseudo:
This procedure does logistic regression via the Flanders-Greenland pseudo-likelihood (Stat Med 1991) for nested and two-stage studies.
lrquick:
This proc computes ML binomial-logistic regression coefficients - no outputs or special options - returns only b, bcov, cnv.
lrquickl:
This proc computes ML binomial-logistic regression coefficients - no outputs or special options - returns only b, bcov, llik, cnv.
lrspec:
This proc computes covariate-specific proportion ratios and differences from logistic-regression output.
lrstand:
This proc computes standardized proportion ratios and differences from logistic-regression output.
lrtpq:
This proc computes truncated-Poisson logistic regression coefficients - no outputs or special options - returns only b, bcov, cnv.
matcol:
Test program: let dat[6,3] = 0 0 3 4 1 1 0 0 3 0 0 3 4 1 1 1 0 3; let rep = 1 1 2 1 8 1; gdat = matcol(dat~rep,1|2|3); format /rds 5,0; dat~rep; gdat; output file = temp.out reset; load dat[1633,10] = \gauss\dat\gedi.asc; t0 = date; gdat = matcol(dat,0); "Total run time matcol: ";; etstr(ethsec(t0,date)); sumc(gdat[.,cols(gdat)]); format /rds 5,0; gdat; end;
max:
This proc returns the elementwise maximum of x and y
maxr:
This proc returns the maximum in each row of x
mclimits:
This procedure computes 95% limits, medians, and two-sided p values from a Monte-Carlo distribution.
meancw:
This proc returns the w-weighted column means of the input matrix x. w may have either 1 column, for equal weights across columns of x, or cols(x) columns, for differing weights across columns of x.
meancwts:
This proc returns the w-weighted column means of the input matrix x. w may have either 1 column, for equal weights across columns of x, or cols(x) columns, for differing weights across columns of x.
medianc:
This procedure finds the medians of the columns of x. Runs in about 76% the time of median(x).
metanal:
This proc does basic unstratfied meta-analysis - see METAREG.G for meta-regression.
metanalc:
This proc does basic unstratfied meta-analysis for a continuous outcome - see METAREG.G for meta-regression.
metareg:
This program does log-linear weighted relative-risk regression, both fixed and random intercept ("random effects"), customized for meta-analyses of relative risks.
mexpand:
Test program: x = seqa(1,1,4); m = rev(x); x = x~(x+4); mexpand(x,m); end;
mhclr:
This procedure fits conditional logistic models by the estimating-equation (generalized Mantel-Haenszel) method of Liang (Biometrics, June 1987, 289-299). Calls: mlogreg.g, abbreviated logistic regression proc.
mhrate:
This proc does Mantel-Haenszel analysis of person-time rates.
mhrisk:
This proc does Mantel-Haenszel analysis of average risks (incidence proportions) in cohort data.
mhtrend:
This proc computes the Mantel-Haenszel trend statistic for 2xJxK arrays.
middle:
This proc finds the middle of the three values in each row of x~y~z, where x,y, & z are vectors. Runs in under 7% the time of median(x).
min:
This proc returns the elementwise minimum of x and y
minr:
This proc returns the minimum in each row of x
misclass:
This proc does sensitivity analysis of the impact of misclassification of columns (within rows) on the Mantel-Haenszel odds ratio for K 2x2 tables, allowing for possible differential errors across rows.
mnimp:
This proc imputes values for those missing in each column of a matrix xm by randomly sampling values from a fitted multivariate normal regression model for xm, using any complete variables xc as the regressors. To provide reasonably fast computing times, the missing-data means are fit by a noniterative least squares method rather than by maximum likelihood.
mpsumc:
This proc partially sums the columns of matrix x according to the sequence given in m.
mscore:
This proc computes the conditional logistic deviance, score, and information at b for matched-set case-control sum differences xs.
negpart:
This proc returns the negative part of the elements of the matrix x.
negpower:
test program: negpower(-(2|8|4),0|(1/3)|(-1/2));
nindc:
Test program:
nindc2:
This proc takes the names "names" of columns of a matrix x and returns a vector iname of names for the columns of the indicator matrix for x, naming them names1,names2,etc., where names gets truncated at the seventh character. Companion to INDC2.G.
ninter:
This proc creates names for interaction (product) terms for regressions. Companion to INTER.G.
ninter2:
This proc creates names for interaction (product) terms for regressions for two separate sets of regressors cross-multiplied, i.e., for x1*~x2.
obstime:
This proc returns the time observed in each interval on each subject, where
ormh2x2:
This proc does Mantel-Haenszel OR analysis of a series of 2-by-2 tables, provided there are no zero cells in the crude table.
ormhrxc:
This proc does Mantel-Haenszel and exact odds-ratio analysis of a series of r-by-c tables, treating the first row and column as the reference levels (reference levels may be changed by setting globals; see below)
orst2x2:
This proc does standardized OR analysis of a series of 2-by-2 tables, using the total population distribution (as imputed from the cases) as standard, provided there are no zero cells in the strata. Not recommended if there are many small cells within strata.
orstrxc:
This proc does standardized odds-ratio analysis of series of r-by-c tables, standardized to the total population distribution (as imputed from cases), treating the first row and column as the reference levels (reference levels may be changed by setting globals; see below)
pbinf:
This proc computes the exact probability of a binomial(p,n) outcome k using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54)
pbinl:
Test program: format 9,4; t = 10000; n = 4*ones(3*t,1); let k = 0 2 4; k = k.*.ones(t,1); let p = 0 .5 1; p = ones(t,1).*.p; t0 = date; call pbinl(k,p,n); "Total run time pbinl: ";; etstr(ethsec(t0,date)); t0 = date; call pbinf(k,p,n); "Total run time pbinf: ";; etstr(ethsec(t0,date)); k~p~n~pbinl(k,p,n)~pbinf(k,p,n);
pctile:
Test program: x = 100*rndu(11,2); c = pctile(x,3|4); format /rds 9,3; c; sortc(x[.,1],1)~sortc(x[.,2],1); end;
pearson:
This procedure computes the Pearson chi-squared for a 2-way table.
permuter:
Test program: x = seqa(.1,.1,20); format /rdn 6,1;
phyperg:
Test program:
phypnull:
This proc computes the null hypergeometric probabilities of the 2x2 tables that are the rows of t:
pospart:
This proc returns the positive part of the elements of x.
powbin:
Test (should yield .95,.95): powbin(.25,.4,.01,530|357,265|357)';
ppois:
Test program: ppois(0,1); ppois(1,0); ppois(0,0); end;
prodr:
This proc computes the column vector of the products across rows of x
quadsoln:
This proc solves the quadratic equation a*x^2 + b*x + c = 0. Returns the roots (which may be complex) in a two-column matrix (one row for each equation).
rangec:
This proc returns a column vector r containing the ranges of each column of the input matrix x
rangechk:
This proc checks that range spanned by each column j of x is within the range of the corresponding column of r (or the single column of r).
ratetab:
This proc crosstabulates the ratios sumc(y)'/sumc(d) within subcategories of the variables in x.
readme:
THE *.G PROCEDURES FROM S. GREENLAND 13 Feb 1998 This is a set of over 150 GAUSS 3.2 procedures for statisical analysis and stat programming. The procedures range from one-line language extensions (such as MAX.G, MIN.G, & SUMR.G) that simplify coding, to full programs for nonlinear regression models, including a few (such as stereotype logistic regression, LRORDST.G; logistic regression for two-stage and nested studies, LR2STAGE.G and LRPSEUDO.G; and penalized-likelihood regression, EXPREGP.G, EXPRISKP.G, LOGREGP.G, LRPM.G, LRPOLYP.G, and LRPMP.G) that are currently not available in standard packages. EACH PROCEDURE FILE BEGINS WITH A PROCEDURE DESCRIPTION. READ IT TO FIND OUT HOW TO USE THE PROCEDURE. Read the procedure code to see how it works. This procedure set is experimental and is under continuous development. Copies or updates may be obtained by sending an MS-DOS formatted diskette in a STAMPED SELF-ADDRESSED DISK MAILER, with a cover letter making your request and giving your title & address, to me: Sander Greenland 22333 Swenson Dr. Topanga, CA 90290 (310) 455-1197 voice, (310) 455-1428 fax Incomplete requests will be ignored. If you use these procedures for more than a year, an update is strongly recommended. If you use or examine them enough, you will undoubtedly discover bugs, stat errors, and programming inefficiencies in the procedures. I would greatly appreciate being notified of anything you find, and would welcome suggestions for improving the code or set. In coding these programs, I have tried to strike a balance between transparency and efficiency, but in some cases may be far from both. Because the regression procedures are partly for instructional purposes, however, I have used only basic nonlinear algorithms. REQUIREMENTS: You must have a licensed copy of GAUSS 3.x and run the procedures from a GAUSS program or the GAUSS interpreter. This means that you must know how to read and manipulate your data and call procedures in GAUSS. A math coprocessor is required. You may encounter problems if your GAUSS version is before 3.2 and your CPU is a x86SL or x86SLC, as found in some laptops and a few rare desktops. INSTALLATION: Currently, you have at least two options for mounting the set: Option 1: Create a subdirectory of your GAUSS directory called (for example) PRG. Copy the entire set into that subdirectory. Then use a text editor to modify the GAUSS configuration program GAUSSI.CFG in the GAUSS directory as follows: Change the line src_path = $(GAUSSDIR)\src to src_path = $(GAUSSDIR)\src;$(GAUSSDIR)\prg YOU MUST DO THIS EVERY TIME YOU INSTALL GAUSS OR A GAUSS UPDATE. Option 2: Load the set into the SRC subdirectory. BUG IN GAUSS 3.2 COMPILER: There is a bug in the Gauss compiler which causes complilation to terminate with a "Wrong number of returns" error when calling certain *.G procedures under certain conditions, especially DIM.G, INVNORM.G, and SPLINE.G. If this should happen, simply append the cited procedure to the end of the calling program. WARNINGS: Although the procedures are simple ASCII text files, practice safe file transfer protocol: virus scan the disk before installing. DISCLAIMER: This procedure set is supplied with absolutely no warranties expressed or implied. Sander Greenland accepts no responsibility for any damages arising from its use.
reganova:
This proc produces a matrix, a, which converts a coefficient vector beta wtih regression-coded indicators (j-1 variables coded 0,1 for j categories) to a coefficent vector alpha with ANOVA dummy variables (j variables coded -1,1 for j categories); also supplied is a matrix, b, which converts alpha back to beta. a and b are conditional inverses of one another, i.e., a*b*a = a and b*a*b = b, but are not Moore-Penrose inverses of one another (a*b is not symmetric). NOTES: 1. Indicator groups MUST be contiguous in beta! 2. Indicator groups MUST form the terminal segment of beta! 3. beta and alpha MUST have intercept in first place!
regcurve:
This proc computes and (if reqested) graphs the fitted values and their confidence limits for the given regressor values.
restrict:
This proc restricts (truncates) a vector x by lb below and ub above. lb and ub may be scalar or vector of same dimension as x.
rlsmooth:
This proc computes the smoothed running-weighted linear regression of y on x.
rndbern:
This procedure generates a matrix of Bernoulli variates with probabilities p
rndbin:
This procedure generates a vector y of binomial variates with probabilities p and totals n
rndchi:
This proc generates chi-squared variates by adding gamma and Gaussian variates.
rnddisc:
This program generates discrete variates with distributions (1-sumr(p))~p over the range 0 to cols(p)
rndgami:
This program generates gamma variates for integral values of the shape parameter and unit values of the scale parameter by summing exponential variates. Not recommended for large values of a. See Devroye, 1986, p. 405. NOTE: To generate a gamma(a + 1/2,1) variate, add a chi-squared(1)/2 variate to a gamma(a,1) variate, or use a chi-squared(2a+1)/2 variate.
rndmn:
Test program (should print back a number near r):
rndmult2:
This program generates a multinomial variate with distribution p and total n. Faster than rndmultn.g when n > 55.
rndmultn:
This procedure generates a multinomial variate with distribution p and total m. Faster than rndmult2 when n < 55.
rndpois:
This procedure generates a vector y of Poisson variates with means mu -- truncation is done at far right tail -- efficient only if all values of mu are no too large (under 10)
rndsrep:
Test program: t0 = date; i = 0; t = 400; p = 0; r = 400; rt = r*t; do until i ge t; i = i + 1; p = p + rows(unique(rndsrep(r,r),1))/rt; endo; "Total run time: ";; etstr(ethsec(t0,date)); format /rdn 7,4; "Average proportion selected:";; p; "(stochastically converges to ";; 1-exp(-1);; " as t and r increase)"; end;
roc:
test program: let y = 1 1 1 1 1 0 0 0 0 0; let py = .7 .3 .5 .5 .4 .5 .4 .6 .2 .1; _ni = 10; call roc(y,py,"y",1); end; load dat[130,8] = \gauss\prg\testdat\rosm3.dat; y = dat[.,1:4]; py = dat[.,8 5:7]; let names = g c2 c5 c11; i = 2; _ni = 60; call roc(y[.,i],py[.,1],names[i],0); end;
rree:
This procedure fits multiplicatve risk-ratio and odds-ratio models to matched sets using the Mantel-Haenszel estimating equations of Greenland (Applied Statistics,1994), with Gauss-Newton fitting.
rreport:
This proc generates report of results of RR regression. NOTE: Returned P-values & confidence limits are from normal distribution unless df is negative, in which case a t distribution with -df degrees of freedom is used.
scoreint:
This procedure checks all interactions among covariates with indices indint.
scoretst:
This proc computes the score test for columns of x with coefficients constrained to zero in a generalized linear model linear in the natural parameter, such as binomial-logistic and Poisson-exponential models.
seleltc:
Test program: let s[3,2] = 1 2 2 3 2 1; x = reshape(seqa(0,1,9),3,3); format /rdn 10,0; x; s; seleltc(reshape(seqa(0,1,9),3,3),s); end;
sign:
This proc returns -1 for negative arguments, 1 for positive arguments, 0 for zero arguments.
sortonc:
Test program: n = 20; x = ceil(10*rndu(2*n,1)); v = permuter(unique(x,1)); v = v[1:5]|100|v[6:10]; xs = sortonc(x,1,v); format /rds 5,0; x[1:n]~xs[1:n]~x[n+1:2*n]~xs[n+1:2*n]; v'; end;
spline:
This proc creates regression-spline variables (excluding constant).
ssbin:
Test (should yield 530,357): ssbin(.25,.4,.01,.95,.5|1)';
stepfun:
Test program: b = seqa(0,1,10); h = 10*rndu(9,2); { x,y } = stepfun(b,h); library pgraph; xy(x,y); end;
subst:
Test program: format /rdn 7,0; v = seqa(1,1,3); let s[3,2] = 1 2 2 3 3 4; subst(-ones(3,4),s[.,.],v); end;
sumall:
This procedure sums all elements of a matrix
sumcomb:
Test program: let x[5,2] = 1 2 3 4 5 6 7 8 9 10 ; x; print; xc = sumcomb(x,2); xc; end;
sumr:
This proc sums the rows of x
tracemat:
This procedure computes the trace of a matrix
vecomb:
Test program: format /rdn 6,0; vecomb(seqa(1,1,5),3); end;
wls:
This procedure does weighted-least squares regression with an information-theory covariance estimate (w assumed to be inverse covariance matrix or vector of inverse variances for y). NOTE: OLS estimates can be obtained using w = 1, in which case bcov will be premultiplied by rss/df, the residual mean square, and a t distribution will be used for p-values & confidence limits.
wlsd:
(missing)
wlsquick:
This proc computes the WLS coefficients and inverse sum-of-covariate-cross-products matrix from x,y,w.
ypred:
Test program: let x = -1 0 1; let y = -3 1 2; let w = 1 2 1; let rep = 20 10 20; rep = 1; const = 1; x = x~(x^2); { b,bcov,rss,rsq,df } = wls(x,y,w,rep,const,"x"|"x^2","y"); xf = 2*sortc(rndu(50,1),1) - 1; xf = xf~(xf^2); _pdate = 0; r = 1.96; call ypred(xf,b,bcov,const,r,0,2,"x","y"); end;