Last modified: 2003 Nov 14
HOME
Greenland/Witte Gauss Programs: Descriptions
This page contains, verbatim, the descriptions of the Greenland/Witte GAUSS utilities,
as found at http://darwin.cwru.edu/~witte/gauss/prg/index.html on 14 Nov 2003.
The links are to archived versions of these utilities as of the same date,
as is this link to the entire set as a .zip file.
Please visit http://darwin.cwru.edu/%7Ewitte/gauss/ for possible updates.
- afcc:
-
This procedure computes the attributable fraction and its variance from
case-control data modeled by logistic regression, two ways:
maximum likelihood (Greenland & Drescher, 1993); and Benichou & Gail, 1990.
WARNING: ML METHOD ASSUMES THE FOLLOWING:
1) FIRST ncons COEFFICIENTS ARE INTERCEPTS.
2) INCOMING bcov MATRIX IS INVERSE-INFORMATION MATRIX,
UNCORRECTED FOR CASE-CONTROL SAMPLING, as one would obtain
from an ordinary logistic program (proc will correct intercept
variances by subtracting 1/m1k + 1/m0k from them).
3) ASSUMES UNMATCHED SAMPLING.
- afcoh:
-
This procedure computes the attributable fraction and its variance from
cohort data modeled by logistic regression, in two ways:
iafse is based on score-beta covariance formula from Cox & Hinkley,
cov(mx,b) = (1,0,...,0) if intercept is first term in b,
where mx is observed total no. cases.
See Greenland & Drescher, Biometrics, 1983.
WARNING: iafse assumes first column in x is constant!
gafse does not assume this, and does not use score-beta covariance formula.
- aftg:
-
This proc does the Robins test on br and bp.
- bdiag:
-
This proc creates a block diagonal matrix from a matrix that is a column of
blocks, which need not be square. See bdiagc.g or bdiagr.g for square blocks.
- bdiagc:
-
This proc creates a block diagonal matrix
from a matrix that is a column of square matrices.
- bdiagr:
-
This proc creates a block diagonal matrix from a matrix
that is a row of square blocks.
- bootst:
-
This proc takes a bootstrap sample from the data matrix x using the proc
rndsrep (random sampling with replacement).
- categ:
-
This proc takes a matrix x and returns a matrix xc of the same size whose
elements are the number of cutpoints in cutp that the corresponding element
of x is greater than.
- cdfbin:
-
This proc computes the exact lower tail of a binomial(p,n) distribution
using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54.
- cdfbinc:
-
This proc computes the exact upper tail of a binomial(p,n) distribution
using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54.
- cdfnt:
-
This proc computes the two-tailed area for the standard normal distribution.
- cdfpois:
-
This proc computes the exact lower tail of a Poisson distribution
using the chi-squared distribution (see, e.g., Ulm, AJE 1990;131:373-5)
- cdfpoisc:
-
This proc computes the exact upper tail of a Poisson distribution
using the chi-squared distribution (see, e.g., Brownlee)
- cdftt:
-
This proc computes the two-tailed area for a t distribution
- changept:
-
Test program:
let v = 1 2 1 1 1 4 5 5 6;
format /rdn 4,0; v'; { ic,lnt } = changept(v); ic'; lnt';
end;
- cinvnorm:
-
This procedure finds the value that cuts off the upper p proportion of a
standard normal distribution F(x), i.e., solves p = 1 - F(x) for x, using
the Derenzo approximation (see Hoaglin, Am Statist 1989; 43:289).
Absolute error less than .00013 for p > 10E-7.
NOTE THAT GAUSS 3.2.13 & LATER HAS A EQUALLY ACCURATE PROCEDURE
1-CDFNI(x).
- clrimp:
-
This procedure does conditional logistic regression on multiple-imputed
1:m matched case-control data, using Rubin's method.
A stack of multiple-imputed covariate matrices suitable for input to this
procedure may be generated from MNIMP.G; however --
MAKE SURE THAT THE ORIGINAL DATA ARE SORTED ON id,
THE MATCHED-SET ID NUMBERS, BEFORE GENERATING THE IMPUTATIONS!
- combin:
-
This proc computes the combinatorial coefficients n!./(k!.*(n-k)!)
- condlr:
-
This program does conditional logistic regression for 1:m matched sets
with variable matching ratios.
WARNINGS:1) EACH MATCHED SET IS ASSUMED TO HAVE AT LEAST ONE CASE AND
ONE CONTROL.
2) NO MISSING VALUES OR OFFSETS ARE ALLOWED.
3) THE INPUT VECTORS AND MATRICES ARE ASSUMED TO BE SORTED ON id,
THE VECTOR OF MATCHED-SET ID NUMBERS, so that all subjects
in a matched set should be contiguous in the analysis data.
CONDLRM.G DOES NOT HAVE THESE RESTRICTIONS.
- condlrm:
-
This program does conditional logistic regression for m:(n-m) matched sets
with variable matching ratio.
- condlrmp:
-
This program does conditional logistic regression for m:(n-m) matched
sets with variable matching ratio, penalized by a linear constraint
b = z*bs, where bs is unknown.
- corrp:
-
This proc returns the matrix of all pairwise correlations between columns
of x and columns of y.
output: correlation matrix with columns of x corresponding to rows
and columns of y corresponding to columns; thus, the returned
matrix will have cols(x) rows and cols(y) columns.
- counteq:
-
For each element of y, this proc counts the number of occurrences in x
that equal the element. Equivalent to sumr(x'.eq y).
- countge:
-
For each element of y, this proc counts the number of occurrences in x
that are > or = the element. Equivalent to sumr(x'.ge y).
- countle:
-
For each element of y, this proc counts the number of occurrences in x
that are < or = the element. Equivalent to sumr(x'.le y).
- countpt:
-
This proc counts the person time about each failure in failure-time data.
- coxmod:
-
This proc fits Cox models using exponential regression; it optionally
tabulates and graphs the corresponding baseline survival curve.
NOTES: 1. If n subjects are observed an average of k intervals each, this
proc will generate k*n row working data matrices.
2. For time-dependent covariates, a separate subject record must be
created for each covariate shift for each subject, with entry
time at the covariate shift.
3. Baseline hazard and survival refer to distribution of failure
times when all covariates are zero. Therefore, BE SURE YOUR
COVARIATES ARE CODED SO THAT ZERO IS A MEANINGFUL VALUE FOR EACH!
- coxmodem:
-
This proc fits Cox models using the discrete-time Poisson approximation
with a profile-likelihood algorithm based on the Nelson-Cox-Oakes
approximation to the cumulative (discrete) hazard (p. 108 of Cox and Oakes).
NOTES: 1. If n subjects are observed with k unique failure times, this
proc will generate a k*n matrix, and several k*n row vectors.
2. For time-dependent covariates, a separate record must be
created for each covariate shift for each subject.
3. Program does not calculate true deviances or loglikelihoods;
those printed during iteration are for monitoring convergence
and model comparisons only. They are not valid tests of fit.
4. Baseline hazard and survival refer to distribution of failure
times when all covariates are zero. Therefore, BE SURE YOUR
COVARIATES ARE CODED SO THAT ZERO IS A MEANINGFUL VALUE FOR EACH!
- crosstab:
-
This program creates a crosstabulation matrix from a data matrix, and
prints the matrix if variable names are supplied.
- cumsumr:
-
This proc computes the cumulative sums of rows of a matrix.
- datetime:
-
This proc prints the date and time in the form "day mo, yr, at hr:min"
- dateunpk:
-
Test program: let d = 030709 150987 112089; dateunpk(d); end;
- deleteb:
-
This proc approximates the delta-betas (observation-deleted betas
MINUS fitted betas) for models linear in the natural parameter of a
one-parameter exponential-family error distribution.
for generalized-linear (in the natural parameter) regressions.
- density:
-
This proc computes the Epanechnikov kernel density estimate for x
- desc:
-
This proc computes some basic descriptive statistics for the
distributions of the columns of x, including percentiles useful for
analysis of Monte-Carlo simulation results. Will also supply histograms
to accompany these statistics. See DESCRIBE.G for a simpler, faster
running proc to describe actual data, including GAUSS datasets.
- descmiss:
-
This proc computes basic descriptive statistics on a dataset
with missing values supplied by user (not GAUSS codes).
Missing values are counted and dropped from computations.
- describe:
-
This proc computes basic descriptive statistics on a dataset
that has no missing values.
- diagm:
-
This proc creates a diagonal matrix with diagonal vector x.
- diagnose:
-
This proc does studentized residual analysis for generalized-linear
regessions with canonical links.
- diagplot:
-
This proc does residual and deltabeta plots.
- difrence:
-
This proc computes vertical (row) differences of any order m.
- dim:
-
This proc prints and returns the dimensions of the matrix x
- dltabeta:
-
This proc does delta-beta analysis.
- ebind:
-
(missing)
- ebrr:
-
This is a GAUSS procedure for empirical-Bayes or semi-Bayes estimation
of correlated person-time rates, or of regression coefficients from
multiplicative relative-risk models (e.g., logistic, Cox, or
exponential-Poisson regression). It is based on generalizations of
Morris's 1983 (JASA) formulas (see Greenland, Stat Med 1993).
- elapdays:
-
This proc takes contiguous-element index date(s) and returns
days elapsed from reference date(s) to the index date(s).
- epank:
-
This proc computes the Epanechnikov kernel matrix for a vector of values
using a fixed bandwidth (see Silverman, 1986).
- erpolyp:
-
This procedure does polytomous ML exponential regression with likelihood
penalized by a linear constraint vec(b) = z*bs, where bs is unknown.
- erquick:
-
This proc computes ML exponential-Poisson regression coefficients
- no outputs or special options - returns only b, bcov, cnv.
- exact2x2:
-
Test program: data from Martin & Austin, Epidemiology 1991:
output file = exact2x2.out reset; _midp = 1; _clevel = .90;
let a = 2 6 3 8; let b = 2 4 0 4; let c = 4 2 1 1; let d = 6 0 1 1;
let names = "Endo ca" whcr bmi; _rrt = 5.85; _rrt = 4.573;
{ ormh,lse,pv } = ormh2x2(a,b,c,d,names);
{ rrm,rrl,rru,plf,puf,pm } = exact2x2(a,b,c,d,names);
let a = 0 0 0 0; _rrt = .4864;
{ ormh,lse,pv } = ormh2x2(a,b,c,d,names);
_rrt = .5; { rrm,rrl,rru,plf,puf,pm } = exact2x2(a,b,c,d,names);
_rrt = 2; { rrm,rrl,rru,plf,puf,pm } = exact2x2(c,d,a,b,names);
end;
testing numerical limitations: _oflow = 740;
- exactpt:
-
Test program: data from Guess et al. AJE 1987, with added zero stratum:
output file = exactpt.out reset; _clevel = .95; _midp = 1; _maxit = 100;
let a = 1 2 1 1 2 0; let b = 0 1 0 2 0 0;
let t1 = 152 131 142 158 193 200; t1 = t1/100; let t0 = 1 1 1 1 1 1;
let names = "rect ca" "isoniazd" "age";
{ rdmh,dse,rrmh,rse,chi } = mhrate(a,b,t1,t0,names);
test upper M-H limit: _rrt = rrmh*exp(1.645*rse);
- exactsmr:
-
Test program: format 10,6; call exactsmr(10,1,"SMR"); end;
- expit:
-
This proc does the logistic transform on x
- expitg:
-
This proc returns the generalized logistic transform of x given s,
a matrix of nonzero proportions each row of which sums to less than 1
(useful for multinomial logistic modelling).
NOTE: x and s' must be conformable for x.*s'
- expreg:
-
This procedure does ML exponential-Poisson regression
- expregp:
-
This procedure does ML exponential Poisson regression with likelihood
penalized by a linear constraint b = z*bs, where bs is unknown.
- exprisk:
-
Test program (see also EXPREG.TST): output file = exprisk.out reset;
let y = 8 5 22 16; let n = 106 120 98 85; _clevel = .9;
let x[4,2] = 1 0 0 0 1 1 0 1; t2 = 1000000; let bnames = tolbutam agegt55;
let yname = death; call exprisk(x,y,n,0,1,0,bnames,yname);
call expriskp(x,y,n,0,0,t2,0,1,0,bnames,yname,0);
call exprisk(x~prodr(x),y,n,0,1,0,bnames|"tol*age",yname);
call expriskp(x~prodr(x),y,n,0,0,t2,0,1,0,bnames|"TOL*AGE",yname,0); end;
- expriskp:
-
This procedure does ML exponential risk regression with likelihood penalized
by a linear constraint b = z*bs, where bs is unknown.
- expspec:
-
This proc computes covariate-specific risk or rate ratios and differences
from exponential (log-linear) regression output.
- expstand:
-
This proc computes standardized rate ratios and differences
from exponential-regression output.
- firstdif:
-
This proc returns the matrix of first differences for the columns
of x. Note that the returned matrix has one less row than x.
- freq:
-
This proc does frequency counts on the columns of x. Column
proportions will also be printed if only 1 count per column is specified,
- freqmean:
-
This proc does frequency counts & group means on the columns of x.
- glml:
-
This proc does Maximum-Likelihood estimation via Newton-Raphson for
generalized-linear models. Separate record must be entered for each
observed regressor-outcome category
NOTE CAREFULLY: ll, dll1, and dll2
1) MUST be defined to handle vector inputs.
2) MUST be passed with the character "&" in front of their names.
ALSO: 3) The first two moments of the y distribution should exist.
4) Each record must represent an independent observation.
- groupdat:
-
- indc:
-
This proc takes a matrix x and returns a matrix
xind with the same number of rows, each column an indicator for whether
the x value in that row of col k equals the corresponding value in codes.
NINDC.G generates names for the columns of xind.
- indc2:
-
This proc takes a matrix x and returns a matrix
xind with the same number of rows, each column an indicator for whether
the x value in that row of col k equals a particular value seen in that
column. The indicators for a column are ordered to correspond to
ascending values in the column.
NINDC2.G generates names for the columns of xind.
- index:
-
Test program: let listn = age sex "sex" state race income;
- indinc:
-
This proc takes a matrix x and returns an incremental-indicator matrix
xind with the same number of rows as x, and with a block of columns
corresponding to each column of x. Each column in block k of xind indicates
whether the x[.,k] value in that row is greater than or equal to a unique
observed value of x[.,k]. The indicators for a column are ordered to
correspond to ascending values in the column.
NINDC2.G will generate names for the columns of xind.
- inter:
-
Test program:
x = seqa(1,1,12); let names = a;
x = reshape(x,2,6); let names = abcdef bcdef cdef def ef f;
xint = inter(x); namint = ninter(names);
format /rdn 10,0; x; xint; $namint';
end;
- invnorm:
-
Test: invnorm(.05|.95);
- invt:
-
Test program:
let p = .025 .5 .95 .975 .99 .995 .995 .975;
let df = 8 4 3 3 3 4 3 2;
t = invt(p,df); p~df~t~(1-cdftc(t,df)); end;
- kapmeier:
-
This proc computes the Kaplan-Meier survival probability for the times t
without ordering the times.
- kapmu:
-
This proc computes the Kaplan-Meier survival probability for the times t
without ordering the times.
- kernmean:
-
This proc computes kernel (running weighted) means using the Epanechnikov
kernel function.
- liklim:
-
This proc searches for LR limits.
- liklimp:
-
This proc searches for penalized LR limits.
CURRENTLY WORKS ONLY FOR REGRESSORS WITH INDEPENDENT OR FLAT PRIOR
Will not work for those regressors penalized by a linear constraint
b = z*bs, where bs is unknown.
- logit:
-
This proc takes the logit of a number p between 0 and 1
- logitg:
-
This proc returns the generalized logit transform of the matrix p
of nonzero proportions. Each row of p must sum to less than 1.
(useful for multinomial logistic modelling)
- logitr:
-
This proc calculates risks, risk ratios, and risk differences and
their confidence limits from logistic coefficients and their covariances,
for the indicated regressor vectors.
- logrank:
-
This proc does unweighted (1 df), Prentice-Wilcoxon (1 df),
and Robins (2 df) logrank trend tests on input data,
and returns the test scores and variances.
- logreg:
-
This procedure does ML logistic regression
- logregp:
-
This procedure does ML logistic regression with likelihood penalized
by a linear constraint b = z*bs, where bs is unknown.
- logregs:
-
This procedure does kernel-smoothed logistic regression via local likelihood
- lr2stage:
-
This procedure computes the Breslow-Cain pseudo-maximum
likelihood estimates for logistic regression analysis of two-stage
case-control data. Note that a constant term is required and
automatically added to the model. See Breslow, N.E. and Cain, K.C.
(1988), Biometrika, v. 75, p. 11-20.
- lrimp:
-
This procedure does unconditional logistic regression
on multiple-imputed data using Rubin's method.
A stack of multiple-imputed covariate matrices suitable for input to this
procedure (denoted by ximp below) may be generated from MNIMP.G.
- lrordco:
-
This procedure does ordinal ML logistic regression using the
cumulative-odds model.
- lrordcr:
-
This procedure does ordinal ML logistic regression using the
continuation-ratio model, via nested ordinary logistic regression.
- lrordst:
-
This procedure does ordinal ML logistic regression using the
stereotype model, via constrained polytomous logistic fitting.
With fixed scores, this is equivalent to the adjacent-category logit model.
WARNING: With estimated scores option (s = 0), convergence can be very slow
and ordinary standard error estimates are invalid. To address this
problem, a naive bootstrap option is provided (set nboot > 30).
- lrpm:
-
This procedure does ML logistic regression with likelihood penalized
by multiple linear constraints of the form b = z*bs, where bs is unknown.
- lrpmp:
-
This procedure does polytomous ML logistic regression with likelihood
penalized by MULTIPLE linear constraints of the form vec(b) = z*bs,
where bs is unknown and the constraints are independent.
- lrpoly:
-
This procedure does polytomous ML logistic regression
- lrpolyp:
-
This procedure does polytomous ML logistic regression with likelihood
penalized by a linear constraint vec(b) = z*bs, where bs is unknown.
- lrpquick:
-
This proc does penalized-likelihood logistic regression
- no outputs or special options - returns only b,bcov,bs,vs,t2,cnv.
- lrpseudo:
-
This procedure does logistic regression via the Flanders-Greenland
pseudo-likelihood (Stat Med 1991) for nested and two-stage studies.
- lrquick:
-
This proc computes ML binomial-logistic regression coefficients
- no outputs or special options - returns only b, bcov, cnv.
- lrquickl:
-
This proc computes ML binomial-logistic regression coefficients
- no outputs or special options - returns only b, bcov, llik, cnv.
- lrspec:
-
This proc computes covariate-specific proportion ratios and differences
from logistic-regression output.
- lrstand:
-
This proc computes standardized proportion ratios and differences
from logistic-regression output.
- lrtpq:
-
This proc computes truncated-Poisson logistic regression coefficients
- no outputs or special options - returns only b, bcov, cnv.
- matcol:
-
Test program: let dat[6,3] = 0 0 3 4 1 1 0 0 3 0 0 3 4 1 1 1 0 3;
let rep = 1 1 2 1 8 1; gdat = matcol(dat~rep,1|2|3);
format /rds 5,0; dat~rep; gdat;
output file = temp.out reset; load dat[1633,10] = \gauss\dat\gedi.asc;
t0 = date; gdat = matcol(dat,0);
"Total run time matcol: ";; etstr(ethsec(t0,date)); sumc(gdat[.,cols(gdat)]);
format /rds 5,0; gdat;
end;
- max:
-
This proc returns the elementwise maximum of x and y
- maxr:
-
This proc returns the maximum in each row of x
- mclimits:
-
This procedure computes 95% limits, medians, and two-sided p values
from a Monte-Carlo distribution.
- meancw:
-
This proc returns the w-weighted column means of the input matrix x.
w may have either 1 column, for equal weights across columns of x,
or cols(x) columns, for differing weights across columns of x.
- meancwts:
-
This proc returns the w-weighted column means of the input matrix x.
w may have either 1 column, for equal weights across columns of x,
or cols(x) columns, for differing weights across columns of x.
- medianc:
-
This procedure finds the medians of the columns of x.
Runs in about 76% the time of median(x).
- metanal:
-
This proc does basic unstratfied meta-analysis -
see METAREG.G for meta-regression.
- metanalc:
-
This proc does basic unstratfied meta-analysis for a continuous outcome -
see METAREG.G for meta-regression.
- metareg:
-
This program does log-linear weighted relative-risk regression,
both fixed and random intercept ("random effects"),
customized for meta-analyses of relative risks.
- mexpand:
-
Test program:
x = seqa(1,1,4); m = rev(x); x = x~(x+4); mexpand(x,m); end;
- mhclr:
-
This procedure fits conditional logistic models by the estimating-equation
(generalized Mantel-Haenszel) method of Liang (Biometrics, June 1987, 289-299).
Calls: mlogreg.g, abbreviated logistic regression proc.
- mhrate:
-
This proc does Mantel-Haenszel analysis of person-time rates.
- mhrisk:
-
This proc does Mantel-Haenszel analysis of average risks
(incidence proportions) in cohort data.
- mhtrend:
-
This proc computes the Mantel-Haenszel trend statistic for 2xJxK arrays.
- middle:
-
This proc finds the middle of the three values in each row of x~y~z,
where x,y, & z are vectors. Runs in under 7% the time of median(x).
- min:
-
This proc returns the elementwise minimum of x and y
- minr:
-
This proc returns the minimum in each row of x
- misclass:
-
This proc does sensitivity analysis of the impact of misclassification
of columns (within rows) on the Mantel-Haenszel odds ratio for
K 2x2 tables, allowing for possible differential errors across rows.
- mnimp:
-
This proc imputes values for those missing in each column of a matrix xm
by randomly sampling values from a fitted multivariate normal regression
model for xm, using any complete variables xc as the regressors. To provide
reasonably fast computing times, the missing-data means are fit by a
noniterative least squares method rather than by maximum likelihood.
- mpsumc:
-
This proc partially sums the columns of matrix x
according to the sequence given in m.
- mscore:
-
This proc computes the conditional logistic deviance, score,
and information at b for matched-set case-control sum differences xs.
- negpart:
-
This proc returns the negative part of the elements of the matrix x.
- negpower:
-
test program: negpower(-(2|8|4),0|(1/3)|(-1/2));
- nindc:
-
Test program:
- nindc2:
-
This proc takes the names "names" of columns of a matrix x and returns
a vector iname of names for the columns of the indicator matrix for
x, naming them names1,names2,etc., where names gets truncated at the
seventh character. Companion to INDC2.G.
- ninter:
-
This proc creates names for interaction (product) terms for regressions.
Companion to INTER.G.
- ninter2:
-
This proc creates names for interaction (product) terms for regressions
for two separate sets of regressors cross-multiplied, i.e., for x1*~x2.
- obstime:
-
This proc returns the time observed in each interval on each subject, where
- ormh2x2:
-
This proc does Mantel-Haenszel OR analysis of a series of 2-by-2 tables,
provided there are no zero cells in the crude table.
- ormhrxc:
-
This proc does Mantel-Haenszel and exact
odds-ratio analysis of a series of r-by-c tables,
treating the first row and column as the reference levels
(reference levels may be changed by setting globals; see below)
- orst2x2:
-
This proc does standardized OR analysis of a series of 2-by-2 tables,
using the total population distribution (as imputed from the cases)
as standard, provided there are no zero cells in the strata.
Not recommended if there are many small cells within strata.
- orstrxc:
-
This proc does standardized odds-ratio analysis of series of r-by-c tables,
standardized to the total population distribution (as imputed from cases),
treating the first row and column as the reference levels
(reference levels may be changed by setting globals; see below)
- pbinf:
-
This proc computes the exact probability of a binomial(p,n) outcome k
using the F distribution (see, e.g., Ling, Amer Statist 1992;46:53-54)
- pbinl:
-
Test program: format 9,4; t = 10000; n = 4*ones(3*t,1);
let k = 0 2 4; k = k.*.ones(t,1);
let p = 0 .5 1; p = ones(t,1).*.p;
t0 = date; call pbinl(k,p,n);
"Total run time pbinl: ";; etstr(ethsec(t0,date));
t0 = date; call pbinf(k,p,n);
"Total run time pbinf: ";; etstr(ethsec(t0,date));
k~p~n~pbinl(k,p,n)~pbinf(k,p,n);
- pctile:
-
Test program: x = 100*rndu(11,2); c = pctile(x,3|4);
format /rds 9,3; c; sortc(x[.,1],1)~sortc(x[.,2],1); end;
- pearson:
-
This procedure computes the Pearson chi-squared for a 2-way table.
- permuter:
-
Test program: x = seqa(.1,.1,20); format /rdn 6,1;
- phyperg:
-
Test program:
- phypnull:
-
This proc computes the null hypergeometric probabilities of
the 2x2 tables that are the rows of t:
- pospart:
-
This proc returns the positive part of the elements of x.
- powbin:
-
Test (should yield .95,.95): powbin(.25,.4,.01,530|357,265|357)';
- ppois:
-
Test program: ppois(0,1); ppois(1,0); ppois(0,0); end;
- prodr:
-
This proc computes the column vector of the products across rows of x
- quadsoln:
-
This proc solves the quadratic equation a*x^2 + b*x + c = 0.
Returns the roots (which may be complex) in a two-column matrix
(one row for each equation).
- rangec:
-
This proc returns a column vector r
containing the ranges of each column of the input matrix x
- rangechk:
-
This proc checks that range spanned by each column j of x is within
the range of the corresponding column of r (or the single column of r).
- ratetab:
-
This proc crosstabulates the ratios sumc(y)'/sumc(d) within subcategories of
the variables in x.
- readme:
-
THE *.G PROCEDURES FROM S. GREENLAND
13 Feb 1998
This is a set of over 150 GAUSS 3.2 procedures for statisical
analysis and stat programming. The procedures range from one-line
language extensions (such as MAX.G, MIN.G, & SUMR.G) that simplify
coding, to full programs for nonlinear regression models, including
a few (such as stereotype logistic regression, LRORDST.G; logistic
regression for two-stage and nested studies, LR2STAGE.G and LRPSEUDO.G;
and penalized-likelihood regression, EXPREGP.G, EXPRISKP.G, LOGREGP.G,
LRPM.G, LRPOLYP.G, and LRPMP.G) that are currently not available in
standard packages.
EACH PROCEDURE FILE BEGINS WITH A PROCEDURE DESCRIPTION. READ IT
TO FIND OUT HOW TO USE THE PROCEDURE. Read the procedure code to see
how it works.
This procedure set is experimental and is under continuous
development. Copies or updates may be obtained by sending an MS-DOS
formatted diskette in a STAMPED SELF-ADDRESSED DISK MAILER, with
a cover letter making your request and giving your title & address,
to me: Sander Greenland
22333 Swenson Dr.
Topanga, CA 90290
(310) 455-1197 voice, (310) 455-1428 fax
Incomplete requests will be ignored. If you use these procedures
for more than a year, an update is strongly recommended.
If you use or examine them enough, you will undoubtedly discover
bugs, stat errors, and programming inefficiencies in the procedures.
I would greatly appreciate being notified of anything you find, and
would welcome suggestions for improving the code or set. In coding
these programs, I have tried to strike a balance between transparency
and efficiency, but in some cases may be far from both. Because the
regression procedures are partly for instructional purposes, however,
I have used only basic nonlinear algorithms.
REQUIREMENTS:
You must have a licensed copy of GAUSS 3.x and run the procedures
from a GAUSS program or the GAUSS interpreter. This means that you must
know how to read and manipulate your data and call procedures in GAUSS.
A math coprocessor is required. You may encounter problems if your
GAUSS version is before 3.2 and your CPU is a x86SL or x86SLC, as found
in some laptops and a few rare desktops.
INSTALLATION:
Currently, you have at least two options for mounting the set:
Option 1: Create a subdirectory of your GAUSS directory called
(for example) PRG. Copy the entire set into that subdirectory. Then
use a text editor to modify the GAUSS configuration program
GAUSSI.CFG in the GAUSS directory as follows:
Change the line
src_path = $(GAUSSDIR)\src
to
src_path = $(GAUSSDIR)\src;$(GAUSSDIR)\prg
YOU MUST DO THIS EVERY TIME YOU INSTALL GAUSS OR A GAUSS UPDATE.
Option 2: Load the set into the SRC subdirectory.
BUG IN GAUSS 3.2 COMPILER:
There is a bug in the Gauss compiler which causes complilation to
terminate with a "Wrong number of returns" error when calling certain
*.G procedures under certain conditions, especially DIM.G, INVNORM.G,
and SPLINE.G. If this should happen, simply append the cited procedure
to the end of the calling program.
WARNINGS:
Although the procedures are simple ASCII text files, practice
safe file transfer protocol: virus scan the disk before installing.
DISCLAIMER:
This procedure set is supplied with absolutely no warranties
expressed or implied. Sander Greenland accepts no responsibility
for any damages arising from its use.
- reganova:
-
This proc produces a matrix, a, which converts a coefficient vector beta
wtih regression-coded indicators (j-1 variables coded 0,1 for j categories)
to a coefficent vector alpha with ANOVA dummy variables (j variables coded
-1,1 for j categories); also supplied is a matrix, b, which converts alpha
back to beta.
a and b are conditional inverses of one another, i.e.,
a*b*a = a and b*a*b = b,
but are not Moore-Penrose inverses of one another (a*b is not symmetric).
NOTES: 1. Indicator groups MUST be contiguous in beta!
2. Indicator groups MUST form the terminal segment of beta!
3. beta and alpha MUST have intercept in first place!
- regcurve:
-
This proc computes and (if reqested) graphs the fitted values and their
confidence limits for the given regressor values.
- restrict:
-
This proc restricts (truncates) a vector x by lb below and ub above.
lb and ub may be scalar or vector of same dimension as x.
- rlsmooth:
-
This proc computes the smoothed running-weighted linear regression of y on x.
- rndbern:
-
This procedure generates a matrix of Bernoulli
variates with probabilities p
- rndbin:
-
This procedure generates a vector y of binomial
variates with probabilities p and totals n
- rndchi:
-
This proc generates chi-squared variates by adding gamma and
Gaussian variates.
- rnddisc:
-
This program generates discrete variates
with distributions (1-sumr(p))~p over the range 0 to cols(p)
- rndgami:
-
This program generates gamma variates for integral values of the shape
parameter and unit values of the scale parameter by summing exponential
variates. Not recommended for large values of a. See Devroye, 1986, p. 405.
NOTE: To generate a gamma(a + 1/2,1) variate, add a chi-squared(1)/2 variate
to a gamma(a,1) variate, or use a chi-squared(2a+1)/2 variate.
- rndmn:
-
Test program (should print back a number near r):
- rndmult2:
-
This program generates a multinomial variate
with distribution p and total n.
Faster than rndmultn.g when n > 55.
- rndmultn:
-
This procedure generates a multinomial variate
with distribution p and total m.
Faster than rndmult2 when n < 55.
- rndpois:
-
This procedure generates a vector y of Poisson variates with means mu
-- truncation is done at far right tail
-- efficient only if all values of mu are no too large (under 10)
- rndsrep:
-
Test program:
t0 = date;
i = 0; t = 400; p = 0; r = 400; rt = r*t;
do until i ge t; i = i + 1;
p = p + rows(unique(rndsrep(r,r),1))/rt;
endo;
"Total run time: ";; etstr(ethsec(t0,date));
format /rdn 7,4; "Average proportion selected:";; p;
"(stochastically converges to ";; 1-exp(-1);; " as t and r increase)";
end;
- roc:
-
test program: let y = 1 1 1 1 1 0 0 0 0 0;
let py = .7 .3 .5 .5 .4 .5 .4 .6 .2 .1;
_ni = 10; call roc(y,py,"y",1); end;
load dat[130,8] = \gauss\prg\testdat\rosm3.dat;
y = dat[.,1:4]; py = dat[.,8 5:7]; let names = g c2 c5 c11;
i = 2; _ni = 60; call roc(y[.,i],py[.,1],names[i],0); end;
- rree:
-
This procedure fits multiplicatve risk-ratio and odds-ratio models to
matched sets using the Mantel-Haenszel estimating equations of
Greenland (Applied Statistics,1994), with Gauss-Newton fitting.
- rreport:
-
This proc generates report of results of RR regression.
NOTE: Returned P-values & confidence limits are from normal distribution
unless df is negative, in which case a t distribution with -df
degrees of freedom is used.
- scoreint:
-
This procedure checks all interactions among covariates with indices
indint.
- scoretst:
-
This proc computes the score test for columns of x with coefficients
constrained to zero in a generalized linear model linear in the natural
parameter, such as binomial-logistic and Poisson-exponential models.
- seleltc:
-
Test program:
let s[3,2] = 1 2 2 3 2 1;
x = reshape(seqa(0,1,9),3,3); format /rdn 10,0; x; s;
seleltc(reshape(seqa(0,1,9),3,3),s);
end;
- sign:
-
This proc returns -1 for negative arguments, 1 for positive arguments,
0 for zero arguments.
- sortonc:
-
Test program: n = 20; x = ceil(10*rndu(2*n,1));
v = permuter(unique(x,1)); v = v[1:5]|100|v[6:10];
xs = sortonc(x,1,v); format /rds 5,0;
x[1:n]~xs[1:n]~x[n+1:2*n]~xs[n+1:2*n]; v'; end;
- spline:
-
This proc creates regression-spline variables (excluding constant).
- ssbin:
-
Test (should yield 530,357): ssbin(.25,.4,.01,.95,.5|1)';
- stepfun:
-
Test program: b = seqa(0,1,10); h = 10*rndu(9,2);
{ x,y } = stepfun(b,h); library pgraph; xy(x,y); end;
- subst:
-
Test program:
format /rdn 7,0; v = seqa(1,1,3);
let s[3,2] = 1 2 2 3 3 4;
subst(-ones(3,4),s[.,.],v);
end;
- sumall:
-
This procedure sums all elements of a matrix
- sumcomb:
-
Test program:
let x[5,2] = 1 2 3 4 5 6 7 8 9 10 ; x;
print; xc = sumcomb(x,2); xc;
end;
- sumr:
-
This proc sums the rows of x
- tracemat:
-
This procedure computes the trace of a matrix
- vecomb:
-
Test program: format /rdn 6,0; vecomb(seqa(1,1,5),3); end;
- wls:
-
This procedure does weighted-least squares regression with an
information-theory covariance estimate (w assumed to be inverse
covariance matrix or vector of inverse variances for y).
NOTE: OLS estimates can be obtained using w = 1, in which case bcov
will be premultiplied by rss/df, the residual mean square, and
a t distribution will be used for p-values & confidence limits.
- wlsd:
- (missing)
- wlsquick:
-
This proc computes the WLS coefficients and
inverse sum-of-covariate-cross-products matrix from x,y,w.
- ypred:
-
Test program: let x = -1 0 1; let y = -3 1 2; let w = 1 2 1;
let rep = 20 10 20; rep = 1; const = 1; x = x~(x^2);
{ b,bcov,rss,rsq,df } = wls(x,y,w,rep,const,"x"|"x^2","y");
xf = 2*sortc(rndu(50,1),1) - 1; xf = xf~(xf^2); _pdate = 0;
r = 1.96; call ypred(xf,b,bcov,const,r,0,2,"x","y");
end;