Modules in Package SAS
Package SAS (Installed on ITL)
- ACECLUS
- Obtains approximate estimates of the pooled within-cluster covariance matrix when the clusters can be assumed multivariate normal with equal covariance matrices. Neither cluster membership nor the number of clusters needs to be known. Options: weights, missing values.
- ANOVA
- Performs univariate and multivariate analysis of variance for balanced data, including Latin-square designs, certain balanced incomplete block designs, and completely nested (hierarchical) designs. Options: numerous means comparisons, missing values.
- CANCORR
- Performs canonical correlation analysis and tests correlation hypotheses using an F approximation. Produces both standardized and unstandardized canonical coefficients, as well as correlations between the canonical variables and the original variables. Options: canonical redundancy analysis, partial canonical correlation, weights, output data sets of scores on each canonical variable and canonical coefficients.
- CANDISC
- Performs a canonical discriminant analysis, computes Mahalanobis distances, and does both univariate and multivariate one-way analyses of variance. Tests zero correlations using an F approximation. Options: weights, missing values.
- CATMOD
- Analyzes two-dimensional contingency tables by fitting linear models to functions of response frequencies using a maximum-likelihood estimation of parameters for log-linear models and the analysis of generalized logits, or using a weighted-least-squares estimation of parameters for general linear models. Options: weights, parameter testing.
- CHART
- Produces line printer vertical and horizontal bar charts (histograms), X-Y-Z block charts, pie charts, star charts, frequency and cumulative frequency plots.
- CLUSTER
- Hierarchically clusters observations by one of eleven procedures (standard linkage methods, density linkage (including kth-nearest-neighbor and two-stage), and maximum-likelihood for mixtures of spherical multivariate normal distributions). Input data can be either coordinates or distances. Options: trimming input data, missing values.
- CORR
- Computes correlation coefficients between variables, including Pearson product-moment and weighted product-moment correlations. Can also compute Spearman's rank-order correlation, Kendall's tau-b, and Hoeffding's measure of dependence. Options: some univariate descriptive statistics, missing values.
- DISCRIM
- Computes linear or quadratic discriminant functions for classifying observations into two or more groups. The distribution within each group should be approximately multivariate normal. The classification criterion can be based on either the individual within-group covariance matrices or the pooled covariance matrix. Options: homogeneity of the within-group covariance test, missing values.
- FACTOR
- Performs several types of common factor and component analysis for multivariate data, a correlation matrix, a covariance matrix, a factor pattern, or a matrix of scoring coefficients. A variety of methods are available for extracting factors, for prior communality estimation, and for rotation. Options: weights, factor scores.
- FASTCLUS
- Performs a disjoint cluster analysis by minimizing the sum of squared distances from the cluster means. User specifies the maximum number of clusters and optionally, the minimum radius of the clusters. Designed for use with large data sets. Options: weights, missing values.
- FREQ
- Builds frequency or crosstabulation tables for one-way to n-way categorical data. Can compute tests and measures of association for two-way tables and can do stratified analysis and compute statistics within as well as across strata for n-way tables. Options: missing values, weights, additional analysis.
- GLM
- Performs simple and multiple least-squares regression, analysis of variance (especially for unbalanced data), analysis of covariance, response-surface regression, polynomial regression, partial correlation, multivariate analysis of variance, and repeated measures analysis of variance. Options: weights, missing values.
- LIFEREG
- Fits parametric models to failure-time data that may be right censored. Models include exponential, Weibull, log normal, and log logistic. Parameters are estimated by maximum likelihood using a Newton-Raphson algorithm. Independent variables may be continuous or discrete.
- LIFETEST
- Computes nonparametric estimates of the survival distribution (by the product limit method or the life table method) and computes rank tests for association of the response variable with covariates for stratified data that may be right censored. Options: tests homogeneity between strata, missing values, printer plots.
- MEANS
- Produces univariate descriptive statistics for numeric variables in an entire data set or for groups of observations in the data set. Options: weights, missing values.
- NEIGHBOR
- Performs a nearest-neighbor discriminant analysis, classifying observations into groups according to either the nearest-neighbor rule or the k-nearest-neighbor rule when the classes do not have multivariate normal distributions. Proximity is determined by either Mahalanobis or Euclidean distances. Options: use of prior probabilities in the classification, missing values.
- NESTED
- Performs analysis of variance and analysis of covariance for nested random designs. Especially good for designs involving large numbers of classification levels and observations. The data set must be sorted by the classification variables (assumed to form a nested set of effects).
- NLIN
- Performs nonlinear least-squares regression using one of four iterative methods: modified Gauss-Newton, Marquardt, gradient (steepest-descent), or multivariate secant (false position, DUD). User provides starting values for the parameters, and derivatives of the model for all but the DUD method. Options: weights, bounds on the parameter estimates, objective function to be minimized, grid search for starting values.
- NPAR1WAY
- Performs nonparametric one-way analysis of variance on ranks and four rank scores (Wilcoxon, median, van der Waerden, and Savage).
- PLAN
- Generates random permutations of positive integers for experimental plans (e.g., completely random, split-plot, and hierarchical designs) given a specification of the randomized plan including number of levels of nesting. Option: seed of first permutation.
- PLOT
- Produces a Y vs. X line printer scatterplot, a superimposed plot, or a contour plot. Options: missing values, user control of plot features.
- PRINCOMP
- Performs principal component analysis on raw data, a correlation matrix, or a covariance matrix. Options: weights, missing values.
- PROBIT
- Calculates maximum-likelihood estimates of the intercept, slope and natural (threshold) response rate for biological assay data using a modified Gauss-Newton algorithm.
- RANK
- Computes ranks for one or more numeric variables across the observations of a data set. Options: group continuous data into ranges, fractional ranks, normal scores (Blom, Tukey, or van der Waerden), Savage (exponential) scores.
- REG
- Fits least-squares estimates to linear regression models. Options: weights; parameter estimates, predicted values, residuals, Studentized residuals, confidence limits, hypothesis tests; collinearity diagnostics; influence diagnostics including partial regression leverage plots; Durbin-Watson statistic; hypothesis tests involving multiple dependent variables; parameter estimates subject to linear restriction.
- RSQUARE
- Uses the R-squared statistic to select optimal subsets of independent variables for multiple regression. Can specify largest and smallest number of independent variables for a subset and number of subsets of each size. Options: weights; statistics for each model selected including Akaike's information criterion, Mallows' C-p, and others.
- RSREG
- Estimates a quadratic response surface using least-squares regression and determines critical values to optimize the response. Options: weights, lack-of-fit test, surface plotting, eigenvalues of the associated quadratic form.
- SORT
- Sorts observations by one or more variables in ascending or descending order. Options: handle duplicate records, maintain order of observations with identical values of the sorting variables, several collating sequences.
- STANDARD
- Standardizes some or all of the variables in a data set to a given mean and standard deviation. Options: weights, replace missing values with variable mean.
- STEPDISC
- Performs a stepwise discriminant analysis by forward selection, backward elimination, or stepwise selection of variables. The classes are assumed to be multivariate normal with a common covariance matrix. Options: weights, missing values.
- STEPWISE
- Provides five methods (forward selection, backward elimination, stepwise, maximum and minimum R-squared improvements) for stepwise regression. Recommends twenty independent variables. Options: weights, form of input, significance levels, Mallows' C-p statistic.
- SUMMARY
- Produces univariate descriptive statistics for numeric variables. Options: weights, missing values.
- TABULATE
- Produces hierarchical tables of descriptive statistics from compositions of classification variables, analysis variables, and statistics keywords. Options: weights, missing values.
- TIMEPLOT
- Produces a line printer plot of one or more variables over time intervals. Options: missing values, user control of plot features.
- TREE
- Prints a tree diagram from the output generated by SAS procedures CLUSTER or VARCLUS. Can also create an output data set identifying disjoint clusters at a specified level in the tree. Optional user control of plot features.
- TTEST
- Computes a t statistic for testing the hypothesis that the means of two groups of observations are equal.
- UNIVARIATE
- Produces simple descriptive statistics for numeric variables including extreme values and quantiles. Options: distribution plots, frequency table, missing values, weights, normality tests.
- VARCLUS
- Performs either disjoint or hierarchical clustering of variables by maximizing the variation accounted for by either the first principal component or the centroid component of each cluster. Options: weights, missing values.
- VARCOMP
- Provides four methods (Type I, MIVQUE0, maximum-likelihood, and restricted maximum-likelihood) for estimating variance components in a general linear model containing random effects and optionally fixed effects. Option: missing values.
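
Illustration: to make one of the simpler entries concrete, the computation described under STANDARD (rescaling a variable to a given mean and standard deviation, optionally replacing missing values) can be sketched as follows. This is Python, not SAS code, and it is only a minimal sketch under stated assumptions: the function name standardize and its parameters are hypothetical, missing values are represented by None, the sample standard deviation (n-1 divisor) is assumed, and a replaced missing value is assumed to take the target mean.

```python
from statistics import mean, stdev


def standardize(values, target_mean=0.0, target_std=1.0, replace_missing=False):
    """Rescale values so the non-missing ones have the requested mean and
    standard deviation. None marks a missing value (an assumption of this
    sketch, not SAS notation)."""
    present = [v for v in values if v is not None]
    m = mean(present)
    s = stdev(present)  # sample standard deviation, n-1 divisor (assumed)

    result = []
    for v in values:
        if v is None:
            # Assumption: a replaced missing value takes the variable's
            # mean, which after rescaling equals target_mean.
            result.append(target_mean if replace_missing else None)
        else:
            result.append(target_mean + target_std * (v - m) / s)
    return result
```

For example, standardize([2.0, 4.0, 6.0], target_mean=50, target_std=10) rescales the three values to approximately 40, 50, and 60, since the original mean is 4 and the original sample standard deviation is 2.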