
CHEMSPREAD
CHEMSPREAD is a spreadsheet-type statistical analysis program designed
for correlation of numerical data associated with chemical structures.
When applied to the results of calculations using MOPAC, special output
files (type *.pro) produced by the latter program are passed through
the auxiliary program SPREAD to produce an input file for CHEMSPREAD
(file type *.prs).
-
DISPLAY FORMAT. The program as supplied is capable of handling up to 200
properties (columns) associated with up to 200 structures (rows), but
can easily be re-dimensioned to accommodate larger or smaller problems.
At any one time up to seven data columns (plus a column of compound
identifiers) and up to 41 rows (plus column headings) may be displayed.
Scrolling up/down and left/right reveals all of the data entries.
-
PROGRAM CONTROL.
-
This is achieved using a permanent menu column together with pop-up
menus.
-
BASIC FACILITIES.
-
Input and output of data sets in three formats.
-
EXCEL format
-
ARTHUR format
-
SPREAD format
-
Facilities to add single rows in SPREAD format
-
Provision to allocate a column as a "non-numeric" classifier
(category)
-
Move row
-
Move column
-
Delete columns
-
Delete rows
-
Delete all marked columns and/or rows
-
Sort data entries (on basis of the values in a column)
-
Display of data
-
Original data
-
Normalised data
-
Mean-centred data
-
Auto-scaled data
-
Display column statistics (maximum, minimum, range, mean, standard
deviation, second moment, third moment, fourth moment, skewness,
kurtosis).
-
Display column correlations
-
Display column covariance
-
Display row-row Cartesian distances
-
2D plots of pairs of columns of data
-
3D plots of triplets of columns of data
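
The column statistics and the mean-centred and auto-scaled displays
listed above can be illustrated with a short Python sketch. It is not
CHEMSPREAD's own code: the function names are hypothetical, and the
moments are assumed to be central moments divided by n (the population
form), which the program may or may not use.

```python
import math

def column_statistics(values):
    """Summary statistics for one column (population form, an assumption)."""
    n = len(values)
    mean = sum(values) / n
    # Central moments about the mean, each divided by n.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "maximum": max(values),
        "minimum": min(values),
        "range": max(values) - min(values),
        "mean": mean,
        "std_dev": math.sqrt(m2),
        "second_moment": m2,
        "third_moment": m3,
        "fourth_moment": m4,
        # Standardised third and fourth moments; zero for a constant column.
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }

def mean_centre(values):
    """Subtract the column mean from every entry."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def autoscale(values):
    """Mean-centre, then divide by the standard deviation (unit variance)."""
    centred = mean_centre(values)
    std = math.sqrt(sum(c * c for c in centred) / len(centred))
    return [c / std for c in centred] if std > 0 else centred
```

With this convention a normally distributed column has a kurtosis close
to 3; some packages report the excess kurtosis (this value minus 3)
instead.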
-
DATA PRUNING
-
The data may be analysed to achieve a reduction in the number of
columns by several processes:-
-
Eliminate columns with high kurtosis
-
Eliminate columns with zero range
-
Eliminate columns very highly correlated to other columns
-
Progressive elimination of columns which are highly correlated to other
columns
-
Columns are initially "marked for deletion" and appear in the
display "crossed out"
-
Subsequently marked columns are deleted
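
The mark-then-delete procedure described above might be sketched in
Python as follows. This is an illustration only: the function names,
the 0.95 correlation threshold, and the rule of keeping the earlier
column of a correlated pair are all assumptions, not documented
CHEMSPREAD behaviour.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def mark_columns(columns, threshold=0.95):
    """Return the names of columns marked for deletion.

    columns maps a column name to its list of values.
    threshold is a hypothetical cut-off on |r|.
    """
    names = list(columns)
    # Step 1: mark columns with zero range.
    marked = {n for n in names if max(columns[n]) == min(columns[n])}
    # Step 2: progressively mark the later column of each highly
    # correlated pair, skipping anything already marked.
    for i, a in enumerate(names):
        if a in marked:
            continue
        for b in names[i + 1:]:
            if b in marked:
                continue
            if abs(pearson(columns[a], columns[b])) > threshold:
                marked.add(b)
    return marked

def delete_marked(columns, marked):
    """Drop the marked columns, keeping the original column order."""
    return {n: v for n, v in columns.items() if n not in marked}
```

Keeping marking and deletion as separate steps mirrors the two-stage
behaviour in the list above, so that marked columns can be inspected
before they are removed.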
-
PRINCIPAL COMPONENT ANALYSIS
-
Standard principal component analysis techniques can be used
-
Choice of correlation matrix or covariance matrix
-
Display communalities
-
Varimax rotation
-
Orthogonalisation of rotated matrix
-
2D plot of pairs of residual eigenvectors
-
3D plot of triplets of residual eigenvectors
-
Use of Non-linear Iterative Partial Least Squares method (NIPALS)
-
Choice of autoscaled or mean-centred data matrix
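
As an illustration of the NIPALS option, the following Python sketch
extracts principal components one at a time from a data matrix that has
already been mean-centred or auto-scaled. It shows the general textbook
algorithm, not CHEMSPREAD's implementation; the function name, the
tolerance and the iteration limit are assumptions.

```python
import math

def nipals(X, n_components=2, tol=1e-12, max_iter=500):
    """Return (scores, loadings) for a pre-centred matrix X (list of rows)."""
    X = [list(row) for row in X]          # work on a copy
    n, m = len(X), len(X[0])
    scores, loadings = [], []
    for _component in range(n_components):
        # Start from the column with the largest sum of squares.
        j0 = max(range(m), key=lambda j: sum(X[i][j] ** 2 for i in range(n)))
        t = [X[i][j0] for i in range(n)]
        if sum(v * v for v in t) == 0.0:
            break                          # nothing left to extract
        for _iteration in range(max_iter):
            tt = sum(v * v for v in t)
            # Loading vector: project the columns of X onto the scores.
            p = [sum(X[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = math.sqrt(sum(v * v for v in p))
            p = [v / norm for v in p]
            # New scores: project the rows of X onto the loading vector.
            t_new = [sum(X[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(v * v for v in t):
                break
        # Deflate: remove the extracted component from X.
        for i in range(n):
            for j in range(m):
                X[i][j] -= t[i] * p[j]
        scores.append(t)
        loadings.append(p)
    return scores, loadings
```

Each loading vector has unit length, and the sum of squares of each
score vector divided by (number of rows - 1) approximates the
corresponding eigenvalue of the sample covariance matrix of the input
(or of the correlation matrix, if the input was auto-scaled with the
sample standard deviation).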
-
COMPOUND CLUSTERING
-
Hierarchical clustering - choice of:-
-
Cartesian distance matrix
-
Tanimoto coefficient matrix
-
Cosine coefficient matrix
-
Dendrogram construction
-
Jarvis-Patrick clustering
-
Reciprocal-nearest-neighbour clustering
-
K-means clustering
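
The three row-row measures offered for hierarchical clustering can be
written down compactly. The Python sketch below uses the usual
continuous-valued definitions; whether CHEMSPREAD uses exactly these
forms (in particular for the Tanimoto coefficient) is an assumption,
and the function names are hypothetical.

```python
import math

def dot(a, b):
    """Scalar product of two rows."""
    return sum(x * y for x, y in zip(a, b))

def cartesian_distance(a, b):
    """Euclidean (Cartesian) distance between two rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def tanimoto(a, b):
    """Continuous Tanimoto coefficient: a.b / (a.a + b.b - a.b)."""
    ab = dot(a, b)
    return ab / (dot(a, a) + dot(b, b) - ab)

def cosine(a, b):
    """Cosine coefficient: a.b / (|a| |b|)."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))

def pairwise_matrix(rows, measure):
    """Full symmetric matrix of a measure over all pairs of rows."""
    return [[measure(a, b) for b in rows] for a in rows]
```

For example, pairwise_matrix(rows, tanimoto) gives a similarity matrix
with 1.0 on the diagonal, whereas pairwise_matrix(rows,
cartesian_distance) gives a distance matrix with 0.0 on the diagonal.
The coefficients are undefined for an all-zero row, which this sketch
does not guard against.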
-
NON-LINEAR MAPPING
-
Reduce dimensionality of data to 2- or 3-dimensions
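
The algorithm used for non-linear mapping is not stated here. In
chemometrics the term commonly refers to Sammon's method, which seeks a
2- or 3-dimensional arrangement of the rows whose inter-point distances
match the original distances as closely as possible. Assuming that
method, the quantity being minimised (the Sammon stress) can be
sketched in Python as follows; the function names are hypothetical and
the optimiser itself is omitted.

```python
import math

def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def sammon_stress(original, mapped):
    """Sammon stress between original rows and their low-dimensional map.

    Pairs of identical original rows (zero distance) are skipped to
    avoid division by zero.
    """
    total = 0.0    # sum of original distances
    error = 0.0    # weighted squared mismatch
    n = len(original)
    for i in range(n):
        for j in range(i + 1, n):
            d_orig = distance(original[i], original[j])
            if d_orig == 0.0:
                continue
            d_map = distance(mapped[i], mapped[j])
            total += d_orig
            error += (d_orig - d_map) ** 2 / d_orig
    return error / total if total > 0 else 0.0
```

A stress of zero means the map reproduces every original distance
exactly; larger values indicate greater distortion.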
-
AUXILIARY PROGRAM
-
The program SPREAD is supplied to facilitate analysis of results
produced by MOPAC
Revised 18th June 2007