Author:
D. BairdDataLoad - Translates Spreadsheet/Database files into Genstat/Gauss
format.
Copyright D. Baird, AgResearch, PO Box 60, Lincoln, New Zealand 1998-2001.
DataLoad - Translates Spreadsheet/Database files into InStat/GenStat format. Copyright D. Baird, AgResearch, PO Box 60, Lincoln, New Zealand 1998-2001. David.Baird@AgResearch.CO.NZ DataLoad is provided in the public domain as freeware and carries no warranties for its usage. Use it at your own risk. For any bugs/comments email David Baird at David.Baird@AgResearch.CO.NZ. DataLoad can be used with the non Windows version of GenStat by using the GEN output format which creates file containing GenStat directives which any version of GenStat can run. The default version of DataLoad runs only under Windows 95/NT and can handle up to 32,000,000 data items. In spreadsheet files, the data extracted are labels, numerical values and the results from formulae. A label of * in otherwise numerical column is taken as a missing value. Empty cells are taken as missing values. Empty rows at the start and end of a block are trimmed, but empty rows in the middle of a block are kept. Empty columns are also ignored. A single row of labels as the first non-empty row in the block will be taken as column names. Empty cells in this row will have a default name C1, C2 etc assigned. A column name ending with an exclamation mark ! will be converted to a factor in InStat/GenStat. In the GenStat procedure, the type of a column can be forced by providing a !,# or $ character on the end of the text items provided in the column names. Supported File Types: Excel 2-5,95,97,2000, Lotus WK1, Quattro WQ1,WB*, dBase 2-5, Paradox 3-5, GenStat GSH, SAS PC 6.03-12, Minitab 8-13, Systat, MStat, Instat, Epi-Info, SPSS/Win, Gauss Data/Matrix (PC/Win/Unix), MatLab, S+ (PC/Unix/Text) Stata 4-7, SAS Transport, Logbook Excel format, ArcView/Info Shapefiles, MapInfo Interchange (MIF), CSV/Tab/space delimited text files, Windows BMP bit mapped graphics files and Windows WAV sound files. The file type is worked out from the contents, so the usual extension need not be used with the exception of the following file types which do not contain a unique signature: Epi-Info (.REC), S+ Text (.SDD) and Paradox (.DB). Any files not containing a unique file signature, but ending in these extensions classified as above. DataLoad can save the output file in various formats. The format is taken from the output file extension: .GSH - a GenStat spreadsheet file. .GEN - a GenStat command file. .XLS - an Excel 95 spreadsheet file. .WQ1 - a Quattro spreadsheet file. .DBF - a dBase III worksheet file. .WOR - an Instat+ worksheet file. .FMT - a Gauss 86 matrix file. .MAT - a Matlab 4 matrix file. .SDD - a SPlus data frame file. .TPT - a SAS Transport file. .TXT - a space delimited text file. .CSV - a comma delimited text file. .TAB - a tab delimited text file. .RTF - a rich text format file (Word). .HTM - a hypertext markup format file (Internet viewable). .HTML - a hypertext markup format file (Internet viewable). Usage: DataLoad Inputfile/options Outputfile/optionsOptions must each be preceded with a / (i.e. /iv should be /i/v) and must not be separated from the filename by spaces (i.e. file.xls/w not file.xls /w). Inputfile - name of file to extract data from. /f An option of /f can be added to Instat filenames to not output factors with more than 223 levels. Otherwise the if the factor has labels, levels greater than 233 will be set to missing, or else the factor will be written as a variate. /i An option of /i can be added to SPSS/Win filenames to ignore SPSS missing value codes. /l An option of /l on an Excel file specifies that the Logbook conventions should be used in reading the data. The column_row range is found and used to extract the data. If multiple column_row ranges are contained in the spreadsheet, the data can be extracted from subsequent ranges by adding the range number, i.e. /l= , eg /l=3 for the third column_row range. /m= An option of /m can be added to filenames to specify the three Instat missing value codes, which are supplied in a comma delimited list. A asterisk can be used for the generic missing value indicator. E.g. /m=*,-99,-9999 specifies the three Instat missing values will be set as -1e309, -99, -999 if a Instat work sheet is saved, and if an Instat worksheet is being read, that the three Instat missing values are converted to these values /n No column name row /p= For .GEN output a /p= can be added to create a pointer with the given name pointing to all columns read in. /v and /p= are mutually exclusive, but /i can be used with either. /t= Strictness of text conversions to numerical values: 1 = Strict (label must be an exact representation of a number) 2 = Single (a single letter commonly entered as a number will be converted to a digit (o,O = 0, i,I,l,L = 1, z,Z = 2 s,S = 5, , = .) 3 = Common (any of the letters commonly entered as a number will be converted to a digit (as above) 4 = Standard a numerical value, after making common substitutions will be read from the label up to the first non numerical character. 5 = Lax - Any numerical values will be read from the string so that 'A1B2' would be read as 12. A file TxtToNum.ERR will be produced that list all text to numerical conversions made. /u For an Excel or Quattro file, the option /u produces a unsorted factor levels or labels. The factor labels/levels are put into the order that they are first met in the data. /v An option of /v can be added to take variable names from a .VAR file, i.e. for test.xls, a file test.var will contain the variable names, one per line. The column type can be forced by providing a single character code (!,#,$ for factor, variate and text respectively) at the end of the supplied name. If this file does not exist it will be created. /w For an Excel or Quattro file, the option /w produces a print out of the sheet names and named ranges and stops, producing no output file. /x= An option of /x=string can be added to interpret the string as as missing value in Excel. The default string is *. For example /x=M will interpret a label of M in a cell as a missing value. Outputfile - name of file to write data to - extension determines file type: .GSH - GenStat spreadsheet file loaded with SPLOAD in a batch file or File>Open in the Windows interface. .GEN - GenStat command file loaded with INPUT. .FMT - Gauss Matrix File (no text columns output). .XLS - Excel 5 Spreadsheet file or Logbook file. .WQ1 - Quattro Spreadsheet file. .DBF - dBase 3 database file. .MAT - Matlab matrix file. .WOR - Instat+ worksheet. .SDD - SPlus transport file. .TPT - SAS transport file. .TXT - ASCII Text file (space delimited). .CSV - ASCII CSV comma delimited text file. .TAB - ASCII tab delimited text file. .RTF - a rich text format file (Word). .HTM - a hypertext markup format file (Internet viewable). .HTML - a hypertext markup format file (Internet viewable). /* For a GEN output file, the option /* stops the standard RETURN statement being added at the end of the file. /b The option /b produces an ENDBREAK statement at the end of a GEN output file rather than the standard RETURN statement. Note filenames containing the text G42D are reserved for temporary file type in conjunction with the GenStat procedure DATALOAD. /c The option /c produces combines colours from an BMP file into a single RGB value (red + 256*green + 256*256*blue) or stereo channels in a WAV sound file into a average mono channel. /l= The number of layout factors in the logbook format. Eg /l=3 specifies the first two factors in the data are layout factors. Subsequent factors will be marked as treatments. /p The option /p is reserved also for use by the GenStat procedure. /q No data summaries produced in running *.GEN file /s An option of /s can be added to produce suffixed identifiers i.e. C[1] C[2] etc where columns are not named. /v= An option of /v=1 or /v=2 or /v=3 can be added to define the version of GSH spreadsheet file produced. (Default version 3). /- Don't quote text in CSV file. Sheet - optional name of worksheet or named range within a spreadsheet file to take data from (enclose names containing spaces in quotes eg "Sheet One" (Windows 95/NT only I think). Alternatively, a sheet number can be used i.e. 1,2,3... The default is the first worksheet in the file. Range - optional range of cells within a worksheet to take data from eg B3:D12 or AB2:AZ1201. If you do not know the last cell just enter an extreme value for the address, eg, IV64000. Just the column names can be entered to select all data in a column i.e. B:D will extract all data in columns B, C & D. The default is all valid data cells on the worksheet. The cell addresses in the range can leave out the row or column components, in which case the maximal range will be used, eg: B:Z, 3:102 are valid ranges equivalent to B1:Z64000 and A3:IV102 respectively. Also just the starting cell can be provided (eg D48 which is equivalent to D48:IV64000) Examples: DataLoad Test.DBF Test.GSH/s/v=1 (suffixed identifiers C[1] in V1 format) DataLoad Test.XLS Test.GEN Sheet1 DataLoad Test.XLS Test.GEN 2 (i.e. take the second sheet) DataLoad Test.WB2 Test.GSH "Sheet One" B2:AA1001 DataLoad Test.XLS Test.GEN DataRange (a named range) DataLoad Test.WB2 Test.TXT A C1:K100 DataLoad Test.SAV/i Test.GEN (SPSS Missing codes ignored) DataLoad Test.SAV/v Test.GEN (Take column names from TEST.VAR) DataLoad Test.SAV/p=Data Test.GEN (Create pointer called Data) DataLoad Test.SAV Test.XLS (Create an Excel file) DataLoad Test.SAV Test.DBF (Create an dBase file) DataLoad Test.SAV Test.FMT (Create an Gauss matrix file) DataLoad Test.SAV Test.WOR (Create an Instat file) DataLoad Test.SAV Test.SDD (Create an SPlus file) DataLoad Test.XLS/l Test.GSH (Read data in Logbook format) DataLoad Test.XLS Test.TAB (produce a tab delimited file) DataLoad Test.XLS Test.CSV (produce a comma delimited file) DataLoad Test.MST Test.XLS/l=2 (Create Excel file in logbook format with two layout factors) DataLoad.EXE is a Windows 95/NT command line utility. To run it start a command window (using the MSDOS icon) and enter the DataLoad command as above. If running DataLoad, the input files can be prefixed with their directories, if they are in another directory. eg c:\gen5ed\bin\dataload c:\excel\test.xls c:\data\test.gsh DataLoad.UNI is a UNIX Motif executable. It needs to be renamed to DataLoad.EXE to be used on a UNIX system. An icon on the desktop can be created for DataLoad by right clicking the desktop, and using the New>Shortcut menu item. Click the browse button and select DataLoad.EXE and click OK. Give it a name DataLoad and click OK. Now select the new icon, right click it and choose the Properties menu item. In the Target line add the files and options as above, and in the Working directory field enter the directory where the files are located, then Click OK. Now double clicking this icon will transfer the data between the two specified files. ____________________________________________________________________ Dr David Baird, Biometrician EMail: David.Baird@AgResearch.CO.NZ Mail: AgResearch, PO Box 60, Gerald St, Lincoln, NEW ZEALAND Phone: +64 3 983 3975 Fax: +64 3 983 3946