Organize Data for Epidemiological Studies
What is HTML DataBook?
HTML DataBook is a set of interlinked web pages that reports the following information for an epidemiologic study:
- Which datasets are collected? How many subjects in each dataset?
- Which variables are in each dataset?
- What data is available for each individual subject?
- For each dataset, what is the data distribution (mean, sd and percentiles, or frequency) of each variable?
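The per-variable summaries described above (mean, sd and percentiles for numeric variables, frequencies otherwise) can be sketched in a few lines of Python. This is only an illustration of the kind of summary reported, not the code Empower DataBook generates; the function name `summarize_variable` and the rule for deciding whether a variable is numeric are assumptions.

```python
import statistics
from collections import Counter

def summarize_variable(values):
    """Summarize one variable (hypothetical helper, not part of Empower DataBook)."""
    present = [v for v in values if v is not None]  # drop missing values
    is_numeric = all(isinstance(v, (int, float)) for v in present)
    # stdev and quantiles need at least two observations
    if is_numeric and len(present) >= 2:
        p25, p50, p75 = statistics.quantiles(present, n=4)  # quartile cut points
        return {
            "n": len(present),
            "mean": statistics.mean(present),
            "sd": statistics.stdev(present),
            "p25": p25,
            "p50": p50,
            "p75": p75,
        }
    # Anything else is treated as categorical: count each distinct value
    return {"n": len(present), "frequency": dict(Counter(present))}

ages = summarize_variable([34, 41, 29, 55, 47])
sexes = summarize_variable(["M", "F", "F", None])
```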
Click here for an example HTML DataBook, and click here for an explanation of each table in the sample HTML DataBook.
Why use HTML DataBook?
A large research study usually collects data from multiple sources, such as questionnaires, physical exams, medical record reviews, laboratory tests, and so on.
Data from each source is usually saved in a separate dataset. Empower DataBook provides an overview of the datasets and the contents of each one, which helps investigators:
- Quickly get familiar with the study contents, datasets and variables
- Generate hypotheses and data analysis plans
- Share information and collaborate with others
How to use Empower DataBook
Simply enter the Project Name and the Data Directory, then click the “Create Databook” button.
Below is a screenshot of the Empower DataBook input window:
- Organize all datasets of a study into one directory
  - Your data files can be SAS datasets or text files (tab-, comma-, or space-delimited)
  - SAS datasets can be located on a local PC or on a Unix server
  - Tab-, comma-, or space-delimited text files must be saved on a local PC
- Optional input information
  - Subject ID variable: if every data file shares a common variable that identifies each subject of the study, enter that variable's name. With the subject ID variable specified, Empower DataBook can report the number of subjects for each test item (table 2 and table 3) and the test items for each subject (table 4)
  - Project title: a short description of the project
  - Data file descriptions: if the data files are on a local PC, Empower DataBook automatically searches the data directory and lists all data files. If the data files are on a remote server (SAS datasets only), Empower DataBook writes SAS code that automatically detects all datasets in the directory
  - Data document files: if you have a questionnaire, record sheet, or any other document for a data file, save it as a .pdf file with the same name as the dataset (e.g. ques1.pdf is the document file for ques1.sas7bdat) and put it in the data directory. These .pdf files are automatically linked to the report pages
- HTML DataBook files are saved automatically in the local Windows “..\My documents\EmpowerDataBook\” directory. Each project has its own subdirectory, named after the project
- Empower DataBook automatically searches all data files under the data directory, creates SAS code (for SAS datasets) or R code (for text-format data files) to get the contents of each dataset, and organizes the output into interlinked HTML web pages
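The directory conventions above (all data files in one directory, with an optional same-named .pdf document per dataset) can be illustrated with a short Python sketch. It is not Empower DataBook's own code, which generates SAS or R; the function name `list_data_files` and the set of text-file extensions are assumptions, since the page only says "SAS datasets or text files".

```python
from pathlib import Path

# Assumed extensions: .sas7bdat appears in the text above; .txt and .csv are guesses
DATA_EXTENSIONS = {".sas7bdat", ".txt", ".csv"}

def list_data_files(data_dir):
    """Map each data file name to its same-named .pdf document, or None."""
    found = {}
    for path in sorted(Path(data_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in DATA_EXTENSIONS:
            pdf = path.with_suffix(".pdf")  # ques1.sas7bdat -> ques1.pdf
            found[path.name] = pdf.name if pdf.exists() else None
    return found
```

For a directory holding ques1.sas7bdat, ques1.pdf, and lab.csv, the function pairs ques1.sas7bdat with ques1.pdf and reports no document for lab.csv.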
Run SAS in Unix: If your data files are SAS datasets located on a Unix server, Empower DataBook first asks you to set up the SFTP/PuTTY connection parameters. It then automatically uploads the SAS code it created to the Unix server and executes it. When SAS has finished running on the server, you can download the HTML DataBook files to your local PC for review.
Click the “Change server setting” button to edit the settings (see below) and configure the connection.
Usually you only need to change your Unix server host name or IP address, login user name, password, and the command that calls SAS on your host. For example, some servers are set up to call SAS using “sas sas-program-file-name”; in that case, set the SAS command to “sas”. Other servers are set up to call SAS using “qbs -q m sas-program-file-name”; in that case, set the SAS command to “qbs -q m”. All other settings are pre-configured and in most cases do not need to be changed.
In the above form, text between “[” and “]” is a placeholder that is replaced with the corresponding value during execution.
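The bracket convention can be pictured as simple template substitution. The Python sketch below is an illustration only; the placeholder names `sas_command` and `program_file` and the function `fill_template` are hypothetical, since the actual names used in the settings form are not listed here.

```python
import re

def fill_template(template, values):
    """Replace each [name] in template with values[name]; unknown names stay as-is."""
    def substitute(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return re.sub(r"\[([^\[\]]+)\]", substitute, template)

# Hypothetical placeholder names, using the two SAS commands mentioned above
plain = fill_template("[sas_command] [program_file]",
                      {"sas_command": "sas", "program_file": "databook.sas"})
queued = fill_template("[sas_command] [program_file]",
                       {"sas_command": "qbs -q m", "program_file": "databook.sas"})
```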