Title: | Import 'Stata' Data Files |
---|---|
Description: | Function to read and write the 'Stata' file format. |
Authors: | Jan Marvin Garbuszus [aut], Sebastian Jeworutzki [aut, cre], R Core Team [cph], Magnus Thor Torfason [ctb], Luke M. Olson [ctb], Giovanni Righi [ctb], Kevin Jin [ctb] |
Maintainer: | Sebastian Jeworutzki <[email protected]> |
License: | GPL-2 \| file LICENSE |
Version: | 0.10.2 |
Built: | 2025-01-06 04:17:19 UTC |
Source: | https://github.com/sjewo/readstata13 |
Convert Stata business calendar dates into readable dates.
as.caldays(buisdays, cal, format = "%Y-%m-%d")
Argument | Description |
---|---|
buisdays | numeric. Vector of business dates. |
cal | data.frame. Conversion table for business calendar dates, as created by stbcal. |
format | character. String with date format as in as.Date. |
Returns a vector of readable dates.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
# read business calendar and data
sp500 <- stbcal(system.file("extdata/sp500.stbcal", package="readstata13"))
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"))
# convert dates and check
dat$ldatescal2 <- as.caldays(dat$ldate, sp500)
all(dat$ldatescal2==dat$ldatescal)
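The conversion is essentially a table lookup: each business-day number is matched against the conversion table and replaced by its calendar date. The base-R sketch below illustrates only that idea, using a hand-made table. The column names `date` and `bday` and the helper `lookup_caldays()` are hypothetical and not part of readstata13; the actual column names of the stbcal() output may differ.

```r
# Hypothetical conversion table standing in for the output of stbcal().
# Business day 4 directly follows business day 3, skipping the weekend.
cal <- data.frame(
  date = as.Date(c("2024-01-02", "2024-01-03", "2024-01-04",
                   "2024-01-05", "2024-01-08")),
  bday = 0:4
)

# Hypothetical helper: look up each business day and format the matching date.
lookup_caldays <- function(bdays, cal, format = "%Y-%m-%d") {
  idx <- match(bdays, cal$bday)  # NA for business days missing from the table
  format(cal$date[idx], format)
}

lookup_caldays(c(0, 3, 4), cal)  # "2024-01-02" "2024-01-05" "2024-01-08"
```

Because the omitted days simply never appear in the table, no weekend or holiday logic is needed at lookup time.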
Retrieve the value labels for a specific Stata label set.
get.label(dat, label.name)
Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
label.name | character. Name of the Stata label set. |
This function returns the table of factor levels that represents a Stata label set. The name of the label set for a variable can be obtained with get.label.name.
Returns a named vector of code numbers
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"))
labname <- get.label.name(dat,"type")
get.label(dat, labname)
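A label table pairs code numbers with label texts, which is all that is needed to turn a numeric Stata variable into a factor. The sketch below assumes, for illustration only, that a label table is an integer vector of codes named by their labels (matching the "named vector of code numbers" described above). The variable and label texts are made up.

```r
# Made-up label table: codes named by their labels.
labtab <- c(Compact = 1L, Midsize = 2L, Luxury = 3L)

# Made-up numeric variable as it might be stored in a dta-file.
codes <- c(3L, 1L, 1L, 2L)

# Map each code to its label; the level order follows the label table.
f <- factor(codes, levels = labtab, labels = names(labtab))
f
```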
Retrieves the names of the Stata label sets in the dataset for all variables or for a vector of variable names.
get.label.name(dat, var.name = NULL, lang = NA)
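Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
var.name | character vector. Variable names. If NULL, get the names of the label sets for all variables. |
lang | character. Label language. The default language defined by get.lang is used if NA. |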
Stata stores factor labels in label sets that are independent of variables, so several variables can share one label set. This function retrieves the name of the label set for a variable.
Returns a named vector of label set names.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
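Because label sets are stored independently of variables, finding the labels of a variable is a two-step lookup: first the variable's label set name, then the table of that set. This base-R sketch mimics the indirection with a named character vector and a named list. All names are hypothetical, and the structure is not necessarily how readstata13 stores this information internally.

```r
# Hypothetical mapping from variable names to label set names:
# two variables share the label set "yesno", one has no label set.
label_names <- c(owner = "yesno", insured = "yesno", price = "")

# Hypothetical label tables keyed by label set name.
label_tables <- list(yesno = c(no = 0L, yes = 1L))

# Two-step lookup: variable -> label set name -> label table.
tab <- label_tables[[label_names[["insured"]]]]
tab
```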
Retrieve the value labels for all variables.
get.label.tables(dat)
Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
This function returns the factor levels that represent the Stata label sets of all variables.
Returns a named list of label tables.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"))
get.label.tables(dat)
Displays information about the defined label languages.
get.lang(dat, print = T)
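Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
print | logical. If TRUE, print the available languages and the default language. |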
Stata allows defining multiple label sets in different languages. This function reports the available languages and the selected default language.
Returns a list with two components:
Vector of label languages used in the dataset
Name of the current default label language, or NA if none is defined
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
Recreates the code numbers of a factor as stored in the Stata dataset.
get.origin.codes(x, label.table)
Argument | Description |
---|---|
x | factor. Factor to obtain codes for. |
label.table | table. Table with factor levels obtained by get.label. |
While converting numeric variables into factors, the original code numbers are lost. This function reconstructs the codes from the attribute label.table.
Returns an integer vector with the original codes.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"))
labname <- get.label.name(dat,"type")
labtab <- get.label(dat, labname)
# comparison
get.origin.codes(dat$type, labtab)
as.integer(dat$type)
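Recovering the original codes reverses the mapping from codes to levels: each factor level is looked up by name in the label table. The base-R sketch below assumes that the label table is an integer vector of codes named by their labels; `origin_codes()` is a hypothetical helper, not the package's implementation. It also shows why as.integer() on a factor is not enough: it returns level positions, which differ from the codes whenever the codes do not run 1, 2, 3, ...

```r
# Made-up label table whose codes do not start at 1.
labtab <- c(low = 10L, mid = 20L, high = 30L)
x <- factor(c("high", "low", "mid"), levels = names(labtab))

# Hypothetical helper: look up each level by name in the label table.
origin_codes <- function(x, label.table) {
  unname(label.table[as.character(x)])
}

origin_codes(x, labtab)  # 30 10 20
as.integer(x)            # 3 1 2 (level positions, not Stata codes)
```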
Stata requires us to provide the maximum size of a character vector, as every row is stored in a byte region of this size.
maxchar(x)
Argument | Description |
---|---|
x | vector. A column (vector) of a data frame. |
Example: if the maximum size is four characters, with _ marking a position that holds no character:
row 1: four
row 3: one_
row 4: ____
If a character vector contains only missing values or is empty, we assign it a size of one, since Stata otherwise cannot handle what we write.
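The rule above can be sketched in base R as the largest string length in the vector, with a floor of one for empty or all-missing vectors. The helper `max_str_len()` is hypothetical and counts bytes, since Stata string widths are given in bytes; the package's own maxchar() may differ in detail.

```r
# Hypothetical helper: maximum string length in bytes, at least 1.
max_str_len <- function(x) {
  # Drop missing values first: nchar(NA_character_, type = "bytes") is 2.
  n <- nchar(x[!is.na(x)], type = "bytes")
  if (length(n) == 0L) return(1L)  # empty or all-missing vector
  max(1L, n)                       # also covers a vector of empty strings
}

max_str_len(c("four", NA, "one", ""))  # 4
max_str_len(c(NA_character_, NA))      # 1
max_str_len(character(0))              # 1
```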
read.dta13 reads a Stata dta-file and imports the data into a data.frame.
read.dta13( file, convert.factors = TRUE, generate.factors = FALSE, encoding = "UTF-8", fromEncoding = NULL, convert.underscore = FALSE, missing.type = FALSE, convert.dates = TRUE, replace.strl = TRUE, add.rownames = FALSE, nonint.factors = FALSE, select.rows = NULL, select.cols = NULL, strlexport = FALSE, strlpath = ".", tz = "GMT" )
Argument | Description |
---|---|
file | character. Path to the dta file you want to import. |
convert.factors | logical. If TRUE, factors are created from Stata value labels. |
generate.factors | logical. If TRUE and convert.factors is TRUE, missing factor labels are created from the integer codes. |
encoding | character. Strings can be converted from Windows-1252 or UTF-8 to the system encoding. Options are "latin1" or "UTF-8" to specify the target encoding explicitly. Stata 14, 15 and 16 files are UTF-8 encoded and may contain strings that cannot be displayed in the current locale. Set encoding=NULL to stop re-encoding. |
fromEncoding | character. Strings are expected to be encoded as "CP1252" for Stata 13 and older; for dta files saved with Stata 14 or newer, "UTF-8" is used. In some situations the encoding of Stata 14 files differs and must be set manually. |
convert.underscore | logical. If TRUE, "_" in variable names is changed to ".". |
missing.type | logical. Stata knows 27 different missing types: ., .a, .b, ..., .z. If TRUE, the Stata missing type of each value is kept in an attribute (see details). |
convert.dates | logical. If TRUE, Stata dates are converted. |
replace.strl | logical. If TRUE, the reference to a strL string in the data.frame is replaced by the actual value (see details). |
add.rownames | logical. If TRUE, the first variable is used as row names and dropped afterwards. |
nonint.factors | logical. If TRUE, factor labels are also assigned to variables of type float and double. |
select.rows | integer. Vector of one or two numbers. If a single value is given, rows 1:val are selected; if two values are given, the rows in that range are selected. |
select.cols | character. Vector of variables to select. |
strlexport | logical. Should strL content be exported as binary files? |
strlpath | character. Path for strL export. |
tz | character. Time zone specification to be used for POSIXct values. ‘""’ is the current time zone, and ‘"GMT"’ is UTC (Universal Time, Coordinated). |
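The select.rows rule can be illustrated with a small helper that turns the one- or two-element argument into row indices. `rows_from_select()` is hypothetical, not part of readstata13, and only restates the rule from the table above.

```r
# Hypothetical helper restating the select.rows rule.
rows_from_select <- function(select.rows) {
  stopifnot(length(select.rows) %in% c(1L, 2L))
  if (length(select.rows) == 1L) {
    seq_len(select.rows)                 # single value: rows 1:val
  } else {
    seq(select.rows[1], select.rows[2])  # two values: rows in that range
  }
}

rows_from_select(3)        # 1 2 3
rows_from_select(c(5, 8))  # 5 6 7 8
```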
If the filename is a URL, the file will be downloaded as a temporary file and read afterwards.
Files from Stata 13 and older are encoded in ansinew (Windows-1252). Depending on your system's default encoding, certain characters may be displayed incorrectly; setting the correct encoding may fix this.
Variable names stored in the dta-file will be used in the resulting data.frame. Stata types char, byte, and int will become integer; float and double will become numeric. R only knows a single missing type, while Stata knows 27, so all Stata missings will become NA in R. If you need to keep track of Stata's original missing types, you may use missing.type=TRUE.
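Stata's 27 missing values are the system missing value . plus the extended values .a through .z. The base-R sketch below shows the idea behind keeping them: the data vector collapses every missing type to NA, while a parallel vector (hypothetical here, named `miss_type`) remembers which Stata missing value each NA came from. This illustrates the concept only; it is not the layout of the attribute that missing.type=TRUE creates.

```r
# The 27 Stata missing values: "." plus ".a" to ".z".
stata_missings <- c(".", paste0(".", letters))
length(stata_missings)  # 27

# Made-up values as they might appear in Stata, with two missing types.
raw <- c("1.5", ".", "2", ".b")

# In R every missing type becomes NA ...
value <- suppressWarnings(as.numeric(raw))
# ... so a parallel vector is needed to remember which type it was.
miss_type <- ifelse(raw %in% stata_missings, raw, NA_character_)

value      # 1.5 NA 2 NA
miss_type  # NA "." NA ".b"
```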
Stata dates are converted to R's Date class the same way foreign handles dates.
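Stata stores %td dates as days and %tc datetimes as milliseconds since 1 January 1960, so the core of the conversion is shifting the origin. A minimal base-R sketch of that arithmetic follows; it is not the exact code used by readstata13 or foreign, and the tz argument plays the same role as the tz option above.

```r
# Stata %td: days since 1960-01-01.
td <- c(0, 3653)
as.Date(td, origin = "1960-01-01")  # "1960-01-01" "1970-01-01"

# Stata %tc: milliseconds since 1960-01-01 00:00:00.
tc <- 3653 * 86400 * 1000
as.POSIXct(tc / 1000, origin = "1960-01-01", tz = "GMT")  # "1970-01-01 GMT"
```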
Stata 13 introduced a new character type called strL. strLs are able to store strings of up to 2 billion characters. While R is able to store strings of this size in a character vector, the printed representation of such vectors looks rather cluttered, so it is possible to save only a reference in the data.frame with option replace.strl=FALSE.
In R, you may use rownames to store characters (see for instance data(swiss)). In Stata, this is not possible and rownames have to be stored as a variable. If you want to use rownames, set add.rownames to TRUE. Then the first variable of the dta-file will hold the rownames of the resulting data.frame.
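The rownames handling amounts to a round trip: when writing, the row names become the first variable; when reading, that variable becomes the row names again and is dropped. The base-R sketch below illustrates the round trip on the built-in swiss data; the column name `rownames` is an assumption for illustration, not necessarily the name readstata13 uses.

```r
df <- head(swiss, 3)

# Store the row names as the first variable, as a dta-file requires.
flat <- data.frame(rownames = rownames(df), df, row.names = NULL)

# Restore them: use the first variable as row names and drop it.
restored <- flat[, -1]
rownames(restored) <- flat[[1]]

isTRUE(all.equal(restored, df))
```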
Reading dta-files from Stata versions older or newer than 13 was introduced in version 0.8.
The function returns a data.frame with attributes. The attributes include:
Dataset label
Timestamp of file creation
Stata display formats. May be used with sprintf
Stata data type (see Stata Corp 2014)
For each variable the name of the associated value labels in "label"
Variable labels
dta file format version
List of value labels.
Character vector with long strings for the new strL string variable type. The name of every element is the identifier.
List providing the variable name, characteristic name and contents of each Stata characteristic field.
List of numeric vectors with the Stata missing type for each variable.
Byte order of the dta-file: LSF or MSF.
Dimension recorded inside the dta-file.
read.dta13 uses GPL 2 licensed code by Thomas Lumley and R-core members from foreign::read.dta().
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
Stata Corp (2014): Description of .dta file format https://www.stata.com/help.cgi?dta
read.dta in package foreign and memisc for dta files from Stata versions < 13 and read_dta in package haven for Stata version >= 13.
## Not run:
library(readstata13)
r13 <- read.dta13("https://www.stata-press.com/data/r13/auto.dta")
## End(Not run)
Function to read the Stata file format into a data.frame.
If you catch a bug, please do not sue us, we do not have any money.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
read.dta and memisc for dta files from Stata versions < 13
save.dta13 writes a Stata dta-file bytewise and saves the data into a dta-file.
save.dta13( data, file, data.label = NULL, time.stamp = TRUE, convert.factors = TRUE, convert.dates = TRUE, tz = "GMT", add.rownames = FALSE, compress = FALSE, version = 117, convert.underscore = FALSE )
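Argument | Description |
---|---|
data | data.frame. A data.frame object. |
file | character. Path to the dta file you want to export. |
data.label | character. Dataset label stored in the dta-file. |
time.stamp | logical. If TRUE, a time stamp is added to the dta-file. |
convert.factors | logical. If TRUE, factors are converted to Stata variables with value labels. |
convert.dates | logical. If TRUE, dates are converted to the Stata date and time format. |
tz | character. Time zone specification to be used for POSIXct values and dates (if convert.dates is TRUE). ‘""’ is the current time zone, and ‘"GMT"’ is UTC (Universal Time, Coordinated). |
add.rownames | logical. If TRUE, the row names are added to the dta-file as a new variable. |
compress | logical. If TRUE, the dta-file uses all of Stata's numeric types, so numeric vectors holding only integer values can be stored as integers. |
version | numeric. Stata format for the resulting dta-file: either a Stata version number (6 - 16) or the internal Stata dta-format (e.g. 117 for Stata 13). Experimental support for large datasets: use version="15mp" to save the dataset in the new Stata 15/16 MP file format. This feature is not yet thoroughly tested. |
convert.underscore | logical. If TRUE, all characters in variable names that are neither letters nor digits are converted to underscores. |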
The function writes a dta-file to disk. The following features of the dta file format are supported:
Dataset label
Timestamp of file creation
Stata display formats. May be used with sprintf
Stata data type (see Stata Corp 2014)
Variable labels
dta file format version
List of character vectors for the new strL string variable type. The first element is the identifier and the second element the string.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
Stata Corp (2014): Description of .dta file format https://www.stata.com/help.cgi?dta
read.dta in package foreign and memisc for dta files from Stata versions < 13 and read_dta in package haven for Stata version >= 13.
## Not run:
library(readstata13)
save.dta13(cars, file="cars.dta")
## End(Not run)
Compression can store numeric vectors as integers if the vector contains only integer values.
saveToExport(x)
Argument | Description |
---|---|
x | vector. A column (vector) of a data frame. |
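The check behind this is whether every non-missing value of a numeric vector is a whole number within the range of R's integer type. The helper `fits_integer()` below is a hypothetical base-R sketch of such a test, not the package's saveToExport(); in particular it does not model the exact value ranges of Stata's integer types.

```r
# Hypothetical helper: can a numeric vector be stored as integers?
fits_integer <- function(x) {
  x <- x[!is.na(x)]                        # missing values do not matter
  all(is.finite(x)) &&
    all(x == round(x)) &&                  # whole numbers only
    all(abs(x) <= .Machine$integer.max)    # within R's integer range
}

fits_integer(c(1, 2, NA, 40))  # TRUE
fits_integer(c(1, 2.5))        # FALSE
fits_integer(c(1, 3e10))       # FALSE
```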
Assigns value labels from a Stata label set to a variable. If duplicated labels are found, unique labels will be generated according to the following scheme: "label_(integer code)". Levels without labels will become <NA>.
set.label(dat, var.name, lang = NA)
Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
var.name | character. Name of the variable in the data.frame. |
lang | character. Label language. The default language defined by get.lang is used if NA. |
Returns a labeled factor
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"),
                  convert.factors=FALSE)
# compare vectors
set.label(dat, "type")
dat$type
# German label
set.label(dat, "type", "de")
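The renaming rule for duplicated labels can be sketched in base R: a label that occurs more than once gets its code appended, so the factor levels become unique. `make_unique_labels()` is a hypothetical helper. Reading the scheme "label_(integer code)" literally (with parentheses) and renaming every occurrence of a duplicated label, including the first, are assumptions of this sketch.

```r
# Hypothetical helper: append the code to every duplicated label.
make_unique_labels <- function(label.table) {
  labs <- names(label.table)
  dup <- labs %in% labs[duplicated(labs)]  # TRUE for all copies of a repeated label
  labs[dup] <- paste0(labs[dup], "_(", label.table[dup], ")")
  stats::setNames(label.table, labs)
}

lt <- c(yes = 1L, no = 2L, yes = 3L)
names(make_unique_labels(lt))  # "yes_(1)" "no" "yes_(3)"
```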
Changes the default label language for a dataset. Variables with generated labels (option generate.factors=TRUE) are kept unchanged.
set.lang(dat, lang = NA, generate.factors = FALSE)
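Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
lang | character. Label language. The default language defined by get.lang is used if NA. |
generate.factors | logical. If TRUE, missing factor levels are generated. |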
Returns a data.frame with value labels in language "lang".
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"))
get.lang(dat)
varlabel(dat)
# set German label
datDE <- set.lang(dat, "de")
get.lang(datDE)
varlabel(datDE)
Create conversion table for business calendar dates.
stbcal(stbcalfile)
Argument | Description |
---|---|
stbcalfile | stbcal-file. Stata business calendar file created by Stata. |
Stata 12 introduced the business calendar format. Business dates are integer numbers in a certain range of days, weeks, months or years. In this range some days are omitted (e.g. weekends or holidays). When a business calendar is created, a matching stbcal file is created as well; this file is required to read the business calendar. This parser reads the stbcal-file and returns a data.frame with the dates matching the business calendar dates.
For a dta-file containing Stata business dates imported with read.dta13(), the display formats show which stbcal file is required (e.g. "sp500.stbcal").
Stata allows adding a short description called purpose. This is added as an attribute of the resulting data.frame.
Returns a data.frame with two columns:
The date matching the business date. Date format.
The Stata business calendar day. Integer format.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
sp500 <- stbcal(system.file("extdata/sp500.stbcal", package="readstata13"))
Retrieve or set variable labels for a dataset.
varlabel(dat, var.name = NULL, lang = NA)
varlabel(dat) <- value
Argument | Description |
---|---|
dat | data.frame. Data.frame created by read.dta13. |
var.name | character vector. Variable names. If NULL, get label for all variables. |
lang | character. Label language. The default language defined by get.lang is used if NA. |
value | character vector. Character vector of size ncol(dat) with variable labels. |
Returns a named vector of variable labels.
Jan Marvin Garbuszus [email protected]
Sebastian Jeworutzki [email protected]
dat <- read.dta13(system.file("extdata/statacar.dta", package="readstata13"),
                  convert.factors=FALSE)
# display variable labels
varlabel(dat)
# display German variable labels
varlabel(dat, lang="de")
# display German variable label for brand
varlabel(dat, var.name = "brand", lang="de")
# define new variable labels
varlabel(dat) <- letters[1:ncol(dat)]
# display new variable labels
varlabel(dat)
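The form varlabel(dat) <- value relies on R's replacement-function mechanism: R evaluates it as dat <- `varlabel<-`(dat, value = value). The base-R sketch below shows such a getter/setter pair that stores the labels in an attribute and enforces the documented requirement that value has length ncol(dat). The names `my_varlabel`, `my_varlabel<-` and the attribute name "var.labels" are hypothetical and not taken from the package.

```r
# Hypothetical getter: return variable labels named by variable.
my_varlabel <- function(dat) {
  labs <- attr(dat, "var.labels")
  if (is.null(labs)) labs <- rep("", ncol(dat))
  stats::setNames(labs, names(dat))
}

# Hypothetical replacement function: exactly one label per column.
`my_varlabel<-` <- function(dat, value) {
  stopifnot(is.character(value), length(value) == ncol(dat))
  attr(dat, "var.labels") <- value
  dat
}

dat <- data.frame(brand = c("a", "b"), price = c(10, 12))
my_varlabel(dat) <- c("Car brand", "Price in EUR")
my_varlabel(dat)
```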