mlereg_tf function — mlereg

Function to find the Maximum Likelihood Estimates of regression parameters using TensorFlow.

mlereg_tf(
  ydist = y ~ Normal,
  formulas,
  data,
  available_distribution = TRUE,
  fixparam = NULL,
  initparam = NULL,
  link_function = NULL,
  optimizer = "AdamOptimizer",
  hyperparameters = NULL,
  maxiter = 10000,
  tolerance = .Machine$double.eps
)

Arguments

ydist	an object of class "formula" that specifies the distribution of the response variable. The default value is `y ~ Normal`. The available distributions are: `Normal`, `Poisson`, `Weibull`, `Exponential`, `LogNormal`, `Beta` and `Gamma`. To estimate the parameters of a distribution other than those listed above, provide the name of an object of class function that contains its probability mass/density function. This `R` function must not contain curly brackets other than those that enclose the function.
formulas	a list containing objects of class "formula". Each element of the list represents the linear predictor for each of the parameters of the regression model. The linear predictor is specified with the name of the parameter and it must contain an `~`. The terms on the right side must be separated by `+`.
data	a data frame containing the response variable and the covariates.
available_distribution	logical. If `TRUE`, the distribution of the response variable is one of the following distributions: `Normal`, `Poisson`, `Weibull`, `Exponential`, `LogNormal`, `Beta` and `Gamma`.
fixparam	a list containing the fixed parameters of the model, if there are any. The list must specify the names and values of the parameters.
initparam	a list with the initial values of the regression coefficients to be estimated. The list must contain the names and values of the regression coefficients. To use the same initial value for all regression coefficients associated with a specific parameter, specify the name of the parameter and the value. If `NULL`, the default initial value is zero.
link_function	a list with names of parameters to be linked and the corresponding link function name. The available link functions are: `log`, `logit`, `squared` and `identity`.
optimizer	a character indicating the name of the TensorFlow optimizer to be used in the optimization process. The default value is `AdamOptimizer`. The available optimizers are: `AdadeltaOptimizer`, `AdagradDAOptimizer`, `AdagradOptimizer`, `AdamOptimizer`, `GradientDescentOptimizer`, `MomentumOptimizer` and `RMSPropOptimizer`.
hyperparameters	a list with the hyperparameter values of the selected TensorFlow optimizer. If the hyperparameters are not specified, their default values are used in the optimization process. For details on the hyperparameters, see https://www.tensorflow.org/api_docs/python/tf/compat/v1/train
maxiter	a positive integer indicating the maximum number of iterations for the optimization algorithm. The default value is `10000`.
tolerance	a small positive number. When the difference between the loss value or the parameters values from one iteration to another is lower than this value, the optimization process stops. The default value is `.Machine$double.eps`.
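
The interaction between `maxiter` and `tolerance` can be illustrated with a small base R loop. This is only a sketch of a stopping rule of the kind described above, not the package's internal code: the toy loss, the plain gradient descent update, and the use of absolute differences are all assumptions made for illustration.

```r
# Toy loss (hypothetical): squared distance of a scalar parameter from 3.
loss_fn <- function(theta) (theta - 3)^2
grad_fn <- function(theta) 2 * (theta - 3)

maxiter <- 10000
tolerance <- .Machine$double.eps
learning_rate <- 0.1  # illustrative hyperparameter value

theta <- 0
loss_old <- loss_fn(theta)
converged <- FALSE

for (iter in seq_len(maxiter)) {
  # Plain gradient descent step (assumed here; not one of the TensorFlow optimizers).
  theta_new <- theta - learning_rate * grad_fn(theta)
  loss_new <- loss_fn(theta_new)

  # Stop when the loss or the parameter changes by less than the tolerance.
  converged <- abs(loss_new - loss_old) < tolerance ||
    abs(theta_new - theta) < tolerance

  theta <- theta_new
  loss_old <- loss_new
  if (converged) break
}

iter       # number of iterations actually used
theta      # final parameter value
```

With a tolerance this small, the loop stops well before `maxiter` only because the toy loss is smooth and the step size is stable; a larger tolerance trades precision for fewer iterations.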

Value

This function returns the estimates, standard errors, Z-scores and p-values of the significance tests for the regression model coefficients, as well as information about the optimization process, such as the number of iterations needed for convergence.
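
The relationship between these quantities can be checked by hand. The sketch below assumes the reported Z-scores are Wald statistics (estimate divided by standard error) with two-sided p-values from the standard normal distribution; the page does not state this explicitly, but the figures in the Poisson example in the Examples section are consistent with it. The numbers are copied from that example's output.

```r
# Estimates and standard errors copied from the Poisson example output.
estimate  <- c(`(Intercept)` = 3.0446310, outcome2 = -0.4541281, outcome3 = -0.2930921)
std_error <- c(0.1708925, 0.2021639, 0.1927491)

# Wald statistic and two-sided p-value under a standard normal reference
# (assumed to be how the summary table is built).
z_value <- estimate / std_error
p_value <- 2 * pnorm(-abs(z_value))

round(z_value, 3)
round(p_value, 4)
```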

Details

mlereg_tf computes the log-likelihood function based on the distribution specified in ydist and the linear predictors specified in formulas. It then finds the values of the regression coefficients that maximize this function using the TensorFlow optimizer specified in optimizer.
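
The same idea can be sketched in base R without TensorFlow. The block below is not how mlereg_tf is implemented: it swaps the TensorFlow optimizer for `optim()` with the BFGS method, and it assumes a Poisson response with a log link, using the data from the first example in the Examples section. The names `dat`, `negloglik` and `neggrad` are introduced here for illustration only.

```r
# Data from the Poisson example.
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
dat <- data.frame(treatment, outcome, counts)

# Design matrix for the linear predictor of lambda.
X <- model.matrix(~ outcome + treatment, data = dat)

# Negative log-likelihood with a log link: lambda = exp(X %*% beta).
negloglik <- function(beta) {
  lambda <- exp(drop(X %*% beta))
  -sum(dpois(dat$counts, lambda = lambda, log = TRUE))
}

# Analytic gradient of the negative log-likelihood.
neggrad <- function(beta) {
  lambda <- exp(drop(X %*% beta))
  -drop(crossprod(X, dat$counts - lambda))
}

# Start the intercept at log(mean(counts)) to keep the first steps stable.
start <- c(log(mean(dat$counts)), rep(0, ncol(X) - 1))
fit <- optim(par = start, fn = negloglik, gr = neggrad,
             method = "BFGS", hessian = TRUE)

beta_hat <- setNames(fit$par, colnames(X))
# Standard errors from the inverse Hessian of the negative log-likelihood.
std_error <- sqrt(diag(solve(fit$hessian)))

cbind(Estimate = beta_hat, Std.Error = std_error)
```

The estimates can be compared against `coef(glm(counts ~ outcome + treatment, family = poisson(), data = dat))`, which fits the same model by iteratively reweighted least squares.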

An R function supplied as the probability mass/density function may contain only one pair of curly brackets: the pair that encloses the function body, marking the beginning and end of the function. No other curly brackets are allowed inside it.
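
As an illustration of this restriction, the sketch below defines a density whose body is a single arithmetic expression. The Rayleigh distribution, the function name `drayleigh`, and the argument names and order are all hypothetical choices for this example; the page does not state what signature mlereg_tf expects or which R functions may appear in the body.

```r
# Hypothetical Rayleigh density written as one expression.
# The only curly brackets are the pair enclosing the function body.
drayleigh <- function(y, sigma) {
  (y / sigma^2) * exp(-y^2 / (2 * sigma^2))
}

# A version like the following would break the rule, because the
# if/else branches introduce additional curly brackets:
#   function(y, sigma) {
#     if (sigma > 0) { ... } else { ... }
#   }

# Sanity check in plain R: the density integrates to 1 over its support.
integrate(function(v) drayleigh(v, sigma = 2), lower = 0, upper = Inf)
```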

Note

The summary, print and plot_loss functions can be used with a mlereg_tf object.

Author

Sara Garcés Céspedes sgarcesc@unal.edu.co

Examples

#----------------------------------------------------------------------------------
# Estimation of coefficients of a Poisson regression model

# Data frame with response variable and covariates
counts <- c(18,17,15,20,10,20,25,13,12)
outcome <- gl(3,1,9)
treatment <- gl(3,3)
data <- data.frame(treatment, outcome, counts)

# Use the mlereg_tf function
estimation_1 <- mlereg_tf(ydist = counts ~ Poisson,
                          formulas = list(lambda = ~ outcome + treatment),
                          data = data,
                          initparam = list(lambda = 1.0),
                          optimizer = "AdamOptimizer",
                          link_function = list(lambda = "log"),
                          hyperparameters = list(learning_rate = 0.1))

# Get the summary of the estimates
summary(estimation_1)
#> Distribution: Poisson 
#> Number of observations: 9 
#> TensorFlow optimizer: AdamOptimizer 
#> Negative log-likelihood: -274.7378 
#> Loss function convergence, 110 iterations needed. 
#> ----------------------------------------------------------------
#> Distributional parameter: lambda
#> ----------------------------------------------------------------
#>              Estimate. Std..Error Z.value Pr...z..    
#> (Intercept)  3.0446310  0.1708925  17.816   <2e-16 ***
#> outcome2    -0.4541281  0.2021639  -2.246   0.0247 *  
#> outcome3    -0.2930921  0.1927491  -1.521   0.1284    
#> treatment2  -0.0001777  0.1999979  -0.001   0.9993    
#> treatment3  -0.0001776  0.1999979  -0.001   0.9993    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ----------------------------------------------------------------

#----------------------------------------------------------------------------------
# Estimation of coefficients of a linear regression model with one fixed parameter

# Data frame with response variable and covariates
x <- runif(n = 1000, -3, 3)
y <- rnorm(n = 1000, mean = 5 - 2 * x, sd = 3)
data <- data.frame(y = y, x = x)

# Use the mlereg_tf function
estimation_2 <- mlereg_tf(ydist = y ~ Normal,
                          formulas = list(mean = ~ x),
                          data = data,
                          fixparam = list(sd = 3),
                          initparam = list(mean = list(Intercept = 1.0, x = 0.0)),
                          optimizer = "AdamOptimizer",
                          hyperparameters = list(learning_rate = 0.1))

# Get the summary of the estimates
summary(estimation_2)
#> Distribution: Normal 
#> Number of observations: 1000 
#> TensorFlow optimizer: AdamOptimizer 
#> Negative log-likelihood: 652.0222 
#> Loss function convergence, 115 iterations needed. 
#> ----------------------------------------------------------------
#> Distributional parameter: mean
#> ----------------------------------------------------------------
#>             Estimate. Std..Error Z.value Pr...z..    
#> (Intercept)   4.86203    0.09489   51.24   <2e-16 ***
#> x            -1.95539    0.05372  -36.40   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ----------------------------------------------------------------