Quasi-Binomial Regression Model for the Analysis of Data with Extra-Binomial Variation
Abstract
Objectives: To develop inference procedures for the quasi-binomial distribution and the associated regression model. Methods: Score testing and maximum likelihood estimation of the regression parameters. Data: Several examples based on published data are included. Results: A quasi-binomial model is used to model binary response data that exhibit extra-binomial variation. A partial score test of the binomial hypothesis against the quasi-binomial alternative is developed and illustrated on three data sets. The extended logit transformation of the binomial parameter is introduced, and the large-sample dispersion matrix of the estimated parameters is derived. The Nonlinear Mixed Procedure (NLMIXED) in SAS is shown to be well suited to the estimation of the nonlinear regression model.
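The paper's partial score test is specific to the quasi-binomial model and is not reproduced here. As a minimal sketch of score-type testing for extra-binomial variation, the snippet below implements Tarone's classical Z statistic on hypothetical clustered binomial data; the counts are illustrative only.

```python
import numpy as np

def tarone_z(x, n):
    """Tarone's Z score test for extra-binomial variation.

    H0: x[i] ~ Binomial(n[i], p) with a common p;
    a large positive Z suggests overdispersion.
    """
    x, n = np.asarray(x, float), np.asarray(n, float)
    p_hat = x.sum() / n.sum()                        # pooled MLE under H0
    s = ((x - n * p_hat) ** 2 / (p_hat * (1 - p_hat))).sum()
    return (s - n.sum()) / np.sqrt(2.0 * (n * (n - 1.0)).sum())

# Hypothetical cluster sizes and response counts (strongly clustered)
n = np.array([10, 12, 9, 11, 10, 13, 8, 10])
x = np.array([1, 9, 0, 8, 2, 11, 0, 9])
print(f"Tarone's Z = {tarone_z(x, n):.2f}")          # Z >> 1.64 flags overdispersion
```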
Mohamed M. Shoukri,
Maha M. Aleid,
Meta-Analysis of Multi-Arm Trials Using Binomial Approach
Abstract
Most meta-analyses have concentrated on combining treatment effect measures based on comparisons of two treatments. Meta-analysis of multi-arm trials is a key component of a submission, summarizing evidence from all available studies. In this paper, an exact binomial model based on logistic regression is proposed to compare different treatments in multi-arm trials. Two approaches, unconditional maximum likelihood and conditional maximum likelihood, are developed and compared for the logistic regression model. The proposed models are applied to data from 27 randomized clinical trials (RCTs) assessing the efficacy of antiplatelet therapy in reducing venous thrombosis and pulmonary embolism.
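A minimal sketch of the two estimation routes named above, using statsmodels on hypothetical arm-level counts; the paper's exact binomial model may differ in detail, and the trial data here are invented for illustration.

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.discrete.conditional_models import ConditionalLogit

# Hypothetical arm-level data: (study, treatment arm, events, arm size)
rows = [(1, 0, 12, 60), (1, 1, 7, 60),
        (2, 0, 20, 80), (2, 1, 11, 80), (2, 2, 9, 80),
        (3, 0, 15, 70), (3, 1, 8, 70)]

# Expand to one row per subject for the binary-outcome logistic models
recs = []
for study, arm, events, total in rows:
    recs += [(study, arm, 1)] * events + [(study, arm, 0)] * (total - events)
df = pd.DataFrame(recs, columns=["study", "arm", "y"])
X = pd.get_dummies(df["arm"], prefix="arm", drop_first=True).astype(float)

# Unconditional ML: study effects enter as fixed dummy covariates
S = pd.get_dummies(df["study"], prefix="s", drop_first=True).astype(float)
Xu = sm.add_constant(pd.concat([X, S], axis=1))
print(sm.Logit(df["y"], Xu).fit(disp=0).params)

# Conditional ML: study effects eliminated by conditioning within studies
print(ConditionalLogit(df["y"], X, groups=df["study"]).fit().params)
```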
Hathaikan Chootrakool,
Pichet Treewai,
2022
An Analysis of Two-Dimensional Image Data Using a Grouping Estimator
Abstract
Machine learning methods, one class of methods used in artificial intelligence, are now widely used to analyze two-dimensional (2D) images in various fields. In these analyses, estimating the boundary between two regions is basic but important. If the model contains stochastic factors such as random observation errors, determining the boundary is not easy. When the probability distributions are mis-specified, ordinary methods such as the probit and logit maximum likelihood estimators (MLE) have large biases. The grouping estimator is a semiparametric estimator based on the grouping of data that does not require specific probability distributions. For 2D images, the grouping is simple. Monte Carlo experiments show that the grouping estimator clearly improves on the probit MLE in many cases. The grouping estimator essentially lowers the resolution density, and the present findings imply that methods using low-resolution image analyses might not be the proper ones in high-density image analyses. It is necessary to combine and compare the results of high- and low-resolution image analyses. The grouping estimator may provide theoretical justification for such analyses.
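The grouping estimator itself is the paper's own construction and is not reproduced here. The sketch below only sets up the kind of simulated 2D boundary problem described, a linear boundary observed through random errors, and fits the probit MLE baseline that the paper's Monte Carlo experiments compare against; all values are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulate a 100x100 image: label is 1 above the boundary x2 = 0.5 + 0.3*x1,
# observed through Gaussian noise in the threshold
g1, g2 = np.meshgrid(np.linspace(0, 1, 100), np.linspace(0, 1, 100))
x1, x2 = g1.ravel(), g2.ravel()
y = (x2 - (0.5 + 0.3 * x1) + rng.normal(0, 0.1, x1.size) > 0).astype(int)

# Probit MLE: P(y=1) = Phi(c0 + c1*x1 + c2*x2); boundary a = -c0/c2, b = -c1/c2
X = sm.add_constant(np.column_stack([x1, x2]))
c0, c1, c2 = sm.Probit(y, X).fit(disp=0).params
print(f"estimated boundary: x2 = {-c0/c2:.3f} + {-c1/c2:.3f}*x1")
```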
Kazumitsu Nawata,
2022
Minimum MSE Weighted Estimator to Make Inferences for a Common Risk Ratio across Sparse Meta-Analysis Data
Abstract
The paper aims to discuss three interesting issues of statistical inference for a common risk ratio (RR) in sparse meta-analysis data. Firstly, the conventional log-risk-ratio estimator encounters a number of problems when the number of events in the experimental or control group of a 2 × 2 table is zero. The adjusted log-risk-ratio estimator with continuity correction points $(c_1, c_2) = (1/6, 1/6)$, based upon the minimum Bayes risk with respect to the uniform prior density over (0, 1) and the Euclidean loss function, is proposed. Secondly, the interest is to find the optimal weights $\hat{f}_j$ of the pooled estimate $\hat{\theta}_w = \sum_{j=1}^{k} \hat{f}_j \hat{\theta}_{cj}$ that minimize the mean square error (MSE) of $\hat{\theta}_w$ subject to the constraint $\sum_{j=1}^{k} \hat{f}_j = 1$, where $\hat{\theta}_{cj} = \log \hat{R}_{cj} = \log \hat{p}_{c1j} - \log \hat{p}_{c2j}$, $\hat{p}_{c1j} = (X_{1j} + c_1)/(n_{1j} + 2c_1)$ and $\hat{p}_{c2j} = (X_{2j} + c_2)/(n_{2j} + 2c_2)$. Finally, the performance of this minimum MSE weighted estimator, adjusted with various values of $c_1 = c_2 = c$, is investigated and compared with other popular estimators, such as the Mantel-Haenszel (MH) estimator and the weighted least squares (WLS) estimator (equivalently, the inverse-variance weighted estimator), in terms of point estimation and hypothesis testing via simulation studies. The estimation results illustrate that, regardless of the true value of RR, the MH estimator achieves the best performance, with the smallest MSE, when the study size is rather large ($k \ge 16$) and the sample sizes within each study are small. The MSEs of the WLS estimator and the proposed weighted estimator adjusted by $c = 1/6$, $c = 1/3$, or $c = 1/2$ are close together, and these estimators are the best when the sample sizes are moderate to large ($n_{1j} \ge 16$ and $n_{2j} \ge 16$) while the study size is rather small.
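A minimal sketch of the continuity-corrected estimator defined above, on hypothetical sparse 2 × 2 tables. The paper's optimal minimum-MSE weights are its own derivation and are not reproduced; the standard inverse-variance (WLS) weights are used here as the comparator.

```python
import numpy as np

def adjusted_log_rr(x1, n1, x2, n2, c=1/6):
    """Continuity-corrected per-study log risk ratios and delta-method variances."""
    p1 = (x1 + c) / (n1 + 2 * c)
    p2 = (x2 + c) / (n2 + 2 * c)
    theta = np.log(p1) - np.log(p2)       # adjusted log-RR, defined even for zero cells
    var = (1 - p1) / (p1 * n1) + (1 - p2) / (p2 * n2)
    return theta, var

# Hypothetical sparse tables: (events, arm size) in treated and control arms
x1 = np.array([0, 2, 1, 0]); n1 = np.array([20, 25, 18, 30])
x2 = np.array([3, 5, 2, 4]); n2 = np.array([20, 25, 18, 30])

theta, var = adjusted_log_rr(x1, n1, x2, n2, c=1/6)
w = (1 / var) / (1 / var).sum()           # WLS (inverse-variance) weights, sum to 1
print("pooled log-RR:", (w * theta).sum())
```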
Chukiat Viwatwongkasem,
Pichitpong Soontornpipit,
Jutatip Sillabutra,
Pratana Satitvipawee,
Prasong Kitidamrongsuk,
Hathaikan Chootrakool,
Sutthisak Srisawad,
2022
Constructing Statistical Intervals for Small Area Estimates Based on Generalized Linear Mixed Model in Health Surveys
Abstract
Generalized Linear Mixed Models (GLMM) have been widely used in small area estimation for health indicators. Bayesian estimation is usually used to construct statistical intervals; however, its computational intensity is a big challenge for large complex surveys. Frequentist approaches, such as bootstrapping and Monte Carlo (MC) simulation, are also applied but have not been evaluated in terms of interval magnitude, width, and the computational time consumed. The 2013 Florida Behavioral Risk Factor Surveillance System data were used as a case study. County-level prevalence estimates of three health-related outcomes were obtained through a GLMM, and their 95% confidence intervals (CIs) were generated from bootstrapping and MC simulation. The intervals were compared to 95% credible intervals from a hierarchical Bayesian model. The results showed that the 95% CIs for the county-level estimates of each outcome obtained by MC simulation were similar to the 95% credible intervals generated by Bayesian estimation and were the most computationally efficient. MC simulation could therefore be a viable option for constructing statistical intervals for small area estimation in public health practice.
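A minimal sketch of the MC simulation route for a single county, assuming a hypothetical logit-scale point estimate and variance as would come from a fitted GLMM; the numbers are invented, not the BRFSS results.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical county-level linear predictor (logit scale) and its variance,
# as would be extracted from a fitted GLMM
eta_hat, eta_var = -1.90, 0.04

# Monte Carlo simulation: draw logit-scale predictors, back-transform,
# and read off percentile-based 95% intervals for the prevalence
draws = rng.normal(eta_hat, np.sqrt(eta_var), size=100_000)
prev = 1.0 / (1.0 + np.exp(-draws))            # inverse-logit
lo, hi = np.percentile(prev, [2.5, 97.5])
point = 1.0 / (1.0 + np.exp(-eta_hat))
print(f"prevalence {point:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
```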
Yan Wang,
Hua Lu,
Janet B. Croft,
Kurt J. Greenlund,
Xingyou Zhang,
2022
Probability Models with Discrete and Continuous Parts
Abstract
In mathematical statistics courses, students learn that the quadratic function $E((X - x)^2)$ is minimized when $x$ is the mean of the random variable $X$, and that the graphs of this function for any two distributions of $X$ are simply translates of each other. We focus on the problem of minimizing the function defined by $y(x) = E(|X - x|)$ in the context of mixtures of probability distributions of the discrete, absolutely continuous, and singular continuous types. This problem is important, for example, in Bayesian statistics, when one attempts to compute the decision function that minimizes the expected risk with respect to an absolute error loss function. Although the literature considers this problem, it does so only under restrictive conditions on the distribution of the random variable $X$, for example by assuming that the corresponding cumulative distribution function is discrete or absolutely continuous. By using Riemann-Stieltjes integration, we prove a theorem that solves this minimization problem under completely general conditions on the distribution of $X$. We also illustrate our result with examples involving mixtures of distributions of the discrete and absolutely continuous types, and with the Cantor distribution, in which case the cumulative distribution function is singular continuous. Finally, we prove a theorem that evaluates the function $y(x)$ when $X$ has the Cantor distribution.
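For absolute error loss, the minimizer of $y(x) = E(|X - x|)$ is a median of $X$, a standard fact the paper generalizes. A small simulation sketch for a discrete/continuous mixture (weights and components chosen here for illustration) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)

# Mixture: X = 1 with probability 0.3 (discrete part),
#          X ~ Uniform(0, 2) with probability 0.7 (absolutely continuous part)
u = rng.uniform(0, 2, 200_000)
x_samp = np.where(rng.uniform(size=u.size) < 0.3, 1.0, u)

grid = np.linspace(-0.5, 2.5, 601)
y = np.array([np.mean(np.abs(x_samp - x)) for x in grid])  # y(x) = E|X - x|

print("argmin of y(x):", grid[y.argmin()])   # ~1.0, the atom of the discrete part
print("median of X:  ", np.median(x_samp))   # also ~1.0: the minimizer is a median
```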
James E. Marengo,
David L. Farnsworth,
2022
Improvement of Misclassification Rates of Classifying Objects under Box Cox Transformation and Bootstrap Approach
Abstract
Discrimination and classification rules are based on different types of assumptions, and almost all statistical methods rest on some necessary assumptions. Parametric methods are the best choice when all the underlying assumptions hold. When assumptions are violated, parametric approaches do not provide a better solution, and nonparametric techniques are preferred. After a Box-Cox transformation, when the assumptions are satisfied, parametric methods yield lower misclassification rates. With this problem in mind, our concern is to compare the classification accuracy of parametric and nonparametric approaches with the aid of the Box-Cox transformation and bootstrapping. We applied Support Vector Machines (SVMs) and different discrimination and classification rules to classify objects. The aim is to critically compare SVMs with Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), measuring the performance of these techniques before and after the Box-Cox transformation using misclassification rates. From the apparent error rates, we observe that before the Box-Cox transformation, SVMs perform better than the existing classification techniques; after the Box-Cox transformation, on the other hand, parametric techniques yield lower misclassification rates than the nonparametric method. We also investigated the performance of the classification techniques under the Bootstrap approach and observed that Bootstrap-based classification techniques significantly reduce the classification error rate compared to the usual techniques for small samples. This paper therefore proposes applying classification techniques under the Bootstrap approach when classifying objects from small samples. Applications to real and simulated datasets are carried out to assess performance.
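A minimal scikit-learn sketch of the before/after Box-Cox comparison described above, on simulated skewed data; the dataset, transformer, and classifier settings are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PowerTransformer
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=4, n_informative=3,
                           n_redundant=0, random_state=0)
X = np.exp(X)  # induce skewness (and positivity, required by Box-Cox)

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)

for label, tf in [("raw", None), ("Box-Cox", PowerTransformer(method="box-cox"))]:
    Xtr_t = tf.fit_transform(Xtr) if tf else Xtr
    Xte_t = tf.transform(Xte) if tf else Xte
    for name, clf in [("SVM", SVC()), ("LDA", LinearDiscriminantAnalysis()),
                      ("QDA", QuadraticDiscriminantAnalysis())]:
        err = 1 - clf.fit(Xtr_t, ytr).score(Xte_t, yte)  # misclassification rate
        print(f"{label:7s} {name}: {err:.3f}")
```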
Mst Sharmin Akter Sumy,
Md Yasin Ali Parh,
Ajit Kumar Majumder,
Nayeem Bin Saifuddin,
2022
Modeling the Impact of Girl Child Empowerment amid Boy Child Neglect on Singlehood in Kakamega County
Abstract
In Kenya today, we are experiencing an increase in the number of boy children dropping out of school, which puts the female figure ahead as far as the development agenda is concerned. There is an increase in singlehood due to one's own choice, separation, and divorce, because the boy child does not feel empowered to take up the role of head of a family later in life. A lot is documented about singlehood, but information about the impact of girl child empowerment amid boy child neglect on singlehood is minimally captured in the literature. This study empirically modeled the impact of girl child empowerment amid boy child neglect on singlehood in Kakamega County. The study employed a sample survey method of data collection. Singles aged 30 years and above formed the study population. The data were collected using questionnaires and analyzed using Chi-square tests of the degree of relationship between the study variables and singlehood. Linear regression analysis was also used to build a model of how the study variables influenced singlehood in the county. The study revealed a negative correlation between girl child empowerment and singlehood and a positive correlation between boy child neglect and singlehood. The findings are expected to be useful to stakeholders in Kakamega County when designing an appropriate plan of action to empower the boy child and girl child alike.
Odero Everlyne Akoth,
2022
Testing for Normality from the Parametric Seven-Number Summary
Abstract
The objective of this study is to propose the Parametric Seven-Number Summary (PSNS) as a significance test for normality and to verify its accuracy and power in comparison with two well-known tests, Royston's W test and the D'Agostino-Belanger-D'Agostino K-squared test. An experiment with 384 conditions was simulated. The conditions were generated by crossing 24 sample sizes and 16 types of continuous distributions: one normal and 15 non-normal. The percentage of success in maintaining the null hypothesis of normality against normal samples and in rejecting the null hypothesis against non-normal samples (accuracy) was calculated. In addition, the type II error and the statistical power against non-normal samples were computed. Comparisons of percentages and means were performed using Cochran's Q-test, Friedman's test, and repeated measures analysis of variance. With sample sizes of 150 or greater, high accuracy and mean power (or, equivalently, low type II error) were achieved (≥0.70 and ≥0.80, respectively). All three normality tests were similarly accurate; however, the PSNS-based test showed lower mean power than the K-squared and W tests, especially against non-normal samples from symmetric platykurtic distributions, such as the uniform, semicircle, and arcsine distributions. It is concluded that the PSNS-based omnibus test is accurate and powerful for testing normality with samples of at least 150 observations.
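The PSNS statistic itself is the paper's construction. As a sketch of the simulation design, the snippet below estimates by Monte Carlo the size and power of the D'Agostino-Pearson K-squared test (scipy's normaltest) at n = 150, using the uniform as one platykurtic alternative; replication counts are illustrative.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps, alpha = 150, 2000, 0.05

def rejection_rate(sampler):
    """Monte Carlo rejection rate of the D'Agostino-Pearson K-squared test."""
    hits = sum(stats.normaltest(sampler(n)).pvalue < alpha for _ in range(reps))
    return hits / reps

print("size  (normal): ", rejection_rate(lambda n: rng.normal(size=n)))   # ~ alpha
print("power (uniform):", rejection_rate(lambda n: rng.uniform(size=n)))  # platykurtic
```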
José Moral De La Rubia,
2022
The Temporal Making of a Great Literary Corpus by a XX-Century Mystic: Statistics of Daily Words and Writing Time
Abstract
Maria Valtorta (1897-1961, Italian mystic), bedridden since 1934 because of paralysis, wrote in Italian 13,193 pages across 122 school notebooks concerning alleged mystical visions of Jesus' life, during World War II and a few following years. The contents, about 2.64 million words, are now scattered across different books. She could write from 2 to 6 hours without pausing, at a steady speed, and twice in the same day. She never made corrections and was very proficient in Italian. We have studied her writing activity concerning her alleged mystical experience with the main aim of establishing the time sequence of daily writing. This is possible because she diligently annotated the date of almost every text. We have reconstructed the time series of daily words and converted it into a time series of writing time, by assuming a speed of 20 words per minute, a reliable average value for fast handwriting, applicable to Maria Valtorta. She wrote for 1340 days, about 3.67 years of equivalent contiguous writing time, mostly concentrated in the years 1943 to 1948. This study is a first approach to evaluating the effort made, in terms of writing time, by a mystic who turned out to be a very effective literary author, whose texts are interesting to read per se, beyond any judgement (not of concern here) on her alleged visions.
Emilio Matricciani,
2022
Uniformly Minimum-Variance Unbiased Estimator (UMVUE) for the Gamma Cumulative Distribution Function with Known and Integer Scale Parameter
Abstract
This paper applies the Rao-Blackwell and Lehmann-Scheffé theorems to deduce the uniformly minimum-variance unbiased estimator (UMVUE) for the gamma cumulative distribution function with known and integer scale parameter. The paper closes with an example comparing the empirical distribution function with the UMVUE estimates.
Jessica Kubrusly,
2022
Estimation of a Linear Model in Terms of Intra-Class Correlations of the Residual Error and the Regressors
Abstract
Objectives: The objective is to analyze the interaction of the correlation structure and the values of the regressor variables in the estimation of a linear model when there is a constant, possibly negative, intra-class correlation of the residual errors and the group sizes are equal. Specifically: 1) How does the variance of the generalized least squares (GLS) estimator (GLSE) depend on the regressor values? 2) What is the bias in estimated variances when the ordinary least squares (OLS) estimator is used? 3) In what cases are OLS and GLS equivalent? 4) How can the best linear unbiased estimator (BLUE) be constructed when the covariance matrix is singular? The purpose is to make general matrix results understandable. Results: The effects of the regressor values can be expressed in terms of the intra-class correlations of the regressors. If the intra-class correlation of the residuals is large, then it is beneficial to have small intra-class correlations of the regressors, and vice versa. The algebraic presentation of GLS shows how the GLSE gives different weights to the between-group and within-group effects, in what cases the OLSE is equal to the GLSE, and how the BLUE can be constructed when the residual covariance matrix is singular. Different situations arise when the intra-class correlations of the regressors take extreme or intermediate values. The derivations lead to a BLUE combining OLS and GLS weighting in one estimator, which can also be obtained using general matrix theory. It is indicated how the analysis can be generalized to non-equal group sizes. The analysis gives insight into models where between-group effects and within-group effects are used as separate regressors.
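A minimal numpy sketch of the setting above: grouped data with an equicorrelated (intra-class) residual covariance, comparing the GLSE with the OLSE. Group sizes, the correlation value, and the regressors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

m, g = 5, 8                          # group size, number of groups
rho, sigma2 = 0.4, 1.0               # intra-class correlation of residual errors

# Block-diagonal covariance: each group has V_g = sigma2 * ((1-rho) I + rho J)
Vg = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
V = np.kron(np.eye(g), Vg)

n = m * g
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
y = X @ beta + rng.multivariate_normal(np.zeros(n), V)

Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # GLSE
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)                  # OLSE
print("GLS:", beta_gls, " OLS:", beta_ols)

# Correct GLSE covariance; naive OLS software would report a biased analogue
print("var(GLSE):\n", np.linalg.inv(X.T @ Vinv @ X))
```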
Juha Lappi,
2022
A Modified Regression Estimator for Single Phase Sampling in the Presence of Observational Errors
Abstract
In this paper, a regression method of estimation has been used to derive a mean estimate of the survey variable under simple random sampling without replacement in the presence of observational errors. Two covariates were used, and the case where the observational errors affect both the survey variable and the covariates was considered. Observational errors were included because data collected through surveys are often not free from errors that occur during observation. These errors can arise from over-reporting, under-reporting, memory failure by the respondents, or the use of imprecise data collection tools. The expression for the mean squared error (MSE) of the obtained estimator has been derived to the first degree of approximation. The results of a simulation study show that the derived modified regression mean estimator under observational errors is more efficient than the mean per unit estimator and some other existing estimators. The proposed estimator can therefore be used to estimate a finite population mean while accounting for observational errors that may occur during a study.
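A simulation sketch in the spirit of the comparison above: a classical two-covariate regression estimator of the mean under SRSWOR, with observational errors added to both the survey variable and the covariates, compared against the mean-per-unit estimator. The population, error variances, and sample sizes are illustrative; the paper's modified estimator may differ in form.

```python
import numpy as np

rng = np.random.default_rng(11)

# Finite population with two covariates; population covariate means are known
N, n = 5000, 200
x1 = rng.gamma(3, 2, N); x2 = rng.normal(10, 2, N)
y = 2 + 0.8 * x1 + 0.5 * x2 + rng.normal(0, 1, N)
X1bar, X2bar = x1.mean(), x2.mean()

est_reg, est_mpu = [], []
for _ in range(2000):
    s = rng.choice(N, n, replace=False)              # SRSWOR
    # Observational errors contaminate the survey variable and both covariates
    ys = y[s] + rng.normal(0, 0.5, n)
    x1s = x1[s] + rng.normal(0, 0.3, n)
    x2s = x2[s] + rng.normal(0, 0.3, n)
    b = np.linalg.lstsq(np.column_stack([np.ones(n), x1s, x2s]), ys, rcond=None)[0]
    est_reg.append(ys.mean() + b[1] * (X1bar - x1s.mean()) + b[2] * (X2bar - x2s.mean()))
    est_mpu.append(ys.mean())                        # mean-per-unit comparator

print("MSE regression:", np.mean((np.array(est_reg) - y.mean()) ** 2))
print("MSE mean/unit :", np.mean((np.array(est_mpu) - y.mean()) ** 2))
```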
Nujayma M. A. Salim,
Christopher O. Onyango,
2022
A Preliminary Outline of the Statistical Inference Process in Genetic Association Studies
Abstract
The genome-wide association study (GWAS) is a powerful experimental design that is applied to detect disease susceptible genetic variants. The main goal of these studies is to provide a better understanding of the biology of
disease, which further facilitates prevention or better treatment. A statistical inferential process is finally carried out in this study, where an association is usually observed between the single-nucleotide polymorphism (SNPs) and
the traits in a case-control setting. To detect the disease responsible loci correctly, the investigation of the statistical association should be carefully conducted along with the other necessary steps. This research provides an introductory guideline for conducting such statistical association tests for these studies using SNP genotype data.
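A minimal sketch of the basic case-control association test for one SNP, using chi-square tests on genotype and allele contingency tables; the counts are hypothetical, and real GWAS pipelines add quality control and multiple-testing correction on top of this step.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical genotype counts (AA, Aa, aa) for one SNP
table = np.array([[120,  90, 30],    # cases
                  [150,  70, 15]])   # controls

chi2, p, dof, _ = chi2_contingency(table)
print(f"genotypic test: chi2 = {chi2:.2f}, df = {dof}, p = {p:.4g}")

# Allelic test: count A and a alleles per group (two alleles per genotype)
alleles = np.array([[2 * 120 + 90, 2 * 30 + 90],
                    [2 * 150 + 70, 2 * 15 + 70]])
chi2, p, dof, _ = chi2_contingency(alleles)
print(f"allelic test:   chi2 = {chi2:.2f}, df = {dof}, p = {p:.4g}")
```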
Tapati Basak,
Nipa Roy,
2022
Extended Oracle Properties of Adaptive Lasso Estimators
Abstract
We study the asymptotic properties of adaptive lasso estimators when some components of the parameter of interest $\beta$ are strictly different from zero, while other components may be zero or may converge to zero at rate $n^{-\delta}$, with $\delta > 0$, where $n$ denotes the sample size. To achieve this objective, we analyze the convergence/divergence rates of each term in the first-order conditions of adaptive lasso estimators. First, we derive conditions that allow selecting tuning parameters so as to ensure that adaptive lasso estimates of the $n^{-\delta}$-components indeed collapse to zero. Second, in this case, we also derive asymptotic distributions of adaptive lasso estimators for the nonzero components. When $\delta > 1/2$, we obtain the usual $n^{1/2}$-asymptotic normal distribution, while when $0 < \delta \le 1/2$, we show $n^{\delta}$-consistency combined with (biased) $n^{1/2-\delta}$-asymptotic normality for the nonzero components. We call these properties Extended Oracle Properties. These results allow practitioners to exclude the asymptotically negligible variables from their model and make inferences on the asymptotically relevant variables.
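A minimal sketch of the standard two-step adaptive lasso (pilot estimate, then a weighted lasso via feature rescaling), which is the estimator whose asymptotics the paper studies; the data, weight exponent, and tuning parameter here are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

rng = np.random.default_rng(5)
n, p = 200, 8
X = rng.normal(size=(n, p))
beta = np.array([3.0, 1.5, 0, 0, 2.0, 0, 0, 0])      # sparse truth
y = X @ beta + rng.normal(size=n)

# Step 1: root-n-consistent pilot estimate for the adaptive weights
beta_init = LinearRegression(fit_intercept=False).fit(X, y).coef_
w = 1.0 / np.abs(beta_init)                           # weights with gamma = 1

# Step 2: lasso on weighted features; the penalty lambda * sum w_j |beta_j|
# becomes an ordinary lasso in theta_j = w_j * beta_j
lasso = Lasso(alpha=0.1, fit_intercept=False).fit(X / w, y)
beta_alasso = lasso.coef_ / w                         # unscale back to beta
print(np.round(beta_alasso, 3))                       # zeros on the noise coordinates
```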
Lorenzo Camponovo,
2022
Quasi-Negative Binomial: Properties, Parametric Estimation, Regression Model and Application to RNA-SEQ Data
Abstract
Background: The Poisson and negative binomial distributions are commonly used to model count data. The Poisson is characterized by the equality of mean and variance, whereas the negative binomial has a variance larger than the mean and is therefore appropriate for modeling over-dispersed count data. Objectives: A new two-parameter probability distribution called the Quasi-Negative Binomial Distribution (QNBD) is studied in this paper, generalizing the well-known negative binomial distribution. This model turns out to be quite flexible for analyzing count data. Our main objectives are to estimate the parameters of the proposed distribution and to discuss its applicability to genetics data. As an application, we demonstrate that the QNBD regression representation can be utilized to model genomics data sets. Results: The new distribution is shown to provide a good fit with respect to the Akaike Information Criterion (AIC), a measure of model goodness of fit. The proposed distribution may serve as a viable alternative to other distributions available in the literature for modeling count data exhibiting overdispersion, arising in various fields of scientific investigation such as genomics and biomedicine.
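The QNBD itself is the paper's proposal and has no off-the-shelf implementation; the sketch below only illustrates the AIC-based model-comparison workflow on over-dispersed counts, using the standard Poisson and negative binomial regressions in statsmodels as stand-ins. Data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)

# Over-dispersed counts: negative-binomial responses with one covariate
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)
y = rng.negative_binomial(n=2, p=2 / (2 + mu))        # variance exceeds the mean

X = sm.add_constant(x)
poisson = sm.Poisson(y, X).fit(disp=0)
negbin = sm.NegativeBinomial(y, X).fit(disp=0)

# Smaller AIC indicates the better-fitting count model
print(f"Poisson AIC: {poisson.aic:.1f}   NegBin AIC: {negbin.aic:.1f}")
```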
Mohamed M. Shoukri,
Maha M. Aleid,
2022
Survival Rate Analysis on Breast Cancer Cases at University College Hospital, Ibadan, Nigeria
Abstract
Breast cancer is one of the leading diseases that affect women's lives. It affects their lives in many ways, denying them the standard of health needed to carry out their daily activities for days, weeks, months, or years before eventually causing death. This research estimates the survival rate of breast cancer patients and investigates the effects of stage of tumor, gender, age, ethnic group, occupation, marital status, and type of cancer on the survival of patients. Data used for the study were extracted from the case files of patients in the Radiation Oncology Department, University College Hospital, Ibadan, using a well-structured pro forma; 74 observations were censored and 30 events occurred. The Kaplan-Meier estimator was used to estimate the overall survival probability of breast cancer patients following their recruitment into the study and to determine the mean and median survival times. Since there are different groups with respect to the stage of tumor at the time of diagnosis, the log-rank test was used to compare the survival curves across tumor stages, considering p-values below 0.05 statistically significant. Multivariate Cox regression was used to investigate the effects of selected variables on the survival of patients. The overall cumulative survival probability obtained is 0.175 (17.5%). The overall estimated mean time until death is 28.751 weeks, while the median time between admission and death is 23 weeks. As the p-value (0.000032) of the log-rank test comparing stages of tumor is less than 0.05, it is concluded that there is significant evidence of a difference in survival times across tumor stages. The survival function plot for the stages of tumor shows that patients with stage III tumors are less likely to survive. From the estimated mean time until death for each tumor stage, it was deduced that stage I patients have an increased chance of survival. Type of cancer, gender, marital status, ethnic group, occupation, and patient's age at entry into the study are not important predictors of the chances of survival.
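A minimal sketch of the analysis pipeline named above (Kaplan-Meier estimate, log-rank comparison across tumor stages, Cox regression), assuming the third-party lifelines library; the records are invented for illustration, not the UCH data.

```python
import pandas as pd
from lifelines import CoxPHFitter, KaplanMeierFitter
from lifelines.statistics import multivariate_logrank_test

# Hypothetical records: survival time (weeks), event indicator, tumor stage
df = pd.DataFrame({
    "weeks": [23, 40, 12, 56, 8, 30, 15, 70, 25, 10],
    "event": [1, 0, 1, 0, 1, 0, 1, 0, 1, 1],
    "stage": [3, 1, 3, 1, 3, 2, 2, 1, 2, 3],
})

kmf = KaplanMeierFitter().fit(df["weeks"], df["event"])
print("median survival:", kmf.median_survival_time_)

# Log-rank comparison of survival curves across tumor stages
res = multivariate_logrank_test(df["weeks"], df["stage"], df["event"])
print(f"log-rank p-value: {res.p_value:.4f}")

# Cox regression for covariate effects on the hazard
cph = CoxPHFitter().fit(df, duration_col="weeks", event_col="event")
cph.print_summary()
```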
Olatayo Olusegun Alabi,
Aminat Yetunde Saula,
Ezra Gayawan,
Hamidu Abimbola Bello,
Victor Samuel Alabi,
Rasaq Yinka Akinbo,
Taiwo Abideen Lasisi,
2022
A Comparison of the Estimators of the Scale Parameter of the Errors Distribution in the L1 Regression
Abstract
The L1 regression is a robust alternative to least squares regression whenever there are outliers in the values of the response variable or the errors follow a long-tailed distribution. To calculate the standard errors of the L1 estimators, construct confidence intervals and test hypotheses about the parameters of the model, or calculate a robust coefficient of determination, it is necessary to have an estimate of a scale parameter $\tau$. This parameter is such that $\tau^2/n$ is the variance of the median of a sample of size $n$ from the errors distribution. [1] proposed the use of $\hat{\tau}$, a consistent, and hence asymptotically unbiased, estimator of $\tau$. However, this estimator is not stable in small samples, in the sense that it can increase with the introduction of new independent variables into the model. When the errors follow the Laplace distribution, the maximum likelihood estimator of $\tau$, say $\hat{\tau}^*$, is the mean absolute error, that is, the mean of the absolute residuals. This estimator always decreases when new independent variables are added to the model. Our objective is to develop asymptotic properties of $\hat{\tau}^*$ under several error distributions analytically. We also performed a simulation study to compare the distributions of both estimators in small samples, with the objective of establishing conditions under which $\hat{\tau}^*$ is a good alternative to $\hat{\tau}$ in such situations.
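A minimal sketch of computing $\hat{\tau}^*$: L1 (least absolute deviations) regression fitted as median quantile regression, followed by the mean absolute residual. The statsmodels route and the simulated Laplace errors are illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(13)
n = 100
x = rng.normal(size=n)
y = 1 + 2 * x + rng.laplace(scale=1.0, size=n)   # long-tailed errors

X = sm.add_constant(x)
# L1 regression = quantile regression at the median
fit = sm.QuantReg(y, X).fit(q=0.5)
resid = y - fit.predict(X)

tau_star = np.mean(np.abs(resid))                # MLE of the scale under Laplace errors
print(f"L1 coefficients: {fit.params},  tau* = {tau_star:.3f}")
```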
Carmen D. Saldiva de André,
Silvia Nagib Elian,
2022
Artificial Neural Networks for COVID-19 Time Series Forecasting
Abstract
Today, the COVID-19 pandemic has become the greatest worldwide threat, as it spreads rapidly among individuals in most countries around the world. This study concerns the problem of daily prediction of new COVID-19 cases in Italy, aiming to find the best predictive model for the daily infection number in countries with a large number of confirmed cases. Finding the most accurate forecasting model would help allocate medical resources, handle the spread of the pandemic, and make health care systems better prepared. We compare the forecasting performance of linear and nonlinear models using daily COVID-19 data for the period between 22 February 2020 and 10 January 2022. We discuss various forecasting approaches, including an Autoregressive Integrated Moving Average (ARIMA) model, a Nonlinear Autoregressive Neural Network (NARNN) model, a TBATS model, and Exponential Smoothing, fitted to the data collected from 22 February 2020 to 10 January 2022, and compare their accuracy on the data collected from 26 March 2020 to 04 April 2020, choosing the model with the lowest Mean Absolute Percentage Error (MAPE) value. Since linear models do not easily follow the nonlinear patterns of daily confirmed COVID-19 cases, an Artificial Neural Network (ANN) has been successfully applied to the nonlinear forecasting problem. The model has been used for daily prediction of COVID-19 cases for the next 20 days without any additional intervention. The prediction model can be applied to other countries struggling with the COVID-19 pandemic and to any possible future pandemics.
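A minimal sketch of the ARIMA-plus-MAPE part of the comparison, on a synthetic stand-in for a daily case series (the study uses the Italian data); the series, ARIMA order, and 20-day holdout are illustrative, and the neural network competitor is omitted here.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(21)

# Synthetic wave-shaped daily series with a positive baseline
t = np.arange(300)
cases = 200 + 5000 * np.exp(-((t - 150) / 60.0) ** 2) + rng.normal(0, 50, t.size)
series = pd.Series(cases)

train, test = series[:-20], series[-20:]          # hold out the last 20 days

fit = ARIMA(train, order=(2, 1, 2)).fit()
forecast = fit.forecast(steps=20)

mape = np.mean(np.abs((test.values - forecast.values) / test.values)) * 100
print(f"ARIMA(2,1,2) 20-day MAPE: {mape:.2f}%")
```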
Lorena Saliaj,
Eugenia Nissi,
2022
On Sample Size Determination When Comparing Two Independent Spearman or Kendall Coefficients
Abstract
One of the most commonly used statistical methods is bivariate correlation analysis. However, it is usually the case that little or no attention is given to power and sample size considerations when planning a study in which correlation will be the primary analysis. In fact, when we reviewed studies published in clinical research journals in 2014, we found that none of the 111 articles that presented results of correlation analyses included a sample size justification. It is sometimes of interest to compare two correlation coefficients between independent groups. For example, one may wish to compare diabetics and non-diabetics in terms of the correlation of systolic blood pressure with age. Tools for performing power and sample size calculations for the comparison of two independent Pearson correlation coefficients are widely available; however, we were unable to identify any easily accessible tools for power and sample size calculations when comparing two independent Spearman rank correlation coefficients or two independent Kendall coefficients of concordance. In this article, we provide formulas and charts that can be used to calculate the sample size that is needed when testing the hypothesis that two independent Spearman or Kendall coefficients are equal.
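A minimal sketch of a sample size calculation for comparing two independent Spearman coefficients, using the Fisher z transformation with the Fieller-Hartley-Pearson variance approximation 1.06/(n - 3); this is one common textbook approximation, and the paper's own formulas may differ.

```python
import numpy as np
from scipy.stats import norm

def n_per_group(rs1, rs2, alpha=0.05, power=0.80):
    """Per-group n for testing H0: rho_s1 = rho_s2 (two independent Spearman
    coefficients), via Fisher's z with Var(z) approx. 1.06/(n - 3) per group."""
    z1, z2 = np.arctanh(rs1), np.arctanh(rs2)
    za, zb = norm.ppf(1 - alpha / 2), norm.ppf(power)
    n = 3 + 2 * 1.06 * ((za + zb) / abs(z1 - z2)) ** 2
    return int(np.ceil(n))

# Example: SBP-age correlation of 0.5 in diabetics vs 0.2 in non-diabetics
print(n_per_group(0.5, 0.2))   # required sample size per group
```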
Justine O. May,
Stephen W. Looney,
2022