P. Chen
13 records found
Battery lifetime prediction is crucial in industrial applications. However, the lack of diversity in training data often poses challenges regarding the robustness and generalization of lifetime predictions for batteries from different batches. Motivated by the early cycle data from lithium-ion batteries, this article proposes a robust transfer learning method by employing a model average framework, where the weights are determined based on the distance between the source domain and the target domain. Kernel regression is used to build the prediction of battery lifetime using early cycle data, and transfer component analysis is utilized to transfer knowledge between different domains. The case study on lithium-ion phosphate/graphite cells demonstrates that the proposed method can mitigate the impact of negative transfer and has superior performance compared to traditional methods.
Process capability analysis plays a critical role in quality control by evaluating how well manufacturing processes meet defined specifications. However, traditional process capability indices (PCIs) rely on assumptions of symmetric tolerances and normally distributed data, which often do not hold in real-world applications and can lead to misleading conclusions. To overcome these limitations, we propose two novel classes of PCIs designed specifically for asymmetric tolerances, complemented by parametric estimation procedures and asymptotic confidence limits. To address the issue of non-normal data, we further employ an inverse transformation via constrained B-spline regression, which removes the need for the normality assumption. We demonstrate that our proposed PCIs reduce to traditional indices under symmetric conditions and normal data while extending applicability to a broader range of cases. Numerical simulations and a real-world application in an electronics company confirm the effectiveness and practical utility of our approach.
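For orientation, the traditional indices that the abstract above refers to can be sketched as follows. This is a minimal illustration of the classical C_p and C_pk under the usual normality and symmetric-tolerance assumptions, not the asymmetric-tolerance indices proposed in the paper; the function name `cp_cpk` and the sample values are hypothetical.

```python
from statistics import mean, stdev


def cp_cpk(data, lsl, usl):
    """Classical process capability indices (illustrative helper, not from the paper).

    Cp  = (USL - LSL) / (6 * sigma)
    Cpk = min(USL - mu, mu - LSL) / (3 * sigma)
    where mu and sigma are the sample mean and sample standard deviation.
    """
    mu = mean(data)
    sigma = stdev(data)  # sample standard deviation (n - 1 denominator)
    cp = (usl - lsl) / (6.0 * sigma)
    cpk = min(usl - mu, mu - lsl) / (3.0 * sigma)
    return cp, cpk


# Hypothetical measurements centred at 10 with unit sample standard deviation.
measurements = [9.0, 10.0, 11.0]
centred = cp_cpk(measurements, lsl=4.0, usl=16.0)      # process mean at the midpoint
off_centre = cp_cpk(measurements, lsl=7.0, usl=16.0)   # process mean closer to the LSL
```

When the process mean sits at the midpoint of the specification limits the two indices coincide; once it drifts toward one limit, C_pk falls below C_p.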
The multicategory support vector machine (MSVM) has been widely used for multicategory classification. Despite its widespread popularity, regular MSVM cannot provide direct probabilistic results and suffers from excessive computational cost, as it is formulated on the hinge loss function and it solves a sum-to-zero constrained quadratic programming problem. In this study, we propose a general refinement of regular MSVM, termed the simplex-based proximal MSVM (SPMSVM). Our SPMSVM uses a novel family of squared error loss functions in place of the hinge loss and it removes the explicit sum-to-zero constraint by the simplex structure. Consequently, the SPMSVM only requires solving an unconstrained linear system, leading to closed-form solutions. In addition, the SPMSVM can be cast into a weighted regression problem so that it is scalable for large-scale applications. Moreover, the SPMSVM naturally yields an estimate of the conditional category probability, which is more informative than regular MSVM. Theoretically, the SPMSVM is shown to include many existing MSVMs as its special cases, and its asymptotic and finite-sample statistical properties are well established. Simulations and real examples show that the proposed SPMSVM is a stable, scalable and competitive classifier.
Multinomial logistic regression models are popular in multicategory classification analysis, but existing models suffer from several intrinsic drawbacks. In particular, the parameters cannot be determined uniquely because of over-specification. Although additional constraints have been imposed to refine the model, such modifications can be inefficient and complicated. In this paper, we propose a novel and efficient simplex-based multinomial logistic regression technique, seamlessly connecting binomial and multinomial cases under a unified framework. Compared with existing models, our model has fewer parameters, is free of any constraints, and can be solved efficiently using the Fisher scoring algorithm. In addition, the proposed model enjoys several theoretical advantages, including Fisher consistency and sharp comparison inequality. Under mild conditions, we establish the asymptotic normality and convergence for the new model, even when the numbers of categories and covariates increase with the sample size. The proposed framework is illustrated by means of extensive simulations and real applications.
Return of products within the warranty coverage induces additional cost and loss of reputation to manufacturers. It is of practical interest to predict the return rate by experimental means before introducing a product to the market. In this paper, we propose to optimize accelerated reliability tests to achieve the goal within limited time. To describe the heterogeneity in the customers’ usage mode, a discrete random variable is employed to model the degradation rate in addition to the continuous stress variable. To further characterize the heterogeneity in the customers’ behavior, two models of product return are investigated: one assumes that customers return products once the degradation level reaches the minimum eligible return threshold and the other assumes that the threshold varies among different customers. Optimal reliability tests are planned under the large-sample assumption with two novel test schemes: global optimal planning and stress constrained planning. Insights regarding the optimal plans are gleaned to improve the test planning procedure and verify the optimality. A real example from the battery industry is then presented along with the simulation study and sensitivity analysis to demonstrate the methods. We find that the randomness in return level results in different test plans. Furthermore, the constrained optimal plans are more robust than the compromise plans.
Failure rate reduction models have been widely used to model recurrent failure data because of their ability to quantify repair effects. Despite their widespread popularity, there have been limited studies on statistical inference of most failure rate reduction models. In view of this fact, this study proposes a semiparametric estimation framework for a general class of such models, called extended geometric failure rate reduction (EGFRR) models. Covariates are considered in our analysis and their effects are modeled as a log-linear factor on the baseline failure rate. Unlike the existing inference methods for the EGFRR models that assume the failure data are censored at a fixed number of failures, our study considers covariates and time-censoring, which are more common in practice. The semiparametric maximum likelihood (ML) estimators are obtained by carefully constructing the likelihood function. Asymptotic properties including consistency and weak convergence of the ML estimators are established by using the properties of the martingale process. In addition, we show that the semiparametric estimators are asymptotically efficient. A real example from the automobile industry illustrates the usefulness of the proposed framework and extensive simulations show its outstanding performance compared with the existing methods.
A comprehensive toolbox for the gamma distribution: The gammadist package
The gamma distribution is one of the most important parametric models in probability theory and statistics. Although a multitude of studies have theoretically investigated the properties of the gamma distribution in the literature, there is still a serious lack of tailored statistical tools to facilitate its practical applications. To fill the gap, this paper develops a comprehensive R package for the gamma distribution. Specifically, the R package focuses on three important tasks: generating gamma random variables, estimating the model parameters, and constructing statistical limits, including confidence limits, prediction limits, and tolerance limits based on the gamma random variables. The proposed package encompasses the state-of-the-art methods for the gamma distribution in the literature and its usage is illustrated by a real application.
This study proposes a framework to analyze accelerated degradation testing (ADT) data in the presence of inspection effects. Motivated by a real dataset from the electric industry, we study two types of effects induced by inspections. After each inspection, the system degradation level instantaneously reduces by a random value. Meanwhile, the degradation rate is elevated afterwards. Considering the absence of observations due to practical reasons, we employ the expectation–maximization (EM) algorithm to analytically estimate the unknown parameters in a stepwise Wiener degradation process with covariates. Moreover, to maintain the level of generality for the adaptation of the method in various scenarios, a confidence density approach is utilized to hierarchically estimate the parameters in the acceleration link function. The proposed methods can provide efficient parameter estimation under complex link functions with multiple stress factors. Further, confidence intervals are derived based on the large-sample approximation. Simulation studies and a case study from Schneider Electric are used to illustrate the proposed methods. The results show that the proposed model yields a remarkably better fit to the Schneider data in comparison to the conventional Wiener ADT model.
Coronavirus disease-2019 (COVID-19) poses a significant threat to the population and urban sustainability worldwide. Surge mitigation is complicated and involves many factors, including the pandemic status, policy, socioeconomics and resident behaviours. Modelling and analytics with spatial-temporal big urban data are required to assist the mitigation of the pandemic. This study proposes a novel perspective to analyse the spatial-temporal potential exposure risk of residents by capturing human behaviours based on spatial-temporal car park availability data. Near real-time data from 1,904 residential car parks in Singapore, a classical megacity, are collected to analyse car mobility and its spatial-temporal heat map. The implementation of the circuit breaker, a COVID-19 measure, in Singapore has reduced the mobility and heat (daily frequency of mobility) significantly, by about 30.0%. It contributes to a 44.3%–55.4% reduction in the transportation-related air emissions under two scenarios of travelling distance reductions. Urban sustainability impacts in both environment and economy are discussed. The spatial-temporal potential exposure risk mapping with space-time interactions is further investigated via an extended Bayesian spatial-temporal regression model. The maximal reduction rate of the defined potential exposure risk lowers to 37.6% by comparison with its peak value. The big data analytics of changes in car mobility behaviour and the resultant potential exposure risks can provide insights to assist in (a) designing a flexible circuit breaker exit strategy, (b) precise management via identifying and tracing hotspots on the mobility heat map, and (c) making timely decisions by fitting curves dynamically in different phases of COVID-19 mitigation. The proposed method has the potential to be used by decision-makers worldwide with available data to make flexible regulations and planning.
The performance of units in the same batch can exhibit considerable heterogeneity due to the variation in the raw materials and fluctuation in the manufacturing process. For products suffering performance degradation in their use, such heterogeneity often results in an increase in the dispersion of the degradation paths of units in a population. The degradation rate of products can be unit-specific and is often treated as a random effect. This paper develops a novel random-effects Wiener process model to account for the unit-to-unit heterogeneity in the degradation, where the generalized inverse Gaussian (GIG) distribution is used to model the unit-specific degradation rate. The GIG distribution is a very general distribution with broad applications, which includes the inverse Gaussian (IG) distribution and the Gamma distribution as special cases. We investigate the model properties and develop an expectation maximization (EM) algorithm for parameter estimation. By comparing the proposed model with existing models on two real degradation datasets of the infrared LEDs and the GaAs lasers, we show that the proposed model is quite effective for degradation modeling with heterogeneous rates.
Dengue is endemic in Singapore, with year-round presence. In the recent years 2013, 2014, and 2016, there were several severe dengue outbreaks, posing a serious threat to public health. To proactively control and mitigate the disease spread, early warnings of dengue outbreaks, during which dengue incidence spreads rapidly and on a large scale, are extremely helpful. In this study, a two-step framework is proposed to predict dengue outbreaks and it is evaluated based on the dengue incidences in Singapore during 2012 to 2017. First, a generalized additive model (GAM) is trained based on the weekly dengue incidence data during 2006 to 2011. The proposed GAM is a one-week-ahead forecasting model, and it inherently accounts for the possible correlation among the historical incidence data, making the residuals approximately normally distributed. Then, an exponentially weighted moving average (EWMA) control chart is proposed to sequentially monitor the weekly residuals during 2012 to 2017. Our investigation shows that the proposed two-step framework is able to give persistent signals at the early stage of the outbreaks in 2013, 2014, and 2016, which provides early alerts of outbreaks and buys time for the early interventions and the preparation of necessary public health resources. In addition, extensive simulations show that the proposed method is comparable to other potential outbreak detection methods and it is robust to the underlying data-generating mechanisms.
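The second step of the framework above, monitoring residuals with an EWMA control chart, can be sketched as follows. This is a generic textbook EWMA chart with a one-sided upper limit, not the authors' implementation; the function name `ewma_chart`, the smoothing weight `lam = 0.2`, the limit width `L = 3`, and the toy residuals are all assumptions chosen for illustration.

```python
from math import sqrt


def ewma_chart(residuals, lam=0.2, L=3.0, mu0=0.0, sigma=1.0):
    """One-sided upper EWMA chart (illustrative sketch; parameters are assumptions).

    z_t   = lam * r_t + (1 - lam) * z_{t-1},  with z_0 = mu0
    UCL_t = mu0 + L * sigma * sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
    Returns the EWMA values, the upper limits, and the 0-based indices that signal.
    """
    z = mu0
    ewma, limits, signals = [], [], []
    for t, r in enumerate(residuals, start=1):
        z = lam * r + (1.0 - lam) * z
        ucl = mu0 + L * sigma * sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t)))
        ewma.append(z)
        limits.append(ucl)
        if z > ucl:  # signal only on upward shifts, as for outbreak detection
            signals.append(t - 1)
    return ewma, limits, signals


# Toy standardized residuals: five in-control weeks, then a sustained upward shift.
toy_residuals = [0.0] * 5 + [3.0] * 5
ewma, limits, signals = ewma_chart(toy_residuals)
```

Because the EWMA accumulates evidence across weeks, a sustained shift produces a run of consecutive signals rather than a single exceedance, which is the kind of persistent signal the abstract describes.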
The remaining useful lifetime (RUL) estimated from the in-situ degradation data has been shown to be useful for online predictive maintenance. In the literature, the RUL is often estimated by assuming a soft-failure threshold for the degradation data. In practice, however, systems may not be subject to the degradation-induced soft failures. Instead, the systems are deemed to have failed when they cannot perform the intended function, and such failures are known as hard failures. Because there are no fixed thresholds for hard failures, the corresponding RUL estimation is not an easy task, which causes difficulties in finding the optimal maintenance schedule. In this study, a Weibull proportional hazards model is proposed to jointly model the degradation data and the failure time data. The degradation data are treated as the time-varying covariates so that the degradation does not directly lead to system failures, but increases the hazard rate of hard failures. A random-effects Wiener process is proposed to model the degradation data by considering the system heterogeneities. Based on the developed proportional hazards model, closed-form distribution of the RUL is derived upon each inspection and the optimal maintenance schedule is then obtained by minimizing the system maintenance cost. The proposed maintenance strategy is successfully applied to predictive maintenance of lead-acid batteries.
Field data provide important information about product quality and reliability. Many large organizations have developed ambitious reliability databases to trace field failure data of a variety of components on the systems they operate and maintain. Due to the exponential distribution assumption for the component lifetimes, the data in these databases are often aggregated. Specifically, individual lifetimes of the components are not available. Instead, each recorded data point is the cumulative operating time of one component position from system installation to the last component replacement, and the number of replacements in between. In the literature, the gamma distribution and the inverse Gaussian (IG) distribution have been used to fit the aggregate data, while the operating environment of different systems is often assumed the same. In order to capture possible heterogeneities among the systems, this study proposes the gamma random effects model and the IG random effects model. The expectation-maximization algorithm is used for point estimation of the parameters and an algorithm based on the generalized fiducial inference method is proposed for interval estimation. Simulation studies are conducted to assess the performance of the proposed inference methods. A real aggregate dataset is used for illustration.