Robust Well-Test Interpretation by Using Nonlinear Regression With Parameter and Data Transformations

SPE Journal ◽  
2011 ◽  
Vol 16 (03) ◽  
pp. 698-712 ◽  
Author(s):  
Aysegul Dastan ◽  
Roland N. Horne

Summary Nonlinear regression is a well-established technique in well-test interpretation. However, this widely used technique is vulnerable to issues commonly observed in real data sets—specifically, sensitivity to noise, parameter uncertainty, and dependence on the starting guess. In this paper, we show significant improvements in nonlinear regression by using transformations on the parameter space and the data space. Our techniques improve the accuracy of parameter estimation substantially. The techniques also provide faster convergence, reduced sensitivity to starting guesses, automatic noise reduction, and data compression. In the first part of the paper, we show, for the first time, that Cartesian parameter transformations are necessary for correct statistical representation of physical systems (e.g., the reservoir). Using true Cartesian parameters enables nonlinear regression to search for the optimal solution homogeneously over the entire parameter space, which results in faster convergence and increases the probability of convergence for a random starting guess. Nonlinear regression using Cartesian parameters also reveals inherent ambiguities in a data set that may remain concealed by existing techniques, leading to incorrect conclusions. We proposed suitable Cartesian transform pairs for common reservoir parameters and used a Monte Carlo technique to verify that the transform pairs generate Cartesian parameters. The second part of the paper discusses nonlinear regression using the wavelet transformation of the data set. The wavelet transformation is a process that can compress and denoise data automatically. We showed that only a few wavelet coefficients are sufficient for improved performance and direct control of nonlinear regression. By using regression on a reduced wavelet basis rather than the original pressure data points, we achieved improved performance in terms of likelihood of convergence and narrower confidence intervals. 
The wavelet components in the reduced basis isolate the key contributors to the response and, hence, use only the relevant elements in the pressure-transient signal. We investigated four different wavelet strategies, which differ in the method of choosing a reduced wavelet basis. Combinations of the techniques discussed in this paper were used to analyze 20 data sets to find the technique or combination of techniques that works best with a particular data set. Using the appropriate combination of our techniques provides very robust and novel interpretation techniques, which will allow for reliable estimation of reservoir parameters using nonlinear regression.
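As a rough illustration of regressing in a reduced wavelet basis with a log-transformed ("Cartesian") parameter, the sketch below fits a hypothetical log-time drawdown model p(t) = m·ln(t) + c to noisy data using only the eight largest Haar coefficients. The model, noise level, and basis size are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np
from scipy.optimize import least_squares

def haar(x):
    """Full orthonormal Haar decomposition of a length-2^n signal."""
    out, a = [], np.asarray(x, float)
    while a.size > 1:
        out.append((a[0::2] - a[1::2]) / np.sqrt(2.0))  # detail coefficients
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)          # running approximation
    out.append(a)
    return np.concatenate(out[::-1])

# Hypothetical drawdown model p(t) = m*ln(t) + c; theta[0] = ln(m) is the
# log-transformed ("Cartesian") version of the positive slope parameter.
t = np.logspace(0.0, 3.0, 64)
rng = np.random.default_rng(0)
p_obs = 10.0 * np.log(t) + 5.0 + 0.3 * rng.normal(size=t.size)

w_obs = haar(p_obs)
keep = np.argsort(np.abs(w_obs))[-8:]  # reduced basis: 8 largest coefficients

def residual(theta):
    p_mod = np.exp(theta[0]) * np.log(t) + theta[1]
    return haar(p_mod)[keep] - w_obs[keep]   # match only the kept coefficients

fit = least_squares(residual, x0=np.zeros(2))
m_hat, c_hat = np.exp(fit.x[0]), fit.x[1]
```

Because the Haar transform is linear, regression against the kept coefficients is still a well-posed two-parameter fit, but noise concentrated in the discarded fine-scale coefficients no longer enters the objective.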

Geophysics ◽  
2000 ◽  
Vol 65 (3) ◽  
pp. 791-803 ◽  
Author(s):  
Weerachai Siripunvaraporn ◽  
Gary Egbert

There are currently three types of algorithms in use for regularized 2-D inversion of magnetotelluric (MT) data. All seek to minimize some functional which penalizes data misfit and model structure. With the most straightforward approach (exemplified by OCCAM), the minimization is accomplished using some variant of a linearized Gauss–Newton approach. A second approach is to use a descent method [e.g., nonlinear conjugate gradients (NLCG)] to avoid the expense of constructing large matrices (e.g., the sensitivity matrix). Finally, approximate methods [e.g., rapid relaxation inversion (RRI)] have been developed which use cheaply computed approximations to the sensitivity matrix to search for a minimum of the penalty functional. Approximate approaches can be very fast, but in practice often fail to converge without significant expert user intervention. On the other hand, the more straightforward methods can be prohibitively expensive to use for even moderate-size data sets. Here, we present a new and much more efficient variant on the OCCAM scheme. By expressing the solution as a linear combination of rows of the sensitivity matrix smoothed by the model covariance (the “representers”), we transform the linearized inverse problem from the M-dimensional model space to the N-dimensional data space. This method is referred to as DASOCC, the data-space OCCAM’s inversion. Since generally N ≪ M, this transformation by itself can result in significant computational savings. More importantly, the data-space formulation suggests a simple approximate method for constructing the inverse solution. Since MT data are smooth and “redundant,” a subset of the representers is typically sufficient to form the model without significant loss of detail. Computations required for constructing sensitivities and the size of matrices to be inverted can be significantly reduced by this approximation. We refer to this inversion as REBOCC, the reduced-basis OCCAM’s inversion. 
Numerical experiments on synthetic and real data sets with REBOCC, DASOCC, NLCG, RRI, and OCCAM show that REBOCC is faster than both DASOCC and NLCG, which are comparable in speed. All of these methods are significantly faster than OCCAM, but are not competitive with RRI. However, even with a simple synthetic data set, we could not always get RRI to converge to a reasonable solution. The basic idea behind REBOCC should be more broadly applicable, in particular to 3-D MT inversion.
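The model-space/data-space equivalence that DASOCC exploits can be checked numerically. The sketch below solves one linearized, regularized step both ways for a random sensitivity matrix, using an identity model covariance for brevity (a simplification; OCCAM uses a smoothing covariance and iterates on the tradeoff parameter):

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, lam = 200, 15, 0.5           # model dimension, data dimension, tradeoff
J = rng.normal(size=(N, M))        # stand-in sensitivity matrix
Cm = np.eye(M)                     # identity model covariance (simplification)
d = rng.normal(size=N)             # stand-in data vector

# Model-space step: solve the M x M system (lam*Cm^-1 + J^T J) m = J^T d
m_model = np.linalg.solve(lam * np.linalg.inv(Cm) + J.T @ J, J.T @ d)

# Data-space step: m = Cm J^T beta, where (J Cm J^T + lam I) beta = d
# -- only an N x N system, and rows of (Cm J^T) are the "representers"
beta = np.linalg.solve(J @ Cm @ J.T + lam * np.eye(N), d)
m_data = Cm @ J.T @ beta
```

The two solutions agree by the push-through identity (λC⁻¹ + JᵀJ)⁻¹Jᵀ = CJᵀ(JCJᵀ + λI)⁻¹; since N ≪ M, the data-space solve is much cheaper, and REBOCC goes further by keeping only a subset of the representers.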


2019 ◽  
Vol 115 (3/4) ◽  
Author(s):  
Douw G. Breed ◽  
Tanja Verster

Segmentation of data for the purpose of enhancing predictive modelling is a well-established practice in the banking industry. Unsupervised and supervised approaches are the two main types of segmentation, and improved performance of predictive models has been reported for both. However, each focuses on a single aspect – either target separation or independent-variable distribution – and combining them may deliver better results. This combination approach is called semi-supervised segmentation. Our objective was to explore four new semi-supervised segmentation techniques that may offer alternative strengths. We applied these techniques to six data sets from different domains and compared the model performance achieved. The original semi-supervised segmentation technique was the best for two of the data sets (as measured by the improvement in validation-set Gini), but the new techniques outperformed it on the other four. Significance: We propose four newly developed semi-supervised segmentation techniques that can be used as additional tools for segmenting data before fitting a logistic regression. In all comparisons, using semi-supervised segmentation before fitting a logistic regression improved modelling performance (as measured by the Gini coefficient on the validation data set) compared with using unsegmented logistic regression.
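The Gini coefficient used as the performance measure here is, in the credit-scoring convention, simply 2·AUC − 1 computed on the validation scores. A minimal rank-based sketch (assuming no tied scores) is:

```python
import numpy as np

def gini(y_true, y_score):
    """Credit-scoring Gini coefficient: 2*AUC - 1, via the rank-sum
    (Mann-Whitney) form of AUC. Assumes no tied scores."""
    y_true = np.asarray(y_true, float)
    order = np.argsort(y_score)
    ranks = np.empty(len(order))
    ranks[order] = np.arange(1, len(order) + 1)     # 1-based score ranks
    n_pos = y_true.sum()
    n_neg = len(order) - n_pos
    auc = (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return 2.0 * auc - 1.0
```

A perfect ranking gives Gini = 1, a perfectly inverted one gives −1; a segmented model is scored by pooling each segment's predicted probabilities and computing one Gini on the validation set.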


2014 ◽  
Vol 7 (5) ◽  
pp. 2303-2311 ◽  
Author(s):  
M. Martinez-Camara ◽  
B. Béjar Haro ◽  
A. Stohl ◽  
M. Vetterli

Abstract. Emissions of harmful substances into the atmosphere are a serious environmental concern. In order to understand and predict their effects, it is necessary to estimate the exact quantity and timing of the emissions from sensor measurements taken at different locations. There are a number of methods for solving this problem. However, these existing methods assume Gaussian additive errors, making them extremely sensitive to outlier measurements. We first show that the errors in real-world measurement data sets come from a heavy-tailed distribution, i.e., include outliers. Hence, we propose robustifying the existing inverse methods by adding a blind outlier-detection algorithm. The improved performance of our method is demonstrated on a real data set and compared to previously proposed methods. For the blind outlier detection, we first use an existing algorithm, RANSAC, and then propose a modification called TRANSAC, which provides a further performance improvement.
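TRANSAC is the authors' modification and is not specified in this abstract, but the underlying RANSAC idea can be sketched generically: fit least squares on random minimal subsets, keep the largest consensus set of inliers, and refit on it. The linear model and outlier fraction below are illustrative assumptions:

```python
import numpy as np

def ransac_lstsq(X, y, n_sample, n_iter=200, tol=1.0, seed=0):
    """Generic RANSAC: fit on random minimal subsets, keep the fit with
    the most inliers (residual below tol), then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iter):
        idx = rng.choice(len(y), size=n_sample, replace=False)
        b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
        inliers = np.abs(X @ b - y) < tol
        if best is None or inliers.sum() > best.sum():
            best = inliers
    b, *_ = np.linalg.lstsq(X[best], y[best], rcond=None)
    return b, best

# Synthetic linear source-term problem with heavy-tailed (outlier) errors
rng = np.random.default_rng(1)
A = np.column_stack([np.linspace(0.0, 10.0, 100), np.ones(100)])
y = A @ np.array([2.0, 1.0]) + 0.1 * rng.normal(size=100)
y[rng.choice(100, 10, replace=False)] += 15.0    # 10% gross outliers

b_hat, inliers = ransac_lstsq(A, y, n_sample=2, tol=0.5)
```

A plain Gaussian least-squares fit on the same data would be pulled toward the outliers; the consensus refit recovers the true coefficients because the outliers fall outside the inlier tolerance.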


2011 ◽  
Vol 14 (03) ◽  
pp. 345-356 ◽  
Author(s):  
Abeeb A. Awotunde ◽  
Roland N. Horne

Summary With recent advances in permanent downhole technology, well-test analysts now have to deal with enormous amounts of data. Using these data requires the development of efficient algorithms that are able to extract the relevant information from the data at minimal cost. We present a multiresolution wavelet approach to estimate the spatial distribution of reservoir parameters by performing the nonlinear least-squares regression in the wavelet domains of both time and space. Wavelet transforms have the ability to reveal important events in time signals or spatial images. Thus, we transformed the model space and the time-series pressure data into spatial-wavelet and time-wavelet domains, respectively, and used thresholding to select a subset of wavelet coefficients from each of the transformed domains. These subsets were used subsequently in nonlinear regression to estimate the appropriate description of reservoir parameters. The appropriate subsets are not only smaller; they also reduce the problem to only the important components of the measured data and only the part of the reservoir description that depends on them. As a test of the approach, we first applied the model to well-test problems involving 1D (radially composite) reservoir systems. The inverse problem was solved to estimate the distributed permeability values by performing the nonlinear least-squares regression in the wavelet domains (time and space). Results obtained were compared with those obtained from the conventional nonlinear-regression approach, using all the pressure-time data and the full set of spatial reservoir parameters. The time/space wavelet approach proved to be efficient. By reducing the dimensions of the model and data spaces, the approach eliminates redundancy in the reservoir description and in the data set. 
Significantly, the approach reveals the true number of reservoir parameters that can be appropriately estimated from a given data set and also reveals which components of the full data set are active in constraining the reservoir model. Thus, the approach provides a good means to integrate different data properly while avoiding the inclusion of irrelevant data during nonlinear regression.
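The coefficient-selection step can be illustrated with a plain Haar transform: hard-threshold to a small subset of coefficients and reconstruct. The signal, transform choice, and subset size below are illustrative, not the paper's actual parameterization:

```python
import numpy as np

def haar_fwd(x):
    """Forward orthonormal Haar transform of a length-2^n signal."""
    out, a = [], np.asarray(x, float)
    while a.size > 1:
        out.append((a[0::2] - a[1::2]) / np.sqrt(2.0))  # details, fine to coarse
        a = (a[0::2] + a[1::2]) / np.sqrt(2.0)
    out.append(a)
    return np.concatenate(out[::-1])   # [approx, coarsest detail, ..., finest]

def haar_inv(w):
    """Inverse of haar_fwd (exact reconstruction)."""
    a, k = w[:1].copy(), 1
    while k < w.size:
        d, x = w[k:2 * k], np.empty(2 * k)
        x[0::2] = (a + d) / np.sqrt(2.0)
        x[1::2] = (a - d) / np.sqrt(2.0)
        a, k = x, 2 * k
    return a

# Smooth stand-in "pressure" signal: hard-threshold to the 8 largest of 64
# coefficients, then reconstruct and measure the relative error.
p = np.log1p(np.linspace(0.0, 100.0, 64))
w = haar_fwd(p)
w_small = np.where(np.abs(w) >= np.sort(np.abs(w))[-8], w, 0.0)
p_rec = haar_inv(w_small)
rel_err = np.linalg.norm(p_rec - p) / np.linalg.norm(p)
```

For a smooth signal, a small subset of coefficients reconstructs the data almost exactly, which is why regression restricted to that subset loses little information while shrinking both the data and model spaces.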


2008 ◽  
Vol 2 ◽  
pp. BBI.S358 ◽  
Author(s):  
Jing Hu ◽  
Changhui Yan

α-helical transmembrane (TM) proteins play important and diverse functional roles in cells. The ability to predict the topology of these proteins is important for identifying functional sites and inferring the function of membrane proteins. This paper presents a Hidden Markov Model (referred to as HMM_RA) that can predict the topology of α-helical transmembrane proteins with improved performance. HMM_RA adopts the same structure as the HMMTOP method, which has five modules: inside loop, inside helix tail, membrane helix, outside helix tail and outside loop. Each module consists of one or multiple states. HMM_RA allows the use of reduced alphabets to encode protein sequences. Thus, each state of HMM_RA is associated with n emission probabilities, where n is the size of the reduced alphabet set. Direct comparisons using two standard data sets show that HMM_RA consistently outperforms HMMTOP and TMHMM in topology prediction. Specifically, on a high-quality data set of 83 proteins, HMM_RA outperforms HMMTOP by up to 7.6% in topology accuracy and 6.4% in α-helix location accuracy. On the same data set, HMM_RA outperforms TMHMM by up to 6.4% in topology accuracy and 2.9% in location accuracy. Comparison also shows that HMM_RA achieves performance comparable to Phobius, a recently published method.
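A sketch of the reduced-alphabet idea: map the 20 amino acids onto a smaller alphabet, then estimate each state's n emission probabilities over that alphabet. The 3-letter grouping and the state labels below are hypothetical illustrations, not HMM_RA's actual alphabets or training procedure:

```python
from collections import Counter

# Hypothetical 3-letter reduced alphabet grouping residues by rough
# physico-chemical class (h = hydrophobic, p = polar, c = charged/other).
REDUCED = {}
for aa in "ACFILMVW":
    REDUCED[aa] = "h"
for aa in "GHNQSTY":
    REDUCED[aa] = "p"
for aa in "DEKRP":
    REDUCED[aa] = "c"

def emission_probs(seq, labels, alphabet="hpc"):
    """Per-state emission probabilities over the reduced alphabet,
    estimated from a residue-labelled sequence with add-one smoothing."""
    counts = {s: Counter() for s in set(labels)}
    for aa, s in zip(seq, labels):
        counts[s][REDUCED[aa]] += 1
    return {s: {a: (c[a] + 1) / (sum(c.values()) + len(alphabet))
                for a in alphabet}
            for s, c in counts.items()}

# Toy labelling: 'M' = membrane helix, 'i' = inside loop, 'o' = outside loop
probs = emission_probs("LLIVFFAGKDESTNQ", "MMMMMMMiiiooooo")
```

With n = 3 instead of 20 symbols, each state needs far fewer emission parameters, so the probabilities can be estimated more reliably from limited labelled data; the membrane state in the toy example is dominated by the hydrophobic symbol, as expected.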


2018 ◽  
Vol 154 (2) ◽  
pp. 149-155 ◽  
Author(s):  
Michael Archer

1. Yearly records of worker Vespula germanica (Fabricius) taken in suction traps at Silwood Park (28 years) and at Rothamsted Research (39 years) are examined. 2. Using the autocorrelation function (ACF), a significant negative 1-year lag followed by a lesser non-significant positive 2-year lag was found in all, or parts of, each data set, indicating an underlying population dynamic of a 2-year cycle with a damped waveform. 3. The minimum number of years before the 2-year cycle with damped waveform was shown varied between 17 and 26, or was not found in some data sets. 4. Ecological factors delaying or preventing the occurrence of the 2-year cycle are considered.
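The ACF signature described in point 2 can be reproduced on a synthetic series: a damped 2-year cycle produces a significant negative lag-1 autocorrelation followed by a positive lag-2 value. The series below is an illustrative construction, not the trap data:

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelation function at lags 0..max_lag."""
    x = np.asarray(x, float) - np.mean(x)
    denom = float(np.dot(x, x))
    return np.array([np.dot(x[:x.size - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic 39-year wasp-count series: a damped 2-year cycle plus noise.
rng = np.random.default_rng(0)
years = np.arange(39)
counts = (100.0 + 30.0 * np.cos(np.pi * years) * np.exp(-0.02 * years)
          + 5.0 * rng.normal(size=39))
r = acf(counts, 3)
# r[1] strongly negative, r[2] positive: the damped 2-year-cycle signature
```

High and low years alternate, so adjacent years are anticorrelated (negative lag 1) while years two apart move together (positive lag 2), with the damping shrinking the lag-2 peak toward non-significance, as reported for parts of the trap records.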

