The text below is by Karen Grace-Martin and gives excellent, comprehensible advice. It is copied from the original source.

# When the Hessian Matrix goes Wacky

(by Karen Grace-Martin)

If you have run mixed models much at all, you have undoubtedly been haunted by some version of this very obscure warning: “The Hessian (or G or D) Matrix is not positive definite. Convergence has stopped.” Or “The Model has not Converged. Parameter Estimates from the last iteration are displayed.” What on earth does that mean?

Let’s start with some background. If you’ve never taken matrix algebra, these concepts can be overwhelming, so I’m going to simplify them into the basic issues that arise for you, the data analyst. If you’d like a more mathematical and thorough answer, see one of the references.

The D Matrix (called G by SAS) is the matrix of the variances and covariances of the random effects. The variances are listed on the diagonal of the matrix and the covariances are on the off-diagonal. So a model with a random intercept and random slope (two random effects) would have a 2×2 D matrix. The variances of the intercept and slope would be on the diagonal and their covariance would be in the off-diagonal position (the matrix is symmetric, so the same covariance appears on both sides of the diagonal).

Without getting into the math, a matrix can only be positive definite if the entries on the main diagonal are strictly positive. That condition is necessary but not sufficient: the covariances also have to be consistent with the variances. This makes sense for a D matrix, because we definitely want variances to be positive (remember variances are squared values).

The Hessian Matrix is the matrix of second derivatives of the likelihood with respect to the covariance parameters (the entries of the D Matrix), and it is used to compute the standard errors of those parameters. The iterative algorithms that estimate these parameters are pretty complex, and they get stuck if the Hessian Matrix doesn’t have those same positive diagonal entries.

The result is you can’t trust the reported results, no matter how much they look like the results you usually get. The software was unable to come up with stable estimates. It means that for some reason, the model you specified can’t be estimated properly with your data. Whatever you do, don’t ignore this warning. As cryptic as it is, it’s important.

## What do I do about it?
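Before trying the fixes that follow, it can help to see what "not positive definite" looks like for a small D matrix. The sketch below is only an illustration in plain Python, not what SAS or SPSS run internally. The function name `is_positive_definite_2x2` and the example numbers are made up. It applies Sylvester's criterion to a symmetric 2×2 matrix: the first diagonal entry and the determinant must both be positive.

```python
def is_positive_definite_2x2(var_intercept, var_slope, cov):
    """Check a symmetric 2x2 matrix [[var_intercept, cov], [cov, var_slope]].

    Sylvester's criterion: the matrix is positive definite exactly when
    the top-left entry is positive and the determinant is positive.
    """
    determinant = var_intercept * var_slope - cov * cov
    return var_intercept > 0 and determinant > 0


# Hypothetical D matrices for a random intercept and a random slope,
# written as (intercept variance, slope variance, covariance).
examples = {
    "well behaved": (4.0, 1.0, 1.0),            # determinant = 4 - 1 = 3
    "slope variance is zero": (4.0, 0.0, 0.0),  # a zero on the diagonal
    "covariance too large": (4.0, 1.0, 3.0),    # determinant = 4 - 9 = -5
}

for label, (v_int, v_slope, cov) in examples.items():
    print(label, is_positive_definite_2x2(v_int, v_slope, cov))
```

The third case is the instructive one: both variances are positive, yet the implied correlation is 3 / (2 × 1) = 1.5, which is impossible, so the matrix fails the check.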

One simple solution is to check the scaling of your predictor variables. If they’re on vastly different scales, the model might have trouble calculating variances. So if they differ by an order of magnitude (or more), you may need to simply change the scaling of a predictor.

Second, when this warning appears, you will often notice some covariance estimates are either 0 or have no estimate or no standard errors at all. (In my experience, this is almost always the cause.) This is important information. If the best estimate for a variance is 0, it means there really isn’t any variation in the data for that effect. For example, perhaps the slopes don’t really differ across individuals, and a random intercept captures all the variation.

When this happens, you need to respecify the random parts of the model. It may mean you need to remove a random effect. Sometimes even when a random effect ought to be included because of the design, there just isn’t any variation in the data. Or you may need to use a simpler covariance structure with fewer unique parameters.

Here’s an example that came up recently in consulting. The researcher surveyed parents about their kids’ experience in school. The parents were sampled within classrooms, and the design indicated including a random intercept for class, to account for the fact that parents of kids in the same class may be more similar to each other than would be the case in a simple random sample. If that were true, we’d want to estimate the variance among classrooms. But it wasn’t.

It turned out that the responses of parents from the same classroom were not any more similar than those of parents from different classrooms. The variance for classroom was 0: the model was unable to uniquely estimate any variation from classroom to classroom, above and beyond the residual variance from parent to parent.

Another option, if the design and your hypotheses allow it, is to run a population-averaged model instead of a mixed model. Population-averaged models don’t actually contain random effects, but they will account for correlations within multiple responses by individuals, and they have less strict mathematical requirements. Population-averaged models can be implemented in both SAS and SPSS by using a Repeated statement instead of a Random statement in Mixed.
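The classroom example above, where the variance estimate collapses to 0, can be mimicked with a small simulation. This is a rough sketch under stated assumptions, not the consulting analysis itself. The data are simulated, the names `simulate` and `between_class_variance` are hypothetical, and the estimator is the simple one-way ANOVA (method-of-moments) estimator for equally sized classes, truncated at 0, used as a stand-in for the likelihood-based estimates that mixed-model software reports.

```python
import random
from statistics import mean


def between_class_variance(groups):
    """ANOVA estimate of the random-intercept variance, truncated at 0.

    Assumes every group (classroom) has the same number of responses.
    """
    k = len(groups)     # number of classrooms
    n = len(groups[0])  # parents per classroom
    grand_mean = mean(x for g in groups for x in g)
    ms_between = n * sum((mean(g) - grand_mean) ** 2 for g in groups) / (k - 1)
    ms_within = sum((x - mean(g)) ** 2 for g in groups for x in g) / (k * (n - 1))
    # A negative raw estimate means classrooms differ less than chance
    # alone would suggest, so the variance is reported as 0.
    return max(0.0, (ms_between - ms_within) / n)


def simulate(class_sd, n_classes=20, n_parents=10, seed=1):
    """Simulate parent responses with a classroom effect of the given SD."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_classes):
        class_effect = rng.gauss(0.0, class_sd)
        groups.append([class_effect + rng.gauss(0.0, 1.0) for _ in range(n_parents)])
    return groups


# No real classroom effect: the estimate is 0 or close to it.
print(between_class_variance(simulate(class_sd=0.0)))
# A real classroom effect (true variance 4): the estimate is clearly positive.
print(between_class_variance(simulate(class_sd=2.0)))
```

When the first kind of result shows up in real data, that is the signal described above: the random intercept for classroom has nothing to estimate, and the random part of the model needs to be simplified.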

For more information and more options, read:

West, B., Welch, K., & Galecki, A. (2007). Linear Mixed Models: A Practical Guide Using Statistical Software. Chapman & Hall.

Long, J. S. (1997). Regression Models for Categorical and Limited Dependent Variables. Sage Publications.

Gill, J., & King, G. (2004). What to Do When Your Hessian Is Not Invertible: Alternatives to Model Specification in Nonlinear Estimation. Sociological Methods & Research, 33(1), 54-87. http://gking.harvard.edu/files/help.pdf