Analysis of Categorical Panel Data

In some categorical tables, one of the classifying variables may be at least ordinal (ranked), arising from a follow-up or similar study. The other classifying variable(s) may separate the population into groups using variables such as gender, race or location, or a combination of them. The counts obtained this way are analyzed recognizing that one of the variables is nearly metric and should be used as such; interpretation becomes easier when an appropriate model is fitted to the resulting product multinomial. An example of such an approach is provided using data from tuberculosis management in a teaching hospital. We observed that the recovery rate of females was faster than that of their male counterparts, on the assumption that discharge through the management system follows an exponential distribution.


Introduction
Categorical data are obtained when variables that are discrete in nature are cross-classified and subjects having the same levels of the cross-classification are aggregated to form counts. Clearly, such variables are at most ordinal in nature. Variables that are purely metric are reduced appropriately so that categorical data analysis can be carried out. In a follow-up (longitudinal) study, the progression of positive outcomes is critical and should be examined.
Cross-classified data can have any of the full multinomial, hypergeometric, independent Poisson or product multinomial distributions; see Bishop, Fienberg and Holland [1], Agresti [2], Sanni and Jolayemi [3] and Adejumo [4], among many authors. All these distributions have fixed, but unknown, parameters. Each underlying distribution is dictated by the sampling scheme, even though the parameter estimates under each are identical, as demonstrated by Birch [5]; see also Jolayemi et al. [6]. It is possible, however, that the parameters involved in the categorical data have a specific pattern, especially when one or more of the categorical variables are metric with a constant interval. For such data it may be appropriate to use models for the probability outcomes. The model used, if appropriate, can then be used to determine when management should be terminated. This approach is the focus of this work.
In this research, the main objective is to examine a model-fitting algorithm for longitudinal categorical data.
Follow-up data of this form become panel data if the period for reassessment is constant.
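Consider an $r \times c$ table of counts $n_{ij}$, $i = 1, 2, \ldots, r$; $j = 1, 2, \ldots, c$, with cell probabilities $P_{ij}$, where

$$ \sum_{j=1}^{c} P_{ij} = 1, \qquad i = 1, 2, \ldots, r. \qquad (1) $$

Within the foregoing, assume the product multinomial distribution for the counts $n_{ij}$. Thus

$$ P(n_{ij}) = \prod_{i=1}^{r} \left[ \frac{n_{i.}!}{\prod_{j=1}^{c} n_{ij}!} \prod_{j=1}^{c} P_{ij}^{\,n_{ij}} \right], \qquad (2) $$

where $n_{ij}$ is as represented in 1.1, $n_{i.} = \sum_{j} n_{ij}$ is the fixed total of row $i$, and $P_{ij} \ge 0$ is the probability that a subject in row $i$ falls in column $j$. Furthermore, assume that for each $i$ the vector $P_i = (P_{i1}, \ldots, P_{ic})$ has a known or suspected pattern governed by a parameter vector $\theta_i$. The mixture model is adopted under the compelling assumption that each $\theta_i$ is unique, see Brooks et al. [7], when the variable characterizing the column is ordinal.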
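As a minimal sketch of how the product multinomial log likelihood can be evaluated for an r x c table with fixed row totals, the following may help. The function name and the small table are hypothetical and serve only as illustration; they are not the paper's data.

```python
import math

def product_multinomial_loglik(counts, probs):
    """Log probability of an r x c table under the product multinomial.

    counts[i][j] is the observed count n_ij; probs[i][j] is P_ij.
    Each row of probs is assumed to sum to 1 (row totals are fixed).
    """
    total = 0.0
    for n_row, p_row in zip(counts, probs):
        n_i = sum(n_row)
        # log multinomial coefficient: log(n_i!) - sum_j log(n_ij!)
        total += math.lgamma(n_i + 1) - sum(math.lgamma(n + 1) for n in n_row)
        # kernel: sum_j n_ij * log(P_ij); empty cells contribute nothing
        total += sum(n * math.log(p) for n, p in zip(n_row, p_row) if n > 0)
    return total

# Hypothetical 2 x 3 table, for illustration only.
counts = [[3, 2, 1], [1, 2, 3]]
probs = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
loglik = product_multinomial_loglik(counts, probs)
```

Only the kernel term depends on the cell probabilities, so the coefficient term can be dropped when two models are compared on the same table.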
The main aim of this study is to test hypotheses regarding the parameters $\theta_i$ of the pattern assumed for the cell probabilities $P_{ij}$ in row $i$. In particular, we assume in this paper that the pattern is exponential in $j$, with parameters $\theta_i$ that include the rate $\beta_i$, where $j = 1, 2, \ldots, c$ indexes the outcome of the column variable. If $\beta_i < 0$, the probability decreases over $j$, which usually indexes time or the $j$th follow-up at a constant period. Of interest here are various hypotheses regarding $\theta_i$. Some of these are:
(i) equality of the initial levels across rows, which states that all $r$ rows are identical before follow-up;
(ii) equality of the $\beta_i$ across rows, which can be interpreted as identical reactions of the $r$ subpopulations to the intervention during follow-up;
(iii) the combination of (i) and (ii) above.
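To illustrate the role of the sign of the rate parameter, the sketch below assumes one particular exponential pattern: the probability of column j is taken proportional to exp(beta * j) and normalised to sum to one over j = 1, ..., c. This normalisation and the function name are assumptions made for illustration; the paper's own parameterisation may differ.

```python
import math

def exponential_pattern(beta, c):
    """Return [P_1, ..., P_c] with P_j proportional to exp(beta * j).

    Assumed form for illustration: the weights are normalised so that
    the probabilities sum to 1 over the c follow-up periods.
    """
    weights = [math.exp(beta * j) for j in range(1, c + 1)]
    z = sum(weights)
    return [w / z for w in weights]

# A negative rate gives probabilities that decrease over follow-up periods.
probs = exponential_pattern(-0.5, 4)
```

Under this assumed form, rows sharing the same rate share the same pattern, which is the situation described by hypothesis (ii).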
Note that other forms of the pattern are possible. Such other forms include one with a quadratic term in $j$, which is used when the response is quadratic and is also used for studying medical intervention. Consider the likelihood function for $\theta$. The log likelihood $L$ under the constraint in equation (1) is then given by equation (5), where $\lambda$ is the Lagrange multiplier (enforcing the boundary limit). The likelihood ratio statistic formed from the log likelihood of equation (5) [4] has, asymptotically, the chi-square distribution with $(k - m)$ degrees of freedom, where $k$ and $m$ are the numbers of parameters estimated under $\Omega$ and $H_o$ respectively.
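As a sketch of how a Lagrange multiplier of this kind is determined, consider the simplest unstructured case, which is an assumption made here for illustration and not the paper's equation (5): a single multinomial with counts $n_j$, total $n = \sum_j n_j$, and free probabilities $P_j$ subject to $\sum_j P_j = 1$.

```latex
\begin{aligned}
L &= \sum_{j=1}^{c} n_j \log P_j + \lambda \Big( \sum_{j=1}^{c} P_j - 1 \Big), \\
\frac{\partial L}{\partial P_j} &= \frac{n_j}{P_j} + \lambda = 0
  \;\Longrightarrow\; P_j = -\frac{n_j}{\lambda}, \\
\sum_{j=1}^{c} P_j = 1 &\;\Longrightarrow\; \lambda = -n,
  \qquad \hat{P}_j = \frac{n_j}{n}.
\end{aligned}
```

The multiplier equals minus the total count entering the likelihood, so a likelihood pooled over all rows has a multiplier equal to minus the grand total.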

Estimation of Parameters
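First consider the log likelihood function of equation (5) and let the null hypothesis $H_o$ be that the row parameters are all equal. With the rows defined by gender, this is equivalent to a common model for males and females, which represents gender insensitivity. Other forms of $H_o$ can be used.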
The likelihood function $L_{H_o}$ is given by equation (6), and the normal equations (7) and (8) are obtained by differentiating it. From equation (7) it is clear that $\lambda = -n_{..}$, which simplifies equations (7) and (8). The vector of first derivatives and the entries of the Hessian matrix follow directly from these equations.
Under $\Omega$, the above procedure is carried out for each $i$.

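If the maximized log likelihood under $H_o$ is $\hat{L}_{H_o}$ and that under $\Omega$ is $\hat{L}_{\Omega}$, then the likelihood ratio $\Delta$ satisfies

$$ \log\Delta = \hat{L}_{H_o} - \hat{L}_{\Omega}, $$

and $-2\log\Delta$ is given as

$$ -2\log\Delta = 2\left( \hat{L}_{\Omega} - \hat{L}_{H_o} \right). $$

The above demonstrates how software can be produced to execute the process.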
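The procedure can be sketched in code under stated assumptions. The sketch below takes a one-parameter exponential pattern per row, with the probability of column j proportional to exp(beta * j); it fits the rate by Newton-Raphson and stops when the change in the estimate falls below a tolerance. It then forms the likelihood ratio statistic comparing a common rate for all rows with a separate rate per row. The parameterisation, function names and tables are hypothetical; with this one-parameter form the degrees of freedom are r - 1, which differs from a setup with more parameters per row.

```python
import math

def pattern(beta, c):
    """Assumed pattern: P_j proportional to exp(beta * j), j = 1..c."""
    w = [math.exp(beta * j) for j in range(1, c + 1)]
    z = sum(w)
    return [x / z for x in w]

def fit_beta(counts, tol=0.001, max_iter=100):
    """Newton-Raphson estimate of beta for one row of counts."""
    c, n = len(counts), sum(counts)
    js = range(1, c + 1)
    mean_obs = sum(j * n_j for j, n_j in zip(js, counts)) / n
    beta = 0.0
    for _ in range(max_iter):
        p = pattern(beta, c)
        mu = sum(j * pj for j, pj in zip(js, p))
        var = sum((j - mu) ** 2 * pj for j, pj in zip(js, p))
        # score = n * (mean_obs - mu), Hessian = -n * var; n cancels
        step = (mean_obs - mu) / var
        beta += step
        if abs(step) < tol:
            break
    return beta

def kernel_loglik(counts, beta):
    """Sum of n_j * log(P_j), the part of the log likelihood in beta."""
    p = pattern(beta, len(counts))
    return sum(n_j * math.log(pj) for n_j, pj in zip(counts, p))

def g_squared(table, tol=0.001):
    """-2 log(Delta): common beta (H_o) against one beta per row (Omega)."""
    pooled = [sum(col) for col in zip(*table)]
    beta_h0 = fit_beta(pooled, tol)
    ll_h0 = sum(kernel_loglik(row, beta_h0) for row in table)
    ll_omega = sum(kernel_loglik(row, fit_beta(row, tol)) for row in table)
    return 2.0 * (ll_omega - ll_h0)

# Hypothetical tables, for illustration only (not the paper's data).
g2_same = g_squared([[8, 4, 2], [8, 4, 2]])
g2_diff = g_squared([[8, 4, 2], [2, 4, 8]])
```

Under the common-rate hypothesis the log likelihood depends on the table only through its column totals, which is why the fit under $H_o$ uses the pooled row.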

Empirical Results, Discussions and Conclusions
The application of mixture models to 2-dimensional categorical data is demonstrated using a data set from disease management at a hospital, the University of Ilorin Teaching Hospital (UITH), Nigeria, spanning the period between 1996 and 1998, on the management of tuberculosis patients. The data excluded those who were lost to follow-up, so that only those who were successfully discharged were considered in Table 1, using approximated periods. The analysis of the data followed equation (2) and the imposed models in equation (4). The parameter estimates were obtained using the tolerance limit $\delta = 0.001$ for the maximum difference in the parameter estimates, as dictated by equation (13). The likelihood ratio test statistic for $H_o$, $G^2 = -2\log\Delta = 15.24$ with $d = 2$ degrees of freedom and a p-value of 0.001, indicated a bad fit. This implies that a single common distribution cannot be used for both males and females. Consequently, different models, with parameters $\theta_1$ and $\theta_2$, existed for males and females. This showed that the period of treatment was gender sensitive.
While males would be treated for seven months, their female counterparts would be treated for four months.
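The tail probability of the reported statistic can be checked with a short sketch. For a chi-square distribution with an even number of degrees of freedom the upper tail has a closed form, so no statistical library is needed. The function name is hypothetical; the inputs 15.24 and 2 are the reported test statistic and degrees of freedom.

```python
import math

def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable with even degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total

# Reported values: G^2 = 15.24 with 2 degrees of freedom.
p_value = chi2_upper_tail_even_df(15.24, 2)
```

With two degrees of freedom the tail reduces to exp(-x/2), which gives a value of roughly 0.0005 here. That is no larger than the reported 0.001 and leads to the same conclusion.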