ML&PR_4: Multinomial Variables | 2.2.
2.2. Multinomial Variables
Binary variables can only describe quantities that take one of two possible values. Now I will introduce the case of $K$ possible values. For convenience, we represent such variables by a $K$-dimensional vector $\mathbf{x}$ in which one element $x_k$ equals $1$ and all remaining elements equal $0$.
For instance, with $K = 6$, we represent an observation with $x_3 = 1$ as $\mathbf{x} = (0, 0, 1, 0, 0, 0)^{\mathrm{T}}$.
Obviously we have $\sum_{k=1}^{K} x_k = 1$.
If we denote the probability of $x_k = 1$ by the parameter $\mu_k$, the distribution of $\mathbf{x}$ is:

$$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}$$

where $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)^{\mathrm{T}}$, $\mu_k \geq 0$, and $\sum_k \mu_k = 1$.
It means the distribution is normalized:

$$\sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1$$

and its mean is:

$$\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu})\, \mathbf{x} = (\mu_1, \dots, \mu_K)^{\mathrm{T}} = \boldsymbol{\mu}$$
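To make this concrete, here is a minimal numpy sketch (the value of $\boldsymbol{\mu}$ and the helper name `p_x_given_mu` are my own illustrative choices, not from the text). It evaluates $p(\mathbf{x} \mid \boldsymbol{\mu})$ for every possible one-hot vector and checks the normalization and the mean derived above.

```python
import numpy as np

mu = np.array([0.1, 0.2, 0.3, 0.4])          # assumed example parameters, sum to 1
K = len(mu)

def p_x_given_mu(x, mu):
    """p(x | mu) = prod_k mu_k**x_k for a one-hot vector x."""
    return np.prod(mu ** x)

one_hot_vectors = np.eye(K)                  # all K possible 1-of-K vectors
probs = np.array([p_x_given_mu(x, mu) for x in one_hot_vectors])

print(probs.sum())                           # 1.0: the distribution is normalized
print(one_hot_vectors.T @ probs)             # equals mu: E[x | mu] = mu
```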
Now consider a data set $\mathcal{D} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ of $N$ independent observations. We can show that the likelihood function is:

$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\left(\sum_n x_{nk}\right)} = \prod_{k=1}^{K} \mu_k^{m_k}$$

where $m_k = \sum_{n=1}^{N} x_{nk}$ is the number of observations for which $x_k = 1$; these counts are the sufficient statistics for this distribution.
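The sufficient statistics $m_k$ are simple counts, as the following sketch illustrates (the sample size, seed, and variable names are assumed for illustration). It draws $N$ one-hot observations, forms $m_k = \sum_n x_{nk}$, and evaluates the log-likelihood $\sum_k m_k \ln \mu_k$.

```python
import numpy as np

rng = np.random.default_rng(0)               # assumed seed for reproducibility
mu = np.array([0.1, 0.2, 0.3, 0.4])          # assumed true parameters
N = 1000

# N draws of a 1-of-K variable, stored as one-hot rows x_{nk}
data = rng.multinomial(1, mu, size=N)

m = data.sum(axis=0)                         # sufficient statistics m_k = sum_n x_{nk}
log_lik = np.sum(m * np.log(mu))             # log p(D | mu) = sum_k m_k log mu_k
print(m, log_lik)
```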
We can also consider the joint distribution of the quantities $m_1, \dots, m_K$, conditioned on the parameters $\boldsymbol{\mu}$ and the total number $N$ of observations:

$$\mathrm{Mult}(m_1, \dots, m_K \mid \boldsymbol{\mu}, N) = \binom{N}{m_1\, m_2 \cdots m_K} \prod_{k=1}^{K} \mu_k^{m_k}$$

This formula is known as the multinomial distribution, where the normalization coefficient is the number of ways of partitioning $N$ objects into $K$ groups of sizes $m_1, \dots, m_K$:

$$\binom{N}{m_1\, m_2 \cdots m_K} = \frac{N!}{m_1!\, m_2! \cdots m_K!}$$

and note that the variables $m_k$ are subject to the constraint:

$$\sum_{k=1}^{K} m_k = N$$
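As a sanity check on the formula (the counts used here are an assumed example), the sketch below evaluates the multinomial probability both directly from the coefficient $N!/(m_1! \cdots m_K!)$ and via `scipy.stats.multinomial`.

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

mu = np.array([0.1, 0.2, 0.3, 0.4])          # assumed parameters
m = np.array([2, 1, 3, 4])                   # assumed example counts, sum_k m_k = N
N = int(m.sum())

coef = factorial(N) / np.prod([factorial(int(mk)) for mk in m])   # N! / (m_1! ... m_K!)
direct = coef * np.prod(mu ** m)                                  # Mult(m | mu, N) from the formula
via_scipy = multinomial.pmf(m, n=N, p=mu)                         # same quantity via scipy

print(direct, via_scipy)                     # the two values agree
```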
2.2.1. The Dirichlet distribution
We now introduce a prior distribution for the parameters $\boldsymbol{\mu}$ of the multinomial distribution. By inspection of the form of the multinomial distribution, we see that the conjugate prior is given by:

$$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$

where $0 \leq \mu_k \leq 1$, $\sum_k \mu_k = 1$, and $\boldsymbol{\alpha}$ denotes $(\alpha_1, \dots, \alpha_K)^{\mathrm{T}}$.
We can normalize this prior to obtain:

$$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$

where $\Gamma(\cdot)$ is the gamma function and $\alpha_0 = \sum_{k=1}^{K} \alpha_k$. This is called the Dirichlet distribution.
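A short sketch of working with this prior in practice (the concentration values $\boldsymbol{\alpha}$ are an assumed example): it draws samples from $\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha})$ and evaluates the density with `scipy.stats.dirichlet`, confirming that samples lie on the probability simplex.

```python
import numpy as np
from scipy.stats import dirichlet

alpha = np.array([2.0, 3.0, 4.0, 5.0])       # assumed concentration parameters

samples = dirichlet.rvs(alpha, size=5, random_state=0)
print(samples.sum(axis=1))                   # each sample sums to 1 (lies on the simplex)

mu = np.array([0.1, 0.2, 0.3, 0.4])          # a point on the simplex
print(dirichlet.pdf(mu, alpha))              # density Dir(mu | alpha) at that point
```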
Multiplying the prior by the likelihood function $p(\mathcal{D} \mid \boldsymbol{\mu})$, we obtain the posterior distribution for $\boldsymbol{\mu}$:

$$p(\boldsymbol{\mu} \mid \mathcal{D}, \boldsymbol{\alpha}) \propto p(\mathcal{D} \mid \boldsymbol{\mu})\, p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$
Normalizing this expression, we obtain another Dirichlet distribution:

$$p(\boldsymbol{\mu} \mid \mathcal{D}, \boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha} + \mathbf{m}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1) \cdots \Gamma(\alpha_K + m_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$

where $\mathbf{m} = (m_1, \dots, m_K)^{\mathrm{T}}$.
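Conjugacy makes the Bayesian update trivial in code, as this minimal sketch shows (all variable names and values are illustrative assumptions): the posterior parameters are just $\alpha_k + m_k$, and the posterior mean of $\mu_k$ is $(\alpha_k + m_k)/(\alpha_0 + N)$.

```python
import numpy as np

alpha = np.array([2.0, 3.0, 4.0, 5.0])           # prior Dir(mu | alpha), assumed values
m = np.array([10, 25, 30, 35])                   # observed counts m_k, assumed values

alpha_post = alpha + m                           # posterior is Dir(mu | alpha + m)
posterior_mean = alpha_post / alpha_post.sum()   # E[mu_k | D] = (alpha_k + m_k) / (alpha_0 + N)

print(alpha_post)
print(posterior_mean)
```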
Reference:
- Section 2.2 | Pattern Recognition and Machine Learning | C. M. Bishop.