ML&PR_4: Multinomial Variables | 2.2.

2.2. Multinomial Variables

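  • Binary variables can only describe quantities that take one of two possible values. We now consider a discrete variable that can take one of $K$ possible values. For convenience, we represent such a variable by a $K$-dimensional vector $\mathbf{x}$ in which one element $x_k$ equals $1$ and all remaining elements equal $0$.

    For instance, with $K = 6$ we represent the state $x_3 = 1$ by:

    $$\mathbf{x} = (0, 0, 1, 0, 0, 0)^{\mathrm{T}}$$

    Such vectors always satisfy:

    $$\sum_{k=1}^{K} x_k = 1$$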

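  • If we denote the probability of $x_k = 1$ by $\mu_k$, the distribution of $\mathbf{x}$ is:

    $$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}$$

    where $\boldsymbol{\mu} = (\mu_1, \dots, \mu_K)^{\mathrm{T}}$, $\mu_k \geq 0$, and $\sum_{k=1}^{K} \mu_k = 1$.

    The distribution is therefore normalized:

    $$\sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1$$

    and its expectation is:

    $$\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu})\, \mathbf{x} = (\mu_1, \dots, \mu_K)^{\mathrm{T}} = \boldsymbol{\mu}$$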

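  • Now consider a data set $\mathcal{D} = \{\mathbf{x}_1, \dots, \mathbf{x}_N\}$ of $N$ independent observations. The likelihood function is:

    $$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_{n} x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}$$

    where $m_k = \sum_{n=1}^{N} x_{nk}$ is the number of observations with $x_k = 1$.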

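  • We can consider the joint distribution of the quantities $m_1, \dots, m_K$, conditioned on the parameters $\boldsymbol{\mu}$ and the total number $N$ of observations:

    $$\mathrm{Mult}(m_1, m_2, \dots, m_K \mid \boldsymbol{\mu}, N) = \binom{N}{m_1\, m_2 \cdots m_K} \prod_{k=1}^{K} \mu_k^{m_k}$$

    This is known as the multinomial distribution, where:

    $$\binom{N}{m_1\, m_2 \cdots m_K} = \frac{N!}{m_1!\, m_2! \cdots m_K!}$$

    and note that:

    $$\sum_{k=1}^{K} m_k = N$$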
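The quantities in this section can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names (`one_hot`, `counts`, `log_likelihood`, `multinomial_pmf`) and the toy labels are hypothetical and chosen only for illustration. Note that the code indexes the $K$ states from 0, whereas the text indexes them from 1.

```python
import math
import numpy as np

def one_hot(labels, K):
    """Encode integer labels in {0, ..., K-1} as 1-of-K row vectors."""
    X = np.zeros((len(labels), K), dtype=int)
    X[np.arange(len(labels)), labels] = 1
    return X

def counts(X):
    """Counts m_k = sum_n x_nk, one entry per state."""
    return X.sum(axis=0)

def log_likelihood(X, mu):
    """log p(D | mu) = sum_k m_k * log(mu_k); assumes every mu_k > 0."""
    return float(np.sum(counts(X) * np.log(mu)))

def multinomial_pmf(m, mu):
    """Mult(m_1, ..., m_K | mu, N) with N = sum_k m_k."""
    coeff = math.factorial(int(np.sum(m)))
    for mk in m:
        # Each partial quotient is an integer, so integer division is exact.
        coeff //= math.factorial(int(mk))
    return coeff * float(np.prod(np.power(mu, m)))

# Toy data (hypothetical): N = 6 observations over K = 3 states.
labels = [2, 0, 2, 1, 2, 2]
mu = np.array([0.2, 0.3, 0.5])

X = one_hot(labels, K=3)
m = counts(X)
print("rows sum to one:", bool(np.all(X.sum(axis=1) == 1)))
print("counts m:", m)
print("log-likelihood:", log_likelihood(X, mu))
print("multinomial pmf:", multinomial_pmf(m, mu))
```

The likelihood depends on the data only through the counts `m`, which is why `log_likelihood` never needs the order of the observations.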

2.2.1. The Dirichlet distribution
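  • We now introduce a prior distribution for the parameters $\{\mu_k\}$ of the multinomial distribution. By inspection of the form of the multinomial distribution, we see that the conjugate prior is:

    $$p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}$$

    where $\boldsymbol{\alpha}$ denotes $(\alpha_1, \dots, \alpha_K)^{\mathrm{T}}$.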

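  • Normalizing $p(\boldsymbol{\mu} \mid \boldsymbol{\alpha})$ gives:

    $$\mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1) \cdots \Gamma(\alpha_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k - 1}, \qquad \alpha_0 = \sum_{k=1}^{K} \alpha_k$$

    which is called the Dirichlet distribution.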

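  • Multiplying the prior $p(\boldsymbol{\mu} \mid \boldsymbol{\alpha})$ by the likelihood function $p(\mathcal{D} \mid \boldsymbol{\mu})$, we obtain the posterior distribution up to a normalization constant:

    $$p(\boldsymbol{\mu} \mid \mathcal{D}, \boldsymbol{\alpha}) \propto p(\mathcal{D} \mid \boldsymbol{\mu})\, p(\boldsymbol{\mu} \mid \boldsymbol{\alpha}) \propto \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$

    Normalizing, we obtain another Dirichlet distribution:

    $$p(\boldsymbol{\mu} \mid \mathcal{D}, \boldsymbol{\alpha}) = \mathrm{Dir}(\boldsymbol{\mu} \mid \boldsymbol{\alpha} + \mathbf{m}) = \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1) \cdots \Gamma(\alpha_K + m_K)} \prod_{k=1}^{K} \mu_k^{\alpha_k + m_k - 1}$$

    where $\mathbf{m} = (m_1, \dots, m_K)^{\mathrm{T}}$ and $\alpha_0 = \sum_{k} \alpha_k$.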
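The conjugate update amounts to adding the observed counts to the prior parameters. The sketch below illustrates this, assuming NumPy; `dirichlet_log_pdf` and `posterior_alpha` are hypothetical helper names, and the prior and counts are made-up values. The posterior mean $\alpha_k / \alpha_0$ used at the end is a standard property of the Dirichlet distribution that is not derived in the text above.

```python
import math
import numpy as np

def dirichlet_log_pdf(mu, alpha):
    """Log density of Dir(mu | alpha); assumes mu lies strictly inside the simplex."""
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # log of Gamma(alpha_0) / (Gamma(alpha_1) ... Gamma(alpha_K))
    log_norm = math.lgamma(float(alpha.sum())) - sum(math.lgamma(float(a)) for a in alpha)
    return log_norm + float(np.sum((alpha - 1.0) * np.log(mu)))

def posterior_alpha(alpha, m):
    """Conjugate update: Dirichlet prior parameters plus observed counts."""
    return np.asarray(alpha, dtype=float) + np.asarray(m, dtype=float)

# Hypothetical example: uniform prior over K = 3 states and counts from N = 6 observations.
alpha = np.array([1.0, 1.0, 1.0])
m = np.array([1, 1, 4])

alpha_post = posterior_alpha(alpha, m)
posterior_mean = alpha_post / alpha_post.sum()  # standard Dirichlet mean, alpha_k / alpha_0

print("posterior parameters:", alpha_post)
print("posterior mean of mu:", posterior_mean)
print("log posterior density at (0.2, 0.3, 0.5):",
      dirichlet_log_pdf([0.2, 0.3, 0.5], alpha_post))
```

Because the update is purely additive, the prior parameters $\alpha_k$ can be read as pseudo-counts for the $K$ states.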

 

Reference:
  • C. M. Bishop, Pattern Recognition and Machine Learning, Section 2.2.