From: Laura Dietz <dietz <at> cs.umass.edu>
Subject: Re: Topics over time questions
Newsgroups: gmane.comp.ai.mallet.devel
Date: Friday 24th February 2012 19:26:14 UTC (over 5 years ago)
Hi Dan, Hi Corey,

As timestamps t_{di} are drawn from a Beta distribution, they have to be
normalized to [0,1] (see Step 2c in the generative process).
The Beta distribution has some configurations for which the probability
of sampling values near 0 or 1 is actually fairly high (Wikipedia has
some examples). The hope is that the parameters are learned to capture
whether a topic is hot near the ends of your time range.
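
A minimal sketch of that normalization, combined with the small-epsilon
clamp described in the quoted reply further down. The function name and
the default eps are illustrative assumptions, not taken from the paper or
from MALLET. The clamp matters because the Beta density at exactly 0 or 1
can be zero or unbounded, depending on the shape parameters.

```python
def normalize_timestamps(timestamps, eps=1e-5):
    """Min-max scale timestamps to [0, 1], then clamp to [eps, 1 - eps].

    Hypothetical helper: the name and the eps default are assumptions.
    """
    lo, hi = min(timestamps), max(timestamps)
    if hi == lo:
        # All documents share one timestamp; put them mid-range (assumption).
        return [0.5 for _ in timestamps]
    scaled = [(t - lo) / (hi - lo) for t in timestamps]
    # Keep values strictly inside (0, 1) so the Beta density stays
    # finite and nonzero at the ends of the time range.
    return [min(max(s, eps), 1.0 - eps) for s in scaled]


years = [2000, 2005, 2010]
normalized = normalize_timestamps(years)
```

Here the earliest and latest documents end up at eps and 1 - eps rather
than at exactly 0 and 1.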

Method of moments estimators gave me quite a headache in terms of
stability/robustness. Sometimes I get extreme values that eventually
result in NaN. I switched to one of the other hyperparameter
estimation methods (e.g. Tom Minka's histogram method), which are also
part of MALLET.
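
For reference, a hedged sketch of the method of moments update for a
Beta distribution, with explicit guards for the degenerate cases instead
of letting them turn into negative values or NaN. The function name, the
min_var threshold, the use of the unbiased (n - 1) sample variance, and
the choice to return None are all assumptions made for illustration.

```python
def beta_method_of_moments(samples, min_var=1e-12):
    """Estimate Beta shape parameters (alpha, beta) from values in (0, 1).

    Returns None when the estimate is undefined or invalid, so the
    caller can fall back (e.g. keep the previous parameters).
    """
    n = len(samples)
    if n < 2:
        return None
    mean = sum(samples) / n
    # Unbiased sample variance (assumption: n - 1 denominator).
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    if var < min_var:
        # (Near-)zero variance would cause a division by zero below.
        return None
    common = mean * (1.0 - mean) / var - 1.0
    if common <= 0.0:
        # Variance >= mean * (1 - mean): shape parameters would be <= 0.
        return None
    return mean * common, (1.0 - mean) * common


ok = beta_method_of_moments([0.2, 0.4, 0.6, 0.8])
same_doc = beta_method_of_moments([0.5, 0.5, 0.5])
too_spread = beta_method_of_moments([0.0, 1.0])
```

The first call gives a valid pair of shape parameters, while the other
two hit the zero-variance and too-large-variance guards and return None.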

Cheers,
Laura


On 2/24/12 2:15 PM, dan wrote:
>
> On Fri, Feb 24, 2012 at 9:33 AM, Corey Arnold wrote:
>
>     I am aware there is no implementation of Topics Over Time (Wang and
>     McCallum, 2006) in MALLET, but I thought this may be a good place to
>     ask questions about it nonetheless.
>
>     1. The paper does not provide much detail on how document timestamps
>     are normalized. My thought was that they are scaled to [0,1], but I am
>     then unsure of how to handle documents with 0 and 1 timestamps so that
>     they have some probability.
>
> For this, I just moved those timestamps a small fixed epsilon away from
> 0 and 1. For example, I set any timestamp equal to 0 to 0.00001 and any
> timestamp equal to 1.0 to 0.99999.
>  
>
>
>     2. When updating the parameters for the beta distribution using the
>     method of moments I get negative values for seemingly reasonable
>     average timestamps and variances. Have others run into this? Would
>     someone recommend an alternate parameterization?
>
> The method of moments fails in two cases:
> 1) when the variance becomes 0, the method of moments calculation
>     has a division by zero. This is actually fairly common during the
>     early stages of inference, in the case where all of the tokens
>     assigned to a topic end up coming from the same document.
> 2) when the variance is at least mean * (1 - mean), the MOM produces
>     non-positive estimates for the shape parameters (which is invalid
>     for the Beta distribution).
>  
>
>
> --dan
>
>
>     Thank you,
>     Corey