Subject: hlda command line problem
Date: Sunday 26th February 2012 17:27:02 UTC
Hi all,

I am a newbie to mallet. My ultimate goal is to use hierarchical LDA to tag incoming text and generate hierarchical topics to categorize it accordingly. However, when I run the hlda command line as in the tutorial (with the sample-data "en" folder), I get the following result, which I do not know how to interpret:

[seg from hlda.out]
.................................................. 850
220/13 time including years united career day century mother english dust
224/9 yard equipartition theorem average energy system london parks kinetic years
177/2 rings thylacine tasmanian ring uranus system tiger extinct moons narrow
157/5 national gunnhild norway park king sullivan gilbert standards service wilderness
117/2 zinta role hindi actress film indian grossing naa ho kehna
114/2 battle union army grant gen line position men fighting maj
102/2 hawes confederate kentucky tennessee confederates commonwealth war forces ceremony bragg
65/2 paper thomas storey east newspaper edward world newsprint layout broadsheet
105/2 sunderland test echo cricket hill south australian ended innings record
[\seg]
.................................................. 900

1. I think the words are the keywords of each topic. What confuses me is what the numbers (220/13, 177/2, etc.) represent. (There are node.totalTokens and node.customers, but I do not know what those are.)

2. I tried --num-top-words 4, but it seems to generate the same result (with the same number of keywords). Am I using the wrong parameter here?

3. How can I tell which part of the input file each set of keywords corresponds to?

4. How does the number of iterations affect the result? Do I need only the result of the last iteration (in this case 1000), or the output of every iteration, to fully interpret the result?

5. If there is a straightforward way, how can I run hlda on the incoming data directly? (Could you help me describe the procedure? The input is plain text with no meaningful file names and contains different topics.) Could you tell me if the following is correct:
   1. Do the data import to convert it into a .mallet file
   2. Run the hlda command on it
   3. How do I interpret the result so that it is segmented into the different topics?

Thanks in advance.
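In case it helps with question 1, here is a minimal Python sketch of how I am currently reading the node lines from hlda.out. It rests on my own guess that each line has the form "<totalTokens>/<customers>" followed by the keywords; the names parse_node_line and NODE_RE are just my helpers, not part of mallet, and the sample is only the first three lines of the segment above.

```python
import re

# Assumed line shape (my guess, not confirmed by the mallet docs):
#   "<totalTokens>/<customers> word word word ..."
NODE_RE = re.compile(r"^\s*(\d+)/(\d+)\s+(.*)$")


def parse_node_line(line):
    """Return (total_tokens, customers, keywords), or None for other lines."""
    match = NODE_RE.match(line)
    if match is None:
        return None  # e.g. the dotted progress lines ending in 850, 900
    total_tokens, customers, words = match.groups()
    return int(total_tokens), int(customers), words.split()


# First three node lines from the output segment, plus one progress line.
sample = """\
.................................................. 850
220/13 time including years united career day century mother english dust
224/9 yard equipartition theorem average energy system london parks kinetic years
177/2 rings thylacine tasmanian ring uranus system tiger extinct moons narrow
"""

# Keep only the lines that look like topic nodes.
nodes = [parsed for parsed in map(parse_node_line, sample.splitlines()) if parsed]

for total_tokens, customers, keywords in nodes:
    print(total_tokens, customers, " ".join(keywords[:4]))
```

This only splits the fields apart; it does not tell me what the two numbers mean or which input documents sit under each node, which is what I am hoping someone can explain.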