From: Qitong Hu <huqitong <at> videosemantics.com>
Subject: hlda command line problem
Newsgroups: gmane.comp.ai.mallet.devel
Date: Sunday 26th February 2012 17:27:02 UTC
Hi all,
I am new to MALLET. My ultimate goal is to use hierarchical LDA to tag
incoming text and generate hierarchical topics so that the text can be
categorized accordingly.
However, when I run the hlda command line as in the tutorial (with the
sample-data "en" folder), I get the following result, which I do not know
how to interpret:

[seg from hlda.out]
.................................................. 850
220/13 time including years united career day century mother english dust
  224/9 yard equipartition theorem average energy system london parks kinetic years
    177/2 rings thylacine tasmanian ring uranus system tiger extinct moons narrow
    157/5 national gunnhild norway park king sullivan gilbert standards service wilderness
    117/2 zinta role hindi actress film indian grossing naa ho kehna
  114/2 battle union army grant gen line position men fighting maj
    102/2 hawes confederate kentucky tennessee confederates commonwealth war forces ceremony bragg
  65/2 paper thomas storey east newspaper edward world newsprint layout broadsheet
    105/2 sunderland test echo cricket hill south australian ended innings record
[\seg]
.................................................. 900

1. I think the words are the keywords, but what confuses me is what the
numbers (220/13, 177/2, etc.) represent. (There are node.totalTokens
and node.customers, but I do not know what those are.)
2. I tried --num-top-words 4, but it seems to generate the same result (with
the same number of keywords). Am I using the wrong parameter here?
3. How can I tell which part of the input file the generated keywords
correspond to?
4. How does the number of iterations affect the result? Do I need only the
result of the last iteration (in this case 1000), or the output of every
iteration, to fully interpret the result?
5. If there is a straightforward way, how can I run hlda on the incoming
data directly? (Could you help me describe the procedure? The input is
plain text with no meaningful file names and covers different topics.
Could you tell me if the following is correct: (a) run the data import to
convert it into a .mallet file; (b) run the hlda command on it; (c)
interpret the result, i.e. work out how it divides the input into
different topics?)
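
To make question 1 concrete, here is a minimal Python sketch of how I am
currently reading the output. It is only my guess, not anything taken from
the MALLET documentation: I assume the first number is node.totalTokens, the
second is node.customers, and each level of the tree is indented by two
spaces. The names Node and parse_hlda_tree are made up for this example. One
thing I noticed in the output above is that the second numbers of the child
nodes add up to the parent's (9 + 2 + 2 = 13, and 2 + 5 + 2 = 9), which would
fit "customers" being the number of documents whose path goes through a node.

```python
import re
from dataclasses import dataclass, field
from typing import List

# A node line looks like "  224/9 yard equipartition ...":
# leading spaces, two integers separated by "/", then the top words.
NODE_LINE = re.compile(r"^( *)(\d+)/(\d+)\s+(.*)$")


@dataclass
class Node:
    depth: int         # tree level, inferred from indentation (assumption)
    total_tokens: int  # first number; assumed to be node.totalTokens
    customers: int     # second number; assumed to be node.customers
    words: List[str] = field(default_factory=list)


def parse_hlda_tree(text: str, indent_width: int = 2) -> List[Node]:
    """Parse one printed hLDA tree into a flat list of nodes."""
    nodes: List[Node] = []
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        tokens = line.split()
        if match:
            indent, total, customers, words = match.groups()
            depth = len(indent) // indent_width
            nodes.append(Node(depth, int(total), int(customers), words.split()))
        elif nodes and tokens and all(tok.isalpha() for tok in tokens):
            # A line of plain words is treated as the wrapped tail
            # of the previous node's keyword list.
            nodes[-1].words.extend(tokens)
        # Anything else (progress dots, markers, blank lines) is ignored.
    return nodes


SAMPLE = """\
.................................................. 850
220/13 time including years united career day century mother english dust
  224/9 yard equipartition theorem average energy system london parks
kinetic years
    177/2 rings thylacine tasmanian ring uranus system tiger extinct moons
narrow
    157/5 national gunnhild norway park king sullivan gilbert standards
service wilderness
    117/2 zinta role hindi actress film indian grossing naa ho kehna
  114/2 battle union army grant gen line position men fighting maj
    102/2 hawes confederate kentucky tennessee confederates commonwealth
war forces ceremony bragg
  65/2 paper thomas storey east newspaper edward world newsprint layout
broadsheet
    105/2 sunderland test echo cricket hill south australian ended innings
record
"""

if __name__ == "__main__":
    for node in parse_hlda_tree(SAMPLE):
        print(node.depth, node.total_tokens, node.customers, " ".join(node.words[:3]))
```

If this reading is right, the sketch also rejoins keyword lines that were
wrapped in the output, so every node ends up with its full keyword list.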

Thanks in advance.
 