Gmane
From: David Campos <david.marques.campos <at> gmail.com>
Subject: Semi-supervised CRF
Newsgroups: gmane.comp.ai.mallet.devel
Date: Friday 24th February 2012 19:18:25 UTC
Hi all,
I'm training a semi-supervised CRF with CRFTrainerByLikelihoodAndGE and
OneLabelKLGEConstraints, following the code provided on the
documentation webpage (http://mallet.cs.umass.edu/semi-sup-fst.php).
Each constraint is generated in the form: IniCap B:0.5 I:0.4 O:0.1.
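
For illustration, a constraint line in the format above could be produced and sanity-checked as follows. This is a minimal sketch in plain Java, not MALLET code: the class ConstraintLine and its format method are hypothetical names, and the check that the label probabilities sum to 1 is an assumption about what a well-formed constraint should satisfy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of MALLET): builds one constraint line
// of the form "feature label:prob label:prob ...".
class ConstraintLine {

    static String format(String feature, Map<String, Double> probs) {
        double sum = 0.0;
        StringBuilder sb = new StringBuilder(feature);
        for (Map.Entry<String, Double> e : probs.entrySet()) {
            double p = e.getValue();
            // Reject values that cannot be probabilities.
            if (Double.isNaN(p) || p < 0.0 || p > 1.0) {
                throw new IllegalArgumentException(
                    "Invalid probability for label " + e.getKey() + ": " + p);
            }
            sum += p;
            sb.append(' ').append(e.getKey()).append(':').append(p);
        }
        // Assumption: the label probabilities of one constraint sum to 1.
        if (Math.abs(sum - 1.0) > 1e-9) {
            throw new IllegalArgumentException(
                "Probabilities for " + feature + " sum to " + sum);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the labels in insertion order (B, I, O).
        Map<String, Double> probs = new LinkedHashMap<>();
        probs.put("B", 0.5);
        probs.put("I", 0.4);
        probs.put("O", 0.1);
        System.out.println(format("IniCap", probs));
    }
}
```

Validating each line when it is generated makes it easier to rule out malformed constraints when the probabilities are changed automatically.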

I'm using 2000 constraints with a data alphabet of 5000 features.

Sometimes I can train the semi-supervised CRF with no problems.
However, when I change the probabilities of the constraints or use a
larger number of constraints (e.g., 3000), I get the following error:

Gathering constraints...
Creating 8 threads for updating gradient...
getValue() (loglikelihood, optimizable by label likelihood) =-859.6896206400652
Done computing lattices.
Done computing expectations.
Done computing gradient.
Done computing regularization.
GE Value = -Infinity
getValue() = -Infinity
Exception in thread "Thread-3" java.lang.AssertionError
Ended the NERO experiment.
    at cc.mallet.optimize.BackTrackLineSearch.optimize(BackTrackLineSearch.java:91)
    at cc.mallet.optimize.LimitedMemoryBFGS.optimize(LimitedMemoryBFGS.java:142)
    at cc.mallet.fst.semi_supervised.CRFTrainerByLikelihoodAndGE.train(CRFTrainerByLikelihoodAndGE.java:128)
    at pt.ua.ieeta.nero.crf.CRFModel.train(CRFModel.java:147)
    at pt.ua.ieeta.nero.crf.CRFFitnessAssessor.getFitness(CRFFitnessAssessor.java:98)
    at pt.ua.ieeta.nero.sa.SimulatedAnnealing.calculateEnergy(SimulatedAnnealing.java:214)
    at pt.ua.ieeta.nero.sa.SimulatedAnnealing.runSimulatedAnnealing(SimulatedAnnealing.java:136)
    at pt.ua.ieeta.nero.sa.SimulatedAnnealing.run(SimulatedAnnealing.java:284)
    at java.lang.Thread.run(Thread.java:680)
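
As background for the "GE Value = -Infinity" line in the log, the sketch below shows how a KL-divergence term can evaluate to negative infinity. It assumes, based only on the class name OneLabelKLGEConstraints, that the GE value behaves like a negative KL divergence between a constraint's target distribution and the model's expected label distribution. The class KlGeSketch and the example numbers are hypothetical; this is not MALLET's implementation and is not a confirmed diagnosis of the error.

```java
// Hypothetical illustration (not MALLET code) of a negative KL divergence
// between a target label distribution and a model label distribution.
class KlGeSketch {

    // Returns -KL(target || model) = sum over y of target[y] * log(model[y] / target[y]).
    static double negativeKl(double[] target, double[] model) {
        double value = 0.0;
        for (int y = 0; y < target.length; y++) {
            if (target[y] == 0.0) {
                continue; // a label with zero target mass contributes nothing
            }
            if (model[y] == 0.0) {
                // The target puts mass on a label the model gives zero
                // probability, so the term is target[y] * log(0).
                return Double.NEGATIVE_INFINITY;
            }
            value += target[y] * Math.log(model[y] / target[y]);
        }
        return value;
    }

    public static void main(String[] args) {
        double[] target = {0.5, 0.4, 0.1};       // B, I, O as in the constraint example
        double[] closeModel = {0.6, 0.3, 0.1};   // made-up model expectation
        double[] zeroModel = {0.7, 0.3, 0.0};    // made-up expectation with no mass on O

        System.out.println(negativeKl(target, closeModel)); // finite, below zero
        System.out.println(negativeKl(target, zeroModel));  // -Infinity
    }
}
```

Under that assumption, one thing to check would be whether any constrained feature has a non-zero target probability for a label that receives zero expected mass from the model.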


Any ideas?

Thanks for your help.
Cheers,
--
David Campos
PhD student
Bioinformatics Group, IEETA
University of Aveiro
www.davidcampos.org
 