FreeLing 3.0
The probabilities class assigns a lexical probability to each PoS tag of each word in a sentence.
#include <probabilities.h>


Public Member Functions | |
| probabilities (const std::wstring &, const std::wstring &, double) | |
| Constructor. | |
| void | annotate_word (word &) |
| Assign probabilities for each analysis of given word. | |
| void | set_activate_guesser (bool) |
| Turn guesser on/off. | |
| void | analyze (sentence &) |
| Assign probabilities to tags for each word in sentence. | |
| void | analyze (std::list< sentence > &) |
| Assign probabilities to tags for each word in sentences. | |
| sentence | analyze (const sentence &) |
| Assign probabilities to tags for each word in sentence, return copy. | |
| std::list< sentence > | analyze (const std::list< sentence > &) |
| Assign probabilities to tags for each word in sentences, return copy. | |
Private Member Functions | |
| void | smoothing (word &) |
| Smooth probabilities for the analysis of given word. | |
| double | compute_probability (const std::wstring &, double, const std::wstring &) |
| Compute p(tag|suffix) using recursively shorter suffixes. | |
| double | guesser (word &, double) |
| Guess possible tags, keeping some mass for previously assigned tags. | |
Private Attributes | |
| boost::u32regex | RE_PunctNum |
| Auxiliary regexps. | |
| double | ProbabilityThreshold |
| Probability threshold for tags assigned to unknown words. | |
| std::wstring | Language |
| double | BiassSuffixes |
| Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words. | |
| double | LidstoneLambda |
| lambda parameter for smoothing via Lidstone's Law | |
| bool | activate_guesser |
| whether to use guesser for unknown words. | |
| std::map< std::wstring, double > | single_tags |
| unigram probabilities | |
| std::map< std::wstring, std::map< std::wstring, double > > | class_tags |
| probabilities for usual ambiguity classes | |
| std::map< std::wstring, std::map< std::wstring, double > > | lexical_tags |
| lexical probabilities for frequent words | |
| std::map< std::wstring, double > | unk_tags |
| list of tags and probabilities to assign to unknown words | |
| std::map< std::wstring, std::map< std::wstring, double > > | unk_suffs |
| list of tag frequencies for unknown word suffixes | |
| double | theeta |
| smoothing parameter for unknown-word suffixes | |
| std::wstring::size_type | long_suff |
| length of longest suffix | |
| probabilities::probabilities | ( | const std::wstring & | Lang, | const std::wstring & | probFile, | double | Threshold | ) |
Constructor.
Creates a probability assignment module and loads the given probabilities file (probFile).
References ERROR_CRASH, util::open_utf8_file(), RE_FZ, TRACE, and util::wstring2double().
| void probabilities::analyze | ( | sentence & | se | ) | [virtual] |
Assign probabilities to tags for each word in sentence.
Annotate probabilities for each analysis of each word in given sentence, using given options.
Implements processor.
References TRACE_SENTENCE.
Referenced by maco::analyze().
| void probabilities::analyze | ( | std::list< sentence > & | ) |
Assign probabilities to tags for each word in sentences.
Reimplemented from processor.
| sentence probabilities::analyze | ( | const sentence & | s | ) |
Assign probabilities to tags for each word in sentence, return copy.
Add probabilities to words in given sentence, return copy.
Reimplemented from processor.
References processor::analyze().
| std::list<sentence> probabilities::analyze | ( | const std::list< sentence > & | ) |
Assign probabilities to tags for each word in sentences, return copy.
Reimplemented from processor.
| void probabilities::annotate_word | ( | word & | w | ) |
Assign a probability to each analysis of the given word.
References word::find_tag_match(), word::found_in_dict(), word::get_form(), word::get_n_analysis(), word::has_retokenizable(), word::select_all_analysis(), and TRACE.
| double probabilities::compute_probability | ( | const std::wstring & | tag, | double | prob, | const std::wstring & | s | ) | [private] |
Compute p(tag|suffix) using recursively shorter suffixes.
Compute probability of a tag given a word suffix.
References util::double2wstring(), and TRACE.
| double probabilities::guesser | ( | word & | w, | double | mass | ) | [private] |
Guess possible tags, keeping some mass for previously assigned tags.
References word::add_analysis(), util::double2wstring(), word::get_lc_form(), word::get_n_analysis(), analysis::get_short_tag(), word::set_analysis(), analysis::set_prob(), and TRACE.
| void probabilities::set_activate_guesser | ( | bool | b | ) |
Turn guesser on/off.
| void probabilities::smoothing | ( | word & | w | ) | [private] |
Smooth probabilities for the analysis of given word.
If using backoff, combine with suffix information to get a better estimate.
References word::get_form(), word::get_lc_form(), word::get_n_analysis(), TRACE, and WARNING.
bool probabilities::activate_guesser [private] |
whether to use guesser for unknown words.
double probabilities::BiassSuffixes [private] |
Interpolation factor to favor suffix probabilities versus ambiguity-class probabilities when smoothing known but unobserved words.
std::map<std::wstring,std::map<std::wstring,double> > probabilities::class_tags [private] |
probabilities for usual ambiguity classes
std::wstring probabilities::Language [private] |
std::map<std::wstring,std::map<std::wstring,double> > probabilities::lexical_tags [private] |
lexical probabilities for frequent words
double probabilities::LidstoneLambda [private] |
lambda parameter for smoothing via Lidstone's Law
std::wstring::size_type probabilities::long_suff [private] |
length of longest suffix
double probabilities::ProbabilityThreshold [private] |
Probability threshold for tags assigned to unknown words.
boost::u32regex probabilities::RE_PunctNum [private] |
Auxiliary regexps.
std::map<std::wstring,double> probabilities::single_tags [private] |
unigram probabilities
double probabilities::theeta [private] |
smoothing parameter for unknown-word suffixes
std::map<std::wstring,std::map<std::wstring,double> > probabilities::unk_suffs [private] |
list of tag frequencies for unknown word suffixes
std::map<std::wstring,double> probabilities::unk_tags [private] |
list of tags and probabilities to assign to unknown words