net.sf.textkit4j.matching
Class NGramFactory
java.lang.Object
net.sf.textkit4j.matching.NGramFactory
- Direct Known Subclasses:
- CharacterNGramFactory, WordNGramFactory
public abstract class NGramFactory
- extends java.lang.Object
Generates a map of n-grams and their counts in the supplied text. Depending
on how the factory is configured, the supplied text can be lower-cased, runs
of white space can be collapsed into a single white-space character, and
punctuation can be filtered out.
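The three text-normalization options described above can be illustrated with a short, self-contained sketch. This is not the textkit4j implementation: the class name TextNormalizerSketch, the normalize method, the regular expressions, and the order in which the steps run are all assumptions made for illustration.

```java
import java.util.Locale;

// Hypothetical helper, not part of textkit4j. The three boolean flags mirror
// the lowerCase, collapseWhiteSpace, and stripPunctuation properties.
final class TextNormalizerSketch {

    static String normalize(String text,
                            boolean lowerCase,
                            boolean collapseWhiteSpace,
                            boolean stripPunctuation) {
        String result = text;
        if (lowerCase) {
            // Locale.ROOT keeps the result independent of the default locale.
            result = result.toLowerCase(Locale.ROOT);
        }
        if (stripPunctuation) {
            // \p{Punct} matches ASCII punctuation characters.
            result = result.replaceAll("\\p{Punct}", "");
        }
        if (collapseWhiteSpace) {
            // Assumed to run last, so gaps left by removed punctuation
            // are collapsed as well.
            result = result.replaceAll("\\s+", " ");
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello,   World!", true, true, true)); // prints "hello world"
    }
}
```

With all three flags disabled, the sketch returns the text unchanged.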
- Author:
- rich
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
NGramFactory
public NGramFactory()
unigrams
public NGrams unigrams(java.lang.String text)
- Parameters:
text - the text to generate unigrams from
- Returns:
the unigrams (1-grams) in the text and their counts
bigrams
public NGrams bigrams(java.lang.String text)
- Parameters:
text - the text to generate bigrams from
- Returns:
the bigrams (2-grams) in the text and their counts
trigrams
public NGrams trigrams(java.lang.String text)
- Parameters:
text - the text to generate trigrams from
- Returns:
the trigrams (3-grams) in the text and their counts
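The unigrams, bigrams, and trigrams methods correspond to n-gram sizes 1, 2, and 3. A minimal sketch of counting character n-grams follows. It returns a plain java.util.Map rather than the library's NGrams type, whose API is not shown on this page; the class NGramCountSketch and the method countCharNGrams are hypothetical names, and the sliding-window approach is an assumption about how a subclass such as CharacterNGramFactory might work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration, not part of textkit4j.
final class NGramCountSketch {

    // Counts every run of n consecutive characters in the text.
    static Map<String, Integer> countCharNGrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        // LinkedHashMap keeps n-grams in order of first appearance.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i + n <= text.length(); i++) {
            counts.merge(text.substring(i, i + n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countCharNGrams("banana", 1)); // prints {b=1, a=3, n=2}
        System.out.println(countCharNGrams("banana", 2)); // prints {ba=1, an=2, na=2}
        System.out.println(countCharNGrams("banana", 3)); // prints {ban=1, ana=2, nan=1}
    }
}
```

A text shorter than n yields an empty map in this sketch.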
isLowerCase
public boolean isLowerCase()
setLowerCase
public void setLowerCase(boolean lowerCase)
isCollapseWhiteSpace
public boolean isCollapseWhiteSpace()
setCollapseWhiteSpace
public void setCollapseWhiteSpace(boolean collapseWhiteSpace)
isStripPunctuation
public boolean isStripPunctuation()
setStripPunctuation
public void setStripPunctuation(boolean stripPunctuation)
Copyright © 2009 All Eight, LLC. All Rights Reserved.