Class TdbTextSeparators
java.lang.Object
com.tietoenator.trip.jxp.database.TdbTextSeparators
- All Implemented Interfaces:
com.tietoenator.trip.jxp.internal.utils.DOMSink
public class TdbTextSeparators
extends Object
implements com.tietoenator.trip.jxp.internal.utils.DOMSink
Container for indexing rules, i.e. the classes and characters that
"separate" linguistic entities such as sentences and paragraphs.
-
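A minimal usage sketch of the accessors documented on this page. The nested `TdbTextSeparators` below is a local stand-in so the sketch compiles without the TRIP jar; it is not the library implementation, and the `String` return type of `getSentenceEndChars` is an assumption, since this page does not show the getter signatures. With the real library, import `com.tietoenator.trip.jxp.database.TdbTextSeparators` and drop the stand-in.

```java
class TextSeparatorsUsageSketch {

    /** Local stand-in mirroring a few documented accessors; NOT the TRIP class. */
    public static class TdbTextSeparators {
        // The page does not document a default for this flag; false is simply
        // Java's field default in this stand-in.
        private boolean requiresSentenceCheck;
        private String sentenceEndChars = ".!?"; // documented default

        public void setRequiresSentenceCheck(boolean mode) { requiresSentenceCheck = mode; }
        public boolean getRequiresSentenceCheck() { return requiresSentenceCheck; }
        public void setSentenceEndChars(String chars) { sentenceEndChars = chars; }
        // Return type assumed to be String.
        public String getSentenceEndChars() { return sentenceEndChars; }
    }

    /** Turns sentence parsing on and extends the sentence ending characters. */
    public static TdbTextSeparators configure() {
        TdbTextSeparators seps = new TdbTextSeparators();
        seps.setRequiresSentenceCheck(true);
        // Hypothetical choice for illustration: also end sentences on ';'.
        seps.setSentenceEndChars(".!?;");
        return seps;
    }

    public static void main(String[] args) {
        TdbTextSeparators seps = configure();
        System.out.println("sentence check: " + seps.getRequiresSentenceCheck());
        System.out.println("end chars: " + seps.getSentenceEndChars());
    }
}
```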
Constructor Summary
Constructors
- TdbTextSeparators(): Constructor, creates a usable blank container
- Copy constructor
-
Method Summary
Methods
- void clear(): Initializes the instance, removes any pre-existing state
- void copyFrom: Copy member data from another instance
- getIgnoreChars: Retrieve any characters that should be ignored whilst parsing sentence boundaries.
- boolean getParagraphNeedsBeginOfSentence(): Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
- boolean getParagraphNeedsEndOfSentence(): Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
- getParagraphSeparatorSpec: Retrieve the specification (in class terms) for the minimum boundary between paragraphs.
- boolean getRequiresParagraphCheck(): Check if this database uses paragraph parsing when indexing.
- boolean getRequiresSentenceCheck(): Check if this database uses sentence parsing when indexing.
- getSentenceBeginChars: Retrieve any "special" characters that can constitute a valid introduction to a new sentence.
- getSentenceBeginSpec: Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
- getSentenceEndChars: Retrieve any characters that can constitute a valid ending to a sentence.
- getSentenceSeparatorSpec: Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
- void setIgnoreChars(String chars): Establish any characters that should be ignored whilst parsing sentence boundaries.
- void setParagraphNeedsBeginOfSentence(boolean mode): Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
- void setParagraphNeedsEndOfSentence(boolean mode): Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
- void setParagraphSeparatorSpec: Establish the specification (in class terms) for the minimum boundary between paragraphs.
- void setRequiresParagraphCheck(boolean mode): Establish whether this database will use paragraph parsing when indexing.
- void setRequiresSentenceCheck(boolean mode): Establish whether this database is to use sentence parsing when indexing.
- void setSentenceBeginChars(String chars): Establish any "special" characters that can constitute a valid introduction to a new sentence.
- void setSentenceBeginSpec(String spec): Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
- void setSentenceEndChars(String chars): Establish any characters that can constitute a valid ending to a sentence.
- void setSentenceSeparatorSpec: Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
-
Constructor Details
-
TdbTextSeparators
public TdbTextSeparators()
Constructor, creates a usable blank container.
-
TdbTextSeparators
Copy constructor.
- Parameters:
src - Source from which to copy
-
-
Method Details
-
getRequiresSentenceCheck
public boolean getRequiresSentenceCheck()
Check if this database uses sentence parsing when indexing.
- Returns:
- true if sentence parsing is turned on
-
setRequiresSentenceCheck
public void setRequiresSentenceCheck(boolean mode)
Establish whether this database is to use sentence parsing when indexing.
- Parameters:
mode - true if sentence parsing is to be used
-
getSentenceBeginSpec
Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see getSentenceBeginChars).
- Returns:
- Sentence begin specification
-
setSentenceBeginSpec
public void setSentenceBeginSpec(String spec)
Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see setSentenceBeginChars).
- Parameters:
spec - Sentence begin specification
-
getSentenceBeginChars
Retrieve any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
- Returns:
- Special sentence introductory characters
-
setSentenceBeginChars
public void setSentenceBeginChars(String chars)
Establish any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
- Parameters:
chars - Special sentence introductory characters
-
getSentenceSeparatorSpec
Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
- Returns:
- Sentence separation specification
-
setSentenceSeparatorSpec
Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
- Parameters:
spec - Sentence separation specification
-
getSentenceEndChars
Retrieve any characters that can constitute a valid ending to a sentence. Default is .!?
- Returns:
- Sentence ending characters
-
setSentenceEndChars
public void setSentenceEndChars(String chars)
Establish any characters that can constitute a valid ending to a sentence. Default is .!?
- Parameters:
chars - Sentence ending characters
-
getIgnoreChars
Retrieve any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
- Returns:
- Ignored characters
-
setIgnoreChars
public void setIgnoreChars(String chars)
Establish any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
- Parameters:
chars - Ignored characters
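Taken together, the sentence settings describe a boundary test: an ending character, optionally followed by ignored characters, then separator characters, then a character that may begin a sentence. The sketch below illustrates one reading of that description using the documented defaults. It is not the TRIP indexing algorithm; the class and method names are hypothetical, and the interpretation of UB as "upper case letter or special begin character" and of SN as "any white space" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class SentenceBoundarySketch {
    // Defaults quoted from the method descriptions on this page.
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}\u00BB";   // closing guillemet escaped
    static final String BEGIN_CHARS = "(<[{\u00AB'\""; // opening guillemet escaped

    /** Assumed reading of "UB": an upper case letter or a special begin character. */
    static boolean isSentenceBegin(char c) {
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    /** True for characters that end a sentence or are ignored right after an ending. */
    static boolean isEndOrIgnored(char c) {
        return END_CHARS.indexOf(c) >= 0 || IGNORE_CHARS.indexOf(c) >= 0;
    }

    /** Splits text wherever end chars, then separators, then a begin char occur. */
    public static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            if (END_CHARS.indexOf(text.charAt(i)) < 0) {
                i++;
                continue;
            }
            int j = i + 1;
            // Absorb repeated endings and ignored closers, e.g. "?!" or ".)".
            while (j < text.length() && isEndOrIgnored(text.charAt(j))) {
                j++;
            }
            int endOfSentence = j;
            // Assumed reading of "SN": skip white space, which includes newlines.
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            boolean sawSeparator = j > endOfSentence;
            if (sawSeparator && j < text.length() && isSentenceBegin(text.charAt(j))) {
                sentences.add(text.substring(start, endOfSentence));
                start = j;
            }
            i = j;
        }
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : split("He left. (She stayed.) then it rained. Done!")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

In this sketch the text after "(She stayed.)" continues in lower case, so no boundary is recorded there, while the opening parenthesis after "He left." counts as a valid sentence beginning.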
-
getRequiresParagraphCheck
public boolean getRequiresParagraphCheck()
Check if this database uses paragraph parsing when indexing.
- Returns:
- true if the database parses paragraphs
-
setRequiresParagraphCheck
public void setRequiresParagraphCheck(boolean mode)
Establish whether this database will use paragraph parsing when indexing.
- Parameters:
mode - true if the database is to parse paragraphs
-
getParagraphNeedsBeginOfSentence
public boolean getParagraphNeedsBeginOfSentence()
Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
- Returns:
- true if a valid start of sentence is required to start a new paragraph
-
setParagraphNeedsBeginOfSentence
public void setParagraphNeedsBeginOfSentence(boolean mode)
Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
- Parameters:
mode - true if a valid start of sentence is required to start a new paragraph
-
getParagraphSeparatorSpec
Retrieve the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
- Returns:
- Paragraph separator spec
-
setParagraphSeparatorSpec
Establish the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
- Parameters:
spec - Paragraph separator spec
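To make the 2N default concrete, the following sketch splits text wherever at least two newline equivalents occur in a row. It only illustrates the idea of a minimum boundary and is not how TRIP parses the spec string; the class name is hypothetical, and treating `\r\n`, `\r` and `\n` as the "newline equivalents" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ParagraphSplitSketch {
    // Two or more consecutive line breaks form a paragraph boundary.
    // "\r\n" is listed first so a Windows line ending counts as one break.
    private static final Pattern BOUNDARY = Pattern.compile("(?:\\r\\n|\\r|\\n){2,}");

    /** Returns the non-empty chunks of text separated by a paragraph boundary. */
    public static List<String> paragraphs(String text) {
        List<String> result = new ArrayList<>();
        for (String part : BOUNDARY.split(text)) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "One\nstill one\n\nTwo\r\n\r\nThree";
        System.out.println(paragraphs(text).size() + " paragraphs");
    }
}
```

A single line break, as between "One" and "still one", stays inside the paragraph because it is below the two-break minimum.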
-
getParagraphNeedsEndOfSentence
public boolean getParagraphNeedsEndOfSentence()
Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
- Returns:
- true if paragraph endings require valid sentence endings
-
setParagraphNeedsEndOfSentence
public void setParagraphNeedsEndOfSentence(boolean mode)
Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
- Parameters:
mode - true if paragraph endings require valid sentence endings
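The two paragraph flags can be read as extra conditions on a candidate paragraph boundary. The sketch below shows that reading: given the text before and after a candidate boundary, it accepts the break only if the enabled conditions hold. This is an interpretation of the descriptions above, not the TRIP implementation; the class, method and parameter names are hypothetical, and the character sets are the documented sentence defaults.

```java
class ParagraphGateSketch {
    // Documented sentence defaults, reused for the paragraph checks.
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}\u00BB";   // closing guillemet escaped
    static final String BEGIN_CHARS = "(<[{\u00AB'\""; // opening guillemet escaped

    /** True if the text ends with an end char, looking past ignored closers. */
    static boolean endsSentence(String before) {
        int i = before.length() - 1;
        while (i >= 0 && IGNORE_CHARS.indexOf(before.charAt(i)) >= 0) {
            i--;
        }
        return i >= 0 && END_CHARS.indexOf(before.charAt(i)) >= 0;
    }

    /** True if the text starts with an upper case letter or a special begin char. */
    static boolean beginsSentence(String after) {
        if (after.isEmpty()) {
            return false;
        }
        char c = after.charAt(0);
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    /** Applies the two flags to a candidate boundary between 'before' and 'after'. */
    public static boolean acceptBreak(String before, String after,
                                      boolean needsBeginOfSentence,
                                      boolean needsEndOfSentence) {
        if (needsEndOfSentence && !endsSentence(before)) {
            return false;
        }
        if (needsBeginOfSentence && !beginsSentence(after)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A heading without closing punctuation, followed by body text.
        System.out.println(acceptBreak("Chapter 1", "It was late.", false, true));
    }
}
```

Under this reading, enabling the end-of-sentence requirement keeps an unpunctuated heading such as "Chapter 1" in the same paragraph as the text that follows it.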
-
clear
public void clear()
Initializes the instance, removing any pre-existing state.
-
copyFrom
Copy member data from another instance.
- Parameters:
src - Source of copy operation
-