Class TdbTextSeparators
java.lang.Object
com.tietoenator.trip.jxp.database.TdbTextSeparators
All Implemented Interfaces:
com.tietoenator.trip.jxp.internal.utils.DOMSink
public class TdbTextSeparators
extends Object
implements com.tietoenator.trip.jxp.internal.utils.DOMSink
Container for indexing rules, i.e. the classes and characters that
"separate" linguistic entities such as sentences and paragraphs.
-
Constructor Summary
Constructor | Description
TdbTextSeparators() | Constructor, creates a usable blank container
TdbTextSeparators | Copy constructor
Method Summary
Modifier and Type | Method | Description
void | clear() | Initializes the instance, removes any pre-existing state
void | copyFrom | Copy member data from another instance
 | getIgnoreChars | Retrieve any characters that should be ignored whilst parsing sentence boundaries.
boolean | getParagraphNeedsBeginOfSentence() | Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
boolean | getParagraphNeedsEndOfSentence() | Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
 | getParagraphSeparatorSpec | Retrieve the specification (in class terms) for the minimum boundary between paragraphs.
boolean | getRequiresParagraphCheck() | Check if this database uses paragraph parsing when indexing.
boolean | getRequiresSentenceCheck() | Check if this database uses sentence parsing when indexing.
 | getSentenceBeginChars | Retrieve any "special" characters that can constitute a valid introduction to a new sentence.
 | getSentenceBeginSpec | Retrieve the specification (in class terms) for the beginning of a sentence.
 | getSentenceEndChars | Retrieve any characters that can constitute a valid ending to a sentence.
 | getSentenceSeparatorSpec | Retrieve the specification (in class terms) for sentence separators.
void | setIgnoreChars(String chars) | Establish any characters that should be ignored whilst parsing sentence boundaries.
void | setParagraphNeedsBeginOfSentence(boolean mode) | Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
void | setParagraphNeedsEndOfSentence(boolean mode) | Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
void | setParagraphSeparatorSpec | Establish the specification (in class terms) for the minimum boundary between paragraphs.
void | setRequiresParagraphCheck(boolean mode) | Establish whether this database will use paragraph parsing when indexing.
void | setRequiresSentenceCheck(boolean mode) | Establish whether this database is to use sentence parsing when indexing.
void | setSentenceBeginChars(String chars) | Establish any "special" characters that can constitute a valid introduction to a new sentence.
void | setSentenceBeginSpec(String spec) | Establish the specification (in class terms) for the beginning of a sentence.
void | setSentenceEndChars(String chars) | Establish any characters that can constitute a valid ending to a sentence.
void | setSentenceSeparatorSpec | Establish the specification (in class terms) for sentence separators.
-
Constructor Details
-
TdbTextSeparators
public TdbTextSeparators()
Constructor, creates a usable blank container.
-
TdbTextSeparators
Copy constructor.
Parameters:
src - Source from which to copy
-
-
Method Details
-
getRequiresSentenceCheck
public boolean getRequiresSentenceCheck()
Check if this database uses sentence parsing when indexing.
Returns:
true if sentence parsing is turned on
-
setRequiresSentenceCheck
public void setRequiresSentenceCheck(boolean mode)
Establish whether this database is to use sentence parsing when indexing.
Parameters:
mode - true if sentence parsing is to be used
-
getSentenceBeginSpec
Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters, see getSentenceBeginChars).
Returns:
Sentence begin specification
-
setSentenceBeginSpec
Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters, see setSentenceBeginChars).
Parameters:
spec - Sentence begin specification
-
getSentenceBeginChars
Retrieve any "special" characters that can constitute a valid introduction to a new sentence.
Default is: (<[{«'"
Returns:
Special sentence introductory characters
-
setSentenceBeginChars
Establish any "special" characters that can constitute a valid introduction to a new sentence.
Default is: (<[{«'"
Parameters:
chars - Special sentence introductory characters
-
getSentenceSeparatorSpec
Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
Returns:
Sentence separation specification
-
setSentenceSeparatorSpec
Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
Parameters:
spec - Sentence separation specification
-
getSentenceEndChars
Retrieve any characters that can constitute a valid ending to a sentence.
Default is: .!?
Returns:
Sentence ending characters
-
setSentenceEndChars
Establish any characters that can constitute a valid ending to a sentence.
Default is: .!?
Parameters:
chars - Sentence ending characters
-
getIgnoreChars
Retrieve any characters that should be ignored whilst parsing sentence boundaries.
Default is: )>]}»
Returns:
Ignored characters
-
setIgnoreChars
Establish any characters that should be ignored whilst parsing sentence boundaries.
Default is: )>]}»
Parameters:
chars - Ignored characters
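The sentence settings documented above (begin spec UB, begin characters, separator spec SN, end characters, ignore characters) may be easier to picture with a small standalone sketch. It does not use TdbTextSeparators or the TRIP indexer; the class SentenceSketch and its split method are hypothetical, and the way the settings are combined here (in particular, requiring at least one separator character before the next sentence begins) is an assumption, not documented behaviour.

```java
import java.util.ArrayList;
import java.util.List;

public class SentenceSketch {
    // Default values quoted in the documentation above.
    // \u00AB and \u00BB are the left and right guillemets.
    static final String BEGIN_CHARS = "(<[{\u00AB'\"";
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}\u00BB";

    // Hypothetical helper: splits text into sentences using the defaults.
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            if (END_CHARS.indexOf(text.charAt(i)) < 0) {
                i++;
                continue;
            }
            // Skip ignorable characters directly after the end character.
            int afterEnd = i + 1;
            while (afterEnd < text.length()
                    && IGNORE_CHARS.indexOf(text.charAt(afterEnd)) >= 0) {
                afterEnd++;
            }
            // Skip separators (assumed: SN means white space and newlines).
            int next = afterEnd;
            while (next < text.length()
                    && Character.isWhitespace(text.charAt(next))) {
                next++;
            }
            boolean sawSeparator = next > afterEnd;
            boolean atEnd = next == text.length();
            // Assumed reading of UB: upper case or a "special" begin character.
            boolean validBegin = !atEnd
                    && (Character.isUpperCase(text.charAt(next))
                        || BEGIN_CHARS.indexOf(text.charAt(next)) >= 0);
            if (atEnd || (sawSeparator && validBegin)) {
                sentences.add(text.substring(start, afterEnd));
                start = next;
                i = next;
            } else {
                i++;
            }
        }
        if (start < text.length()) {
            sentences.add(text.substring(start)); // trailing text without an end character
        }
        return sentences;
    }

    public static void main(String[] args) {
        String sample = "He left (at last.) \"Why?\" she asked. version 2.5 is out. Done!";
        for (String s : split(sample)) {
            System.out.println("[" + s + "]");
        }
    }
}
```

In this sketch the full stop in "asked." does not end a sentence because the next word starts in lower case, and the closing parenthesis after "last." is skipped as an ignored character before the boundary is placed.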
-
getRequiresParagraphCheck
public boolean getRequiresParagraphCheck()
Check if this database uses paragraph parsing when indexing.
Returns:
true if the database parses paragraphs
-
setRequiresParagraphCheck
public void setRequiresParagraphCheck(boolean mode)
Establish whether this database will use paragraph parsing when indexing.
Parameters:
mode - true if the database is to parse paragraphs
-
getParagraphNeedsBeginOfSentence
public boolean getParagraphNeedsBeginOfSentence()
Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
Returns:
true if a valid start of sentence is required to start a new paragraph
-
setParagraphNeedsBeginOfSentence
public void setParagraphNeedsBeginOfSentence(boolean mode)
Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
Parameters:
mode - true if a valid start of sentence is required to start a new paragraph
-
getParagraphSeparatorSpec
Retrieve the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
Returns:
Paragraph separator spec
-
setParagraphSeparatorSpec
Establish the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
Parameters:
spec - Paragraph separator spec
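As an illustration of the default 2N boundary, the following standalone sketch splits text wherever at least two newline equivalents occur in a row. It is not the TRIP implementation: the class ParagraphSketch is hypothetical, and treating "\r\n", "\r" and "\n" each as one newline equivalent is an assumption.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class ParagraphSketch {
    // Assumption: "\r\n", "\r" and "\n" each count as one newline equivalent,
    // and a paragraph boundary needs at least two of them in a row (2N).
    private static final Pattern BOUNDARY =
            Pattern.compile("(?:\\r\\n|\\r|\\n){2,}");

    // Hypothetical helper: returns the paragraphs found in the text.
    static List<String> split(String text) {
        return Arrays.asList(BOUNDARY.split(text));
    }

    public static void main(String[] args) {
        String sample = "First line\nstill first.\n\nSecond.\r\n\r\nThird.";
        for (String p : split(sample)) {
            System.out.println("[" + p + "]");
        }
    }
}
```

A single newline, as between "First line" and "still first.", stays inside the paragraph because it falls short of the two-newline minimum.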
-
getParagraphNeedsEndOfSentence
public boolean getParagraphNeedsEndOfSentence()
Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
Returns:
true if paragraph endings require valid sentence endings
-
setParagraphNeedsEndOfSentence
public void setParagraphNeedsEndOfSentence(boolean mode)
Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
Parameters:
mode - true if paragraph endings require valid sentence endings
-
clear
public void clear()
Initializes the instance, removes any pre-existing state.
-
copyFrom
Copy member data from another instance.
Parameters:
src - Source of copy operation
-