Class TdbTextSeparators
java.lang.Object
    com.tietoenator.trip.jxp.database.TdbTextSeparators

All Implemented Interfaces:
    com.tietoenator.trip.jxp.internal.utils.DOMSink
public class TdbTextSeparators extends java.lang.Object implements com.tietoenator.trip.jxp.internal.utils.DOMSink
Container for indexing rules, i.e. the classes and characters that "separate" linguistic entities such as sentences and paragraphs.
-
-
Constructor Summary
Constructor and Description
TdbTextSeparators()
    Constructor, creates a usable blank container
TdbTextSeparators(TdbTextSeparators src)
    Copy constructor
-
Method Summary
Modifier and Type, Method and Description
void clear()
    Initializes the instance, removes any pre-existing state
void copyFrom(TdbTextSeparators src)
    Copy member data from another instance
java.lang.String getIgnoreChars()
    Retrieve any characters that should be ignored whilst parsing sentence boundaries.
boolean getParagraphNeedsBeginOfSentence()
    Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
boolean getParagraphNeedsEndOfSentence()
    Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
java.lang.String getParagraphSeparatorSpec()
    Retrieve the specification (in class terms) for the minimum boundary between paragraphs.
boolean getRequiresParagraphCheck()
    Check if this database uses paragraph parsing when indexing.
boolean getRequiresSentenceCheck()
    Check if this database uses sentence parsing when indexing.
java.lang.String getSentenceBeginChars()
    Retrieve any "special" characters that can constitute a valid introduction to a new sentence.
java.lang.String getSentenceBeginSpec()
    Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
java.lang.String getSentenceEndChars()
    Retrieve any characters that can constitute a valid ending to a sentence.
java.lang.String getSentenceSeparatorSpec()
    Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
void setIgnoreChars(java.lang.String chars)
    Establish any characters that should be ignored whilst parsing sentence boundaries.
void setParagraphNeedsBeginOfSentence(boolean mode)
    Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
void setParagraphNeedsEndOfSentence(boolean mode)
    Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
void setParagraphSeparatorSpec(java.lang.String spec)
    Establish the specification (in class terms) for the minimum boundary between paragraphs.
void setRequiresParagraphCheck(boolean mode)
    Establish whether this database will use paragraph parsing when indexing.
void setRequiresSentenceCheck(boolean mode)
    Establish whether this database is to use sentence parsing when indexing.
void setSentenceBeginChars(java.lang.String chars)
    Establish any "special" characters that can constitute a valid introduction to a new sentence.
void setSentenceBeginSpec(java.lang.String spec)
    Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
void setSentenceEndChars(java.lang.String chars)
    Establish any characters that can constitute a valid ending to a sentence.
void setSentenceSeparatorSpec(java.lang.String spec)
    Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
-
-
-
Constructor Detail
-
TdbTextSeparators
public TdbTextSeparators()
Constructor; creates a usable blank container.
-
TdbTextSeparators
public TdbTextSeparators(TdbTextSeparators src)
Copy constructor.
Parameters:
src - Source from which to copy
-
-
Method Detail
-
getRequiresSentenceCheck
public boolean getRequiresSentenceCheck()
Check if this database uses sentence parsing when indexing.
Returns:
true if sentence parsing is turned on
-
setRequiresSentenceCheck
public void setRequiresSentenceCheck(boolean mode)
Establish whether this database is to use sentence parsing when indexing.
Parameters:
mode - true if sentence parsing is to be used
-
getSentenceBeginSpec
public java.lang.String getSentenceBeginSpec()
Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see getSentenceBeginChars).
Returns:
Sentence begin specification
-
setSentenceBeginSpec
public void setSentenceBeginSpec(java.lang.String spec)
Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see setSentenceBeginChars).
Parameters:
spec - Sentence begin specification
-
getSentenceBeginChars
public java.lang.String getSentenceBeginChars()
Retrieve any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
Returns:
Special sentence introductory characters
-
setSentenceBeginChars
public void setSentenceBeginChars(java.lang.String chars)
Establish any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
Parameters:
chars - Special sentence introductory characters
-
getSentenceSeparatorSpec
public java.lang.String getSentenceSeparatorSpec()
Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
Returns:
Sentence separation specification
-
setSentenceSeparatorSpec
public void setSentenceSeparatorSpec(java.lang.String spec)
Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
Parameters:
spec - Sentence separation specification
-
getSentenceEndChars
public java.lang.String getSentenceEndChars()
Retrieve any characters that can constitute a valid ending to a sentence. Default is .!?
Returns:
Sentence ending characters
-
setSentenceEndChars
public void setSentenceEndChars(java.lang.String chars)
Establish any characters that can constitute a valid ending to a sentence. Default is .!?
Parameters:
chars - Sentence ending characters
-
getIgnoreChars
public java.lang.String getIgnoreChars()
Retrieve any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
Returns:
Ignored characters
-
setIgnoreChars
public void setIgnoreChars(java.lang.String chars)
Establish any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
Parameters:
chars - Ignored characters
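The sentence settings documented above (end characters, ignored characters, separator spec and begin spec) can be easier to follow with a small stand-alone illustration. The sketch below does not use the TRIP library and is not its indexing algorithm. It is a guess at how the documented defaults might interact, assuming that "SN" can be approximated by Character.isWhitespace and that "UB" means "upper case letter or one of the special begin characters". The class name SentenceBoundarySketch and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative approximation only; not the TRIP implementation. */
final class SentenceBoundarySketch {
    // Default values quoted from the documentation; the guillemets are written as escapes.
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}\u00BB";
    static final String BEGIN_CHARS = "(<[{\u00AB'\"";

    /** Assumed reading of "UB": an upper case letter or a special begin character. */
    static boolean isSentenceBegin(char c) {
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    /** Splits text into sentences under the assumptions stated above. */
    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        int i = 0;
        while (i < text.length()) {
            if (END_CHARS.indexOf(text.charAt(i)) < 0) {
                i++;
                continue;
            }
            int j = i + 1;
            // Ignored characters (closing brackets and the like) stay with the sentence.
            while (j < text.length() && IGNORE_CHARS.indexOf(text.charAt(j)) >= 0) {
                j++;
            }
            int afterIgnored = j;
            // Skip separators: white space and newlines (assumed meaning of "SN").
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            boolean sawSeparator = j > afterIgnored;
            boolean atEnd = j == text.length();
            if (atEnd || (sawSeparator && isSentenceBegin(text.charAt(j)))) {
                sentences.add(text.substring(start, afterIgnored));
                start = j;
                i = j;
            } else {
                i++; // e.g. the dot in "3.50" does not end a sentence
            }
        }
        if (start < text.length()) {
            sentences.add(text.substring(start));
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("It costs 3.50 now. Fine."));
    }
}
```

In this sketch an end character only closes a sentence when it is followed by at least one separator and then a valid sentence beginning, or when the text ends. That is one plausible reading of the settings, chosen so that decimal points and abbreviations followed by lower case text do not split a sentence.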
-
getRequiresParagraphCheck
public boolean getRequiresParagraphCheck()
Check if this database uses paragraph parsing when indexing.
Returns:
true if the database parses paragraphs
-
setRequiresParagraphCheck
public void setRequiresParagraphCheck(boolean mode)
Establish whether this database will use paragraph parsing when indexing.
Parameters:
mode - true if the database is to parse paragraphs
-
getParagraphNeedsBeginOfSentence
public boolean getParagraphNeedsBeginOfSentence()
Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
Returns:
true if a valid start of sentence is required to start a new paragraph
-
setParagraphNeedsBeginOfSentence
public void setParagraphNeedsBeginOfSentence(boolean mode)
Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
Parameters:
mode - true if a valid start of sentence is required to start a new paragraph
-
getParagraphSeparatorSpec
public java.lang.String getParagraphSeparatorSpec()
Retrieve the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
Returns:
Paragraph separator spec
-
setParagraphSeparatorSpec
public void setParagraphSeparatorSpec(java.lang.String spec)
Establish the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
Parameters:
spec - Paragraph separator spec
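To make the "2N" default more concrete, here is a minimal stand-alone sketch of splitting text wherever a run of white space contains at least a given number of newline equivalents. It does not call the TRIP library and does not parse spec strings. The class ParagraphBoundarySketch is hypothetical, and treating "\n", "\r" and "\r\n" as one newline equivalent each is an assumption, since the documentation does not define the term.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative approximation only; not the TRIP implementation. */
final class ParagraphBoundarySketch {

    /** Splits text at white-space runs holding at least minNewlines newline equivalents. */
    static List<String> split(String text, int minNewlines) {
        List<String> paragraphs = new ArrayList<>();
        int n = text.length();
        int start = 0;
        int i = 0;
        while (i < n) {
            if (!Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int runStart = i;
            int newlines = 0;
            // Walk the whole white-space run and count newline equivalents in it.
            while (i < n && Character.isWhitespace(text.charAt(i))) {
                char c = text.charAt(i);
                if (c == '\r' || c == '\n') {
                    newlines++;
                    // Assumption: "\r\n" counts as a single newline equivalent.
                    if (c == '\r' && i + 1 < n && text.charAt(i + 1) == '\n') {
                        i++;
                    }
                }
                i++;
            }
            if (newlines >= minNewlines) {
                if (runStart > start) {
                    paragraphs.add(text.substring(start, runStart));
                }
                start = i;
            }
        }
        // Whatever follows the last boundary is the final paragraph.
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            paragraphs.add(tail);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        System.out.println(split("First line\nstill first.\n\nSecond.", 2));
    }
}
```

With a minimum of 2, a single line break inside a paragraph is kept as part of that paragraph, while a blank line starts a new one.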
-
getParagraphNeedsEndOfSentence
public boolean getParagraphNeedsEndOfSentence()
Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
Returns:
true if paragraph endings require valid sentence endings
-
setParagraphNeedsEndOfSentence
public void setParagraphNeedsEndOfSentence(boolean mode)
Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
Parameters:
mode - true if paragraph endings require valid sentence endings
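The two paragraph flags (needs end of sentence, needs begin of sentence) can be pictured as extra conditions on a candidate paragraph break. The following stand-alone sketch is only an interpretation of the descriptions above, using the documented default character sets. The class ParagraphRuleSketch and its method names are hypothetical, and the actual checks performed by TRIP may differ.

```java
/** Illustrative approximation only; not the TRIP implementation. */
final class ParagraphRuleSketch {
    // Default values quoted from the documentation; the guillemets are written as escapes.
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}\u00BB";
    static final String BEGIN_CHARS = "(<[{\u00AB'\"";

    /** True if the text before the break ends with a sentence end, skipping ignored characters. */
    static boolean endsSentence(String before) {
        int i = before.length() - 1;
        while (i >= 0 && IGNORE_CHARS.indexOf(before.charAt(i)) >= 0) {
            i--;
        }
        return i >= 0 && END_CHARS.indexOf(before.charAt(i)) >= 0;
    }

    /** True if the text after the break starts with an upper case or special begin character. */
    static boolean beginsSentence(String after) {
        if (after.isEmpty()) {
            return false;
        }
        char c = after.charAt(0);
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    /** Decides whether a candidate break is accepted under the two flags. */
    static boolean acceptsBreak(String before, String after,
                                boolean needsEndOfSentence, boolean needsBeginOfSentence) {
        if (needsEndOfSentence && !endsSentence(before)) {
            return false;
        }
        if (needsBeginOfSentence && !beginsSentence(after)) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A heading without final punctuation followed by body text.
        System.out.println(acceptsBreak("Chapter 1", "It was late.", true, false));
    }
}
```

Under this reading, enabling the end-of-sentence flag would keep a heading such as "Chapter 1" in the same paragraph as the text that follows it, because the heading does not end with one of the sentence end characters.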
-
clear
public void clear()
Initializes the instance and removes any pre-existing state.
-
copyFrom
public void copyFrom(TdbTextSeparators src)
Copy member data from another instance.
Parameters:
src - Source of copy operation
-
-