Class TdbTextSeparators

java.lang.Object
com.tietoenator.trip.jxp.database.TdbTextSeparators
All Implemented Interfaces:
com.tietoenator.trip.jxp.internal.utils.DOMSink

public class TdbTextSeparators extends Object implements com.tietoenator.trip.jxp.internal.utils.DOMSink
Container for indexing rules, i.e. the character classes and characters that "separate" linguistic entities such as sentences and paragraphs.
  • Constructor Details

    • TdbTextSeparators

      public TdbTextSeparators()
      Constructor. Creates a usable blank container.
    • TdbTextSeparators

      public TdbTextSeparators(TdbTextSeparators src)
      Copy constructor
      Parameters:
      src - Source from which to copy
  • Method Details

    • getRequiresSentenceCheck

      public boolean getRequiresSentenceCheck()
      Check if this database uses sentence parsing when indexing.
      Returns:
      true if sentence parsing is turned on
    • setRequiresSentenceCheck

      public void setRequiresSentenceCheck(boolean mode)
      Establish whether this database is to use sentence parsing when indexing.
      Parameters:
      mode - true if sentence parsing is to be used
    • getSentenceBeginSpec

      public String getSentenceBeginSpec()
      Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see getSentenceBeginChars).
      Returns:
      Sentence begin specification
    • setSentenceBeginSpec

      public void setSentenceBeginSpec(String spec)
      Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB (upper case characters and "special" sentence beginning characters; see setSentenceBeginChars).
      Parameters:
      spec - Sentence begin specification
    • getSentenceBeginChars

      public String getSentenceBeginChars()
      Retrieve any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
      Returns:
      Special sentence introductory characters
    • setSentenceBeginChars

      public void setSentenceBeginChars(String chars)
      Establish any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
      Parameters:
      chars - Special sentence introductory characters
    • getSentenceSeparatorSpec

      public String getSentenceSeparatorSpec()
      Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
      Returns:
      Sentence separation specification
    • setSentenceSeparatorSpec

      public void setSentenceSeparatorSpec(String spec)
      Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
      Parameters:
      spec - Sentence separation specification
    • getSentenceEndChars

      public String getSentenceEndChars()
      Retrieve any characters that can constitute a valid ending to a sentence. Default is .!?
      Returns:
      Sentence ending characters
    • setSentenceEndChars

      public void setSentenceEndChars(String chars)
      Establish any characters that can constitute a valid ending to a sentence. Default is .!?
      Parameters:
      chars - Sentence ending characters
    • getIgnoreChars

      public String getIgnoreChars()
      Retrieve any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
      Returns:
      Ignored characters
    • setIgnoreChars

      public void setIgnoreChars(String chars)
      Establish any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
      Parameters:
      chars - Ignored characters
    • getRequiresParagraphCheck

      public boolean getRequiresParagraphCheck()
      Check if this database uses paragraph parsing when indexing.
      Returns:
      true if the database parses paragraphs
    • setRequiresParagraphCheck

      public void setRequiresParagraphCheck(boolean mode)
      Establish whether this database will use paragraph parsing when indexing.
      Parameters:
      mode - true if the database is to parse paragraphs
    • getParagraphNeedsBeginOfSentence

      public boolean getParagraphNeedsBeginOfSentence()
      Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
      Returns:
      true if a valid start of sentence is required to start a new paragraph
    • setParagraphNeedsBeginOfSentence

      public void setParagraphNeedsBeginOfSentence(boolean mode)
      Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
      Parameters:
      mode - true if a valid start of sentence is required to start a new paragraph
    • getParagraphSeparatorSpec

      public String getParagraphSeparatorSpec()
      Retrieve the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
      Returns:
      Paragraph separator spec
    • setParagraphSeparatorSpec

      public void setParagraphSeparatorSpec(String spec)
      Establish the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
      Parameters:
      spec - Paragraph separator spec
    • getParagraphNeedsEndOfSentence

      public boolean getParagraphNeedsEndOfSentence()
      Check if a paragraph is considered complete only when it ends with a valid end of sentence.
      Returns:
      true if paragraph endings require valid sentence endings
    • setParagraphNeedsEndOfSentence

      public void setParagraphNeedsEndOfSentence(boolean mode)
      Establish whether a paragraph is considered complete only when it ends with a valid end of sentence.
      Parameters:
      mode - true if paragraph endings require valid sentence endings
    • clear

      public void clear()
      Initializes the instance, removing any pre-existing state.
    • copyFrom

      public void copyFrom(TdbTextSeparators src)
      Copy member data from another instance
      Parameters:
      src - Source of copy operation
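
Illustration (not part of the API): the sketch below shows one way the documented defaults could interact when text is split into sentences and paragraphs. It is a standalone approximation, not the TRIP indexing algorithm. The class SeparatorSketch and its methods are hypothetical names. It assumes that ignore characters are skipped between a sentence end character and the separator, that at least one separator must precede the next sentence, and that "newline equivalents" means "\n" or "\r\n".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in; it does not use or reproduce TdbTextSeparators itself.
public class SeparatorSketch {
    // Default values quoted from the method descriptions above.
    static final String BEGIN_CHARS = "(<[{«'\"";
    static final String END_CHARS = ".!?";
    static final String IGNORE_CHARS = ")>]}»";
    // Assumed reading of the "2N" default: two or more consecutive newlines.
    static final Pattern PARAGRAPH_GAP = Pattern.compile("(?:\\r?\\n){2,}");

    // "UB": an upper case character or one of the special begin characters.
    static boolean isSentenceStart(char c) {
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    static List<String> splitSentences(String text) {
        List<String> out = new ArrayList<>();
        int n = text.length();
        int start = 0;
        int i = 0;
        while (i < n) {
            if (END_CHARS.indexOf(text.charAt(i)) < 0) {
                i++;
                continue;
            }
            // Skip ignore characters that directly follow the end character.
            int j = i + 1;
            while (j < n && IGNORE_CHARS.indexOf(text.charAt(j)) >= 0) {
                j++;
            }
            // "SN": skip white space and newlines before the next sentence.
            int k = j;
            while (k < n && Character.isWhitespace(text.charAt(k))) {
                k++;
            }
            boolean atEnd = (k == n);
            boolean gapThenStart = k > j && k < n && isSentenceStart(text.charAt(k));
            if (atEnd || gapThenStart) {
                out.add(text.substring(start, j));
                start = k;
                i = k;
            } else {
                i++;
            }
        }
        // Keep any trailing text that has no sentence end character.
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            out.add(tail);
        }
        return out;
    }

    // needsBeginOfSentence mirrors the idea behind setParagraphNeedsBeginOfSentence.
    static List<String> splitParagraphs(String text, boolean needsBeginOfSentence) {
        List<String> out = new ArrayList<>();
        Matcher m = PARAGRAPH_GAP.matcher(text);
        int start = 0;
        while (m.find()) {
            boolean atEnd = m.end() == text.length();
            if (!atEnd && needsBeginOfSentence && !isSentenceStart(text.charAt(m.end()))) {
                continue; // the gap does not count as a paragraph boundary
            }
            if (m.start() > start) {
                out.add(text.substring(start, m.start()));
            }
            start = m.end();
        }
        if (start < text.length()) {
            out.add(text.substring(start));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(splitSentences("He left (at last.) Then rain."));
        System.out.println(splitParagraphs("First.\n\nSecond.\n\nthird.", true));
    }
}
```

In this sketch a period inside a number such as 3.5 does not end a sentence, because no separator follows it, and a closing bracket after the end character stays with the sentence it closes.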