Class TdbTextSeparators

  • All Implemented Interfaces:
    com.tietoenator.trip.jxp.internal.utils.DOMSink

    public class TdbTextSeparators
    extends java.lang.Object
    implements com.tietoenator.trip.jxp.internal.utils.DOMSink
    Container for indexing rules, i.e. the character classes and individual characters that "separate" linguistic entities such as sentences and paragraphs.
    • Method Summary

      Modifier and Type Method Description
      void clear()
      Initializes the instance, removing any pre-existing state.
      void copyFrom​(TdbTextSeparators src)
      Copy member data from another instance
      java.lang.String getIgnoreChars()
      Retrieve any characters that should be ignored whilst parsing sentence boundaries.
      boolean getParagraphNeedsBeginOfSentence()
      Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
      boolean getParagraphNeedsEndOfSentence()
      Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
      java.lang.String getParagraphSeparatorSpec()
      Retrieve the specification (in class terms) for the minimum boundary between paragraphs.
      boolean getRequiresParagraphCheck()
      Check if this database uses paragraph parsing when indexing.
      boolean getRequiresSentenceCheck()
      Check if this database uses sentence parsing when indexing.
      java.lang.String getSentenceBeginChars()
      Retrieve any "special" characters that can constitute a valid introduction to a new sentence.
      java.lang.String getSentenceBeginSpec()
      Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
      java.lang.String getSentenceEndChars()
      Retrieve any characters that can constitute a valid ending to a sentence.
      java.lang.String getSentenceSeparatorSpec()
      Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
      void setIgnoreChars​(java.lang.String chars)
      Establish any characters that should be ignored whilst parsing sentence boundaries.
      void setParagraphNeedsBeginOfSentence​(boolean mode)
      Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
      void setParagraphNeedsEndOfSentence​(boolean mode)
      Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
      void setParagraphSeparatorSpec​(java.lang.String spec)
      Establish the specification (in class terms) for the minimum boundary between paragraphs.
      void setRequiresParagraphCheck​(boolean mode)
      Establish whether this database will use paragraph parsing when indexing.
      void setRequiresSentenceCheck​(boolean mode)
      Establish whether this database is to use sentence parsing when indexing.
      void setSentenceBeginChars​(java.lang.String chars)
      Establish any "special" characters that can constitute a valid introduction to a new sentence.
      void setSentenceBeginSpec​(java.lang.String spec)
      Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence.
      void setSentenceEndChars​(java.lang.String chars)
      Establish any characters that can constitute a valid ending to a sentence.
      void setSentenceSeparatorSpec​(java.lang.String spec)
      Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next.
      • Methods inherited from class java.lang.Object

        equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • TdbTextSeparators

        public TdbTextSeparators()
        Constructor; creates a usable blank container.
      • TdbTextSeparators

        public TdbTextSeparators​(TdbTextSeparators src)
        Copy constructor
        Parameters:
        src - Source from which to copy
    • Method Detail

      • getRequiresSentenceCheck

        public boolean getRequiresSentenceCheck()
        Check if this database uses sentence parsing when indexing.
        Returns:
        true if sentence parsing is turned on
      • setRequiresSentenceCheck

        public void setRequiresSentenceCheck​(boolean mode)
        Establish whether this database is to use sentence parsing when indexing.
        Parameters:
        mode - true if sentence parsing is to be used
      • getSentenceBeginSpec

        public java.lang.String getSentenceBeginSpec()
        Retrieve the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB, i.e. upper case characters and "special" sentence beginning characters (see getSentenceBeginChars).
        Returns:
        Sentence begin specification
      • setSentenceBeginSpec

        public void setSentenceBeginSpec​(java.lang.String spec)
        Establish the specification (in class terms) for the beginning of a sentence, i.e. the character types that trigger the beginning of a new sentence. Default is UB, i.e. upper case characters and "special" sentence beginning characters (see setSentenceBeginChars).
        Parameters:
        spec - Sentence begin specification
      • getSentenceBeginChars

        public java.lang.String getSentenceBeginChars()
        Retrieve any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
        Returns:
        Special sentence introductory characters
      • setSentenceBeginChars

        public void setSentenceBeginChars​(java.lang.String chars)
        Establish any "special" characters that can constitute a valid introduction to a new sentence. Default is (<[{«'"
        Parameters:
        chars - Special sentence introductory characters
      • getSentenceSeparatorSpec

        public java.lang.String getSentenceSeparatorSpec()
        Retrieve the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
        Returns:
        Sentence separation specification
      • setSentenceSeparatorSpec

        public void setSentenceSeparatorSpec​(java.lang.String spec)
        Establish the specification (in class terms) for sentence separators, i.e. character types that are skipped after the end of a sentence when looking for the beginning of the next. Default is SN (white space and newline characters).
        Parameters:
        spec - Sentence separation specification
      • getSentenceEndChars

        public java.lang.String getSentenceEndChars()
        Retrieve any characters that can constitute a valid ending to a sentence. Default is .!?
        Returns:
        Sentence ending characters
      • setSentenceEndChars

        public void setSentenceEndChars​(java.lang.String chars)
        Establish any characters that can constitute a valid ending to a sentence. Default is .!?
        Parameters:
        chars - Sentence ending characters
      • getIgnoreChars

        public java.lang.String getIgnoreChars()
        Retrieve any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
        Returns:
        Ignored characters
      • setIgnoreChars

        public void setIgnoreChars​(java.lang.String chars)
        Establish any characters that should be ignored whilst parsing sentence boundaries. Default is )>]}»
        Parameters:
        chars - Ignored characters
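
The sentence settings documented above interact, and a small sketch may make that easier to follow. The code below is an illustration only, not TRIP's indexing algorithm: it hard-codes the documented defaults (end characters, ignored characters, begin characters, and the UB and SN specifications). The class `SentenceBoundarySketch` and its methods are hypothetical names. It also assumes that at least one separator must follow a sentence end before a new sentence can begin, which the documentation does not state.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the TRIP API. */
final class SentenceBoundarySketch {
    // Documented defaults, hard-coded for the sketch.
    static final String END_CHARS = ".!?";            // see getSentenceEndChars
    static final String IGNORE_CHARS = ")>]}\u00BB";  // see getIgnoreChars (00BB: right guillemet)
    static final String BEGIN_CHARS = "(<[{\u00AB'\""; // see getSentenceBeginChars (00AB: left guillemet)

    /** "UB": an upper case character or one of the special begin characters. */
    static boolean isSentenceBegin(char c) {
        return Character.isUpperCase(c) || BEGIN_CHARS.indexOf(c) >= 0;
    }

    static List<String> split(String text) {
        List<String> sentences = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < text.length(); i++) {
            if (END_CHARS.indexOf(text.charAt(i)) < 0) {
                continue; // not a candidate sentence end
            }
            int j = i + 1;
            // Ignored characters such as ')' may trail the end character.
            while (j < text.length() && IGNORE_CHARS.indexOf(text.charAt(j)) >= 0) {
                j++;
            }
            int sentenceEnd = j;
            // "SN": skip white space and newline characters.
            while (j < text.length() && Character.isWhitespace(text.charAt(j))) {
                j++;
            }
            // Assumption: at least one separator must have been skipped.
            if (j > sentenceEnd && j < text.length() && isSentenceBegin(text.charAt(j))) {
                sentences.add(text.substring(start, sentenceEnd));
                start = j;
                i = j - 1; // resume scanning at the start of the new sentence
            }
        }
        String tail = text.substring(start).trim();
        if (!tail.isEmpty()) {
            sentences.add(tail);
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(split("See it (now.) Is it 3.5? Yes."));
    }
}
```

Under these assumptions the sample input yields three sentences: `See it (now.)`, `Is it 3.5?` and `Yes.`. The closing parenthesis is ignored after the full stop, and the full stop inside `3.5` is not a boundary because no separator follows it.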
      • getRequiresParagraphCheck

        public boolean getRequiresParagraphCheck()
        Check if this database uses paragraph parsing when indexing.
        Returns:
        true if the database parses paragraphs
      • setRequiresParagraphCheck

        public void setRequiresParagraphCheck​(boolean mode)
        Establish whether this database will use paragraph parsing when indexing.
        Parameters:
        mode - true if the database is to parse paragraphs
      • getParagraphNeedsBeginOfSentence

        public boolean getParagraphNeedsBeginOfSentence()
        Check if paragraphs are only considered valid if they begin with a validly formed start of sentence.
        Returns:
        true if a valid start of sentence is required to start a new paragraph
      • setParagraphNeedsBeginOfSentence

        public void setParagraphNeedsBeginOfSentence​(boolean mode)
        Establish whether paragraphs are only to be considered valid if they begin with a validly formed start of sentence.
        Parameters:
        mode - true if a valid start of sentence is required to start a new paragraph
      • getParagraphSeparatorSpec

        public java.lang.String getParagraphSeparatorSpec()
        Retrieve the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
        Returns:
        Paragraph separator spec
      • setParagraphSeparatorSpec

        public void setParagraphSeparatorSpec​(java.lang.String spec)
        Establish the specification (in class terms) for the minimum boundary between paragraphs. Default is 2N, i.e. two newline equivalents.
        Parameters:
        spec - Paragraph separator spec
      • getParagraphNeedsEndOfSentence

        public boolean getParagraphNeedsEndOfSentence()
        Check if a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
        Returns:
        true if paragraph endings require valid sentence endings
      • setParagraphNeedsEndOfSentence

        public void setParagraphNeedsEndOfSentence​(boolean mode)
        Establish whether a paragraph is only considered to have completed if that completion occurred along with a valid end of sentence.
        Parameters:
        mode - true if paragraph endings require valid sentence endings
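
As a rough illustration of the paragraph settings, the sketch below reads the default separator spec 2N as "a run of at least two newlines" and optionally rejects a break whose preceding text does not end with one of the default sentence end characters. It is not TRIP's implementation. The class `ParagraphBoundarySketch` is a hypothetical name, and treating CR LF and a bare CR as newline equivalents is an assumption. The begin-of-sentence requirement is not modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not part of the TRIP API. */
final class ParagraphBoundarySketch {
    static final String END_CHARS = ".!?"; // documented default sentence end characters

    // "2N" interpreted as a run of at least two newlines.
    private static final Pattern BREAK = Pattern.compile("\\n{2,}");

    static boolean endsSentence(String s) {
        return END_CHARS.indexOf(s.charAt(s.length() - 1)) >= 0;
    }

    static List<String> split(String text, boolean needsEndOfSentence) {
        // Assumption: CR LF and bare CR count as newline equivalents.
        String normalized = text.replace("\r\n", "\n").replace('\r', '\n');
        List<String> paragraphs = new ArrayList<>();
        int start = 0;
        Matcher m = BREAK.matcher(normalized);
        while (m.find()) {
            String candidate = normalized.substring(start, m.start());
            if (candidate.isEmpty()) {
                start = m.end(); // leading blank lines, nothing to emit
                continue;
            }
            if (needsEndOfSentence && !endsSentence(candidate)) {
                continue; // break rejected; the paragraph keeps growing
            }
            paragraphs.add(candidate);
            start = m.end();
        }
        String tail = normalized.substring(start);
        if (!tail.isEmpty()) {
            paragraphs.add(tail);
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        String text = "Title\n\nFirst para.\r\n\r\nSecond para.";
        System.out.println(split(text, false).size()); // 3 paragraphs
        System.out.println(split(text, true).size());  // 2 paragraphs
    }
}
```

With the end-of-sentence requirement switched on, the break after `Title` is rejected because `Title` does not end with a sentence end character, so the heading is merged into the following paragraph.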
      • clear

        public void clear()
        Initializes the instance, removing any pre-existing state.
      • copyFrom

        public void copyFrom​(TdbTextSeparators src)
        Copy member data from another instance
        Parameters:
        src - Source of copy operation
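
The copy constructor, copyFrom and clear can be pictured with a small stand-in. The sketch below does not use the real class, so it compiles without the TRIP library: `SeparatorsStandIn` is a hypothetical type that models only two of the documented properties. It assumes that clear() restores the documented defaults, which the documentation implies but does not spell out.

```java
/** Hypothetical illustration only; not part of the TRIP API. */
final class SeparatorCopySketch {

    /** Stand-in that models two of the documented properties. */
    static final class SeparatorsStandIn {
        private String sentenceEndChars;
        private String ignoreChars;

        SeparatorsStandIn() {
            clear();
        }

        SeparatorsStandIn(SeparatorsStandIn src) {
            copyFrom(src);
        }

        /** Assumption: clearing restores the documented defaults. */
        void clear() {
            sentenceEndChars = ".!?";
            ignoreChars = ")>]}\u00BB"; // 00BB: right guillemet
        }

        /** Copies member data; strings are immutable, so the copies are independent. */
        void copyFrom(SeparatorsStandIn src) {
            sentenceEndChars = src.sentenceEndChars;
            ignoreChars = src.ignoreChars;
        }

        String getSentenceEndChars() { return sentenceEndChars; }
        void setSentenceEndChars(String chars) { sentenceEndChars = chars; }
        String getIgnoreChars() { return ignoreChars; }
        void setIgnoreChars(String chars) { ignoreChars = chars; }
    }

    public static void main(String[] args) {
        SeparatorsStandIn base = new SeparatorsStandIn();
        base.setSentenceEndChars(".!?;");

        // The copy starts from the customised state and then diverges.
        SeparatorsStandIn variant = new SeparatorsStandIn(base);
        variant.setIgnoreChars(")");

        System.out.println(base.getIgnoreChars().equals(variant.getIgnoreChars())); // false
        variant.clear();
        System.out.println(variant.getSentenceEndChars()); // .!?
    }
}
```

The same pattern (build a base configuration, copy it, adjust the copy) should carry over to the real class through the documented constructor and setters, subject to the assumptions above.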