GoodocSummaryStats

GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats



Description

Goodoc stats for a range of elements, such as one page or a whole book. These stats can be computed using the SummaryStatsCollector class. Some range stats are precomputed and stored in goodocs/volumes (e.g., Page.stats and Ocean's CA_VolumeResult.goodoc_stats).
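Because the same stats can describe one page or a whole book, it helps to see which fields combine across pages. The following is a minimal sketch, not the SummaryStatsCollector API (which is not documented here): the module SummaryStatsSketch and its input shape (each page is a list of paragraph symbol counts) are hypothetical, and the lower-median tie-breaking is an assumption.

```elixir
defmodule SummaryStatsSketch do
  # Hypothetical helper, not part of GoogleApi.ContentWarehouse or SummaryStatsCollector.
  # Input (assumption): a list of pages, each page a list of paragraph symbol counts.

  # Lower median of a list of integers; the real collector may break ties differently.
  def median([]), do: nil

  def median(values) do
    sorted = Enum.sort(values)
    Enum.at(sorted, div(length(sorted) - 1, 2))
  end

  # Stats for one range: a single page, or several pages pooled together.
  def range_stats(pages) do
    paragraphs = List.flatten(pages)

    %{
      numPages: length(pages),
      numParagraphs: length(paragraphs),
      numSymbols: Enum.sum(paragraphs),
      medianSymbolsPerParagraph: median(paragraphs)
    }
  end
end

page1 = [10, 20, 30]
page2 = [200, 210]

p1 = SummaryStatsSketch.range_stats([page1])
p2 = SummaryStatsSketch.range_stats([page2])
book = SummaryStatsSketch.range_stats([page1, page2])
```

Counts add up across pages (5 paragraphs = 3 + 2), but the book-level median (30) differs from both page medians (20 and 200). In general a median over a larger range has to be recomputed from the pooled per-element values rather than derived from per-page medians.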

Attributes List

This module has the following attributes (case-insensitive ascending order):


  1. estimatedFontSizes (type: boolean(), default: nil)
    - Set if fontSizeHistogram was derived by estimating font sizes from CharLabel.CharacterHeight. This happens when the FontSize field is constant, as has happened with Abbyy 9.
  2. fontSizeHistogram (type: list(GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats), default: nil)
    - Symbol counts (and other attributes) for each distinct combination of CharLabel.FontId and FontSize. The histogram is sorted in decreasing order of symbol count.
  3. meanSymbolsPerBlock (type: integer(), default: nil)
    - Mean number of symbols per block.
  4. meanSymbolsPerLine (type: integer(), default: nil)
    - Mean number of symbols per line.
  5. meanSymbolsPerParagraph (type: integer(), default: nil)
    - Mean number of symbols per paragraph.
  6. meanSymbolsPerWord (type: integer(), default: nil)
    - Mean number of symbols per word.
  7. meanWordsPerBlock (type: integer(), default: nil)
    - Mean number of words per block.
  8. meanWordsPerLine (type: integer(), default: nil)
    - Mean number of words per line.
  9. meanWordsPerParagraph (type: integer(), default: nil)
    - Mean number of words per paragraph.
  10. medianBlockSpace (type: integer(), default: nil)
    - Median space from the bottom of a block to the top of the next block in the same flow on the page.
  11. medianEvenPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Median printed box over even pages (0, 2, 4, ...). Excludes page header/footer and all graphic blocks.
  12. medianFullEvenPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Like medianEvenPrintedBox (pages 0, 2, 4, ...), but includes page header/footer. Still excludes all graphic blocks.
  13. medianFullOddPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Like medianOddPrintedBox (pages 1, 3, 5, ...), but includes page header/footer. Still excludes all graphic blocks.
  14. medianFullPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Like medianPrintedBox, but includes page header/footer. Each of medianFullPrintedBox, medianFullOddPrintedBox and medianFullEvenPrintedBox includes page header/footer but still excludes all graphic blocks.
  15. medianHeight (type: integer(), default: nil)
    -
  16. medianHorizontalDpi (type: integer(), default: nil)
    - Median horizontal resolution, in dots per inch (DPI).
  17. medianLineHeight (type: integer(), default: nil)
    - Median line height, from top to bottom. In the line stats, "top" corresponds to the highest ascender and "bottom" to the lowest descender.
  18. medianLineSpace (type: integer(), default: nil)
    - Median space from the bottom of a line to the top of the next line in the same paragraph.
  19. medianLineSpan (type: integer(), default: nil)
    - Median distance from the top of a line to the top of the next line in the same paragraph.
  20. medianOddPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Median printed box over odd pages (1, 3, 5, ...). Excludes page header/footer and all graphic blocks.
  21. medianParagraphIndent (type: integer(), default: nil)
    - Median leading space on the first line of a paragraph.
  22. medianParagraphSpace (type: integer(), default: nil)
    - Median space from the bottom of a paragraph to the top of the next paragraph in the same block.
  23. medianPrintedBox (type: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox, default: nil)
    - Median printed box over all pages. Each of medianPrintedBox, medianOddPrintedBox and medianEvenPrintedBox excludes page header/footer and all graphic blocks.
  24. medianSymbolsPerBlock (type: integer(), default: nil)
    - Median number of symbols per block. Omits junk, header and footer blocks; it is intended to measure the typical "content" block. Means and medians can still differ substantially; however, block values will generally exceed paragraph values (which is not the case when headers and footers are included).
  25. medianSymbolsPerLine (type: integer(), default: nil)
    - Median number of symbols per line.
  26. medianSymbolsPerParagraph (type: integer(), default: nil)
    - Median number of symbols per paragraph. Omits junk, header and footer blocks; it is intended to measure the typical "content" paragraph. Means and medians can still differ substantially, particularly if a table is present (every table cell is a paragraph).
  27. medianSymbolsPerWord (type: integer(), default: nil)
    - Median number of symbols per word.
  28. medianVerticalDpi (type: integer(), default: nil)
    - Median vertical resolution, in dots per inch (DPI).
  29. medianWidth (type: integer(), default: nil)
    -
  30. medianWordsPerBlock (type: integer(), default: nil)
    - Median number of words per block. Uses the same exclusions as medianSymbolsPerBlock.
  31. medianWordsPerLine (type: integer(), default: nil)
    - Median number of words per line.
  32. medianWordsPerParagraph (type: integer(), default: nil)
    - Median number of words per paragraph. Uses the same exclusions as medianSymbolsPerParagraph.
  33. numBlocks (type: integer(), default: nil)
    - Number of blocks.
  34. numBlockSpaces (type: integer(), default: nil)
    - Number of blocks that have a successor block within their flow on their page.
  35. numLines (type: integer(), default: nil)
    - Number of lines.
  36. numLineSpaces (type: integer(), default: nil)
    - Number of lines (out of numLines) that have a successor line within their paragraph.
  37. numNonGraphicBlocks (type: integer(), default: nil)
    - Number of non-graphic blocks.
  38. numPages (type: integer(), default: nil)
    - Number of pages.
  39. numParagraphs (type: integer(), default: nil)
    - Number of paragraphs.
  40. numParagraphSpaces (type: integer(), default: nil)
    - Number of paragraphs that have a successor paragraph within their block.
  41. numSymbols (type: integer(), default: nil)
    - Number of symbols.
  42. numWords (type: integer(), default: nil)
    - Number of words.
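The paragraph stats warn that means and medians can diverge sharply when a table is present, because every table cell counts as a paragraph. The sketch below illustrates this with made-up symbol counts; the module MeanMedianSketch is hypothetical, and truncating the mean to an integer (to match the integer() type of the mean fields) and the lower-median tie-breaking are assumptions about how the real values are computed.

```elixir
defmodule MeanMedianSketch do
  # Hypothetical helpers, not part of the GoogleApi client.
  # Truncated integer mean, mirroring the integer() type of the mean* fields (assumption).
  def mean([]), do: nil
  def mean(values), do: div(Enum.sum(values), length(values))

  # Lower median (assumption about tie-breaking).
  def median([]), do: nil

  def median(values) do
    sorted = Enum.sort(values)
    Enum.at(sorted, div(length(sorted) - 1, 2))
  end
end

# Symbols per ordinary prose paragraph.
prose = [380, 410, 395, 420, 400]
# Each table cell is its own (very short) paragraph.
table_cells = [4, 6, 3, 5, 7, 2]

with_table = prose ++ table_cells

mean_with_table = MeanMedianSketch.mean(with_table)      # 184
median_with_table = MeanMedianSketch.median(with_table)  # 7
mean_prose = MeanMedianSketch.mean(prose)                # 401
median_prose = MeanMedianSketch.median(prose)            # 400
```

Without the table, mean and median agree closely (401 vs 400); adding six short cells leaves the mean at 184 but drops the median to a cell-sized 7.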
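The numLineSpaces, numParagraphSpaces and numBlockSpaces fields count elements that have a successor inside their container, so a container holding n elements contributes n - 1. A minimal sketch follows; the module SuccessorCountSketch is hypothetical, and it assumes a simplified layout where each page holds a single flow of blocks, each block is a list of paragraphs, and each paragraph is given by its line count.

```elixir
defmodule SuccessorCountSketch do
  # Hypothetical layout model (assumption): a page is a list of blocks forming a
  # single flow, a block is a list of paragraphs, and a paragraph is given by its
  # number of lines.

  # In a group of n items, every item except the last has a successor: n - 1.
  defp successors(group_sizes) do
    group_sizes
    |> Enum.map(fn n -> max(n - 1, 0) end)
    |> Enum.sum()
  end

  def counts(pages) do
    blocks = Enum.concat(pages)
    paragraphs = Enum.concat(blocks)

    %{
      numPages: length(pages),
      numBlocks: length(blocks),
      numBlockSpaces: successors(Enum.map(pages, &length/1)),
      numParagraphs: length(paragraphs),
      numParagraphSpaces: successors(Enum.map(blocks, &length/1)),
      numLines: Enum.sum(paragraphs),
      numLineSpaces: successors(paragraphs)
    }
  end
end

pages = [
  # Page 0: a block with paragraphs of 3 and 1 lines, then a block with one 2-line paragraph.
  [[3, 1], [2]],
  # Page 1: one block with paragraphs of 4, 2 and 1 lines.
  [[4, 2, 1]]
]

counts = SuccessorCountSketch.counts(pages)
```

Under this model numLineSpaces equals numLines minus numParagraphs (13 - 6 = 7), since only the last line of each paragraph lacks a successor; the same reasoning gives numParagraphSpaces = numParagraphs - numBlocks and numBlockSpaces = numBlocks - numPages.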
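fontSizeHistogram groups symbols by font and size and sorts the buckets by decreasing symbol count, and estimatedFontSizes records whether sizes had to be estimated because FontSize was constant. The sketch below shows that bookkeeping only. The module FontHistogramSketch and its symbol maps are hypothetical, and the "estimate" simply reuses the character height as the size, which is a stand-in rather than the real estimation method.

```elixir
defmodule FontHistogramSketch do
  # Hypothetical symbol shape (assumption): %{font_id: _, font_size: _, char_height: _},
  # loosely mirroring CharLabel.FontId, CharLabel.FontSize and CharLabel.CharacterHeight.

  # Returns {histogram, estimated_font_sizes}, where histogram is a list of
  # {{font_id, size}, symbol_count} tuples in decreasing order of symbol count.
  def build(symbols) do
    distinct_sizes = symbols |> Enum.map(& &1.font_size) |> Enum.uniq()
    estimated = length(distinct_sizes) <= 1

    sizes =
      if estimated do
        # Stand-in estimate (assumption): reuse CharacterHeight as the size.
        Enum.map(symbols, & &1.char_height)
      else
        Enum.map(symbols, & &1.font_size)
      end

    histogram =
      symbols
      |> Enum.zip(sizes)
      |> Enum.map(fn {symbol, size} -> {symbol.font_id, size} end)
      |> Enum.frequencies()
      |> Enum.sort_by(fn {_key, count} -> count end, :desc)

    {histogram, estimated}
  end
end

# Every symbol reports FontSize 10, so sizes are estimated from heights.
constant = [
  %{font_id: 1, font_size: 10, char_height: 9},
  %{font_id: 1, font_size: 10, char_height: 9},
  %{font_id: 2, font_size: 10, char_height: 14}
]

# FontSize varies, so it is used as reported.
varied = [
  %{font_id: 1, font_size: 10, char_height: 9},
  %{font_id: 1, font_size: 12, char_height: 11},
  %{font_id: 1, font_size: 12, char_height: 11}
]

{constant_hist, constant_estimated} = FontHistogramSketch.build(constant)
{varied_hist, varied_estimated} = FontHistogramSketch.build(varied)
```

Here constant_hist is [{{1, 9}, 2}, {{2, 14}, 1}] with the flag set, while varied_hist is [{{1, 12}, 2}, {{1, 10}, 1}] with the flag cleared.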
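The printed-box fields split pages by index parity: odd boxes cover pages 1, 3, 5, ... and even boxes cover pages 0, 2, 4, .... The sketch below shows that partition. The module PrintedBoxSketch is hypothetical, the box map (left, top, width, height) only loosely stands in for GoodocBoundingBox, and taking a separate lower median of each coordinate is an assumption, not the documented method.

```elixir
defmodule PrintedBoxSketch do
  # Hypothetical box shape (assumption): %{left: _, top: _, width: _, height: _}.
  # Taking the median of each coordinate separately is also an assumption.
  @keys [:left, :top, :width, :height]

  # Lower median (assumption about tie-breaking).
  defp median(values) do
    sorted = Enum.sort(values)
    Enum.at(sorted, div(length(sorted) - 1, 2))
  end

  defp median_box([]), do: nil

  defp median_box(boxes) do
    Map.new(@keys, fn key -> {key, median(Enum.map(boxes, &Map.fetch!(&1, key)))} end)
  end

  # boxes: one printed box per page, in page order starting at index 0.
  def summarize(boxes) do
    indexed = Enum.with_index(boxes)
    even = for {box, i} <- indexed, rem(i, 2) == 0, do: box
    odd = for {box, i} <- indexed, rem(i, 2) == 1, do: box

    %{
      medianPrintedBox: median_box(boxes),
      medianEvenPrintedBox: median_box(even),
      medianOddPrintedBox: median_box(odd)
    }
  end
end

# Facing pages with mirrored margins: even pages sit further to the right.
boxes = [
  %{left: 100, top: 80, width: 400, height: 600},
  %{left: 60, top: 80, width: 400, height: 600},
  %{left: 102, top: 82, width: 398, height: 598},
  %{left: 62, top: 78, width: 402, height: 602}
]

summary = PrintedBoxSketch.summarize(boxes)
```

With mirrored margins, the all-pages left edge (62) matches neither the even-page value (100) nor the odd-page value (60), which illustrates why separate odd and even boxes can be more informative than a single median box.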

Type

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats{
  estimatedFontSizes: boolean() | nil,
  fontSizeHistogram: [GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats.t()] | nil,
  meanSymbolsPerBlock: integer() | nil,
  meanSymbolsPerLine: integer() | nil,
  meanSymbolsPerParagraph: integer() | nil,
  meanSymbolsPerWord: integer() | nil,
  meanWordsPerBlock: integer() | nil,
  meanWordsPerLine: integer() | nil,
  meanWordsPerParagraph: integer() | nil,
  medianBlockSpace: integer() | nil,
  medianEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianFullPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianHeight: integer() | nil,
  medianHorizontalDpi: integer() | nil,
  medianLineHeight: integer() | nil,
  medianLineSpace: integer() | nil,
  medianLineSpan: integer() | nil,
  medianOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianParagraphIndent: integer() | nil,
  medianParagraphSpace: integer() | nil,
  medianPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
  medianSymbolsPerBlock: integer() | nil,
  medianSymbolsPerLine: integer() | nil,
  medianSymbolsPerParagraph: integer() | nil,
  medianSymbolsPerWord: integer() | nil,
  medianVerticalDpi: integer() | nil,
  medianWidth: integer() | nil,
  medianWordsPerBlock: integer() | nil,
  medianWordsPerLine: integer() | nil,
  medianWordsPerParagraph: integer() | nil,
  numBlockSpaces: integer() | nil,
  numBlocks: integer() | nil,
  numLineSpaces: integer() | nil,
  numLines: integer() | nil,
  numNonGraphicBlocks: integer() | nil,
  numPages: integer() | nil,
  numParagraphSpaces: integer() | nil,
  numParagraphs: integer() | nil,
  numSymbols: integer() | nil,
  numWords: integer() | nil
}

Function

@spec decode(struct(), keyword()) :: struct()

Data sourced from HexDocs: GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats