GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats
Table of Contents ▼
Jump to a specific part of the page:
Description
Goodoc stats for a range of elements, such as one page or a whole book. These stats can be computed using the SummaryStatsCollector class. Some range stats are pre-computed and stored in goodocs/volumes (eg., Page.stats below, and Ocean's CA_VolumeResult.goodoc_stats).
Attributes List
This module has the following attributes (case-insensitive ascending order):
View Attributes
- estimatedFontSizes
- fontSizeHistogram
- meanSymbolsPerBlock
- meanSymbolsPerLine
- meanSymbolsPerParagraph
- meanSymbolsPerWord
- meanWordsPerBlock
- meanWordsPerLine
- meanWordsPerParagraph
- medianBlockSpace
- medianEvenPrintedBox
- medianFullEvenPrintedBox
- medianFullOddPrintedBox
- medianFullPrintedBox
- medianHeight
- medianHorizontalDpi
- medianLineHeight
- medianLineSpace
- medianLineSpan
- medianOddPrintedBox
- medianParagraphIndent
- medianParagraphSpace
- medianPrintedBox
- medianSymbolsPerBlock
- medianSymbolsPerLine
- medianSymbolsPerParagraph
- medianSymbolsPerWord
- medianVerticalDpi
- medianWidth
- medianWordsPerBlock
- medianWordsPerLine
- medianWordsPerParagraph
- numBlocks
- numBlockSpaces
- numLines
- numLineSpaces
- numNonGraphicBlocks
- numPages
- numParagraphs
- numParagraphSpaces
- numSymbols
- numWords
Attributes
-
numParagraphs
(type:integer()
, default:nil
)
- ------ Paragraph stats Median symbols and words omit junk, header and footer blocks; they are intended to be a measure of the typical "content" paragraph. There can still be substantial differences between means and medians, particularly if a table is present (every cell is a paragraph). -
medianSymbolsPerParagraph
(type:integer()
, default:nil
)
- -
estimatedFontSizes
(type:boolean()
, default:nil
)
- This flag is set if the histogram above has been derived by estimating font sizes from CharLabel.CharacterHeight; that happens if the FontSize field is constant, as has happened with Abbyy 9. -
numLineSpaces
(type:integer()
, default:nil
)
- Lines (out of num_lines) that have a successor line within their para -
medianSymbolsPerBlock
(type:integer()
, default:nil
)
- -
numWords
(type:integer()
, default:nil
)
- ------ Word stats -
medianSymbolsPerWord
(type:integer()
, default:nil
)
- -
meanSymbolsPerWord
(type:integer()
, default:nil
)
- -
numNonGraphicBlocks
(type:integer()
, default:nil
)
- -
medianFullOddPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- -
medianWordsPerLine
(type:integer()
, default:nil
)
- -
medianLineSpan
(type:integer()
, default:nil
)
- top to next top in para -
medianWidth
(type:integer()
, default:nil
)
- -
medianWordsPerParagraph
(type:integer()
, default:nil
)
- -
meanWordsPerBlock
(type:integer()
, default:nil
)
- -
medianParagraphIndent
(type:integer()
, default:nil
)
- leading space on first line -
medianOddPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- 1,3,5.. -
medianSymbolsPerLine
(type:integer()
, default:nil
)
- -
meanSymbolsPerLine
(type:integer()
, default:nil
)
- -
numLines
(type:integer()
, default:nil
)
- ------ Line stats "top" corresponds to the highest ascender and "bottom" to the lowest descender. -
medianParagraphSpace
(type:integer()
, default:nil
)
- bottom to next top in block -
numParagraphSpaces
(type:integer()
, default:nil
)
- paras that have a successor para within their block -
medianPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- Each median*_printed_box excludes page header/footer and all graphic blocks -
numPages
(type:integer()
, default:nil
)
- ------ Page stats. -
medianHorizontalDpi
(type:integer()
, default:nil
)
- -
meanSymbolsPerParagraph
(type:integer()
, default:nil
)
- -
medianVerticalDpi
(type:integer()
, default:nil
)
- -
medianFullPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- Each median_full*_printed_box includes page header/footer but still excludes all graphic blocks -
fontSizeHistogram
(type:list(GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats)
, default:nil
)
- Symbol counts (and other attributes) for each distinct CharLabel.FontId and FontSize; histogram is in decreasing order of symbol count -
medianBlockSpace
(type:integer()
, default:nil
)
- bottom to next top in flow on page -
medianLineHeight
(type:integer()
, default:nil
)
- top to bottom -
medianHeight
(type:integer()
, default:nil
)
- -
medianFullEvenPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- -
meanWordsPerParagraph
(type:integer()
, default:nil
)
- -
meanWordsPerLine
(type:integer()
, default:nil
)
- -
medianEvenPrintedBox
(type:GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox
, default:nil
)
- 0,2,4.. -
medianLineSpace
(type:integer()
, default:nil
)
- bottom to next top in para -
numSymbols
(type:integer()
, default:nil
)
- ------ Symbol stats -
numBlocks
(type:integer()
, default:nil
)
- ------ Block stats Median symbols and words omit junk, header and footer blocks; they are intended to be a measure of the typical "content" block. There can still be substantial differences between means and medians; however, block values will generally exceed paragraph values (not the case when headers and footers are included). -
medianWordsPerBlock
(type:integer()
, default:nil
)
- -
numBlockSpaces
(type:integer()
, default:nil
)
- blocks that have a successor block within their flow on their page -
meanSymbolsPerBlock
(type:integer()
, default:nil
)
-
Type
@type t() :: %GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats{
estimatedFontSizes: boolean() | nil,
fontSizeHistogram: [GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats.t()] | nil,
meanSymbolsPerBlock: integer() | nil,
meanSymbolsPerLine: integer() | nil,
meanSymbolsPerParagraph: integer() | nil,
meanSymbolsPerWord: integer() | nil,
meanWordsPerBlock: integer() | nil,
meanWordsPerLine: integer() | nil,
meanWordsPerParagraph: integer() | nil,
medianBlockSpace: integer() | nil,
medianEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianHeight: integer() | nil,
medianHorizontalDpi: integer() | nil,
medianLineHeight: integer() | nil,
medianLineSpace: integer() | nil,
medianLineSpan: integer() | nil,
medianOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianParagraphIndent: integer() | nil,
medianParagraphSpace: integer() | nil,
medianPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianSymbolsPerBlock: integer() | nil,
medianSymbolsPerLine: integer() | nil,
medianSymbolsPerParagraph: integer() | nil,
medianSymbolsPerWord: integer() | nil,
medianVerticalDpi: integer() | nil,
medianWidth: integer() | nil,
medianWordsPerBlock: integer() | nil,
medianWordsPerLine: integer() | nil,
medianWordsPerParagraph: integer() | nil,
numBlockSpaces: integer() | nil,
numBlocks: integer() | nil,
numLineSpaces: integer() | nil,
numLines: integer() | nil,
numNonGraphicBlocks: integer() | nil,
numPages: integer() | nil,
numParagraphSpaces: integer() | nil,
numParagraphs: integer() | nil,
numSymbols: integer() | nil,
numWords: integer() | nil
}
estimatedFontSizes: boolean() | nil,
fontSizeHistogram: [GoogleApi.ContentWarehouse.V1.Model.GoodocFontSizeStats.t()] | nil,
meanSymbolsPerBlock: integer() | nil,
meanSymbolsPerLine: integer() | nil,
meanSymbolsPerParagraph: integer() | nil,
meanSymbolsPerWord: integer() | nil,
meanWordsPerBlock: integer() | nil,
meanWordsPerLine: integer() | nil,
meanWordsPerParagraph: integer() | nil,
medianBlockSpace: integer() | nil,
medianEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullEvenPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianFullPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianHeight: integer() | nil,
medianHorizontalDpi: integer() | nil,
medianLineHeight: integer() | nil,
medianLineSpace: integer() | nil,
medianLineSpan: integer() | nil,
medianOddPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianParagraphIndent: integer() | nil,
medianParagraphSpace: integer() | nil,
medianPrintedBox: GoogleApi.ContentWarehouse.V1.Model.GoodocBoundingBox.t() | nil,
medianSymbolsPerBlock: integer() | nil,
medianSymbolsPerLine: integer() | nil,
medianSymbolsPerParagraph: integer() | nil,
medianSymbolsPerWord: integer() | nil,
medianVerticalDpi: integer() | nil,
medianWidth: integer() | nil,
medianWordsPerBlock: integer() | nil,
medianWordsPerLine: integer() | nil,
medianWordsPerParagraph: integer() | nil,
numBlockSpaces: integer() | nil,
numBlocks: integer() | nil,
numLineSpaces: integer() | nil,
numLines: integer() | nil,
numNonGraphicBlocks: integer() | nil,
numPages: integer() | nil,
numParagraphSpaces: integer() | nil,
numParagraphs: integer() | nil,
numSymbols: integer() | nil,
numWords: integer() | nil
}
Function
@spec decode(struct(), keyword()) :: struct()Data sourced from HexDocs : GoogleApi.ContentWarehouse.V1.Model.GoodocSummaryStats