NlpSaftToken – Google API Leak

GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken

Description

A document token marks a span of bytes in the document text as a token or word. Next available index: 16.
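The byte-span idea can be illustrated with a minimal Python sketch. It assumes the inclusive [start, end] convention documented for the start and end attributes under Attributes. The helper names token_bytes and token_text and the sample text are hypothetical, chosen only for illustration.

```python
def token_bytes(text: str, start: int, end: int) -> bytes:
    """Return the UTF-8 bytes covered by the inclusive range [start, end]."""
    data = text.encode("utf-8")
    if not (0 <= start <= end < len(data)):
        raise ValueError("token range is outside the document text")
    # end is inclusive, so the Python slice has to stop at end + 1.
    return data[start:end + 1]


def token_text(text: str, start: int, end: int) -> str:
    """Decode the token's byte span back into a string."""
    return token_bytes(text, start, end).decode("utf-8")


# Hypothetical document text: "Zoë naps". The letter ë (U+00EB) takes two
# bytes in UTF-8, so the first token covers bytes 0..3 and its end offset
# points at a continuation byte.
document_text = "Zo\u00eb naps"

first = token_text(document_text, 0, 3)    # "Zoë": 3 characters, 4 bytes
second = token_text(document_text, 5, 8)   # "naps"
```

Offsets count bytes, not characters, so slicing the decoded string directly with start and end would give wrong results for any text containing multi-byte characters.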

Attributes List

This module has the following attributes (case-insensitive ascending order):

breakLevel
breakSkippedText
category
end
head
info
label
lemma
morph
scriptCode
start
tag
tagConfidence
textProperties
word

Attributes

breakLevel (type: String.t, default: nil)
-
breakSkippedText (type: boolean(), default: nil)
- Whether the break skipped over non-tag text (excluding script/style).
category (type: String.t, default: nil)
- Coarse-grained word category for token. See README.categories for category inventory.
end (type: integer(), default: nil)
-
head (type: integer(), default: nil)
- Head of this token in the dependency tree: the id of the token which has an arc going to this one. If it is the root token of a sentence, then it is set to -1.
info (type: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet, default: nil)
- Annotation for this token.
label (type: String.t, default: nil)
- Label for dependency relation between this token and its head. See README.labels for label inventory.
lemma (type: String.t, default: nil)
- Word lemma. This is only filled if the lemma is different from the word form.
morph (type: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology, default: nil)
- Morphology information.
scriptCode (type: String.t, default: nil)
- A string representation (typically four letters, sometimes longer) of the token's Unicode script code, based on BCP 47/CLDR, capitalized according to ISO 15924. See i18n/identifiers/scriptcode.h for details.
start (type: integer(), default: nil)
- start and end describe the inclusive byte range [start, end] of the UTF-8 encoded token in document.text. end gives the index of the last byte, which may be a UTF-8 continuation byte, so the length in bytes is end - start + 1. The begin/end options let the goldmine AnnotationsFinder locate the offsets of SAFT tokens. start is inclusive by default and end is marked.
tag (type: String.t, default: nil)
- Part-of-speech tag for token. See README.tags for tag inventory.
tagConfidence (type: number(), default: nil)
- Confidence score for the tag prediction -- should be interpreted as a probability estimate that the tag is correct.
textProperties (type: integer(), default: nil)
-
word (type: String.t, default: nil)
- Token word form. This may not be identical to the original text. For example, goldmine annotation applies UTF-8 normalization and punctuation normalization. The punctuation normalization includes inferring the directionality of straight double quotes: " is mapped to an open quote (``) or a close quote (''), and the inference is sometimes wrong. SAFT processing in other contexts (such as queries in qrewrite) involves different normalizations.
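The head, label, and lemma attributes can be read together as a small dependency tree. The sketch below is a rough Python illustration, not Google's implementation. It assumes that a token's id is its position in the token list (the source does not say how ids are assigned). The label values are made up for the example, since the README.labels inventory is not reproduced here.

```python
from typing import Optional

# Hypothetical tokens for "Dogs chased it". Ids are assumed to be list
# positions; the label values are illustrative only.
tokens = [
    {"word": "Dogs", "lemma": "dog", "head": 1, "label": "nsubj"},
    {"word": "chased", "lemma": "chase", "head": -1, "label": "ROOT"},
    {"word": "it", "head": 1, "label": "dobj"},  # lemma equals word, so unset
]


def lemma_or_word(token: dict) -> Optional[str]:
    """lemma is only filled when it differs from word, so fall back to word."""
    return token.get("lemma") or token.get("word")


def root_index(tokens: list) -> Optional[int]:
    """Return the position of the sentence root, i.e. the token with head == -1."""
    for i, token in enumerate(tokens):
        if token.get("head") == -1:
            return i
    return None


def children(tokens: list, head_index: int) -> list:
    """Return positions of all tokens whose head is head_index."""
    return [i for i, token in enumerate(tokens) if token.get("head") == head_index]


def path_to_root(tokens: list, index: int) -> list:
    """Follow head links from a token up to the root, guarding against cycles."""
    path = [index]
    seen = {index}
    while True:
        head = tokens[index].get("head")
        if head is None or head == -1:
            return path
        if head in seen:
            raise ValueError("cycle in head links")
        path.append(head)
        seen.add(head)
        index = head


root = root_index(tokens)
dependents = children(tokens, root)
lemmas = [lemma_or_word(t) for t in tokens]
```

The fallback in lemma_or_word matters because an unset lemma does not mean the lemma is unknown, only that it matches the word form.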

Type

@type t() :: %GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken{
  breakLevel: String.t() | nil,
  breakSkippedText: boolean() | nil,
  category: String.t() | nil,
  end: integer() | nil,
  head: integer() | nil,
  info: GoogleApi.ContentWarehouse.V1.Model.Proto2BridgeMessageSet.t() | nil,
  label: String.t() | nil,
  lemma: String.t() | nil,
  morph: GoogleApi.ContentWarehouse.V1.Model.NlpSaftMorphology.t() | nil,
  scriptCode: String.t() | nil,
  start: integer() | nil,
  tag: String.t() | nil,
  tagConfidence: number() | nil,
  textProperties: integer() | nil,
  word: String.t() | nil
}

Function

@spec decode(struct(), keyword()) :: struct()
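The source gives only the signature of decode, not its behavior. As a rough analogue, the Python sketch below shows one way a JSON object could be mapped onto a record whose fields all default to nil (None in Python), matching the Type section. It is not the Elixir implementation. The function name decode_token, the assumption that JSON keys match the attribute names, and the choice to drop unknown keys are all hypothetical.

```python
import json
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class NlpSaftToken:
    # Field names and defaults mirror the attribute list; nested messages
    # (info, morph) are kept as plain dicts in this sketch.
    breakLevel: Optional[str] = None
    breakSkippedText: Optional[bool] = None
    category: Optional[str] = None
    end: Optional[int] = None
    head: Optional[int] = None
    info: Optional[dict] = None
    label: Optional[str] = None
    lemma: Optional[str] = None
    morph: Optional[dict] = None
    scriptCode: Optional[str] = None
    start: Optional[int] = None
    tag: Optional[str] = None
    tagConfidence: Optional[float] = None
    textProperties: Optional[int] = None
    word: Optional[str] = None


def decode_token(payload: str) -> NlpSaftToken:
    """Build a token from a JSON object, ignoring keys that are not attributes."""
    raw: dict[str, Any] = json.loads(payload)
    known = {f.name for f in fields(NlpSaftToken)}
    return NlpSaftToken(**{k: v for k, v in raw.items() if k in known})


# Hypothetical payload; the values are invented for illustration.
token = decode_token(
    '{"word": "naps", "start": 5, "end": 8, "tagConfidence": 0.93, "extra": 1}'
)
```

Attributes absent from the payload stay None, so callers should check for None before using fields such as lemma or head.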

Data sourced from HexDocs: GoogleApi.ContentWarehouse.V1.Model.NlpSaftToken
