Schema CheatSheet
Integration Columns
Our integrations provide additional structured metadata that you can use during analysis. See our Report Filters Cheatsheet for examples of using these fields to analyze earnings transcripts, news, and product reviews.
column_name | column_type | description | integrations |
---|---|---|---|
published | VARCHAR | Publication date of the document, stored as a date string. | all |
title | VARCHAR | The title associated with the document. Used for UI polishing on our automatic dashboards. | all |
ticker | VARCHAR | Stock ticker, e.g. GOOG, META. | earnings_transcripts |
quarter | VARCHAR | Fiscal quarter, self-reported by each company. | earnings_transcripts |
pub_quarter | VARCHAR | Quarter normalized by publication date. Recommended for cross-company analysis. | earnings_transcripts |
product_name | VARCHAR | Title of the product. | Product Reviews (Home Depot, Apple App Store, Walmart) |
rating | INT | Rating from 1 to 5. | Product Reviews (Home Depot, Apple App Store, Walmart) |
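As a rough illustration of how these fields can drive an analysis, the sketch below averages `rating` per `product_name`. The table name `reviews`, the sample rows, and the use of an in-memory SQLite database are all assumptions made for the example; they stand in for wherever your integration data actually lives.

```python
import sqlite3

# Hypothetical sample rows: (product_name, rating). Not real data.
rows = [
    ("Cordless Drill", 5),
    ("Cordless Drill", 3),
    ("Garden Hose", 4),
]

# In-memory SQLite is used only so the example is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (product_name VARCHAR, rating INT)")
conn.executemany("INSERT INTO reviews VALUES (?, ?)", rows)

# Average rating and review count per product.
query = """
    SELECT product_name, AVG(rating) AS avg_rating, COUNT(*) AS n_reviews
    FROM reviews
    GROUP BY product_name
    ORDER BY product_name
"""
summary = conn.execute(query).fetchall()
conn.close()

print(summary)
```

The same grouping pattern should carry over to `ticker` and `pub_quarter` for earnings transcripts, with the caveat that the exact string format of `pub_quarter` is not specified here.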
Generic Columns
These basic standard columns are created in the paragraph table for every index.
column_name | column_type | description |
---|---|---|
doc_id | VARCHAR | Unique identifier for each document. Either provided by the user or a SHA hash of the content. |
paragraph_id | BIGINT | The position of the paragraph within its document, numbered in order from 0 to N. |
row_id | VARCHAR | Unique identifier for each doc-paragraph combination. |
text | VARCHAR | The raw text of the paragraph. |
nu_n_tokens | BIGINT | The number of tokens in the document. |
nu_payload_size | BIGINT | The size in bytes of each document and its metadata. |
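To make the relationship between `doc_id`, `paragraph_id`, and `row_id` concrete, here is a minimal sketch of how such identifiers could be derived. Everything specific in it is an assumption: the hash is taken to be SHA-256, paragraphs are split on blank lines, and the `doc_id:paragraph_id` layout of `row_id` is hypothetical. The helper names `make_doc_id`, `make_row_id`, and `paragraph_rows` are invented for illustration.

```python
import hashlib


def make_doc_id(content, user_id=None):
    """Use the caller's ID if given, else a SHA-256 hex digest (assumed hash)."""
    if user_id is not None:
        return user_id
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def make_row_id(doc_id, paragraph_id):
    # Hypothetical layout; the real row_id format is not documented here.
    return f"{doc_id}:{paragraph_id}"


def paragraph_rows(content, user_id=None):
    """Split a document on blank lines (an assumption) and build one row per paragraph."""
    doc_id = make_doc_id(content, user_id)
    paragraphs = [p for p in content.split("\n\n") if p.strip()]
    return [
        {
            "doc_id": doc_id,
            "paragraph_id": i,  # numbered from 0 in document order
            "row_id": make_row_id(doc_id, i),
            "text": p,
        }
        for i, p in enumerate(paragraphs)
    ]


example_rows = paragraph_rows("First paragraph.\n\nSecond paragraph.", user_id="doc-1")
for row in example_rows:
    print(row["row_id"], row["text"])
```

Hashing the content gives a stable fallback ID: the same text always maps to the same `doc_id`, so re-ingesting an unchanged document does not create a new identifier.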
Sparse Topic Columns
These advanced columns are also created in the paragraph table for every index. They contain sparsely stored topical semantic information for every document, at both the document and paragraph level.
See our quickstart for an in-depth tutorial on our topic information, and our sparse SQL reference page for documentation on using these semantic fields directly in SQL.
column_name | column_type | description |
---|---|---|
c_mean_avg_vals | FLOAT[] | Paragraph-level topic values. Each value denotes the number of words that belong to the corresponding topic. |
c_mean_avg_inds | USMALLINT[] | Indices of the paragraph-level topic values. |
sum_topic_counts_vals | FLOAT[] | Document-level topic values. |
sum_topic_counts_inds | USMALLINT[] | Indices of the document-level topic counts. |
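Each `_inds` / `_vals` pair can be read as a sparse vector: only topics that actually occur are stored. The sketch below shows one way to decode such a pair once the arrays have been fetched into Python. It assumes the i-th index pairs with the i-th value; the helper names, the sample arrays, and the total topic count of 12 are made up for illustration.

```python
def sparse_to_dict(inds, vals):
    """Map each topic index to its value, assuming the arrays are aligned."""
    if len(inds) != len(vals):
        raise ValueError("inds and vals must have the same length")
    return dict(zip(inds, vals))


def sparse_to_dense(inds, vals, n_topics):
    """Expand the sparse pair into a dense list with zeros for absent topics."""
    dense = [0.0] * n_topics
    for i, v in sparse_to_dict(inds, vals).items():
        dense[i] = v
    return dense


def top_topics(inds, vals, k=3):
    """Return the k (index, value) pairs with the largest values."""
    pairs = sorted(zip(inds, vals), key=lambda p: p[1], reverse=True)
    return pairs[:k]


# Hypothetical values for one paragraph's c_mean_avg_inds / c_mean_avg_vals.
inds = [2, 7, 11]
vals = [4.0, 1.0, 2.5]

print(sparse_to_dict(inds, vals))
print(sparse_to_dense(inds, vals, n_topics=12))
print(top_topics(inds, vals, k=2))
```

Keeping the data sparse until a dense vector is actually needed avoids materializing mostly-zero arrays for every paragraph.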