Schema Cheatsheet

Integration Columns

Our integrations provide additional structured metadata that you can use during analysis. See our Report Filters Cheatsheet for examples of using these fields to analyze earnings transcripts, news, and product reviews.

| column_name | column_type | description | integrations |
| --- | --- | --- | --- |
| published | VARCHAR | Date string. | all |
| title | VARCHAR | The title associated with the document. Used for UI polishing on our automatic dashboards. | all |
| ticker | VARCHAR | Stock ticker, e.g. GOOG, META. | earnings_transcripts |
| quarter | VARCHAR | Fiscal quarter, self-reported by each company. | earnings_transcripts |
| pub_quarter | VARCHAR | Quarter normalized by publication date. Recommended for cross-company analysis. | earnings_transcripts |
| product_name | VARCHAR | Title of the product. | Product Reviews (Home Depot, Apple App Store, Walmart) |
| rating | INT | Rating from 1 to 5. | Product Reviews (Home Depot, Apple App Store, Walmart) |
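
Because `quarter` is self-reported and fiscal calendars can differ between companies, `pub_quarter` is the safer key for cross-company comparisons. The sketch below illustrates the general idea of normalizing by publication date. It assumes that `published` is an ISO-style `YYYY-MM-DD` string and that the label looks like `2023Q2`; both the input format and the label format are assumptions for illustration, not the documented encoding, and `calendar_quarter` is a hypothetical helper.

```python
from datetime import date


def calendar_quarter(published: str) -> str:
    """Map a date string to a calendar-quarter label.

    Assumption: `published` starts with an ISO date (YYYY-MM-DD).
    Assumption: the 'YYYYQn' label format is illustrative only.
    """
    d = date.fromisoformat(published[:10])
    quarter = (d.month - 1) // 3 + 1  # months 1-3 -> 1, 4-6 -> 2, ...
    return f"{d.year}Q{quarter}"


if __name__ == "__main__":
    print(calendar_quarter("2023-05-04"))
```

Two transcripts published in the same calendar quarter get the same label under this scheme, regardless of how each company numbers its own fiscal quarters.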

Generic Columns

These basic columns are created in the paragraph table for every index.

| column_name | column_type | description |
| --- | --- | --- |
| doc_id | VARCHAR | Unique identifier for each document. Either provided by the user or a sha of the content. |
| paragraph_id | BIGINT | The paragraph that an excerpt of text belongs to. In order from 0 to N. |
| row_id | VARCHAR | Unique identifier for each doc-paragraph combination. |
| text | VARCHAR | The raw text of the paragraph. |
| nu_n_tokens | BIGINT | The number of tokens in the document. |
| nu_payload_size | BIGINT | The size in bytes of each document and its metadata. |
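
To make the relationship between these columns concrete, here is a rough sketch of how one document might expand into paragraph rows. Everything beyond the column names is an assumption: the hash variant (SHA-256), the blank-line paragraph split, the whitespace token count, the payload-size calculation, and the `row_id` format are all stand-ins, and `build_rows` is a hypothetical helper rather than part of the product.

```python
import hashlib
import json


def build_rows(text, doc_id=None, metadata=None):
    """Split one document into paragraph rows carrying the generic columns."""
    metadata = metadata or {}
    # Assumption: when no doc_id is supplied, hash the content with SHA-256.
    if doc_id is None:
        doc_id = hashlib.sha256(text.encode("utf-8")).hexdigest()
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Assumption: whitespace-separated words stand in for the real tokenizer.
    n_tokens = len(text.split())
    # Assumption: payload size = UTF-8 bytes of text plus JSON-encoded metadata.
    payload_size = len(text.encode("utf-8")) + len(json.dumps(metadata).encode("utf-8"))
    return [
        {
            "doc_id": doc_id,
            "paragraph_id": i,           # 0-based position within the document
            "row_id": f"{doc_id}_{i}",   # hypothetical format
            "text": p,
            "nu_n_tokens": n_tokens,     # document-level, repeated on each row
            "nu_payload_size": payload_size,
        }
        for i, p in enumerate(paragraphs)
    ]


if __name__ == "__main__":
    for row in build_rows("First paragraph.\n\nSecond paragraph."):
        print(row["row_id"], row["paragraph_id"], row["text"])
```

Note that `nu_n_tokens` and `nu_payload_size` describe the whole document, so in this sketch they repeat on every paragraph row of the same `doc_id`.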

Sparse Topic Columns

These advanced columns are created in the paragraph table for every index. They contain sparsely stored topical semantic information for every document, at both the document and paragraph level.

See our quickstart for an in-depth tutorial on topic information, and our sparse SQL reference page for documentation on how to use these semantic fields directly in SQL.

| column_name | column_type | description |
| --- | --- | --- |
| c_mean_avg_vals | FLOAT[] | Paragraph-level topic values. Denotes the number of words that belong to each topic. |
| c_mean_avg_inds | USMALLINT[] | Indices of the paragraph-level topic values. |
| sum_topic_counts_vals | FLOAT[] | Document-level topic values. |
| sum_topic_counts_inds | USMALLINT[] | Indices of the document-level topic counts. |
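
If you pull these columns into client code, the `_inds` and `_vals` arrays can be read together as a sparse vector. The sketch below assumes the two arrays are parallel, so that `inds[i]` is the topic index for `vals[i]` and any topic not listed has a value of zero. The helper names and the `n_topics` parameter are hypothetical; the actual number of topics depends on your index.

```python
def to_dense(inds, vals, n_topics):
    """Expand parallel (inds, vals) arrays into a dense list of length n_topics."""
    dense = [0.0] * n_topics
    for i, v in zip(inds, vals):
        dense[i] = v
    return dense


def top_topics(inds, vals, k=3):
    """Return the k (topic_index, value) pairs with the largest values."""
    return sorted(zip(inds, vals), key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Made-up example values for a single paragraph row.
    c_mean_avg_inds = [1, 3, 4]
    c_mean_avg_vals = [2.0, 0.5, 3.0]
    print(to_dense(c_mean_avg_inds, c_mean_avg_vals, n_topics=6))
    print(top_topics(c_mean_avg_inds, c_mean_avg_vals, k=2))
```

Keeping the data sparse and working with `top_topics`-style lookups is usually cheaper than densifying every row; `to_dense` is mainly useful when a downstream library expects fixed-length vectors.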