Schema Cheatsheet

Integration Columns

Our integrations provide additional structured metadata that you can use during analysis. See our Report Filters Cheatsheet for examples of using these fields when analyzing earnings transcripts, news, and product reviews.

column_name	column_type	description	integrations
published	VARCHAR	Publication date of the document, stored as a string.	all
title	VARCHAR	The title associated with the document. Used for UI polishing on our automatic dashboards.	all
ticker	VARCHAR	Stock ticker, e.g. GOOG, META.	earnings_transcripts
quarter	VARCHAR	Fiscal quarter, self-reported by each company.	earnings_transcripts
pub_quarter	VARCHAR	Quarter normalized by publication date. Recommended for cross-company analysis.	earnings_transcripts
product_name	VARCHAR	Title of the product.	Product Reviews (Home Depot, Apple App Store, Walmart)
rating	INT	Rating from 1 to 5.	Product Reviews (Home Depot, Apple App Store, Walmart)
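
As a rough illustration of why pub_quarter is easier to compare across companies than the self-reported quarter, the Python sketch below maps a publication date to a calendar quarter. The ISO `YYYY-MM-DD` input format, the `2023Q1` style label, and the function name `calendar_quarter` are all assumptions made for this example; the values actually stored in published and pub_quarter may be formatted differently.

```python
from datetime import date


def calendar_quarter(published: str) -> str:
    """Map a date string to a calendar-quarter label.

    Assumes the string starts with an ISO date (YYYY-MM-DD); the label
    format "<year>Q<n>" is hypothetical.
    """
    d = date.fromisoformat(published[:10])  # keep only the YYYY-MM-DD part
    quarter = (d.month - 1) // 3 + 1        # months 1-3 -> 1, 4-6 -> 2, ...
    return f"{d.year}Q{quarter}"
```

Two companies with different fiscal calendars that publish on the same day would receive the same label here, which is the property that makes a publication-based quarter suitable for cross-company comparisons.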

Generic Columns

These standard columns are created in the paragraph table for every index.

column_name	column_type	description
doc_id	VARCHAR	Unique identifier for each document. Either provided by the user or a SHA hash of the content.
paragraph_id	BIGINT	Index of the paragraph that an excerpt of text belongs to, numbered in order from 0 to N.
row_id	VARCHAR	Unique identifier for each doc-paragraph combination.
text	VARCHAR	The raw text of the paragraph.
nu_n_tokens	BIGINT	The number of tokens in the document.
nu_payload_size	BIGINT	The size in bytes of each document and its metadata.
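
To make the relationship between doc_id, paragraph_id, and row_id concrete, here is a minimal Python sketch of how such identifiers could be built. It is not the actual implementation: the choice of SHA-256, the `"{doc_id}:{paragraph_id}"` format for row_id, and the helper names `make_doc_id` and `paragraph_rows` are assumptions for illustration only.

```python
import hashlib
from typing import Optional


def make_doc_id(content: str, user_id: Optional[str] = None) -> str:
    """Use the caller's ID when given, else fall back to a content hash.

    SHA-256 is an assumption; the source only says "a sha of the content".
    """
    if user_id is not None:
        return user_id
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def paragraph_rows(doc_id: str, paragraphs: list) -> list:
    """Build one row per paragraph, with paragraph_id counting up from 0."""
    rows = []
    for paragraph_id, text in enumerate(paragraphs):
        rows.append({
            "doc_id": doc_id,
            "paragraph_id": paragraph_id,
            # Hypothetical row_id format: unique per doc-paragraph pair.
            "row_id": f"{doc_id}:{paragraph_id}",
            "text": text,
        })
    return rows
```

Because the fallback ID is derived from the content, the same text produces the same doc_id in this sketch, while a user-supplied ID always takes precedence.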

Sparse Topic Columns

These advanced columns are created in the paragraph table for every index. They contain sparsely stored topical semantic information for every document, at both the document and paragraph level.

See our quickstart for an in-depth tutorial on our topic information, and our sparse SQL reference page for documentation on how to use these semantic fields directly in SQL.

column_name	column_type	description
c_mean_avg_vals	FLOAT[]	Paragraph-level topic values. Denotes the number of words that belong to each topic.
c_mean_avg_inds	USMALLINT[]	Indices of the paragraph-level topic values.
sum_topic_counts_vals	FLOAT[]	Document-level topic values.
sum_topic_counts_inds	USMALLINT[]	Indices of the document-level topic values.
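
The paired `_inds` and `_vals` columns can be read as a sparse vector: only topics with a value are stored. The Python sketch below shows one way to work with such a pair outside of SQL. It assumes the two arrays are parallel, so that `vals[i]` is the value for topic `inds[i]`, and that the total number of topics (`n_topics`) is known to the caller; the helper names `to_dense` and `top_topics` are hypothetical.

```python
def to_dense(inds, vals, n_topics):
    """Expand a sparse (inds, vals) pair into a dense list of length n_topics.

    Assumes inds and vals are parallel arrays; topics not listed get 0.0.
    """
    dense = [0.0] * n_topics
    for topic_index, value in zip(inds, vals):
        dense[topic_index] = value
    return dense


def top_topics(inds, vals, k=3):
    """Return the k (topic_index, value) pairs with the largest values."""
    pairs = sorted(zip(inds, vals), key=lambda pair: pair[1], reverse=True)
    return pairs[:k]
```

For example, `to_dense([1, 4], [2.0, 0.5], 6)` yields `[0.0, 2.0, 0.0, 0.0, 0.5, 0.0]`. Keeping the data sparse and working with the pairs directly, as `top_topics` does, avoids materializing long vectors that are mostly zeros.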