Download OpenAPI specification
Documentation for the suite of Sturdy Statistics API solutions.
Creates a new index.
An index is the core data structure for storing data. Once the index is trained (see documentation for the train
endpoint), an index may also be used to search, query, and analyze data.
If an index with the provided name already exists, no index will be created and the metadata of that index will be returned.
name | string |
{- "name": "string"
}
{- "index_id": "index_a3cd8f52a42b4ee3841dacfe9408d4cd",
- "name": "Index Name",
- "state": "untrained",
- "already_exists": false
}
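As an illustration only, here is a minimal Python sketch of creating an index with the requests library. The base URL and the /index path are placeholders (this page does not specify them), and api_key is assumed to be accepted in the JSON body alongside name; adjust both to match your deployment.

import requests

BASE_URL = "https://api.example.com"  # placeholder; substitute the real Sturdy Statistics API base URL
API_KEY = "your-api-key"

# Create (or fetch) an index by name. If an index with this name already
# exists, the API returns that index's metadata instead of creating a new one.
resp = requests.post(
    f"{BASE_URL}/index",  # hypothetical path
    json={"name": "Index Name", "api_key": API_KEY},
)
resp.raise_for_status()
index = resp.json()
print(index["index_id"], index["state"], index["already_exists"])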
Returns a list of all indices tied to your API key.
api_key required | string API Key. |
[
  {
    "index_id": "index_a3cd8f52a42b4ee3841dacfe9408d4cd",
    "name": "Index_Name_1",
    "state": "untrained",
    "already_exists": false
  }
]
Returns all metadata belonging to the specified index.
index_id required | string |
api_key required | string API Key. |
{- "index_id": "index_a3cd8f52a42b4ee3841dacfe9408d4cd",
- "name": "Index_Name_1",
- "state": "untrained"
}
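A minimal sketch of listing your indices and fetching a single index's metadata. The paths are again placeholders, and api_key is assumed to be passed as a query parameter; consult the OpenAPI specification above for the exact routes.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"

# List every index tied to this API key.
indices = requests.get(f"{BASE_URL}/index", params={"api_key": API_KEY}).json()  # hypothetical path
for idx in indices:
    print(idx["index_id"], idx["name"], idx["state"])

# Fetch the metadata of one specific index.
index_id = indices[0]["index_id"]
meta = requests.get(f"{BASE_URL}/index/{index_id}", params={"api_key": API_KEY}).json()  # hypothetical path
print(meta["state"])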
Uploads documents to a temporary staging index for processing and/or storage. Documents are processed by the AI model if the index has been trained (see the train
endpoint), and are stored if the parameter save=true
.
Documents are provided as a list of dictionaries. The content of each document must be plain text and is provided under the required field doc
. You may provide a unique document identifier under the optional field doc_id
. If no doc_id
is provided, we will create an identifier by hashing the contents of the document. Documents can be updated via an upsert mechanism that matches on doc_id
. If doc_id
is not provided and two docs have identical content, the most recently uploaded document will upsert the previously uploaded document.
Any additional fields in the document dictionary are stored within the index as metadata and will become available for training and querying tasks (see documentation for the train
and query
endpoints). While arbitrary metadata may be included and later queried, "binary" and "tag" data are required for supervised training of an index (no metadata is required for unsupervised training). See the documentation for label_field_names
and tag_field_names
in the train
endpoint for more information about the required formats.
If the index has been trained, the response contains predictions
produced by the trained AI model for each document. In order to obtain predictions without locking or mutating the index, set save=false
.
If you wish to update an existing doc's metadata, you can perform a shallow update by leaving the doc content field either empty or unchanged. The shallow update skips the model inference step and only updates/appends the metadata fields passed into the API, which provides a significant speedup to the API call.
Uploading docs is a locking operation: a client cannot call upload, train, or commit while an upload is already in progress. Consequently, the operation is more efficient with batches of documents. The API supports a batch size of up to 1000 documents at a time; the larger the batch size, the more efficient the upload.
Uploaded documents are saved in a staging
index. The index is unaffected until a commit
request is sent.
index_id required | string |
save | boolean If |
Array of objects |
{- "save": true,
- "docs": [
- {
- "doc": "string",
- "doc_id": "string",
- "boolean_metadata_field_1": true,
- "any_name_can_be_used": true,
- "tag_metadata": [
- "string"
], - "tag_metadata_any_name": [
- "string"
], - "metadata_field_26": 0,
- "any_other_metadata_name": [
- 0
]
}
]
}
[
  {
    "doc_id": "doc1",
    "predictions": {
      "prediction_class_1": 0.84,
      "prediction_class_2": 0.04
    }
  },
  {
    "doc_id": "doc2",
    "predictions": {
      "prediction_class_1": 0.84,
      "prediction_class_2": 0.04
    }
  }
]
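To make the upload flow concrete, here is a hedged Python sketch that uploads documents in batches of up to 1000 and then performs a shallow metadata update by leaving the doc field empty. The /index/{index_id}/doc path and the placement of api_key in the body are assumptions based on the examples above; the metadata field names are illustrative only.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

docs = [
    {"doc": "full plain-text content...", "doc_id": "doc1", "is_sale": True, "tags": ["q3", "emea"]},
    {"doc": "another document...", "doc_id": "doc2", "is_sale": False},
]

# Upload in batches; larger batches are more efficient, up to 1000 docs per call.
BATCH_SIZE = 1000
for start in range(0, len(docs), BATCH_SIZE):
    batch = docs[start:start + BATCH_SIZE]
    resp = requests.post(
        f"{BASE_URL}/index/{INDEX_ID}/doc",  # hypothetical path
        json={"api_key": API_KEY, "save": True, "docs": batch},
    )
    resp.raise_for_status()
    for result in resp.json():
        # predictions are present only once the index has been trained
        print(result["doc_id"], result.get("predictions"))

# Shallow update: an empty doc field skips model inference and only
# updates/appends metadata for the existing doc_id.
requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/doc",  # hypothetical path
    json={"api_key": API_KEY, "save": True, "docs": [{"doc": "", "doc_id": "doc1", "region": "emea"}]},
).raise_for_status()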
Queries the specified index. Queries are flexible, and may contain any combination of:
- Metadata filters via the filters parameter. Note that this option requires metadata to be stored in the index; see documentation for the upload endpoint.
- Topic filters, either by a topic group (topic_group_id) or by an arbitrary list of specific topics (topic_ids). Note that these options require the index to have been trained. You may filter by either topic_ids or by topic_group_id, but not both simultaneously.
- A semantic search query via the query parameter.
The API returns a dictionary containing three different entries. Each entry provides a distinct view into how the data in the index matches your query, with increasing degrees of abstraction:
- An ordered list of matching documents (see documentation for the sort_by parameter). Each result contains a short excerpt from the document that evinces your query; see documentation for the summarize_by parameter for information about this excerpt. Each result also contains all prediction values associated with the document, all metadata fields associated with the document, and an ordered list of the topics associated with the document.
- The topics most relevant to your query, each with a topic_id that can be used for additional downstream queries (see documentation for the topic_ids parameter).
- The topic groups most relevant to your query, each with a topic_group_id that can be used for additional downstream queries (see documentation for the topic_group_id parameter).
index_id required | string |
query | string A search query that can be used to filter or sort document objects. By default the search will support a fuzzy match. Any word wrapped in double quotes "word" will be treated as an exact match filter. |
topic_ids | string Supports filtering on a single topic or multiple topics. Topics are unsupervised, granular themes inferred from the client's data at training time that can be used to index the data. Expected input format is a comma-separated list of topic_ids, e.g. |
topic_group_id | integer Supports filtering on a single topic group. Topic groups are unsupervised, high-level (rather than granular) categories learned from the client's data at training time that can be used to index the data. Expected input format is a single integer. |
filters | string
|
sort_by | string Define a field by which to sort. By default, the docs will be sorted by |
ascending | boolean |
context | integer The number of paragraphs above and below the selected excerpt to return. |
limit | integer |
offset | integer |
api_key required | string API Key. |
[
  {
    "docs": [
      {
        "doc_id": "doc_id_1",
        "text": "summarize text_1",
        "metadata": {
          "metadata_field_1": "example meta info",
          "another_meta_field": "more example meta info such as date, title, etc"
        },
        "predictions": {
          "prediction_field_1": 0.83,
          "another_prediction_field": 0.03
        },
        "topics": [
          {
            "topic_id": 32,
            "short_title": "example topic descriptor",
            "prevalence": 0.84,
            "topic_group_id": 3,
            "topic_group_short_title": "Short Group Title"
          },
          {
            "topic_id": 68,
            "short_title": "example topic descriptor2",
            "prevalence": 0.05,
            "topic_group_id": 5,
            "topic_group_short_title": "Short Group Title"
          }
        ]
      }
    ]
  }
]
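A hedged sketch of a query call combining a fuzzy search, a topic filter, and pagination. The /index/{index_id}/query path and the use of query parameters are assumptions; the parameter names themselves (query, topic_ids, limit, offset) come from the table above, and the response shape follows the example.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

resp = requests.get(
    f"{BASE_URL}/index/{INDEX_ID}/query",  # hypothetical path
    params={
        "api_key": API_KEY,
        "query": '"pricing" complaints',   # quoted word = exact match, the rest is fuzzy
        "topic_ids": "32,68",              # restrict to specific trained topics
        "limit": 20,
        "offset": 0,
    },
)
resp.raise_for_status()
for doc in resp.json()[0]["docs"]:
    print(doc["doc_id"], doc["text"][:80], [t["short_title"] for t in doc["topics"]])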
Loads a specified set of docs from the index. Supports a comma-separated list of up to 500 doc_ids at a time.
Additionally, the API provides the ability to specify the summary style of the docs by providing either a comma-separated list of topic_ids, a topic_group_id, or a search query. The API will extract the most thematically relevant section of each doc to return.
The API returns a dictionary with a docs field containing a list of document objects in the order specified in the query.
index_id required | string |
doc_id required | string A comma separated list of |
query | string A search query that can be used to summarize document objects. |
topic_ids | string Supports matching on a single topic or multiple topics for granular summaries. Topics are unsupervised, granular themes inferred from the client's data at training time that can be used to index the data. Expected input format is a comma-separated list of topic_ids, e.g. |
topic_group_id | integer Supports matching on a single topic group for high-level summaries. Topic groups are unsupervised, high-level (rather than granular) categories learned from the client's data at training time that can be used to index the data. Expected input format is a single integer. |
context | integer The number of paragraphs above and below the selected excerpt to return. |
api_key required | string API Key. |
[
  {
    "docs": [
      {
        "doc_id": "doc_id_1",
        "text": "summarize text_1",
        "metadata": {
          "metadata_field_1": "example meta info",
          "another_meta_field": "more example meta info such as date, title, etc"
        },
        "predictions": {
          "prediction_field_1": 0.83,
          "another_prediction_field": 0.03
        }
      }
    ]
  }
]
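A minimal sketch of loading specific documents by doc_id and steering the returned excerpt with a search query. The /index/{index_id}/doc path is a placeholder; the parameter names come from the table above.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

resp = requests.get(
    f"{BASE_URL}/index/{INDEX_ID}/doc",  # hypothetical path
    params={
        "api_key": API_KEY,
        "doc_id": "doc_id_1,doc_id_2",   # up to 500 comma-separated ids
        "query": "contract renewal",     # summarize with the most relevant passage
        "context": 1,                    # paragraphs of context around the excerpt
    },
)
resp.raise_for_status()
for doc in resp.json()[0]["docs"]:
    print(doc["doc_id"], doc["text"][:120])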
This API allows you to run arbitrary SQL queries directly against your index's metadata. The available fields include doc_id as well as any metadata you uploaded with your documents.
index_id required | string |
query required | string
|
api_key required | string API Key. |
[
  {
    "doc_id": "uuid4",
    "pred_sale": 0.4,
    "date": "2024-01-01"
  }
]
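For illustration, a sketch of running a SQL query over the index metadata. The /index/{index_id}/doc/sql path and the table name in the FROM clause are hypothetical; the selected fields (doc_id, pred_sale, date) mirror the example response above, and DuckDB-style SQL is assumed based on the topic-diff parameter descriptions further down.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

sql = """
SELECT doc_id, pred_sale, date
FROM doc_meta              -- hypothetical table name; see the spec for the real one
WHERE pred_sale > 0.5 AND date >= '2024-01-01'
ORDER BY pred_sale DESC
LIMIT 100
"""

resp = requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/doc/sql",  # hypothetical path
    json={"api_key": API_KEY, "query": sql},
)
resp.raise_for_status()
for row in resp.json():
    print(row["doc_id"], row["pred_sale"], row["date"])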
Permanently applies all changes made to the staging index to the production index. Note that upload
saves new documents only to the staging index; a commit
is necessary to use those documents for querying or for training. This is a locking operation: no data can be uploaded, trained, or committed while a commit is in progress.
index_id required | string |
api_key | string |
{- "api_key": "string"
}
{- "job_id": "a3cd8f52a42b4ee3841dacfe9408d4cd"
}
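A sketch of committing staged changes. The /index/{index_id}/commit path is a placeholder, and api_key is sent in the body as in the request example above; the returned job_id can be polled with the job status endpoint described further down.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

# Commit is asynchronous and locking: it returns a job_id immediately,
# and no upload, train, or commit can run until the job finishes.
resp = requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/commit",  # hypothetical path
    json={"api_key": API_KEY},
)
resp.raise_for_status()
print("commit started:", resp.json()["job_id"])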
Reverts all changes made to the staging index, resetting the staging index to match the state of the production index.
index_id required | string |
api_key | string |
{- "api_key": "string"
}
{- "job_id": "a3cd8f52a42b4ee3841dacfe9408d4cd"
}
Creates a deep copy of the selected index.
index_id required | string |
new_name | string |
{- "new_name": "string"
}
{- "job_id": "a3cd8f52a42b4ee3841dacfe9408d4cd"
}
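Unstage and clone follow the same pattern as commit: a POST that returns a job_id. The /unstage and /clone paths below are placeholders, and api_key is assumed to be accepted in the body alongside the documented fields.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

# Discard everything in the staging index and reset it to the production state.
unstage = requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/unstage",  # hypothetical path
    json={"api_key": API_KEY},
).json()

# Deep-copy the index under a new name.
clone = requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/clone",  # hypothetical path
    json={"api_key": API_KEY, "new_name": "Index Name (copy)"},
).json()

print(unstage["job_id"], clone["job_id"])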
Trains an AI model on all documents in the production index. Once an index has been trained, documents are queryable (see documentation for the query
endpoint), and the model automatically processes subsequently uploaded documents (see documentation for the upload
endpoint).
The AI model identifies thematic information in documents, permitting semantic indexing and semantic search. It also enables quantitative analysis of, e.g., topic trends.
The AI model may optionally be supervised using metadata present in the index. Thematic decomposition of the data is not unique; supervision guides the model and aligns the identified topics to your intended application. Supervision also allows the model to make predictions.
Data for supervision may be supplied explicitly using the label_field_names
parameter. Metadata field names listed in this parameter must each store data in a ternary true/false/unknown format. For convenience, supervision data may also be supplied in a sparse "tag" format using the tag_field_names
parameter. Metadata field names listed in this parameter must contain a list of labels for each document. The document is considered "true" for each label listed; it is implicitly considered "false" for each label not listed. Consequently, the "tag" format does not allow for unknown labels. Any combination of label_field_names
and tag_field_names
may be supplied.
index_id required | string |
label_field_names | Array of strings A list of fields that denote binary labels. The model will use these fields as training data and predict their values for all future docs to be uploaded. Valid values for field1 in each doc are Example |
tag_field_names | Array of strings A list of fields that contain tags. E.g. if a doc has an attribute |
doc_hierarchy | Array of strings This is used for adding hierarchy to the indexing model by leveraging attributes present in the uploaded data. This is a more advanced feature for those familiar with Bayesian analysis. |
K | string This parameter sets the maximum number of topics to learn from the data. We support a range of 32-512, with a default of 192. The runtime of training is linear with the number of topics. |
{- "label_field_names": [
- "string"
], - "tag_field_names": [
- "string"
], - "doc_hierarchy": [
- "string"
], - "K": "string"
}
{- "job_id": "a3cd8f52a42b4ee3841dacfe9408d4cd"
}
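A hedged sketch of kicking off supervised training, using label_field_names for ternary true/false/unknown fields and tag_field_names for sparse tag lists. The /index/{index_id}/train path and body-level api_key are assumptions; the metadata field names in the payload are illustrative only.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

resp = requests.post(
    f"{BASE_URL}/index/{INDEX_ID}/train",  # hypothetical path
    json={
        "api_key": API_KEY,
        "label_field_names": ["is_sale"],   # ternary true/false/unknown metadata fields
        "tag_field_names": ["tags"],        # sparse tag lists; unlisted tags are treated as false
        "K": "192",                         # maximum number of topics (32-512, default 192)
    },
)
resp.raise_for_status()
print("training job:", resp.json()["job_id"])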
Returns information on the most conditionally prevalent topics.
index_id required | string |
q1 required | string
The SQL clause supports any operation available in DuckDB, and can operate on any metadata you have uploaded and on any prediction values tied to your data. Example q1 filter -- |
q2 | string
|
min_confidence | number Sets a minimum confidence threshold (between 0 and 1) that a topic changed when comparing q1 to q2. |
api_key required | string API Key. |
{- "topics": [
- {
- "topic_id": 12,
- "short_title": "Topic Title",
- "one_sentence_summary": "One sentence summary of the topic",
- "executive_paragraph_summary": "Longer summary of a topic",
- "prevalence_ratio": 8.3,
- "prevalence": 0.13,
- "confidence": 98,
- "topic_group_id": 2,
- "topic_group_short_title": "Group Title"
}, - {
- "topic_id": 12,
- "short_title": "Topic Title",
- "one_sentence_summary": "One sentence summary of the topic",
- "executive_paragraph_summary": "Longer summary of a topic",
- "prevalence_ratio": 8.3,
- "prevalence": 0.13,
- "confidence": 98,
- "topic_group_id": 2,
- "topic_group_short_title": "Group Title"
}
]
}
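To illustrate the q1/q2 comparison, a sketch that asks which topics are conditionally more prevalent among high-probability sales than in the rest of the data. The /index/{index_id}/topic/diff path and the HTTP method are hypothetical, and the filters reuse the illustrative pred_sale field from the SQL example above.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

resp = requests.get(
    f"{BASE_URL}/index/{INDEX_ID}/topic/diff",  # hypothetical path
    params={
        "api_key": API_KEY,
        "q1": "pred_sale > 0.8",     # DuckDB-style filter defining the first condition
        "q2": "pred_sale <= 0.8",    # optional second condition to compare against
        "min_confidence": 0.9,
    },
)
resp.raise_for_status()
for topic in resp.json()["topics"]:
    print(topic["topic_id"], topic["short_title"], topic["prevalence_ratio"], topic["confidence"])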
Returns information on the most conditionally prevalent topics for the specified documents.
index_id required | string |
query | string A search query that can be used to summarize document objects. |
filters | string
|
doc_id required | string A comma separated list of |
api_key required | string API Key. |
{- "topics": [
- {
- "topic_id": 12,
- "short_title": "Topic Title",
- "one_sentence_summary": "One sentence summary of the topic",
- "executive_paragraph_summary": "Longer summary of a topic",
- "prevalence": 0.13,
- "topic_group_id": 2,
- "topic_group_short_title": "Group Title"
}, - {
- "topic_id": 12,
- "short_title": "Topic Title",
- "one_sentence_summary": "One sentence summary of the topic",
- "executive_paragraph_summary": "Longer summary of a topic",
- "prevalence": 0.13,
- "topic_group_id": 2,
- "topic_group_short_title": "Group Title"
}
]
}
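A minimal sketch of pulling topic information for specific documents, optionally focused by a search query. The /index/{index_id}/doc/topic path is hypothetical; the parameter names (doc_id, query, filters) come from the table above.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"
INDEX_ID = "index_a3cd8f52a42b4ee3841dacfe9408d4cd"

resp = requests.get(
    f"{BASE_URL}/index/{INDEX_ID}/doc/topic",  # hypothetical path
    params={
        "api_key": API_KEY,
        "doc_id": "doc_id_1,doc_id_2",
        "query": "pricing objections",   # optional: bias topic selection toward this query
    },
)
resp.raise_for_status()
for topic in resp.json()["topics"]:
    print(topic["topic_id"], topic["short_title"], topic["prevalence"])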
Returns the status of the specified job.
job_id required | string |
api_key required | string API Key. |
{- "finishedAt": "2024-11-05T01:41:47.637000+00:00",
- "index_id": "index_05a7cb07da764f1aa399ce65ab06",
- "job_id": "7e5154f4-cfcd-4fc7-94c3-5a168f83",
- "job_name": "commitIndexV2",
- "result": {
- "result": "success"
}, - "startedAt": "2024-11-05T01:41:26.835000+00:00",
- "status": "SUCCEEDED"
}
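Since commit, train, and clone all return a job_id, a small polling helper is often convenient. A sketch, assuming a /job/{job_id} path (a placeholder) and the terminal status values listed just below under the job-listing endpoint:

import time
import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"

def wait_for_job(job_id, poll_seconds=10):
    """Poll a job until it reaches a terminal state and return its final record."""
    while True:
        resp = requests.get(
            f"{BASE_URL}/job/{job_id}",  # hypothetical path
            params={"api_key": API_KEY},
        )
        resp.raise_for_status()
        job = resp.json()
        if job["status"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return job
        time.sleep(poll_seconds)

job = wait_for_job("7e5154f4-cfcd-4fc7-94c3-5a168f83")
print(job["job_name"], job["status"], job["result"])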
Lists all jobs matching the filter criteria.
api_key required | string API Key. |
index_id | string Filter on a specified index_id. |
status | string Filter on the status of the job. One of 'PENDING', 'RUNNING', 'CANCELLED', 'SUCCEEDED', 'FAILED'. |
job_name | string Filter on the name of the job. |
[
  {
    "finishedAt": "2024-11-05T01:41:47.637000+00:00",
    "index_id": "index_05a7cb07da764f1aa399ce65ab06",
    "job_id": "7e5154f4-cfcd-4fc7-94c3-5a168f83",
    "job_name": "commitIndexV2",
    "result": {
      "result": "success"
    },
    "startedAt": "2024-11-05T01:41:26.835000+00:00",
    "status": "SUCCEEDED"
  }
]
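Finally, a sketch of listing jobs with filters. The /job path is a placeholder; the filter names match the table above.

import requests

BASE_URL = "https://api.example.com"  # placeholder base URL
API_KEY = "your-api-key"

resp = requests.get(
    f"{BASE_URL}/job",  # hypothetical path
    params={
        "api_key": API_KEY,
        "index_id": "index_05a7cb07da764f1aa399ce65ab06",
        "status": "RUNNING",
        "job_name": "commitIndexV2",
    },
)
resp.raise_for_status()
for job in resp.json():
    print(job["job_id"], job["job_name"], job["status"], job["startedAt"])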