TOPIC_ID = 53
df = index.queryMeta(f"""
SELECT
    pub_quarter,
    sum(
        (sparse_list_extract({TOPIC_ID+1}, c_mean_avg_inds, c_mean_avg_vals) > 2.00)::INT
    ) as mentions
FROM paragraph
GROUP BY pub_quarter
ORDER BY pub_quarter
""")
fig = px.bar(
    df, x="pub_quarter", y="mentions",
    title=f"Mentions of 'AI Model Scaling'",
    # line_shape="hvh",
)
procFig(fig, title_x=.5).show()
Structured Semantic Analysis
Our model granularly annotates every word, sentence, paragraph and document with topic information. The structured nature of our semantic data allows us to store this data in a structured tabular format alongside any relevant metadata. This means we can perform complex semantic analyses directly in SQL.
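To make this concrete, here is a toy sketch of the idea, using Python's built-in sqlite3 as a stand-in for the real engine. The table and column names below are illustrative only, not Sturdy Statistics' actual schema: per-paragraph topic weights live in the same table as metadata, so semantic questions become ordinary SQL.

```python
import sqlite3

# Toy stand-in: per-paragraph topic weights stored next to metadata,
# so a semantic question becomes an ordinary SQL aggregation.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE paragraph (
        ticker TEXT,
        pub_quarter TEXT,
        topic_weight REAL   -- e.g. words assigned to one topic
    )
""")
con.executemany(
    "INSERT INTO paragraph VALUES (?, ?, ?)",
    [("META", "2025Q1", 3.2),
     ("META", "2025Q1", 0.4),
     ("MSFT", "2025Q1", 2.7)],
)

# "How many paragraphs per company mention the topic strongly?"
rows = con.execute("""
    SELECT ticker, SUM(topic_weight > 2.0) AS mentions
    FROM paragraph
    GROUP BY ticker
    ORDER BY ticker
""").fetchall()
print(rows)  # [('META', 1), ('MSFT', 1)]
```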
AI Model Scaling over Time
The SQL statement below is a standard GROUP BY. The only new content is the line (sparse_list_extract({TOPIC_ID+1}, c_mean_avg_inds, c_mean_avg_vals) > 2.00)::INT. Sturdy Statistics stores thematic content arrays in a sparse format: a list of indices and a list of values. This format provides significant storage and performance optimizations, and we use a defined set of sparse functions to work with this data.
Here we give it the fields c_mean_avg_inds and c_mean_avg_vals. The original c_mean_avg array counts the number of words in each paragraph that have been assigned to each topic. The mean_avg suffix denotes that this value has been averaged over several hundred MCMC samples. This sampling has numerous benefits and is also why our counts are not integers (a very common question we receive).
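As a rough mental model, the lookup can be sketched in a few lines of Python. This is illustrative only, not Sturdy Statistics' implementation: sparse_list_extract finds one position in an (indices, values) pair, returning 0 when that index carries no value.

```python
def sparse_list_extract(idx, inds, vals):
    """Illustrative stand-in: return the value stored at sparse
    position `idx`, or 0.0 if that index carries no value."""
    for i, v in zip(inds, vals):
        if i == idx:
            return v
    return 0.0

# A paragraph's sparse topic counts: most topics are absent, so only
# the occupied indices and their values need to be stored.
inds = [2, 54, 101]
vals = [0.8, 3.2, 1.5]

print(sparse_list_extract(54, inds, vals))  # 3.2
print(sparse_list_extract(7, inds, vals))   # 0.0

# The stored values are MCMC averages of integer word counts, which
# is why they need not be whole numbers: e.g. per-sample counts of
# 3, 4, 3, 2, 4 words average to 3.2.
assert sum([3, 4, 3, 2, 4]) / 5 == 3.2
```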
Broken Down by Company
Because this semantic data is stored directly in a SQL table, we can enrich our semantic analysis with metadata. Below, we break down how much each company is discussing the topic AI Model Scaling and when they are talking about it.
TOPIC_ID = 53
df = index.queryMeta(f"""
SELECT
pub_quarter,
ticker,
sum(
(sparse_list_extract({TOPIC_ID+1}, c_mean_avg_inds, c_mean_avg_vals) > 2.00)::INT
) as mentions
FROM paragraph
GROUP BY pub_quarter, ticker
ORDER BY pub_quarter
""")
fig = px.bar(
df, x="pub_quarter", y="mentions", color="ticker",
title=f"Mentions of 'AI Model Scaling'",
)
procFig(fig, title_x=.5).show()
With a Search Filter
In addition to storing the thematic content directly in the SQL tables, we integrate our semantic search engine within SQL. Below we pass the semantic search query infrastructure as a filter for our analysis.
TOPIC_ID = 53
df = index.queryMeta(f"""
SELECT
pub_quarter,
ticker,
sum(
(sparse_list_extract({TOPIC_ID+1}, c_mean_avg_inds, c_mean_avg_vals) > 2.00)::INT
) as mentions
FROM paragraph
GROUP BY pub_quarter, ticker
ORDER BY pub_quarter
""",
search_query="infrastructure")
fig = px.bar(
df, x="pub_quarter", y="mentions", color="ticker",
title=f"Mentions of 'AI Model Scaling'",
)
procFig(fig, title_x=.5).show()
Verification & Insights
As always, any high-level insight can be tied back to the underlying data that comprises it. Below, we pull up all examples of AI Model Scaling that focus on the search term Infrastructure during Meta's 2025Q1 earnings call. We assert that 4 examples are returned (the value our bar chart reports) and display the first- and last-ranked examples.
Note that the last example does not explicitly mention Infrastructure but instead matches on terms such as CapEx and data centers.
df = index.query(topic_id=TOPIC_ID, search_query='infrastructure',
filters="ticker='META' AND pub_quarter='2025Q1'")
displayText(df.iloc[[0, -1]], ["capex", "data", "center", "train", "infrastructure"])
assert len(df) == 4
Result 1/4
META 2025Q1
- Business Growth Strategies: 44%
- Capital Expenditure Trends: 21%
- AI Model Scaling: 13%
- Open Source AI Infrastructure: 10%
- Growth Initiatives: 5%
Douglas Anmuth: Thanks for taking the questions. One for Mark, one for Susan. Mark, just following up on open source as DeepSeek and other models potentially leverage Llama or others to train faster and cheaper. How does this impact in your view? And what could have been for the trajectory of investment required over a multiyear period? And then, Susan, just as we think about the 60 billion to 65 billion CapEx this year, does the composition change much from last year when you talked about servers as the largest part followed by data centers and networking equipment. And how should we think about that mix between like training and inference just following up on Jan’s post this week? Thanks.
…
Result 4/4
META 2025Q1
- Zuckerberg on Business Strategies: 34%
- AI Model Scaling: 31%
- Capital Expenditure Trends: 6%
- AI Capital Investment Strategy: 6%
- AWS Capital Investments: 5%
But overall, I would reiterate what Mark said. We are committed to building leading foundation models and applications. We expect that we’re going to make big investments to support our training and inference objectives, and we don’t know exactly where we are in the cycle of that yet.
Zoom Out
At any point, we can also zoom back out to the high-level topic view. Instead of focusing on the AI Model Scaling topic, we can zoom out and see everything Meta discussed about infrastructure during its 2025Q1 earnings call.