The Topic Search API (along with our other semantic APIs) produces high-level insights. To both dive deeper into and verify these insights, we provide a mechanism to retrieve the underlying data: our Query API. The Query API shares a unified filtering engine with our higher-level semantic APIs, so any semantic rollup or insight aggregation can be instantly “unrolled”.

Let’s take the topic AI Model Scaling. We can pull up the topic metadata below and see that it was mentioned 81 times in the corpus.

```python
# Look up the aggregate metadata for a single topic by its ID.
TOPIC_ID = 53
row = topic_df.loc[topic_df.topic_id == TOPIC_ID]
row[["topic_id", "short_title", "mentions"]]
```

| | topic_id | short_title | mentions |
|---|---|---|---|
| 81 | 53 | AI Model Scaling | 81.0 |
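To make the “unroll” guarantee concrete before walking through a single topic in detail, here is a minimal sketch (reusing the `topic_df` and `index` objects from earlier, and assuming each sampled topic has at most 200 mentions, the single-request limit discussed below): the Query API should return exactly as many excerpts as each aggregate `mentions` count.

```python
# Minimal sketch: for a few topics, the number of excerpts returned by the
# Query API should match the `mentions` count reported by Topic Search.
# Assumes each sampled topic has at most 200 mentions (the single-request limit).
for _, topic in topic_df.head(3).iterrows():
    hits = index.query(topic_id=topic.topic_id, max_excerpts_per_doc=200, limit=200)
    print(f"{topic.short_title}: {len(hits)} excerpts vs. {int(topic.mentions)} mentions")
```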
We can call the Query API, passing in our `topic_id`. We can see that 81 mentions are returned, lining up exactly with our aggregate APIs.
```python
df = index.query(topic_id=TOPIC_ID, max_excerpts_per_doc=200, limit=200)  # 200 is the single-request limit
df.iloc[[0, -1]]
```
| | doc_id | text | ticker | quarter | pub_quarter | year | published | title | author | paragraph_id | metadata | predictions | doc_topics | paragraph_topics | search_score | topic_search_score | exact_match_search_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 8f55c1161342d07cc3a9a2be55d59dad | Jensen Huang: No. No. I'm gonna just wanna tha... | NVDA | 2025Q4 | 2025Q1 | 2025 | 2025-02-26 | NVDA 2025Q4 | NVDA | 55 | {'ticker': 'NVDA', 'quarter': '2025Q4', 'pub_q... | {} | [{'short_title': 'Accelerated Computing System... | [{'short_title': 'AI Model Scaling', 'topic_gr... | 10.079368 | 0.0 | 2.0 |
| 80 | 12b7f45254d6b5326307df2bad8def95 | Ross Sandler: Great. Just two quick ones, Mark... | META | 2024Q3 | 2024Q4 | 2024 | 2024-10-30 | META 2024Q3 | META | 36 | {'ticker': 'META', 'quarter': '2024Q3', 'pub_q... | {} | [{'short_title': 'Zuckerberg on Business Strat... | [{'short_title': 'Business Growth Strategies',... | 10.079368 | 0.0 | 2.0 |
Below we display the first and last results of our query in a readable format. Unlike neural embeddings, the sparse structure that our Index learns enables us to set hard semantic thresholds.
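As a quick illustration of a hard threshold (a sketch only: `search_score` is one of the score columns returned above, and the 10.0 cutoff is an arbitrary example value, not a calibrated threshold):

```python
# Sketch: apply a hard cutoff directly to a returned score column.
# The 10.0 value here is illustrative, not a recommended threshold.
strong = df[df.search_score >= 10.0]
len(strong)
```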
Because our model annotates every single word in a document, we can extract the specific terms that lead to the high topic score in each excerpt. We leverage these learned topic words below to highlight the matching terms in each excerpt.
You will notice that accompanying each excerpt is a set of tags. These are the same tags that are returned by our Topic Search API; each tag is shown with the percentage of the paragraph that it comprises.
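To make the tag structure concrete, here is a quick look at the tags on the first excerpt (a sketch using the `paragraph_topics` column from the query results above; these are the same fields the `display_text` helper below consumes):

```python
# Each excerpt carries a list of topic tags; `prevalence` is the fraction
# of the paragraph attributed to that topic.
for tag in df.iloc[0]["paragraph_topics"][:5]:
    print(f"{tag['short_title']}: {tag['prevalence']:.0%}")
```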
```python
def display_text(df, highlight):
    import re

    # Render Markdown when running in a notebook; fall back to plain print.
    try:
        from IPython.display import display, Markdown
        show = lambda x: display(Markdown(x))
    except ImportError:
        show = print

    def fmt(r):
        # Top five paragraph-level topic tags with their prevalence percentages.
        t = "\n".join(f"1. {d['short_title']}: {int(d['prevalence'] * 100)}%"
                      for d in r["paragraph_topics"][:5])
        # Strip characters that Markdown/MathJax would otherwise interpret.
        txt = re.sub(r"[*$]", "", r["text"])
        # Bold a word when it matches a highlight term: exact match for short
        # terms (< 4 chars), substring match for longer ones.
        h = lambda m: (
            f"**{m.group()}**"
            if any((len(w) < 4 and m.group().lower() == w.lower())
                   or (len(w) >= 4 and w.lower() in m.group().lower())
                   for w in highlight)
            else m.group()
        )
        body = re.sub(r"\b\w+\b", h, txt)
        return (f"<em>\n\n#### Result {r.name + 1}/{df.index.max() + 1}"
                f"\n\n##### {r['ticker']} {r['pub_quarter']}\n\n{t}\n\n{body}</em>")

    show("\n\n...\n\n".join(df.apply(fmt, axis=1)))
```
```python
# Sanity check: the query returned exactly as many excerpts as the
# aggregate `mentions` count from the Topic Search API.
assert len(df) == row.iloc[0].mentions

# Topic words the Index learned for this topic, used for highlighting below.
topicWords = index.topicWords()
words_to_highlight = topicWords.loc[topicWords.topic_id == TOPIC_ID].topic_words.explode().tolist()
display_text(df.iloc[[0, -1]], highlight=words_to_highlight)
```
Jensen Huang: No. No. I’m gonna just wanna thank you. Up to, Jensen? And, like, the medium, a couple things. I just wanna thank you. Thank you, Colette. Demand for Blackwell is extraordinary. AI is evolving beyond perception. And generative AI into reasoning. With reasoning AI, we’re observing another scaling law. Inference time or test time scaling. The more computation the more the model thinks the smarter the answer. Models like OpenAI’s Grok 3, DeepSeq R1, are reasoning models that apply inference time scale. Reasoning models can consume a hundred times more compute. Future reasoning models can consume much more compute. DeepSeq R1 has ignited global enthusiasm. It’s an excellent innovation. But even more importantly, it has open-sourced a world-class reasoning AI model. Nearly every AI developer is applying R1. Or chain of thought and reinforcement learning techniques like R1. To scale their model’s performance. We now have three scaling laws, as I mentioned earlier. Driving the demand for AI computing. The traditional scaling laws of AI remain intact. Foundation models are being enhanced with multimodality. And pretraining is still growing. But it’s no longer enough. We have two additional scaling dimensions. Post-training scaling, where reinforcement learning fine-tuning, model distillation, require orders of magnitude more compute than pretraining alone.
…
Ross Sandler: Great. Just two quick ones, Mark. You said something along the lines of the more standardized Llama becomes, the more improvements will flow back to the core Meta business. And I guess, could you dig in a little bit more on that? So the series of Llama models are being used by lots of developers building different things in AI. I guess, how are you using that vantage point to incubate new ideas inside Meta? And then second question is, you mentioned on one of the podcasts after the Meta Connect that assuming scaling laws hold up, we may need hundreds of billions of compute CapEx to kind of reach our goals around Gen AI. So I guess how quickly could you conceivably stand up that much infrastructure given some of the constraints around energy or custom ASICs or other factors? Is there any more color on the speed by which we could get that amount of compute online at Meta? Thank you.
| Section | Description |
|---|---|
| Part III | Semantic Analysis in SQL |
| Part IV | Statistically Tuned Search |
| Part V | Custom Index Creation |