What Will Be the Top AI Model This Month? Kalshi Odds

Tyler Jacobsma

Updated: Feb 12, 2026, 10:23 PM EST

7 min read

Photo Credit: Megan Mendoza/The Republic / USA TODAY NETWORK via Imagn Images

Kalshi's AI model prediction market has a structural edge hiding in plain sight.

Kalshi's exact-string-match settlement rules mean that Google's own success with Gemini 3.1 (or whatever Gemini 3's successor is called) would destroy the legacy Gemini 3 Pro contract.

The analysis follows, but if you want to trade on this — or any of the numerous other markets that Kalshi offers — be sure to use our Kalshi promo code to get started.

The Setup

Anthropic dropped Claude Opus 4.6 on February 5. Within 48 hours, the LMSYS Chatbot Arena leaderboard reshuffled completely.

Claude Opus 4.6 Thinking now sits at No. 1 globally with an Elo of 1506. The base Opus 4.6 holds at No. 2 at 1502. Google's Gemini 3 Pro, which was commanding 35-40% implied probability on Kalshi just days earlier, nosedived to 2%.

The same day Anthropic launched, OpenAI released GPT-5.3-Codex. Two frontier models dropped within 10 minutes of each other. The AI arms race is in full force.

Here is where the Kalshi market stands right now:

[Embedded Kalshi market widget: "What Will Be the Top AI Model This Month?"]

Why Gemini 3 Pro NO Has the Strongest Structural Edge

This is a contract mechanics trade, not a bet against Google.

Kalshi's settlement rules require an exact string match on the LMSYS leaderboard. The contract says: "The outcome is Gemini-3-Pro. If Gemini-3-Pro is the top-ranked AI model … then the market resolves to Yes."

Google is almost certainly about to release Gemini 3.1 Pro. A Gemini 3.1 Pro preview identifier has already been spotted pinging the Artificial Analysis Arena.

Logan Kilpatrick, Google DeepMind's Lead Product Manager, has been hinting at a February release on social media. Apple's iOS 26.4 delays are creating massive pressure on Google to ship a stable 3.1 architecture for their Siri integration.

Here is the problem for anyone holding Gemini 3 Pro "YES" shares: if Google succeeds, you lose.

A new Gemini 3.1 Pro gets a different string identifier on the leaderboard. It does not roll into the existing contract. If 3.1 takes the top spot, it actively pushes the legacy 3.0 model down in rank.

Compare this to OpenAI's contract, which explicitly states: "The GPT-5.2 exact name will update based on the highest scoring GPT-5.2 model."

OpenAI holders get a safety net. Google and Anthropic holders do not. This asymmetry is where we'll try to find an edge.
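The asymmetry comes down to how each contract matches a leaderboard entry. The sketch below contrasts an exact-string rule, like the Gemini-3-Pro contract's, with a family-style rule, loosely modeled on the GPT-5.2 language quoted above. It is a simplified illustration under stated assumptions, not Kalshi's settlement logic: the `Entry` type, the identifiers (`gemini-3.1-pro` and the rest), and the prefix match standing in for "highest scoring model in the family" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    model: str  # leaderboard identifier (hypothetical strings below)
    rank: int   # 1 = top of the leaderboard


def top_model(board: list[Entry]) -> str:
    """Return the identifier of the top-ranked entry (first one wins a tie)."""
    return min(board, key=lambda e: e.rank).model


def resolves_yes_exact(contract_name: str, board: list[Entry]) -> bool:
    """Exact-string rule: YES only if the top entry is literally contract_name."""
    return top_model(board) == contract_name


def resolves_yes_family(family_prefix: str, board: list[Entry]) -> bool:
    """Hypothetical family rule: YES if the top entry belongs to the family.
    A plain prefix match is a simplification of "highest scoring model"."""
    return top_model(board).startswith(family_prefix)


# Hypothetical board after a Gemini successor takes the No. 1 spot.
board = [
    Entry("gemini-3.1-pro", 1),
    Entry("claude-opus-4-6-thinking", 2),
    Entry("gemini-3-pro", 5),
]
print(resolves_yes_exact("gemini-3-pro", board))  # exact rule: successor's string differs
print(resolves_yes_family("gemini-3", board))     # a family rule would have paid out
```

Under the exact rule the successor's win counts against the legacy contract; only a family-style rule, which the Gemini contract does not have, would let it roll over.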

Mapping out every scenario:

Google launches Gemini 3.1 Pro, and it takes the No. 1 spot: Gemini 3 Pro resolves no. You win.
Anthropic holds the lead through February 28: Gemini 3 Pro resolves no. You win.
OpenAI drops a surprise GPT-5.3 generalized model: Gemini 3 Pro resolves no. You win.
Nothing changes and the current leaderboard holds, with Claude Opus 4.6 Thinking maintaining a 20-point Elo gap over Gemini 3 Pro: Gemini 3 Pro resolves no. You win.

The only scenario where Gemini 3 Pro "YES" pays out requires three things to happen simultaneously: the legacy model overcomes a 20-point Elo deficit (borderline impossible given the volume of votes already accumulated), Google intentionally delays 3.1 (against all business incentives), and both Anthropic and OpenAI freeze model development.

At 98 cents per "NO" share, the raw yield is about 2% (2/98 ≈ 2.04%) over 16 days. That doesn't sound like much until you annualize it: 2.04% compounded over 16-day periods works out to roughly 58% APY.

For context, three-month T-bills are yielding around 4.3%. You are getting more than 13x the risk-free rate on a position where the structural headwinds facing the "YES" side are seemingly insurmountable.
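The annualization arithmetic is easy to check. This sketch assumes a NO contract bought at 98 cents settles at $1, ignores trading fees, and compares compounding over 365/16 periods with simple (non-compounded) annualization. The 4.3% T-bill rate is the figure quoted above; the function names are illustrative.

```python
def period_yield(price: float, payout: float = 1.0) -> float:
    """Return on capital if a contract bought at `price` settles at `payout`."""
    return (payout - price) / price


def annualize(r: float, days: float, compound: bool = True) -> float:
    """Annualize a holding-period return r earned over `days` days."""
    periods = 365 / days
    return (1 + r) ** periods - 1 if compound else r * periods


r = period_yield(0.98)                     # about 2.04% per 16-day hold
apy = annualize(r, 16)                     # compounded over ~22.8 periods
simple = annualize(r, 16, compound=False)  # no reinvestment between periods

print(f"holding-period yield: {r:.2%}")
print(f"compounded APY:       {apy:.1%}")
print(f"simple annualized:    {simple:.1%}")
print(f"multiple of a 4.3% T-bill: {apy / 0.043:.1f}x")
```

Compounding assumes the proceeds can be rolled into an equally attractive contract every 16 days, which is the optimistic case; the simple figure is the more conservative comparison.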

The Anthropic Directional Play

For traders looking for upside rather than capital preservation, the Claude-opus-4-6 thinking "YES" — at 67 cents — is the play.

Anthropic's 20-point Elo lead is massive. On the LMSYS leaderboard with Style Control removed (which Kalshi requires), Claude Opus 4.6 Thinking holds 1506 Elo. The nearest major competitor, Gemini 3 Pro, sits at 1486. That gap does not simply close in two weeks.

Research shows that moving a top-5 model up by just four rank positions requires upward of 3,127 net-positive votes against highly rated competitors. The statistical lead protecting Anthropic's position is enormous.

The performance numbers back it up:

Terminal-Bench 2.0: 65.4% success rate under max effort configurations
GDPval-AA: Outperformed GPT-5.2 in complex finance and legal reasoning
BrowseComp: New state-of-the-art for deep-web information retrieval
1M token context: Superior needle-in-a-haystack retrieval without degradation

The only realistic threat is a surprise competitor launch that leapfrogs Anthropic before February 28. Google's Gemini 3.1 is the most likely candidate, but Apple integration delays suggest stabilization issues.

Even if a superior model drops on February 25, it may not accumulate enough Arena battles to tighten its confidence interval and mathematically secure the Rank (UB) tiebreaker before the midnight expiration.
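Why a late launch may not be enough: as we understand LMArena's methodology, a model's Rank (UB) is 1 plus the number of models whose confidence-interval lower bound sits above its own upper bound. Treat that rule as an assumption. The sketch applies it to hypothetical ratings: a newcomer with a higher point estimate but a wide interval only ties the incumbent at rank 1, and it separates only after more battles tighten its interval. The `Score` type and all numbers are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    model: str
    lower: float  # lower bound of the rating's confidence interval
    upper: float  # upper bound of the rating's confidence interval


def rank_ub(target: Score, board: list[Score]) -> int:
    """Assumed rule: 1 + number of models statistically ahead of `target`,
    i.e. whose lower bound is above target's upper bound."""
    ahead = sum(
        1 for s in board if s.model != target.model and s.lower > target.upper
    )
    return 1 + ahead


# Hypothetical ratings: a late entrant with few votes has a wide interval.
early = [Score("incumbent", 1500, 1512), Score("new-entrant", 1490, 1530)]
# Hypothetical ratings after many more battles tighten the entrant's interval.
later = [Score("incumbent", 1500, 1512), Score("new-entrant", 1514, 1526)]

for label, board in (("early", early), ("later", later)):
    print(label, {s.model: rank_ub(s, board) for s in board})
```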

At 67 cents, you are getting 67% implied odds on what is probably a 75-80% true probability.
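The gap between the implied and estimated probability translates directly into expected profit per contract. A minimal sketch, assuming each contract pays $1 if it wins, ignoring fees, and plugging in the subjective 75-80% range above as the "true" probability.

```python
def ev_per_contract(p_true: float, price: float, payout: float = 1.0) -> float:
    """Expected profit per contract: gain (payout - price) with probability
    p_true, lose the price paid otherwise. Fees are ignored."""
    return p_true * (payout - price) - (1 - p_true) * price


PRICE = 0.67  # Claude Opus 4.6 Thinking YES price quoted in the text
for p in (0.75, 0.78, 0.80):
    ev = ev_per_contract(p, PRICE)
    print(f"p={p:.2f}: EV ${ev:.2f} per contract ({ev / PRICE:.1%} of capital)")
```

With a $1 payout the expression reduces to the true probability minus the price, so each point of probability above 67 cents is worth one cent of expected profit per contract.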

The Macro Context

This trade does not exist in a vacuum. The reason these AI contracts are moving with this kind of velocity is the SaaSpocalypse.

Anthropic's release of open-source plugins for Claude Cowork triggered a repricing event across the entire enterprise software sector.

Thomson Reuters dropped 16% in a single day.
LegalZoom fell nearly 20%.
The Goldman Sachs Software Basket recorded its worst session since April 2025.

An estimated $830 billion in market cap has been erased from the S&P 500 software and services index since January 28.

The market is telling you that "Best AI Model" is no longer an academic distinction. It is directly tied to enterprise adoption and capital allocation. Every lab is under immense pressure to ship updates and make progress.

Cross-referencing Polymarket confirms this thesis. The "Gemini 3.5 released by … ?" contract prices only a 22% chance that Google ships a Gemini 3 successor by March 31. That means the broad market consensus is that Google's next-generation model is more likely to arrive after Q1 than during it.

But here is the key: even if Google accelerates and ships before the Kalshi expiration (a narrower event than the 22% chance of shipping by March 31), the new model gets a different identifier on the LMSYS leaderboard.

Google shipping early is bullish for the Gemini 3 Pro "NO" position, not bearish. The Polymarket data is a reminder that even what looks like a risk scenario for this trade actually helps the NO side.

Risk Factors

Kalshi rule change: Kalshi could retroactively amend its settlement criteria to treat Gemini-3.1 as equivalent to Gemini-3. This would be unprecedented and would undermine contract integrity, but the financial incentive to appease retail traders exists.

LMSYS data contamination: A catastrophic data event requiring a leaderboard rollback could invalidate the oracle entirely.

Vote manipulation: With millions of dollars now riding on leaderboard positions, the incentive for sophisticated vote rigging has never been higher. Researchers have demonstrated that omnipresent rigging strategies can manipulate rankings using only hundreds of coordinated accounts.

These are all tail risks. They justify the 1-point deduction from a perfect conviction score, but none of them change the fundamental structural asymmetry of the trade.

The Trade

Position: SHORT Gemini 3 Pro (buy "NO" at 98 cents) / LONG Claude Opus 4.6 Thinking (buy "YES" below 70 cents). Use limit orders to maximize return.
Platform: Kalshi – "What will be the top AI model this month?"
Settles: February 28, 2026, 12:00 AM EST
Conviction: 9/10

Suggested Trade Structure

Here is how to size this as a single portfolio allocation. Assume $1,000 deployed and held until the contract expires.

Leg A

The structural anchor.

70% of capital goes into the Gemini "NO" position. This is your high-conviction, low-yield base. If the contract settles as expected, you collect about $14 in 16 days.

Annualized, the 2.04% yield from a 98-cent entry over a 16-day holding period compounds to roughly 58% APY, more than 13x what short-term treasuries are paying.

Leg B

The directional kicker.

30% of capital goes into the Anthropic "YES" position. If Claude holds No. 1 through February 28, this leg returns 49% in the same 16 days.

Annualized on a simple (non-compounded) basis, that is north of 1,100% APY. At our estimated 78% true odds, the probability-weighted expected value of this leg is roughly +$49.50 (0.78 × the $148.50 gain minus 0.22 × the $301.50 stake).

Combined portfolio outcome scenarios:

Anthropic holds No. 1 (base case, ~78% probability): Both legs pay. Total return: +$162.50 on $987.50 deployed. That is 16.5% in 16 days.

A competitor takes No. 1, but it is not Gemini 3 Pro (~21% probability, the remainder once the other two scenarios are accounted for): Leg A pays, Leg B loses. Net return: +$14 minus $301.50 = -$287.50. This is the risk you are taking on the directional component.

Gemini 3 Pro somehow takes No. 1 (<1% probability): Both legs lose. Maximum drawdown: -$987.50. We assess this as an extreme tail event.

Weighting the scenarios at 78%, 21%, and 1% (a conservative ceiling for the Gemini tail), the probability-weighted expected return is approximately +$56.50 on $987.50 deployed, or roughly 5.7% in 16 days.
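The scenario math can be reproduced in a few lines. This sketch assumes the position sizes implied by the $987.50 total (700 NO contracts at 98 cents and 450 YES contracts at 67 cents), a $1 payout per winning contract, no fees, and scenario weights of 78%, 21%, and 1%. The `Leg` type and scenario labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leg:
    name: str
    contracts: int
    price: float  # cost per contract in dollars; a winning contract pays $1

    def cost(self) -> float:
        return self.contracts * self.price

    def pnl(self, wins: bool) -> float:
        # Winning contracts pay $1 each; losing contracts expire worthless.
        return self.contracts * (1.0 - self.price) if wins else -self.cost()


leg_a = Leg("Gemini 3 Pro NO", 700, 0.98)
leg_b = Leg("Claude Opus 4.6 Thinking YES", 450, 0.67)

# scenario: (probability, does leg A win?, does leg B win?)
scenarios = {
    "Anthropic holds No. 1": (0.78, True, True),
    "Other competitor takes No. 1": (0.21, True, False),
    "Gemini 3 Pro takes No. 1": (0.01, False, False),
}

deployed = leg_a.cost() + leg_b.cost()
ev = 0.0
for name, (p, a_wins, b_wins) in scenarios.items():
    pnl = leg_a.pnl(a_wins) + leg_b.pnl(b_wins)
    ev += p * pnl
    print(f"{name}: {pnl:+.2f}")
print(f"deployed: {deployed:.2f}, expected P&L: {ev:+.2f} ({ev / deployed:.1%})")
```

Adjusting the weights in `scenarios` shows where the value comes from: under these assumptions Leg A contributes only about $7 of expected profit, and nearly all of the rest comes from the directional Leg B.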

The key insight is that Leg A and Leg B carry very different risks: the only scenario that sinks both is the Gemini 3 Pro tail.

Leg A wins in virtually every scenario.
Leg B is the calculated bet that Anthropic specifically holds the top spot, a separate and narrower question.

A Note on Sizing

The two-leg structure above is designed for portfolios of $1,000+. The Gemini "NO" leg only makes sense at scale, as earning 2% on $85 is $1.70, which is not worth the execution cost. If you are deploying less than $500, skip the structural leg and just buy Claude Opus 4.6 Thinking YES at 67 cents as a pure directional bet. The 49% upside in 16 days on a position with an estimated 75-80% true probability is a strong standalone trade. The two-leg structure adds risk management, not edge.

What is Kalshi?

Unlike a traditional sportsbook, Kalshi is available in most states and allows users to make predictions across several unique markets, including sports, entertainment, elections, and even weather.

Kalshi operates on a contract-based system where users buy "contracts" (priced between 1–99 cents) based on whether they believe a specific event will happen. The price of each contract fluctuates in real time based on market sentiment, and like the stock market, traders can sell positions early to lock in profits (or minimize losses).

About the Author

Tyler Jacobsma • Verified Action Expert

This site contains commercial content. We may be compensated for the links provided on this page. The content on this page is for informational purposes only. Action Network makes no representation or warranty as to the accuracy of the information given or the outcome of any game or event.