Scout
Research

AUGUST 12, 2025

Lightweight Reinforcement Learning for Token Search

Edgar Pavlovsky

Searching for tokens is a fascinating problem in crypto. It sounds like such a simple concept: type a name, symbol, or contract address (CA) and find the token you're looking for. It might be the most common thing we do in this industry.

In practice, it's not that simple. More than 30,000 new tokens launch a day. Fresh tickers become relevant in minutes, not hours or days. Near matches exist on purpose. Similar tickers, similar logos, similar copy. Sometimes you only have a name or a symbol and no CA. Time is always critical - you don't want to scroll through a wall of results to find the right $DARK.

Search is old. Token search is young.

In tech, Search as a field is mature - there are plenty of great solutions out there. Token search, on the other hand, has historically been very basic. The common pipeline looks like this: start with fuzzy results from a search provider like Birdeye, then sort by verification and liquidity. Sometimes volume gets a small weight. It's simple and fast to ship, but it leaves a lot to be desired. A better token search is one of the top 3 things Scout's users have asked us for.

There's something else really exciting: Token Search is the perfect, esoteric search problem to solve if you're building an AI-native product in crypto:

  1. It's a unique search problem:
    • Extremely fast cycles: a launch or news event can flip relevance in minutes. The best source of truth is the crowd (more on this later).
    • Duplicates: deliberate lookalike tokens ship en masse, and carry real risk.
    • Data sparsity: a world of new tokens is a world of little information. Paired with users requiring high confidence, it's a stimulating problem.
  2. At the same time, Token Search is a great problem space to solve for:
    • There are few variables that matter: wallets, trades, liquidity, volume, market cap, verification. Manageable, explainable signals.
    • You get constant feedback: every search and click is a clean label about whether the order helped.
    • There are built-in sanity checks: verification exists and should be respected, not hard-coded.
    • Small models mean low compute cost: ranking can run anywhere (more on this below).

This is the kind of problem where a lightweight learner can deliver real wins, shipped quickly to production without training beforehand and without heavy infrastructure.

Don't guess. Learn.

Guessing what users want without learning from them has to be at the top of startup sins. Your users can tell you what they want better than anyone. So we built Scout to listen. Our RL Search system learns from what people actually click and confirms what helps them find the right token faster. Users shape it every day, every search, every click.

How it works in plain terms:

  1. We keep a small set of ranking strategies based on various attributes tokens have. One might lean on verification, another on trading activity, and another might prioritize liquidity in wallets.
  2. Each search is a chance to learn. If the first search result returned gets the click, great. If the tenth gets it, we learn that our order can improve.
  3. Scout shifts weight toward strategies that help users succeed more quickly.
  4. This continues forever. Before you know it, we have much better ranking strategies to return tokens than we started with.
  5. If user preferences change, Scout adapts automatically. Zero engineering updates.
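The loop above is essentially a multi-armed bandit. A minimal TypeScript sketch of UCB1 over a few ranking strategies, with reward decaying by clicked position - the strategy names, reward shape, and function names here are illustrative assumptions, not Scout's actual code:

```typescript
// One bandit arm per ranking strategy (names are hypothetical).
interface Strategy {
  name: string;
  pulls: number;   // times this strategy served a search
  reward: number;  // cumulative reward from clicks
}

const strategies: Strategy[] = [
  { name: "verification-heavy", pulls: 0, reward: 0 },
  { name: "activity-heavy", pulls: 0, reward: 0 },
  { name: "liquidity-heavy", pulls: 0, reward: 0 },
];

// UCB1: pick the strategy with the best optimism-adjusted average reward.
function pickStrategy(arms: Strategy[]): Strategy {
  const total = arms.reduce((n, a) => n + a.pulls, 0);
  // Try every arm once before applying the confidence bound.
  const untried = arms.find((a) => a.pulls === 0);
  if (untried) return untried;
  let best = arms[0];
  let bestScore = -Infinity;
  for (const a of arms) {
    const score =
      a.reward / a.pulls + Math.sqrt((2 * Math.log(total)) / a.pulls);
    if (score > bestScore) {
      bestScore = score;
      best = a;
    }
  }
  return best;
}

// Reward decays with the clicked position: rank 1 is a full win.
function recordClick(arm: Strategy, clickedRank: number): void {
  arm.pulls += 1;
  arm.reward += 1 / clickedRank;
}
```

Each search pulls one arm; a click near the top pays close to a full reward, so weight drifts toward strategies that put the right token first, while the bound keeps exploration of the other arms alive.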

Zero training. Evolves in the browser.

We shipped Token Search straight to production - no training. No big infra. Scout started learning immediately (thanks for helping teach it if you've searched tokens through Scout in the past few days). Even cooler: learning runs right in the browser in real time. During navigation, Scout occasionally tries small changes using a tiny genetic algorithm. It's extremely fast (less than 1ms) and invisible to the user. No disruptions, just better results.
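A sketch of what one such in-browser evolution tick could look like, assuming a strategy is just a bag of feature weights - the mutation step, fitness signature, and names are hypothetical, not Scout's implementation:

```typescript
type Weights = Record<string, number>;

// Mutate: nudge each weight by small noise, clamped to [0, 1].
function mutate(
  weights: Weights,
  step = 0.05,
  rand: () => number = Math.random
): Weights {
  const child: Weights = {};
  for (const [k, v] of Object.entries(weights)) {
    child[k] = Math.min(1, Math.max(0, v + (rand() - 0.5) * 2 * step));
  }
  return child;
}

// One evolution tick: keep the variant only if it scores better on
// whatever fitness signal is available (e.g. recent click positions).
function evolve(parent: Weights, fitness: (w: Weights) => number): Weights {
  const child = mutate(parent);
  return fitness(child) > fitness(parent) ? child : parent;
}
```

Because a tick is a handful of arithmetic operations over a few weights, sub-millisecond timing on the client is plausible.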

What this buys us:

  • Ship fast: no dataset collection phase and no model training cycle. We wrote a small ranker, put it in front of users, and started learning immediately.
  • Low cost: no GPUs, no long running jobs. Compute happens on the client and scales with users, which fits our startup budget.
  • Snappy UX: ranking and evolution complete in a blink. No extra network calls in the hot path.
  • Resilience: if the network or database is slow, Scout falls back to a safe baseline and keeps working. Learning resumes as soon as signals come back.
  • Safety controls: verification remains a strong prior. Exploration is bounded by UCB1 so the system stays sane while it learns.

Implementation notes:

  • We track only what we need: the position of the clicked result and a lightweight session marker.
  • Bandit math and features are written in TypeScript, so the same code runs in development and production.
  • Evolved strategies are versioned. We can pin or roll back instantly.
  • We validate changes with deterministic A/B buckets.
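Deterministic A/B buckets can come from hashing the lightweight session marker, so the same session always sees the same variant with no server round trip. A sketch under that assumption - FNV-1a and the function names are illustrative choices, not necessarily what Scout uses:

```typescript
// Tiny FNV-1a hash; stable across runs and environments.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return h;
}

// Same session marker -> same bucket, every time, on any client.
function bucketFor(sessionId: string, buckets = 2): number {
  return fnv1a(sessionId) % buckets;
}
```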

How it's going

Within just five days we saw about a 25% lift over our baseline sort. People were finding the right token near the top more often. That shows up in daily use.

We also validated something everyone suspects: Verification matters, and it still likely matters the most. As the system learned, strategies that weighted verification higher started to win more frequently - a good sanity check that learning is aligning with safety.

Technical sketch

Short version for the curious:

  • UCB1 balances trying new strategies with using proven ones.
  • Six features carry the signal: wallets, trades, market cap, liquidity, volume, verification. We have ~12 more features in the pipeline.
  • Client side evolution creates small strategy variations and keeps the good ones - this is where genetic evolution comes in.
  • Graceful fallbacks keep ranking responsive if anything fails.
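Putting those pieces together, the scoring hot path might look like a weighted sum of the six features with verification as a strong prior and a safe baseline when signals are missing. All names, weights, and the log scaling below are illustrative assumptions:

```typescript
interface TokenSignals {
  wallets: number;
  trades: number;
  marketCap: number;
  liquidity: number;
  volume: number;
  verified: boolean;
}

// Learned weights for one strategy (hypothetical values).
const weights = {
  wallets: 0.1,
  trades: 0.15,
  marketCap: 0.1,
  liquidity: 0.15,
  volume: 0.1,
  verification: 0.4, // strong prior, learned rather than hard-coded
};

function score(t: Partial<TokenSignals>): number {
  // Graceful fallback: if core signals are missing (slow network or DB),
  // rank on verification alone - a safe baseline until signals return.
  if (t.wallets == null || t.liquidity == null) return t.verified ? 1 : 0;
  const n = (x: number | undefined) => Math.log1p(x ?? 0); // tame heavy tails
  return (
    weights.wallets * n(t.wallets) +
    weights.trades * n(t.trades) +
    weights.marketCap * n(t.marketCap) +
    weights.liquidity * n(t.liquidity) +
    weights.volume * n(t.volume) +
    weights.verification * (t.verified ? 10 : 0)
  );
}

function rank(tokens: Partial<TokenSignals>[]): Partial<TokenSignals>[] {
  return [...tokens].sort((x, y) => score(y) - score(x));
}
```

Note that verification here is a strong prior, not a hard filter: with full signals, a sufficiently active unverified token can still outrank a verified one.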

Looking forward

Two directions stand out:

  1. Improve the learner: light context to match intent better, time features for fresh launches, tighter near duplicate handling. My personal view is that there is a lot more work to be done on properly deduplicating and prioritizing legitimate tokens.
  2. Keep a problem-first mindset: in a world full of LLM hype, the point of ML is unchanged. Start with the user problem, pick the simplest tool that works. There is a growing amount of non-LLM ML inside Scout that already moves the needle, and there's a huge opportunity across crypto products to take similar approaches.

The goal is simple. Build a platform that helps people trade smarter and, as we like to say: see the future.


If you want the full details and math, the technical paper is here: Lightweight Reinforcement Learning for Cryptocurrency Token Search
