Building a Freelance Job Scout: Playwright, Weighted Scoring, and Telegram Alerts


Job boards are a firehose. By the time you have read ten listings, eight were a bad fit, one was saturated with 40 proposals, and the last one paid $50 for a month of work. The fix is not “read faster” — it is to make a machine read for you and only interrupt you for the listings that are genuinely worth your time.

Here is how I built exactly that: a cron job that scrapes several boards, scores each listing against my real profile, factors in competition and pay, and sends only the top matches to Telegram.

Plain HTTP doesn’t work — and that’s the first lesson

The naive approach is to fetch() the search page and parse the HTML. It fails on many modern boards. Workana, for example, returns a degraded page to non-browser clients: HTTP 200, full-size HTML, and zero job listings in it. Even with a complete set of browser headers (User-Agent, Accept-Language, sec-ch-ua, Sec-Fetch-*), the listings never appear, because they are rendered by JavaScript and gated behind bot detection.
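Because the degraded page still returns HTTP 200, a status check alone will not catch it. A minimal sketch of a guard that counts listing elements instead, using only the standard library; the class name "job-card" is a hypothetical placeholder, not Workana's real markup:

```python
from html.parser import HTMLParser


class ListingCounter(HTMLParser):
    """Counts elements whose class attribute contains a given class name."""

    def __init__(self, css_class: str) -> None:
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_listings(html: str, css_class: str = "job-card") -> int:
    # "job-card" is an assumed class name; inspect the real page to find yours.
    parser = ListingCounter(css_class)
    parser.feed(html)
    return parser.count


def looks_degraded(status: int, html: str) -> bool:
    # A 200 response with no listing elements is the failure mode described above.
    return status == 200 and count_listings(html) == 0
```

Treating "zero listings" as an error rather than as an empty result keeps a silently blocked scraper from looking like a quiet day on the board.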

So the scraper needs a real browser. The lightweight trick is to drive the system Chrome with Playwright instead of downloading a bundled Chromium (which, on a brand-new OS release, may not even have a supported build):

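from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(
        headless=True,
        channel="chrome",   # system-installed Google Chrome, e.g. /usr/bin/google-chrome
        args=["--no-sandbox", "--disable-dev-shm-usage"],
    )
    ...  # scrape here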

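That --disable-dev-shm-usage flag matters on memory-constrained machines: it makes Chrome put its shared-memory files in /tmp instead of /dev/shm, so a small or RAM-backed /dev/shm cannot fill up and take the browser down mid-run. --no-sandbox, by contrast, switches off Chrome's sandbox; it is usually only needed when Chrome runs as root or inside a container, so drop it where you can.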
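Put together, the launch might look like the sketch below. The helper names chrome_launch_options and fetch_rendered_html, the in_container switch, and the timeout are my own illustrative choices; the Playwright calls themselves (sync_playwright, chromium.launch, new_page, goto, content) are the standard sync API.

```python
def chrome_launch_options(in_container: bool = True) -> dict:
    """Keyword arguments for p.chromium.launch(); a sketch, not a canonical set."""
    args = []
    if in_container:
        # Chrome refuses to start as root with the sandbox on, which is the
        # usual situation inside a container.
        args.append("--no-sandbox")
        # Use /tmp instead of a possibly tiny /dev/shm for shared memory.
        args.append("--disable-dev-shm-usage")
    return {"headless": True, "channel": "chrome", "args": args}


def fetch_rendered_html(url: str, timeout_ms: int = 30_000) -> str:
    """Load a page in system Chrome and return the HTML after scripts have run."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**chrome_launch_options())
        try:
            page = browser.new_page()
            page.goto(url, wait_until="domcontentloaded", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Keeping the flags in one small function makes it easy to run without them on a desktop machine, where the sandbox works and /dev/shm is large.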

Dispose the browser, always

A cron job that leaks a browser process every few hours will eventually take the box down. The single most important reliability rule is to close the browser in a finally block so it is disposed of even when a page throws:

browser = p.chromium.launch(headless=True, channel="chrome")
try:
    page = browser.new_page()
    for url in urls:
        ...
finally:
    browser.close()   # always — no leaked browsers, no OOM
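The same idea extends to individual pages. A minimal sketch, assuming a hypothetical extract(page) callable that returns a list of listings: each page is closed in its own finally, one failing board is recorded instead of aborting the run, and the browser is closed no matter what.

```python
from typing import Callable, Iterable


def scrape_all(browser, urls: Iterable[str], extract: Callable) -> tuple[list, list]:
    """Visit each URL and return (results, errors). Always closes the browser."""
    results, errors = [], []
    try:
        for url in urls:
            page = browser.new_page()
            try:
                page.goto(url)
                results.extend(extract(page))
            except Exception as exc:  # one failing board should not end the run
                errors.append((url, repr(exc)))
            finally:
                page.close()
    finally:
        browser.close()
    return results, errors
```

Because scrape_all only relies on new_page(), goto(), and close(), it can be exercised with a fake browser object in tests, without starting Chrome.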

Scoring: the title is the signal

A flat keyword match over the whole listing is noisy. A “Customer Support Agent” role will match backend, python, and docker because the company’s stack is mentioned in the description — and suddenly your scout is pinging you about support jobs.

Two rules fix this:

  1. Weight the title twice as heavily as the description.
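  2. Require at least one profile keyword in the title for the listing to qualify at all.

# title and card hold the lowercased title and full card text;
# negatives is the list of red-flag terms found in the card.
title_hits = {k for k in WEIGHTS if k in title}
all_hits   = title_hits | {k for k in WEIGHTS if k in card}
# Title hits count double; each red-flag term costs 3 points.
total = sum(WEIGHTS[k] * (2 if k in title_hits else 1) for k in all_hits) \
        - 3 * len(negatives)
qualifies = total >= THRESHOLD and bool(title_hits)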
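For reference, here is a self-contained sketch of the two rules. The weights, negative terms, and threshold are invented for illustration, and card is assumed to be the title plus description. It also swaps the plain substring test for a whole-term match, since `"java" in title` would otherwise fire on "javascript".

```python
import re

# Hypothetical values: rarer, more valuable skills get higher weights.
WEIGHTS = {"legacy integration": 5, "message queue": 5,
           "python": 2, "backend": 2, "docker": 2}
NEGATIVE_TERMS = ["customer support", "unpaid"]
THRESHOLD = 6


def contains(term: str, text: str) -> bool:
    # Match whole terms only, so "java" does not fire on "javascript".
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text) is not None


def score(title: str, description: str) -> tuple[int, bool]:
    """Return (total, qualifies) for one listing."""
    title = title.lower()
    card = title + " " + description.lower()
    title_hits = {k for k in WEIGHTS if contains(k, title)}
    all_hits = title_hits | {k for k in WEIGHTS if contains(k, card)}
    negatives = [n for n in NEGATIVE_TERMS if contains(n, card)]
    # Title hits count double; each negative term costs 3 points.
    total = sum(WEIGHTS[k] * (2 if k in title_hits else 1) for k in all_hits)
    total -= 3 * len(negatives)
    return total, total >= THRESHOLD and bool(title_hits)
```

With these numbers, score("Customer Support Agent", "Our stack is python, docker and a backend in Go") returns (3, False): the description matches three keywords, but there is no title hit and one negative term. One caveat: the word-boundary lookarounds mean a term like ".net" would not match inside "asp.net", so punctuation-heavy keywords need their own handling.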

Keywords are weighted by how rare and valuable they are to me: niche skills (legacy integration, message queues) score higher than commodity ones.

Fit is not value — competition and pay decide

A perfect skill match with 40 existing proposals is a worse use of your time than a good match with 2. So the final rank is a composite:

value = (
    fit
    + 1.5 * budget_tier        # higher pay ranks up
    + competition_bonus        # few bids: +3 … many bids: -4
    - roi_penalty              # huge scope + low/no pay = time sink
)
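The article does not spell out how each term is derived, so the sketch below fills them in with guesses: every breakpoint and the roi_penalty rule are assumptions. Only the 1.5 multiplier and the +3 to -4 range come from the formula above. The tiers were picked so that the sample alert later in the article (fit 19, $45/h, 2 proposals) comes out at 28; the real tiers may well differ.

```python
from typing import Optional


def budget_tier(hourly_usd: Optional[float]) -> int:
    """Map an hourly rate to a tier 0-4. Breakpoints are illustrative guesses."""
    if hourly_usd is None:
        return 0  # budget hidden or missing
    for floor, tier in ((40, 4), (30, 3), (20, 2), (10, 1)):
        if hourly_usd >= floor:
            return tier
    return 0


def competition_bonus(proposals: Optional[int]) -> int:
    """+3 for few bids down to -4 for many; the cut-offs are assumptions."""
    if proposals is None:
        return 0  # unknown until the detail page is read
    if proposals <= 5:
        return 3
    if proposals <= 15:
        return 1
    if proposals <= 30:
        return -2
    return -4


def roi_penalty(large_scope: bool, tier: int) -> int:
    # Hypothetical rule: big scope with bottom-tier or unknown pay is a time sink.
    return 4 if large_scope and tier == 0 else 0


def value(fit, hourly_usd=None, proposals=None, large_scope=False) -> float:
    tier = budget_tier(hourly_usd)
    return fit + 1.5 * tier + competition_bonus(proposals) - roi_penalty(large_scope, tier)
```

Under these assumptions, the same listing at 126 proposals scores 21 instead of 28, which is enough for a weaker match with 2 proposals to overtake it.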

The catch: list pages hide the bid count and often the budget. So for the top candidates by fit, the scraper opens each detail page in the same browser session and pulls the real number of proposals, the real budget, and the full description — then recomputes the score with facts instead of guesses. One result of this: a beautifully matched listing with 126 proposals correctly sinks to the bottom.
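The enrichment pass can be sketched as a small function. Everything here is illustrative: fetch_detail stands in for whatever opens the detail page and returns the extra fields, rank is the composite scoring function, and the dictionary keys are assumed names.

```python
from typing import Callable


def enrich_top(
    listings: list[dict],
    fetch_detail: Callable[[str], dict],
    rank: Callable[[dict], float],
    top_n: int = 10,
) -> list[dict]:
    """Fetch details for the best-fitting listings, then re-rank on real data."""
    # Only the top candidates by fit are worth the cost of a detail-page visit.
    by_fit = sorted(listings, key=lambda item: item["fit"], reverse=True)
    enriched = []
    for listing in by_fit[:top_n]:
        detail = fetch_detail(listing["url"])  # e.g. proposals, budget, description
        enriched.append({**listing, **detail})
    # Final order uses the composite value, not raw fit.
    return sorted(enriched, key=rank, reverse=True)
```

Capping the pass at top_n keeps the number of extra page loads bounded, which matters when every detail page is a full browser navigation.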

Delivery: only the good ones, deduped

Finally, the scout dedups against a small seen.json (pruned after 30 days) so you never get the same listing twice, and sends the survivors to Telegram with their score, pay, and competition up front:
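A minimal sketch of the dedup step, assuming seen.json maps each listing URL to the Unix timestamp at which it was first seen. The file layout and the function names are my assumptions, not necessarily what the scout uses.

```python
import json
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 30 * 24 * 3600  # prune entries older than 30 days


def load_seen(path: Path, now: float) -> dict:
    """Read the seen-file and drop entries past the retention window."""
    try:
        seen = json.loads(path.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}  # first run, or a corrupt file: start fresh
    return {url: ts for url, ts in seen.items() if now - ts < MAX_AGE_SECONDS}


def filter_new(listings: list, path: Path, now: Optional[float] = None) -> list:
    """Return only unseen listings and record them in the seen-file."""
    now = time.time() if now is None else now
    seen = load_seen(path, now)
    fresh = []
    for listing in listings:
        if listing["url"] not in seen:
            seen[listing["url"]] = now  # also dedups repeats within one batch
            fresh.append(listing)
    path.write_text(json.dumps(seen))
    return fresh
```

Note that this version marks a listing as seen before it is delivered. If a failed Telegram send should be retried on the next run, record the URL only after the message goes out.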

⭐ 28  ·  🎯 fit 19
[Workana] Expert Docusign Integration in ASP.NET Core
💵 $45/h  ·  🧰 2 proposals
🔧 .net, integration, rest api, c#, backend
<link>

You open that one.
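For completeness, a sketch of how such a message could be built and sent with only the standard library. The item keys and both function names are hypothetical; the endpoint is the Telegram Bot API's sendMessage method, which accepts a JSON body with chat_id and text.

```python
import json
import urllib.request


def format_alert(item: dict) -> str:
    """Render one listing with score, pay, and competition up front."""
    return "\n".join([
        f"⭐ {item['value']:g}  ·  🎯 fit {item['fit']}",
        f"[{item['board']}] {item['title']}",
        f"💵 {item['pay']}  ·  🧰 {item['proposals']} proposals",
        "🔧 " + ", ".join(item["keywords"]),
        item["url"],
    ])


def send_telegram(token: str, chat_id: str, text: str) -> None:
    """POST the message to the Bot API; raises on HTTP errors."""
    request = urllib.request.Request(
        f"https://api.telegram.org/bot{token}/sendMessage",
        data=json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        response.read()
```

Keeping formatting separate from sending means the message layout can be checked in a test without a bot token or network access.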

Takeaways

  • Assume modern boards need a real browser; budget for bot detection.
  • Always dispose the browser in finally; use --disable-dev-shm-usage.
  • Weight the title and require a title match — descriptions over-match.
  • Rank by value (fit + pay − competition), not raw fit. Enrich the top candidates with real data before deciding.

The whole thing is ~250 lines of Python run from cron. The point is not the code. It is that a few hours of automation turns a daily firehose into a short, ranked shortlist that respects your attention.