Scrape Profile Reference

Scrape profiles define what jobs to search for, where, and on which sites. Manage them in the web UI at /scrapes, or via the MCP tools (create_scrape_profile, update_scrape_profile, delete_scrape_profile, list_scrape_profiles).

Creating a profile

In the web UI, go to Scrapes and click New Profile. Fill in the fields and save. The profile is stored in the database and used by the built-in scheduler whenever it runs a scrape.

Fields

| Field | Required | Default | Description |
| --- | --- | --- | --- |
| name | Yes | | Unique name for the profile (e.g., "backend_berlin_remote"). Used as an identifier. |
| search_term | Yes | | The job search query (e.g., "backend developer", "Ruby on Rails", "Softwareentwickler"). |
| location | Yes | | City, region, or country (e.g., "Berlin, Germany", "United Kingdom", "Remote"). |
| is_remote | No | null | Filter by remote status. true = remote only, false = non-remote only, null = all. |
| results_wanted | No | 100 | Maximum number of results to fetch per scrape run. Higher values take longer and use more memory. |
| site_names | No | indeed, linkedin | Comma-separated list of job sites to scrape. See supported sites below. |
| country_indeed | No | Germany | Country filter for Indeed and Glassdoor. Must match their country dropdown values. |
| hours_old | No | 72 | Only fetch jobs posted within this many hours. null = no time filter. |
| job_type | No | null | Filter by employment type: fulltime, parttime, contract, internship. null = all. |
| distance | No | null | Radius in miles from the location. null = site default. |
| google_search_term | No | null | Custom Google Jobs query. If unset, auto-generated from search_term + location. |
| linkedin_fetch_description | No | true | Fetch full job descriptions from LinkedIn (slower but much better for scoring). |
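As an illustration of how the fields and defaults above fit together, here is a minimal Python sketch. The `ScrapeProfile` class and its `site_list` helper are hypothetical names invented for this example; they are not the project's actual model or the MCP tool schema. Only the field names, defaults, and allowed job_type values come from the table.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional

# Allowed employment types, as listed in the job_type row of the table.
VALID_JOB_TYPES = {"fulltime", "parttime", "contract", "internship"}


@dataclass
class ScrapeProfile:
    """Hypothetical in-memory model of a scrape profile (illustration only)."""

    # Required fields (no default).
    name: str
    search_term: str
    location: str
    # Optional fields, using the defaults documented in the table.
    is_remote: Optional[bool] = None
    results_wanted: int = 100
    site_names: str = "indeed, linkedin"
    country_indeed: str = "Germany"
    hours_old: Optional[int] = 72
    job_type: Optional[str] = None
    distance: Optional[int] = None
    google_search_term: Optional[str] = None
    linkedin_fetch_description: bool = True

    def __post_init__(self) -> None:
        # Reject employment types the table does not list.
        if self.job_type is not None and self.job_type not in VALID_JOB_TYPES:
            raise ValueError(f"Unknown job_type: {self.job_type!r}")

    def site_list(self) -> List[str]:
        # Split the comma-separated site_names value into clean entries.
        return [s.strip() for s in self.site_names.split(",") if s.strip()]


profile = ScrapeProfile(
    name="backend_berlin_remote",
    search_term="backend developer",
    location="Berlin, Germany",
    is_remote=True,
)
payload = asdict(profile)  # plain dict with every field filled in
```

Only the three required fields must be supplied; everything else falls back to the documented default.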

Supported job sites

Most sites are scraped through JobSpy; Adzuna and Arbeitnow each have a dedicated backend. The three scraper backends handle different site groups:

JobSpy backend

| Site | Market | Notes |
| --- | --- | --- |
| Indeed | Global (country-filtered) | Largest general job board. Uses country_indeed to set search region. |
| LinkedIn | Global | Professional network listings. May be rate-limited; descriptions fetched by default. |
| Glassdoor | Global (country-filtered) | Listings with company reviews and salary data. Uses country_indeed for region. |
| Google Jobs | Global | Aggregates from multiple sources. Uses google_search_term or auto-generates one. |
| ZipRecruiter | US & Canada | North American board focused on small-to-mid-size employers. |
| Bayt | Middle East & North Africa | Leading MENA job board. Ignores city; uses country only. |
| Naukri | India | India's largest job board. Set country to India and location to an Indian city. |
| BDJobs | Bangladesh | Bangladesh's primary job board. |

Adzuna backend

| Site | Market | Notes |
| --- | --- | --- |
| Adzuna | Global | API-based aggregator. Requires ADZUNA_APP_ID and ADZUNA_APP_KEY env vars. |

Arbeitnow backend

| Site | Market | Notes |
| --- | --- | --- |
| Arbeitnow | Germany-focused | Free, no auth required. Good for German-language job listings. |
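The split into backends can be pictured as a lookup from site name to backend. The following Python sketch is an assumption-laden illustration, not the project's routing code: the lowercase site identifiers (other than indeed and linkedin, which appear as defaults above), the backend labels, and both function names are hypothetical. Only the grouping of sites and the two Adzuna environment variable names come from the tables.

```python
import os
from collections import defaultdict
from typing import Dict, List, Mapping, Optional

# ASSUMPTION: identifiers and backend labels are illustrative guesses.
SITE_BACKENDS = {
    "indeed": "jobspy",
    "linkedin": "jobspy",
    "glassdoor": "jobspy",
    "google": "jobspy",
    "zip_recruiter": "jobspy",
    "bayt": "jobspy",
    "naukri": "jobspy",
    "bdjobs": "jobspy",
    "adzuna": "adzuna",
    "arbeitnow": "arbeitnow",
}

# Environment variables the Adzuna backend requires (from the table).
ADZUNA_ENV_VARS = ("ADZUNA_APP_ID", "ADZUNA_APP_KEY")


def group_sites_by_backend(site_names: str) -> Dict[str, List[str]]:
    """Split a comma-separated site_names value and group it by backend."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in site_names.split(","):
        site = raw.strip().lower()
        if not site:
            continue  # tolerate trailing commas and extra spaces
        if site not in SITE_BACKENDS:
            raise ValueError(f"Unsupported site: {site!r}")
        groups[SITE_BACKENDS[site]].append(site)
    return dict(groups)


def missing_adzuna_credentials(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the names of required Adzuna variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ADZUNA_ENV_VARS if not env.get(name)]


groups = group_sites_by_backend("indeed, linkedin, adzuna")
if "adzuna" in groups and missing_adzuna_credentials():
    print("Adzuna selected but credentials are not configured")
```

Checking credentials before a run makes a missing ADZUNA_APP_ID or ADZUNA_APP_KEY visible up front rather than as an empty result set.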

Tips for effective profiles

Start broad, then narrow. Create a few general profiles first (e.g., "software engineer" + your city). Review the results, then create more targeted profiles based on what you're seeing.

Use multiple profiles. Jobs found by multiple profiles score higher on the "search terms" signal (up to 5 bonus points). Overlapping profiles are fine because duplicates are handled via upsert.

Match the site to the market. Indeed and LinkedIn work globally. Adzuna is strong in the UK, Australia, and Germany. Arbeitnow focuses on Germany. Bayt, Naukri, and BDJobs serve specific regional markets.

Set hours_old to manage volume. The default of 72 hours works well for daily or other frequent scraping. Raise it (for example, 168 hours = 1 week) if you scrape less often, or lower it (for example, 24) if you want only fresh postings.
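To make the hours_old window concrete, here is a small Python sketch of the cutoff arithmetic. The function names are hypothetical, and in practice the time filter is applied by the scraper or the job site rather than by code like this; the sketch only shows which posting dates fall inside a given window.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def posted_cutoff(hours_old: Optional[int], now: Optional[datetime] = None) -> Optional[datetime]:
    """Return the oldest allowed posting time, or None when there is no time filter."""
    if hours_old is None:
        return None  # null = no time filter
    now = now or datetime.now(timezone.utc)
    return now - timedelta(hours=hours_old)


def is_fresh(posted_at: datetime, hours_old: Optional[int], now: Optional[datetime] = None) -> bool:
    """True when a job posted at `posted_at` falls inside the hours_old window."""
    cutoff = posted_cutoff(hours_old, now)
    return cutoff is None or posted_at >= cutoff


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
posted = datetime(2024, 1, 5, 9, 0, tzinfo=timezone.utc)

print(is_fresh(posted, 72, now))    # False: older than the 72-hour window
print(is_fresh(posted, 168, now))   # True: inside the one-week window
print(is_fresh(posted, None, now))  # True: no time filter
```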

Use is_remote intentionally. Setting is_remote=true filters at the scraper level, so you'll miss hybrid and onsite jobs entirely. If you want to see everything but prefer remote, leave is_remote unset and configure remote: preferred in your match profile instead. The scorer will give remote jobs higher scores without hiding the rest.
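The three states of is_remote can be sketched as a simple predicate. This is an illustration of the documented semantics only; the function name and the sample job records are hypothetical, and the real filtering happens inside the scraper.

```python
from typing import Optional


def passes_remote_filter(job_is_remote: bool, is_remote: Optional[bool]) -> bool:
    """Apply the tri-state is_remote setting to a single job."""
    if is_remote is None:
        return True  # null = keep everything
    return job_is_remote == is_remote  # true = remote only, false = non-remote only


# Hypothetical sample data: hybrid and onsite jobs are not flagged as remote.
jobs = [
    {"title": "Backend Developer (remote)", "remote": True},
    {"title": "Backend Developer (hybrid)", "remote": False},
    {"title": "Backend Developer (onsite)", "remote": False},
]

remote_only = [j["title"] for j in jobs if passes_remote_filter(j["remote"], True)]
everything = [j["title"] for j in jobs if passes_remote_filter(j["remote"], None)]

print(len(remote_only))  # 1: hybrid and onsite jobs never reach the database
print(len(everything))   # 3: all jobs are kept for the scorer to rank
```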

Relationship to scoring

Scrape profiles determine which jobs enter your database. The match profile determines how they're scored. The two work together:

  1. Scrape profiles fetch jobs matching your search terms and locations
  2. The scorer evaluates every fetched job against your match profile
  3. Jobs found by multiple scrape profiles get bonus "search terms" points
  4. The search_terms field on each job records which profiles found it
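The steps above can be sketched with an in-memory stand-in for the database. Everything here is a hedged illustration: the dict-based store, the job key, the function names, and especially the bonus formula (one point per additional profile, capped at 5) are assumptions. The source states only that duplicates are upserted, that search_terms records which profiles found a job, and that the bonus is at most 5 points. The sketch also assumes search_terms stores profile names.

```python
from typing import Dict, List

MAX_BONUS = 5  # "up to 5 bonus points" per the documentation


def upsert_job(db: Dict[str, dict], job_key: str, title: str, profile_name: str) -> dict:
    """Insert the job if new, otherwise update it; record the finding profile once."""
    record = db.setdefault(job_key, {"title": title, "search_terms": []})
    record["title"] = title  # refresh mutable fields on every upsert
    if profile_name not in record["search_terms"]:
        record["search_terms"].append(profile_name)
    return record


def search_terms_bonus(search_terms: List[str]) -> int:
    # ASSUMPTION: one point per profile beyond the first, capped at MAX_BONUS.
    # The real scorer's formula is not documented here.
    return min(MAX_BONUS, max(0, len(search_terms) - 1))


db: Dict[str, dict] = {}
upsert_job(db, "https://example.com/jobs/1", "Backend Developer", "backend_berlin_remote")
upsert_job(db, "https://example.com/jobs/1", "Backend Developer", "rails_germany")

job = db["https://example.com/jobs/1"]
print(len(db))                                  # 1: the duplicate was merged, not re-inserted
print(job["search_terms"])                      # ['backend_berlin_remote', 'rails_germany']
print(search_terms_bonus(job["search_terms"]))  # 1 under the assumed formula
```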

If you're getting lots of low-scoring jobs, the issue is usually with the match profile (tune skills, role types, deal-breakers), not the scrape profiles. Scrape profiles should cast a reasonably wide net; let the scorer do the filtering.