Scrape Profile Reference

Scrape profiles define what jobs to search for, where, and on which sites. Manage them in the web UI at `/scrapes`, or via the MCP tools (`create_scrape_profile`, `update_scrape_profile`, `delete_scrape_profile`, `list_scrape_profiles`).

Creating a profile

In the web UI, go to Scrapes and click New Profile. Fill in the fields and save. The profile is stored in the database and used by the built-in scheduler whenever it runs a scrape.

Fields

Field	Required	Default	Description
`name`	Yes	—	Unique name for the profile (e.g., "backend_berlin_remote"). Used as an identifier.
`search_term`	Yes	—	The job search query (e.g., "backend developer", "Ruby on Rails", "Softwareentwickler").
`location`	Yes	—	City, region, or country (e.g., "Berlin, Germany", "United Kingdom", "Remote").
`is_remote`	No	`null`	Filter by remote status. `true` = remote only, `false` = non-remote only, `null` = all.
`results_wanted`	No	100	Maximum number of results to fetch per scrape run. Higher values take longer and use more memory.
`site_names`	No	`indeed, linkedin`	Comma-separated list of job sites to scrape. See supported sites below.
`country_indeed`	No	`Germany`	Country filter for Indeed and Glassdoor. Must match their country dropdown values.
`hours_old`	No	72	Only fetch jobs posted within this many hours. `null` = no time filter.
`job_type`	No	`null`	Filter by employment type: `fulltime`, `parttime`, `contract`, `internship`. `null` = all.
`distance`	No	`null`	Radius in miles from the location. `null` = site default.
`google_search_term`	No	`null`	Custom Google Jobs query. If unset, auto-generated from `search_term` + `location`.
`linkedin_fetch_description`	No	`true`	Fetch full job descriptions from LinkedIn (slower but much better for scoring).
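The field table can be mirrored as a small Python dataclass, for example to check a profile before creating it. This is a minimal sketch that assumes the defaults listed above; the class name `ScrapeProfile`, its `sites` and `validate` methods, and the lowercasing of site names are hypothetical and not taken from the application's code.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed job_type values, taken from the field table.
JOB_TYPES = {"fulltime", "parttime", "contract", "internship"}


@dataclass
class ScrapeProfile:
    """Hypothetical mirror of a scrape profile; defaults follow the field table."""

    name: str
    search_term: str
    location: str
    is_remote: Optional[bool] = None  # None = all jobs
    results_wanted: int = 100
    site_names: str = "indeed, linkedin"  # comma-separated
    country_indeed: str = "Germany"
    hours_old: Optional[int] = 72  # None = no time filter
    job_type: Optional[str] = None  # None = all employment types
    distance: Optional[int] = None  # miles; None = site default
    google_search_term: Optional[str] = None
    linkedin_fetch_description: bool = True

    def sites(self) -> list[str]:
        """Split site_names into a clean list (lowercasing is an assumption)."""
        return [s.strip().lower() for s in self.site_names.split(",") if s.strip()]

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the profile looks valid."""
        errors = []
        for field_name in ("name", "search_term", "location"):
            if not getattr(self, field_name).strip():
                errors.append(f"{field_name} is required")
        if self.results_wanted <= 0:
            errors.append("results_wanted must be positive")
        if self.hours_old is not None and self.hours_old <= 0:
            errors.append("hours_old must be positive or null")
        if self.job_type is not None and self.job_type not in JOB_TYPES:
            errors.append(f"unknown job_type: {self.job_type}")
        if not self.sites():
            errors.append("site_names must list at least one site")
        return errors
```

Collecting all problems into a list, rather than raising on the first one, makes it easy to show every issue with a profile at once.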

Supported job sites

Scraping uses three backends. Most sites are handled by JobSpy; Adzuna and Arbeitnow each have their own backend:

JobSpy backend

Site	Market	Notes
Indeed	Global (country-filtered)	Largest general job board. Uses `country_indeed` to set search region.
LinkedIn	Global	Professional network listings. May be rate-limited; descriptions fetched by default.
Glassdoor	Global (country-filtered)	Listings with company reviews and salary data. Uses `country_indeed` for region.
Google Jobs	Global	Aggregates from multiple sources. Uses `google_search_term` or auto-generates one.
ZipRecruiter	US & Canada	North American board focused on small-to-mid-size employers.
Bayt	Middle East & North Africa	Leading MENA job board. Ignores city; uses country only.
Naukri	India	India's largest job board. Set country to India and location to an Indian city.
BDJobs	Bangladesh	Bangladesh's primary job board.
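To make the mapping concrete, the sketch below turns a profile (a plain dict using the field names from the table) into keyword arguments for JobSpy's `scrape_jobs()` function. It assumes JobSpy's parameter names (`site_name`, `search_term`, `google_search_term`, `location`, `distance`, `is_remote`, `job_type`, `results_wanted`, `hours_old`, `country_indeed`, `linkedin_fetch_description`) and its lowercase site identifiers such as `zip_recruiter` and `google`; check both against the JobSpy version you have installed. The helper `build_jobspy_kwargs` and the format of the auto-generated Google query are hypothetical.

```python
from typing import Any

# JobSpy site identifiers (assumed; verify against your installed JobSpy version).
JOBSPY_SITES = {"indeed", "linkedin", "glassdoor", "google",
                "zip_recruiter", "bayt", "naukri", "bdjobs"}


def build_jobspy_kwargs(profile: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: map a scrape profile dict to scrape_jobs() kwargs."""
    sites = [s.strip().lower() for s in profile.get("site_names", "indeed, linkedin").split(",")]
    search_term = profile["search_term"]
    location = profile["location"]
    kwargs: dict[str, Any] = {
        # Keep only sites that JobSpy handles; Adzuna and Arbeitnow go elsewhere.
        "site_name": [s for s in sites if s in JOBSPY_SITES],
        "search_term": search_term,
        "location": location,
        "results_wanted": profile.get("results_wanted", 100),
        "country_indeed": profile.get("country_indeed", "Germany"),
        "hours_old": profile.get("hours_old", 72),
        "job_type": profile.get("job_type"),
        "distance": profile.get("distance"),
        "linkedin_fetch_description": profile.get("linkedin_fetch_description", True),
        # Fall back to an auto-generated Google Jobs query (format is an assumption).
        "google_search_term": profile.get("google_search_term")
        or f"{search_term} jobs near {location}",
    }
    # JobSpy's is_remote is a plain boolean, so only an explicit True is forwarded;
    # "non-remote only" cannot be expressed this way and is not handled here.
    if profile.get("is_remote") is True:
        kwargs["is_remote"] = True
    # Drop unset values so JobSpy's own defaults apply.
    return {k: v for k, v in kwargs.items() if v is not None}
```

With JobSpy installed, the result can be passed straight through as `scrape_jobs(**kwargs)`, which returns a pandas DataFrame of postings.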

Adzuna backend

Site	Market	Notes
Adzuna	Global	API-based aggregator. Requires `ADZUNA_APP_ID` and `ADZUNA_APP_KEY` env vars.
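Since the Adzuna backend requires both environment variables, it can help to check them before adding Adzuna to a profile's `site_names`. A minimal sketch; the function name `missing_adzuna_credentials` is hypothetical, and the application may perform its own check.

```python
import os
from typing import Mapping, Optional

# Environment variables the Adzuna backend needs, per the table above.
ADZUNA_ENV_VARS = ("ADZUNA_APP_ID", "ADZUNA_APP_KEY")


def missing_adzuna_credentials(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the names of Adzuna env vars that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in ADZUNA_ENV_VARS if not env.get(name)]
```

Calling it with no argument checks the real process environment; passing a mapping makes it easy to test.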

Arbeitnow backend

Site	Market	Notes
Arbeitnow	Germany-focused	Free, no auth required. Good for German-language job listings.
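The three tables amount to a lookup from site to backend. The sketch below groups a `site_names` value accordingly, which shows at a glance which backends (and, for Adzuna, which credentials) a profile will need. The lowercase site identifiers and the function `route_sites` are assumptions for illustration, not the application's code.

```python
from collections import defaultdict

# Which backend handles each site, per the tables above.
# The lowercase identifiers are assumptions about how site_names is spelled.
BACKENDS = {
    "indeed": "jobspy", "linkedin": "jobspy", "glassdoor": "jobspy",
    "google": "jobspy", "zip_recruiter": "jobspy", "bayt": "jobspy",
    "naukri": "jobspy", "bdjobs": "jobspy",
    "adzuna": "adzuna",
    "arbeitnow": "arbeitnow",
}


def route_sites(site_names: str) -> dict[str, list[str]]:
    """Group a comma-separated site_names value by scraper backend."""
    groups: dict[str, list[str]] = defaultdict(list)
    for raw in site_names.split(","):
        site = raw.strip().lower()
        if not site:
            continue  # tolerate stray commas
        if site not in BACKENDS:
            raise ValueError(f"unsupported site: {site}")
        groups[BACKENDS[site]].append(site)
    return dict(groups)
```

Raising on an unknown site, rather than silently skipping it, surfaces typos in `site_names` before a scrape runs.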

Tips for effective profiles

Start broad, then narrow. Create a few general profiles first (e.g., "software engineer" + your city). Review the results, then create more targeted profiles based on what you're seeing.

Use multiple profiles. Jobs found by more than one profile score higher on the "search terms" signal (up to 5 bonus points). Overlapping profiles are fine: duplicates are merged via upsert, so each job is stored only once.
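The upsert can be pictured as a dictionary keyed on a stable job identifier: a job seen again is updated in place and the finding profile is added to its `search_terms`, so the job is stored once no matter how many profiles return it. This is only a sketch; keying on the job URL, storing profile names in a set, and the function name `upsert_job` are assumptions, and the bonus scoring itself is not reproduced.

```python
from typing import Any


def upsert_job(store: dict[str, dict[str, Any]], job: dict[str, Any], profile_name: str) -> None:
    """Insert a new job or merge a re-found one, recording which profile found it.

    Hypothetical: jobs are keyed on their "url" field.
    """
    key = job["url"]
    existing = store.get(key)
    if existing is None:
        store[key] = {**job, "search_terms": {profile_name}}
    else:
        # Refresh the scraped fields but keep the accumulated profile names.
        terms = existing["search_terms"] | {profile_name}
        existing.update(job)
        existing["search_terms"] = terms
```

A job that ends up with several entries in `search_terms` is the kind that earns the bonus described above.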

Match the site to the market. Indeed and LinkedIn work globally. Adzuna is strong in the UK, Australia, and Germany. Arbeitnow focuses on Germany. Bayt, Naukri, and BDJobs serve specific regional markets.

Set `hours_old` to manage volume. The default of 72 hours works well if you scrape daily or more often. Set it higher (168 = one week) if you scrape less often, or lower (24) if you only want fresh postings.
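The window is measured back from the moment each scrape runs, so it should cover at least the gap between runs: with `hours_old=24` and a scrape every three days, anything posted 24 to 72 hours before a run is never fetched. The sketch below only illustrates what the field means; `within_window` is a hypothetical helper, not part of the application or JobSpy.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def within_window(posted_at: datetime, now: datetime, hours_old: Optional[int]) -> bool:
    """True if a posting falls inside the hours_old window; None disables the filter."""
    if hours_old is None:
        return True
    return now - posted_at <= timedelta(hours=hours_old)


# Example: a posting from two days ago, checked at scrape time.
now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
two_days_ago = now - timedelta(hours=48)
print(within_window(two_days_ago, now, 72), within_window(two_days_ago, now, 24))
```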

Use `is_remote` intentionally. Setting `is_remote=true` filters at the scraper level, so hybrid and onsite jobs never reach your database. If you want to see everything but prefer remote work, leave `is_remote` unset and set `remote: preferred` in your match profile instead. The scorer then gives remote jobs higher scores without hiding the rest.
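The difference between filtering and preferring can be sketched in a few lines. `passes_remote_filter` mirrors the three `is_remote` states from the field table, while `rank_remote_first` is a hypothetical stand-in for a `remote: preferred` match profile: it keeps every job and only changes the order. Neither function is the application's code, and the real scorer assigns points rather than simply sorting.

```python
from typing import Optional


def passes_remote_filter(job_is_remote: bool, is_remote: Optional[bool]) -> bool:
    """Scraper-level filter: True keeps remote only, False non-remote only, None all."""
    if is_remote is None:
        return True
    return job_is_remote == is_remote


def rank_remote_first(jobs: list[dict]) -> list[dict]:
    """Preference instead of a filter: keep every job, list remote ones first.

    sorted() is stable, so jobs keep their original order within each group.
    """
    return sorted(jobs, key=lambda job: not job["is_remote"])
```

With `is_remote=true` a hybrid posting is simply gone; with the preference approach it stays in the list, just further down.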

Relationship to scoring

Scrape profiles determine which jobs enter your database. The match profile determines how they're scored. The two work together:

- Scrape profiles fetch jobs matching your search terms and locations.
- The scorer evaluates every fetched job against your match profile.
- Jobs found by multiple scrape profiles get bonus "search terms" points.
- The `search_terms` field on each job records which profiles found it.

If you're getting lots of low-scoring jobs, the issue is usually with the match profile (tune skills, role types, deal-breakers), not the scrape profiles. Scrape profiles should cast a reasonably wide net; let the scorer do the filtering.