Scrape Profile Reference
Scrape profiles define what jobs to search for, where, and on which sites. Manage them in the web UI at /scrapes, or via the MCP tools (create_scrape_profile, update_scrape_profile, delete_scrape_profile, list_scrape_profiles).
Creating a profile
In the web UI, go to Scrapes and click New Profile. Fill in the fields and save. The profile is stored in the database and used by the built-in scheduler whenever it runs a scrape.
Fields
| Field | Required | Default | Description |
|---|---|---|---|
| `name` | Yes | — | Unique name for the profile (e.g., "backend_berlin_remote"). Used as an identifier. |
| `search_term` | Yes | — | The job search query (e.g., "backend developer", "Ruby on Rails", "Softwareentwickler"). |
| `location` | Yes | — | City, region, or country (e.g., "Berlin, Germany", "United Kingdom", "Remote"). |
| `is_remote` | No | `null` | Filter by remote status. true = remote only, false = non-remote only, null = all. |
| `results_wanted` | No | 100 | Maximum number of results to fetch per scrape run. Higher values take longer and use more memory. |
| `site_names` | No | `indeed, linkedin` | Comma-separated list of job sites to scrape. See supported sites below. |
| `country_indeed` | No | `Germany` | Country filter for Indeed and Glassdoor. Must match their country dropdown values. |
| `hours_old` | No | 72 | Only fetch jobs posted within this many hours. null = no time filter. |
| `job_type` | No | `null` | Filter by employment type: fulltime, parttime, contract, internship. null = all. |
| `distance` | No | `null` | Radius in miles from the location. null = site default. |
| `google_search_term` | No | `null` | Custom Google Jobs query. If unset, auto-generated from search_term + location. |
| `linkedin_fetch_description` | No | `true` | Fetch full job descriptions from LinkedIn (slower but much better for scoring). |
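To make the defaults above concrete, here is a minimal sketch that models a profile as a Python dataclass. The `ScrapeProfile` class and its `sites()` helper are hypothetical names used for illustration only; the actual storage schema is not described in this reference.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ScrapeProfile:
    """Illustrative model of a scrape profile (hypothetical, not the real schema)."""

    # Required fields
    name: str
    search_term: str
    location: str
    # Optional fields, using the defaults listed in the table above
    is_remote: Optional[bool] = None          # None = all jobs
    results_wanted: int = 100
    site_names: str = "indeed, linkedin"      # comma-separated
    country_indeed: str = "Germany"
    hours_old: Optional[int] = 72             # None = no time filter
    job_type: Optional[str] = None            # None = all employment types
    distance: Optional[int] = None            # None = site default
    google_search_term: Optional[str] = None  # None = auto-generated
    linkedin_fetch_description: bool = True

    def sites(self) -> List[str]:
        """Split the comma-separated site_names value into a clean list."""
        return [s.strip() for s in self.site_names.split(",") if s.strip()]


profile = ScrapeProfile(
    name="backend_berlin_remote",
    search_term="backend developer",
    location="Berlin, Germany",
)
print(profile.sites())
```

Only the three required fields need to be supplied; every other field falls back to its documented default.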
Supported job sites
Scraping is powered by JobSpy. Three scraper backends handle different site groups:
JobSpy backend
| Site | Market | Notes |
|---|---|---|
| Indeed | Global (country-filtered) | Largest general job board. Uses country_indeed to set search region. |
| LinkedIn | Global | Professional network listings. May be rate-limited; descriptions fetched by default. |
| Glassdoor | Global (country-filtered) | Listings with company reviews and salary data. Uses country_indeed for region. |
| Google Jobs | Global | Aggregates from multiple sources. Uses google_search_term or auto-generates one. |
| ZipRecruiter | US & Canada | North American board focused on small-to-mid-size employers. |
| Bayt | Middle East & North Africa | Leading MENA job board. Ignores city; uses country only. |
| Naukri | India | India's largest job board. Set country to India and location to an Indian city. |
| BDJobs | Bangladesh | Bangladesh's primary job board. |
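The profile fields line up closely with the keyword arguments of JobSpy's `scrape_jobs` function. The sketch below shows one way a profile could be translated into those arguments. The `to_jobspy_kwargs` helper is hypothetical, and how the application actually calls JobSpy is an assumption, not something this reference confirms.

```python
from typing import Any, Dict


def to_jobspy_kwargs(profile: Dict[str, Any]) -> Dict[str, Any]:
    """Build keyword arguments for jobspy.scrape_jobs from a profile dict (illustrative)."""
    site_names = profile.get("site_names", "indeed, linkedin")
    kwargs: Dict[str, Any] = {
        # JobSpy names this parameter site_name and accepts a list of site ids
        "site_name": [s.strip() for s in site_names.split(",") if s.strip()],
        "search_term": profile["search_term"],
        "location": profile["location"],
        "results_wanted": profile.get("results_wanted", 100),
        "country_indeed": profile.get("country_indeed", "Germany"),
        "linkedin_fetch_description": profile.get("linkedin_fetch_description", True),
    }

    # hours_old defaults to 72; an explicit None (null) means no time filter
    hours_old = profile.get("hours_old", 72)
    if hours_old is not None:
        kwargs["hours_old"] = hours_old

    # Remaining optional filters are passed only when set
    for key in ("job_type", "distance", "google_search_term"):
        if profile.get(key) is not None:
            kwargs[key] = profile[key]

    # Assumption: JobSpy's is_remote flag only requests remote jobs, so it is
    # passed only when True; a non-remote-only filter would need other handling.
    if profile.get("is_remote") is True:
        kwargs["is_remote"] = True

    return kwargs


kwargs = to_jobspy_kwargs({"search_term": "backend developer", "location": "Berlin, Germany"})
print(kwargs)
```

With the `python-jobspy` package installed, the result could be passed on as `scrape_jobs(**kwargs)`.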
Adzuna backend
| Site | Market | Notes |
|---|---|---|
| Adzuna | Global | API-based aggregator. Requires ADZUNA_APP_ID and ADZUNA_APP_KEY env vars. |
Arbeitnow backend
| Site | Market | Notes |
|---|---|---|
| Arbeitnow | Germany-focused | Free, no auth required. Good for German-language job listings. |
Tips for effective profiles
Start broad, then narrow. Create a few general profiles first (e.g., "software engineer" + your city). Review the results, then create more targeted profiles based on what you're seeing.
Use multiple profiles. Jobs found by multiple profiles score higher on the "search terms" signal (up to 5 bonus points). Having overlapping profiles is fine — duplicates are handled via upsert.
Match the site to the market. Indeed and LinkedIn work globally. Adzuna is strong in the UK, Australia, and Germany. Arbeitnow focuses on Germany. Bayt, Naukri, and BDJobs serve specific regional markets.
Set hours_old to manage volume. The default of 72 hours works well for daily/frequent scraping. Set higher (168 = 1 week) if you scrape less often, lower (24) if you want only fresh postings.
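As a rough illustration of the hours_old semantics, the sketch below checks whether a posting falls inside the window. The `is_fresh` function is a hypothetical helper. In practice the time filter is applied while scraping, so this only demonstrates the rule, not the implementation.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_fresh(posted_at: datetime, hours_old: Optional[int],
             now: Optional[datetime] = None) -> bool:
    """Return True if a job posted at posted_at is within the hours_old window."""
    if hours_old is None:
        # null = no time filter, so every posting qualifies
        return True
    now = now or datetime.now(timezone.utc)
    return now - posted_at <= timedelta(hours=hours_old)


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
two_days_ago = now - timedelta(hours=48)

print(is_fresh(two_days_ago, 72, now))    # inside the default 72-hour window
print(is_fresh(two_days_ago, 24, now))    # too old for a 24-hour window
print(is_fresh(two_days_ago, None, now))  # no time filter
```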
Use is_remote intentionally. Setting is_remote=true filters at the scraper level — you'll miss hybrid and onsite jobs entirely. If you want to see everything but prefer remote, leave is_remote unset and configure remote: preferred in your match profile instead. The scorer will give remote jobs higher scores without hiding the rest.
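The three-state behavior of is_remote can be pictured as a simple predicate. This is a sketch of the documented semantics only; `matches_remote_filter` and the job dictionaries are hypothetical, and the real filter runs at the scraper level.

```python
from typing import Optional


def matches_remote_filter(job_is_remote: bool, is_remote: Optional[bool]) -> bool:
    """Apply the tri-state rule: True = remote only, False = non-remote only, None = all."""
    if is_remote is None:
        return True
    return job_is_remote == is_remote


# Hypothetical sample data
jobs = [
    {"title": "Backend Developer", "remote": True},
    {"title": "Rails Engineer", "remote": False},
]

remote_only = [j["title"] for j in jobs if matches_remote_filter(j["remote"], True)]
everything = [j["title"] for j in jobs if matches_remote_filter(j["remote"], None)]
print(remote_only)
print(everything)
```

With `is_remote=True` the onsite job never reaches the database, which is why leaving the field unset and expressing the preference in the match profile keeps more options visible.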
Relationship to scoring
Scrape profiles determine which jobs enter your database. The match profile determines how they're scored. The two work together:
- Scrape profiles fetch jobs matching your search terms and locations
- The scorer evaluates every fetched job against your match profile
- Jobs found by multiple scrape profiles get bonus "search terms" points
- The `search_terms` field on each job records which profiles found it
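The interaction between overlapping profiles and the search_terms field can be sketched with an in-memory upsert. Everything here is an assumption made for illustration: the `upsert_job` helper, the use of the job URL as the unique key, and the choice to store profile names in search_terms. The actual database logic and the bonus-point formula are not shown in this reference.

```python
from typing import Dict, List, TypedDict


class JobRecord(TypedDict):
    title: str
    search_terms: List[str]


def upsert_job(db: Dict[str, JobRecord], job_url: str, title: str, profile_name: str) -> None:
    """Insert a job or update the existing record, noting which profile found it."""
    # Assumption: the job URL identifies a posting uniquely
    record = db.setdefault(job_url, {"title": title, "search_terms": []})
    record["title"] = title  # refresh mutable fields on update
    if profile_name not in record["search_terms"]:
        record["search_terms"].append(profile_name)


db: Dict[str, JobRecord] = {}
upsert_job(db, "https://example.com/jobs/1", "Backend Developer", "backend_berlin_remote")
upsert_job(db, "https://example.com/jobs/1", "Backend Developer", "rails_germany")
upsert_job(db, "https://example.com/jobs/2", "Data Engineer", "rails_germany")

print(len(db))
print(db["https://example.com/jobs/1"]["search_terms"])
```

The job found by two profiles is stored once and carries both profile names, which is the kind of information a scorer could use to award the "search terms" bonus.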
If you're getting lots of low-scoring jobs, the issue is usually with the match profile (tune skills, role types, deal-breakers), not the scrape profiles. Scrape profiles should cast a reasonably wide net; let the scorer do the filtering.