XML Job Posts & Signals Feeds
JobFront can provide several types of XML feeds containing clean, formatted job post data:
(Most common) Large XML feed for your overall, chronologically-ordered job feed
Different XML feeds per source JobFront tracks for you
Smaller XML feeds for different categories (job function, location, etc)
How it works
If you are ingesting a separate XML feed for each source JobFront tracks for you (usually for job boards), you can search for and retrieve sources in the JobFront data playground app once you have access. The XML feed link for each source appears directly in the interface, and you can copy it into your own XML ingestion program.
If you are ingesting a single large XML feed from many sources in chronological order (for large job boards or for sales intelligence tools), we will usually give you an API token that you can use to generate a signed S3 bucket URL. You can then use that signed URL to download the XML feeds that we provide from S3.
You will be able to retrieve the full active jobs index, along with delta files (jobs removed and added in the past day).
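One way to keep a local copy current is to load the full active jobs index once and then apply each day's delta files. The sketch below assumes jobs are keyed by their <id> and that the delta files have already been parsed into a list of added jobs and a list of removed ids. The function name apply_delta and the dict layout are illustrative and not part of any JobFront API.

```python
def apply_delta(active, added, removed_ids):
    """Return a new id -> job mapping with one day's delta applied.

    active:      dict mapping job id to a parsed job (hypothetical layout)
    added:       iterable of parsed jobs, each with an "id" key
    removed_ids: iterable of job ids that are no longer active
    """
    updated = dict(active)  # copy so the caller's index is not mutated
    # Remove first, then add, so a job listed in both files stays active.
    for job_id in removed_ids:
        updated.pop(job_id, None)  # ignore ids we never had
    for job in added:
        updated[job["id"]] = job
    return updated


# Illustrative data only.
active = {
    "a1": {"id": "a1", "title": "Data Engineer"},
    "b2": {"id": "b2", "title": "Account Executive"},
}
updated = apply_delta(
    active,
    added=[{"id": "c3", "title": "Recruiter"}],
    removed_ids=["b2", "zz"],
)
```

Removing before adding is a design choice for this sketch; confirm with JobFront how a job that appears in both delta files should be treated.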
Job Posts - XML Format
Root Elements
<?xml version="1.0" encoding="UTF-8"?>: XML declaration
<jobs>: Root element containing all job entries
Job Entry
Each job is enclosed in a <job> tag and contains the following fields:
Basic Job Information
<id> Unique identifier for the job
<title> Title of the job position
<commitment> Type of job commitment
Options: full_time, part_time, internship, contract, temporary, volunteer, other
<description> Brief 1-line description of the job
<post> Detailed job posting in HTML format
<post_language> Most likely 2-character language code
Note: The most common value is lowercase "en" (English), but we occasionally see jobs in dozens of other languages, each indicated by its 2-character code
Note: Some job posts contain several languages; in those cases we provide the single predominant language
<level> Level of the job position (e.g., entry, senior)
Options: internship, entry_level, junior, mid_level, senior, expert ("" blank, if not detected)
<benefits> List of freetext benefits offered with the job
List of <benefit> tags with freetext benefits described in each
<requirements> List of requirements for the job
List of <requirement> tags with freetext requirements described in each
<responsibilities> Responsibilities associated with the job
List of <responsibility> tags with freetext responsibilities described in each
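The basic fields above can be read with Python's standard xml.etree.ElementTree module. This is a minimal sketch: the sample document and its values are invented for illustration, and the helper names (child_text, child_list, parse_basic_info) are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; values are invented for illustration only.
SAMPLE_JOB_XML = """<?xml version="1.0" encoding="UTF-8"?>
<jobs>
  <job>
    <id>abc123</id>
    <title>Senior Data Engineer</title>
    <commitment>full_time</commitment>
    <description>Build and maintain data pipelines.</description>
    <post>&lt;p&gt;We are hiring a &lt;b&gt;data engineer&lt;/b&gt;.&lt;/p&gt;</post>
    <post_language>en</post_language>
    <level>senior</level>
    <benefits>
      <benefit>Health insurance</benefit>
      <benefit>Remote work</benefit>
    </benefits>
    <requirements>
      <requirement>5+ years of Python</requirement>
    </requirements>
    <responsibilities/>
  </job>
</jobs>
"""


def child_text(elem, tag):
    """Return the stripped text of a direct child, or "" if missing or empty."""
    return (elem.findtext(tag) or "").strip()


def child_list(elem, parent_tag, item_tag):
    """Return the text of every <item_tag> inside <parent_tag>, or []."""
    parent = elem.find(parent_tag)
    if parent is None:
        return []
    return [(item.text or "").strip() for item in parent.findall(item_tag)]


def parse_basic_info(job):
    """Collect the basic job fields into a plain dict."""
    return {
        "id": child_text(job, "id"),
        "title": child_text(job, "title"),
        "commitment": child_text(job, "commitment"),
        "description": child_text(job, "description"),
        "post": child_text(job, "post"),  # entities are unescaped by the parser
        "post_language": child_text(job, "post_language"),
        "level": child_text(job, "level"),  # "" when not detected
        "benefits": child_list(job, "benefits", "benefit"),
        "requirements": child_list(job, "requirements", "requirement"),
        "responsibilities": child_list(job, "responsibilities", "responsibility"),
    }


root = ET.fromstring(SAMPLE_JOB_XML.encode("utf-8"))
jobs = [parse_basic_info(job) for job in root.findall("job")]
```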
URLs
<url_application> URL for the job application
Note: Many jobs do not have an explicit application URL and this field can be empty
<url_job> Origin URL for the job post
Dates
<created> Date the job was seen (usually when we scraped this job for the first time, although sometimes we can determine that the job was posted in the past and we use that more precise historical timestamp)
Format: YYYY-MM-DD (UTC)
<created_at> Timestamp when the job was created
Format: Unix timestamp
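Both date fields can be converted to Python date objects. The sketch below assumes <created> holds a plain YYYY-MM-DD string and <created_at> holds a Unix timestamp in seconds; the sample values and the parse_dates helper are illustrative.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone

# Hypothetical fragment; values are invented for illustration only.
SAMPLE_DATES_XML = (
    b"<job><created>2024-03-15</created>"
    b"<created_at>1710460800</created_at></job>"
)


def parse_dates(job):
    """Return (created, created_at); either may be None when the tag is empty."""
    created_text = (job.findtext("created") or "").strip()
    created_at_text = (job.findtext("created_at") or "").strip()
    created = date.fromisoformat(created_text) if created_text else None
    # Assumes the timestamp is in seconds; interpret it explicitly as UTC.
    created_at = (
        datetime.fromtimestamp(int(created_at_text), tz=timezone.utc)
        if created_at_text
        else None
    )
    return created, created_at


created, created_at = parse_dates(ET.fromstring(SAMPLE_DATES_XML))
```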
Salary Information
<salary> Salary information enclosing the following subtags:
<min> Minimum salary
<max> Maximum salary
<currency> 3-character currency code
<period> Pay period (default is "year")
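A salary block might be read as follows. This sketch assumes <min> and <max> are plain numbers and falls back to "year" when <period> is empty, which is our reading of the default noted above. The sample values and the parse_salary helper are illustrative.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; values are invented for illustration only.
SAMPLE_SALARY_XML = (
    b"<job><salary><min>90000</min><max>120000</max>"
    b"<currency>USD</currency><period/></salary></job>"
)


def parse_salary(job):
    """Return a dict of salary fields, or None if the job has no <salary>."""
    salary = job.find("salary")
    if salary is None:
        return None

    def number(tag):
        text = (salary.findtext(tag) or "").strip()
        return float(text) if text else None

    return {
        "min": number("min"),
        "max": number("max"),
        "currency": (salary.findtext("currency") or "").strip(),
        # Assumption: treat an empty <period> as the documented default.
        "period": (salary.findtext("period") or "").strip() or "year",
    }


salary = parse_salary(ET.fromstring(SAMPLE_SALARY_XML))
```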
Locations
<locations> List of structured location information
List of <location> tags containing the following subtags:
<text> City, State, Country of the job location (includes all available entries)
<city> City of the job location
<state> State or region of the job location
<country> Country of the job location
<zip> Zip code of the job location
<latitude> Latitude of the job location
<longitude> Longitude of the job location
<cbsas> List of <cbsa> tags, each containing the CBSA <code> and <title>
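The nested location structure can be flattened into a list of dicts. In this sketch the sample values (including how state and country are written) are invented for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; values are invented for illustration only.
SAMPLE_LOCATIONS_XML = b"""<job>
  <locations>
    <location>
      <text>San Francisco, CA, United States</text>
      <city>San Francisco</city>
      <state>CA</state>
      <country>United States</country>
      <zip>94105</zip>
      <latitude>37.7898</latitude>
      <longitude>-122.3942</longitude>
      <cbsas>
        <cbsa><code>41860</code><title>San Francisco-Oakland-Berkeley, CA</title></cbsa>
      </cbsas>
    </location>
  </locations>
</job>"""


def field(elem, tag):
    """Return the stripped text of a child tag, or "" if missing or empty."""
    return (elem.findtext(tag) or "").strip()


def to_float(text):
    """Convert a coordinate string to float, or None when empty."""
    return float(text) if text else None


def parse_locations(job):
    """Return one dict per <location>, including its list of CBSAs."""
    results = []
    for loc in job.findall("locations/location"):
        results.append({
            "text": field(loc, "text"),
            "city": field(loc, "city"),
            "state": field(loc, "state"),
            "country": field(loc, "country"),
            "zip": field(loc, "zip"),  # keep as a string to preserve leading zeros
            "latitude": to_float(field(loc, "latitude")),
            "longitude": to_float(field(loc, "longitude")),
            "cbsas": [
                {"code": field(cbsa, "code"), "title": field(cbsa, "title")}
                for cbsa in loc.findall("cbsas/cbsa")
            ],
        })
    return results


locations = parse_locations(ET.fromstring(SAMPLE_LOCATIONS_XML))
```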
Tags
<tags> List of <tag> tags: Keywords or categories associated with the job
Source Information
<name> Name of the job source, organization, or company
<description> 1-line description of the job source
<domain> Naked domain of the source, for example “google.com”
<url_logo> Image/logo URL hosted by JobFront
<tags> List of freetext <tag> tags: Tags associated with the job source
<industries> List of freetext <industry> tags: Industries associated with the job source
Formatting Notes
All text content is escaped using XML entity encoding to prevent XML parsing errors.
Numeric fields (like salary) are not enclosed in quotes.
Empty fields are represented by self-closing tags or empty string values.
Lists (like locations, tags, industries) are represented by multiple child elements within a parent element.
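Because full feeds can be very large, it may help to stream them instead of loading the whole document into memory. The sketch below uses ElementTree's iterparse to yield one <job> at a time; it also shows that entity-escaped text comes back unescaped and that self-closing and empty tags both read as an empty string. The sample feed and the iter_jobs helper are illustrative.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical feed; values are invented for illustration only.
SAMPLE_FEED = b"""<?xml version="1.0" encoding="UTF-8"?>
<jobs>
  <job><id>1</id><title>R&amp;D Engineer</title><url_application/></job>
  <job><id>2</id><title>Analyst</title><url_application></url_application></job>
</jobs>"""


def iter_jobs(source):
    """Yield each <job> element from a file path or binary file object.

    The element is cleared after the caller has finished with it, so read
    everything you need from a job before advancing to the next one.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "job":
            yield elem
            elem.clear()  # release the job's children once consumed


rows = [
    (job.findtext("id"), job.findtext("title"), job.findtext("url_application") or "")
    for job in iter_jobs(io.BytesIO(SAMPLE_FEED))
]
```

This helper is meant for the job posts feed; the signals feed also nests <job> tags inside each <signal>, so it would need a different tag filter.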
Intent Signals - XML Format
Root Elements
<?xml version="1.0" encoding="UTF-8"?>: XML declaration
<signals>: Root element containing all signal entries
Signal Entry
Each signal is enclosed in a <signal> tag and contains the following fields:
Basic Signal Information
<id> Unique identifier for the signal (from problem_id)
<category> Category of the signal (from problem_category)
<type> Type of the signal (from problem_type)
<recency> Recency status of the signal (from problem_recency)
<severity> Severity level of the signal (from problem_severity)
<match> Match information for the signal (from problem_match)
<classification> Classification of the signal (from problem_classification)
<text> Descriptive text of the signal (from problem_text)
Evidence Information
Enclosed in <evidences> tags:
Multiple <evidence> tags: Each containing evidence supporting the signal
Associated Jobs
Enclosed in <jobs> tags:
Multiple <job> tags, each containing:
<id>: Unique identifier for the job (from job_id)
<title>: Title of the job (from job_title)
<url>: URL for the job posting (from url_job)
Dates
<created>: Formatted timestamp when the signal was created (YYYY-MM-DDThh:mm)
<created_at>: Unix timestamp of when the signal was created
<start_at>: Unix timestamp of when the signal started (from problem_start_at)
<recent_at>: Unix timestamp of the most recent signal occurrence (from problem_recent_at)
Source Information
Enclosed in <source> tags:
<id>: Unique identifier for the source (from source_id)
<name>: Name of the source (from source_name)
<description>: Description of the source (from source_description)
<domain>: Domain URL of the source (from source_url)
<url_logo>: URL for the source's logo image (from source_url_logo)
Enclosed in <tags> tags:
Multiple <tag> tags: Tags associated with the source
Enclosed in <industries> tags:
Multiple <industry> tags: Industries associated with the source
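A signal entry can be parsed in much the same way as a job. This sketch assumes the date fields are direct children of <signal> and that Unix timestamps are in seconds; the sample values (category, type, severity, and so on) are invented for illustration, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Hypothetical sample; values are invented for illustration only.
SAMPLE_SIGNALS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<signals>
  <signal>
    <id>sig-001</id>
    <category>hiring</category>
    <type>team_growth</type>
    <severity>high</severity>
    <text>Company is expanding its data team.</text>
    <evidences>
      <evidence>Three data engineering roles opened this month.</evidence>
    </evidences>
    <jobs>
      <job>
        <id>abc123</id>
        <title>Senior Data Engineer</title>
        <url>https://example.com/jobs/abc123</url>
      </job>
    </jobs>
    <created>2024-03-15T00:00</created>
    <created_at>1710460800</created_at>
    <start_at>1709251200</start_at>
    <recent_at>1710460800</recent_at>
    <source>
      <id>src-42</id>
      <name>Example Corp</name>
      <domain>example.com</domain>
      <tags><tag>saas</tag></tags>
      <industries><industry>Software</industry></industries>
    </source>
  </signal>
</signals>"""


def text(elem, path):
    """Return the stripped text at a child path, or "" if missing or empty."""
    return (elem.findtext(path) or "").strip()


def unix_to_utc(value):
    """Convert a Unix timestamp string (seconds) to an aware UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc) if value else None


def parse_signal(signal):
    """Collect one <signal> into a plain dict."""
    return {
        "id": text(signal, "id"),  # direct child only, not the nested job ids
        "category": text(signal, "category"),
        "type": text(signal, "type"),
        "severity": text(signal, "severity"),
        "text": text(signal, "text"),
        "evidences": [(e.text or "").strip() for e in signal.findall("evidences/evidence")],
        "jobs": [
            {"id": text(j, "id"), "title": text(j, "title"), "url": text(j, "url")}
            for j in signal.findall("jobs/job")
        ],
        "created_at": unix_to_utc(text(signal, "created_at")),
        "start_at": unix_to_utc(text(signal, "start_at")),
        "recent_at": unix_to_utc(text(signal, "recent_at")),
        "source": {
            "id": text(signal, "source/id"),
            "name": text(signal, "source/name"),
            "domain": text(signal, "source/domain"),
            "tags": [(t.text or "").strip() for t in signal.findall("source/tags/tag")],
            "industries": [
                (i.text or "").strip() for i in signal.findall("source/industries/industry")
            ],
        },
    }


signals = [parse_signal(s) for s in ET.fromstring(SAMPLE_SIGNALS_XML).findall("signal")]
```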
Retrieving XML Files
XML feeds are stored in an S3 bucket. To retrieve the S3 bucket URL:
Aggregate feed URLs will be shared with you directly, as they are typically unique per customer
Per-source feed URLs are available via the app (when logged in)
Authentication
Most XML feeds require authentication. We enforce it with AWS signed URLs for access to the S3 buckets
JobFront will provide you with an API token
Submit your API token to our API to retrieve a signed download URL for your S3 bucket
More explicitly, for both Jobs and Signals XML feeds, first authorize via the below API calls to retrieve the available signed URLs:
The API returns signed URLs for the different XML files. Note that each field returns a list of URLs because for some accounts these files are too large (tens of GB) to manage as a single file, so we split the datasets into multiple files.
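Once you have a list of signed URLs for one file type, each part can be downloaded in turn. The sketch below covers only that download step and uses Python's standard library; it does not show the token exchange, since the endpoint and response field names are not given here. The function name download_parts and the output file naming are hypothetical.

```python
import shutil
import urllib.request
from pathlib import Path


def download_parts(urls, dest_dir, prefix):
    """Download every URL in order and return the list of local file paths.

    urls:     list of signed URLs for one file type (may contain several parts)
    dest_dir: directory to write into (created if needed)
    prefix:   hypothetical naming prefix, e.g. "jobs" or "signals"
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, url in enumerate(urls):
        target = dest / f"{prefix}_{index:03d}.xml"
        # Stream to disk in chunks so multi-GB parts are never held in memory.
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        paths.append(target)
    return paths


# Usage, with a placeholder standing in for the list returned by the API:
# paths = download_parts(signed_urls, "feeds/2024-03-15", "jobs")
```

Signed URLs typically expire, so it is usually safest to request fresh URLs shortly before each download run.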
Please reach out to us at hello@jobfront.io and we'll get you set up.