Specification

Technical reference for implementers. Everything you need to build a VitaeFlow reader or writer in any language.

1. Overview

VitaeFlow is an open standard for embedding structured resume data inside PDF files. It uses the associated-files mechanism of PDF/A-3 (ISO 19005-3) to attach a JSON file to the PDF. Factur-X uses the same mechanism for electronic invoices; VitaeFlow adapts it for resume data.

A VitaeFlow PDF is a standard PDF that contains an embedded JSON file named vitaeflow.json. The PDF remains fully readable by any viewer. Tools that understand VitaeFlow can extract the structured data; others simply ignore it.

Resume PDF + vitaeflow.json = .vf.pdf

The .vf.pdf file extension is recommended for discoverability but not required. The embedded data is what makes a PDF a VitaeFlow document, not the filename.

2. Schema

The resume data follows a JSON Schema (draft 2020-12) defined in schema.json. The schema is the source of truth — refer to it for the exhaustive list of fields and constraints.

Required top-level fields

Field     Type     Description
version   string   Schema version, e.g. "0.1". Format: major.minor
profile   string   Always "standard"
basics    object   Core identity. Requires givenName, familyName, email
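Here is a minimal Python sketch of the required-field rules in the table above. It reports problems using the path/message shape from the validation result format in section 6. The helper name check_required is hypothetical, and it checks only this table. It does not replace validation against schema.json.

```python
import json

REQUIRED_TOP = ("version", "profile", "basics")
REQUIRED_BASICS = ("givenName", "familyName", "email")


def check_required(doc: dict) -> list:
    """Hypothetical helper: check only the required top-level fields."""
    errors = []
    for key in REQUIRED_TOP:
        if key not in doc:
            # "" is the JSON Pointer for the document root (RFC 6901).
            errors.append({"path": "", "message": f"Missing required property: {key}"})
    if "profile" in doc and doc["profile"] != "standard":
        errors.append({"path": "/profile", "message": 'profile must be "standard"'})
    basics = doc.get("basics")
    if "basics" in doc and not isinstance(basics, dict):
        errors.append({"path": "/basics", "message": "basics must be an object"})
    elif isinstance(basics, dict):
        for key in REQUIRED_BASICS:
            if key not in basics:
                errors.append({"path": "/basics", "message": f"Missing required property: {key}"})
    return errors


doc = json.loads('{"version": "0.1", "profile": "standard",'
                 ' "basics": {"givenName": "Marie", "familyName": "Laurent"}}')
print(check_required(doc))
# [{'path': '/basics', 'message': 'Missing required property: email'}]
```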

Optional sections

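The optional sections listed below are arrays of objects. Other optional top-level fields, such as lang and meta, are described in schema.json. Include what you have and omit the rest: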

work, education, skills, languages, certifications, projects, publications, volunteer, references, interests, custom

Shared types

date: YYYY, YYYY-MM, or YYYY-MM-DD. Pattern: ^\d{4}(-(?:0[1-9]|1[0-2])(-(?:0[1-9]|[12]\d|3[01]))?)?$
countryCode: ISO 3166-1 alpha-2, e.g. "FR", "US". Pattern: ^[A-Z]{2}$
fluency: one of A1, A2, B1, B2, C1, C2, native, bilingual
level: one of beginner, intermediate, advanced, expert
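The shared types map directly onto simple checks. The Python sketch below compiles the two patterns with re.ASCII, so \d matches only 0-9, as it does in the ECMA-262 regex dialect that JSON Schema uses. It matches with fullmatch, so a trailing newline is rejected. The helper names are illustrative only and are not part of any VitaeFlow SDK.

```python
import re

# Patterns from the shared types. re.ASCII keeps \d to 0-9, as in the
# ECMA-262 regex dialect that JSON Schema "pattern" is defined against.
DATE_RE = re.compile(
    r"^\d{4}(-(?:0[1-9]|1[0-2])(-(?:0[1-9]|[12]\d|3[01]))?)?$", re.ASCII)
COUNTRY_RE = re.compile(r"^[A-Z]{2}$", re.ASCII)
FLUENCY = {"A1", "A2", "B1", "B2", "C1", "C2", "native", "bilingual"}
LEVEL = {"beginner", "intermediate", "advanced", "expert"}


def is_date(value: str) -> bool:
    # fullmatch rejects "2021\n", which Python's "$" alone would accept.
    # The pattern checks shape only, so "2021-02-31" still passes.
    return DATE_RE.fullmatch(value) is not None


def is_country_code(value: str) -> bool:
    # Shape only: any two capital letters pass, assigned by ISO 3166-1 or not.
    return COUNTRY_RE.fullmatch(value) is not None


for value in ("2021", "2021-03", "2021-03-15", "2021-13", "21-03"):
    print(value, is_date(value))
```

Because the date pattern checks only the shape of the string, a reader that needs real calendar dates must add its own check.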

Example

vitaeflow.json
{
  "version": "0.1",
  "profile": "standard",
  "lang": "en",
  "basics": {
    "givenName": "Marie",
    "familyName": "Laurent",
    "email": "[email protected]",
    "headline": "Lead Developer"
  },
  "work": [
    {
      "organization": "TechCorp",
      "position": "Lead Developer",
      "startDate": "2021-03"
    }
  ]
}

3. PDF embedding

VitaeFlow uses the PDF/A-3 associated files mechanism (ISO 19005-3) to embed the JSON data. This is the same standard mechanism used by Factur-X for electronic invoices.

Constants

Filename         vitaeflow.json
MIME type        application/json
AFRelationship   /Alternative
Description      VitaeFlow structured resume data

PDF structure

The JSON file is attached to the PDF using two complementary mechanisms: the EmbeddedFiles name tree (PDF 1.7, ISO 32000-1 §7.11.4) and the AF array (PDF/A-3, ISO 19005-3).

PDF Catalog structure
Catalog {
  Names: {
    EmbeddedFiles: {
      Names: [
        (vitaeflow.json)  <FileSpec ref>
      ]
    }
  }
  AF: [ <FileSpec ref> ]
  Metadata: <XMP stream ref>
}

EmbeddedFiles name tree — stores the file reference by name. Name trees can be hierarchical (with Kids arrays) or flat (with a Names array). Implementations must handle both. The array contains alternating name/reference pairs: ["vitaeflow.json", <ref>].
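The sketch below looks up a key in both flat and hierarchical name trees. It assumes a hypothetical in-memory model: PDF dictionaries are Python dicts, arrays are lists, strings are str, and indirect references are already resolved. A real implementation would use its PDF library's own object model instead.

```python
from typing import Any, Optional

# Hypothetical model: dicts for PDF dictionaries, lists for arrays,
# str for strings, indirect references already resolved.


def find_in_name_tree(node: dict, key: str, depth: int = 0) -> Optional[Any]:
    """Return the value stored under `key`, or None if the tree lacks it."""
    if depth > 32:  # guard against cyclic or absurdly deep Kids chains
        return None
    names = node.get("Names")
    if names is not None:
        # Leaf or flat root: alternating [key1, value1, key2, value2, ...]
        for i in range(0, len(names) - 1, 2):
            if names[i] == key:  # exact, case-sensitive comparison
                return names[i + 1]
    # Intermediate node: recurse into each child.
    for kid in node.get("Kids", []):
        found = find_in_name_tree(kid, key, depth + 1)
        if found is not None:
            return found
    return None


tree = {"Kids": [
    {"Limits": ["a.txt", "b.txt"], "Names": ["a.txt", {}, "b.txt", {}]},
    {"Limits": ["vitaeflow.json", "vitaeflow.json"],
     "Names": ["vitaeflow.json", {"Type": "Filespec", "F": "vitaeflow.json"}]},
]}
print(find_in_name_tree(tree, "vitaeflow.json"))
# {'Type': 'Filespec', 'F': 'vitaeflow.json'}
```

Name tree keys are sorted, so a reader can also skip any Kids entry whose Limits array shows the key is out of range. The sketch visits every node instead, which is correct but slower on large trees.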

AF array — PDF/A-3 requires the file reference to also appear in the catalog-level AF (Associated Files) array. This enables conforming readers to discover embedded files without traversing the name tree.

FileSpec dictionary

The file is described by a FileSpec dictionary (PDF 2.0 §7.11.3). The JSON content is stored as a compressed stream in the EF.F entry.

FileSpec structure
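FileSpec Dictionary {
  Type:            /Filespec
  F:               (vitaeflow.json)
  UF:              (vitaeflow.json)
  Desc:            (VitaeFlow structured resume data)
  AFRelationship:  /Alternative
  EF: {
    F:  <stream>    % JSON content, UTF-8 encoded; the stream dictionary
                    % carries Type /EmbeddedFile and Subtype /application#2Fjson
  }
}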

AFRelationship is set to /Alternative, indicating that the embedded file is an alternative representation of the document content — the same resume in a different format.

Stream encoding — the JSON content must be valid UTF-8. The stream may be compressed using standard PDF filters (typically FlateDecode). Implementations must decompress the stream before parsing.
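The following Python sketch shows the stream round trip using only the standard library. FlateDecode data is zlib-format (RFC 1950) data, so zlib.compress and zlib.decompress produce and consume it directly. The function names are hypothetical. The sketch assumes the stream has no DecodeParms (no predictors) and supports no filter other than FlateDecode.

```python
import json
import zlib
from typing import List


def encode_stream(resume: dict) -> bytes:
    # ensure_ascii=False writes non-ASCII characters as UTF-8 bytes rather
    # than \uXXXX escapes; both forms are valid JSON.
    raw = json.dumps(resume, ensure_ascii=False).encode("utf-8")
    return zlib.compress(raw)  # FlateDecode-compatible zlib stream


def decode_stream(data: bytes, filters: List[str]) -> dict:
    # `filters` is the stream's /Filter normalised to a list, applied in order.
    for name in filters:
        if name != "FlateDecode":
            raise ValueError(f"unsupported filter: {name}")
        data = zlib.decompress(data)
    return json.loads(data.decode("utf-8"))


doc = {"version": "0.1", "basics": {"givenName": "Zoë"}}
blob = encode_stream(doc)
print(decode_stream(blob, ["FlateDecode"]) == doc)  # True
```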

Embedding steps

  1. Validate the resume data against the schema in strict mode. Reject if invalid.
  2. Remove any existing attachment named vitaeflow.json to prevent duplicates.
  3. Create the FileSpec dictionary with the JSON content as a compressed UTF-8 stream.
  4. Register the FileSpec in the EmbeddedFiles name tree and the catalog AF array.
  5. Write XMP metadata to the document catalog (see next section).
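Steps 2 to 4 can be sketched in Python against a hypothetical in-memory model. In this model, PDF dictionaries are dicts, PDF names are plain strings without the leading slash, and the embedded stream is a dict holding its dictionary entries plus a data field. The sketch handles only a flat EmbeddedFiles tree and leaves out steps 1 and 5. It also sets UF and the stream's Subtype, which PDF/A-3 expects on embedded files. Treat the exact object layout as an assumption.

```python
import json
import zlib

FILENAME = "vitaeflow.json"


def attach(catalog: dict, resume: dict) -> dict:
    """Hypothetical sketch of embedding steps 2-4 on a dict-modelled catalog.

    Assumes a flat EmbeddedFiles tree (a Names array, no Kids) and a resume
    that already passed strict validation (step 1). XMP (step 5) is skipped.
    """
    tree = catalog.setdefault("Names", {}).setdefault("EmbeddedFiles", {})
    pairs = tree.get("Names", [])

    # Step 2: drop any existing vitaeflow.json from the name tree and AF array.
    kept = [(pairs[i], pairs[i + 1]) for i in range(0, len(pairs) - 1, 2)
            if pairs[i] != FILENAME]
    af = [spec for spec in catalog.get("AF", []) if spec.get("F") != FILENAME]

    # Step 3: FileSpec whose EF/F stream holds Flate-compressed UTF-8 JSON.
    payload = json.dumps(resume, ensure_ascii=False).encode("utf-8")
    spec = {
        "Type": "Filespec",
        "F": FILENAME,
        "UF": FILENAME,
        "Desc": "VitaeFlow structured resume data",
        "AFRelationship": "Alternative",
        "EF": {"F": {"Type": "EmbeddedFile",
                     "Subtype": "application/json",  # /application#2Fjson in a PDF
                     "Filter": "FlateDecode",
                     "data": zlib.compress(payload)}},
    }

    # Step 4: register in the name tree (keys must stay sorted) and in AF.
    kept.append((FILENAME, spec))
    kept.sort(key=lambda pair: pair[0].encode("utf-8"))
    tree["Names"] = [item for pair in kept for item in pair]
    catalog["AF"] = af + [spec]
    return spec


catalog = {"Names": {"EmbeddedFiles": {"Names": ["vitaeflow.json", {"F": "vitaeflow.json"}]}},
           "AF": [{"F": "vitaeflow.json"}]}
attach(catalog, {"version": "0.1"})
print(len(catalog["AF"]), catalog["Names"]["EmbeddedFiles"]["Names"].count(FILENAME))  # 1 1
```

The sketch keeps the Names array sorted because name tree keys must appear in sorted order. A reader that uses Limits or a binary search could otherwise miss the entry.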

4. XMP metadata

VitaeFlow writes custom XMP metadata into the PDF's metadata stream. This allows tools to identify VitaeFlow documents and read basic information without extracting and parsing the full JSON attachment.

XMP namespace

Namespace URI   urn:vitaeflow:pdfa:resume:1p0#
Prefix          vf

Properties

Property          Value             Source
DocumentType      "RESUME"          Constant
Version           e.g. "0.1"        From resume.version
ConformanceLevel  "standard"        From resume.profile
Generator         e.g. "MyApp/1.0"  From resume.meta.generator or SDK default

XMP template

If the PDF already contains XMP metadata, merge the VitaeFlow properties into the existing RDF block. If no XMP exists, create a new metadata stream with this structure:

XMP metadata
<?xpacket begin="\uFEFF" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
      xmlns:vf="urn:vitaeflow:pdfa:resume:1p0#">
      <vf:DocumentType>RESUME</vf:DocumentType>
      <vf:Version>0.1</vf:Version>
      <vf:ConformanceLevel>standard</vf:ConformanceLevel>
      <vf:Generator>MyApp/1.0</vf:Generator>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
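This Python sketch renders a new packet from the template using only the standard library. In the template, \uFEFF stands for the U+FEFF character itself, and the Python string escape "\ufeff" produces that character. xml.sax.saxutils.escape replaces &, < and > with entities. The helper name build_xmp is hypothetical, and merging into an existing packet is not covered.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

VF_NS = "urn:vitaeflow:pdfa:resume:1p0#"


def build_xmp(version: str, profile: str, generator: str) -> str:
    """Hypothetical helper: render a new XMP packet following the template."""
    props = [
        ("DocumentType", "RESUME"),
        ("Version", version),           # from resume.version
        ("ConformanceLevel", profile),  # from resume.profile
        ("Generator", generator),       # from resume.meta.generator or a default
    ]
    # escape() turns &, < and > into entities so values cannot break the XML.
    body = "\n".join(f"      <vf:{name}>{escape(value)}</vf:{name}>" for name, value in props)
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        '  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        '    <rdf:Description rdf:about=""\n'
        f'      xmlns:vf="{VF_NS}">\n'
        f"{body}\n"
        "    </rdf:Description>\n"
        "  </rdf:RDF>\n"
        "</x:xmpmeta>\n"
        '<?xpacket end="w"?>'
    )


packet = build_xmp("0.1", "standard", "MyApp/1.0 & Co")
root = ET.fromstring(packet)  # raises ParseError if the packet is malformed
print(root.find(".//{%s}Generator" % VF_NS).text)  # MyApp/1.0 & Co
```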

5. Extraction

Detection

To quickly check if a PDF contains VitaeFlow data without parsing the JSON:

  1. Navigate to Catalog → Names → EmbeddedFiles
  2. Traverse the name tree (handle both Names and Kids nodes)
  3. Look for an entry named exactly "vitaeflow.json" (case-sensitive)

Full extraction

  1. Locate the FileSpec for vitaeflow.json in the EmbeddedFiles name tree
  2. Read the stream from FileSpec → EF → F
  3. Decompress the stream (handle FlateDecode and other PDF filters)
  4. Decode bytes as UTF-8 and parse as JSON
  5. Validate the parsed data in tolerant mode

Error handling

No embedded file found — return null. The PDF is not a VitaeFlow document.
JSON parse failure — return a validation error. The file is corrupt or not valid JSON.
Validation failure — return the parsed data alongside the validation errors. A tolerant reader should still surface the data even if some fields are invalid.
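The three outcomes can be expressed as a single function. This sketch assumes the stream has already been located (or found missing) and is plain FlateDecode data. The validate callback stands in for tolerant-mode schema validation. The returned dict's shape is an illustrative assumption, not a required API.

```python
import json
import zlib
from typing import Callable, Optional


def extract(stream: Optional[bytes], validate: Callable[[object], list]) -> Optional[dict]:
    """Apply the error-handling rules to an already-located EF/F stream.

    `stream` is None when no vitaeflow.json entry exists. `validate` returns
    a list of {"path", "message"} errors (tolerant mode is assumed).
    """
    if stream is None:
        return None  # not a VitaeFlow document
    try:
        data = json.loads(zlib.decompress(stream).decode("utf-8"))
    except (zlib.error, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Corrupt stream or invalid JSON: report it as a validation error.
        return {"data": None, "valid": False,
                "errors": [{"path": "", "message": f"Invalid payload: {exc}"}]}
    errors = validate(data)
    # Tolerant reader: surface the data even when validation reports errors.
    return {"data": data, "valid": not errors, "errors": errors}


payload = zlib.compress(b'{"version": "0.1"}')
print(extract(payload, lambda doc: [])["valid"])  # True
```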

6. Validation

VitaeFlow defines two validation modes to balance strictness during writing with flexibility during reading.

Strict mode

Used when writing (embedding). Rejects unknown fields at any level via additionalProperties: false. Ensures the data conforms exactly to the schema.

An implementation must validate in strict mode before embedding.

Tolerant mode

Used when reading (extracting). Strips all additionalProperties: false constraints, allowing unknown fields to pass through. Enables forward compatibility.

A v0.1 reader can read a v0.2 document without failing on new fields.
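If a JSON Schema library has no built-in tolerant option, one way to get the tolerant schema is to transform the strict one before compiling it. This sketch walks the schema and drops every additionalProperties whose value is literally false, leaving subschema values untouched. The function name is hypothetical. The walk is naive: it would also strip such a key if it appeared inside an example or const value.

```python
from typing import Any


def to_tolerant(schema: Any) -> Any:
    """Return a copy of `schema` without any `additionalProperties: false`."""
    if isinstance(schema, dict):
        return {
            key: to_tolerant(value)
            for key, value in schema.items()
            # Only the literal false is removed; a subschema value is kept.
            if not (key == "additionalProperties" and value is False)
        }
    if isinstance(schema, list):
        return [to_tolerant(item) for item in schema]
    return schema  # scalars are returned unchanged


strict = {"type": "object", "additionalProperties": False,
          "properties": {"basics": {"type": "object", "additionalProperties": False}}}
print(to_tolerant(strict))
# {'type': 'object', 'properties': {'basics': {'type': 'object'}}}
```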

Error format

Validation errors use JSON Pointer paths (RFC 6901) and human-readable messages:

validation result
{
  "valid": false,
  "errors": [
    {
      "path": "/basics",
      "message": "Missing required property: email"
    },
    {
      "path": "/work/0",
      "message": "Missing required property: startDate"
    }
  ],
  "warnings": []
}
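Paths in this format are JSON Pointers, so an object key that contains / or ~ must be escaped as ~1 or ~0 (RFC 6901). A small builder, with a hypothetical name:

```python
def json_pointer(*tokens) -> str:
    """Build an RFC 6901 JSON Pointer from object keys and array indices."""
    parts = []
    for token in tokens:
        # Escape "~" before "/": the other order would turn the "~" of a
        # freshly written "~1" into "~01".
        text = str(token).replace("~", "~0").replace("/", "~1")
        parts.append("/" + text)
    return "".join(parts)  # no tokens -> "" (the whole document)


print(json_pointer("work", 0))  # /work/0
```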

7. Versioning

The schema follows an additive-only evolution model: new versions add fields rather than removing or redefining existing ones.

When a reader encounters a document with a newer schema version than it supports, it should emit a warning but still attempt to extract and return the data. This is the purpose of tolerant mode.

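Version comparison: parse each version as a pair of integers (major, minor) and compare the major numbers first, then the minor numbers. Do not compare versions as strings or decimals, since "0.10" is newer than "0.9". If the document's version is greater than the implementation's supported version, emit: "Resume uses schema version X.Y, but this SDK supports A.B. Some fields may not be validated."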
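Here is a Python sketch of the comparison and the warning. SUPPORTED mirrors the SCHEMA_VERSION constant (0.1), and the function names are hypothetical. A malformed version string raises ValueError here; a real reader may prefer to turn that into a warning.

```python
from typing import Optional, Tuple

SUPPORTED: Tuple[int, int] = (0, 1)  # SCHEMA_VERSION "0.1"


def parse_version(text: str) -> Tuple[int, int]:
    # "major.minor" as integers. Tuples compare major first, then minor, so
    # "0.10" sorts after "0.9", unlike a string or float comparison.
    major, minor = text.split(".")
    return int(major), int(minor)


def version_warning(doc_version: str, supported: Tuple[int, int] = SUPPORTED) -> Optional[str]:
    """Return the warning text if the document is newer than supported, else None."""
    if parse_version(doc_version) > supported:
        a, b = supported
        return (f"Resume uses schema version {doc_version}, but this SDK supports {a}.{b}. "
                "Some fields may not be validated.")
    return None


print(version_warning("0.2"))
```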

8. Implementation guide

To build a VitaeFlow implementation in a new language, your library needs to handle three concerns: PDF manipulation, schema validation, and XMP metadata.

Checklist

PDF operations

  • Read and write PDF 1.7+ files
  • Navigate EmbeddedFiles name trees (flat and hierarchical)
  • Create FileSpec dictionaries with embedded streams
  • Manage the catalog AF array
  • Decompress streams (FlateDecode at minimum)

Schema validation

  • Support JSON Schema 2020-12
  • Compile in strict mode (additionalProperties: false enforced)
  • Compile in tolerant mode (additionalProperties: false removed)
  • Validate format keywords: email, uri
  • Return all errors, not just the first

XMP metadata

  • Create RDF/XML with the VitaeFlow namespace
  • Merge into existing XMP if present
  • Handle XML entity escaping (&, <, >)

Expected behavior

Function   Behavior
embed      Validate strict → remove existing → attach JSON → write XMP → return PDF bytes
extract    Find attachment → decompress → parse JSON → validate tolerant → return data + validation
detect     Check EmbeddedFiles name tree for "vitaeflow.json" → return boolean

Reference constants

Constant          Value
FILENAME          vitaeflow.json
MIME_TYPE         application/json
AF_RELATIONSHIP   /Alternative
XMP_NAMESPACE     urn:vitaeflow:pdfa:resume:1p0#
XMP_PREFIX        vf
SCHEMA_VERSION    0.1
