SharePoint workflow

Two-stage workflow: pull binary documents from SharePoint, convert them to markdown for downstream processing.

version: "1.0"
routes:
  sharepoint_ingest:
    inbound:
      queue: "/queue/sharepoint.enumerate"
      subscription: "sp-ingestors"
    adapters:
      - type: "sharepoint_ingest"
        config:
          site_id: "${SHAREPOINT_SITE_ID}"
          drive_id: "${SHAREPOINT_DRIVE_ID}"
          path: "/Shared Documents/Policies"
  document_converter:
    inbound:
      queue: "/queue/sharepoint.converter.in"
      subscription: "converters"
    adapters:
      - type: "document_converter"
init_message:
  route: "sharepoint_ingest"
  payload:
    full_sync: true
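
The config refers to `${SHAREPOINT_SITE_ID}` and `${SHAREPOINT_DRIVE_ID}` placeholders. How the workflow engine resolves them is not described here; the sketch below only illustrates one plausible behavior, assuming `${NAME}` is replaced with the value of the environment variable `NAME` and that an unset variable is an error. The function name `expand_env` is hypothetical.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value, env=None):
    """Replace each ${NAME} in value with env[NAME]; raise KeyError if NAME is unset.

    Hypothetical helper: the real engine may resolve placeholders differently.
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)
```

Failing loudly on an unset variable is a design choice made for this sketch: it surfaces a missing `SHAREPOINT_SITE_ID` at startup rather than as a confusing Graph error later.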

Authentication goes through Microsoft Graph using an app registration. Set these environment variables:

SHAREPOINT_TENANT_ID=...
SHAREPOINT_CLIENT_ID=...
SHAREPOINT_CLIENT_SECRET=...
SHAREPOINT_SITE_ID=...
SHAREPOINT_DRIVE_ID=...

Required Graph permissions: Sites.Read.All, Files.Read.All (application, not delegated).
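
Application permissions imply the OAuth 2.0 client credentials flow. As a rough illustration of how the three credential variables fit together, the sketch below builds (but does not send) a token request against the Microsoft identity platform v2.0 endpoint. The function name `build_token_request` is hypothetical, and nothing here is claimed to be what the `sharepoint_ingest` adapter does internally.

```python
import os
import urllib.parse
import urllib.request


def build_token_request(env=None):
    """Build a client-credentials token request for Microsoft Graph.

    Hypothetical helper; reads the SHAREPOINT_* variables listed above.
    """
    env = os.environ if env is None else env
    tenant = env["SHAREPOINT_TENANT_ID"]
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": env["SHAREPOINT_CLIENT_ID"],
        "client_secret": env["SHAREPOINT_CLIENT_SECRET"],
        # ".default" requests the application permissions granted to the app.
        "scope": "https://graph.microsoft.com/.default",
    }).encode("ascii")
    url = f"https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token"
    return urllib.request.Request(url, data=body, method="POST")
```

Passing the result to `urllib.request.urlopen` should return a JSON body containing an `access_token`, provided the app registration has admin consent for the permissions above.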

The binary document is persisted between the two stages. That lets you re-run conversion (e.g., after upgrading the converter library) against the same source without re-hitting Graph.
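
To make the benefit concrete, here is a minimal sketch of a re-conversion pass that reads only from persisted binaries and never calls Graph. It assumes the binaries sit as files in a local directory, which this page does not confirm; `reconvert`, `store_dir`, `out_dir`, and the `convert(data, suffix)` callback are all hypothetical names.

```python
from pathlib import Path


def reconvert(store_dir, out_dir, convert):
    """Re-run convert(data, suffix) -> str over every persisted binary.

    Assumption: binaries are plain files in store_dir. No network calls are made.
    """
    store, out = Path(store_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort for a deterministic processing order.
    for src in sorted(p for p in store.iterdir() if p.is_file()):
        markdown = convert(src.read_bytes(), src.suffix.lower())
        dest = out / (src.name + ".md")
        dest.write_text(markdown, encoding="utf-8")
        written.append(dest)
    return written
```

Because the converter is passed in as a callback, swapping in an upgraded converter library only changes the `convert` argument, not the stored inputs.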

document_converter handles:

  • .docx → markdown
  • .pdf → markdown (text-based; not OCR)
  • .pptx → markdown (headings + body text)
  • Plain .txt, .md — passed through

Formats outside this list are logged and skipped.
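
The format handling above can be pictured as a dispatch on file extension. The sketch below is an assumption about the shape of that logic, not the adapter's actual code: `convert_document`, `HANDLERS`, and the placeholder handlers are hypothetical, and the real `.docx`, `.pdf`, and `.pptx` converters are deliberately left as stubs.

```python
import logging
from pathlib import PurePosixPath

log = logging.getLogger("document_converter_sketch")


def _passthrough(data):
    # .txt and .md are passed through unchanged (decoded as UTF-8 here).
    return data.decode("utf-8")


def _not_implemented(data):
    # Stub: a real converter library would be plugged in here.
    raise NotImplementedError("plug in a real converter here")


HANDLERS = {
    ".docx": _not_implemented,
    ".pdf": _not_implemented,
    ".pptx": _not_implemented,
    ".txt": _passthrough,
    ".md": _passthrough,
}


def convert_document(name, data, handlers=HANDLERS):
    """Return markdown for a supported file, or None after logging a skip."""
    suffix = PurePosixPath(name).suffix.lower()
    handler = handlers.get(suffix)
    if handler is None:
        log.warning("skipping %s: unsupported format %r", name, suffix)
        return None
    return handler(data)
```

For example, `convert_document("notes.md", b"# Title")` returns the text unchanged, while a call with `"budget.xlsx"` logs a warning and returns `None`.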

The converted output feeds the markdown workflow for segmentation, then the embeddings workflow, then the knowledge workflow for concept extraction.