User Stories
As a publisher, I want to import a WordPress export without manually fixing structure so that my site comes in cleanly.
As a writer, I want my authored content to be credited correctly so that ownership and attribution are clear.
As an imported author, I want a guided handoff after import so that I can claim my identity and rotate keys myself.
Problem
We’re importing WordPress content into a system that has stricter identity and signing semantics than WordPress does. WordPress is loose and practical; Seed is explicit about author vs publisher. That mismatch is where most failures come from.
There are two recurring pain points we need to treat as first-class architecture concerns:
Permissions can get polluted over repeated imports if we keep creating writer grants without a stable identity model.
Author/email associations in WordPress exports are often incomplete or malformed, which breaks strict authored signing assumptions.
The result is operational friction: imports succeed “mostly,” but trust in the output drops when identity and capability state feels inconsistent.
Solution
Before details, the key idea is simple: we support two explicit import paths with different trust and signing models.
Two Paths
Ghostwritten Path (default, low-risk)
Publisher signs all imported documents.
Author names are preserved as display metadata only.
Authored Path (higher trust, identity-aware)
Importer signs with per-author keys when author identity is sufficiently resolved.
If identity is incomplete, we gracefully downgrade that item to publisher signing while keeping display attribution.
Alternative to consider: we could copy the import flow that WordPress itself has (the reference starts at 3'11'').
Defaulting to Ghostwritten gives us operational safety, and Authored gives us higher-fidelity authorship when the vault supports it.
Architecture Overview
Ingestion and Normalization Layer
Parse WordPress export data into a canonical internal model.
Normalize creator identifiers early (case, wrappers, malformed tokens).
Separate “declared author directory” from “actual item creator” and resolve them into one canonical author map.
This layer is where we remove WordPress-specific noise before it contaminates downstream logic.
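A minimal sketch of the normalization step, assuming creator tokens may arrive wrapped in CDATA, padded, or in mixed case, and that the declared author directory is a list of dicts with `login`, `display_name`, and `email`. All names here (`Author`, `normalize_creator`, `build_author_map`) are hypothetical and not part of any existing importer API.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Author:
    key: str                 # canonical identifier used downstream
    display_name: str
    email: Optional[str]     # None when the export has no usable email


def normalize_creator(raw: str) -> str:
    """Strip CDATA wrappers and whitespace noise, then lowercase."""
    token = raw.strip()
    wrapped = re.fullmatch(r"<!\[CDATA\[(.*)\]\]>", token, flags=re.DOTALL)
    if wrapped:
        token = wrapped.group(1).strip()
    return re.sub(r"\s+", " ", token).lower()


def is_valid_email(value: str) -> bool:
    # Deliberately loose shape check; real validation is out of scope here.
    return bool(value) and re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", value) is not None


def build_author_map(declared, item_creators):
    """Merge the declared directory and the actual item creators into one map."""
    authors = {}
    for entry in declared:
        key = normalize_creator(entry["login"])
        if not key:
            continue
        email = (entry.get("email") or "").strip().lower()
        authors[key] = Author(
            key=key,
            display_name=entry.get("display_name") or key,
            email=email if is_valid_email(email) else None,
        )
    # Creators that appear on items but not in the directory still get an entry,
    # just without an email, so they can only be attributed by display name.
    for raw in item_creators:
        key = normalize_creator(raw)
        if key and key not in authors:
            authors[key] = Author(key=key, display_name=key, email=None)
    return authors
```

Authors that end up with `email=None` are the ones the Authored path would later downgrade to publisher signing.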
Path Planning Layer
Posts go to a dedicated posts namespace and stay flat.
Pages preserve hierarchy from parent relationships.
Slugs are normalized and decoded deterministically.
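The path rules above can be sketched as follows. This assumes each item is a dict with `id`, `type`, `slug`, and `parent` (where `0` means "no parent", as in WordPress exports), and that the posts namespace is called `posts`. The namespace name and function names are illustrative assumptions.

```python
import re
import unicodedata
from urllib.parse import unquote


def normalize_slug(raw: str) -> str:
    """Percent-decode and normalize a slug so the same input always maps to the same output."""
    slug = unicodedata.normalize("NFC", unquote(raw).strip().lower())
    slug = re.sub(r"[\s_/]+", "-", slug)
    slug = re.sub(r"-{2,}", "-", slug).strip("-")
    return slug or "untitled"


def plan_path(item, items_by_id, posts_namespace="posts"):
    slug = normalize_slug(item["slug"])
    if item["type"] == "post":
        # Posts stay flat under a dedicated namespace.
        return f"/{posts_namespace}/{slug}"
    # Pages: walk up the parent chain to rebuild the hierarchy.
    segments = [slug]
    seen = {item["id"]}  # guards against cyclic parent references
    parent_id = item.get("parent")
    while parent_id and parent_id in items_by_id and parent_id not in seen:
        seen.add(parent_id)
        parent = items_by_id[parent_id]
        segments.append(normalize_slug(parent["slug"]))
        parent_id = parent.get("parent")
    return "/" + "/".join(reversed(segments))
```

A page whose parent is missing from the export simply becomes a top-level page instead of failing the import.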
Content Transformation Layer
Convert rich HTML and plain text content into block documents.
Media files are scraped and uploaded to IPFS, then linked in the documents. If the scraper fails, we insert a placeholder image.
Keep transformation deterministic so rerunning import does not mutate content shape unpredictably.
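One way to keep the media step from failing a whole import is to wrap it in a fallback. The sketch below assumes the scraper and uploader are passed in as plain callables (`fetch`, `upload`) and that an image block is a small dict. The block shape and the `PLACEHOLDER_REF` value are hypothetical.

```python
# Hypothetical reference to a bundled placeholder image.
PLACEHOLDER_REF = "placeholder://missing-image"


def image_block(src_url, fetch, upload):
    """Build an image block, falling back to a placeholder if scraping or upload fails.

    fetch(url) -> bytes and upload(bytes) -> reference string are injected so the
    transformation itself stays free of network details.
    """
    try:
        data = fetch(src_url)
        ref = upload(data)
        return {"type": "image", "ref": ref, "source": src_url, "placeholder": False}
    except Exception:
        # Broad catch is intentional in this sketch: any scrape or upload failure
        # should degrade to a placeholder rather than abort the document.
        return {"type": "image", "ref": PLACEHOLDER_REF, "source": src_url, "placeholder": True}
```

Keeping the original `source` on placeholder blocks leaves room for a later retry pass without re-parsing the export.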
Signing Orchestrator
Decide signer per item based on import mode and identity completeness.
Ghostwritten path: always publisher signer.
Authored path: author signer when identity is valid; otherwise publisher signer + display attribution.
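The signer decision can be sketched as a single pure function. This assumes "identity is valid" reduces to "a per-author key was resolved for this item", which is a simplification. The type and function names are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ImportMode(Enum):
    GHOSTWRITTEN = "ghostwritten"
    AUTHORED = "authored"


@dataclass(frozen=True)
class SigningDecision:
    signer: str                    # "publisher" or "author"
    signer_key: str
    display_author: Optional[str]  # always preserved as display metadata
    downgraded: bool               # True when Authored fell back to publisher signing


def choose_signer(mode, publisher_key, author_name, author_key):
    """Pick the signer for one item; author_key is None when identity is unresolved."""
    if mode is ImportMode.AUTHORED and author_key:
        return SigningDecision("author", author_key, author_name, downgraded=False)
    return SigningDecision(
        "publisher",
        publisher_key,
        author_name,
        downgraded=(mode is ImportMode.AUTHORED),
    )
```

Recording `downgraded` per item makes it possible to report, after the run, how many items were publisher-signed only because identity was incomplete.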
Capability Guardrail Layer
Treat writer capabilities as idempotent operations.
Check existing capabilities before creating new ones.
Avoid duplicate capabilities for the same logical author on the same resource path.
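A minimal sketch of the check-before-create rule, using an in-memory set as a stand-in for wherever capabilities are actually stored. `CapabilityStore` and `ensure_writer` are hypothetical names.

```python
from dataclasses import dataclass, field


def normalize_path(path: str) -> str:
    # "/posts" and "/posts/" must count as the same resource path.
    return "/" + path.strip("/")


@dataclass
class CapabilityStore:
    grants: set = field(default_factory=set)

    def ensure_writer(self, author_key: str, path: str) -> bool:
        """Grant writer access once; return True only if a new grant was created."""
        grant = (author_key, normalize_path(path), "writer")
        if grant in self.grants:
            return False  # already granted, nothing to do
        self.grants.add(grant)
        return True
```

Because the grant is keyed on the canonical author key rather than the raw creator string, rerunning an import with the same author map creates no new capabilities.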
State and Resume Layer
Persist import state and progress so interruptions are recoverable.
Store enough context to resume safely without duplicating completed work.
Keep state explicit per import run while identity mapping remains stable across runs.
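Resumability can be sketched with a small per-run state file. This assumes item IDs are strings and that a JSON file on local disk is an acceptable store. Both are assumptions for illustration only.

```python
import json
import os


class ImportState:
    """Tracks which items of one import run have completed."""

    def __init__(self, path: str):
        self.path = path
        self.completed = set()
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                self.completed = set(json.load(f)["completed"])

    def mark_done(self, item_id: str) -> None:
        self.completed.add(item_id)
        # Write to a temp file and rename, so a crash never leaves a half-written state file.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump({"completed": sorted(self.completed)}, f)
        os.replace(tmp, self.path)


def run_import(item_ids, state, import_item):
    """Import each item once; items already recorded in state are skipped."""
    for item_id in item_ids:
        if item_id in state.completed:
            continue
        import_item(item_id)
        state.mark_done(item_id)
```

An item is marked done only after `import_item` returns, so an interruption mid-item means that item is retried on resume. That is another reason the transformation and capability steps need to be idempotent.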
Post-Import Author Onboarding
After authored imports finish, we send an email to each created author with a secure onboarding link.
Flow:
Email author with “claim your imported author identity” link.
Author logs in with that email.
Author rotates from importer-created key material to a user-owned key.
System updates author identity to the rotated key as the long-term signer.
This turns authored import into a proper identity handoff instead of leaving imported authors stranded with opaque key state.
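The handoff above could look roughly like this. The sketch assumes the onboarding link carries an HMAC token derived from a server-side secret, and it models keys as opaque strings. Token expiry, recovery, and the actual key format are left out, and none of these names reflect an existing API.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass
class AuthorIdentity:
    author_id: str
    email: str
    signer_key: str        # importer-created key until the author claims the identity
    claimed: bool = False


def make_claim_token(secret: bytes, identity: AuthorIdentity) -> str:
    """Token embedded in the onboarding link; bound to the author ID and email."""
    msg = f"{identity.author_id}:{identity.email.lower()}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def claim_and_rotate(secret, identity, token, login_email, new_key):
    """Verify the claim, then swap the importer-created key for the user-owned one."""
    if identity.claimed:
        raise PermissionError("identity already claimed")
    expected = make_claim_token(secret, identity)
    # Constant-time comparison to avoid leaking token contents through timing.
    if not hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8")):
        raise PermissionError("invalid claim token")
    if login_email.strip().lower() != identity.email.lower():
        raise PermissionError("logged-in email does not match imported author")
    identity.signer_key = new_key
    identity.claimed = True
    return identity
```

Marking the identity as claimed makes the link single-use in this sketch. Whether a real flow should allow re-claiming is one of the recovery questions flagged under risks.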
Rabbit Holes / Risks
WordPress identity data is inconsistent in the wild; trying to perfectly infer real humans from exports can eat unlimited time.
Email onboarding
Key-rotation UX can become a product project if we overcomplicate recovery and edge cases.