Skip to content

test — extractor

Extractor

Functions

findDocFiles

function findDocFiles(dir: string): string[]
TypeScript

Use findDocFiles to collect every .md and .mdx file under a directory so you can process, validate, or index your documentation in bulk.

Reach for this when you need a flat list of all doc files before feeding them into a pipeline — for example, checking which pages are missing frontmatter, building a search index, or passing files to skrypt generate programmatically.

It walks the directory tree recursively, so nested folders (like docs/api/reference/) are included without any extra configuration.

NameTypeRequiredDescription
dirstringYesAbsolute or relative path to the root directory to scan. All subdirectories are traversed automatically.

Returns a flat string[] of file paths, one entry per .md or .mdx file found. Each path is relative to the process working directory. Pass these paths directly to readFileSync or your doc processing logic — the array is empty if no matching files exist.

  • Heads up: Only .md and .mdx extensions are matched — files like .markdown or .txt are silently skipped.

Example:

import { readdirSync, statSync, writeFileSync, mkdirSync } from "fs";
import { join, extname } from "path";
import { tmpdir } from "os";

// Inline implementation — do not import from autodocs
function findDocFiles(dir: string): string[] {
  const files: string[] = [];

  function walk(currentDir: string): void {
    const entries = readdirSync(currentDir);
    for (const entry of entries) {
      const fullPath = join(currentDir, entry);
      const stat = statSync(fullPath);
      if (stat.isDirectory()) {
        walk(fullPath);
      } else if ([".md", ".mdx"].includes(extname(entry))) {
        files.push(fullPath);
      }
    }
  }

  walk(dir);
  return files;
}

// Set up a realistic docs directory structure
const root = join(tmpdir(), "my-api-docs-" + Date.now());
const dirs = [
  join(root, "docs", "api"),
  join(root, "docs", "guides"),
  join(root, "docs", "reference", "v2"),
];
dirs.forEach((d) => mkdirSync(d, { recursive: true }));

writeFileSync(join(root, "docs", "index.mdx"), "# Welcome");
writeFileSync(join(root, "docs", "api", "authentication.md"), "# Auth");
writeFileSync(join(root, "docs", "api", "rate-limits.mdx"), "# Rate Limits");
writeFileSync(join(root, "docs", "guides", "quickstart.md"), "# Quickstart");
writeFileSync(join(root, "docs", "reference", "v2", "endpoints.mdx"), "# Endpoints");
writeFileSync(join(root, "docs", "reference", "v2", "schema.txt"), "not a doc"); // should be excluded

const docFiles = findDocFiles(join(root, "docs"));

console.log(`Found ${docFiles.length} doc files:`);
docFiles.forEach((f) => console.log(" ", f.replace(root, ".")));

// Expected output:
// Found 5 doc files:
//   ./docs/index.mdx
//   ./docs/api/authentication.md
//   ./docs/api/rate-limits.mdx
//   ./docs/guides/quickstart.md
//   ./docs/reference/v2/endpoints.mdx
TypeScript

extractSnippets

function extractSnippets(filePath: string, languageFilter?: string): ExtractedSnippet[]
TypeScript

Use extractSnippets to pull all fenced code blocks out of a Markdown or MDX file so you can analyze, test, or re-use the examples your docs already contain.

Reach for this when you need to audit code examples across your documentation — for example, to validate that snippets are syntactically correct, run them through a linter, or feed them back into skrypt's generation pipeline as reference examples.

It reads the file at filePath, scans for fenced code blocks (```lang), and returns each one as a structured object with the language tag and raw source. Pass languageFilter to narrow results to a single language when you only care about, say, TypeScript examples in a mixed-language doc.

Parameters

NameTypeRequiredDescription
filePathstringYesAbsolute or relative path to the .md or .mdx file to parse. The file must be readable at runtime.
languageFilterstringNoLanguage identifier to match against fenced code block tags (e.g. "typescript", "python"). When omitted, all code blocks are returned regardless of language.

Returns

An array of ExtractedSnippet objects, each containing the language tag and the raw code content of one fenced block. Returns an empty array if the file contains no code blocks (or none matching the filter). Pass the results to a linter, test runner, or back into skrypt generate as seed examples.

Heads up

  • The language tag is whatever string appears immediately after the opening fences — "ts" and "typescript" are treated as distinct values, so make sure your filter matches the exact tag used in the file.
  • Blocks with no language tag (bare ```) are included when languageFilter is omitted, but their language field will be undefined or an empty string.

Example:

import { readFileSync, writeFileSync } from "fs";
import { join } from "path";

// Inline types — do not import from autodocs
interface ExtractedSnippet {
  language: string | undefined;
  code: string;
}

// Self-contained implementation matching the real behavior
function extractSnippets(filePath: string, languageFilter?: string): ExtractedSnippet[] {
  const content = readFileSync(filePath, "utf-8");
  const snippets: ExtractedSnippet[] = [];
  const codeBlockRegex = /```(\w+)?[^\n]*\n([\s\S]*?)```/g;
  let match;

  while ((match = codeBlockRegex.exec(content)) !== null) {
    const language = match[1];
    const code = match[2].trimEnd();
    if (!languageFilter || language === languageFilter) {
      snippets.push({ language, code });
    }
  }

  return snippets;
}

// Create a realistic sample MDX file to parse
const sampleMdx = `
# Payment API

Create a charge like this:

\`\`\`typescript
const charge = await stripe.charges.create({
  amount: 2000,
  currency: "usd",
  source: "tok_visa",
});
console.log(charge.id); // ch_3OqX2KLkdIwHu7ix0Xyz1234
\`\`\`

Or in Python:

\`\`\`python
charge = stripe.Charge.create(
  amount=2000,
  currency="usd",
  source="tok_visa",
)
print(charge["id"])
\`\`\`
`;

const tmpPath = join(process.cwd(), "_sample_doc.mdx");
writeFileSync(tmpPath, sampleMdx, "utf-8");

try {
  // Extract only TypeScript snippets
  const tsSnippets = extractSnippets(tmpPath, "typescript");

  console.log(`Found ${tsSnippets.length} TypeScript snippet(s):\n`);
  tsSnippets.forEach((snippet, i) => {
    console.log(`--- Snippet ${i + 1} [${snippet.language}] ---`);
    console.log(snippet.code);
  });

  // Extract all snippets (no filter)
  const allSnippets = extractSnippets(tmpPath);
  console.log(`\nTotal snippets (all languages): ${allSnippets.length}`);
  // Expected output:
  // Found 1 TypeScript snippet(s):
  // --- Snippet 1 [typescript] ---
  // const charge = await stripe.charges.create({ ... })
  // ...
  // Total snippets (all languages): 2
} catch (err) {
  console.error("Failed to extract snippets:", err);
} finally {
  // Clean up temp file
  import("fs").then(({ unlinkSync }) => unlinkSync(tmpPath));
}
TypeScript
Was this helpful?