CSV

CSV (Comma-Separated Values) is a simple, widely supported plain-text format for storing tabular data. It remains one of the most common formats for exchanging data between systems.

Format

name,email,age,active
Alice,alice@example.com,30,true
Bob,bob@example.com,25,true
Charlie,charlie@example.com,35,false

Key Characteristics

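  • Plain text: Human-readable and easy to inspect in any text editor
  • Universal support: Nearly every spreadsheet application and database can import and export CSV
  • No types: Every value is a string, so numbers, booleans, and dates must be parsed by the reader
  • Delimiter variations: Tabs (TSV), semicolons, or pipes are sometimes used in place of commas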
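
Because every value arrives as a string, the reader has to convert fields explicitly. The sketch below shows one way to do that for rows shaped like the sample above. The `coerceRow` helper and its conversion rules are illustrative assumptions, not part of any library.

```javascript
// Hypothetical helper: convert the string fields of one parsed row
// (shaped like the sample data) into typed values.
function coerceRow(row) {
  return {
    name: row.name,
    email: row.email,
    // Number('') is 0, so treat an empty string as a missing value first.
    age: row.age === '' ? null : Number(row.age),
    // Assumption: only the literal string "true" counts as true.
    active: row.active === 'true',
  };
}

const raw = { name: 'Alice', email: 'alice@example.com', age: '30', active: 'true' };
const typed = coerceRow(raw);
console.log(typed);
```

Libraries can do this automatically, but explicit rules make it clear how empty or unexpected values are treated.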

Common Issues

Issue               Example               Solution
Commas in values    "Smith, John"         Quote the field
Quotes in values    "He said ""hello"""   Quote the field and double each embedded quote
Newlines in values  Multi-line text       Quote the field
Encoding            Non-ASCII characters  Use UTF-8; add a BOM when the consumer needs one (e.g. Excel)
Large files         Memory issues         Stream the file instead of loading it whole
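
The quoting rules in the table above can be sketched as a small writer. The `escapeField` and `toCsvLine` names are hypothetical helpers introduced for illustration; they follow the common convention of quoting a field only when it contains the delimiter, a quote, or a line break.

```javascript
// Hypothetical helper: quote a field only when it needs quoting.
function escapeField(value, delimiter = ',') {
  const text = String(value);
  const needsQuotes =
    text.includes(delimiter) ||
    text.includes('"') ||
    text.includes('\n') ||
    text.includes('\r');
  // Embedded quotes are doubled, then the whole field is wrapped in quotes.
  return needsQuotes ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Hypothetical helper: join one record into a comma-delimited line.
function toCsvLine(fields) {
  // Use an arrow function so map's index argument is not passed as the delimiter.
  return fields.map((field) => escapeField(field)).join(',');
}

console.log(toCsvLine(['Smith, John', 'He said "hello"', 'plain']));
// prints: "Smith, John","He said ""hello""",plain
```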

Reading CSV in JavaScript

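// Using PapaParse (browser and Node)
import Papa from 'papaparse';

// csvString holds the raw CSV text
Papa.parse(csvString, {
  header: true,        // use the first row as object keys
  dynamicTyping: true, // convert numeric and boolean strings to native types
  complete: (results) => {
    console.log(results.data); // array of row objects
  },
});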

// Node.js with csv-parse
import { parse } from 'csv-parse/sync';

const records = parse(csvString, {
  columns: true,          // use the first row as object keys
  skip_empty_lines: true, // ignore blank lines
});
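
To show what a CSV parser has to deal with, here is a minimal state-machine sketch that handles quoted fields, doubled quotes, and line breaks inside quotes. It is a teaching example under simplifying assumptions (single-character delimiter, blank lines not skipped, no error reporting), not a description of how PapaParse or csv-parse are implemented; `parseCsv` is a hypothetical name.

```javascript
// Minimal parser sketch: returns an array of rows, each an array of strings.
function parseCsv(text, delimiter = ',') {
  const rows = [];
  let row = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inQuotes) {
      if (ch === '"') {
        if (text[i + 1] === '"') {
          field += '"'; // doubled quote becomes one literal quote
          i++;
        } else {
          inQuotes = false; // closing quote
        }
      } else {
        field += ch; // delimiters and line breaks are literal inside quotes
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === delimiter) {
      row.push(field);
      field = '';
    } else if (ch === '\n' || ch === '\r') {
      if (ch === '\r' && text[i + 1] === '\n') i++; // treat CRLF as one line break
      row.push(field);
      field = '';
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
  }
  // Flush the last record when the input does not end with a line break.
  if (field !== '' || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

console.log(parseCsv('name,note\n"Smith, John","He said ""hello"""\n'));
```

For real data, a maintained library is usually the safer choice, since it also covers malformed input and large files.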

CSV vs Alternatives

Format   Best For
CSV      Simple tabular data, spreadsheet import
JSON     Nested/hierarchical data, APIs
Parquet  Large datasets, analytics
Excel    Rich formatting, formulas

What We Like

  • Simplicity: Everyone understands CSV
  • Compatibility: Works with nearly every data tool
  • Streaming: Can be processed record by record without loading the whole file
  • Debugging: Easy to inspect and edit

What We Don't Like

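  • No schema: Type information is lost
  • No hierarchy: Flat structure only
  • Ambiguous spec: RFC 4180 is informational only, so parsers handle edge cases differently
  • No compression: Plain text is usually larger than binary formats unless compressed separately
  • Encoding issues: Files carry no encoding declaration, so UTF-8 cannot always be assumed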
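
One way to reduce encoding surprises is to decode bytes explicitly instead of trusting defaults. The sketch below uses the standard `TextDecoder` API, assuming the input is meant to be UTF-8; `decodeUtf8Csv` and `stripBom` are hypothetical helper names.

```javascript
// Hypothetical helper: decode raw CSV bytes as UTF-8.
function decodeUtf8Csv(bytes) {
  // fatal: true makes invalid UTF-8 throw a TypeError instead of
  // silently producing U+FFFD replacement characters.
  const decoder = new TextDecoder('utf-8', { fatal: true });
  // With the default ignoreBOM: false, a leading BOM is dropped from the output.
  return decoder.decode(bytes);
}

// Hypothetical helper for text that was already decoded elsewhere.
function stripBom(text) {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}

const withBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x61, 0x2c, 0x62]); // BOM + "a,b"
console.log(decodeUtf8Csv(withBom)); // prints: a,b
```

Failing loudly on invalid bytes is a design choice: it surfaces files saved in another encoding before corrupted characters reach the data.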

Best Practices

  1. Always use headers: The first row should contain the column names
  2. Quote string fields: Avoids problems when values contain the delimiter, quotes, or newlines
  3. Specify the encoding: Use UTF-8 and state whether a BOM is present
  4. Validate on import: Check types and formats before using the data
  5. Consider alternatives: Parquet for analytics, JSON for nested data
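
Validation on import (practice 4) can be sketched as a per-row check that collects errors rather than stopping at the first one. The rules below are assumptions chosen to match the sample columns (`name`, `email`, `age`, `active`), and `validateRow` is a hypothetical helper; the email pattern is deliberately loose.

```javascript
// Hypothetical validator for rows shaped like the sample data.
// Returns a list of error messages; an empty list means the row passed.
function validateRow(row, lineNumber) {
  const errors = [];
  if (!row.name) {
    errors.push(`line ${lineNumber}: name is required`);
  }
  // Loose check: something, an @, something, with no whitespace.
  if (!/^[^@\s]+@[^@\s]+$/.test(row.email ?? '')) {
    errors.push(`line ${lineNumber}: email looks invalid`);
  }
  // Assumption: age must be a non-negative integer written in digits only.
  if (!/^\d+$/.test(row.age ?? '')) {
    errors.push(`line ${lineNumber}: age must be a non-negative integer`);
  }
  if (row.active !== 'true' && row.active !== 'false') {
    errors.push(`line ${lineNumber}: active must be "true" or "false"`);
  }
  return errors;
}

const good = { name: 'Alice', email: 'alice@example.com', age: '30', active: 'true' };
const bad = { name: '', email: 'not-an-email', age: 'thirty', active: 'yes' };
console.log(validateRow(good, 2)); // prints: []
console.log(validateRow(bad, 3).length); // prints: 4
```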