Skip to content

Instantly share code, notes, and snippets.

@do-me
Last active April 9, 2026 13:37
Show Gist options
  • Select an option

  • Save do-me/4c70f9e3e6a176ce3f26b50b872f063b to your computer and use it in GitHub Desktop.

Select an option

Save do-me/4c70f9e3e6a176ce3f26b50b872f063b to your computer and use it in GitHub Desktop.
Useful Parquet oneliners, view and manipulate with DuckDB
# view 10 lines
uvx duckdb -c "FROM 'results.parquet' LIMIT 10"
# add 2 new columns
uvx duckdb -c "COPY (SELECT *, NULL::INT AS dominik_label, NULL::VARCHAR AS dominik_comments FROM 'results.parquet') TO 'results.parquet'"
# remove 2 columns
uvx duckdb -c "COPY (SELECT * EXCLUDE (dominik_label, dominik_comments) FROM 'results.parquet') TO 'results.parquet'"
# sort by 2 columns
uvx duckdb -c "COPY (SELECT * FROM 'results.parquet' ORDER BY has_error DESC, has_deviation DESC) TO 'results.parquet'"
# display and copy as markdown
uvx duckdb -markdown -c "FROM 'results.parquet' LIMIT 10" | pbcopy
# add lat lon from geometry and base_geometry
uvx duckdb -c "INSTALL spatial; LOAD spatial; COPY (SELECT *, ST_Y(ST_GeomFromWKB(geometry)) AS latitude, ST_X(ST_GeomFromWKB(geometry)) AS longitude, ST_Y(ST_GeomFromWKB(base_geometry)) AS base_latitude, ST_X(ST_GeomFromWKB(base_geometry)) AS base_longitude FROM 'results.parquet') TO 'results.parquet' (FORMAT PARQUET, COMPRESSION 'SNAPPY');"
# add distance score from 0 - 1 based on meters
uvx duckdb -c "
COPY (
WITH stats AS (
SELECT MAX(distance_meters) as max_dist
FROM 'results.parquet'
)
SELECT
t.*,
(1 - (t.distance_meters / s.max_dist)) AS distance_score
FROM 'results.parquet' AS t, stats s
) TO 'results.parquet' (FORMAT PARQUET);"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment