You should check out PowerShell; it supports converting CSV into in-memory structured data, and then you can run regular PowerShell queries on that data (note: the leading-pipe line breaks below require PowerShell 7+):

    $> csvData = @"
       Name,Department,Salary
       John Doe,IT,60000
       Jane Smith,Finance,75000
       Alice Johnson,HR,65000
       Bob Anderson,IT,71000
       "@;
    
    $> $csvData
         | ConvertFrom-Csv
         | Select Name, Salary
         | Sort Salary -Descending
    
       Name          Salary
       ----          ------
       Jane Smith    75000
       Bob Anderson  71000
       Alice Johnson 65000
       John Doe      60000
You can then convert the results back into CSV by piping them into ConvertTo-Csv:

    $> $csvData
         | ConvertFrom-Csv
         | Select Name, Salary
         | Sort Salary -Descending
         | ConvertTo-Csv
         
       "Name","Salary"
       "Jane Smith","75000"
       "Bob Anderson","71000"
       "Alice Johnson","65000"
       "John Doe","60000"


nushell does too:

    /tmp/> "Name,Department,Salary
    ::: John Doe,IT,60000
    ::: Jane Smith,Finance,75000
    ::: Alice Johnson,HR,65000
    ::: Bob Anderson,IT,71000" |
    :::     from csv | 
    :::     select Name Salary | 
    :::     sort-by -r Salary
    ╭───┬───────────────┬────────╮
    │ # │     Name      │ Salary │
    ├───┼───────────────┼────────┤
    │ 0 │ Jane Smith    │  75000 │
    │ 1 │ Bob Anderson  │  71000 │
    │ 2 │ Alice Johnson │  65000 │
    │ 3 │ John Doe      │  60000 │
    ╰───┴───────────────┴────────╯


For something a bit more robust, check out DuckDB. It's a library you can embed; use it to run SQL on local files, connect to databases, do joins, analytics, etc.
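
For example, from the shell, a sketch assuming the rows above are saved in a hypothetical employees.csv (DuckDB auto-detects the CSV schema when you query the file path directly):

    $ duckdb -c "SELECT Name, Salary FROM 'employees.csv' ORDER BY Salary DESC"  # -c runs one query and exits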


Agreed. The article mentioned DuckDB and I'm here to give the use of DuckDB a wholehearted thumbs-up. If you like treating public CSV files as data sources you can query or cross-query, DuckDB is the tool for you. Just follow the demo on the DuckDB website and you'll be wowed for sure.
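
A sketch of the cross-querying, with a placeholder URL (recent DuckDB builds autoload the httpfs extension for https paths; older ones need INSTALL httpfs; LOAD httpfs; first):

    -- https://example.com/employees.csv is a placeholder URL
    SELECT Department, avg(Salary) AS avg_salary
    FROM 'https://example.com/employees.csv'
    GROUP BY Department;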


I use both, and I've found it helpful to have nushell around when munging CSV and Parquet files, even when working with duckdb: it's quicker to ask for the first few rows of a thing or do simple stuff in nushell, then do the deeper analysis with duckdb.

They're a powerful pairing
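
e.g. the quick peek in nushell is just one pipeline (employees.csv is a hypothetical file):

    /tmp/> open employees.csv | first 3  # open parses CSV by file extension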


Crazy how similar the commands are.


qsv (https://github.com/jqnatividad/qsv) also has a sqlp command which lets you run Polars SQL queries on CSV(s).

Here I'll:

  - Send the CSV data on stdin (via echo; it's referred to in the command by -).
  - Refer to the data in the query as stdin. You can also use the _t_N syntax (the first table is _t_1, then _t_2, etc.), or, when reading from files, the file name minus the .csv extension.
  - Pipe the output to the table command for formatting.
  - Note that the shape of the result (the (4, 2) below) is printed to stderr.

  $ echo 'Name,Department,Salary
  John Doe,IT,60000
  Jane Smith,Finance,75000
  Alice Johnson,HR,65000
  Bob Anderson,IT,71000' |
    qsv sqlp - 'SELECT Name, Salary FROM stdin ORDER BY Salary DESC' |
    qsv table
    
    (4, 2)
    Name           Salary
    Jane Smith     75000
    Bob Anderson   71000
    Alice Johnson  65000
    John Doe       60000


You do realize the article did this as well? I.e., in-memory data and auto-inference.



