---
title: "Getting started with grumpy"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with grumpy}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(grumpy)
```


## Motivation

`grumpy` lets users read a wide variety of `.npy` and `.npz` files in R. These formats are commonly used in Python: a `.npy` file stores a single NumPy array, and a `.npz` file is an archive of several arrays that may or may not be compressed. By providing a convenient interface for reading these files, `grumpy` gives R users access to data saved in these formats and facilitates interoperability between Python and R.

We envision that users may want to perform some steps of their data analysis in Python and others in R. It is therefore important to be able to read and write files in both languages.

Note, however, that for large datasets we would usually recommend more advanced and performant formats such as Zarr. Zarr datasets are supported, for example, by the `{Rarr}` Bioconductor package.

## Using grumpy

Most users will mainly need `grumpy::read_npy()` and `grumpy::read_npz()`, which read `.npy` and `.npz` files, respectively. These functions return R objects that are equivalent to the original NumPy arrays, so the data can be manipulated and analyzed directly in R.

```{r}
read_npy(system.file("extdata", "test_2d.npy", package = "grumpy"))
```

### Structured datatypes

One notable case is structured datatypes, where each element of the array is a record with named fields.
To keep the output consistent and as conceptually close as possible to the original NumPy array, `grumpy` returns a list of lists, with a `dim` attribute to preserve the original shape of the array.

The result behaves like a standard R array, but each element is a list holding the fields of the corresponding record.
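
To illustrate how a list with a `dim` attribute can be indexed, here is a minimal sketch in base R. The `records` object and its `id` and `score` fields are hypothetical stand-ins built by hand, not actual `grumpy` output; the exact structure returned for a given file may differ.

```{r}
# Hypothetical stand-in for a 2 x 2 structured array: each element is a
# record (a named list) with assumed fields `id` and `score`.
records <- list(
  list(id = 1L, score = 0.5),
  list(id = 2L, score = 1.5),
  list(id = 3L, score = 2.5),
  list(id = 4L, score = 3.5)
)
dim(records) <- c(2, 2)

# The shape is carried by the dim attribute, as for any R array.
dim(records)

# Double brackets extract a single record; `$` then selects one field.
records[[1, 2]]$score

# Single brackets subset the array and return a list of records.
records[1, ]
```

As with any R array, the elements are filled in column-major order, so `records[[1, 2]]` is the third record in the list above.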

Note that in many cases this representation is not efficient for downstream analysis, and users may want to convert the output to a more standard R data structure such as a `data.frame` or `data.table` for easier manipulation.
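
One possible way to do such a conversion in base R is sketched below. It assumes a one-dimensional array whose records all share the same scalar fields; the `recs` object and its `id` and `score` fields are made up for illustration and are not produced by `grumpy`.

```{r}
# Hypothetical 1-D structured array with assumed scalar fields `id` and `score`.
recs <- list(
  list(id = 1L, score = 0.5),
  list(id = 2L, score = 1.5),
  list(id = 3L, score = 2.5)
)
dim(recs) <- 3

# Turn each record into a one-row data.frame, then stack the rows.
df <- do.call(rbind, lapply(recs, as.data.frame))
df
```

This approach flattens the array, so for multi-dimensional input the original shape is lost unless the indices are stored as extra columns. It also copies every record, which may be slow for large arrays.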



