Skip to content
Snippets Groups Projects
README.Rmd 6.35 KiB
Newer Older
Bergant's avatar
Bergant committed
---
title: finstr - Financial Statements in R 
output:
  html_document:
    keep_md: yes
---


```{r, echo=FALSE, results='hide', message=FALSE }
library(dplyr)
library(tidyr)
library(finstr)
data(xbrl_data_aapl2013)
data(xbrl_data_aapl2014)
```

**Warning: finstr package is in development. 
Please use with caution.**

The purpose of finstr package is to use financial statements 
data in more structured form and process.
For now it is offering:

1. Data structure for financial statements in tidy and usable format
2. Function to merge two reporting periods in single object
3. Some helper functions to help explore and manipulate the data in the 
structure

The idea in long term is to create an environment for reproducible financial 
statement analysis. With existing packages like XBRL for XBRL parsing, 
dplyr for data manipulation and knitr for reproducible research, this 
shouldn't be a long journey.


## Get data
Use XBRL package or `xbrl_parse_min` function to parse XBRL files. For example:
```{r xbrl_parse_min, eval=FALSE, echo=TRUE}
library(finstr)
# parse XBRL (Apple 10-K report)
xbrl_url2014 <- 
  "http://edgar.sec.gov/Archives/edgar/data/320193/000119312514383437/aapl-20140927.xml"
xbrl_url2013 <- 
  "http://edgar.sec.gov/Archives/edgar/data/320193/000119312513416534/aapl-20130928.xml"
xbrl_data_aapl2014 <- xbrl_parse_min(xbrl_url2014)
xbrl_data_aapl2013 <- xbrl_parse_min(xbrl_url2013)
```

With `xbrl_get_statements` convert XBRL data to *statements* object. 
```{r xbrl_get_statements}
st2013 <- xbrl_get_statements(xbrl_data_aapl2013)
st2014 <- xbrl_get_statements(xbrl_data_aapl2014)
st2014
```

Statements object is a list of 
several statement objects (ballance sheets, income and cash 
flow statements) which are data frames with elements as columns and periods
as rows. 
To get a single *statement* use *statements* object as a regular R list:
```{r statement}
balance_sheet2013 <- st2013$StatementOfFinancialPositionClassified
balance_sheet2014 <- st2014$StatementOfFinancialPositionClassified
balance_sheet2014
```

Only terminal (lowest level) concepts and values are kept in statement
object's columns. 
Information about hierarchical definition of higher order concepts is stored 
as an attribute to the statement object.

To see the calculation hierarchy of elements use `get_relations`: 
```{r relations}
get_relations(balance_sheet2014)
```

To query the fundamental elements from higher order elements use
`get_elements`:
```{r elements}
get_elements(balance_sheet2014, parent_id = "LiabilitiesCurrent", as_data_frame = T)
```

## Merge statements from different periods
Use `merge` function to create single financial statement data from two 
statements. 
```{r merge_statement}
balance_sheet <- merge( balance_sheet2013, balance_sheet2014 )
```

The structure of merged balance sheets may differ because of taxonomy change. 
Function `merge` takes care of structure change by expanding the element 
hierarchy to capture the elements and their relations of both statements. 
The values of any missing elements is set to 0.

To merge all statements from *statements* object use merge on statements objects:
```{r merge_statements}
# merge all statements
st_all <- merge( st2013, st2014 )
# check if statement of income is merged:
balance_sheet <- st_all$StatementOfFinancialPositionClassified
```

## Prepare data with higher order concepts
### Simple example
To get the higher order values in hierarhcy we have to sum the fundamental 
element values. Function `expose` does it for us:
```{r expose1}
library(dplyr)

balance_sheet %>%
  expose("Assets",
         "Liabilities",
         "CommintmentsAndContingencies",
         "StockholdersEquity")
  
```

We could define new names for elements. Let say we would like to see *contingencies* 
and *equity* summed up in the liabilities element:
```{r expose2}
balance_sheet %>%
  expose("Assets",
         Liabilities = c("Liabilities", 
                         "CommintmentsAndContingencies",
                         "StockholdersEquity"))
```

### Using other
Function `other` sums everything not yet covered inside a higher order element.
To split the assets to current and non-current we can define non-current assets
as other assets after we "used" current assets:
```{r other1}
balance_sheet %>%
  expose("AssetsCurrent",
         NonCurrentAssets = other("Assets"),
         Liabilities = other())
```

Note that we used `other` without element definition for the rest of the balance
sheet. In this case `other()` results in sum of everything not already
used.

### Without
Sometimes we need a substraction of concepts. For example:
```{r without1}
balance_sheet %>%
  expose(
    NonCurrentAssets = "Assets" %without% "AssetsCurrent",
    CurrentAssets = other("Assets")
  )
```

It is possible to substract several elements. For example:
```{r without2}
balance_sheet %>%
  expose( 
    TangibleAssets =
      "Assets" %without% c("Goodwill","IntangibleAssetsNetExcludingGoodwill"),
    IntangibleAssets = other("Assets")
  ) 
```


## Calculate new values and ratios
Statement object (in our case `balance_sheet`) is also a data frame object.
With elements (or concepts) as columns and time periods as rows.
It is possible then to use statement as a data frame:

Lets calculate current ratio which is defined by

$$ Current Ratio = \frac{Current Assets}{Current Liabilities} $$

```{r current_ratio}
library(dplyr)

balance_sheet %>%
  expose("AssetsCurrent", "LiabilitiesCurrent") %>%
  mutate(CurrentRatio = AssetsCurrent / LiabilitiesCurrent) %>%
  select(endDate, CurrentRatio)

```

If we need a period average value we can use a `lag` function.
For example, to calculate *DSO* (days sales outstanding) over longer periods
the average of account receivable is compared to net sales.

We will use the formula for yearly statements:

$$ DSO = \frac{Average Accounts Receivable}{Sales Revenue} \times 365 $$

In this case we need to connect two type of statements: balance sheets and
income statements. With matching reporting periods it can be accomplished 
with joining two data frames:

```{r DaysSalesOutstanding}

balance_sheet %>%
  inner_join( st_all$StatementOfIncome, by = "endDate") %>%
  mutate( 
    AccountReceivableLast = lag(AccountsReceivableNetCurrent),
    AccountReceivableAvg = (AccountReceivableLast+AccountsReceivableNetCurrent)/2,
    DaysSalesOutstanding = AccountReceivableAvg / SalesRevenueNet * 365 
  ) %>%
  select(endDate, DaysSalesOutstanding) %>%
  na.omit()

```