errors:
Error Propagation for R Vectors

Iñaki Úcar
Dpto. Ingeniería Telemática
Universidad Carlos III de Madrid

22 de mayo de 2017

Introducción

Definiciones

Fuentes de error: sistemático vs. aleatorio
Medida directa: tensión, corriente
- Error observacional, resolución
- Incertidumbre de la medida
Medida indirecta: potencia = tensión \(\times\) corriente
- Propagación de la incertidumbre

Propagación del error (1)

En general, \(Y = f(X_1, ..., X_n)\)
Primera aproximación: \(X_1, ..., X_n\) normales
Segunda aproximación: linealidad

Propagación del error

Propagación del error (2)

Ley general (a primer orden)

\[\Sigma_Y = J_X \Sigma_X J_X^T\]

Tercera aproximación: independencia

\[\Delta Y = \left[\sum_i \left(\frac{\partial f}{\partial x_i}\right)^2\cdot \left(\Delta x_i\right)^2\right]^{1/2}\]

errors

Motivación: ejemplo práctico

Tensión superficial

\[w^2 = \sqrt{\frac{\sigma}{\rho}}k^3, \quad k = \frac{\pi}{n\lambda}\frac{c_n}{a}\sin\left(\frac{b}{a}\right)\]

\(f\) [Hz]	\(n\)	\(c_n\) [mm]	\(\omega\) [rad/s]	\(\omega^2\)	\(k\) [rad/cm]	\(k^3\)
100(1)	1	11(1)	628(6)	3.95(8)e5	19(2)	6000(2000)
100(1)	2	22(1)	628(6)	3.95(9)e5	19(2)	6000(2000)
110(1)	1	12(1)	691(6)	4.78(8)e5	20.3(9)	8400(900)
110(1)	2	24(1)	691(6)	4.78(9)e5	20.3(9)	8000(1000)

Motivación: resumen

Proceso tedioso
Propenso a errores (\(\leftarrow\) chistaco)
Mostrar resultados es muy laborioso
Idea: tratamiento automático + pretty printing en R

library(errors)

x <- 1:10
errors(x) <- 0.1
x

## errors: 0.1 0.1 0.1 0.1 0.1 ...
##  [1]  1  2  3  4  5  6  7  8  9 10

class(x); attr(x, class(x))

## [1] "errors"

##  [1] 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1 0.1

x <- 1:10
(x <- set_errors(x, x * 1/30))

## errors: 0.0333333333333333 0.0666666666666667 0.1 0.133333333333333 0.166666666666667 ...
##  [1]  1  2  3  4  5  6  7  8  9 10

df <- as.data.frame(x); (df$`3x` <- 3*x)

## errors: 0.1 0.2 0.3 0.4 0.5 ...
##  [1]  3  6  9 12 15 18 21 24 27 30

(df$`x^2` <- x^2)

## errors: 0.0666666666666667 0.266666666666667 0.6 1.06666666666667 1.66666666666667 ...
##  [1]   1   4   9  16  25  36  49  64  81 100

(df$`sin(x)` <- sin(x))

## errors: 0.0180100768622713 0.0277431224364762 0.0989992496600445 0.0871524827818149 0.0472770309105377 ...
##  [1]  0.8414710  0.9092974  0.1411200 -0.7568025 -0.9589243 -0.2794155
##  [7]  0.6569866  0.9893582  0.4121185 -0.5440211

(df$`cumsum(x)` <- cumsum(x))

## errors: 0.0333333333333333 0.074535599249993 0.124721912892465 0.182574185835055 0.247206616236522 ...
##  [1]  1  3  6 10 15 21 28 36 45 55

df

##          x      3x     x^2   sin(x) cumsum(x)
## 1  1.00(3)  3.0(1) 1.00(7)  0.84(2)   1.00(3)
## 2  2.00(7)  6.0(2)  4.0(3)  0.91(3)   3.00(7)
## 3   3.0(1)  9.0(3)  9.0(6) 0.14(10)    6.0(1)
## 4   4.0(1) 12.0(4)   16(1) -0.76(9)   10.0(2)
## 5   5.0(2) 15.0(5)   25(2) -0.96(5)   15.0(2)
## 6   6.0(2) 18.0(6)   36(2)  -0.3(2)   21.0(3)
## 7   7.0(2) 21.0(7)   49(3)   0.7(2)   28.0(4)
## 8   8.0(3) 24.0(8)   64(4)  0.99(4)   36.0(5)
## 9   9.0(3) 27.0(9)   81(5)   0.4(3)   45.0(6)
## 10 10.0(3)   30(1)  100(7)  -0.5(3)   55.0(7)

options(errors.notation = "plus-minus")
df

##                x           3x           x^2         sin(x)     cumsum(x)
## 1  1.00 +/- 0.03  3.0 +/- 0.1 1.00 +/- 0.07  0.84 +/- 0.02 1.00 +/- 0.03
## 2  2.00 +/- 0.07  6.0 +/- 0.2   4.0 +/- 0.3  0.91 +/- 0.03 3.00 +/- 0.07
## 3    3.0 +/- 0.1  9.0 +/- 0.3   9.0 +/- 0.6   0.14 +/- 0.1   6.0 +/- 0.1
## 4    4.0 +/- 0.1 12.0 +/- 0.4      16 +/- 1 -0.76 +/- 0.09  10.0 +/- 0.2
## 5    5.0 +/- 0.2 15.0 +/- 0.5      25 +/- 2 -0.96 +/- 0.05  15.0 +/- 0.2
## 6    6.0 +/- 0.2 18.0 +/- 0.6      36 +/- 2   -0.3 +/- 0.2  21.0 +/- 0.3
## 7    7.0 +/- 0.2 21.0 +/- 0.7      49 +/- 3    0.7 +/- 0.2  28.0 +/- 0.4
## 8    8.0 +/- 0.3 24.0 +/- 0.8      64 +/- 4  0.99 +/- 0.04  36.0 +/- 0.5
## 9    9.0 +/- 0.3 27.0 +/- 0.9      81 +/- 5    0.4 +/- 0.3  45.0 +/- 0.6
## 10  10.0 +/- 0.3     30 +/- 1     100 +/- 7   -0.5 +/- 0.3  55.0 +/- 0.7

sum(x)

## 55.0 +/- 0.7

print(mean(x), digits=3)

## 5.500 +/- 0.957

options(errors.notation = "parenthesis")
library(dplyr)

iris_e <- iris %>%
  mutate_at(vars(-Species), funs(set_errors(., .*0.02)))

head(iris_e)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1       5.1(1)     3.50(7)      1.40(3)    0.200(4)  setosa
## 2     4.90(10)     3.00(6)      1.40(3)    0.200(4)  setosa
## 3      4.70(9)     3.20(6)      1.30(3)    0.200(4)  setosa
## 4      4.60(9)     3.10(6)      1.50(3)    0.200(4)  setosa
## 5       5.0(1)     3.60(7)      1.40(3)    0.200(4)  setosa
## 6       5.4(1)     3.90(8)      1.70(3)    0.400(8)  setosa

plot(iris_e$Sepal.Length, iris_e$Sepal.Width, col=iris_e$Species)
legend(6.2, 4.4, unique(iris_e$Species), col=1:length(iris_e$Species), pch=1)

ggplot2::ggplot(iris_e, aes(Sepal.Length, Sepal.Width, color=Species)) + 
  geom_point() +
  geom_errorbar(aes(ymin=errors_min(Sepal.Width), ymax=errors_max(Sepal.Width))) +
  geom_errorbarh(aes(xmin=errors_min(Sepal.Length), xmax=errors_max(Sepal.Length)))

Discusión adicional

`x + x` no es `2*x`

x + x

## errors: 0.0471404520791032 0.0942809041582063 0.14142135623731 0.188561808316413 0.235702260395516 ...
##  [1]  2  4  6  8 10 12 14 16 18 20

2*x

## errors: 0.0666666666666667 0.133333333333333 0.2 0.266666666666667 0.333333333333333 ...
##  [1]  2  4  6  8 10 12 14 16 18 20

Integración

Tipos de datos
- Diseñado para operaciones vectoriales
  - S3 Math, Ops, Summary
  - [, [[, c, rep (¿cbind, rbind?)
- data.frame, tibble ok
- Sin soporte completo para matrix
R en general es agresivo con los atributos
Otras herramientas
- No he probado data.table
- dplyr tiene soporte parcial de atributos
Caso particularmente peliagudo: agregaciones

Media por especie (1)

aggregate(. ~ Species, data = iris_e, mean)

##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa        5.006       3.428        1.462       0.246
## 2 versicolor        5.936       2.770        4.260       1.326
## 3  virginica        6.588       2.974        5.552       2.026

tapply(iris_e$Sepal.Length, iris_e$Species, mean, simplify=FALSE)

## $setosa
## 5.0(1)
## 
## $versicolor
## 5.9(1)
## 
## $virginica
## 6.6(1)

Media por especie (2)

by(iris_e, iris_e$Species, function(i) {
  i$Species <- NULL
  do.call(c, lapply(i, mean))
})

## iris_e$Species: setosa
## errors: 0.10012 0.06856 0.02924 0.0149037729779943
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        5.006        3.428        1.462        0.246 
## -------------------------------------------------------- 
## iris_e$Species: versicolor
## errors: 0.11872 0.0554 0.0852 0.0279664522058053
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        5.936        2.770        4.260        1.326 
## -------------------------------------------------------- 
## iris_e$Species: virginica
## errors: 0.13176 0.05948 0.11104 0.04052
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##        6.588        2.974        5.552        2.026

Media por especie (3)

by_attr_friendly <- function(df, by, func) {
  data.frame(lapply(by(df, df[by], function(i) {
    i[[by]] <- NULL
    do.call(c, lapply(i, mean))
  }), function(i) i))
}

by_attr_friendly(iris_e, "Species", mean)

##               setosa versicolor virginica
## Sepal.Length  5.0(1)     5.9(1)    6.6(1)
## Sepal.Width  3.43(7)    2.77(6)   2.97(6)
## Petal.Length 1.46(3)    4.26(9)    5.6(1)
## Petal.Width  0.25(1)    1.33(3)   2.03(4)

Media por especie (4)

iris_e_sum <- iris_e %>%
  group_by(Species) %>%
  summarise_at(vars(-Species), mean)

iris_e_sum

## Error in if (digits < 0L) digits <- 6L else {: valor ausente donde TRUE/FALSE es necesario

iris_e_sum$Sepal.Width

## errors: 0.0611466666666667
## [1] 3.428 2.770 2.974

errors(mean(filter(iris_e, Species=="setosa")$Sepal.Width))

## [1] 0.06114667

issue

Media por especie (5)

aggregate_attr_friendly <- function(df, by, func) {
  cols <- colnames(df)[!colnames(df) %in% by]
  new_df <- data.frame(lapply(cols, function(i) {
    do.call(c, lapply(levels(df[[by]]), function(j) {
      func(df[[i]][df[[by]]==j])
    }))
  }))
  colnames(new_df) <- cols
  new_df[[by]] <- levels(df[[by]])
  new_df
}

aggregate_attr_friendly(iris_e, "Species", mean)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1       5.0(1)     3.43(7)      1.46(3)     0.25(1)     setosa
## 2       5.9(1)     2.77(6)      4.26(9)     1.33(3) versicolor
## 3       6.6(1)     2.97(6)       5.6(1)     2.03(4)  virginica

errors: Error Propagation for R Vectors

Iñaki ÚcarDpto. Ingeniería TelemáticaUniversidad Carlos III de Madrid