{"id":113,"date":"2014-09-02T19:10:54","date_gmt":"2014-09-02T19:10:54","guid":{"rendered":"http:\/\/www.jamesrising.net\/blog\/?p=113"},"modified":"2014-09-02T19:10:54","modified_gmt":"2014-09-02T19:10:54","slug":"simple-robust-standard-errors-library-for-r","status":"publish","type":"post","link":"http:\/\/www.jamesrising.net\/blog\/?p=113","title":{"rendered":"Simple Robust Standard Errors Library for R"},"content":{"rendered":"<p>R doesn&#8217;t have a built-in method for calculating heteroskedastically-robust and clustered standard errors. Since this is a common need for us, I&#8217;ve put together a little library to provide these functions.<\/p>\n<p>Download the <a href=\"http:\/\/ift.tt\/1x76MHi\" download=\"tools_reg.R\">library<\/a>.<\/p>\n<p>The file contains three functions:<\/p>\n<ul>\n<li><code>get.coef.robust<\/code>: Calculate heteroskedastically-robust standard errors<\/li>\n<li><code>get.coef.clust<\/code>: Calculate clustered standard errors<\/li>\n<li><code>get.conf.clust<\/code>: Calculate confidence intervals for clustered standard errors<\/li>\n<\/ul>\n<p>The arguments for the functions are below, and then an example is at the end.<\/p>\n<p><b>get.coef.robust(mod, var=c(), estimator=&#8221;HC&#8221;)<\/b><br \/>\n<i>Calculate heteroskedastically-robust standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\n     e.g., lm(satell ~ width + factor(color))\nvar: a list of indexes to extract only certain coefficients (default: all)\n     e.g., 1:2 [to drop FE from above]\nestimator: an estimator type passed to vcovHC (default: White's estimator)\nReturns an object of type coeftest, with estimate, std. err, t and p values\n<\/pre>\n<p><b>get.coef.clust(mod, cluster, var=c())<\/b><br \/>\n<i>Calculate clustered standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\n     e.g., lm(satell ~ width + factor(color))\ncluster: a cluster variable, with a length equal to the observation count\n     e.g., color\nvar: a list of indexes to extract only certain coefficients (default: all)\n     e.g., 1:2 [to drop FE from above]\nReturns an object of type coeftest, with estimate, std. err, t and p values\n<\/pre>\n<p><b>get.conf.clust(mod, cluster, xx, alpha, var=c())<\/b><br \/>\n<i>Calculate confidence intervals for clustered standard errors<\/i><\/p>\n<pre>mod: the result of a call to lm()\n     e.g., lm(satell ~ width + factor(color))\ncluster: a cluster variable, with a length equal to the observation count\n     e.g., color\nxx: new values for each of the variables (only needs to be as long as 'var')\n     e.g., seq(0, 50, length.out=100)\nalpha: the level of confidence, as an error rate\n     e.g., .05 [for 95% confidence intervals]\nvar: a list of indexes to extract only certain coefficients (default: all)\n     e.g., 1:2 [to drop FE from above]\nReturns a data.frame of yhat, lo, and hi\n<\/pre>\n<h3>Example in Stata and R<\/h3>\n<p>The following example compare the results from Stata and R for an example from section 3.3.2 of &#8220;An Introduction to Categorical Data Analysis&#8221; by Alan Agresti.  The data describes 173 female horseshoe crabs, and the number of male &#8220;satellites&#8221; that live near them.<\/p>\n<p>Download the <a href=\"http:\/\/ift.tt\/1x76MHk\" download=\"data.csv\">data as a CSV file<\/a>.  The columns are the crabs color (2-5), spine condition (1-3), carapace width (cm), number of satellite crabs, and weight (kg).<\/p>\n<p>The code below calculates a simple model with crab color fixed-effects, relating width (which is thought to be an explanatory variable) and number of satellites.  The first model is a straight OLS regression; then we add robust errors, and finally crab color clustered errors.<\/p>\n<p>The analysis in <b>Stata<\/b>.<\/p>\n<p>First, let&#8217;s do the analysis in Stata, to make sure that we can reproduce it in R.<\/p>\n<p>import delim data.csv<\/p>\n<p>xi: reg satell width i.color<br \/>\nxi: reg satell width i.color, vce(hc2)<br \/>\nxi: reg satell width i.color, vce(cluster color)<\/p>\n<p>Here are excerpts of the results.<\/p>\n<p>The OLS model:<\/p>\n<pre>\n------------------------------------------------------------------------------\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n-------------+----------------------------------------------------------------\n       width |   .4633951   .1117381     4.15   0.000     .2428033    .6839868\n       _cons |  -8.412887   3.133074    -2.69   0.008    -14.59816   -2.227618\n------------------------------------------------------------------------------\n<\/pre>\n<p>The robust errors:<\/p>\n<pre>\n------------------------------------------------------------------------------\n             |             Robust HC2\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n-------------+----------------------------------------------------------------\n       width |   .4633951   .1028225     4.51   0.000     .2604045    .6663856\n       _cons |  -8.412887   2.939015    -2.86   0.005    -14.21505   -2.610726\n------------------------------------------------------------------------------\n<\/pre>\n<p>Note that we&#8217;ve used HC2 robust errors, to provide a direct comparison with the R results.<\/p>\n<p>The clustered errors.<\/p>\n<pre>\n------------------------------------------------------------------------------\n             |               Robust\n      satell |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]\n-------------+----------------------------------------------------------------\n       width |   .4633951   .0466564     9.93   0.002     .3149135    .6118767\n       _cons |  -8.412887   1.258169    -6.69   0.007    -12.41694   -4.408833\n------------------------------------------------------------------------------\n<\/pre>\n<p>The robust and clustered errors are a little tighter for this example.<\/p>\n<p>The analysis in <b>R<\/b>.<\/p>\n<p>Here&#8217;s the equivalent code in R.<\/p>\n<pre class=\"brush: r; collapse: false; title: ; wrap-lines: false; notranslate\">\nsource(&quot;tools_reg.R&quot;)\ntbl &lt;- read.csv(&quot;data.csv&quot;)\n\nmod &lt;- lm(satell ~ width + factor(color), data=tbl)\n\nsummary(mod)\nget.coef.robust(mod, estimator=&quot;HC2&quot;)\nget.coef.clust(mod, tbl$color)\n<\/pre>\n<p>And excerpts of the results.  The OLS model:<\/p>\n<pre>\n               Estimate Std. Error t value Pr(>|t|)    \n(Intercept)     -8.4129     3.1331  -2.685  0.00798 ** \nwidth            0.4634     0.1117   4.147 5.33e-05 ***\n<\/pre>\n<p>The coefficient estimates will be the same for all three models.  Stata gives us <code>-8.412887 + .4633951 w<\/code> and R gives us <code>-8.4129 + 0.4634 w<\/code>.  These are identical, but with different reporting precision, as are the standard errors.<\/p>\n<p>The robust errors:<\/p>\n<pre>\n(Intercept)    -8.41289    2.93902 -2.8625  0.004739 ** \nwidth           0.46340    0.10282  4.5067 1.229e-05 ***\n<\/pre>\n<p>Again, we use HC2 robust errors.  Stata provides 2.939015 and .1028225 for the errors, matching these exactly.<\/p>\n<p>The clustered errors:<\/p>\n<pre>\n                Estimate Std. Error  t value  Pr(>|t|)    \n(Intercept)    -8.412887   1.258169  -6.6866 3.235e-10 ***\nwidth           0.463395   0.046656   9.9321 < 2.2e-16 ***\n<\/pre>\n<p>Stata gives the exact same values as R.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>R doesn&#8217;t have a built-in method for calculating heteroskedastically-robust and clustered standard errors. Since this is a common need for us, I&#8217;ve put together a little library to provide these functions. Download the library. The file contains three functions: get.coef.robust: Calculate heteroskedastically-robust standard errors get.coef.clust: Calculate clustered standard errors get.conf.clust: Calculate confidence intervals for clustered &hellip; <a href=\"http:\/\/www.jamesrising.net\/blog\/?p=113\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Simple Robust Standard Errors Library for R<\/span> <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[10,14,15],"class_list":["post-113","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-ifttt","tag-james-rising","tag-pro"],"_links":{"self":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/113","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=113"}],"version-history":[{"count":1,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/113\/revisions"}],"predecessor-version":[{"id":114,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=\/wp\/v2\/posts\/113\/revisions\/114"}],"wp:attachment":[{"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=113"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=113"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.jamesrising.net\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=113"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}