Categories
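Everyday Applications

The most boring thing I have ever done with R

How does the saying go? When you are bored, go memorize English vocabulary.

So, during one especially boring stretch, I wrote an R program to help me memorize vocabulary. It is basically a GRE word quiz disguised inside RStudio. The principle is plain rote memorization: each round shows one word with four candidate answers and records whether I picked the right one. The next time around, words I got wrong are given priority and show up more often. In effect it is a simple machine learning model that predicts how likely I am to get each word wrong.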
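The original program is not shown in this post, so here is only a minimal sketch of the "ask missed words more often" idea. Everything in it is an assumption for illustration: the word list, the column names, the helpers `error_rate()`, `next_word()` and `record_answer()`, and above all the smoothed error frequency, which stands in for whatever model the real program used.

```r
# Hypothetical word list; the real data and column names are not in the post.
words <- data.frame(
  word  = c("abate", "cogent", "laconic", "obdurate"),
  shown = c(5, 5, 5, 5),  # times each word has been asked
  wrong = c(0, 1, 4, 2),  # times it was answered incorrectly
  stringsAsFactors = FALSE
)

# Assumed stand-in for the "model": a smoothed error frequency,
# (wrong + 1) / (shown + 2), so a never-seen word starts at 0.5.
error_rate <- function(wrong, shown) (wrong + 1) / (shown + 2)

# Draw the next word with probability proportional to its estimated error rate.
next_word <- function(words) {
  p <- error_rate(words$wrong, words$shown)
  words$word[sample.int(nrow(words), size = 1, prob = p)]
}

# Record the outcome of one question and return the updated table.
record_answer <- function(words, word, correct) {
  i <- match(word, words$word)
  words$shown[i] <- words$shown[i] + 1
  if (!correct) words$wrong[i] <- words$wrong[i] + 1
  words
}

set.seed(1)
w     <- next_word(words)
words <- record_answer(words, w, correct = FALSE)
```

Weighted sampling keeps every word in rotation while still favoring the ones that keep going wrong; always picking the single worst word would just show the same word over and over.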

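As for why I did this in R... I use R at work every day, so with the quiz successfully disguised inside the RStudio interface, nobody could tell whether I was slacking off or doing real work. Of course, this is all ancient history... I no longer need to memorize vocabulary, and rote memorization does not really work for a lot of words anyway: if you cannot use a word, you still cannot use it. The words that truly stick are the ones you pick up by reading a lot.

Still, rote memorization is probably unavoidable at some stage. You cannot keep stopping to look up words all the way through an article. So I plan to keep this code. Who knows, maybe my own kid will use it twenty years from now?

Here is a screenshot as a keepsake.

Memorizing GRE words inside RStudio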
Categories
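Reading Notes

An embarrassing story

I had a rather funny interview experience, so I am writing it down.

I was interviewing for a data scientist position at some company...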

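Interviewer: Which language do you usually use?

Me: Mostly R; Python is fine too.

Interviewer: Write an implementation of the xx algorithm (some simple computer science algorithm).

Me: You mean the xxxxx() function?

Interviewer: Write it yourself.

Me: I do not remember it very well. I learned it years ago and forgot it once I had passed the Level 4 exam (I was bored enough to take the Level 4 computer certification exam)... I was not a computer science major and do not write this kind of program much. R is not like Python or C: it has a lot of functions, and most of them can be called directly (what I wanted to say: I rarely deal with things like pointers). I mostly use the statistical functions.

Interviewer: So writing a program, for you, just means calling a function?

(The end.)

Me: .... (That is not what I meant....)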

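Sigh, the sadness of being unable to defend myself. There is no having a friendly conversation with an interviewer from a CS background. I will never again criticize those "data" engineers who throw off-the-shelf statistical models straight at their data... at least they do not need to call a package; they can read a model's pseudocode and write it themselves....

Follow-up: and then I went off to grind LeetCode...

Follow-up 2: the job title "data scientist" has now left me with a psychological scar... I fail every one I interview for, heh.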

Categories
Everyday Applications

Install R on CentOS 6

I followed this guide: http://blogs.helsinki.fi/bioinformatics-viikki/documentation/getting-started-with-r-programming/installingrlatest/#CentOS

Installing the latest R on CentOS:

Add the latest EPEL repository which you can find from here. Don’t forget to add the 64-bit version if you are using a 64-bit OS. I have a CentOS release 5.8, 64 bit (check the Ubuntu installation section of this document if you don’t know your Linux distribution or whether it is 64 or 32 bit) and I used the following script to add the proper repository:

$ sudo rpm -Uvh http://www.nic.funet.fi/pub/mirrors/fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm

Then I got this error:

CentOS 6.3 Instance Giving "Cannot retrieve metalink for repository: epel" Error

To fix it, I followed this page: https://community.hpcloud.com/article/centos-63-instance-giving-cannot-retrieve-metalink-repository-epel-error

Walkthrough Steps

Running this command will update the repo to use HTTP rather than HTTPS:

sudo sed -i "s/mirrorlist=https/mirrorlist=http/" /etc/yum.repos.d/epel.repo

You should then be able to update with this command:

yum -y update

After that I was able to install R:

$ sudo yum install R

Installed:
  R.x86_64 0:3.1.2-1.el6                                                        

Dependency Installed:
  R-core.x86_64 0:3.1.2-1.el6                                                   
  R-core-devel.x86_64 0:3.1.2-1.el6                                             
  R-devel.x86_64 0:3.1.2-1.el6                                                  
  R-java.x86_64 0:3.1.2-1.el6                                                   
  R-java-devel.x86_64 0:3.1.2-1.el6                                             
  blas.x86_64 0:3.2.1-4.el6                                                     
  blas-devel.x86_64 0:3.2.1-4.el6                                               
  bzip2-devel.x86_64 0:1.0.5-7.el6_0                                            
  cups.x86_64 1:1.4.2-67.el6                                                    
  desktop-file-utils.x86_64 0:0.15-9.el6                                        
  fontconfig-devel.x86_64 0:2.8.0-5.el6                                         
  freetype-devel.x86_64 0:2.3.11-14.el6_3.1                                     
  gcc-gfortran.x86_64 0:4.4.7-11.el6                                            
  ghostscript.x86_64 0:8.70-19.el6                                              
  ghostscript-fonts.noarch 0:5.50-23.2.el6                                      
  java-1.6.0-openjdk.x86_64 1:1.6.0.0-11.1.13.4.el6                             
  java-1.6.0-openjdk-devel.x86_64 1:1.6.0.0-11.1.13.4.el6                       
  jline.noarch 0:0.9.94-0.8.el6                                                 
  kpathsea.x86_64 0:2007-57.el6_2                                               
  lapack.x86_64 0:3.2.1-4.el6                                                   
  lapack-devel.x86_64 0:3.2.1-4.el6                                             
  lcms-libs.x86_64 0:1.19-1.el6                                                 
  libRmath.x86_64 0:3.1.2-1.el6                                                 
  libRmath-devel.x86_64 0:3.1.2-1.el6                                           
  libX11-devel.x86_64 0:1.6.0-2.2.el6                                           
  libXau-devel.x86_64 0:1.0.6-4.el6                                             
  libXft-devel.x86_64 0:2.3.1-2.el6                                             
  libXmu.x86_64 0:1.1.1-2.el6                                                   
  libXrender-devel.x86_64 0:0.9.8-2.1.el6                                       
  libXt.x86_64 0:1.1.4-6.1.el6                                                  
  libgfortran.x86_64 0:4.4.7-11.el6                                             
  libicu.x86_64 0:4.2.1-9.1.el6_2                                               
  libicu-devel.x86_64 0:4.2.1-9.1.el6_2                                         
  libxcb-devel.x86_64 0:1.9.1-2.el6                                             
  netpbm.x86_64 0:10.47.05-11.el6                                               
  netpbm-progs.x86_64 0:10.47.05-11.el6                                         
  openjpeg-libs.x86_64 0:1.3-10.el6_5                                           
  pcre-devel.x86_64 0:7.8-6.el6                                                 
  poppler.x86_64 0:0.12.4-3.el6_0.1                                             
  poppler-data.noarch 0:0.4.0-1.el6                                             
  poppler-utils.x86_64 0:0.12.4-3.el6_0.1                                       
  portreserve.x86_64 0:0.0.4-9.el6                                              
  psutils.x86_64 0:1.17-34.el6                                                  
  rhino.noarch 0:1.7-0.7.r2.2.el6                                               
  tcl.x86_64 1:8.5.7-6.el6                                                      
  tcl-devel.x86_64 1:8.5.7-6.el6                                                
  tex-preview.noarch 0:11.85-10.el6                                             
  texinfo.x86_64 0:4.13a-8.el6                                                  
  texinfo-tex.x86_64 0:4.13a-8.el6                                              
  texlive.x86_64 0:2007-57.el6_2                                                
  texlive-dvips.x86_64 0:2007-57.el6_2                                          
  texlive-latex.x86_64 0:2007-57.el6_2                                          
  texlive-texmf.noarch 0:2007-38.el6                                            
  texlive-texmf-dvips.noarch 0:2007-38.el6                                      
  texlive-texmf-errata.noarch 0:2007-7.1.el6                                    
  texlive-texmf-errata-dvips.noarch 0:2007-7.1.el6                              
  texlive-texmf-errata-fonts.noarch 0:2007-7.1.el6                              
  texlive-texmf-errata-latex.noarch 0:2007-7.1.el6                              
  texlive-texmf-fonts.noarch 0:2007-38.el6                                      
  texlive-texmf-latex.noarch 0:2007-38.el6                                      
  texlive-utils.x86_64 0:2007-57.el6_2                                          
  tk.x86_64 1:8.5.7-5.el6                                                       
  tk-devel.x86_64 1:8.5.7-5.el6                                                 
  tmpwatch.x86_64 0:2.9.16-4.el6                                                
  unzip.x86_64 0:6.0-1.el6                                                      
  urw-fonts.noarch 0:2.4-10.el6                                                 
  xdg-utils.noarch 0:1.0.2-17.20091016cvs.el6                                   
  xorg-x11-proto-devel.noarch 0:7.7-9.el6                                       
  xz-devel.x86_64 0:4.999.9-0.5.beta.20091007git.el6                            

Dependency Updated:
  cpp.x86_64 0:4.4.7-11.el6                                                     
  cups-libs.x86_64 1:1.4.2-67.el6                                               
  gcc.x86_64 0:4.4.7-11.el6                                                     
  gcc-c++.x86_64 0:4.4.7-11.el6                                                 
  libgcc.x86_64 0:4.4.7-11.el6                                                  
  libgomp.x86_64 0:4.4.7-11.el6                                                 
  libstdc++.x86_64 0:4.4.7-11.el6                                               
  libstdc++-devel.x86_64 0:4.4.7-11.el6                                         
  xz-libs.x86_64 0:4.999.9-0.5.beta.20091007git.el6                             

Complete!
Categories
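Reading Notes

R vs Python: data frames and fast data wrangling

For various reasons, a lot of my Feedly had gone unread for a long time... Today I made time to catch up, and it seems November's hot topic was dplyr, data.table, or more generally, the various ways to do fast operations on a data.frame.

http://www.r-bloggers.com/dplyr-and-a-very-basic-benchmark/

There is a rather interesting comparison in there, which I am copying here: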

Operation    base    dplyr-df  dplyr-dt  dplyr-dt-k  dt     dt-k
Filter       2       1         1         1           1      1
Sort         30-60   20-30     1.5-3     [1]         1.5-3  [1]
New column   1       1         (6) 4     (6) 4       (4) 1  (4) 1
Aggregation  8-100   4-30      4-6       1.5         1.5-5  1
Join         >100    4-15      4-6       1.5-2.5     -      1

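From the most basic base functions, to dplyr + data.frame, to dplyr + data.table, to dplyr + data.table + key: pretty amazing... I have long relied on two packages for data wrangling, plyr and data.table, and now I can finally see a glimmer of something even more efficient. The author also compared against pandas along the way... is this meant to crush my resolve to use Python more? I keep trying to use a bit more Python, and it looks less and less likely...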

Operation                    pandas  data.table
Aggregate                    1.5     1
Aggregate (keys/pre-sorted)  0.4     0.2
Join                         5.9     -
Join (keys/pre-sorted)       2.1     0.5
Creating keys (sort)         3.7     0.7
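To make the column names in the first table a bit more concrete, here is a rough sketch of one and the same aggregation written in the base, dplyr-df, dt and dt-k styles. The toy data and the names `df`, `g` and `x` are made up for illustration, this is not the benchmark code from the linked post, and the dplyr-on-data.table variants are left out. It assumes the dplyr and data.table packages are installed.

```r
library(dplyr)       # %>%, group_by(), summarise()
library(data.table)  # as.data.table(), setkey()

# Made-up toy data: a grouping column g and a numeric column x.
set.seed(42)
n  <- 1000
df <- data.frame(g = sample(letters[1:5], n, replace = TRUE),
                 x = runif(n),
                 stringsAsFactors = FALSE)

# base: aggregate() on a plain data.frame
res_base <- aggregate(x ~ g, data = df, FUN = sum)

# dplyr-df: dplyr verbs on a data.frame
res_dplyr <- df %>% group_by(g) %>% summarise(x = sum(x))

# dt: native data.table syntax
dt     <- as.data.table(df)
res_dt <- dt[, .(x = sum(x)), by = g]

# dt-k: set a key first (this sorts the table by g), then aggregate
dtk <- as.data.table(df)
setkey(dtk, g)
res_dtk <- dtk[, .(x = sum(x)), by = g]
```

All four give the same per-group sums; what the benchmark compares is how long each style takes on large data, which a toy example this small will not show.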

By the way, could someone push sparse matrices further? I rely on them quite a bit these days...

Categories
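Everyday Applications

Supposedly the most important R invention of 2014...

Today I sat in on a training given by the great Hadley and, for the first time, took a proper look at the pipe... I had a vague impression of it, mainly because someone talked about it at an R conference, but back then all I retained was the name. Only today did I get the chance to look at it properly and think it over. (Rant: is R sometimes a bit too flexible...)

The pipe's tagline: "the pipe operator is one (if not THE) most important innovation introduced, this year, to the R ecosystem." Sounds pretty magical, and it seems to have been borrowed from F#.... R really can be kneaded into any shape.

The short history: once Hadley had dplyr sorted out, the magrittr package started to surface and became all the rage...

And sure enough, someone had already introduced it on COS: Ren Kun had long since gone a step further and built a pipeR package to play with: http://cos.name/2014/04/use-pipeline-operators-in-r/

Then I went and looked at the slides from this May's R conference in Beijing... so it was this good (but I was definitely in Beijing at the time, so what was I off doing... always so slow to catch on).

And sure enough, the COS forum had been discussing it early on. These geeks...

That's all, I am off to study hard. R is forever something you can never finish learning, aaaah! My worldview gets overturned all over again every so often, sigh.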
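For anyone who, like me, only remembered the name: a tiny sketch of what the magrittr pipe `%>%` does, assuming the magrittr package is installed. The vector `x` and the variable names are made up for illustration.

```r
library(magrittr)  # provides the %>% operator

x <- c(1, 4, 9, 16)

# Nested calls: have to be read from the inside out.
nested <- round(mean(sqrt(x)), 1)

# Piped: read left to right; each result is passed on as the
# first argument of the next call.
piped <- x %>% sqrt() %>% mean() %>% round(1)

# Both spellings compute exactly the same thing.
stopifnot(identical(nested, piped))
```

The pipe adds no new computation; it only changes the order in which the steps are written, which is most of the appeal once a chain of data-wrangling steps gets long.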