Categories
Reading Reflections

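What Do You Actually Learn in College?

I have been turning this question over lately: after spending so much time in school, what did I actually learn?

Knowledge is something most people can pick up if they are willing to put in the time. Getting through exams is even less of a challenge.

Yet the transcript is filled with nothing but knowledge, course after course. Just looking at it is tiring.

Apart from knowledge, what else did I learn in school? Perhaps it is more about cultivating temperament: developing curiosity, developing an interest in exploring things, and staying broadly open to the intellectual jolts of every field. Come to think of it, once you start working, a great many things can be learned on the spot as needed, and none of it is all that hard.

A while ago I was reading about the American LAC (Liberal Arts College) model of education, which aims to cultivate an elite temperament. I have been lucky enough to meet some graduates of top LACs, and they do seem a cut above in bearing.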

A "liberal arts" institution can be defined as a "college or university curriculum aimed at imparting broad general knowledge and developing general intellectual capacities, in contrast to a professional, vocational, or technical curriculum."

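The further you go, the more this kind of accumulated depth outweighs coursework knowledge and carries you forward. My own college years really lacked time for that. I was needlessly force-fed far too much, pushed by GPA into competing for grades, and missed out on far too much breadth and depth of thought. And of all that knowledge, once the exams were passed, how much is still useful today? Very little.

Back to languages. When I was learning Spanish, many people said that once you have learned two or more Romance languages, the rest come easily. I now firmly agree, and the same goes for programming languages. Having become fluent in R and Matlab, with some basics in C and PHP, I find Python really not difficult at all. I suspect Java would not take much effort either.

I have tried to convince countless people around me that mathematics is also a language (statistics is not; it is a way of thinking that can be expressed in many languages). All those formulas really express people's pursuit of logical reasoning taken to its extreme. Courses that look complicated and profound mostly yield to the old saying: read a book a hundred times and its meaning reveals itself.

That is as far as my thoughts go, so I will stop here. Yes, I do somewhat regret the time that slipped by so quickly.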


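A First Try at Python

Today, out of sheer boredom, I decided to give Python a try. I found a problem, which goes roughly as follows:

  • Given an input string, find its most beautiful non-repeating substring.
  • Substring: anything you can get by deleting some characters from the original string.
  • Non-repeating string: a string with no repeated characters.
  • A is more beautiful than B: A is longer than B, or A comes after B in lexicographic order.

Since no character may repeat, the longest candidates all contain each distinct character exactly once and so have the same length; the length comparison never decides anything among them. The task therefore reduces to finding the lexicographically largest among the non-repeating substrings of that common length.

I wrote this in R first, in order to work out a valid algorithm. The basic idea is a brute-force, level-by-level recursion, written here as a loop that picks one character per pass.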

x = 'nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle'

x_split = strsplit(x,split="")[[1]]
unique_x = unique(x_split) 
unique_x_order = sort(unique_x,decreasing=T) 
x_remain = character() 

# Greedily pick the largest character that can be kept at each step.

# Initialize
current_string = x_split
current_order = unique_x_order
while (length(x_remain) < length(unique_x))
{
  for (i in seq_along(current_order))
  {
    ch = current_order[i]
    index = which(current_string == ch)
    # Suffix starting at the first occurrence of the candidate
    sub_string = current_string[min(index):length(current_string)]
    # Keep the candidate only if no character is lost by dropping the prefix
    if (length(setdiff(unique(current_string), unique(sub_string))) == 0)
    {
      x_remain = c(x_remain, ch)
      # Drop the prefix and every occurrence of the chosen character
      current_string = current_string[-c(1:min(index), index)]
      current_order = sort(unique(current_string), decreasing = TRUE)
      break
    }
  }
}

# The answer is 'tsocrpkijgdqnbafhmle'

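Afterwards I rewrote it in Python. It was basically a matter of swapping in equivalent functions... Am I wasting Python's potential? I completely fail to get the feeling of "programming on the fly"...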

x = 'nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle'
x_split = list(x)
unique_x = sorted(set(x_split), reverse=True)
x_remain = []
# Initialize
current_string = x_split
current_order = unique_x
while len(x_remain) < len(unique_x):
    for character in current_order:
        index = current_string.index(character)
        # Suffix starting at the first occurrence of the candidate
        sub_string = current_string[index:]
        # Keep the candidate only if no character is lost by dropping the prefix
        if not set(current_string) - set(sub_string):
            x_remain.append(character)
            # Drop every occurrence of the chosen character
            current_string = [c for c in sub_string if c != character]
            current_order = sorted(set(current_string), reverse=True)
            break
print(x_remain)
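
The same greedy idea can also be wrapped in a small function, which makes it easier to try on short strings that can be checked by hand. This is only a sketch of the approach described above; the function name `most_beautiful` is made up for illustration and is not part of the original problem statement.

```python
def most_beautiful(s):
    """Greedy sketch: largest subsequence that uses each distinct character once."""
    remaining = list(s)
    picked = []
    while remaining:
        needed = set(remaining)
        # Try candidates from the largest character down.
        for ch in sorted(needed, reverse=True):
            first = remaining.index(ch)
            suffix = remaining[first:]
            # ch may lead only if every needed character still occurs in the suffix.
            if set(suffix) == needed:
                picked.append(ch)
                # Drop the prefix and every occurrence of the chosen character.
                remaining = [c for c in suffix if c != ch]
                break
    return "".join(picked)


print(most_beautiful("abacb"))  # -> bac
```

For "abacb", the character c cannot come first because no a follows it, so the greedy pass falls back to b and the result is "bac".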

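When I finally finished the Python version, I found the network had gone down... so I could not submit online. By the time I reconnected, the time limit had passed. Sigh. I will just count it as weekend practice.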


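Continuous > Discrete

I am just trying to recover, so I am reading some dry material along the way.

-------------------- End of rambling ---------------------

I really admire people like Andrew Gelman, who has kept a blog going for so many years and touches on a bit of everything; whenever I read him I feel I have gained something (I hope that means I am making progress...). I often find questions there that are "not very big" yet quite fundamental.

In a gap while my code was running, I skimmed this post, "Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable," which is quite interesting. The gist is: if a continuous variable is available, do not use a discrete variable carved out of it. He gives several examples. I am not familiar with the baseball ones; the econ one at the end is naturally the eye-catcher: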

Even in recent years, with all the sophistication in economic statistics, you’ll still see people fitting logistic models for binary outcomes even when the continuous variable is readily available. (See, for example, the second-to-last paragraph here, which is actually an economist doing political science, but I’m pretty sure there are lots of examples of this sort of thing in econ too.)

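Then I flipped back to the post "Estimating the incumbent-party advantage and the incumbency advantage in House elections," and after skimming it I understood that Andrew recommends predicting the number of votes directly rather than predicting win or lose. Otherwise the information lost along the way is a real pity: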

The key is that vote differential is available, and a simply performing a logit model for wins alone is implicitly taking this differential as latent or missing data, thus throwing away information.

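In addition, some people feel that using a binary outcome is more robust, because it requires no further distributional assumptions. Andrew's response is the same as in another post of his that I read earlier: "Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses." Once you pool information scattered across so many times and places into a single regression, you are already straining the robustness of the estimator. Using the continuous variable therefore lets you reach a good estimate while pooling somewhat less of that data.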
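
A toy simulation can illustrate the information loss from dichotomizing. It is not Gelman's analysis: the data-generating process, the coefficient 0.5, and the names `margin` and `win` are all assumptions made up for this sketch. It compares how strongly a predictor correlates with a continuous outcome versus the same outcome cut at zero.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs)
    vy = sum((b - my) ** 2 for b in ys)
    return cov / math.sqrt(vx * vy)


random.seed(0)
n = 5000
x = [random.gauss(0, 1) for _ in range(n)]            # hypothetical predictor
margin = [0.5 * xi + random.gauss(0, 1) for xi in x]  # continuous outcome, e.g. a vote margin
win = [1.0 if m > 0 else 0.0 for m in margin]         # the same outcome reduced to win/lose

r_margin = pearson(x, margin)
r_win = pearson(x, win)
print(round(r_margin, 3), round(r_win, 3))
```

Under these assumptions the correlation with the win/lose indicator comes out noticeably weaker than the correlation with the margin, which is the sense in which the binary version throws information away.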

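---------------- Self-criticism begins --------------

1. Use R's cut() function with caution.

2. Just now I was still trying to split a continuous variable into several bins... I quietly deleted the pile of SQL CASE WHEN statements I had written. Sigh. All that typing for nothing.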


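Constitutional Law by Yale: Lecture Notes (2)

Just organizing a few notes.

Anti-Federalists and the Federalists

Essentially, the two camps disagreed over how much power the federal government and the state governments should each have. Here is a summary I copied: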

The Anti-Federalists opposed the new U.S. Constitution for numerous reasons.

  • They distrusted large, powerful national governments and believed liberty could only be protected in small republics in which the rulers were closely checked by the public.
  • They believed a large nation could best be governed by a confederation, with local governments having the most control. A strong national government would be distant from the people and not capable of protecting the rights of the citizens. Congress would tax too heavily and the Supreme Court would overrule state courts.
  • They distrusted the president having too much power, including a standing army under his control.
  • They also favored the addition of a Bill of Rights to protect the citizens from the national government. They wanted the House of Representatives increased in size so it would reflect a greater variety of popular interests.
  • They wanted a council created to check the actions of the president.
  • They also favored leaving military affairs in the hands of the state militias.

Federalists favored a strong national government with supreme power over state governments.

  • The rights of citizens would be protected from the government via legislation, the courts, and the Bill of Rights.
  • Federalists distrusted the masses to select the best candidates so they made only the House of Representatives directly elected by the people. Checks and Balances within the Constitution would make sure no one branch became too powerful.
  • The President would have control over the military, necessary for national defense, but could not violate the laws. The Secretary of War would advise the President.
  • The national government needed the power to tax and enforce the laws, or the ills of the Articles would hamper the development, agriculture and industry, of the new nation.

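Put plainly, the Anti-Federalists wanted the state governments to be more independent and the federal government to interfere less with the states.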

Categories
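Economics, IT Observations and Thoughts; Reading Reflections

Starting from the Debate over Taxing Online Transactions

For the past few years there has been a running public debate over whether online transactions (by small and medium sellers) should be taxed. A quick news search shows that Taobao is the standing target.

Over in the US, eBay is not having an easy time either...

Which means we have to look up how US tax law treats sales tax.

-------------- The next part is rather long-winded; skip it if you do not care about the details -----------

This goes back to 1998, when Clinton was still in office and the Internet Tax Freedom Act was passed. Here are the basics of the act, copied from the wiki: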

This law bars federal, state and local governments from taxing Internet access and from imposing discriminatory Internet-only taxes such as bit taxes, bandwidth taxes, and email taxes. The law also bars multiple taxes on electronic commerce.

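In short, neither the federal government nor state and local governments may tax Internet access, and none of them may impose bit taxes, bandwidth taxes, or email taxes. I paged through the original bill text, which starts at page 720; further on it gives the definition of multiple taxes: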

IN GENERAL.—The term "multiple tax" means any tax that is imposed by one State or political subdivision thereof on the same or essentially the same electronic commerce that is also subject to another tax imposed by another State or political subdivision thereof (whether or not at the same rate or on the same basis), without a credit (for example, a resale exemption certificate) for taxes paid in other jurisdictions.

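A simple reading (sorry, I am not a law student, so this may well be inaccurate): multiple states may not tax the same e-commerce transaction twice. In 2007 the act was extended to November 1, 2014 (Internet Tax Freedom Act Amendments Act of 2007). In practice, most follow a 1992 Supreme Court ruling: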

In Quill Corp. v. North Dakota, the Supreme Court ruled that a business must have a physical presence in a state for that state to require it to collect sales taxes.

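------------- End of the long-winded part -------------

In other words, as long as a seller has no physical presence in a state (such as a store), that state cannot compel it to collect sales tax. What makes this interesting is the Marketplace Fairness Act of 2013, whose main thrust is that virtual stores should also collect sales tax or use tax. The House has not yet voted on it.

[Disclaimer]: The information about eBay below all comes from the Internet and other public sources and is unrelated to my work; it is merely stated here. All conclusions are the responsibility of the author and do not represent the company's views.

So how is sales tax currently collected on eBay?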

Normally buyers do NOT pay tax on eBay unless the following 3 criteria are all met:

  1. The seller is a Business seller.
  2. The seller has a physical presence in buyer’s shipping address state.
  3. That state charges sales tax.

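In other words, a consumer pays tax only when buying on eBay from a business seller that has a physical presence in the buyer's state, and only if that state charges sales tax. The typical cases are retailers like Macy's or Best Buy that run stores on eBay. So when you buy on eBay, you generally do not see a sales tax line at checkout (in the US, tax is always added on top of the listed price, so any sales tax is itemized on the bill). Seen this way, online sellers have an edge over offline sellers in being free of the tax (although the tax is levied directly on consumers, who actually bears the burden depends on the elasticities of the supply and demand curves). Put bluntly: if an item costs $100 online with free shipping, and the store near my home also sells it for $100, but buying in the store means paying 9% tax (taking California as an example), then unless I need it urgently, why would I not buy it online?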
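
The three criteria and the $100 comparison can be put into a few lines of Python. This is only an illustrative sketch: the function names are invented here, the 9% rate is the example figure from the paragraph above, and real tax rules have many more cases.

```python
def buyer_pays_sales_tax(is_business_seller, seller_in_buyer_state, state_charges_tax):
    """All three criteria listed above must hold for tax to be charged."""
    return is_business_seller and seller_in_buyer_state and state_charges_tax


def checkout_total(price, tax_rate, taxed):
    """Sales tax is added on top of the listed price when it applies."""
    return round(price * (1 + tax_rate), 2) if taxed else round(price, 2)


# Online seller with no physical presence in the buyer's state: no tax at checkout.
online = checkout_total(100.0, 0.09, buyer_pays_sales_tax(True, False, True))
# Local store: the 9% example rate is added at the register.
in_store = checkout_total(100.0, 0.09, True)
print(online, in_store)  # -> 100.0 109.0
```

At the same listed price, the in-store purchase ends up $9 more expensive, which is the advantage the paragraph describes.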

With the background finally laid out, let us look at a paper from the January 2014 issue of the AER:

Einav, Liran, et al. "Sales Taxes and Internet Commerce." American Economic Review 104.1 (2014): 1-26.
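This paper examines how brick-and-mortar stores and online stores are affected when a state raises its sales tax rate. The authors use only eBay data, and the conclusion is: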
every one percentage point increase in a state's sales tax increases online purchases by state residents by almost 2%, while decreasing their online purchases from state retailers by 3.4%.
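In other words, each 1 percentage point rise in the sales tax rate leads residents of that state to buy about 2% more online overall and 3.4% less online from in-state retailers (because those purchases are taxed). Let us see how this conclusion is reached step by step.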
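
To get a feel for the magnitudes, the two percentages as quoted above can be applied to made-up baseline numbers. Everything here is a hypothetical illustration: the baseline purchase amounts are invented, the function name is not from the paper, and the effect is treated as simply linear in the rate change, which is an assumption of this sketch rather than the paper's model.

```python
def apply_tax_change(online_total, online_in_state, rate_increase_pp,
                     online_effect=0.02, in_state_effect=-0.034):
    """Scale baseline purchases by the per-percentage-point effects quoted in the post."""
    new_total = online_total * (1 + online_effect * rate_increase_pp)
    new_in_state = online_in_state * (1 + in_state_effect * rate_increase_pp)
    return new_total, new_in_state


# Hypothetical baseline: $1000 of online purchases, $200 of it from in-state retailers.
total, in_state = apply_tax_change(1000.0, 200.0, 1.0)
print(round(total, 2), round(in_state, 2))
```
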
First, a look at sales tax rates across US states:
[Figure: sales tax rates by US state (screenshot of SalesTaxes(1).pdf)]