Categories
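Reading Notes

A First Try at Python

Today, out of sheer boredom, I decided to give Python a try. I found a problem that goes roughly like this:

  • Given an input string, find its most beautiful duplicate-free substring.
  • Substring: anything that can be obtained by deleting some characters from the original string (that is, a subsequence).
  • Duplicate-free string: a string in which no character appears more than once.
  • A is more beautiful than B if A is longer than B, or if A comes after B in lexicographic order.

Since every candidate is duplicate-free, the longest ones are those that use each distinct character of the input exactly once, so length never needs to be compared: the task reduces to finding the lexicographically largest among the duplicate-free substrings of that common maximal length.

I wrote this in R first, just to get a working algorithm. The basic idea is a brute-force, level-by-level recursion.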

x = 'nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle'

x_split = strsplit(x,split="")[[1]]
unique_x = unique(x_split) 
unique_x_order = sort(unique_x,decreasing=T) 
x_remain = character() 

# find the largest character that can be kept

# initialize
current_string = x_split
current_unique = unique_x
current_order = unique_x_order
while ( length(x_remain) < length(unique_x))  # stop once every distinct character is placed (20 for this input)
{ 
  for(i in 1:length(current_order))
  { character = current_order[i]
    index = which(current_string == character)
    sub_string = current_string[min(index):length(current_string)]  
    if (length(setdiff(unique(current_string),unique(sub_string)))==0) #no characters are lost
    {x_remain = c(x_remain,character);
     current_string = current_string[-c(1:min(index),index)];
     current_unique = unique(current_string);
     current_order = sort(current_unique,decreasing=T);
     break;
    }
  }
}

#answer is 'tsocrpkijgdqnbafhmle'

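Afterwards I rewrote it in Python. It was basically a one-to-one substitution of equivalent functions... Am I wasting Python's potential? I completely fail to get the feel of "programming on the fly"...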

x = 'nlhthgrfdnnlprjtecpdrthigjoqdejsfkasoctjijaoebqlrgaiakfsbljmpibkidjsrtkgrdnqsknbarpabgokbsrfhmeklrle'
x_split = list(x)
unique_x = sorted(set(x_split), reverse=True)
x_remain = []
# initialize
current_string = x_split
current_order = unique_x
while len(x_remain) < len(unique_x):
    for character in current_order:
        index = current_string.index(character)
        sub_string = current_string[index:]
        # keep this character only if no distinct character is lost before it
        if not set(current_string) - set(sub_string):
            x_remain.append(character)
            # drop the chosen character and everything before its first occurrence
            current_string = [c for c in sub_string if c != character]
            current_order = sorted(set(current_string), reverse=True)
            break
print(x_remain)
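
As a side note, the same greedy idea can probably be written as a single pass with a stack. This is a sketch of a different, commonly used formulation, not the code above; the function names `most_beautiful` and `brute_force` are made up here for illustration, and `brute_force` is only meant as a sanity check on very short strings.

```python
from itertools import combinations


def most_beautiful(s: str) -> str:
    """Lexicographically largest subsequence using each distinct character once."""
    last = {ch: i for i, ch in enumerate(s)}  # last position of every character
    stack, used = [], set()
    for i, ch in enumerate(s):
        if ch in used:
            continue
        # A smaller character on top can be dropped if it appears again later.
        while stack and stack[-1] < ch and last[stack[-1]] > i:
            used.remove(stack.pop())
        stack.append(ch)
        used.add(ch)
    return "".join(stack)


def brute_force(s: str) -> str:
    """Exhaustive search, hypothetical helper for checking short inputs only."""
    k = len(set(s))
    best = ""
    for positions in combinations(range(len(s)), k):
        candidate = "".join(s[i] for i in positions)
        if len(set(candidate)) == k and candidate > best:
            best = candidate
    return best


print(most_beautiful("abcabc"))  # cab
```

The stack version visits each character once, so it should scale to much longer inputs than the nested loop, at the cost of being a little less obvious to read.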

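When I finally finished the Python version, I found that my network connection had dropped... so I could not submit online. By the time I reconnected, the deadline had passed. Sigh. I'll just count it as weekend practice.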

Categories
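My State of Life

The Good Times Are Busy Ones with Leisure to Spend

Life has gone through rhythms both fast and slow. The year of my master's was fairly busy, with many deadlines to meet and many papers to write, yet I was quite happy. On one hand, I was doing what I loved and could bury myself in the library, getting goofily excited or frantic all by myself. On the other hand, the city of Barcelona gave life a certain charm: the monthly free museum day, concert halls and galleries a ten-minute bike ride away, handicraft shops in the little alleys you wander through, and the pastry shops and wine cellars tucked into residential neighborhoods. Working out with classmates which of the coffee machines scattered around campus tasted best, and lazily sunbathing on the beach on weekends. Yes, with that kind of balance, being busy did not feel frightening. Whenever there was a little free time, there were countless ways to spend it.

My first year of work was often busy, so busy that my memory has gaps. But that busyness was nothing more than busyness, because as soon as I had free time I only wanted to stay home and sleep. I had no energy left to add any charm to life. Shanghai's art scene falls far short of its resplendent shopping malls, and the artsy youths walking around carry a thread-bound book and look proud on the subway. It is too unspontaneous, too symbolic. In this city of extravagance, I cannot see young people's beautiful dreams. I have walked into one café after another and felt that the petty-bourgeois book bars and the Starbucks chain were no different in character. I cannot smell freedom. For someone like me, whose material desires fall far short of my appetite for food, living in this city has become somewhat numb and lost. I need a new goal to be busy for, to struggle for.

Better to die in a blaze of glory than to live like a walking corpse. Looking forward to breaking out of the cocoon.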
[Image: spongilla fly cocoon (7-31-12-spongilla-fly-cocoon-img_7739). Photo from naturallycuriouswithmaryholland]

Categories
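Reading Notes

Continuous > Discrete

I am just trying to recover, so I am reading some dry stuff along the way.

--------------------End of rambling---------------------

I really admire someone like Andrew Gelman, who has kept blogging for so many years and touches on a bit of everything; whenever I read him I feel I gain something (I hope I am making progress....). I often find in his posts questions that are "not very big" yet quite fundamental.

In a gap while my code was running, I skimmed this post, Econometrics, political science, epidemiology, etc.: Don’t model the probability of a discrete outcome, model the underlying continuous variable, and it was quite interesting. The gist is that if a continuous variable is available, you should not use the discrete variables carved out of it. He gives some examples; I am not familiar with the baseball ones, but the econ one at the end naturally caught my eye: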

Even in recent years, with all the sophistication in economic statistics, you’ll still see people fitting logistic models for binary outcomes even when the continuous variable is readily available. (See, for example, the second-to-last paragraph here, which is actually an economist doing political science, but I’m pretty sure there are lots of examples of this sort of thing in econ too.)

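Then I went back to that post, Estimating the incumbent-party advantage and the incumbency advantage in House elections, skimmed it, and realized that Andrew is suggesting predicting the number of votes directly rather than predicting win or lose. Otherwise the information lost along the way would be a pity: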

The key is that vote differential is available, and a simply performing a logit model for wins alone is implicitly taking this differential as latent or missing data, thus throwing away information.

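In addition, some people think that using a binary outcome is more robust, because it needs no further distributional assumptions. Andrew's response is the same as in another post of his that I had read before: "Everyone’s trading bias for variance at some point, it’s just done at different places in the analyses". Once you pool scattered information from so many times and places into one regression, you are already challenging the robustness of the estimator. So using the continuous variable actually lets you get a decent estimate while pooling the data somewhat less.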
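
To get a feel for how much information the win/lose coding can throw away, here is a toy simulation. It is only a sketch under assumptions of my own choosing, not Gelman's analysis: the vote margin is assumed to be normally distributed, and the values of `mu`, `sigma`, `n`, `reps` and the function name `simulate` are all made up for illustration. It estimates the probability of winning, P(margin > 0), in two ways and compares how much the two estimates vary across repeated samples.

```python
import random
from statistics import NormalDist, mean, stdev, pvariance


def simulate(mu=0.5, sigma=1.0, n=50, reps=2000, seed=0):
    """Return the sampling variances of two estimators of P(margin > 0)."""
    rng = random.Random(seed)
    binary_estimates, continuous_estimates = [], []
    for _ in range(reps):
        margins = [rng.gauss(mu, sigma) for _ in range(n)]
        # (a) Discard the margin and keep only win/lose: share of wins.
        binary_estimates.append(mean(1.0 if m > 0 else 0.0 for m in margins))
        # (b) Model the margin itself as normal and plug in its mean and sd.
        continuous_estimates.append(
            NormalDist().cdf(mean(margins) / stdev(margins))
        )
    return pvariance(binary_estimates), pvariance(continuous_estimates)


var_binary, var_continuous = simulate()
print("variance using win/lose only:", var_binary)
print("variance using the margin:   ", var_continuous)
```

The smaller variance of estimator (b) is bought with the normality assumption, which is exactly the bias-for-variance trade mentioned above: if the margins were far from normal, (b) could be biased while (a) would not be.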

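----------------Self-criticism begins--------------

1. The cut() function in R should be used with caution.

2. Just a moment ago I was trying to split a continuous variable into a few bins... I quietly deleted the pile of SQL CASE WHEN statements I had written. Sigh. All that typing for nothing.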

Categories
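Reading Notes

Constitutional Law by Yale: Lecture Notes (2)

Just organizing a few things.

Anti-Federalists and the Federalists

Basically, these two factions disagreed about how much power the federal government and the state governments should each have. Here is a summary I copied: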

The Anti-Federalists opposed the new U.S. Constitution for numerous reasons.

  • They distrusted large, powerful national governments and believed liberty could only be protected in small republics in which the rulers were closely checked by the public.
  • They believed a large nation could best be governed by a confederation, with local governments having the most control. A strong national government would be distant from the people and not capable of protecting the rights of the citizens. Congress would tax too heavily and the Supreme Court would overrule state courts.
  • They distrusted the president having too much power, including a standing army under his control.
  • They also favored the addition of a Bill of Rights to protect the citizens from the national government. They wanted the House of Representatives increased in size so it would reflect a greater variety of popular interests.
  • They wanted a council created to check the actions of the president.
  • They also favored leaving military affairs in the hands of the state militias.

Federalists favored a strong national government with supreme power over state governments.

  • The rights of citizens would be protected from the government via legislation, the courts, and the Bill of Rights.
  • Federalists distrusted the masses to select the best candidates so they made only the House of Representatives directly elected by the people. Checks and Balances within the Constitution would make sure no one branch became too powerful.
  • The President would have control over the military, necessary for national defense, but could not violate the laws. The Secretary of War would advise the President.
  • The national government needed the power to tax and enforce the laws, or the ills of the Articles would hamper the development, agriculture and industry, of the new nation.

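Put plainly, the Anti-Federalists wanted the state governments to be more independent and the federal government to interfere less with the states.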

Categories
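Economics and IT Observations; Reading Notes

Starting from the Debate over Taxing Online Transactions

In recent years there has been an ongoing public debate over whether online transactions (by small and medium sellers) should be taxed. A quick news search shows Taobao as a favorite target:

eBay in the US is not having an easy time either...

Having said that, one has to look up what US tax law says about sales tax.

--------------The next part is rather long-winded; skip it if you do not care about the details-----------

It goes back to 1998, when Clinton was still in office and the Internet Tax Freedom Act was passed. Here is the basic content of the act, copied from the wiki: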

This law bars federal, state and local governments from taxing Internet access and from imposing discriminatory Internet-only taxes such as bit taxes, bandwidth taxes, and email taxes. The law also bars multiple taxes on electronic commerce.

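In short, neither federal nor local governments may tax Internet access, nor impose bit, bandwidth, or email taxes. I flipped through the original text of the act, which starts on page 720, and further on it gives the definition of multiple taxes: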

IN GENERAL.—The term ‘‘multiple tax’’ means any tax that is imposed by one State or political subdivision thereof on the same or essentially the same electronic commerce that is also subject to another tax imposed by another State or political subdivision thereof (whether or not at the same rate or on the same basis), without a credit (for example, a resale exemption certificate) for taxes paid in other jurisdictions.

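A simple reading (sorry, I did not study law, so this may well be inaccurate): multiple states may not tax the same e-commerce transaction more than once. In 2007 the act was extended to November 1, 2014 (Internet Tax Freedom Act Amendments Act of 2007). In practice, most follow a 1992 Supreme Court ruling: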

In Quill Corp. v. North Dakota, the Supreme Court ruled that a business must have a physical presence in a state for that state to require it to collect sales taxes.

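-------------End of the long-winded part-------------

In other words, as long as a seller has no physical store in a state, that state's government cannot force it to collect sales tax. What is interesting is the 2013 Marketplace Fairness Act, whose main point is that online stores should also collect sales tax or use tax. The House of Representatives has not voted on it yet.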

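[Disclaimer]: The information about eBay below all comes from the Internet and other public sources and has nothing to do with my job; I am only stating it here. The author is responsible for all conclusions, which do not represent the company's views.

So how is sales tax currently collected on eBay?

Normally buyers do NOT pay tax on eBay unless all 3 of the following criteria are met: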

  1. The seller is a Business seller.
  2. The seller has a physical presence in buyer’s shipping address state.
  3. That state charges sales tax.

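In other words, a consumer pays tax only when buying on eBay from a business seller that has a physical store in the buyer's state, and that state charges sales tax. Typical cases are retailers such as Macy's or Best Buy that run online stores on eBay. So when you buy on eBay, you usually do not see a sales tax line at checkout (in the US, tax is added on top of the listed price, so any sales tax is itemized on the bill). Seen this way, online sellers have a tax-free advantage over offline sellers (although the tax is levied directly on the consumer, who actually bears it depends on the elasticities of the supply and demand curves). Put bluntly, if an item costs $100 with free shipping online and the store near my home also sells it for $100, but buying in the store means paying 9% tax (taking California as an example), then why would I not buy online if I am not in a hurry?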
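
The three criteria and the $100 example can be written down as a tiny calculation. This is only a sketch of the rule as described in this post, not eBay's actual checkout logic; the function names `sales_tax_applies` and `checkout_total` are made up for illustration, and the 9% rate is the example figure from the paragraph above.

```python
from decimal import Decimal


def sales_tax_applies(is_business_seller: bool,
                      seller_in_buyer_state: bool,
                      state_rate: Decimal) -> bool:
    """All three criteria must hold for the buyer to be charged sales tax."""
    return is_business_seller and seller_in_buyer_state and state_rate > 0


def checkout_total(price: Decimal,
                   state_rate: Decimal,
                   is_business_seller: bool,
                   seller_in_buyer_state: bool) -> Decimal:
    """Price plus sales tax when the rule applies, otherwise the bare price."""
    if sales_tax_applies(is_business_seller, seller_in_buyer_state, state_rate):
        return price + price * state_rate
    return price


price, rate = Decimal("100"), Decimal("0.09")
# Business seller with a physical presence in the buyer's state: taxed.
print(checkout_total(price, rate, True, True))    # 109.00
# Online seller with no presence in the buyer's state: not taxed.
print(checkout_total(price, rate, True, False))   # 100
```

With identical list prices, the $9 gap is the whole of the online seller's advantage in this example.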

With the background finally laid out, let's look at a paper from the January 2014 issue of the AER:

Einav, Liran, et al. "Sales Taxes and Internet Commerce." American Economic Review 104.1 (2014): 1-26.
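This paper mainly studies how physical stores and online stores are affected when a state raises its sales tax rate. They use only eBay data, and the conclusion is:
every one percentage point increase in a state's sales tax increases online purchases by state residents by almost 2%, while decreasing their online purchases from state retailers by 3-4%.
In other words, each one-percentage-point rise in the sales tax increases online purchases by the state's residents by almost 2% and reduces their online purchases from in-state retailers by 3-4% (because those purchases are taxed). Let's see how this conclusion is reached step by step.
First, a look at the sales tax rates of the US states: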
[Figure: sales tax rates by US state (screenshot of SalesTaxes(1).pdf)]