Ken taught me that thinking before debugging is extremely important. If you dive into the bug, you tend to fix the local issue in the code, but if you think about the bug first, how the bug came to be, you often find and correct a higher-level problem in the code that will improve the design and prevent further bugs.
爱因斯坦的在专利局工作时候的 side project 是相对论论文,学校门卫波洛克(对,现代艺术甩泥点子那位)的 side project 是画画,Slack 一开始也是游戏公司的 side project 聊天工具。Side project 能真正有效促进一个人的专业进步(深有同感)。不是有句话说么,这年头在湾区没有一两个 side project 都不好意跟人打招呼。
每次面试时候最后面试官都会问 “Do you have any questions?”,然后大多数人都只是问一些隔靴搔痒的问题,这篇文章比较深入的探讨这个问题,怎样让问面试官问题成为你的加分点,哪些问题能反映出这个公司的开发流程和工程师文化(关系到是否值得加入),面试必备的好问 (Victoria 同学翻译成了中文)。
最近看到一个很有意思的面试题:给一个单词和一个字典,找出字典中所有和给定单词编辑距离不大于 k 的词。
一个常见的思路是遍历一遍,求单词和字典中每一项的编辑距离。我们知道编辑距离是二维 DP,时间复杂度为 $O(L^2)$,其中 L 为每个单词平均长度,则总时间复杂度为$O(NL^2)$, N 为字典中词的个数。
这个方法的问题在于,一旦查询单词变多,性能会很糟糕。基于知乎 Lee Shellay的回答,可以通过构造 Trie, 结合 DFS,来解决这个问题。
所以算法思路并不难:
根据字典中的单词构造前缀树,标记每个单词结束时的结束符为 ’$’。
设计函数 API 为check_fuzzy(trie, word, path, tol)。trie是在树中当前走到的节点,word 表示走到当前节点剩余需要处理的查询单词,path表示走到当前节点已经记录的字典单词前缀,tol 表示剩余可容忍的编辑距离。然后定义一个set,不断找到可能的单词并入这个set,直到结束。 所以,函数只在tol 为0时候终止(为什么不是word为空时候终止?因为有可用的编辑距离都用在增加后缀的情况)。
匹配当前字符,有两种情况:匹配,那么直接递归下一层;不匹配,可能是字母不一致或者是 word 已经结束(这个情况很容易被忽略),需要 tol 减一后递归下一层。
增加任意字母(字典单词比查询单词多字母)。这里和知乎回答里的不一样,那里是枚举了26个字母,其实只要枚举当前 tree 的所有节点字母就行了(Jayxon 大牛想到的)。
# Based on Lee Shellay's code http://www.zhihu.com/question/29592463
END = '$' defmake_trie(words): trie = {} for word in words: t = trie for c in word: if c notin t: t[c] = {} t = t[c] t[END] = {} return trie
defcheck_fuzzy_v4(trie, word, path = '', tol = 1): if tol < 0: returnset()
ps = set() if word == '': if END in trie: ps = {path}
for k in trie: # match current or mark as substition ps |= check_fuzzy_v4(trie[k], word[1:], path+k, tol - (not word or k != word[0])) # add random char ps |= check_fuzzy_v4(trie[k], word, path+k, tol-1)
# delete one (if word is empty, word[2:] will not report error) ps |= check_fuzzy_v4(trie, word[1:], path, tol-1) return ps
if __name__ == '__main__': words = ['hello', 'hela', 'hel', 'dokm', 'i', 'ke', 'ik'] t = make_trie(words) print check_fuzzy_v4(t, 'helo','', tol=2)
然后试试大一点的数据。我们知道在/usr/share/dict/words存着拼写检查的单词表,一共 2.4M 共 235886个单词(至少在我的 Mac 上是这么多)。可以用它来构造字典 cat /usr/share/dict/words > ./words.txt。然后把一句话改的乱七八糟,用代码来跑跑试试:
1 2 3 4 5 6 7 8
deftest(): origin = "For you know only a heap of broken images" modified = "Far your knn onlie a deep of borken iimaes"
words_list = [line.strip() for line inopen('words.txt', 'r')] tree = make_trie(words_list) for w in modified.split(): print check_fuzzy_v4(tree, w, tol=2)
结果也挺快的:
CPython: 2.53s user 0.25s system 50% cpu 5.470 total
As the name implies, coroutine refers to co-operative routine. It allows you to suspending and resuming execution at different locations. So, it’s essentially just context switching. Not surprisingly, coroutine is implemented in primitives like setjmp/longjump or ucontext in low level.
In many senarioes, coroutine is a more light-weight alternative of threads. For programming languages with GIL (like Python), coroutine would used to handle concurrency.
Producer and consumer problem
Let’s take a look at classic “producer-consumer” problem. At each time, one coroutine produce products and add them into queue, the other coroutine take products from queue and use it (hmm, sounds like video buffering, right?).
The code below assumes you already have some knowledge of generator.
import time import random defcoroutine(func): # A wrapper to convert function into generator # From David Beazley defstart(*args,**kwargs): cr = func(*args,**kwargs) cr.next() return cr return start defproducer(target): whileTrue: time.sleep(1) data = random.randint(1, 10) print ("# producer: sending data {}".format(data)) target.send(data) @coroutine defconsumer(): whileTrue: data = yield print ("# consumer: receving data {}".format(data)) if __name__ == '__main__': g_consumer = consumer() producer(g_consumer)
Simple enough, send() is a built-in function of generator. The producer send data to consumer, consumer receives data from yield.
Coroutine usage
Yes, the famous concurrency library gevent is based on coroutine.
Many people believe that decorator is one of the obscure concepts in Python. Trust me, it is not. To be short, a decorator is a function that modifies other functions via closures.
They are plenty detailed articles about what decorator it is so there is no need to write one more. If you are not familiar with it, you may want to check these:
In this article, I am going to use a simple but interesting example to show
how to colorize the output in command line
how to implement a switch case in Python (Python does not have built-in switch case)
a decorator with parameters
OK, let’s rock.
colorize your text
There is a set of escape sequences used to change the color of texts. So if we want to colorize a sentence, we just need to put the sentence between the color escape sequence and reset escape sequence. For example:
1 2 3 4 5 6 7
>> ORANGE = '\033[33m' >> RED = '\033[31m' >> GREEN = '\033[32m' >> BLUE = '\033[34m' >> RESET = '\033[0m' >> print ORANGE + "Chinese New Year" + RESET >> print GREEN + "Chinese" + GREEN + "New" + BLUE + "Year" + RESET
You will see Chinese New Year
and Chinese New Year
How to choose the color in code
As I said, Python does not support switch case. So we cannot switch the color name and choose the corresponding escaped sequence. Fortunately, dictionary would do the work.
1 2 3 4 5 6 7 8 9 10 11
defgetColor(color): # @param color: string like "red" or "yellow" # @return the corresponding escape sequence. # If not valid, return empty string return { RED = '\033[31m', GREEN = '\033[32m', BLUE = '\033[34m', # and more ... ORANGE = '\033[33m' }.get(color, "")
The trick is, the dictionary’s built in get method. The first parameter here is key, the second optional parameter is default. As the docstring shows:
D.get(k[,d]) -> D[k] if k in D, else d. d defaults to None.
Use the decorator
Suppose we have a function to implement a task. It may has three kind of return string, if task is completed successfully, it returns “SUCCESS: blah blah …”, if the task is finished but we cannot ensure it’s corretness, it returns “WARNING: blah blah …”, if task failed, returns “ERROR: blah blah …”, how do we colorize these return strings?
@colorize("green", "orange", "red") defdo_task(): # working working .... if (success): return"SUCCESS: Yeah~~" elif (warning): return"WARNING: wait, what?" else: return"ERROR: something went wrong here."
As you can tell, to make the decorator with parameters, we need to put it in another decorator.
Decorator are often used as cache, profiler, logger, synchronization(acquire lock, drop lock) and so forth. One of my favourite library Click is also a wonderful example.