1. What ML techniques do you work with? Are they research-level or production-level techniques? [Source 1]
2. Tell me about a project you worked on in depth from inception to completion. What was the project, how did you approach the problem, and what was the end result? [Source 1]
3. What's your favorite algorithm? [Source 1]
4. What level of experience do you have with [programming language]? What do you do with [programming language] day to day, and what was your hardest challenge with it? [Source 1]
5. What is the largest data set you have processed? How did you approach it, and what was the end result? [Source 1]
6. How would you explain linear regression to a business executive? [Source 2]
7. What are some alternative models to linear regression? Why are they better or worse? [Source 2]
8. (Based on the table below.) Write a SQL query to create a table that shows, for each class, the highest grade in that class. [Source 2]

9. Suppose I had the same table as in the previous question, but for each class I want the name of the student who got the highest grade. Write a query to do that. [Source 2]
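One possible answer to questions 8 and 9 can be sketched with Python's built-in `sqlite3` module. The original table is not reproduced here, so the table name `grades`, its columns (`student`, `class_name`, `grade`), and the sample rows are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema and rows: the source table is not shown,
# so every name and value below is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (student TEXT, class_name TEXT, grade INTEGER)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [("Ann", "Math", 90), ("Bob", "Math", 85), ("Cat", "Art", 70), ("Dan", "Art", 95)],
)

# Question 8: create a table holding the highest grade per class.
conn.execute(
    "CREATE TABLE class_max AS "
    "SELECT class_name, MAX(grade) AS max_grade FROM grades GROUP BY class_name"
)
max_per_class = conn.execute(
    "SELECT class_name, max_grade FROM class_max ORDER BY class_name"
).fetchall()

# Question 9: join back to the per-class maximum to recover the student name.
# If several students tie for the top grade, all of them are returned.
top_students = conn.execute(
    """
    SELECT g.class_name, g.student, g.grade
    FROM grades AS g
    JOIN class_max AS m
      ON g.class_name = m.class_name AND g.grade = m.max_grade
    ORDER BY g.class_name, g.student
    """
).fetchall()

print(max_per_class)
print(top_students)
```

Joining against the per-class maximum, rather than selecting the name alongside `MAX(grade)` in a single `GROUP BY`, keeps the query portable across SQL dialects and handles ties explicitly.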
10. In pseudo-code or any language you like, write a program that prints the numbers from 1 to 100, but prints "Fizz" instead of the number for multiples of three, "Buzz" for multiples of five, and "FizzBuzz" for multiples of both three and five. [Source 2]
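A minimal Python sketch of one possible answer follows. The function name `fizzbuzz` is chosen here for illustration and is not part of the original question.

```python
def fizzbuzz(n: int) -> str:
    """Return the FizzBuzz token for a single integer n."""
    # Check the combined case first: multiples of 15 are multiples of both 3 and 5.
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


# Print the sequence for 1 through 100, one token per line.
for i in range(1, 101):
    print(fizzbuzz(i))
```

Testing the "both" case before the single-divisor cases is the detail interviewers usually look for; reversing the order would never print "FizzBuzz".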
11. A company selling a competitor to Microsoft Office is testing its marketing by sending out two different sets of emails. One set contains business-related content, and the other contains consumer-related content. We are interested in how each campaign performed: did one do better at getting people to click through? Below is a selection of graphs on the two email campaigns. The bottom two graphs show the same data as the top two, only bucketed by the amount the customer spent with the company in the year before the emails were sent. Which campaign did better? [Source 2]

12. Explain what regularization is and why it is useful. [Source 3]
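As one illustration of the idea, the sketch below uses an L2 (ridge) penalty on a one-variable regression without an intercept. This particular model, the function name `ridge_slope`, and the toy data are assumptions picked to keep the arithmetic visible; the question itself does not prescribe any specific form of regularization.

```python
def ridge_slope(xs, ys, lam):
    """Slope b minimizing sum((y - b*x)**2) + lam * b**2 (no intercept).

    Setting the derivative to zero gives b = sum(x*y) / (sum(x*x) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


# Toy data lying exactly on y = 2x (illustrative values only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

for lam in (0.0, 14.0, 140.0):
    # A larger penalty shrinks the fitted slope toward zero.
    print(lam, ridge_slope(xs, ys, lam))
```

With `lam = 0` the fit is ordinary least squares; increasing `lam` trades a little bias for lower variance, which is why regularization tends to help when data are noisy or features are many.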
13. Which data scientists do you admire most? Which startups? [Source 3]
14. How would you validate a predictive model of a quantitative outcome variable that you built using multiple regression? [Source 3]
15. Explain what precision and recall are. How do they relate to the ROC curve? [Source 3]
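The definitions can be made concrete with a small sketch. The function name `rates_at_threshold`, the labels, and the scores below are made up for illustration. Recall is the same quantity as the true positive rate on the ROC curve's vertical axis, while the horizontal axis is the false positive rate, not precision.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (precision, recall, false_positive_rate) at a score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, recall, fpr


# Illustrative labels and classifier scores (not from any real model).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]

# Sweeping the threshold traces the ROC curve as (fpr, recall) pairs.
for threshold in (0.25, 0.5, 0.75):
    print(threshold, rates_at_threshold(y_true, scores, threshold))
```

Each threshold gives one point on the ROC curve; plotting precision against recall over the same sweep gives the precision-recall curve instead, which is often more informative when positives are rare.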
16. How can you prove that an improvement you made to an algorithm is really an improvement over doing nothing? [Source 3]
17. What is root cause analysis? [Source 3]
18. Are you familiar with price optimization, price elasticity, inventory management, and competitive intelligence? Give examples. [Source 3]
19. What is statistical power? [Source 3]
20. Explain what resampling methods are and why they are useful. Also explain their limitations. [Source 3]
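The bootstrap is one common resampling method, and a minimal sketch of it is shown below. The function name `bootstrap_mean_ci`, the sample values, the number of resamples, and the seed are all illustrative assumptions.

```python
import random


def bootstrap_mean_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of data."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    means = []
    for _ in range(n_resamples):
        # Draw n observations with replacement and record their mean.
        resample = rng.choices(data, k=n)
        means.append(sum(resample) / n)
    means.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Illustrative sample (made-up measurements).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]
lo, hi = bootstrap_mean_ci(data)
print(f"sample mean = {sum(data) / len(data):.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

The appeal is that no formula for the sampling distribution is needed. The limitation is that the resamples can only reuse the observed values, so a small or unrepresentative sample, or observations that are not independent, will give a misleading interval.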