Predictive Analytics (rating: 7.6)
Reading Notes 2
甄兰儿

Predictive Analytics – The power to predict who will click, buy, lie, or die (updated through the end of Chapter 2)

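Continuing from the previous installment: the rest of Chapter 2 introduces PA applications in several other fields. Although the author repeatedly insists that "PA in and of itself does not invade privacy," the applications described later inevitably invite debate about that view.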

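One application is how HP predicts which of its employees are at high risk of leaving. As described in this part, the principle behind HP's algorithm is similar to Target's customer prediction: identify the employees who left over the past two years, use that historical data to pick out factors such as salary, promotions, performance ratings, and shift rotations, build a model on them, and test repeatedly until the best-fitting model is found. HP assigns each employee a "Flight Risk Score" and uses it to predict in advance which combinations of characteristics make an employee more likely to leave. The company then responds to these cases, for example with a raise or a promotion, to reduce turnover among key employees.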
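The workflow described above (collect past departures, pick factors, fit a model, score each employee) can be sketched in a few lines. This is only a minimal illustration: the notes do not say which algorithm HP used, so logistic regression is an assumption here, and the feature names, the `HISTORY` rows, and the function names are all invented for the example.

```python
import math

# Invented toy history: (raise_percent, promoted, performance_rating, left_company).
# None of these numbers come from HP; they only illustrate the workflow.
HISTORY = [
    (0, 1, 3, 1), (1, 0, 2, 1), (0, 0, 3, 1), (2, 1, 4, 1),
    (8, 1, 4, 0), (5, 0, 3, 0), (10, 1, 5, 0), (6, 0, 4, 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, epochs=2000, learning_rate=0.05):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n_features = len(rows[0]) - 1
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for *features, left in rows:
            z = bias + sum(w * x for w, x in zip(weights, features))
            error = sigmoid(z) - left
            grad_w = [g + error * x for g, x in zip(grad_w, features)]
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias

def flight_risk_score(features, weights, bias):
    """Return a score in (0, 1); higher means more likely to leave."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))

weights, bias = train(HISTORY)
# Two hypothetical employees who differ only in the size of their raise.
no_raise = flight_risk_score((0, 1, 4), weights, bias)
big_raise = flight_risk_score((9, 1, 4), weights, bias)
print(no_raise > big_raise)
```

In this toy data the raise is the factor that separates leavers from stayers, so the employee with no raise gets the higher score. A real model would be validated on held-out history before anyone acted on it.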

PA Application : Employee Retention

1. What’s predicted: Which employee will quit.

2. What is done about it: Managers take the predictions for those they supervise into consideration, at their discretion. This is an example of decision support rather than feeding predictions into an automatic decision process.

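Naturally, this kind of prediction raised concerns among HP employees: “What if your score is wrong, unfairly labeling you as disloyal and blemishing your reputation?” Despite those concerns, the data is simply too valuable to a large global company like HP, especially for its sales staff, where such predictions can substantially cut the cost of hiring replacements. As for the employees' worries, the author does not say here whether he considers the practice good or bad.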

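Some of the findings in this study are quite interesting, though. For example, promoting employees does not always retain them; for sales and technical teams in particular it often has the opposite effect. Across the company as a whole, a promotion does lower an employee's flight risk score, but salespeople whose promotion is not accompanied by a matching raise are more likely to resign; the risk falls only when title and pay rise together. The study also surfaced predictions that are statistically meaningful but not necessarily true in practice. According to the model, employees with less than a high school education stay with the company 2.6 times longer than more highly educated employees, and the analysts in the PA department itself are among those with the highest flight risk. This does not mean that less educated employees are more loyal or that PA staff dislike the company. Rather, PA skills are in such high demand on the job market, and so well paid, that analysts have more options, which pushes their flight risk scores up.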

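Data this sensitive has to be handled carefully, so only three authorized people can see each employee's Flight Risk Score (FRS). Employees must not be treated differently because of their FRS; the scores may only prompt management to pay closer attention to high-risk groups and to adjust culture or policy ahead of time.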

“Over time, the Flight Risk reports sway management decisions in a productive direction. They serve as early warning signals that guide management in planning around loss of staff when it can’t be avoided, and working to keep key employees where possible. The system informs what factors drive employee attrition, empowering managers to develop more robust strategies to retain their staffs in order to reduce costs and maintain business continuity.”

So far, HP has used these predictions to cut turnover among its sales staff from above 20% to below 15%.

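The author devotes so much space to this example in order to set up the next PA application: what if the same predictive principle is used not to predict employee flight risk but to predict high-crime areas?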

PA Application: Crime Prediction

1. What’s predicted: The location of future crime.

2. What’s done about it: Police patrol the area.

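By 2009, predictions of this kind had already helped several US cities reduce their crime rates by 31%: police departments increase patrols in the areas predicted to be crime hotspots, which lowers the likelihood of crime there. The approach is not limited to locations. Predictions can also be made for particular days (such as paydays and holidays) or particular weather, and used to schedule police patrols.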

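Crime prediction is also applied to individual offenders. In Oregon, for example, PA is used at sentencing to tell the judge which of two individuals who committed the same type of offense is more likely to reoffend. The one with the higher risk of recidivism may receive a longer sentence as a result.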

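Of course, this raises the concern that relying on such predictions could introduce new bias in judges. At the same time, predictions of whether a judge has ruled wrongly constrain judges from another side. These new tensions and challenges raise new ethical questions for a future era of data-driven law enforcement: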

1. Does the application of PA for law enforcement fly in the face of the very notion of judging a person as an individual? Is it unfair to predict a person’s risk of bad behavior based on what other people - who share certain characteristics with that person – have done? Or isn’t the prediction by a human of one’s future crimes also intrinsically based only on prior observations of others, since humans learn from experience as well?

2. A crime risk model dehumanizes the prior offender by paring him or her down to the extremely limited view captured by a small number of characteristics. But, if the integration of PA promises to lower the crime rate, is this within the acceptable realm of compromises to civil liberties that convicts endure?

3. With these efforts under way, should not at least as much effort go into leveraging PA to improve offender rehabilitation; for example, by targeting those with the highest risk of recidivism?

At the end of the chapter, the author again stresses how powerful PA is as a tool and how much responsibility comes with it.

“PA is an important, blossoming science. Foretelling your future behavior and revealing your intentions, it is an extremely powerful tool – and one with significant potential for misuse. It’s got to be managed with extreme care. The agreement we collectively come to for PA’s position in the world is central to the massive cultural shifts we face as we fully enter and embrace the information age.”

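In the secondary parts of this chapter, the author also covers how major financial institutions and the NSA use individuals' basic information to predict the likelihood of fraud and crime. The principles are much the same as in the examples above. Interested readers can consult that part of the book; it is not covered here for reasons of space.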

---------------------------------

Introduction and Chapter 1 Liftoff! Prediction Takes Action

Summary of the Introduction and Chapter 1

The author first lists applications of predictive analytics in many fields, then poses several questions to show that all of them can be addressed with predictive analytics:

Will the patient’s outcome from surgery be positive? Will the credit applicant turn out to be a fraudster? Will the homeowner face a bad mortgage? Will the airfare go down? Will the customer respond if mailed a brochure? By predicting these things, it is possible to fortify healthcare, combat risk, conquer spam, toughen crime fighting, boost sales, and cut costs.

In the Introduction, the author also points out the fundamental difference between predictive analytics and forecasting:

Predictive analytics (PA) – Technology that learns from experience (data) to predict the future behavior of individuals in order to drive better decisions. It drives how organizations treat and serve an individual across the frontline operations that define a functional society.

Forecasting makes aggregate predictions on a macroscopic level. How will the economy fare? Which presidential candidate will win more votes in Ohio? Whereas forecasting estimates the total number of ice cream cones to be purchased next month in Nebraska, PA tells you which individual Nebraskans are most likely to be seen with cone in hand.

PA's rapid growth in recent years has also been driven by several external factors:

· Wildly increasing loads of data;

· Cultural shifts as organizations learn to appreciate, embrace, and integrate predictive technology

· Improved software solutions to deliver PA to organizations.

As the book develops, it becomes clear that building a PA application of your own requires two elements:

1) What’s predicted: the kind of behavior (i.e., action, event, or happening) to predict for each individual, stock, or other kind of element

2) What’s done about it: the decisions driven by prediction; the action taken by the organization in response to or informed by each prediction

The book uses several stories and examples to show how these two elements apply in three areas:

Area 1. PA APPLICATION: TARGETING DIRECT MARKETING

1. What’s predicted: Which customers will respond to marketing contact.

2. What’s done about it: Contact customers more likely to respond

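Here the book uses Google Ads and Facebook as the most direct examples of the Prediction Effect: why a good PA model can open up sales for a company, and why a poor PA model can cost a company more than half of its advertising spend, without the company even knowing which half.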

Area 2. PA APPLICATION: PREDICTIVE ADVERTISEMENT TARGETING

1. What’s predicted: Which ad each customer is most likely to click.

2. What’s done about it: Display the best ad (based on the likelihood of a click as well as the bounty paid by its sponsor).
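The decision in item 2 combines two numbers per ad: the predicted chance of a click and the bounty the sponsor pays. A minimal sketch follows, assuming the two are combined by multiplying them into an expected payout (the notes do not state the exact formula); the ad names and figures are invented.

```python
def best_ad(candidates):
    """Pick the ad with the highest expected payout: P(click) * bounty."""
    return max(candidates, key=lambda ad: ad["p_click"] * ad["bounty"])

# Hypothetical candidates for one visitor; p_click would come from a predictive model.
ADS = [
    {"name": "ad_a", "p_click": 0.05, "bounty": 0.40},
    {"name": "ad_b", "p_click": 0.02, "bounty": 1.50},
    {"name": "ad_c", "p_click": 0.10, "bounty": 0.10},
]

chosen = best_ad(ADS)
print(chosen["name"])
```

Note that the most clickable ad (`ad_c`) is not the one chosen here, because its bounty is low; that trade-off is the reason both inputs matter.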

In this part the book uses a few simple examples to explain what a predictive model is. Predictive model: a mechanism that predicts a behavior of an individual, such as click, buy, lie, or die. It takes characteristics of the individual as input and provides a predictive score as output. The higher the score, the more likely it is that the individual will exhibit the predicted behavior.

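The book also introduces the concept of "machine learning," using university recruitment advertising as an example to walk through a complete machine learning process. Information on past applicants is used to build a predictive model, and ads are then directed at the people with the higher predictive scores, which improves the response rate.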
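Once a model has produced a score for each prospect, the targeting step reduces to ranking and cutting off. The sketch below assumes the scores already exist; the prospect names, the scores, and the `fraction` parameter are hypothetical.

```python
def select_targets(prospects, fraction):
    """Return the names of the top-scoring share of prospects."""
    ranked = sorted(prospects, key=lambda p: p["score"], reverse=True)
    count = max(1, round(len(ranked) * fraction))
    return [p["name"] for p in ranked[:count]]

# Hypothetical prospects with scores from some already-trained predictive model.
PROSPECTS = [
    {"name": "p1", "score": 0.12},
    {"name": "p2", "score": 0.81},
    {"name": "p3", "score": 0.47},
    {"name": "p4", "score": 0.66},
]

targets = select_targets(PROSPECTS, fraction=0.5)
print(targets)
```

With these numbers the top half is `p2` and `p4`, so only they would be shown the ad.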

Area 3. PA APPLICATION: BLACK-BOX TRADING

1. What’s predicted: Whether a stock will go up or down.

2. What’s done about it: Buy stocks that will go up; sell those that will go down.

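This part tells the story of John, a PhD who built a stock market prediction model with PA and put his entire retirement savings into testing whether it worked. At first the model was inaccurate, to the point that the financial investment firm that had bought it was about to sue him. After many adjustments and a lot of debugging he identified the factors that were affecting the model, and the adjusted model eventually doubled his income year after year. He went on to found Elder Research, the largest predictive services company in North America. The author uses the story to make a simple point: before a model is put to work in the real world, it can be tested on historical data, and only if its results match what actually happened should it be treated as a valid model.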
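Testing a model against history before trusting it can be sketched as a tiny backtest. Everything below is illustrative: the price series is invented, and `momentum_model` is a deliberately naive stand-in, not the model from the story.

```python
def momentum_model(previous_change):
    """Toy model: predict that the next move repeats the direction of the last one."""
    return "up" if previous_change > 0 else "down"

def backtest(prices, model):
    """Replay history and return the share of moves the model called correctly."""
    changes = [later - earlier for earlier, later in zip(prices, prices[1:])]
    hits = 0
    trials = 0
    for previous, actual in zip(changes, changes[1:]):
        actual_direction = "up" if actual > 0 else "down"
        if model(previous) == actual_direction:
            hits += 1
        trials += 1
    return hits / trials

# Invented daily closing prices, used only to exercise the code.
PRICES = [100, 101, 103, 102, 101, 102, 104, 105]

hit_rate = backtest(PRICES, momentum_model)
print(round(hit_rate, 3))
```

On this series the toy model gets 4 of 6 calls right. A hit rate on a handful of points proves little; the point is only that the comparison with what actually happened comes before real money is committed.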

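In the following chapters, the author covers ethics, the use of massive data, the application of models, corporate data strategy, artificial intelligence, and more, as applied in business, politics, and public administration.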

----------------------------------

Predictive Analytics – The power to predict who will click, buy, lie, or die (updated through Chapter 2, Part 1)

Chapter 2 With Power Comes Responsibility

Chapter 2 overview

The media firestorm invoked misleading accusations, fear of corporate power, postulations by television personalities, and, of course, predictive analytics (PA). Target's and Hewlett-Packard's (HP's) predictive power brings to focus an exceptionally challenging and pressing ethical question. Within the minefield that is the privacy debate, the stakes just rose even higher. Retaining employees is core to protecting any organization. PA has taken on an enormous crime wave. It is central to tackling fraud and promises to bolster street-level policing as well. In these efforts, PA's power optimizes the assignment of resources. Its predictions dictate how enforcers spend their time, which transactions auditors search for fraud and which street corners cops search for crime. PA sometimes makes wrong predictions but often proves to be less wrong than people. Bringing PA in to support decision making means introducing a new type of bias, a new fallibility, to balance against that of a person.

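Although this chapter is nominally about the debates and conflicts raised by applying predictive analytics in several areas, its real emphasis is still on the benefits those applications deliver. The main points of conflict are the controversies over PA helping Target with marketing prediction, helping HP predict employee departures, and helping governments and financial institutions predict fraud.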

Each of these points is expanded briefly below:

(1) PA Application: Pregnancy Prediction

1. What’s predicted: Which female customers will have a baby in coming months

2. What’s done about it: Market relevant offers for soon-to-be parents of newborns.

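Here the author uses a very well-known case: a few years ago Target mailed coupons for baby products to a teenage girl. Her father was furious, but it later turned out that she was in fact pregnant. This naturally provoked media criticism, along with fear of and attacks on the power of large corporations. So did Target violate its customer's privacy, or did it have some other superpower that produced this result?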

“Target pulled together training data by merging the baby registry data with other retail customer data and generated a “fairly accurate” predictive model. The store can now apply the model to the customers who have not registered as pregnant. This identifies many more pregnant customers, since we can assume most such customers in fact do not register.” Eventually, “the model identified 30% more customers for Target to contact with pregnancy-oriented marketing material, which is a significant marketing success story.”
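The quoted process has two steps: merge registry membership with purchase data to get labelled training examples, then apply the resulting model to customers who are not on the registry. The sketch below is a toy under loud assumptions: the customers and product categories are invented, non-registered customers in the training set are treated as negatives, and the scoring rule (difference in purchase rates) is a stand-in, since the notes do not describe Target's actual model.

```python
# Hypothetical purchase data: customer id -> product categories bought.
PURCHASES = {
    "c1": {"product_a", "product_b"},
    "c2": {"product_a", "product_c"},
    "c3": {"product_d", "product_e"},
    "c4": {"product_e", "product_b"},
    "c5": {"product_a", "product_b", "product_c"},  # not in the training set
    "c6": {"product_d"},                            # not in the training set
}
REGISTRY = {"c1", "c2"}                  # customers on the baby registry
TRAINING_IDS = ["c1", "c2", "c3", "c4"]  # customers used to fit the model

def item_rates(customer_ids):
    """Share of the given customers who bought each product category."""
    counts = {}
    for customer_id in customer_ids:
        for item in PURCHASES[customer_id]:
            counts[item] = counts.get(item, 0) + 1
    return {item: n / len(customer_ids) for item, n in counts.items()}

def fit_weights(training_ids, registry):
    """Merge step: label by registry membership, then weight each item."""
    positives = [c for c in training_ids if c in registry]
    negatives = [c for c in training_ids if c not in registry]
    pos, neg = item_rates(positives), item_rates(negatives)
    return {item: pos.get(item, 0.0) - neg.get(item, 0.0) for item in set(pos) | set(neg)}

def score(customer_id, weights):
    """Higher score means the basket looks more like a registered customer's."""
    return sum(weights.get(item, 0.0) for item in PURCHASES[customer_id])

WEIGHTS = fit_weights(TRAINING_IDS, REGISTRY)
print(score("c5", WEIGHTS) > score("c6", WEIGHTS))
```

Customer `c5` never registered, yet scores high because the basket resembles those of registered customers; that is the sense in which the model "identifies many more" customers than the registry alone.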

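The author's aim here is not to whitewash Target but to show that a large company can use information about known customers to build a model that makes predictions on new data. The more historical information it has, or the larger its customer base, the closer the resulting model can come to the company's goal. This also demonstrates how powerful data is.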

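Another reason data attracts large companies and many institutions is that it is cheap: “a user’s data can be purchased for about half a cent, but the average user’s value to the Internet advertising ecosystem is estimated at $1,200 per year.” And once information exists it is hard to erase; in the digital age especially, it tends to be preserved in one form or another. For large enterprises and organizations, setting data policy therefore becomes one of the challenges.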

This means an organization thinking about data needs to consider the following basic aspects (who, what, where, when, how long, and why):

Retain – What is stored and for how long.

Access – Which employees, types of personnel, or group members may retrieve and look at which data elements.

Share – What data may be disseminated to which parties within the organization, and to what external organizations

Merge – What data may be joined together, aggregated, or connected.

React – How may each data element be acted upon, determining an organization’s response, outreach, or other behavior.

Going a step further, this extends to questions such as:

· Which data policies can and should be established via legislation, and which by industry best practices, and rules of etiquette?

· How are policies enforced: What security standards – encryption, password integrity, firewalls, and the like – promise to earn Fort Knox’s reputation in the electronic realm?

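The author urges large companies and the public to think through these data questions together. Only when the public and companies approach data from the same perspective will truly meaningful data legislation and policy be established.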

To be continued. Part 2 covers the debate over HP predicting which of its employees are most likely to leave.

