TL;DR, for the curious only

In 2007, when I began this subprime project, data science as currently understood was in a formative state and its methods and outlook were not universally current in large corporations, such as the Fortune 50 company that I worked for.

In the mortgage backed securities business of which I was a part, the two principal parts of the business, sales and service, were separately managed without the benefit of easily interconnected systems. The primary data and analytic tool on the sell side was the Excel spreadsheet, often stretched to and beyond its limits. As one example, mortgage backed securities have long been famous for highly complex cash flows: payments from mortgages go to different classes of investors based on rules particular to each transaction that are difficult to understand and even more difficult to calculate.
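For readers who want a feel for what such a rule looks like, here is a minimal sketch in Python of about the simplest one there is, a sequential-pay principal waterfall: the most senior class is paid in full before the next class receives anything. The class names, balances, and the function `distribute_principal` are made up for illustration and do not come from any actual transaction. Real deals layer interest, triggers, and loss allocation on top of this.

```python
from dataclasses import dataclass


@dataclass
class Tranche:
    name: str
    balance: float  # outstanding principal owed to this class of investors


def distribute_principal(tranches, principal_collected):
    """Pay principal to classes in order of seniority until the cash runs out.

    `tranches` is assumed to be ordered most senior first (hypothetical rule).
    Returns the amount paid to each class and any cash left over.
    """
    remaining = principal_collected
    payments = {}
    for tranche in tranches:
        paid = min(tranche.balance, remaining)
        tranche.balance -= paid
        remaining -= paid
        payments[tranche.name] = paid
    return payments, remaining


# Illustrative numbers only: three classes and one month's principal collections.
deal = [Tranche("A", 100.0), Tranche("B", 50.0), Tranche("C", 25.0)]
payments, leftover = distribute_principal(deal, 120.0)
print(payments)
```

With these made-up numbers, class A is retired, class B receives the remaining 20, and class C receives nothing that month. Multiply that by dozens of classes, each with its own exceptions, and the appeal and the fragility of doing it in a spreadsheet both become clear.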

When several classes from several different mortgage backed securities were placed into a separate security, often called a collateralized debt obligation, calculating who was entitled to what often required a spreadsheet to run overnight to produce an answer. I suspect that no one was confident that it was the right answer, but I am confident that few involved attempted to replicate the process.

The other side of the business, loan servicing, calculated the interest and principal due each month (plus or minus impoundments for tax and insurance escrows) on every loan in the system, not just those in a particular mortgage backed security. Through processes probably most comprehensively documented by little yellow stickies on monitors, it produced a report in spreadsheet form each month that eventually made its way to the corporate system of record. (The small fee made on each loan represented part of a balance sheet asset, which, in the aggregate, represented up to 50% of the net worth of the bank.)

It is understandable that, in a hierarchical organization with multiple semi-autonomous operating units and reporting relationships up to 14 levels deep, the steps required to assemble a comprehensive view of the detail of even a small fraction of the loans under management would rise to the level of a project, requiring broad participation of representatives of 20 or more organizational units, a budget, and a schedule measured in quarters.

It’s also understandable, in those days at least, that an IT department that had only recently managed the transformation from a mainframe based system to a modern client-server system might be reluctant to make exceptions. When I asked for the software needed to access the databases in which the information I needed was stored, I was told that it was not on the standard legal department image, no exceptions, [sorry]. (They didn’t actually apologize.)

So ready access to the data I needed wasn’t in reach. I was senior enough (in the top half percent) to have had my manager talk to the general counsel to talk to the CEO to talk to the CIO to talk to her ops SVP to make it happen. I was also seasoned enough to know that by the time it happened, it would be too late.

The reason that the problem of what went wrong with 2006 mortgage backed securities mattered is that in 2007 we were continuing to issue much the same type of mortgage loans. The federal securities laws are strict, and impose two types of sanctions. If you are inaccurate, even in all innocence, buyers are entitled to a full refund. If you are inaccurate and should have known you were inaccurate, some combination of big fines and the possibility of prison beckons.

When sales came to me to report that buyers thought that our 2006 securities were performing worse than others, I had questions they couldn’t answer, for some of the reasons given above. In particular, I wanted to know

They wanted to know what they could say, and all I could tell them was that they couldn’t say anything before knowing the facts.

I decided that being kept from the internal data was a feature, not a bug, even though it cost me a month. It would be possible to say that I was working with no more data than was available to the potential plaintiffs, had they looked into and considered the accumulating facts.

And that was the push that set me on the path to data science.