PRISM > What I learned building a database of 3,045,406 real estate professionals for audience modeling

This article centers on private lending and audience modeling for customer acquisition strategy. The non-QM loan origination opportunity is a $100 billion, non-owner-occupied investment space covering SFR (1-4 unit) Fix and Flip, Acquisition, Refinance, Ground-Up and Small Balance Commercial real estate.

It’s safe to say that most real estate investors in this market and category have little to no capital, yet they self-identify as investors. This presents an interesting yet challenging marketing problem: surfacing qualified leads at the point where application submissions and closed funding scenarios intersect.

Hard money has a negative connotation in the market. It’s generally a term used by less experienced borrowers, who typically rely on private funds, or “hard money”, to complete work. The concept is not limited to the 3-5 core private lending products across real estate, nor to the United States. It’s a global trend, and it’s hugely popular in the UK and other EMEA markets.

Traditional Realtors and Mortgage Brokers commonly misunderstand whether an NMLS license is needed to lend in this product space. The main point in dealing with a private lender is that, while they primarily fund SFRs, these loans fall under commercial lending guidelines and are therefore not governed by RESPA, TRID or TILA. The loans are funded only into business entities, which allows 7-10 business day closings, and the lender can pay a referral fee or commission on the HUD at closing to anyone operating under a Brokerage license.

The goal in building this database was to get us as close as possible to the true purchase decision maker and borrower. Rather than relying on what realtors or borrowers tell us, we use data for targeting.

---

-- Total row count across every table in the prism schema
SELECT SUM(TABLE_ROWS)
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'prism';

3,045,406 Records

PRISM contains 3,045,406 records, many of which were acquired or scraped from public sources. If you’re wondering what scraping is, it’s a means of acquiring large amounts of data from public-facing websites. In a recent landmark case (hiQ Labs v. LinkedIn), a court ordered LinkedIn, which Microsoft owns, to let a third party scrape publicly available profile data. LinkedIn’s legal team had claimed that scraping violated the 1986 Computer Fraud and Abuse Act. LinkedIn is in fact one of my sources for PRISM, among many others.

Working with engineering teams familiar with Chromium, Scrapy and Python, I built what may be one of the largest privately operated databases focused entirely on real estate. The process is repeatable and scalable, so as records become dated, new data can be acquired.

These saved pages typically have a structure we can parse in volume. By storing the downloaded HTML, we give ourselves room to make mistakes, re-run scripts, or adjust the specifics of what we are acquiring. Automated collection like this is common practice; by some estimates it makes up almost 52% of today’s internet traffic.
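As a rough illustration of why storing the raw HTML helps, here is a minimal sketch of a disk cache keyed by URL. The function names, the example.com URL and the page body are placeholders I made up for the example, not part of the PRISM codebase; the point is only that a page saved once can be re-parsed as often as needed without another download.

```python
import hashlib
import tempfile
from pathlib import Path


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name inside the cache directory."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.html"


def save_page(cache_dir: Path, url: str, html: str) -> Path:
    """Store the raw HTML exactly as it was downloaded."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_path(cache_dir, url)
    path.write_text(html, encoding="utf-8")
    return path


def load_page(cache_dir: Path, url: str):
    """Return the cached HTML for a URL, or None if it was never saved."""
    path = cache_path(cache_dir, url)
    return path.read_text(encoding="utf-8") if path.exists() else None


# Demo with a placeholder URL and page body; no network access is needed.
demo_dir = Path(tempfile.mkdtemp())
demo_url = "https://example.com/agents/123"
demo_html = "<html><body><h1>Jane Doe</h1></body></html>"
save_page(demo_dir, demo_url, demo_html)
cached_html = load_page(demo_dir, demo_url)
```

If a parsing script turns out to be wrong, it can be fixed and pointed at the cache directory again rather than at the live site.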

In short, we reverse engineer HTTP calls (the requests behind the address you see in your browser bar) and download the specific web pages that contain the data of interest. Then our software parses these pages for the data. A great example would be crawling Zillow or Redfin for distressed properties, which are a great resource for sales teams.
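The parsing step can be sketched with nothing more than Python's standard library. The markup below, including the agent-name and agent-phone class names, is invented for illustration; real pages differ per site, and a production crawler would more likely lean on Scrapy's selectors.

```python
from html.parser import HTMLParser


class AgentParser(HTMLParser):
    """Collect the text of <span> tags whose class is listed in FIELDS."""

    # Hypothetical CSS classes mapped to output field names.
    FIELDS = {"agent-name": "name", "agent-phone": "phone"}

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None  # field being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "span" and css_class in self.FIELDS:
            self._current = self.FIELDS[css_class]

    def handle_data(self, data):
        if self._current and data.strip():
            self.record[self._current] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None


def parse_agent(html: str) -> dict:
    """Parse one saved page into a flat record."""
    parser = AgentParser()
    parser.feed(html)
    parser.close()
    return parser.record


sample_html = """
<div class="card">
  <span class="agent-name">Jane Doe</span>
  <span class="agent-phone">(202) 555-0100</span>
</div>
"""
agent = parse_agent(sample_html)
```

Each parsed record then becomes one row in the database, which is where the spreadsheet analogy later in this article comes from.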

Cited data sources

  • ATTOM Data

  • BiggerPockets

  • Zillow & Redfin distressed property sales

  • Hard Money Lenders

  • MLS Associations

  • MLS Real estate agents

  • Mortgage branches

  • Mortgage companies

  • Mortgage professionals

  • REO Network

  • US Postal Code Geolocations

  • Zillow Home Value Index


What’s in there?
How is this even useful?

Think of this like columns in a spreadsheet. The fields available reflect what we can acquire during the scraping process, and as you’d expect, what we collect is driven by the output we want. Some of the acquired data is incredibly valuable for cold calling, audience modeling, or drawing very specific conclusions about market performance. It ranges from first and last name right down to addresses and mobile phone numbers, for the majority of the United States real estate professional market.

How you use this in combination with the surrounding data is where PRISM becomes powerful. If I want to build a prospecting list of distressed property resellers for our sales team, it can be done objectively. Rather than manually parsing and dialing against a site like REO Network (which we found to be of no value), we can take transaction data from Redfin and Zillow and look at the number of unique properties sold by a realtor. The same goes for LinkedIn. I see a lot of sales teams manually chasing, scrolling, messaging, clicking and essentially wasting time. This alleviates a lot of that.
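Counting unique properties per realtor is a simple aggregation. The sketch below assumes transaction rows shaped as (agent, address, market); the names, addresses and the minimum of two properties are made-up sample values, not PRISM data.

```python
from collections import defaultdict

# Hypothetical transaction rows: (agent_name, property_address, market)
transactions = [
    ("Jane Doe", "12 Oak St", "Market A"),
    ("Jane Doe", "12 Oak St ", "Market A"),  # same property listed twice
    ("Jane Doe", "40 Elm Ave", "Market A"),
    ("John Roe", "7 Pine Rd", "Market B"),
]


def unique_sales_by_agent(rows):
    """Count distinct property addresses sold per agent."""
    properties = defaultdict(set)
    for agent, address, _market in rows:
        # Light normalization so trivial variants count as one property.
        properties[agent].add(address.strip().lower())
    return {agent: len(addresses) for agent, addresses in properties.items()}


def prospect_list(rows, min_properties):
    """Agents with at least min_properties unique sales, busiest first."""
    counts = unique_sales_by_agent(rows)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(agent, n) for agent, n in ranked if n >= min_properties]


prospects = prospect_list(transactions, min_properties=2)
```

The sales team gets a ranked list backed by closed transactions instead of a directory of self-reported specialties.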

Compounding this already powerful output is the ability to narrow it down to states, markets or MSAs (metropolitan statistical areas). If we see that John Doe is listed on REO Network and focused on XYZ market, but we don’t see transaction data for that proclaimed focus, we know he isn’t worth prospecting against. By combining MSAs with the areas a lender lends in, you can hyper-target.
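The cross-check described here, claimed focus versus actual transactions versus the lender's footprint, can be sketched as a set lookup. All names and market codes are placeholders, and the data shapes are assumptions made for the example.

```python
# Hypothetical inputs: the market each agent claims to focus on,
# and (agent, market) pairs taken from observed sale transactions.
claimed_focus = {"John Doe": "XYZ", "Jane Roe": "ABC", "Sam Poe": "DEF"}
observed_sales = [
    ("Jane Roe", "ABC"),
    ("Jane Roe", "ABC"),
    ("John Doe", "QRS"),  # sells, but not in the market he claims
    ("Sam Poe", "DEF"),
]


def verified_prospects(claimed, sales, lender_markets):
    """Agents whose claimed market shows real sales and is one we lend in."""
    active_pairs = set(sales)
    return sorted(
        agent
        for agent, market in claimed.items()
        if (agent, market) in active_pairs and market in lender_markets
    )


# Assume the lender only operates in markets ABC and XYZ.
targets = verified_prospects(claimed_focus, observed_sales, {"ABC", "XYZ"})
```

John Doe drops out because his claimed market has no sales behind it, and Sam Poe drops out because the lender does not operate in his market.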

For Facebook Ads

The ways to build a matrix or rubric for decision making from these outputs are nearly endless. In another powerful use, the data can generate highly accurate saved audience models in Facebook. Today, Facebook lets you select subsets of an audience in a rudimentary way, by role, job type, interests and so on. Taking that a large step further, we can use phone numbers, first and last names, and email addresses to generate an extremely targeted list.
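Customer lists uploaded for ad matching are generally normalized and SHA-256 hashed first. Here is a minimal sketch of that preparation step. The record fields, the output keys and the assumption that every 10-digit number is a US number are mine; check the ad platform's current documentation for its exact normalization rules before relying on this.

```python
import hashlib
import re


def sha256_hex(value: str) -> str:
    """Hex-encoded SHA-256 of a normalized string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def normalize_email(email: str) -> str:
    return email.strip().lower()


def normalize_phone(phone: str, default_country_code: str = "1") -> str:
    """Keep digits only; assume 10-digit numbers are US numbers."""
    digits = re.sub(r"\D", "", phone)
    if len(digits) == 10:
        digits = default_country_code + digits
    return digits


def audience_row(record: dict) -> dict:
    """Turn one contact record into hashed identifiers for upload."""
    return {
        "email": sha256_hex(normalize_email(record["email"])),
        "phone": sha256_hex(normalize_phone(record["phone"])),
        "fn": sha256_hex(record["first_name"].strip().lower()),
        "ln": sha256_hex(record["last_name"].strip().lower()),
    }


# Placeholder contact, not real data.
contact = {
    "first_name": "Jane",
    "last_name": "Doe",
    "email": "  Jane.Doe@Example.com ",
    "phone": "(202) 555-0100",
}
row = audience_row(contact)
```

Hashing after normalization matters: the same person written two different ways would otherwise produce two different hashes and fail to match.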

Solvable Executive Asks:

  1. I need to target Mortgage Brokers with more than 5 years of experience in active Fix and Flip MSAs with a median home sale price of more than $159,000 but less than $1,250,000, where median credit scores are greater than 640

  2. I’d like to serve ads to REO Real Estate agents and Brokers on Facebook and Instagram with 3-5 years of experience where the median sale price of a home is between $600,000 and $850,000 and median credit scores are greater than 640, in Utah, Virginia, DC, California and Delaware

  3. I’d like to refine our Google AdWords campaign to specific postal codes in known active Fix & Flip markets where median credit scores are greater than 640 and profits were greater than $40,000 in 2018

  4. We lend in 32 of 50 states. I need an audience model for borrowers in known active Fix & Flip markets where the median home sale price is between $290,000 and $850,000

  5. Build us an automated voicemail drop list of all mobile numbers for Brokers within 30 miles of Washington, DC who have been licensed and in business for more than 6 years
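To show how an ask like the last one maps onto the data, here is a hedged sketch that combines postal code geolocation with license tenure. It uses the haversine great-circle formula; the broker rows, coordinates and field names are invented for the example, and the coordinates for Washington, DC are approximate.

```python
import math

EARTH_RADIUS_MILES = 3958.8
WASHINGTON_DC = (38.9072, -77.0369)  # approximate latitude, longitude


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Hypothetical broker rows with a geocoded office location.
brokers = [
    {"name": "Broker A", "mobile": "202-555-0101",
     "lat": 38.8816, "lon": -77.0910, "years_licensed": 8},   # Arlington, VA
    {"name": "Broker B", "mobile": "804-555-0102",
     "lat": 37.5407, "lon": -77.4360, "years_licensed": 10},  # Richmond, VA
    {"name": "Broker C", "mobile": "301-555-0103",
     "lat": 38.9847, "lon": -77.0947, "years_licensed": 3},   # Bethesda, MD
]


def drop_list(rows, center, radius_miles, min_years):
    """Mobile numbers for brokers inside the radius with enough tenure."""
    return [
        row["mobile"]
        for row in rows
        if row["years_licensed"] > min_years
        and haversine_miles(center[0], center[1],
                            row["lat"], row["lon"]) <= radius_miles
    ]


numbers = drop_list(brokers, WASHINGTON_DC, radius_miles=30, min_years=6)
```

The other asks follow the same pattern: each condition in the sentence becomes one filter on a column, and the filters are combined.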


It’s entirely a numbers game: number of calls made, number of emails sent, ad impressions served and so on. The data can be sliced in almost any direction for any real estate market use, even narrowing down to median income levels by area. All in all, it’s an asset that few if any lending institutions have the wherewithal or engineering prowess to build. The value add is nearly unlimited.

Results in production environments (coupled with media buys, etc.) have been strong. Cost per lead dropped significantly, and funding scenario application rates have risen and stabilized.

In conclusion, marketing efforts correlate directly with loan values, lead volumes and conversion rates. It’s a seemingly undervalued or misunderstood aspect of the mortgage industry. The inability to objectively identify borrowers and true sources of revenue within specific markets is costing many lenders and brokers. There is a data-backed argument that investing in customer acquisition efforts with objectivity and specificity reduces customer acquisition costs.


Ryan Roberts