It’s painstaking work! Dozens of data cleaners and QA engineers
count and draw the perimeter of every store, parking lot, factory,
warehouse, sand mine pit, and so on. The hardest part is not the
geofencing itself, but making sure the geofences are correct; we
have two separate QA teams to run multiple checks on the data.
Our data scientists all come from the financial sector, where data quality and integrity are critical for making split-second trading and investment decisions, often worth many millions of dollars. They have spent over 5 years carefully selecting locations and testing the data to make certain our geofences are as precise as they can be.
Bidstream data, also known as “advertising (ad) exchange” data, is collected from ads on mobile applications (apps). When a smartphone app wants to show an ad, it connects to an ad exchange and ‘asks’ which advertiser has paid the most to display their ads at that specific location. The ad exchange sends back an ad from the highest bidder, and then resells the location information. The location data collected therefore reflects the number of people who have seen a specific ad on their mobile. It also contains basic facts about the ad unit, such as the publisher and URL, the type of device, and the IP address. Location data can be derived from bidstream data and is resold separately by app developers and publishers.
Bidstream data is widely considered to be of low quality for the
purposes of geolocation. One key reason is advertising fraud, which
some estimate to affect up to 95% of bidstream data. This is
because it is difficult to verify whether a cellphone app has sent
a true advertising request to the ad exchange, or whether the app
was serving ads even when users were not looking at their phones,
in order to get paid.
Fraudulent app developers pretend to be showing ads, and ad exchanges try to counteract the fraud. But the ad exchanges also get a percentage of the proceeds from every ad, so they are not incentivized to police fraudulent ads. Even large companies like Google have been slow to clamp down on this and have allowed hundreds of millions of fraudulent ads to be collected.
For more insights:
GOOGLE SUSPENDS TWO MOBILE APPS AFTER REPORTS OF AD FRAUD - Wall Street Journal, December 4, 2018
GOOGLE REMOVES OVER 600 APPS FROM PLAY STORE TO CRACK DOWN ON MOBILE AD FRAUD - The Hindu Business Line, February 21, 2020
Cellphone carriers (including Verizon, AT&T, Sprint, and
T-Mobile) previously sold location information for every cellphone
number, in real time. This attracted a lot of attention, including
a $200 million LAWSUIT FILED BY THE FCC against the companies for
selling user data without permission.
Apart from violating privacy, cell tower data is also unusable for detailed analytics. The only location details available are those of the cell tower that the phone is connecting to. These towers may be several miles away, so the only information that can be derived is that a phone was within a certain (broad) radius of the cell tower.
This is not precise enough to know which building or retail location the phone was in. For reasons of both accuracy and privacy, Advan has never worked with cell tower data.
Since bidstream and cell tower data are challenging from a privacy perspective and unreliable for measuring foot traffic, alternative sources of accurate and reliable location data have been developed. The data that Advan uses is gathered directly from cellphone apps, which explicitly and clearly ask the user for permission for collection and dissemination. In addition, apps geolocate the phone with 10-meter accuracy. The collection typically uses a standardized SDK for better accuracy and consistency of measures.
SDK stands for “Software Development Kit”. In the context of
location data, an SDK is a standardized piece of software that is used
by a smartphone app to collect the location information of the
phone. The advantage of SDK data is that it is collected in a
standardized manner across every application.
The term SDK is sometimes overused. For example, some bidstream collecting companies claim to use SDK collection. In this context, however, the “SDK” is just simplified code that sends location details to the bidstream company without verifying details such as whether the user opted in to share the data, when and how often the data was collected, and whether the phone sensors were used properly. The SDKs used for the data Advan receives verify all of these details and more, to ensure the data is high-quality and privacy-protected.
Advan contractually receives only opt-in data and further agrees not to identify individuals based on their data. We have performed due diligence on our largest data providers, and on some of the largest applications that the data is collected from, to ensure that their practices are privacy compliant and that the user is shown the proper privacy policies and opt-in messages. All our key data providers are also members of the Network Advertising Initiative (NAI), whose members voluntarily agree to protect the privacy of all data collected and ensure that their customers also do.
True Trade Area is the term used to describe the geographic area from which customers have traveled to visit a specific location, for example a retailer’s store.
Our goal has always been to offer the most accurate geolocation data in the market, supported by institutional-level analytics. In order to make our location data more accurate, we have manually geofenced over 2 million locations (POIs) around the world. This ensures that when we look at foot traffic for any venue, we have the highest accuracy rate in the market.
POI means “Point Of Interest”. This can be a building, a mall, a retail store, a factory, a racetrack, a restaurant, a coffee shop, a hotel, a hospital, an airport, and so on. Advan has collected 150 million POIs and is continuously updating its database with new ones.
Geofencing is the process of creating a virtual fence around a real geographic area. This can be a store or a building, a mall or a city block, even a whole city. In this way we can select the mobile devices that are within the fence and provide analytics for those devices.
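As a rough illustration of how devices can be selected inside a fence, the sketch below applies a standard ray-casting point-in-polygon test. It is a minimal example under stated assumptions, not Advan's implementation: the function names, the (device_id, longitude, latitude) ping layout, and the sample coordinates are all hypothetical, and the fence is treated as a flat polygon, which is reasonable only for small areas such as a store or parking lot.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in degrees


def point_in_geofence(point: Point, fence: List[Point]) -> bool:
    """Ray-casting test: True if the point lies inside the polygon."""
    x, y = point
    inside = False
    n = len(fence)
    for i in range(n):
        x1, y1 = fence[i]
        x2, y2 = fence[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            # Longitude at which this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def devices_in_geofence(
    pings: Iterable[Tuple[str, float, float]], fence: List[Point]
) -> List[str]:
    """Return the sorted, de-duplicated device ids with a ping inside the fence."""
    return sorted(
        {dev for dev, lon, lat in pings if point_in_geofence((lon, lat), fence)}
    )


# Hypothetical fence (a small rectangle) and hypothetical pings.
store_fence = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sample_pings = [
    ("device-a", 0.5, 0.5),  # inside the rectangle
    ("device-b", 1.5, 0.5),  # east of the rectangle
    ("device-a", 0.2, 0.9),  # same device, second ping inside
]
visitors = devices_in_geofence(sample_pings, store_fence)
```

In this sketch a device counts once no matter how many of its pings fall inside the fence; a production system would also need to handle points exactly on the boundary and fences with holes.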
T+n is a standard financial term that means “Time plus n days”. Our data is T+1: we receive and process the traffic data one day after it occurs, so the data is available with a 1-day delay, which is faster than almost all alternative data, including transaction and satellite data.
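The T+n convention amounts to simple date arithmetic, sketched below. The function name is hypothetical, and the sketch assumes n counts calendar days; the text does not say whether weekends or holidays are treated differently.

```python
from datetime import date, timedelta


def available_on(observation_date: date, n: int = 1) -> date:
    """Date on which data observed on observation_date is available under T+n.

    Assumes n is counted in calendar days (an assumption of this sketch).
    """
    return observation_date + timedelta(days=n)


# Foot traffic observed on 1 March 2024 is available on 2 March 2024 under T+1.
t_plus_1 = available_on(date(2024, 3, 1))
```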
Regulations such as the GDPR (General Data Protection Regulation) in Europe and the CCPA (California Consumer Privacy Act) aim to protect consumers through legislation that requires transparency about how cellphone data is being used, and by whom. We believe this is a positive development both for consumers and for data providers. Consent is key for maximizing the benefits of location data, both for firms that harvest the data and for consumers. The single largest effect of these regulations is that bidstream data (also known as advertising exchange data) is disappearing, because users have not explicitly consented to their data being collected. Although the CCPA is less restrictive than the GDPR, its impact has been much larger in that respect, because most advertising exchange data was collected from US consumers. Advan has always focused its analytics on explicit opt-in, non-bidstream data, for reasons of both quality and privacy protection.