As you grow your codebase from proof of concept to fully functioning application, you may start running into the Trestle quotas, especially if your query strategy is inefficient or unthrottled. In Quotas and Limits, we discuss some of the details of the quota system Trestle employs.
To work within your quota, we recommend taking the following four steps:
- Use the correct timestamps for querying resources
- Consume as much data as possible with each query
- Throttle your queries so as not to exceed your quotas
- Use a wait-and-retry strategy if you do exceed your quota
Use the correct timestamps
Downloading property records and media records is one of the most common practices for users of Trestle (it's why our Getting Started tutorial sets up a client that does those two things). If you are replicating data, you will want to ensure that you are keeping your property and media records up to date correctly.
For property, the timestamp to filter on is ModificationTimestamp. This value represents when the record was last updated in Trestle. The record can be updated in Trestle because it was updated in the source system or because we updated mappings or enumerations that affected the record.
For media, the timestamp to filter on is PhotosChangeTimestamp. This value lives in the Property resource, but describes the media. We encourage pulling all of the media records for each property whose PhotosChangeTimestamp has changed, so that you pick up photos that have been added, changed, or deleted.
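As a minimal sketch of the two filters above, the following builds a query URL for each timestamp. It assumes the standard OData `$filter` syntax with the `gt` operator; the base URL and the `changed_since_url` helper are hypothetical names used only for illustration.

```python
from urllib.parse import quote

# Hypothetical base URL; substitute the endpoint that goes with your credentials.
BASE_URL = "https://example.invalid/odata"


def changed_since_url(timestamp_field: str, since_iso: str) -> str:
    """Build a Property query for records whose timestamp_field is later than since_iso."""
    odata_filter = f"{timestamp_field} gt {since_iso}"
    # quote() percent-encodes the spaces and colons inside the filter expression.
    return f"{BASE_URL}/Property?$filter={quote(odata_filter)}"


# Property changes are keyed off ModificationTimestamp ...
property_url = changed_since_url("ModificationTimestamp", "2024-01-01T00:00:00Z")
# ... while photo changes are keyed off PhotosChangeTimestamp on the same resource.
media_url = changed_since_url("PhotosChangeTimestamp", "2024-01-01T00:00:00Z")
```

Both queries target the Property resource; only the field being compared differs.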
Consume as much data as possible
There are two main tricks for maximizing the amount of data you get per query. The first is to use $top=1000. Setting this value will pull 1000 records per query, the maximum Trestle will return in one request. The second trick is to make use of $expand. You can expand multiple resources at one time, for example $expand=Rooms,Units,OpenHouse,CustomProperty,Media, to get all of the room, unit, open house, and custom data as well as all of the information about the photos for the property. Combining these two tricks can save you many queries.
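To illustrate how the two tricks combine, here is a small sketch that builds the query string and estimates how many requests a given record count would take. The helper names (`bulk_query`, `queries_needed`) are hypothetical, and the record count in the usage lines is an arbitrary example, not a Trestle figure.

```python
import math

# The expansions named in the text above.
EXPANSIONS = ["Rooms", "Units", "OpenHouse", "CustomProperty", "Media"]


def bulk_query(resource: str, top: int = 1000, expand=EXPANSIONS) -> str:
    """Return a relative query that asks for `top` records with related data expanded."""
    return f"/{resource}?$top={top}&$expand={','.join(expand)}"


def queries_needed(total_records: int, top: int = 1000) -> int:
    """Number of requests required to page through total_records at `top` per request."""
    return math.ceil(total_records / top)


query = bulk_query("Property")
pages = queries_needed(25_000)  # 25 requests at 1000 records each
```

Without `$expand`, each related resource would need its own set of requests on top of these pages.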
Throttle your queries
Query throttling can be as simple as putting a sleep statement between queries, or as complex as dynamically tracking the baseline (per hour) and burst (per minute) quotas along with their reset times, and ensuring that you do not send more queries than you are allotted during those time frames. The latter, more complex method often involves one of the flow control algorithms, such as Leaky Bucket or Token Bucket.
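The following is a minimal sketch of a Token Bucket throttle, one of the algorithms mentioned above. The capacity and refill rate shown are placeholder values, not Trestle's actual quotas; substitute the numbers that apply to your account. The `clock` parameter is an assumption added so the class can be tested without real waiting.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity` requests, refilled at `refill_per_second`."""

    def __init__(self, capacity: int, refill_per_second: float, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)  # start with a full bucket
        self.last = clock()

    def _refill(self) -> None:
        # Add tokens for the time elapsed since the last check, capped at capacity.
        now = self.clock()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now

    def try_acquire(self) -> bool:
        """Take one token if available; return False if the caller should wait."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def wait_time(self) -> float:
        """Seconds until the next token becomes available (0.0 if one is ready)."""
        self._refill()
        return max(0.0, (1 - self.tokens) / self.refill_per_second)


# Placeholder limits: a burst of 2 requests, refilled at 1 request per second.
bucket = TokenBucket(capacity=2, refill_per_second=1.0)
```

Before each query, call `try_acquire()`; if it returns False, sleep for `wait_time()` seconds and try again. Tracking both the per-hour and per-minute quotas would take two buckets, with a query sent only when both grant a token.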
Wait before you retry
If you are trying to maximize the number of queries you can execute per hour while staying within the quota limits, it is possible that you will still hit the 429 Too Many Requests HTTP status code indicating that you have exceeded your quota. When that happens, you will want to retry your query; however, you will also want to wait before you retry it.
Wait and retry approaches generally come in two flavors: static wait and exponential wait. For a static wait time, you will do something like wait 10 seconds and then retry your query. If it gets a 429 again, you will wait another 10 seconds and retry again. This loop can either go on forever (not recommended) or end after some number of retries.
The exponential wait is our recommended practice. In this model, you will wait increasingly large amounts of time between retries. So you might start at waiting only 1 second, then 2, then 4, 8, 16, 32, ... up to some number of retries. You can read more about exponential backoffs in Google's API Documentation.
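A minimal sketch of the exponential wait described above might look like the following. The `send` callable is hypothetical: it stands in for whatever function issues your query and returns a `(status_code, body)` pair. The retry count and starting delay are illustrative defaults, not recommended Trestle values.

```python
import time


def fetch_with_backoff(send, max_retries: int = 6, base_delay: float = 1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status, doubling the wait after each 429.

    `send` is a hypothetical callable returning (status_code, body).
    `sleep` is injectable so the waits can be observed or skipped in tests.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_retries:
            break  # out of retries; do not sleep again
        sleep(delay)
        delay *= 2  # 1, 2, 4, 8, ... seconds
    raise RuntimeError(f"Still receiving 429 after {max_retries} retries")
```

Capping the number of retries avoids the endless loop cautioned against above. Many implementations also add a small random jitter to each delay so that several clients do not retry in lockstep; that is omitted here for brevity.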
When replicating data from Trestle, there are three main steps to keep in mind and one condition to watch out for. The main steps are the initial loading of the data, keeping up with the changes, and reconciliation. If you have all of that in place, then you will be set to provide up-to-date data to your clients. The one condition that you will want to watch out for (or really tell your code to watch out for) is mass updates. While we work to minimize the number of records updated in a mass update, we do not control the source systems and it is possible that they will update all of their records (by accident or on purpose). Being prepared for mass updates is important.
Loading the property data is fairly straightforward and covered in other sections of the documentation. Specifically, look at Consume as much data as possible above to see strategies for pulling as much data as possible in each query. We also highly recommend using the Replication=true extension to use Trestle's replication endpoint for highly efficient queries that can exceed 1,000,000 records returned.
There are two main strategies for replicating the media from Trestle: downloading photos individually using the MediaURL and downloading the photos in bulk from the Media/All endpoint.
Downloading the photos via MediaURL
Trestle provides a MediaURL for each photo. This is a publicly accessible URL and is governed by a separate quota from the regular WebAPI requests. This separate quota is the main advantage of this approach: it allows you to replicate the images without impacting your data loads and incremental updates. While there is a separate quota, it does not allow 30 times as many queries, so replicating all of the images one-by-one will take longer than replicating all of the property records.
This strategy may be suitable if you are doing some form of lazy image replication, e.g. downloading all of the primary photos initially and only downloading secondary photos when they are specifically requested.
Using the Media/All endpoint
In addition to providing MediaURLs, Trestle provides what we refer to as the Media/All endpoint. The Media/All endpoint is accessed via the Property record like /Property('123456')/Media/All. This query will give you all of the media records for the property with ListingKey = 123456. The photos are returned in the same MIME multipart format that RETS GetObject uses. This strategy is the most quota-efficient, as it can pull as many images as the property has (whether that be 1, 10, or 100) in one query. The downside is that the Media/All endpoint is not governed by a separate quota, so you will have to balance your Media/All queries with the rest of your incremental updates.
Keeping up with the Changes
Once you have your initial data load done, you will want to keep up with the changes in Trestle. To do this, you will use an incremental update strategy that relies on tracking the last modified dates for each resource you are updating. For property records (and the associated resources like OpenHouse, Rooms, Units, and CustomProperty), you will want to key off of and track ModificationTimestamp. For best results, we recommend tracking the latest value you received from Trestle. Doing so will ensure that clock differences between your computers and Trestle's do not cause you to miss updates. For photos, you will want to key off of and track PhotosChangeTimestamp from the property record.
If you run these incremental updates every few minutes to every hour, you should be able to keep up with changes while staying within your quota.
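One way to track the latest value received from Trestle, rather than your own clock, is sketched below. It assumes each record is a dictionary with an ISO 8601 timestamp string; the helper names and the sample timestamps are hypothetical.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalise it to an explicit offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def advance_watermark(current, records, field="ModificationTimestamp"):
    """Return the latest timestamp string seen so far, using only server-supplied values."""
    latest = current
    for record in records:
        value = record.get(field)
        if value and (latest is None or parse_ts(value) > parse_ts(latest)):
            latest = value
    return latest


# Hypothetical batch from one incremental update.
batch = [
    {"ListingKey": "1", "ModificationTimestamp": "2024-05-01T10:00:00Z"},
    {"ListingKey": "2", "ModificationTimestamp": "2024-05-01T12:30:00Z"},
    {"ListingKey": "3", "ModificationTimestamp": "2024-05-01T11:00:00Z"},
]
watermark = advance_watermark("2024-05-01T09:00:00Z", batch)
```

Persist the returned value and use it as the lower bound of the next incremental query. The same function works for photos by passing `field="PhotosChangeTimestamp"`.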
While the incremental updates will catch new records and changed records, they will not tell you which records are no longer available. To determine which listings are no longer available (off-market, deleted, no longer being distributed on the Internet, etc.), you must use a reconciliation process. The basics of this process are that you will query Trestle for all of the keys for the resource and remove any local records whose keys are not in that list. When you select just the key field from a resource, you can download more than 1,000 records at a time. So your query would look like /Property?$select=ListingKey&$top=300000. Note that if there are more than 1,000,000 records in the resource, we recommend using the replication endpoint for this step.
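The comparison step of reconciliation amounts to a set difference. The sketch below assumes you have already collected the ListingKey values from the key-only query into a list; the function name and sample keys are hypothetical.

```python
def keys_to_remove(local_keys, trestle_keys):
    """Return the keys held locally that Trestle no longer lists."""
    return set(local_keys) - set(trestle_keys)


# Hypothetical example: listing "1" is no longer in Trestle, listing "4" is new.
local = {"1", "2", "3"}
remote = ["2", "3", "4"]
stale = keys_to_remove(local, remote)
```

Keys that appear only in the Trestle list (such as "4" here) are not handled by this step; the incremental updates pick them up.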