Charles edited this page Mar 7, 2018 · 11 revisions

Market Access / Market Potential Analysis

Objective

The objective of this analysis is to identify, for a range of origin points, what their accessibility relationship is to all of the destination points. The assessed trips can be by car, by bicycle, or on foot.

Definitions

‘Market access’ is a measure of accessibility from one origin to all destinations, based purely on travel time. The outcome of this analysis is often visualized as an ‘isochrone’, centered on a given origin point. Isochrones depict how far away from the origin you can move, in all directions, within a certain time frame (e.g. 1 hour).

‘Market potential’ can be assessed by weighting all potential destinations by a factor designed to represent their attractiveness to the origin point. In the context of performing commercial site selection, such factors might include GDP per capita or population (positively related), or the number of other firms (negatively related). We refer to each origin – destination pair as an ‘O-D’ pair, and the matrix of travel times from all origin points to all destination points as an ‘O-D matrix’. An example of such a matrix is given below for five origin and destination points, A to E:

Image credit: https://ira.mit.edu/blog/agent-based-visualization

Where in the Project Cycle does this fit?

This type of analysis is well suited for the project design phase. In particular, it is excellent for comparing potential sites for an intervention. By quantifying the current state of accessibility from various origins with respect to a goal, it allows the user to directly appraise one site versus another, and even identify the optimal site from a large population of potential sites.

This script is less well suited to monitoring and evaluation because it uses the OpenStreetMap (OSM) network as the road network dataset. Take the example of a road project: for the new road to be represented on OSM, a user must manually add it to the dataset. Market accessibility would therefore need to be calculated both before and after the road is added to the dataset in order to detect a change (presumably shorter journeys for some O-D pairs).

Methodology: Market Access

The code draws extensively on the APIs, both free and commercial, that are now available to calculate travel times against the OpenStreetMap road network. It is pre-programmed to access three separate APIs, all of which return a travel time matrix, in seconds:

  • Open Source Routing Machine (OSRM)
  • Mapbox’s Matrix API (requires API key)
  • Mapbox Traffic (requires API key) – provides travel times between O-D pairs adjusted for traffic conditions at the time the call is made.

The three APIs have a very similar structure, as represented by:

/directions-matrix/v1/{profile}/{coordinates}?{sources}&{destinations}

Each element will be dissected in turn:

{profile} – This determines whether the travel times returned are for walking, cycling or driving.

{coordinates} – This is the list of input coordinates, to include both sources and destinations. Each coordinate pair is entered in WGS 1984, in longitude,latitude order: the two values within a pair are separated by a comma, and successive pairs by semicolons. Each API limits the length of the input list, but the tool takes care of this.

{sources}&{destinations} – These are the index numbers, within the list of input {coordinates}, which denote whether a coordinate pair is a source, a destination, or both. In many cases, the origins and destinations will not be the same (e.g. when assessing accessibility from villages to health clinics, or from individual buildings to green spaces). In order to build up a rectangular O-D matrix, it is therefore necessary to specify origins and destinations separately in the requests. In the example below, only the 1st and 3rd pairs of coordinates are treated as sources, whilst all three pairs are treated as destinations. In this example, the API call would return a 2x3 O-D matrix.

https://api.mapbox.com/directions-matrix/v1/mapbox/cycling/-122.42,37.78;-122.45,37.91;-122.48,37.73?sources=0;2&destinations=all&{token}
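As a rough illustration of how such a request could be assembled, the sketch below rebuilds the example URL from a list of (longitude, latitude) pairs. The helper name `build_matrix_url` and its arguments are hypothetical and are not taken from the script. The `access_token` query parameter is the form in which Mapbox expects the token, standing in for `{token}` above.

```python
from typing import Optional, Sequence, Tuple

Coord = Tuple[float, float]  # (longitude, latitude) in WGS 1984

BASE_URL = "https://api.mapbox.com/directions-matrix/v1/mapbox"


def build_matrix_url(
    profile: str,
    coords: Sequence[Coord],
    sources: Optional[Sequence[int]] = None,
    destinations: Optional[Sequence[int]] = None,
    token: str = "YOUR_TOKEN",
) -> str:
    """Assemble a Matrix API request URL (hypothetical helper)."""
    # Longitude and latitude are joined by a comma; pairs by semicolons.
    coord_str = ";".join(f"{lon},{lat}" for lon, lat in coords)

    def index_list(indices: Optional[Sequence[int]]) -> str:
        # None means "use every coordinate", which the API spells "all".
        return "all" if indices is None else ";".join(str(i) for i in indices)

    return (
        f"{BASE_URL}/{profile}/{coord_str}"
        f"?sources={index_list(sources)}"
        f"&destinations={index_list(destinations)}"
        f"&access_token={token}"
    )


coords = [(-122.42, 37.78), (-122.45, 37.91), (-122.48, 37.73)]
url = build_matrix_url("cycling", coords, sources=[0, 2])
print(url)
```

Passing `sources=[0, 2]` and leaving `destinations` unset reproduces the 2x3 request described above.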

Tiling

The limit of the APIs is a maximum of 25 input coordinates per request (10 for the Traffic API). This means that the maximum size matrix that can be generated, for distinct origins and destinations, is a 12 x 13 matrix (156 travel time pairs returned).

Many of our use cases require more than 12 origins and 13 destinations. As such, the key innovation is the ability to use the APIs to generate massive O-D matrices, of any scale. The algorithm uses the following process tree:

  1. Take a set of 12 origins, and hold these constant.
  2. Calculate the O-D matrix from these 12 origins to the first 13 destinations.
  3. Move to the next batch of 13 destinations and repeat step 2 until the travel time to all destinations has been calculated.
  4. Move to the next batch of 12 origins and repeat steps 1 to 3 until the matrix is filled in for all origins.

This is most easily visualized as a chessboard, where the algorithm moves horizontally across all destinations before moving down to the next ‘row’ of origins.
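The tiling loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the script's actual code: `batches`, `build_od_matrix` and `fetch_tile` are hypothetical names, and `fetch_tile` stands in for a single API request that returns a small block of travel times.

```python
from typing import Callable, Iterator, List, Sequence


def batches(n_items: int, size: int) -> Iterator[range]:
    """Yield consecutive index ranges of at most `size` items."""
    for start in range(0, n_items, size):
        yield range(start, min(start + size, n_items))


def build_od_matrix(
    origins: Sequence,
    destinations: Sequence,
    fetch_tile: Callable[[list, list], List[List[float]]],
    origin_batch: int = 12,
    dest_batch: int = 13,
) -> List[List[float]]:
    """Fill a full O-D matrix one tile at a time."""
    matrix = [[None] * len(destinations) for _ in origins]
    for o_idx in batches(len(origins), origin_batch):        # next 'row' of tiles
        o_block = [origins[i] for i in o_idx]
        for d_idx in batches(len(destinations), dest_batch):  # move across the row
            d_block = [destinations[j] for j in d_idx]
            tile = fetch_tile(o_block, d_block)               # one request per tile
            for r, i in enumerate(o_idx):
                for c, j in enumerate(d_idx):
                    matrix[i][j] = tile[r][c]
    return matrix


# Stand-in for a real API call: points are plain numbers and the
# 'travel time' is simply their absolute difference.
calls = []


def fake_fetch(o_block, d_block):
    calls.append((len(o_block), len(d_block)))
    return [[abs(o - d) for d in d_block] for o in o_block]


origins = list(range(30))
destinations = list(range(100, 140))
matrix = build_od_matrix(origins, destinations, fake_fetch)
print(len(calls), "requests for a", len(matrix), "x", len(matrix[0]), "matrix")
```

With 30 origins and 40 destinations, the loop issues 3 x 4 = 12 requests, and no request carries more than 25 coordinates.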

This algorithm design has two major advantages:

  • Although slow, this method does not require the road network or the full O-D matrix to be held locally in memory. The memory of a standard computer is quickly exceeded by O-D matrices, which can easily run into the millions of entries. By tiling the desired end product into smaller batches, the process remains lightweight whilst still building a very large end result.
  • The APIs consider the entire OSM network when returning the fastest travel times. Calculating the optimal travel time is a very intensive optimization process, which is effectively ‘outsourced’ to the API. The API will return an answer even if the route travels outside the original country's borders, which would require a very large network to be held in memory locally if it were to be replicated.

The output of this process is an O-D matrix, populated with a travel time from every origin to every destination. This can be visualized to demonstrate Market Access from a given point.

Methodology: Market Potential

With the O-D matrix from the Market Access process calculated, it is possible to weight the destinations positively by their attractiveness, and negatively by their distance to ascertain a measure of market potential. For each origin, the sum of the distance-weighted attractiveness is calculated. This relationship is represented by the following expression:

$$\text{Market Potential}(O_m) = \sum_{n} \text{Attractiveness}(D_n) \times e^{-\lambda \, \rho_{O_m D_n}}$$

Where:

 $O_m$ = an origin, of which there are $m$

 $D_n$ = a destination, of which there are $n$

 $\lambda$ = distance decay parameter

 $\rho_{O_m D_n}$ = the distance between $O_m$ and $D_n$ (here, the travel time taken from the O-D matrix)

Where the origin is also a legitimate destination, it may be relevant to add the attractiveness of the origin as well, unmodified by distance:

$$\text{Market Potential}(O_m) = \sum_{n} \text{Attractiveness}(D_n) \times e^{-\lambda \, \rho_{O_m D_n}} + \text{Attractiveness}(O_m)$$
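The expression translates fairly directly into code. The following is a minimal sketch, assuming the O-D matrix is a list of rows (one per origin) of travel times in seconds; the function name `market_potential` and its arguments are hypothetical and do not come from the script.

```python
import math
from typing import List, Optional, Sequence


def market_potential(
    travel_times: Sequence[Sequence[float]],
    attractiveness: Sequence[float],
    lam: float,
    own_attractiveness: Optional[Sequence[float]] = None,
) -> List[float]:
    """Sum the distance-decayed attractiveness of all destinations, per origin."""
    results = []
    for i, row in enumerate(travel_times):
        # Each destination contributes attractiveness * e^(-lambda * travel time).
        total = sum(a * math.exp(-lam * t) for a, t in zip(attractiveness, row))
        if own_attractiveness is not None:
            # Optional second form: add the origin's own attractiveness, undecayed.
            total += own_attractiveness[i]
        results.append(total)
    return results


# One origin, two destinations, 10 and 20 minutes away (in seconds).
lam = math.log(2) / 600              # demand halves every 600 seconds
travel_times = [[600.0, 1200.0]]
attractiveness = [100.0, 200.0]

mp = market_potential(travel_times, attractiveness, lam)
mp_with_origin = market_potential(travel_times, attractiveness, lam, [50.0])
print(mp, mp_with_origin)
```

In this toy case the first destination is discounted by one half and the second by one quarter, so the two contribute 50 each.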

The two measures which require careful selection here are:

λ = Distance decay parameter – this parameter should be set relative to the type of good being assessed, and relative to transportation costs. When assessing the demand for high value goods, it is safe to assume that transporting them a long way makes economic sense, because the value of the good is high relative to the cost of transporting it. The effective market catchment should therefore not decay much over distance, and a low value of lambda should be set. By contrast, if the objective is to appraise the market potential for perishable or low value items, such as vegetables, a high value of lambda should be selected, to represent the idea that demand beyond a certain threshold is worth nothing to the firm once transportation costs are taken into account.

In many cases, there is no ‘right answer’ for the value of lambda. The code therefore runs multiple values of lambda, over a wide range of magnitudes, to ensure that several will be relevant. The function can be ‘calibrated’ by selecting lambda in the following fashion:

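If demand should halve every t minutes of travel time, lambda should be set to:

$$\lambda = \frac{\ln 2}{60 \times t}$$

The factor of 60 converts t from minutes to seconds, the unit in which the APIs return travel times.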

Selecting various values of ‘t’ sets various ‘interpretable’ values for lambda, which can then be used in the Market Potential code.
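A short check of this calibration, as a sketch: the function name `lambda_for_half_life` and the candidate values of t are illustrative assumptions, not the values used in the script.

```python
import math


def lambda_for_half_life(t_minutes: float) -> float:
    """Return lambda such that demand halves every `t_minutes` of travel."""
    return math.log(2) / (60 * t_minutes)


# Illustrative half-lives, in minutes.
half_lives = [5, 15, 30, 60, 120]
lambdas = {t: lambda_for_half_life(t) for t in half_lives}

for t, lam in lambdas.items():
    # Weight applied to a destination exactly t minutes (60 * t seconds) away.
    weight = math.exp(-lam * 60 * t)
    print(f"t = {t:>3} min  lambda = {lam:.6f}  weight at t = {weight:.2f}")
```

Each printed weight is 0.50 by construction, and a longer half-life gives a smaller lambda, that is, a slower decay.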

Attractiveness – As with lambda, setting the correct attractiveness parameter requires consideration of the question being asked and of the good concerned. The characteristics that determine a place’s market attractiveness will vary relative to the good being sold. Taking the example above, if we are considering a low value product such as vegetables, population may be more relevant than income in determining a destination’s attractiveness; conversely, a measure of income such as GDP per capita may be more relevant to the exporters of a high value product such as wine or jewelry.

Data Input Requirements

The scripts expect a comma separated values (.csv) file, where each origin or destination is a row. The file should contain columns with, at a minimum, the following four pieces of information:

  1. Latitude of the points.
  2. Longitude of the points.
  3. The unique identifier for the point. This will help to map the final results back to the original input data.
  4. The attractiveness of the origin / destination. This needs to be a numerical value rather than a string or categorical variable.
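For illustration, a file meeting these requirements might look like the sample below, read here with Python's standard `csv` module. The column names (`ID`, `Lat`, `Lon`, `Pop`), the sample values and the helper `read_points` are all hypothetical; the script only requires that the four pieces of information are present under some column name.

```python
import csv
import io

# Hypothetical input: one row per point, population used as attractiveness.
SAMPLE_CSV = """ID,Lat,Lon,Pop
A,37.78,-122.42,1200
B,37.91,-122.45,800
C,37.73,-122.48,1500
"""


def read_points(handle, lat_col, lon_col, id_col, attr_col):
    """Read points from an open .csv handle into a list of dictionaries."""
    points = []
    for row in csv.DictReader(handle):
        points.append(
            {
                "id": row[id_col],
                "lat": float(row[lat_col]),
                "lon": float(row[lon_col]),
                # float() raises ValueError on a non-numerical attractiveness.
                "attractiveness": float(row[attr_col]),
            }
        )
    return points


points = read_points(io.StringIO(SAMPLE_CSV), "Lat", "Lon", "ID", "Pop")
print(points[0])
```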

There is no reason why the origins and destinations need to be the same (although they may be, for example when assessing accessibility between villages). Where origins and destinations are distinct, they must be placed into separate files in the same folder. The column names in the destination file must be identical to those in the file describing the origins.

There is currently no size limit on either origins or destinations – the limitations are just 1.) time and 2.) RAM / hard drive size for the computer performing the download. The script, when set to the OSRM or MapBox servers, downloads O-D pairs at a rate of 500,000 per hour.

Using the Code: Code Syntax and Features

One of the most useful features of the code is that it can be restarted in the event of a crash, for whatever reason (e.g. loss of internet connection, keyboard interruption). The code saves its progress on the O-D matrix to a temporary .csv file in the input folder every time it finishes a block of 12 sources. Upon restarting the process, it creates a new temporary file. At the end of the download process, it stitches together all temporary files into a single file, and uses this to conduct the market analysis. This ‘process handling’ is implemented through the use of the -R and -Z flags as described below.
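The save-and-stitch idea can be sketched as follows. This is an illustration only: the file naming scheme, the long-format row layout (origin, destination, seconds) and the helper names `append_block` and `stitch` are assumptions, not the script's actual conventions.

```python
import csv
import glob
import os
import tempfile


def append_block(folder, restart_no, rows):
    """Append one finished block of rows to this run's temporary file."""
    path = os.path.join(folder, f"temp_{restart_no:03d}.csv")
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def stitch(folder, out_name="od_matrix.csv"):
    """Concatenate all temporary files, in restart order, into one file."""
    out_path = os.path.join(folder, out_name)
    parts = sorted(glob.glob(os.path.join(folder, "temp_*.csv")))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for part in parts:
            with open(part, newline="") as f:
                writer.writerows(csv.reader(f))
    return out_path


with tempfile.TemporaryDirectory() as folder:
    # First run: two blocks are saved before a simulated crash.
    append_block(folder, 0, [["A", "X", 310], ["A", "Y", 95]])
    append_block(folder, 0, [["B", "X", 120], ["B", "Y", 400]])
    # After the restart, a new temporary file is started.
    append_block(folder, 1, [["C", "X", 50], ["C", "Y", 75]])
    with open(stitch(folder), newline="") as f:
        stitched_rows = list(csv.reader(f))

print(len(stitched_rows), "rows recovered")
```

Because each block is written as soon as it is complete, a crash costs at most the block in progress.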

The code makes use of Python’s ‘argv’ functionality. This is the ability to add various parameters and define key variables at runtime, from the command shell, rather than having them defined in the script. This gives the ability to ‘direct’ the algorithm at various input folder paths, for example. The full list of flags is described below:

Helper flags

-h – This displays a help message, with helpful prompts for each of the other argv flags.

Required flags

-p – The folder in which all input files are contained.

-f – This is the file that describes the origins at a minimum, and also potentially the destinations (see the Data Input Requirements section).

-m – Column name containing the latitude of the points

-n – Column name containing the longitude of the points

-o – Unique identifier column name

-q – Column containing a numerical scalar for the origin / destination attractiveness.

-c – Flag defining which server to call for information: MB for Mapbox, MBT for Mapbox Traffic, and OSRM for OSRM.

Optional flags

The following flags are optional, and do not need to be included for the script to run:

-W – File name of the destinations .csv, where they are separate to the origins

-R – In the event of a process recovery, specify the last known save point number here.

-Z – Counter for the number of times the process has been restarted. Start from 1 for the first restart (2 for the second, etc.).

-l – Limit mode. This restricts the length of the origin and destination lists to the given integer value. This is very useful for ‘trial running’ the process to ensure that the outputs are the correct shape before downloading the data for the entire matrix.
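To show how the flags fit together, the sketch below declares them with Python's standard `argparse` module and parses a sample command line for a trial run. It is not the script's own parser: `argparse` is used here in place of raw ‘argv’ handling, and the destination names, the integer types and the sample values (`./inputs`, `origins.csv`, `Lat`, `Lon`, `ID`, `Pop`) are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented flags (-h is added automatically by argparse)."""
    p = argparse.ArgumentParser(description="Market access / market potential")
    # Required flags
    p.add_argument("-p", dest="path", required=True, help="input folder")
    p.add_argument("-f", dest="origins_file", required=True, help="origins .csv")
    p.add_argument("-m", dest="lat_col", required=True, help="latitude column")
    p.add_argument("-n", dest="lon_col", required=True, help="longitude column")
    p.add_argument("-o", dest="id_col", required=True, help="unique ID column")
    p.add_argument("-q", dest="attr_col", required=True, help="attractiveness column")
    p.add_argument("-c", dest="server", required=True,
                   choices=["MB", "MBT", "OSRM"], help="server to call")
    # Optional flags
    p.add_argument("-W", dest="dest_file", default=None, help="destinations .csv")
    p.add_argument("-R", dest="save_point", type=int, default=None,
                   help="last known save point number")
    p.add_argument("-Z", dest="restart_count", type=int, default=None,
                   help="number of restarts so far")
    p.add_argument("-l", dest="limit", type=int, default=None,
                   help="limit on origin / destination list length")
    return p


# A trial run against OSRM, limited to the first 20 origins and destinations.
sample_argv = ["-p", "./inputs", "-f", "origins.csv", "-m", "Lat", "-n", "Lon",
               "-o", "ID", "-q", "Pop", "-c", "OSRM", "-l", "20"]
args = build_parser().parse_args(sample_argv)
print(args.server, args.limit, args.dest_file)
```

Omitting -W leaves the destinations file unset, which corresponds to the case where origins and destinations share one file.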