first commit
ga4gh client (based off david4096/client) and directory structure
david4096 committed Feb 25, 2016
1 parent b700236 commit 3208318
Showing 19 changed files with 6,563 additions and 1 deletion.
2 changes: 2 additions & 0 deletions .gitignore
@@ -6,6 +6,8 @@ __pycache__/
# C extensions
*.so

.idea

# Distribution / packaging
.Python
env/
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Apache License
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

80 changes: 80 additions & 0 deletions README.md
@@ -0,0 +1,80 @@
# Biomedicine API Examples

## Introduction
Data sharing efforts and readily available computing resources are making bioinformatics over the Web possible. In the past, siloed data stores and obscure file formats made it difficult to synthesize and reproduce results across institutions. Here we present two biomedicine APIs, both currently under development, and provide example usage. Some familiarity with Python is expected.

*Get started!*

```
pip install -r requirements.txt
python hello_ga4gh.py
```

## ExAC
> Building on the existing ExAC application we opened up direct data access through straight forward web services. These services enable a user to integrate ExAC services into their own tools, querying the variant information and returning the data in an easy to programmatically use JSON format.
https://github.com/hms-dbmi/exac_browser


> The Exome Aggregation Consortium (ExAC) is a coalition of investigators seeking to aggregate and harmonize exome sequencing data from a wide variety of large-scale sequencing projects, and to make summary data available for the wider scientific community.
> The data set provided on this website spans 60706 unrelated individuals sequenced as part of various disease-specific and population genetic studies.
http://exac.broadinstitute.org

The REST API for ExAC has been developed as part of Harvard’s Patient-centered Information Commons: Standardized Unification of Research Elements (PIC-SURE http://www.pic-sure.org/software).

## GA4GH

[GA4GH](https://genomicsandhealth.org) aims to standardize how bioinformatics data are shared over the web. A reference server with a subset of publicly available test data from the 1000 Genomes Project has been made available for these examples.

The GA4GH reference server hosts bioinformatics data using an HTTP API. These data are backed by BAM and VCF files. These examples only access an existing GA4GH server, but the server is open source, so you can run your own instance using [these instructions](http://ga4gh-reference-implementation.readthedocs.org/en/latest/demo.html).

## What is an HTTP API?

HTTP APIs allow web browsers and command line clients to use the same communication layer to transmit data to a server. A client can `GET` a resource from a server, `POST` a resource to a server, or `DELETE` a resource, among other operations.
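
As a rough illustration of these verbs, the sketch below builds (but never sends) one request per method using Python 3's standard library. The URL and the JSON payload are placeholders chosen for this example, not real endpoints of either API.

```python
import json
from urllib.request import Request

# Hypothetical endpoint, used only for illustration.
URL = "http://example.org/api/variants"


def build_requests(url=URL):
    """Build one request per HTTP verb without contacting a server."""
    # No body and no explicit method: urllib treats this as a GET.
    get_req = Request(url)

    # Supplying a body makes urllib default to POST.
    body = json.dumps({"referenceName": "1"}).encode("utf-8")
    post_req = Request(url, data=body,
                       headers={"Content-Type": "application/json"})

    # Other verbs are named explicitly.
    delete_req = Request(url, method="DELETE")
    return get_req, post_req, delete_req


if __name__ == "__main__":
    for req in build_requests():
        print(req.get_method(), req.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would actually send it; the example stops short of that so it can run offline.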

The documents that servers and clients pass back and forth are often in JavaScript Object Notation (JSON), which can flexibly describe complex data structures. For example, a variant in GA4GH is returned as a document with the form:

    {
      "alternateBases": ["T"],
      "calls": [],
      "created": 1455236057000,
      "end": 4530,
      "id": "YnJjYTE6MWtnUGhhc2UzOnJlZl9icmNhMTo0NTI5OjllNjRkMDIzOTc5NzQ3M2MyNjk2NzFiNzczMjg1MWNj",
      "info": {},
      "referenceBases": "C",
      "referenceName": "ref_brca1",
      "start": 4529,
      "updated": 1455236057000,
      "variantSetId": "YnJjYTE6MWtnUGhhc2Uz"
    }

JSON objects use strings as keys. The values can be strings, numbers, booleans, arrays, or nested objects (maps), so complex structures can be built up from simple parts.
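
A minimal sketch of reading such a document in Python 3 follows. The JSON text is a trimmed copy of the variant above, and `describe_variant` is a hypothetical helper written for this example rather than part of any client.

```python
import json

# Trimmed copy of the variant document shown above.
VARIANT_JSON = """
{
  "alternateBases": ["T"],
  "calls": [],
  "end": 4530,
  "info": {},
  "referenceBases": "C",
  "referenceName": "ref_brca1",
  "start": 4529
}
"""


def describe_variant(text):
    """Parse a variant document and summarize it on one line."""
    variant = json.loads(text)  # JSON object -> Python dict
    length = variant["end"] - variant["start"]
    alts = ",".join(variant["alternateBases"])  # JSON array -> Python list
    return "{}:{} {}>{} (length {})".format(
        variant["referenceName"], variant["start"],
        variant["referenceBases"], alts, length)


if __name__ == "__main__":
    print(describe_variant(VARIANT_JSON))
    # prints: ref_brca1:4529 C>T (length 1)
```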

## Examples

Each example includes inline comments that explain which requests it sends to a server and how the script manipulates the returned data.

### hello_ga4gh.py

Access a GA4GH reference server hosting bioinformatics data and see the basics of building a query.

### hello_exac.py

Access an API hosting population genomics data and a query service for finding variants in a gene.
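
As a sketch of what such a query looks like, the snippet below assembles the URL for the `variants_in_gene` service. The base URL, the `awesome` path, and the parameter names are taken from `combine_apis.py` in this commit; the helper name `variants_in_gene_url` is an assumption made for this example, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL as used in combine_apis.py.
EXAC_BASE_URL = "http://exac.hms.harvard.edu/rest/"


def variants_in_gene_url(gene_name, base_url=EXAC_BASE_URL):
    """Return the query URL for variants in a gene (hypothetical helper)."""
    # urlencode escapes the values, so unusual gene names stay valid.
    query = urlencode({"query": gene_name, "service": "variants_in_gene"})
    return base_url + "awesome?" + query


if __name__ == "__main__":
    print(variants_in_gene_url("OR4F5"))
```

Letting `urlencode` build the query string avoids the hand-concatenated `"?query=" + ... + "&service="` pattern, at the cost of one extra import.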

### hello_ga4gh_client.py

Access a GA4GH reference server using a (provided) client, making some operations easier.

### visualize_ga4gh.py

Get data from a remote web service and visualize it using matplotlib.

### combine_apis.py

Use data from two web services to produce synthetic results.

### simple_service.py

Make the results of combining two APIs available as its own web service.

7 changes: 7 additions & 0 deletions client_dev.py
@@ -0,0 +1,7 @@
"""
Simple shim for running the client program during development.
"""
import ga4gh.cli

if __name__ == "__main__":
    ga4gh.cli.client_main()
65 changes: 65 additions & 0 deletions combine_apis.py
@@ -0,0 +1,65 @@
"""
combine_apis.py
An example of combining the results of interacting with
both the ExAC and GA4GH APIs.
"""

# We'll need both the requests module and the GA4GH client.

import ga4gh.client as client
import requests

EXAC_BASE_URL = "http://exac.hms.harvard.edu/rest/"
GA4GH_BASE_URL = "http://ga4gh-a1.westus.cloudapp.azure.com/ga4gh-example-data/"

def main():
    # Let's instantiate the GA4GH client first
    c = client.HttpClient(GA4GH_BASE_URL)

    # Since we've done it before, getting variants can be done
    # in a one-liner. We're picking up the first variant set
    # for the first dataset returned. The built-in next() is used
    # so the iterators work the same way on Python 2 and 3.

    ga4gh_variants = [v for v in c.searchVariants(
        next(c.searchVariantSets(next(c.searchDatasets()).id)).id,
        start=0,
        end=2**32,
        referenceName="1")]

    print(str(len(ga4gh_variants)) + " GA4GH variants.")

    # Now we'll access the ExAC API in search of variants on
    # the OR4F5 gene. See `hello_exac.py`

    GENE_NAME = "OR4F5"

    response = requests.get(
        EXAC_BASE_URL + "awesome?query=" + GENE_NAME + "&service=variants_in_gene")

    OR4F5_variants = response.json()

    print(str(len(OR4F5_variants)) + " ExAC variants.")

    # Let's find out if we have any matches on position.

    matches = []

    for OR4F5_variant in OR4F5_variants:
        for ga4gh_variant in ga4gh_variants:
            # Note that GA4GH positions are 0-based so we add
            # 1 to line it up with ExAC.
            if (ga4gh_variant.start + 1) == OR4F5_variant['pos']:
                print(OR4F5_variant['pos'])
                print(ga4gh_variant.start)
                matches.append((ga4gh_variant, OR4F5_variant))

    print("Found " + str(len(matches)) + " matches.")

    for match in matches:
        print(match[0].names)
        print(match[1]['rsid'])
        print(match[0].referenceBases, match[1]['ref'])
        print(match[0].alternateBases, match[1]['alt'])

if __name__ == "__main__":
    main()
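
The nested loop in `combine_apis.py` compares every ExAC variant with every GA4GH variant. One possible alternative, sketched below for Python 3, indexes the ExAC variants by position first so each GA4GH variant needs only a dictionary lookup. The function name `match_by_position` and the sample values are made up for illustration; only the `pos` key and the 0-based to 1-based shift come from the script above.

```python
from collections import defaultdict


def match_by_position(ga4gh_starts, exac_variants):
    """Pair 0-based GA4GH start positions with 1-based ExAC variants."""
    # Index the ExAC variants by their 1-based position.
    by_pos = defaultdict(list)
    for variant in exac_variants:
        by_pos[variant["pos"]].append(variant)

    matches = []
    for start in ga4gh_starts:
        # GA4GH positions are 0-based, so add 1 to line up with ExAC.
        for exac_variant in by_pos.get(start + 1, []):
            matches.append((start, exac_variant))
    return matches


if __name__ == "__main__":
    # Made-up sample data, not real API output.
    exac = [{"pos": 100, "rsid": "rs_example"}, {"pos": 150, "rsid": None}]
    print(match_by_position([99, 200], exac))
```

This takes time roughly proportional to the total number of variants rather than to the product of the two list lengths, which may matter for larger genes or regions.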
34 changes: 34 additions & 0 deletions ga4gh/README.txt
@@ -0,0 +1,34 @@

.. image:: http://genomicsandhealth.org/files/logo_ga.png

==============================
GA4GH Reference Implementation
==============================

.. image:: https://badges.gitter.im/Join%20Chat.svg
   :alt: Join the chat at https://gitter.im/ga4gh/server
   :target: https://gitter.im/ga4gh/server?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge

This is the development version of the GA4GH reference implementation.
If you would like to install the stable version of the server, please
see the instructions on `the PyPI page <https://pypi.python.org/pypi/ga4gh>`_.

The server is currently under heavy development, and many aspects of
the layout and APIs will change as requirements are better understood.
If you would like to help, please check out our list of
`issues <https://github.com/ga4gh/server/issues>`_!

The latest bleeding-edge documentation is available at `read-the-docs.org
<http://ga4gh-reference-implementation.readthedocs.org/en/latest>`_.

- For a quick start with the GA4GH API, please see our
  `demo <http://ga4gh-reference-implementation.readthedocs.org/en/latest/demo.html>`_.
- To configure and deploy the GA4GH server in production
  please see the
  `installation
  <http://ga4gh-reference-implementation.readthedocs.org/en/latest/installation.html>`_
  page.
- If you would like to contribute to the project, please see the
  `development
  <http://ga4gh-reference-implementation.readthedocs.org/en/latest/development.html>`_
  page.
10 changes: 10 additions & 0 deletions ga4gh/__init__.py
@@ -0,0 +1,10 @@
"""
Reference implementation of the GA4GH APIs.
"""

__version__ = "undefined"
try:
    from . import _version
    __version__ = _version.version
except ImportError:
    pass