Mangrove Documentation Release 0.1 Jeff Wishnie December 15, 2011


CONTENTS

1 Project Organization
  1.1 Project Governance
  1.2 Faces of Mangrove
  1.3 Developer Practices
  1.4 Setting Up the development environment

2 Design and Technical Documents
  2.1 Mangrove Tutorial
  2.2 APIs
  2.3 Datastore document structure
  2.4 System Concepts & Terminology
  2.5 API SPEC EXAMPLES
  2.6 Data Dictionary Expected API
  2.7 Querying couch by hierarchy and time
  2.8 Setting up the ‘DataWinners’ Web App
  2.9 Setting Up the POSTGIST and importing location data

3 Indices and tables


Mangrove is an open source platform for exploring data with location and time information.

Project Home http://mangroveorg.github.com
Email http://groups.google.com/group/mangroveorg
IRC chat #mangrove on freenode.net
Source https://github.com/mangroveorg/mangrove
Jenkins CI http://178.79.163.33:8080/


CHAPTER

ONE

PROJECT ORGANIZATION

1.1 Project Governance

[coming soon!]

1.2 Faces of Mangrove

1.2.1 Kick-off team as of Feb-2011

Akshay Naval

organization ThoughtWorks

location Pune, India

Alan Viars

organization Videntity

location Baltimore, MD, US

Alex Dorey

organization Columbia University

location New York, US

Andrew Marder

organization Earth Institute/Columbia University

location New York, US

Aroj George

organization ThoughtWorks

location Pune, India


Asif Momin

organization ThoughtWorks

location Pune, India

David McAfee

organization HNI

location Antananarivo, Madagascar

Diptanu Choudhury

organization ThoughtWorks

location Bangalore, India

Jeff Wishnie

organization ThoughtWorks

location Portland, OR, US

Kedar Bapat

organization ThoughtWorks

location Pune, India

Kevin Samuel

organization

location Nice, France

Mamy Dafy

organization HNI

location Antananarivo, Madagascar

Matt Berg

organization Columbia University

location New York, US

Shweta Shetty

organization ThoughtWorks


location Pune, India

Ravi Kumar

organization ThoughtWorks

location Pune, India

Simon de Haan

organization Praekelt Foundation

location Cape Town, South Africa

Photos hosted here on Flickr

1.3 Developer Practices

• We are using Git as the Source Control Manager (SCM)

• We are using GitFlow for better version control and branching

• Our documentation style is reStructuredText (RST)

– We can try out RST text with the online tool reSTrenderer

• Our python coding style guide is PEP8

– 4 spaces per indentation level

– Soft tabs (indentation is with spaces only)

• We have a continuous integration server set up using Jenkins. It can be viewed at http://178.79.163.33:8080/

• We have detailed test reports and code coverage for every build

• We are using nose to write unit tests. Please maintain the unit test suite for all code you check in, and make sure that the test coverage for that code is high :)

• Our functional tests are written in WebDriver (Selenium 2.0b2)

• We are using fabric for automatic deployment

• We use virtualenv and pip to set up our python environment

1.3.1 Other important links

• Our transport layer is managed by VUMI

• Django 1.3 is our web framework


1.4 Setting Up the development environment

To set up your machine so that you can start contributing to Mangrove, this is what you need to do:

• Install chef client

• Clone the chef repository from mangrove

• Run chef solo

1.4.1 Details

The steps above are detailed below for Ubuntu 10.10. For any other OS, please look through the chef documentation given here

• Install ruby:

$ sudo apt-get install ruby-full

• Install rubygems:

$ cd /tmp
$ wget http://rubyforge.org/frs/download.php/70696/rubygems-1.3.7.tgz
$ tar zxf rubygems-1.3.7.tgz
$ cd rubygems-1.3.7
$ sudo ruby setup.rb

You can verify your installation was successful with

$ gem -v
1.3.7

• Install chef client:

$ sudo gem install chef

You can verify your installation was successful with

$ chef-client -v
Chef: 0.9.0

• Install git:

$ sudo apt-get install git

• Clone the chef repository:

$ git clone git://github.com/mangroveorg/chef-repo.git

• Make sure your system is updated and upgraded before you run the chef script:

$ sudo apt-get update
$ sudo apt-get upgrade

• Create a user mangrover and give it sudo rights:

$ sudo useradd mangrover
$ sudo passwd mangrover
$ sudo usermod -aG sudo mangrover


• Run chef solo (as mangrover):

$ cd chef-repo
$ sudo chef-solo -c chef-solo/solo.rb -j chef-solo/node.json


CHAPTER

TWO

DESIGN AND TECHNICAL DOCUMENTS

2.1 Mangrove Tutorial

2.1.1 Introduction

The following are the main concepts in Mangrove.

2.1.2 Entity Type:

create entity type:

# An entity type is a hierarchy, e.g. "Education", "School".
entity_type = ["HealthFacility", "Clinic"]
define_type(self.dbm, entity_type)

2.1.3 Entity:

create entity

# An entity type is a hierarchy, e.g. "Education", "School".
entity_type = ["HealthFacility", "Clinic"]
create_entity(self.dbm, entity_type=entity_type, short_code="1")

2.1.4 Data Record:

get datarecord:

DataRecord.get(self.dbm,data_record_id)

2.1.5 Form Model:

Create a Form:


default_ddtype = DataDictType(self.dbm, name='Default String Datadict Type',
                              slug='string_default', primitive_type='string')
default_ddtype.save()

question1 = TextField(name="Q1", code="ID", label="What is the reporter ID?",
                      language="eng", entity_question_flag=True, ddtype=default_ddtype)
question2 = TextField(name="Q2", code="DATE", label="What month and year are you reporting for?",
                      language="eng", entity_question_flag=False, ddtype=default_ddtype)
question3 = TextField(name="Q3", code="NETS", label="How many mosquito nets did you distribute?",
                      language="eng", entity_question_flag=False, ddtype=default_ddtype)

form_model = FormModel(dbm, entity_type=["Reporter"],
                       name="Mosquito Net Distribution Survey",
                       label="Mosquito Net Distribution Survey",
                       form_code="MNET", type='survey',
                       fields=[question1, question2, question3])
form_model.save()

2.1.6 Data Submission:

Submit data to the form directly

values = {"ID": "rep45", "DATE": "10.2010", "NETS": "50"}
form = get_form_model_by_code(dbm, "MNET")
form_submission = form.submit(dbm, values, submission_id)

Submit data to the player

text = "MNET .ID rep45 .DATE 10.2010 .NETS 50"
transport_info = TransportInfo(transport="sms", source="9923712345", destination="5678")
sms_player = SMSPlayer(dbm)
response = sms_player.accept(Request(transportInfo=transport_info, message=text))

The player will also log the submission for you in Mangrove.
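The SMS text in the player example appears to follow a simple convention: the form code comes first, followed by `.CODE value` pairs. As a rough illustration only, the sketch below splits such a message into a form code and a values dictionary. `parse_sms` is a hypothetical helper written for this example; it is not part of the Mangrove API, and the real SMSPlayer may tokenize messages differently.

```python
def parse_sms(text):
    """Split 'FORM .CODE value ...' into (form_code, {CODE: value}).

    Hypothetical helper for illustration; not Mangrove's actual parser.
    """
    tokens = text.split()
    if not tokens:
        raise ValueError("empty message")
    form_code, values, current = tokens[0], {}, None
    for token in tokens[1:]:
        if token.startswith(".") and len(token) > 1:
            # A token such as ".ID" starts a new field.
            current = token[1:]
            values[current] = ""
        elif current is not None:
            # Any other token is appended to the current field's value.
            values[current] = (values[current] + " " + token).strip()
    return form_code, values


form_code, values = parse_sms("MNET .ID rep45 .DATE 10.2010 .NETS 50")
print(form_code, values)
```

The resulting dictionary has the same shape as the `values` argument passed to `form.submit` in the direct submission example.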

Load all submissions for the form:

get_submissions_made_for_form()

2.1.7 Aggregation:

Monthly Aggregate on all data records for a field per entity for the form code

values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Month(2, 2010))


Returns one row per entity, with the aggregated values for each field.

{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}

Weekly Aggregate on all data records for a field per entity for the form code

values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Week(52, 2009))

52 is the week number and 2009 is the year.
Returns one row per entity, with the aggregated values for each field.

{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}

Yearly Aggregate on all data records for a field per entity for the form code

values = aggregate_for_time_period(self.manager, form_code='CL1',
                                   aggregates=[Sum("patients"), Min('meds'), Max('beds'), Latest("director")],
                                   period=Year(2010))

2010 is the year.
Returns one row per entity, with the aggregated values for each field.

{"<entity_id>": {"patients": 10, 'meds': 20, 'beds': 300, 'director': "Dr. A"}}
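Each period argument in the aggregation examples (`Month(2, 2010)`, `Week(52, 2009)`, `Year(2010)`) stands for a date range. The sketch below shows how a month and a week could be turned into start and end dates using only the standard library. It assumes ISO 8601 week numbering, which the source does not confirm, and the helper names `month_range` and `iso_week_range` are invented for this example. `date.fromisocalendar` requires Python 3.8 or later.

```python
import calendar
from datetime import date, timedelta


def month_range(month, year):
    """Return (first_day, last_day) of the given calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def iso_week_range(week, year):
    """Return (monday, sunday) of the given week, assuming ISO 8601 weeks."""
    monday = date.fromisocalendar(year, week, 1)
    return monday, monday + timedelta(days=6)


print(month_range(2, 2010))
print(iso_week_range(52, 2009))
```

Records whose event time falls inside the returned range would then be the ones fed to the aggregate functions.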

2.2 APIs

To get a quick idea of the current state of the mangrove.datastore API, we have included the API for all the files in mangrove/datastore/.

Here’s what’s in mangrove.datastore:


2.2.1 aggregationtree

2.2.2 config

2.2.3 database

2.2.4 datadict

2.2.5 data

2.2.6 datarecord

2.2.7 documents

2.2.8 entity

2.2.9 initializer

This module tries to import validate.py, but I can’t get it to work on my machine.

2.2.10 reporter

This module tries to import validate.py, but I can’t get it to work on my machine.

2.2.11 settings

2.3 Datastore document structure

Entities:

Reporter

{
    "_id": "a676766fe45440f48ff4e9a0ce58b329",
    "_rev": "1-f83b88a382c14b5ade660710adde0d9e",
    "name": "reporter1",
    "entity_type": "reporter",
    "last_updated_on": null,
    "created_on": "2011-03-24T07:32:15Z",
    "aggregation_trees": {
        "org_chart": ["Country Manager", "Field Manager", "Field Agent"]
    },
    "attributes": {
        "age": 25,
        "entity_type": "reporter"
    },
    "document_type": "Entity"
}


Clinic

{
    "_id": "961cefb2a0324878bed06ab736e5dc09",
    "_rev": "1-c4d02559a13e7698437aae58f35e8440",
    "name": "Clinic 1",
    "entity_type": "clinic",
    "last_updated_on": null,
    "created_on": "2011-03-24T07:32:15Z",
    "aggregation_trees": {
        "location": ["India", "Maharashtra", "Pune"]
    },
    "attributes": {
        "entity_type": "clinic"
    },
    "document_type": "Entity"
}

Data Records:

Data Record

{
    "_id": "e4d5cb3e76ca40a78088c7bfe5d0cf03",
    "_rev": "1-7ca2ee8ad10a444eb6f3a8bad19ff957",
    "reporter_backing_field": {
        "_id": "a676766fe45440f48ff4e9a0ce58b329",
        "name": "reporter1",
        "entity_type": "reporter",
        "_rev": "1-f83b88a382c14b5ade660710adde0d9e",
        "last_updated_on": null,
        "created_on": "2011-03-24T07:32:15Z",
        "aggregation_trees": {
            "org_chart": ["Country Manager", "Field Manager", "Field Agent"]
        },
        "attributes": {
            "age": 25,
            "entity_type": "reporter"
        },
        "document_type": "Entity"
    },
    "last_updated_on": null,
    "source": {
        "report": "hn1.2424",
        "phone": "1234"
    },
    "created_on": "2011-03-24T07:32:15Z",
    "attributes": {
        "beds": "100",
        "event_time": "2011-02-01 00:00:00",
        "arv": "200"
    },
    "document_type": "DataRecord",
    "entity_backing_field": {
        "_id": "880552a483594ca9af07508e379f4520",
        "name": "Clinic 2",
        "entity_type": "clinic",
        "_rev": "1-99c4e6ebd76bba417dcd034f935d7483",
        "last_updated_on": null,
        "created_on": "2011-03-24T07:32:15Z",
        "aggregation_trees": {
            "location": ["India", "Karnataka", "Bangalore"]
        },
        "attributes": {
            "entity_type": "clinic"
        },
        "document_type": "Entity"
    }
}

2.4 System Concepts & Terminology

2.4.1 System Overview

2.4.2 Terminology

• SMS - Short Message Service; a text message of up to 160 characters sent between mobile phones

• ICT - Information & Communication Technology

• ICT4D - ICT for Development

• M4D - Mobile for Development

• ARV - Anti-retroviral medication (against HIV)

• IVR - Interactive Voice Response

• HCN - Host Country National

• USSD - Unstructured Supplementary Service Data

• MDG - Millennium Development Goals

• Indicator - a field that is computed, or used in the visualization of associated data

• M&E - Monitoring and Evaluation

• MNO - Mobile Network Operators

• WASP - Wireless Access Service Provider

• SMSC - Short Message Service Center (component that sends the SMS messages to the recipients)

• MSL - Master Story List


2.4.3 Core Datastore Overview

Introduction

The high-level goal of the Mangrove datastore is to allow the free submission of data about a known set of entities and the quick and easy retrieval of data aggregated across time and hierarchy without requiring any upfront definition of schemas or entity structure.

The key goals can be summarized as:

• Support Schema-less submission of arbitrary data

This is motivated by expected usage patterns where an organization will frequently modify the data collected based on actual usage. By not requiring any a-priori definition of data sets, users are given full flexibility to adjust the data collected on the fly.

For example, a health NGO operating rural clinics might begin by simply collecting a monthly report of how many patients were seen in that month. As they get more sophisticated they may start collecting separate values for men, women, girls and boys. This transition should not require any datastore restructuring.

• Support aggregation of data across time and hierarchy (geographic as special case)

Time-based aggregations include queries such as “Average number of patients seen in 2011” or more complex segmented time aggregations such as “Average number of patients seen each month in 2011”.

The key hierarchical aggregation is by geographic administrative boundaries. For example: “Total number of patients seen in 2011 for all clinics in San Francisco (or California or United States)”.

Non-geographic arbitrary aggregation trees are supported as well. For example, aggregation by organization chart: “Patients seen at clinics managed by the Child Protection group”.

• Provide data consistency on a field level via ‘Data Dictionary’

To make it easy for users to aggregate data collected for a given entity via unstructured data submissions, the core datastore will include a ‘Data Dictionary’ where semantic types are defined and stored. These types are then applied to submitted data fields, allowing aggregation across different submissions and encouraging data consistency.

For example, our health NGO now wishes to collect data on each patient who receives an HIV test, so they submit data for each patient test in the form (name, age-in-years, test-administered).

Later they start recording patients who receive family-planning counseling and collect: (name, age-in-years, counseling-program-attended).

When they want to get the average age of patients who received HIV tests or family-planning counseling, the system can aggregate values of ‘age-in-years’ from both submissions even though the structure of each submission is different.

And later, when they want to start registering infants seen, they can define a more useful ‘Age in Months’ field (with values ranging from 0-60) and still run aggregations of the form “Average age of patients seen” by multiplying any aggregated “Age in Years” values by 12 before averaging with “Age in Months” fields.

• Provide simple Python and RESTful APIs for accessing data and standard aggregation queries

The datastore is agnostic as to both the sources and consumers of data. These APIs will allow data sources ranging from SMS engines to XForms clients and Web applications to submit data.

On the visualization and reporting side, charting, plotting, graphing, and geographic visualization clients may access data series suitable for visualization, pre-aggregated across time and hierarchy.
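The ‘Age in Months’ example under the Data Dictionary goal relies on converting values to a common unit before averaging. The following is a minimal sketch of that arithmetic only; the `(amount, unit)` representation and the function name are assumptions made for illustration and do not reflect how Mangrove stores typed values.

```python
def average_age_in_months(values):
    """Average ages given as (amount, unit) pairs, unit 'years' or 'months'.

    Illustrative only: every value is converted to months before averaging.
    """
    months_per_unit = {"years": 12, "months": 1}
    in_months = [amount * months_per_unit[unit] for amount, unit in values]
    return sum(in_months) / len(in_months)


# Two submissions typed 'Age in Years' and one typed 'Age in Months'.
ages = [(2, "years"), (30, "months"), (1, "years")]
print(average_age_in_months(ages))
```

In a real deployment the conversion factor would presumably come from the data dictionary type definitions rather than a hard-coded table.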


Core Structures

The logical architecture as envisioned has very few structures:

• Entity

An ‘entity’ is anything that users may want to report on, for example a patient, a clinic, a waterpoint, etc. Entities are typed (e.g. ‘Clinic’, ‘Waterpoint’) and uniquely identified.

Entities contain no data beyond UID and TYPE

Entities must be registered in the system before data can be collected on them. Registration is nothing more than the process of assigning a UID to the entity and does not have to be a distinct user action: the datastore can register an entity as part of the process of recording the first submission of data on the entity.

• Data Record

Every time data is submitted to the datastore it is saved as an independent time-stamped data record.

Each data record is associated with a single Entity. The set of data records for a given Entity comprises all the values/data known about that Entity.

For example, if a user submits a report that 10 patients were seen in May at Clinic1, and another user submits a report that Clinic1 had a stock of 20 bednets in May, the set of information known about Clinic1 is that in May 10 patients were seen and 20 bednets were in stock.

• Fields and Values

Each data record contains an arbitrary set of field/value tuples with fields optionally typed from the Data Dictionary.

• Data Dictionary Types

These are definitions of types which can be associated with fields in a data record. Defined types may contain the following:

– Type name

– Base type (numeric, string, choice, geocode etc...)

– User readable description

– Validation constraints
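To make the core structures concrete, here is a small sketch of how Entity and Data Record could be modelled in plain Python, using the Clinic1 example from the Data Record description. The class and field names (`Entity`, `DataRecord`, `known_values`, `submitted_at`) are illustrative assumptions; they are not the classes in mangrove.datastore.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class Entity:
    """An entity carries nothing beyond a UID and a type."""
    uid: str
    entity_type: str


@dataclass
class DataRecord:
    """One time-stamped submission about a single entity."""
    entity: Entity
    submitted_at: datetime
    values: dict = field(default_factory=dict)


def known_values(records, entity):
    """Combine the field/value pairs of every record for `entity`."""
    state = {}
    for record in records:
        if record.entity == entity:
            state.update(record.values)
    return state


clinic1 = Entity(uid="Clinic1", entity_type="Clinic")
records = [
    DataRecord(clinic1, datetime(2011, 5, 10), {"patients_seen": 10}),
    DataRecord(clinic1, datetime(2011, 5, 20), {"bednets_in_stock": 20}),
]
print(known_values(records, clinic1))
```

Because the entity itself holds no data, everything known about Clinic1 is derived from its records.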

Questions we want to ask the Data Store

Rather than set out specific technical proposals, or get caught in the argument over what should be done in the DB vs. in application logic, here I try to categorize the different kinds of questions we want to be able to ask the data store.

For the examples, assume the datastore is holding information for an NGO that operates health clinics throughout the United States.

Basic Retrieval

Question Retrieve all the Entities of a specific type.

Example Show a list of all health clinics.

Question Retrieve specific entity by a unique id.

Example Show health clinic with ID Clinic001:


Question Retrieve specific entity by a semi-unique id. This may return a list if there are multiple matches.

Example Show health clinic with “Free Clinic” in its name.

State Queries

Question Retrieve an Entity (or set of Entities) with a specific set of values.

Example Show a list of all health clinics and include with each clinic:

• Geographic location

• Clinic Director’s Name

• Current stock of Cipro (an antibiotic)

Question Return an Entity (or set of Entities) with all the latest values associated with it.

Example Show the latest information for Clinic001. This should include the latest reported value of every field ever reported on this clinic.

Question Retrieve an Entity (or set of Entities) with a set of values as of a given date.

Example Show all the latest information on Clinic001 as of Jan 15, 2010
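The “latest values as of a given date” question can be answered by replaying an entity's records in time order and letting newer values overwrite older ones. The sketch below illustrates that idea on an in-memory list; the record layout and sample values are invented for the example and say nothing about how the datastore actually implements the query.

```python
from datetime import date


def latest_values(records, as_of):
    """Return the newest value of every field reported on or before `as_of`.

    records: list of (report_date, {field: value}) pairs for one entity.
    """
    state = {}
    for report_date, values in sorted(records, key=lambda r: r[0]):
        if report_date <= as_of:
            state.update(values)  # newer reports overwrite older ones
    return state


# Hypothetical reports for Clinic001.
clinic001 = [
    (date(2009, 12, 1), {"director": "Dr. A", "cipro_stock": 40}),
    (date(2010, 1, 10), {"cipro_stock": 25}),
    (date(2010, 2, 1), {"director": "Dr. B"}),
]
print(latest_values(clinic001, date(2010, 1, 15)))
```

The February report is ignored because it falls after the cut-off date, so the director is still the one reported in December.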

Time Aggregated Queries

Question Retrieve an Entity (or set of Entities) with a specific set of values aggregated by a function such as sum() or avg() over a given time range.

Example Show a list of all health clinics and include with each clinic:

• Total number of patients seen in 2011

Question Retrieve an Entity (or set of Entities) with a specific set of values aggregated by a function such as sum() or avg() over a given time range with a given periodicity.

Example Show a list of all health clinics and include with each clinic:

• Average number of patients seen each month for each month in 2011
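A time aggregation with periodicity amounts to bucketing records by period and then applying the aggregate inside each bucket. The following sketch shows this for the monthly average example, using invented sample data and a hypothetical `monthly_average` helper.

```python
from collections import defaultdict
from datetime import date


def monthly_average(reports, year):
    """Average patients seen per report, for each month of `year`.

    reports: list of (report_date, patients_seen) pairs for one clinic.
    """
    by_month = defaultdict(list)
    for report_date, patients in reports:
        if report_date.year == year:
            by_month[report_date.month].append(patients)
    return {month: sum(v) / len(v) for month, v in sorted(by_month.items())}


reports = [
    (date(2011, 1, 5), 10),
    (date(2011, 1, 20), 20),
    (date(2011, 2, 3), 30),
    (date(2010, 12, 30), 99),  # outside the requested year, ignored
]
print(monthly_average(reports, 2011))
```

Swapping the averaging expression for `sum(v)` would give the total per month instead.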

Selection Queries

Question Retrieve all Entities which have a specific value.

Example Show all health clinics where “Population Served” > 1000

Question Retrieve all Entities which have a specific aggregated value.

Example Show all health clinics where “Total Patients Seen” > 1000


Question Retrieve all Entities which have a specific aggregated value over time.

Example Show all health clinics where “Total Patients Seen in 2011” > 1000

Hierarchy Aggregated Queries

Note: These queries don’t return entities; they return values aggregated by a hierarchy node (e.g. ‘California’ or ‘San Francisco’), which suggests that maybe Matt Berg is right and hierarchy nodes should perhaps be considered ‘Entities’, or ‘Generated Entities’...

Question Retrieve a set of Values aggregated by a given node in a hierarchy.

Example From the set of all clinics in California show:

• Total number of patients seen in 2011 (in California)

• Average number of patients seen in 2011 (in California)

Question Retrieve a set of Values aggregated by a given level in a hierarchy.

Example From each State in the United States show:

• Total number of patients seen in clinics in that state in 2011

• Average number of patients seen in clinics in that state in 2011
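Aggregating by a level in a hierarchy can be pictured as grouping each clinic's value under the node found at that depth of its location path. The sketch below is one way to express this in Python; the clinic data and the `totals_by_level` function are made up for illustration.

```python
from collections import defaultdict


def totals_by_level(clinics, level):
    """Sum patients per hierarchy node at `level` (0 = country, 1 = state, ...).

    clinics: list of (location_path, patients_seen) pairs.
    """
    totals = defaultdict(int)
    for path, patients in clinics:
        if len(path) > level:
            totals[path[level]] += patients
    return dict(totals)


clinics = [
    (["United States", "California", "San Francisco"], 120),
    (["United States", "California", "Oakland"], 80),
    (["United States", "Oregon", "Portland"], 50),
]
print(totals_by_level(clinics, 1))
```

This sketch keys on the node label alone, so two different cities with the same name would be merged; keying on the full path prefix would avoid that.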

2.4.4 Data Dictionary Concept and usages

The data dictionary is a standalone service that hosts data type definitions in order to allow users to share them.

This project is part of the mangrove project.

It provides

• A simple format to define any kind of data by providing a basic type, tags and constraints.

• A unique ID to reference a definition in external services.

• An HTTP API to get the definition from an external service.

• A Python wrapper around the HTTP API that provides constraint checking and type casting.

• A replication system to synchronize several data dictionaries together.

• A versioning system ensuring that updating definitions doesn’t break others’ references.

Use cases

• You share common types among several systems and keep them up to date and in sync.

• You expect users to define data themselves.

• You store data in a schemaless data base and want to attach a type, a meaning or constraints to it.


What it’s not

• A semantic database. No complex definitions nor RDF/SPARQL magic in there.

• A user interface to let the user enter data types. You have to provide it.

• A bulletproof solution for anything. This is a work in progress, designed to solve a very specific problem; it is not a standard, nor an attempt to solve everybody’s data type problems.

The data dictionary in the Mangrove project

We need to store data in a schemaless database.

The data is going to be defined by the user, and organisations will want to share their definitions.

The data will be input by field agents about whatever subjects they are studying, so the system cannot know the data type in advance. For example, one NGO might be studying school results in India and another might be studying school attendance in Africa. Both would want data definitions for schools and students. The data dictionary is designed to hold these definitions so that they are reusable across organisations.

Current implementation

• CouchDB database

• couchdb-python

2.5 API SPEC EXAMPLES

2.5.1 Top Level Format:

Each API response shall be a dictionary with 4 items. When responding via HTTP, the status field shall match the HTTP response status code.

FIELD        TYPE  DESCRIPTION
status       int   HTTP status 2xx, 3xx, 4xx, 5xx
message      str   A string containing a message about the response
num_results  int   The number of results (>= 0)
results      list  A list of results of length num_results. If 0, then empty list

An example of an error response in JSON.

{
    "status": 401,
    "message": "Not Authorized",
    "num_results": 0,
    "results": []
}

An example of a successful created response in JSON.

{
    "status": 201,
    "message": "Data Record Created",
    "num_results": 1,
    "results": [
        {
            "_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b",
            "field1": "foo",
            "field2": "bar"
        }
    ]
}

An example of a successful search response in JSON.

{
    "status": 200,
    "message": "Search results successful",
    "num_results": 2,
    "results": [
        {
            "_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b",
            "field1": "foo",
            "field2": "bar"
        },
        {
            "_id": "4d4eb8de-3955-412c-a078-3e846182380b",
            "field": "milli",
            "field2": "vanilli"
        }
    ]
}
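The four-field envelope from the Top Level Format table is easy to build consistently with a small helper, so that num_results can never disagree with the length of results. The function below is only a sketch; `make_response` is a hypothetical name and not part of any Mangrove module.

```python
import json


def make_response(status, message, results=None):
    """Build the status/message/num_results/results envelope."""
    results = list(results or [])
    return {
        "status": status,
        "message": message,
        "num_results": len(results),  # always derived from the list itself
        "results": results,
    }


error = make_response(401, "Not Authorized")
created = make_response(
    201,
    "Data Record Created",
    [{"_id": "ee7c7583-1afe-4985-a1ea-69fd4764552b", "field1": "foo", "field2": "bar"}],
)
print(json.dumps(error))
print(json.dumps(created, indent=4))
```

Serializing with the `json` module guarantees quoted keys and square-bracket lists, which a hand-written response can easily get wrong.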

2.6 Data Dictionary Expected API

2.6.1 How it is stored in nosql database.

In the Mangrove system, this is termed data dict storage:

{
    "_id": "",
    "primitive type": "int",
    "name": "Malaria pills stock",
    "description": "Description of this drug and the stock itself.",
    "version": "2010-10-10 07:06:45.45646",
    "tags": ["health", "medicine", "drug", "malaria", "pill"],
    "constraints": {
        "gt": "0",
        "lt": "10"
    }
}
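The "constraints" entry of a stored definition pairs an operator name ("gt", "lt") with a bound. As an illustration of how such constraints might be checked against a submitted value, consider the sketch below. It assumes an integer primitive type and that only "gt" and "lt" occur; the `violations` function is invented for this example and is not the data dictionary's validator.

```python
import operator

# Map constraint names to comparison functions (assumed set: gt, lt).
CHECKS = {"gt": operator.gt, "lt": operator.lt}


def violations(value, constraints):
    """Return the names of the constraints that `value` fails.

    Bounds are stored as strings in the document, so they are cast to int.
    """
    return [
        name
        for name, bound in constraints.items()
        if not CHECKS[name](value, int(bound))
    ]


constraints = {"gt": "0", "lt": "10"}
print(violations(5, constraints))
print(violations(12, constraints))
```

An empty list means the value satisfies every constraint.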

2.6.2 How it is referenced in external service.

In the Mangrove system, the data is stored in the datastore. Each data instance holds a reference to its type:


type {
    "uuid": "b4cd35d9f04887da905c051b894568",
    "version": "2010-10-10 07:06:45.45646"
}

2.6.3 Python wrapper API

For querying the data dict, you can use a Python RESTful wrapper that provides filtering, type casting and constraint validation.

For example:

In settings.py:

DATABASE_NAME = 'datadict'
SERVER_ADDRESS = 'http://localhost'
SERVER_PORT = '5984'

Create a data type:

dt = DataType(name="test", contraints={'gt': 4}, tags=['foo', 'bar'],
              type="int", description="Super dupper type")
dt.save()

or:

DataType.create(name="test", contraints={'gt': 4}, tags=['foo', 'bar'],
                type="int", description="Super a type")

Searching for datatype with tags:

DataType.with_tags('foo', 'bar')

Getting datatype:

dt = DataType.get(id, version)

validating data:

try:
    dt.validate(value)
except dt.ValidationError as e:
    for error in e.errors:
        print(error)

casting data:

dt.to_python(value)
dt.to_json(value)
dt.to_xform(value)

2.7 Querying couch by hierarchy and time

2.7.1 The Problem :

Doing aggregation by hierarchy and time in CouchDB. For example:


Our aim is to be able to give results for the following types of queries:

1) Total population country/state/city wise for all months.

2) Monthly population country/state/city wise. For example, total population in the state of Maharashtra in March.

2.7.2 The Data:

The population would be stored as a CouchDB document in the following format. (It is simplified for the purpose of the illustration.) The basic document structure is as follows:

{
    "_id": "Entity name",
    "path": ["India", "MH", "Pune"],
    "population": 20,
    "month": "feb"
}

MH (Maharashtra) is a state; Pune is a city.

We have written a map-reduce function to aggregate data by multilevel location hierarchy and time. The "path" field indicates the location hierarchy tree for the entity. Month is the time value. It will be a proper date; we have used the month for the purpose of the spike.

2.7.3 The Map-Reduce:

The map function is as follows:

function(doc) {
    // Emit one row per level of the location path: [level, label, month].
    for (var i in doc.path) {
        emit([i, doc.path[i], doc.month], doc.population);
    }
}

The reduce function is _sum
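
To see what the map and reduce steps produce without a running CouchDB, the following Python sketch simulates them in memory. It is an illustration only: the function names `map_doc` and `reduce_sum`, the sample documents, and the "Chennai" entry are made up. The `group_level` argument here counts how many leading key elements are kept, so 3 gives month-wise rows and 2 gives totals over all months; how that maps onto the level numbering used in this document is an assumption.

```python
from collections import defaultdict


def map_doc(doc):
    """Mirror the JavaScript map: one row per level of the location path."""
    for level, label in enumerate(doc["path"]):
        # for..in over a JavaScript array yields string indices, hence str(level).
        yield [str(level), label, doc["month"]], doc["population"]


def reduce_sum(docs, group_level):
    """Sum the mapped values per key prefix, similar to _sum with grouping."""
    totals = defaultdict(int)
    for doc in docs:
        for key, value in map_doc(doc):
            totals[tuple(key[:group_level])] += value
    return dict(totals)


# Made-up sample documents in the format shown in 2.7.2.
docs = [
    {"_id": "e1", "path": ["India", "MH", "Pune"], "population": 20, "month": "feb"},
    {"_id": "e2", "path": ["India", "MH", "Pune"], "population": 30, "month": "mar"},
    {"_id": "e3", "path": ["India", "TN", "Chennai"], "population": 50, "month": "feb"},
]

by_month = reduce_sum(docs, group_level=3)    # keys like ("1", "MH", "mar")
all_months = reduce_sum(docs, group_level=2)  # keys like ("1", "MH")
```

Because every document emits a row for each ancestor in its path, a single view answers country, state, and city queries alike.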

2.7.4 The Output:

The sample output will be as follows (when reduced to level 2 in CouchDB):

{
    ["2", "Pune", 7] : 150
    ["2", "Pune", 3] : 80
    ["2", "Pune", 2] : 100
    ["1", "TN", 2] : 120
    ["1", "MH", 7] : 150
    ["1", "MH", 2] : 100
    ["0", "India", 7] : 150
    ["0", "India", 2] : 220
}

TN (Tamil Nadu) is a state. This gives month-wise aggregates. The third key element is the month (7 is July, 2 is February, etc.). The second key element is the label of the location at that level (for example, the state).


At level 1, it gives totals for all months:

{
    ["2", "Pune"] : 330
    ["1", "TN"] : 320
    ["1", "MH"] : 330
    ["0", "India"] : 650
}
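
A query such as "total population in the state of Maharashtra in March" could then be a single view request for the key ["1", "MH", 3]. The sketch below only builds the request URL and does not contact a server. The helper name `view_url` and the database, design document, and view names are hypothetical; `key` and `group_level` are standard CouchDB view query parameters, and the numeric month follows the sample output above.

```python
import json

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7


def view_url(base, db, design, view, key=None, group_level=None):
    """Build a CouchDB view URL; the key is JSON-encoded as CouchDB expects."""
    params = []
    if key is not None:
        params.append(("key", json.dumps(key)))
    if group_level is not None:
        params.append(("group_level", group_level))
    url = "%s/%s/_design/%s/_view/%s" % (base, db, design, view)
    return url + ("?" + urlencode(params) if params else "")


# Hypothetical names: database "population_spike", design doc "population",
# view "by_path_and_time". Level "1" is the state level in the path.
url = view_url(
    "http://localhost:5984",
    "population_spike",
    "population",
    "by_path_and_time",
    key=["1", "MH", 3],
    group_level=3,
)
```

Dropping the month from the key and lowering `group_level` to 2 would, under the same assumptions, request the all-month total for the state.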

2.8 Setting up the ‘DataWinners’ Web App

• Pre-requisites:

1. Install Python 2.7: apt-get install python2.7

2. Install CouchDB: apt-get install couchdb

3. Install python2.7-dev: apt-get install python2.7-dev

4. Install Subversion (SVN): apt-get install subversion

5. Install virtualenv: apt-get install virtualenv

6. Install python-setuptools: apt-get install python-setuptools

• Environment Setup:

1. Create a virtual environment: virtualenv --no-site-packages --python=python2.7 <foldername>

2. Go into the folder <foldername>: cd <foldername>

3. Clone the git repository: git clone https://github.com/mangroveorg/mangrove.git

4. Go to the mangrove folder: cd mangrove

5. Switch to the develop branch: git checkout develop

6. Check the status: git status

7. Go back out of the folder <foldername>: cd ../..

8. Install the requirements from requirements.pip: pip install -E <foldername> -r <foldername>/mangrove/requirements.pip

• Execute Environment:

1. Activate the virtual environment: source <foldername>/bin/activate

2. Run the server: python <foldername>/mangrove/src/web/manage.py runserver

• Access URLs:

1. Website URL: http://localhost:8000/login

2. CouchDB URL: http://localhost:5984/_utils

2.9 Setting Up PostGIS and importing location data

To configure PostGIS and import location data, this is what you need to do:

• Install Spatial Database PostgreSQL 8.4 (with PostGIS 1.5)

• Install Geospatial Libraries


• Create Spatial Database Template for PostGIS

• Create spatial database using the template

• Import the shape files in the spatial database

2.9.1 Details

• Install Spatial Database PostgreSQL (with PostGIS) and Geospatial Libraries

$ sudo apt-get install postgresql
$ sudo apt-get install binutils gdal-bin postgresql-8.4-postgis \
    postgresql-server-dev-8.4 python-psycopg2 python-setuptools

For details, see https://help.ubuntu.com/community/PostgreSQL

• Create Spatial Database Template for PostGIS:

Follow the instructions at https://docs.djangoproject.com/en/dev/ref/contrib/gis/install/#spatialdb-template

On Debian/Ubuntu, use create_template_postgis-debian.sh

• Create spatial database using the template:

Follow the instructions at https://docs.djangoproject.com/en/dev/ref/contrib/gis/tutorial/#setting-up

$ createdb -T template_postgis geodjango

• Import the shape files in the spatial database:

Clone [email protected]:mangroveorg/shape_files.git into a folder at the same level as the mangrove repository, e.g. /home/user/code/mangrove and /home/user/code/shape_files. Then run:

python manage.py loadshapes


CHAPTER

THREE

INDICES AND TABLES

• genindex

• modindex

• search
