Pandas Mongo

Pandas Mongo Release 0.1.0 May 05, 2020


Contents

1 Overview

2 Quick Start

3 Installation
    3.1 Documentation
    3.2 Development

4 Installation

5 Quick Start

6 Reading dataframes from MongoDB using aggregation

7 Reference
    7.1 pdmongo

8 Contributing
    8.1 Bug reports
    8.2 Documentation improvements
    8.3 Feature requests and feedback
    8.4 Development

9 Authors

10 Changelog
    10.1 0.1.0 (2020-05-05)
    10.2 0.0.2 (2020-05-04)
    10.3 0.0.1 (2020-04-30)
    10.4 0.0.0 (2020-03-22)

11 Indices and tables

Python Module Index

Index


CHAPTER 2

Quick Start

Writing a pandas DataFrame to a MongoDB collection:

import pdmongo as pdm
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
pdm.to_mongo(df, "MyCollection", "mongodb://localhost:27017/mydb")

Reading a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm
df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)
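The connection string used in both examples bundles the host, the port and the database name. The sketch below uses only the Python standard library to show how those parts break down. The helper name describe_mongo_uri is hypothetical and not part of pdmongo, and this is not necessarily how pdmongo parses the string internally.

```python
from urllib.parse import urlsplit


def describe_mongo_uri(uri):
    """Split a MongoDB connection string into its main parts.

    Hypothetical helper for illustration only; not part of pdmongo.
    """
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # The database name is the URI path without its leading slash.
        "database": parts.path.lstrip("/") or None,
    }


info = describe_mongo_uri("mongodb://localhost:27017/mydb")
print(info)
```

If the path is missing, the sketch reports the database as None, which is a quick way to spot a URI that names a server but no database.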


CHAPTER 3

Installation

pip install pdmongo

You can also install the in-development version with:

pip install https://github.com/pakallis/python-pandas-mongo/archive/master.zip

3.1 Documentation

https://python-pandas-mongo.readthedocs.io/

3.2 Development

To run all the tests, run:

tox

Note: to combine the coverage data from all the tox environments, run:

Windows:
    set PYTEST_ADDOPTS=--cov-append
    tox

Other:
    PYTEST_ADDOPTS=--cov-append tox


CHAPTER 4

Installation

At the command line:

pip install pdmongo


CHAPTER 5

Quick Start

Writing a pandas DataFrame to a MongoDB collection:

import pdmongo as pdm
import pandas as pd

df = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
pdm.to_mongo(df, "MyCollection", "mongodb://localhost:27017/mydb")

Reading a MongoDB collection into a pandas DataFrame:

import pdmongo as pdm
df = pdm.read_mongo("MyCollection", [], "mongodb://localhost:27017/mydb")
print(df)


CHAPTER 6

Reading dataframes from MongoDB using aggregation

You can use an aggregation query to filter or transform data in MongoDB before fetching it into a DataFrame.

Reading a collection from MongoDB into a pandas DataFrame by using an aggregation query:

import pdmongo as pdm
query = [
    {"$match": {'A': 1}}
]
df = pdm.read_mongo("MyCollection", query, "mongodb://localhost:27017/mydb")
print(df)

The query accepts the same pipeline as the aggregate method of the pymongo package: a list of stage documents.
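Because the query is an ordinary list of dictionaries, it can be assembled programmatically before being handed to read_mongo. The following is a minimal sketch, assuming a hypothetical helper named build_pipeline (not part of pdmongo) that combines the standard MongoDB stages $match, $project, $sort and $limit.

```python
def build_pipeline(match=None, fields=None, sort_by=None, limit=None):
    """Assemble a MongoDB aggregation pipeline as a list of stage dicts.

    Hypothetical helper for illustration only; not part of pdmongo.
    """
    pipeline = []
    if match:
        # Keep only documents whose fields equal the given values.
        pipeline.append({"$match": match})
    if fields:
        # Return only the listed fields.
        pipeline.append({"$project": {name: 1 for name in fields}})
    if sort_by:
        # Sort ascending on a single field.
        pipeline.append({"$sort": {sort_by: 1}})
    if limit is not None:
        # Cap the number of documents returned.
        pipeline.append({"$limit": limit})
    return pipeline


query = build_pipeline(match={"A": 1}, fields=["A", "B"], sort_by="B", limit=10)
print(query)
# The resulting list can be passed as the second argument of pdm.read_mongo.
```

Calling build_pipeline() with no arguments yields an empty list, which matches the empty query used in the Quick Start to read a whole collection.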


CHAPTER 7

Reference

7.1 pdmongo

pdmongo.read_mongo(collection: str, query: List[Dict[str, Any]], db: Union[str, pymongo.database.Database], index_col: Union[str, List[str], None] = None, extra: Optional[Dict[str, Any]] = None, chunksize: Optional[int] = None) → pandas.core.frame.DataFrame

Read MongoDB query into a DataFrame.

Returns a DataFrame corresponding to the result set of the query. Optionally provide an index_col parameter to use one of the columns as the index; otherwise the default integer index will be used.

Parameters

• collection (str) – Mongo collection to select for querying

• query (list) – Must be an aggregation pipeline. The input is passed to pymongo's aggregate method.

• db (pymongo.database.Database or database string URI) – The database to use

• index_col (str or list of str, optional, default: None) – Column(s) to set as index (MultiIndex).

• extra (dict, optional, default: None) – Additional parameters to pass to the aggregate method.

• chunksize (int, default None) – If specified, return an iterator where chunksize is the number of docs to include in each chunk.

Returns DataFrame
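The chunksize behaviour described above, an iterator that yields a fixed number of docs at a time, can be pictured with a short standard-library sketch. The function iter_chunks is a hypothetical illustration of the batching idea, not pdmongo's actual implementation, and it yields plain lists rather than DataFrames.

```python
from itertools import islice


def iter_chunks(docs, chunksize):
    """Yield successive lists of at most `chunksize` documents.

    Hypothetical illustration of chunked reading; not pdmongo's code.
    """
    if chunksize < 1:
        raise ValueError("chunksize must be a positive integer")
    iterator = iter(docs)
    while True:
        chunk = list(islice(iterator, chunksize))
        if not chunk:
            return
        yield chunk


# A generator stands in for a database cursor here.
docs = ({"A": i} for i in range(5))
sizes = [len(chunk) for chunk in iter_chunks(docs, 2)]
print(sizes)
```

Only one chunk is held in memory at a time, which is the usual reason to set a chunk size when a result set is large.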

pdmongo.to_mongo(frame: pandas.core.frame.DataFrame, name: str, db: Union[str, pymongo.database.Database], if_exists: Optional[str] = 'fail', index: Optional[bool] = True, index_label: Union[str, Sequence[str], None] = None, chunksize: Optional[int] = None) → Union[List[pymongo.results.InsertManyResult], pymongo.results.InsertManyResult]

Write records stored in a DataFrame to a MongoDB collection.

Parameters


• frame (DataFrame, Series)

• name (str) – Name of collection.

• db (pymongo.database.Database or database string URI) – The database to write to

• if_exists ({‘fail’, ‘replace’, ‘append’}, default ‘fail’) –

– fail: If the collection exists, do nothing.

– replace: If the collection exists, drop it, recreate it, and insert data.

– append: If the collection exists, insert data. Create it if it does not exist.

• index (boolean, default True) – Write DataFrame index as a column.

• index_label (str or sequence, optional) – Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.

• chunksize (int, optional) – Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
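The three if_exists modes can be pictured with an in-memory stand-in for a database. This is a minimal sketch that follows the parameter descriptions above literally, using a plain dict in place of MongoDB; write_records and store are hypothetical names, and the real to_mongo may behave differently in detail (for example in what it returns).

```python
def write_records(store, name, records, if_exists="fail"):
    """Write records into `store`, a dict mapping collection name to a list.

    Hypothetical in-memory illustration of the if_exists modes;
    not pdmongo's code. Returns the number of records written.
    """
    if if_exists not in ("fail", "replace", "append"):
        raise ValueError(f"unsupported if_exists value: {if_exists!r}")
    exists = name in store
    if exists and if_exists == "fail":
        # Documented above as "do nothing" when the target exists.
        return 0
    if exists and if_exists == "replace":
        # Drop the existing collection before inserting.
        del store[name]
    # Create the collection if needed, then insert.
    store.setdefault(name, []).extend(records)
    return len(records)


store = {}
first = write_records(store, "MyCollection", [{"A": 1}])
second = write_records(store, "MyCollection", [{"A": 2}])  # default 'fail'
write_records(store, "MyCollection", [{"A": 3}], if_exists="append")
after_append = list(store["MyCollection"])
write_records(store, "MyCollection", [{"A": 4}], if_exists="replace")
after_replace = list(store["MyCollection"])
print(first, second, after_append, after_replace)
```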


CHAPTER 8

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

8.1 Bug reports

When reporting a bug please include:

• Your operating system name and version.

• Any details about your local setup that might be helpful in troubleshooting.

• Detailed steps to reproduce the bug.

8.2 Documentation improvements

Pandas Mongo could always use more documentation, whether as part of the official Pandas Mongo docs, in docstrings, or even on the web in blog posts, articles, and such.

8.3 Feature requests and feedback

The best way to send feedback is to file an issue at https://github.com/pakallis/python-pandas-mongo/issues.

If you are proposing a feature:

• Explain in detail how it would work.

• Keep the scope as narrow as possible, to make it easier to implement.

• Remember that this is a volunteer-driven project, and that code contributions are welcome :)


8.4 Development

To set up python-pandas-mongo for local development:

1. Fork python-pandas-mongo (look for the “Fork” button).

2. Clone your fork locally:

git clone git@github.com:pakallis/python-pandas-mongo.git

3. Create a branch for local development:

git checkout -b name-of-your-bugfix-or-feature

Now you can make your changes locally.

4. When you’re done making changes, run all the checks and the docs builder with one tox command:

tox

5. Commit your changes and push your branch to GitHub:

git add .
git commit -m "Your detailed description of your changes."
git push origin name-of-your-bugfix-or-feature

6. Submit a pull request through the GitHub website.

8.4.1 Pull Request Guidelines

If you need some code review or feedback while you’re developing the code, just make the pull request.

For merging, you should:

1. Include passing tests (run tox) [1].

2. Update the documentation when there’s a new API, new functionality, etc.

3. Add a note to CHANGELOG.rst about the changes.

4. Add yourself to AUTHORS.rst.

8.4.2 Tips

To run a subset of tests:

tox -e envname -- pytest -k test_myfeature

To run all the test environments in parallel (you need to pip install detox):

detox

[1] If you don’t have all the necessary Python versions available locally, you can rely on Travis: it will run the tests for each change you add in the pull request. It will be slower, though.


CHAPTER 9

Authors

• Pavlos Kallis - https://pakallis.github.com


CHAPTER 10

Changelog

10.1 0.1.0 (2020-05-05)

• Added static typing

• Added mypy to Travis CI

• Removed unnecessary params

10.2 0.0.2 (2020-05-04)

• Dropped support for pypy3

10.3 0.0.1 (2020-04-30)

• Added read_mongo and basic support for reading MongoDB collections into pandas dataframes

• Added to_mongo and basic support for writing pandas dataframes in MongoDB collections

10.4 0.0.0 (2020-03-22)

• First release on PyPI.


CHAPTER 11

Indices and tables

• genindex

• modindex

• search


Python Module Index

p
pdmongo


Index

P
pdmongo (module)

R
read_mongo() (in module pdmongo)

T
to_mongo() (in module pdmongo)
