
Databases Portfolio

Project Description

Our Universal Web Crawler crawls more than 40 web portals of different trade companies and loads data about their products into the database. The information stored for each product includes its code, model, description, all technical features listed on the site, the price (if it is shown on the site), the measurement unit for purchased item quantities, etc. The interface is built with ASP.NET; the programming language is C# (.NET Framework 3.5). The interface allows setting all crawling parameters, scheduling and running the crawler, and it shows information about the crawler's last run. Detailed logging is available for monitoring the process, so you can determine where the crawler went wrong and rectify the situation. Our database contains information about several million products, and this number grows every day. You can view summary data for any category, manufacturer and period, as well as detailed information about individual goods.
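To make the stored attributes concrete, a product record of roughly the following shape could hold them; the class and property names below are assumptions for this sketch, not the actual database schema.

    using System;
    using System.Collections.Generic;

    // Hypothetical shape of one crawled product record; the fields mirror the
    // attributes listed above (code, model, description, features, price, unit).
    public class ProductRecord
    {
        public string Code { get; set; }                         // product code on the seller's site
        public string Model { get; set; }
        public string Description { get; set; }
        public Dictionary<string, string> Features { get; set; } // all technical features listed on the site
        public decimal? Price { get; set; }                      // null when the site does not show a price
        public string MeasurementUnit { get; set; }              // unit for purchased item quantities
        public string SourcePortal { get; set; }                 // which of the 40+ portals the record came from
        public DateTime CrawledAtUtc { get; set; }               // when the crawler fetched the page
    }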

SQL Server 2008 R2 Enterprise is used for storing and processing the data. There are two databases: the first serves as an Online Transaction Processing (OLTP) store and the second as a Data Warehouse; data transfer to the Data Warehouse is implemented with transactional replication. SQL Server Agent jobs are used extensively for different purposes such as database maintenance, filling summary tables, and notifications about the current state of different processes. CLR stored procedures and functions are used for tasks that cannot be implemented in Transact-SQL, for example applying regular expressions or downloading data from external sources (the Internet or local networks).
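As an illustration of the approach, a CLR scalar function can expose .NET regular expressions to Transact-SQL. The sketch below is a minimal example, not one of the project's actual procedures; the function name and signature are assumptions.

    using System.Data.SqlTypes;
    using System.Text.RegularExpressions;
    using Microsoft.SqlServer.Server;

    public partial class RegexFunctions
    {
        // Returns the first fragment of 'input' that matches 'pattern', or NULL if there is no match.
        [SqlFunction(IsDeterministic = true, IsPrecise = true)]
        public static SqlString RegexMatch(SqlString input, SqlString pattern)
        {
            if (input.IsNull || pattern.IsNull)
                return SqlString.Null;

            Match m = Regex.Match(input.Value, pattern.Value, RegexOptions.Singleline);
            return m.Success ? new SqlString(m.Value) : SqlString.Null;
        }
    }

After the assembly is registered with CREATE ASSEMBLY and the function bound with CREATE FUNCTION ... EXTERNAL NAME, it can be called from Transact-SQL like any scalar function, for example dbo.RegexMatch(Description, N'[0-9]+ ?GB').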

SQL Server Reporting Services is used to generate a number of reports about both the current state of the system and the overall activity of different product categories, manufacturers and sellers: price changes over any period, comparison of prices between different companies, analysis of price index dynamics, etc. Multidimensional structures and data mining models in SQL Server Analysis Services are used to obtain the main pricing trends for different product categories, manufacturers and sellers. SQL Server Integration Services packages are heavily used for database maintenance and other tasks such as uploading data to an FTP server.

Universal web crawling system


[Architecture diagram: N web sites and portals are processed by M Crawling Core instances (XPath, XML, regular expressions, proxy servers, anonymizers), which write into the database; Reporting Services, an admin panel and a logs viewer sit on top.]

Tools / Technologies: ASP.NET is used for building the interface. The programming language is C# (.NET Framework 3.5). SQL Server 2008 R2 Enterprise is used for storing and processing the data.

Techniques and Approaches

Transformation of incorrect HTML markup into a well-structured XML document using SgmlReader (see the sketch after this list)
Crawling all pages of a website, or a given part of it, to retrieve the necessary data
Entering data into required form fields when needed (e-mail, ZIP code, login/password, etc.)
Managing the scanning of Internet information resources
Automatic initialization of the processes that execute the crawlers' work
Configuring and managing the crawlers' work through the user interface, and providing different types of reports based on the crawling results
Extracting the necessary information from a given Internet information resource
Handling HTML frames
Handling complex AJAX constructions
Handling custom JavaScript constructions
Handling exception pages such as 404 (page not found), "Site under reconstruction", etc.
Escaping blacklists
Using anonymizers
Using custom proxy servers
Using regular expressions and XPath to extract textual and graphical information (also illustrated in the sketch after this list)
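A minimal sketch of how the first and last techniques fit together: SgmlReader normalizes malformed HTML into XML, then XPath and a regular expression pull a value out of the page. The XPath query and the price pattern are illustrative assumptions, not the production extraction rules.

    using System;
    using System.IO;
    using System.Text.RegularExpressions;
    using System.Xml;
    using Sgml;   // SgmlReader library

    public static class PageExtractor
    {
        // Converts (possibly malformed) HTML into an XmlDocument via SgmlReader,
        // then extracts a price with XPath plus a regular expression.
        public static decimal? ExtractPrice(string html)
        {
            var doc = new XmlDocument();
            using (var sgml = new SgmlReader())
            {
                sgml.DocType = "HTML";
                sgml.CaseFolding = CaseFolding.ToLower;   // makes XPath queries predictable
                sgml.InputStream = new StringReader(html);
                doc.Load(sgml);
            }

            // Hypothetical location of the price on the page.
            XmlNode node = doc.SelectSingleNode("//span[@class='price']");
            if (node == null)
                return null;

            Match m = Regex.Match(node.InnerText, @"\d+(?:[.,]\d+)?");
            if (!m.Success)
                return null;

            return decimal.Parse(m.Value.Replace(",", "."),
                                 System.Globalization.CultureInfo.InvariantCulture);
        }
    }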

Duration: more than 5 years (ongoing)

Team Size: PM, 1 .NET developer, 1 database developer

Cooperation Model: Dedicated Team

Customer’s Feedback


“Excellent provider. Very happy with their service, professionalism, and support. Highly recommended.”

Nathan Krol, Stanley, USA


Project Description

The system organizes a fully automated cycle of placing bets on any sport on the betfair.com site. Here the system functionality is described using horse racing as an example.

HRBS carries out all the work related to searching for and fetching data on all races in Great Britain and Ireland for the next few days. This is made possible by so-called crawlers, or search robots. The system fetches both race cards and statistical information about horses, trainers and jockeys. The pedigree and current real odds of every runner, as well as going conditions and courses, are also taken into account.

HRBS includes a parser, which prepares all the statistical information for further decision-making by the predictor. The predictor evaluates a few dozen parameters for every runner and builds an internal rating for each of them, which allows generating predictions both for winners and for runners that are most likely to lose (see the sketch below).
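A rough illustration of the rating idea follows. The parameter names and weights are invented for this sketch; the real predictor evaluates a few dozen parameters.

    using System.Collections.Generic;

    public static class RunnerRating
    {
        // Hypothetical weights for a handful of normalized (0..1) parameters; the real
        // predictor uses many more (form, pedigree, jockey and trainer statistics,
        // going, course, distance, current odds, ...).
        private static readonly Dictionary<string, double> Weights = new Dictionary<string, double>
        {
            { "RecentForm",          0.35 },
            { "JockeyStrikeRate",    0.20 },
            { "TrainerStrikeRate",   0.15 },
            { "GoingSuitability",    0.15 },
            { "DistanceSuitability", 0.15 }
        };

        // Internal rating of one runner: a weighted sum of its parameter scores.
        // Sorting runners by this value descending suggests winners; ascending suggests likely losers.
        public static double Score(IDictionary<string, double> parameters)
        {
            double total = 0.0;
            foreach (KeyValuePair<string, double> w in Weights)
            {
                double value;
                if (parameters.TryGetValue(w.Key, out value))
                    total += w.Value * value;
            }
            return total;
        }
    }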

Once the predictions are generated, the Free Betfair API can be used to place bets on betfair.com automatically. There are a great number of different strategies for generating predictions and making bets; each strategy is implemented as a so-called Betfair bot. The Bots controller is used to keep the bots running stably, monitor the bet-placing process, and warn about faults and errors. The bots are highly configurable: for each of them the user can set stop-loss and stop-win limits, limits on the number of runners, race types, distance, going conditions, etc. (a configuration sketch follows below).
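A sketch of what such a bot configuration and a supervising controller loop might look like. All type and member names here are assumptions, and the actual Betfair API calls are hidden behind a hypothetical bot interface.

    using System;
    using System.Collections.Generic;

    // Per-bot settings described above: stop-loss / stop-win and race filters.
    public class BotConfig
    {
        public decimal StopLoss { get; set; }          // pause the bot after losing this much
        public decimal StopWin { get; set; }           // pause the bot after winning this much
        public int MaxRunners { get; set; }            // skip races with more runners than this
        public List<string> AllowedRaceTypes { get; set; }
        public List<string> AllowedGoings { get; set; }
    }

    public interface IBettingBot
    {
        string Name { get; }
        decimal ProfitAndLoss { get; }
        void PlaceBets(BotConfig config);              // one strategy = one bot
    }

    // Simplified controller: runs every bot, enforces stop limits and reports faults.
    public class BotsController
    {
        private readonly IEnumerable<IBettingBot> bots;

        public BotsController(IEnumerable<IBettingBot> bots) { this.bots = bots; }

        public void RunCycle(IDictionary<string, BotConfig> configs)
        {
            foreach (IBettingBot bot in bots)
            {
                BotConfig cfg = configs[bot.Name];
                if (bot.ProfitAndLoss <= -cfg.StopLoss || bot.ProfitAndLoss >= cfg.StopWin)
                    continue;                          // stop-loss / stop-win reached
                try
                {
                    bot.PlaceBets(cfg);
                }
                catch (Exception ex)
                {
                    Console.Error.WriteLine("{0} failed: {1}", bot.Name, ex.Message);  // warn about faults
                }
            }
        }
    }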

Horse racing betting system


[Architecture diagram: crawlers for horse pedigrees, racing cards, statistics and results (plus an RSS reader) collect data from Betfair.com and other web sources into the database; the data analyser and results predictor feed N Betfair bots, which place bets through the Betfair API under the supervision of the Bots controller; a web monitor and a crawling control panel oversee the process.]

The system also contains a component that checks race results in real time and returns the stake results (won or lost). The work of the whole system is logged and can be monitored in real time with the help of the Web monitor, which, in addition to collecting statistical information about the performance of each bot, also allows calculating the profit of each of them (see the sketch below). HRBS includes a web service for integration with other applications such as Secret Horse, Horse Reminder, etc.
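For instance, the per-bot profit figure can be derived from that bot's settled bets. The record shape and the simplified commission handling below are assumptions made for this sketch.

    using System.Collections.Generic;
    using System.Linq;

    // One settled back bet (field names are assumptions).
    public class SettledBet
    {
        public decimal Stake { get; set; }
        public decimal Odds { get; set; }      // decimal odds matched on the exchange
        public bool Won { get; set; }
    }

    public static class BotStatistics
    {
        // Profit of one bot over its settled bets; 'commission' is an exchange commission
        // rate applied here to each winning bet's net return (a simplification).
        public static decimal Profit(IEnumerable<SettledBet> bets, decimal commission)
        {
            return bets.Sum(b => b.Won
                ? b.Stake * (b.Odds - 1m) * (1m - commission)
                : -b.Stake);
        }
    }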


Tools / Technologies: C++, C#.NET, ASP.NET, MS SQL Server 2008, DotNetNuke

Duration: 2 years

Team Size: PM, database developer, ASP.NET developer, QA engineer

Cooperation Model: Time and Materials

Customer’s Feedback


"We have been very pleased with SSA. They listen to our requirements and provide solutions that meet our needs. We have been very impressed

with their technical knowledge, attention to detail and ability to deliver on schedule."

Company name under NDA