Think php 10, parsing with PHP
Uploaded by pavel-polyakov
![Page 2: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/2.jpg)
What is parsing? (definition)
![Page 3: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/3.jpg)
Problem: I want ThinkPHP news on our site.
![Page 4: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/4.jpg)
Analysis
![Page 5: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/5.jpg)
Solution
![Page 6: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/6.jpg)
1. EXPLODE
![Page 7: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/7.jpg)
Get the content and do the initial split.
![Page 8: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/8.jpg)
Parse each headline.
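The two steps above (an initial split, then parsing each headline) can be sketched roughly as follows. The markup, the `<h2>`/`<a>` structure, and the helper names `between` and `parseHeadlines` are assumptions made for illustration; the code on the slides may differ. On a real page the string would come from something like `file_get_contents($url)`.

```php
<?php
// Hypothetical markup standing in for the downloaded news page.
$html = '<div class="news">'
      . '<h2><a href="/news/1">First headline</a></h2>'
      . '<h2><a href="/news/2">Second headline</a></h2>'
      . '</div>';

/**
 * Return the text between $start and $end, or null if either marker is missing.
 */
function between(string $text, string $start, string $end): ?string
{
    $parts = explode($start, $text, 2);
    if (count($parts) < 2) {
        return null;
    }
    $tail = explode($end, $parts[1], 2);
    return count($tail) < 2 ? null : $tail[0];
}

/**
 * Split the page on "<h2>" and pull a title and URL out of every chunk.
 */
function parseHeadlines(string $html): array
{
    // Initial split: every chunk after the first begins right after an "<h2>".
    $chunks = explode('<h2>', $html);
    array_shift($chunks); // drop everything before the first headline

    $headlines = [];
    foreach ($chunks as $chunk) {
        $url   = between($chunk, 'href="', '"');
        $title = between($chunk, '">', '</a>');
        if ($url !== null && $title !== null) {
            $headlines[] = ['title' => trim($title), 'url' => $url];
        }
    }
    return $headlines;
}

print_r(parseHeadlines($html));
```

This approach depends on the exact markup: any change in attribute order or quoting breaks the split markers.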
![Page 9: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/9.jpg)
2. EXPLODE +
![Page 10: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/10.jpg)
Get the content and find the headlines.
![Page 11: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/11.jpg)
Parse each headline.
![Page 12: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/12.jpg)
3. EXPLODE +
![Page 13: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/13.jpg)
Get the content.
![Page 14: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/14.jpg)
Produce valid XML with Tidy.
![Page 15: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/15.jpg)
Find the headlines with SimpleXML.
![Page 16: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/16.jpg)
Parse the headlines using XML.
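A minimal sketch of the Tidy and SimpleXML steps above: repair the markup into well-formed XML, then query it with XPath. The sample markup and the function names `htmlToXml` and `findHeadlines` are hypothetical. The `DOMDocument` fallback is an assumption added so the sketch still runs where the tidy extension is not installed; it is not part of the slides.

```php
<?php
// Hypothetical, deliberately broken markup: the second <a> is never closed.
$html = '<h2><a href="/news/1">First headline</a></h2>'
      . '<h2><a href="/news/2">Second headline</h2>';

/**
 * Turn possibly malformed HTML into well-formed XML.
 */
function htmlToXml(string $html): string
{
    if (function_exists('tidy_repair_string')) {
        // numeric-entities avoids named entities such as &nbsp; that XML parsers reject.
        $config = ['output-xml' => true, 'numeric-entities' => true, 'wrap' => 0];
        return (string) tidy_repair_string($html, $config, 'utf8');
    }
    // Fallback (an assumption, not from the slides): let DOMDocument repair the markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return (string) $doc->saveXML();
}

/**
 * Find every <a> directly inside an <h2> and return its text and href.
 */
function findHeadlines(string $xml): array
{
    $root = simplexml_load_string($xml);
    if ($root === false) {
        return [];
    }
    // local-name() keeps the query working whether or not the XML declares a namespace.
    $links = $root->xpath('//*[local-name()="h2"]/*[local-name()="a"]') ?: [];

    $headlines = [];
    foreach ($links as $a) {
        $headlines[] = ['title' => trim((string) $a), 'url' => (string) $a['href']];
    }
    return $headlines;
}

print_r(findHeadlines(htmlToXml($html)));
```

Compared with splitting on strings, the query no longer depends on attribute order or whitespace, only on the element structure.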
![Page 17: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/17.jpg)
4. EXPLODE +
![Page 18: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/18.jpg)
Install phpQuery with Composer: https://code.google.com/p/phpquery/
![Page 19: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/19.jpg)
Everything is wrapped in a class.
![Page 20: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/20.jpg)
Get the content and repair the HTML.
![Page 21: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/21.jpg)
Use the class to parse the headlines.
![Page 22: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/22.jpg)
5. EXPLODE + (koloobablo.com)
![Page 23: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/23.jpg)
Search in the EGR (Unified State Register of Legal Entities and Individual Entrepreneurs): http://search.irc.gov.ua/
![Page 24: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/24.jpg)
Search example
![Page 25: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/25.jpg)
Search example
![Page 26: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/26.jpg)
Search result
![Page 27: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/27.jpg)
Open the page, inject jQuery, and fill in the form.
![Page 28: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/28.jpg)
Wait until the CAPTCHA appears.
![Page 29: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/29.jpg)
Locate the CAPTCHA on the page.
![Page 30: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/30.jpg)
Crop the CAPTCHA, solve it, and submit the form.
![Page 31: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/31.jpg)
Wait for the result and process it.
![Page 32: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/32.jpg)
Parse the result and output JSON.
![Page 33: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/33.jpg)
Use it together with PHP.
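Since the browser script prints JSON, the PHP side can be as small as running a command and decoding its output. The sketch below assumes a `phantomjs` binary on the PATH and a script named `search.js` that takes the search query as its only argument; both names, and the helpers `runJsonCommand` and `buildSearchCommand`, are hypothetical and not taken from the slides.

```php
<?php
/**
 * Run an external command and decode the JSON it prints on stdout.
 */
function runJsonCommand(string $command): array
{
    $output = shell_exec($command);
    if (!is_string($output)) {
        throw new RuntimeException("Command failed or produced no output: $command");
    }
    $data = json_decode($output, true);
    if (!is_array($data)) {
        throw new RuntimeException('Command did not print a JSON object or array');
    }
    return $data;
}

/**
 * Build the command line for the scraping script.
 * The binary and script names are placeholders; every part is shell-escaped.
 */
function buildSearchCommand(
    string $query,
    string $binary = 'phantomjs',
    string $script = 'search.js'
): string {
    return escapeshellarg($binary) . ' '
         . escapeshellarg($script) . ' '
         . escapeshellarg($query);
}

// Usage, assuming phantomjs and search.js exist on this machine:
// $result = runJsonCommand(buildSearchCommand('12345678'));
// print_r($result);
```

Escaping each argument matters here because the query usually comes from user input.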
![Page 34: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/34.jpg)
Result
![Page 35: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/35.jpg)
Bonus track: http://casperjs.org/
PhantomJS as it should be.
![Page 36: Think php 10, parsing with PHP](https://reader031.fdocuments.net/reader031/viewer/2022013115/559434101a28ab961a8b4656/html5/thumbnails/36.jpg)
Thank you! All the code is here: https://github.com/PavelPolyakov/parsing-with-php
Web: http://pavelpolyakov.com E-mail: [email protected]
Skype: pavel.polyakov.x1