data-extracter-extesion/readme.md
2018-09-28 14:14:59 +08:00


# DataExtracter Help
----------------------------
DataExtracter helps you quickly extract data from any web page. All you need to do is:

- Find the selectors (jQuery selectors) for the target data
- Call Extractor methods in the `extension background page console`, as introduced below.

Where is the extension background page console?

Go to <chrome://extensions/> and click the `background page` link of the extension.

![](images/extnsion.png)

In the window that opens, find `Console` and type your scripts.

![](images/console.png)
## Quick Start
Extract current page
```js
new Extractor().task(".list-item", ["a.title", "p.content"]).start();
```
Extract multiple pages (1-10, interval 1)
```js
new Extractor().task(".list-item", ["a.title", "p.content"],"http://sample.com/?pn=${page}", 1, 10, 1).start();
```
Extract multiple urls (list)
```js
new Extractor().task(".list-item", ["a.title", "p.content"],["http://sample.com/abc","http://sample.com/xyz"]).start();
```
Extract specified pages (1,3,5)
```js
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
```
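Both the range form and the page-list form substitute a page number into the `${page}` placeholder of the URL template. The following is a minimal sketch of that expansion, not the extension's actual code: `expandPages` and `expandUrls` are hypothetical helper names, and the sketch assumes the range includes both ends, as the "1-10" example suggests.

```js
// Hypothetical helpers, not part of the extension. They only illustrate
// which URLs a template plus page numbers would presumably produce.

// Build the page numbers for a range, assuming both ends are included.
function expandPages(from, to, interval) {
  if (interval <= 0) throw new RangeError("interval must be positive");
  const pages = [];
  for (let page = from; page <= to; page += interval) {
    pages.push(page);
  }
  return pages;
}

// Replace every "${page}" placeholder in the template with each page number.
function expandUrls(urlTemplate, pages) {
  return pages.map((page) => urlTemplate.split("${page}").join(String(page)));
}

// The template is written in plain quotes, so "${page}" stays literal text.
const template = "http://sample.com/?pn=${page}";
console.log(expandUrls(template, expandPages(1, 10, 1))); // range 1 to 10
console.log(expandUrls(template, [1, 3, 5]));             // pages 1, 3 and 5
```

Keep the URL template in single or double quotes when typing it in the console. Inside backticks, JavaScript treats `${page}` as a template-literal expression and tries to evaluate `page` itself.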
## Extractor.task() Signatures:
```ts
// a task extracting data from current page
task(itemsSelector:string, fieldSelectors:string[])
// a task extracting data from a range of pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number)
// a task extracting data from specified pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[])
// a task extracting data from a list of pages
task(itemsSelector:string, fieldSelectors:string[], urls:string[])
// a task extracting data from urls taken from the previous task's result
task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult)
```
## Advanced Usage:
### Stop tasks
The only way to stop tasks before they finish is to close the tab that runs them.
### Extract attributes.
e.g.: link text and target (use 'selector@attribute')
```js
new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
```
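To make the `selector@attribute` notation concrete, here is a minimal sketch of how such a field selector could be split into its two parts. `parseFieldSelector` is a hypothetical name and the extension may parse the string differently; splitting at the last `@` is an assumption of this sketch.

```js
// Hypothetical parser, not the extension's code: split "selector@attribute"
// at the last "@" so the selector part is left untouched.
function parseFieldSelector(field) {
  const at = field.lastIndexOf("@");
  if (at === -1) {
    // No attribute given: the field refers to the element's text.
    return { selector: field, attribute: null };
  }
  return { selector: field.slice(0, at), attribute: field.slice(at + 1) };
}

console.log(parseFieldSelector("a.title"));      // selector only, no attribute
console.log(parseFieldSelector("a.title@href")); // selector "a.title", attribute "href"
```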
### Use task chain.
e.g.: Collect links from `http://sample.com/abc`, then extract data from each link
```js
new Extractor()
.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
.task('.list-item', ["a.title", "p.content"])
.start();
```
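In a chain, the links collected by the first task become the URL list of the second. The sketch below models that hand-off under stated assumptions: the result shape (one row per matched item, one string per field selector), the helper name `urlsFromResult`, and the sample links are all made up for illustration, and the real result type may differ.

```js
// Hypothetical model of a first-task result: one row per matched item,
// one string per field selector. The sample links are made-up data.
const firstTaskResult = [
  ["http://sample.com/item/1"],
  ["http://sample.com/item/2"],
  ["http://sample.com/item/1"], // the same link found twice
];

// Collect every non-empty string value as a URL for the next task.
// Dropping duplicates is a choice of this sketch; the extension's
// behavior here is not documented.
function urlsFromResult(rows) {
  const urls = rows
    .flat()
    .filter((value) => typeof value === "string" && value !== "");
  return [...new Set(urls)];
}

console.log(urlsFromResult(firstTaskResult)); // the two distinct links
```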
### Save result of any task
For an Extractor `e` with multiple chained tasks:
```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
.task('.list-item', ["a.title", "p.content"])
.start();
```
You will be asked to save the final result when the chain finishes.
You may also want to save the result of a task other than the final one:
```js
// save the result of first task
// that is, a list of urls
e.save(1)
```
In case you want to save it again, use:
```js
e.save()
```
### Restart tasks
If a later task fails, you don't need to restart all the tasks.
Here we have 2 tasks:
```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
.task('.list-item', ["a.title", "p.content"])
.start();
```
Suppose the second task fails. We can restart and continue from task 2:
```js
e.restart(2);
```
If you'd like to restart all tasks, use:
```js
e.start();
// or
e.restart();
```