update readme

2018-09-28 16:03:21 +08:00
parent a836786846
commit 3f49544b1c


@@ -1,25 +1,17 @@
# DataExtracter Help # DataExtracter Help
---------------------------- ----------------------------
DataExtracter helps you quickly extract data from any web page. All you need to do is: DataExtracter helps you quickly extract data from any web page.
All you need to do is:
- Find out the selectors (JQuery selectors) for target data - Find out the selectors (JQuery selectors) for target data
- Call Extractor methods in `extension background page console`, as introduced below. - Type scripts in the console of `extension background page`, as introduced below.
Where is the extension background page console?
Go to <chrome://extensions/> and click the `background page` link of the extension.
![](images/extnsion.png)
In the window that opens, find `Console` and type your scripts.
![](images/console.png) ![](images/console.png)
## Quick Start ## Quick Start
Extract current page Extract current page
```js ```js
new Extractor().task(".list-item", ["a.title", "p.content"]).start(); new Extractor().task(".list-item", ["a.title", "p.content"]).start();
@@ -43,7 +35,7 @@ Extract specified pages (1,3,5)
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start(); new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
``` ```
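
The paged examples above pass a URL containing `${page}` together with a list of page numbers. As a rough illustration of what that combination implies, the sketch below expands such a template into one URL per page. `expandPageUrls` is a hypothetical helper written for this note, not part of DataExtracter, and the extension's actual substitution logic may differ.

```js
// Hypothetical helper (an assumption, not a DataExtracter API):
// replaces the literal placeholder "${page}" with each page number.
// The template is a normal string, so "${page}" is not interpolated by JavaScript.
function expandPageUrls(template, pages) {
  return pages.map((page) => template.split("${page}").join(String(page)));
}

// Same template and page list as the "Extract specified pages (1,3,5)" example.
const urls = expandPageUrls("http://sample.com/?pn=${page}", [1, 3, 5]);
console.log(urls);
```

Under this assumption, the task would visit `?pn=1`, `?pn=3` and `?pn=5` in turn.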
## Extractor.task() Signatures: ## Extractor.task() Signatures
```ts ```ts
// a task extracting data from current page // a task extracting data from current page
@@ -58,13 +50,17 @@ task(itemsSelector:string, fieldSelectors:string[], urls:string[])
task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult) task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult)
``` ```
## Advanced Usage: ## Advanced Usage
### Stop tasks ### Stop Tasks
The only way to stop tasks before they finish is `Closing the Tab` that runs them. Tasks wait for their target elements to appear, since some elements are loaded asynchronously.
### Extract attributes. But if you type a wrong selector, the task waits forever for elements that don't exist.
The only way to stop tasks before they finish is `Closing the host tab`.
### Extract Attributes.
e.g.: link text and target (use 'selector@attribute') e.g.: link text and target (use 'selector@attribute')
@@ -72,9 +68,9 @@ e.g.: link text and target (use 'selector@attribute')
new Extractor().task('.list-item', ['a.title', 'a.title@href']).start(); new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
``` ```
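
To make the `selector@attribute` convention concrete, here is a minimal sketch of how such a field string could be split into a selector and an attribute name. `parseFieldSelector` is a hypothetical function written for illustration. The README does not show how DataExtracter parses these strings, so treat the splitting rule (last `@` wins) as an assumption.

```js
// Hypothetical parser (an assumption, not DataExtracter's real code):
// "a.title@href" -> selector "a.title", attribute "href"
// "a.title"      -> selector "a.title", attribute null (meaning: use the element text)
function parseFieldSelector(field) {
  const at = field.lastIndexOf("@");
  if (at === -1) {
    return { selector: field, attribute: null };
  }
  return { selector: field.slice(0, at), attribute: field.slice(at + 1) };
}

const linkText = parseFieldSelector("a.title");
const linkTarget = parseFieldSelector("a.title@href");
console.log(linkText, linkTarget);
```

Splitting on the last `@` keeps any earlier `@` inside the selector part. Whether the extension makes the same choice is not stated in the README.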
### Use task chain. ### Use Task Chain.
e.g.: Collect links from `http://sample.com/abc` & extract data from each link e.g.: Collect links from `http://sample.com/abc`, then extract data from each link
```js ```js
new Extractor() new Extractor()
@@ -83,7 +79,7 @@ new Extractor()
.start(); .start();
``` ```
### Save result of any task ### Save Result of Any Task
For a multi-task (chain) Extractor `e`: For a multi-task (chain) Extractor `e`:
@@ -96,21 +92,21 @@ e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
The user is asked to save the final result when the chain finishes. The user is asked to save the final result when the chain finishes.
You may want to save the result of a task other than the final one:
```js
// save the result of first task
// that is, a list of urls
e.save(1)
```
In case you want to save it again, use: In case you want to save it again, use:
```js ```js
e.save() e.save()
``` ```
### Restart tasks You may want to save the result of a task other than the final one:
```js
// save the result of first task
// in the example above, that is a list of urls
e.save(1)
```
### Restart Tasks
In case a later task fails, you don't need to restart all tasks. In case a later task fails, you don't need to restart all tasks.