update documents
BIN
images/console.png
Normal file
Binary file not shown.
Size after change: 36 KiB
BIN
images/extnsion.png
Normal file
Binary file not shown.
Size after change: 24 KiB
@@ -21,12 +21,11 @@
<div class="alert alert-info small">
<!-- <h6>Usage:</h6> -->
<p>
<b>Open console</b> and
<b>switch to Data Extracter</b>, then call the
<b>extract</b> function.
<b>Open the console of the extension background page</b> and
type your scripts.
</p>
<p>
<img src="demo.png" alt="" style="max-width: 489px; width: 100%; border-radius: 5px">
<img src="../images/console.png" alt="" style="max-width: 489px; width: 100%; border-radius: 5px">
</p>

</div>
@@ -41,19 +40,21 @@
<div class="row">
<div class="col">
<div class="alert alert-success small">
<p>
<b>View Help</b>:
<br>extract()
</p>
<p>
<b>Extract current page</b>:
<br>extract("list-item", ["a.title", "p.content"])
<br>new Extractor().task(".list-item", ["a.title", "p.content"]).start();
</p>
<p>
<b>Extract multiple pages (1-10, interval 1)</b>:
<br>extract("list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", 1, 10, 1)
<br>new Extractor().task(".list-item", ["a.title", "p.content"],
"http://sample.com/?pn=${page}", 1, 10, 1).start();

</p>
<p>
<b>Full document (Right click - Open in new tab):</b>
<br>
<a href="https://git.jebbs.co/jebbs/data-extracter-extesion">https://git.jebbs.co/jebbs/data-extracter-extesion</a>
</p>
</div>
</div>
</div>

138
readme.md
Normal file
@@ -0,0 +1,138 @@
# DataExtracter Help
----------------------------

DataExtracter helps you quickly extract data from any web page. All you need to do is:

- Find the selectors (jQuery selectors) for the target data
- Call Extractor methods in the `extension background page console`, as described below.

Where is the extension background page console?

Go to <chrome://extensions/> and click the `background page` link of the extension.



In the window that opens, find `Console` and type your scripts.



## Quick Start

Extract current page

```js
new Extractor().task(".list-item", ["a.title", "p.content"]).start();
```

Extract multiple pages (1-10, interval 1)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", 1, 10, 1).start();
```

Extract multiple URLs (list)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], ["http://sample.com/abc", "http://sample.com/xyz"]).start();
```

Extract specified pages (1,3,5)

```js
new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
```

## Extractor.task() Signatures:

```ts
// a task extracting data from the current page
task(itemsSelector:string, fieldSelectors:string[])
// a task extracting data from a range of pages
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number)
// a task extracting data from specified pages of a URL template
task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[])
// a task extracting data from a list of URLs
task(itemsSelector:string, fieldSelectors:string[], urls:string[])
// a task extracting data from the URLs extracted by the previous task
task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult)
```
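
The signatures differ only in how the target pages are given. As a rough illustration, the sketch below shows one way such arguments could be turned into a plain list of URLs. `resolveUrls` is a hypothetical helper, not part of DataExtracter. It assumes that `to` is inclusive and that `interval` is the step between page numbers; this document confirms neither.

```js
// Hypothetical helper, NOT part of DataExtracter: turns the optional
// arguments of task() into a plain list of URLs.
// Assumptions: `to` is inclusive, `interval` is the page-number step.
function resolveUrls(...args) {
  // task(items, fields): no extra arguments, so use the current page.
  if (args.length === 0) return [];

  const [first, ...rest] = args;

  // task(items, fields, urls): an explicit list of URLs.
  if (Array.isArray(first)) return [...first];

  if (typeof first !== "string") {
    throw new TypeError("expected a URL template or a list of URLs");
  }
  // Replace every literal "${page}" placeholder with the page number.
  const fill = (page) => first.split("${page}").join(String(page));

  // task(items, fields, urlTemplate, pages): specific page numbers.
  if (Array.isArray(rest[0])) return rest[0].map(fill);

  // task(items, fields, urlTemplate, from, to, interval): a range.
  const [from, to, interval = 1] = rest;
  const allNumbers = [from, to, interval].every((n) => Number.isFinite(n));
  if (!allNumbers || interval <= 0) {
    throw new RangeError("from, to and interval must be numbers, interval > 0");
  }
  const urls = [];
  for (let page = from; page <= to; page += interval) urls.push(fill(page));
  return urls;
}
```

Under these assumptions, `resolveUrls("http://sample.com/?pn=${page}", [1, 3, 5])` gives the URLs for pages 1, 3 and 5, and a call with no arguments gives an empty list, which stands in for "use the current page".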

## Advanced Usage:

### Stop tasks

The only way to stop tasks before they finish is to close the tab that runs them.

### Extract attributes.

e.g.: link text and target (use 'selector@attribute')

```js
new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
```
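
To make the `selector@attribute` notation concrete, here is a minimal sketch of how such a field could be split into its two parts. `parseFieldSelector` is a hypothetical name used for illustration only; the extension's real parsing may differ, for example for selectors that themselves contain `@`.

```js
// Hypothetical helper, NOT part of DataExtracter: splits a field written
// as "selector@attribute" into its selector and attribute parts.
function parseFieldSelector(field) {
  const at = field.lastIndexOf("@");
  // Without "@", the field refers to the element's text content.
  if (at === -1) return { selector: field, attribute: null };
  return {
    selector: field.slice(0, at),
    attribute: field.slice(at + 1),
  };
}
```

In this sketch, `'a.title@href'` splits into the selector `a.title` and the attribute `href`, while `'a.title'` keeps a `null` attribute to signal that the text should be used.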

### Use task chain.

e.g.: Collect links from `http://sample.com/abc` and extract data from each link

```js
new Extractor()
    .task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```
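
In a chain, the second task takes its URLs from the result of the first. The sketch below shows one way that hand-off could work. It assumes a result is an array of rows with one string per field selector; this document does not specify that shape. `urlsFromResult` is a hypothetical helper, and dropping duplicates is a choice made here for illustration.

```js
// Hypothetical helper, NOT part of DataExtracter.
// Assumed result shape (not confirmed by the source): one row per matched
// item, and one string per field selector inside each row.
function urlsFromResult(rows) {
  const seen = new Set(); // a Set keeps insertion order and drops duplicates
  for (const row of rows) {
    for (const value of row) {
      // Skip empty cells, e.g. links that had no href attribute.
      if (typeof value === "string" && value !== "") seen.add(value);
    }
  }
  return [...seen];
}
```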

### Save result of any task

For an Extractor `e` with multiple tasks (a chain):

```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```

You will be asked to save the final result when it finishes.

You may want to save the result of another task instead of the final one:

```js
// save the result of the first task,
// that is, a list of urls
e.save(1)
```

In case you want to save it again, use:

```js
e.save()
```

### Restart tasks

If a later task fails, you don't need to restart all tasks.

Here we have 2 tasks:

```js
e = new Extractor()
e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
    .task('.list-item', ["a.title", "p.content"])
    .start();
```

Suppose the second task fails; we can restart and continue from task 2:

```js
e.restart(2);
```

If you'd like to restart all tasks, use:

```js
e.start();
// or
e.restart();
```
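
Restarting from a later task works because the results of earlier tasks are kept. The following is a simplified, synchronous model of that idea, not the extension's actual implementation. `TaskChain` and its methods are hypothetical names; only the 1-based task numbering is taken from the `e.save(1)` and `e.restart(2)` examples in this document.

```js
// Hypothetical model of a task chain, NOT DataExtracter's implementation.
// Each step is a function that receives the previous step's result.
class TaskChain {
  constructor(steps) {
    this.steps = steps;
    this.results = []; // results[i] holds the output of task i + 1
  }

  // Run tasks starting at the 1-based index `from`, reusing earlier results.
  restart(from = 1) {
    if (!Number.isInteger(from) || from < 1 || from > this.steps.length) {
      throw new RangeError("task index must be between 1 and " + this.steps.length);
    }
    if (this.results.length < from - 1) {
      throw new Error("task " + (from - 1) + " has no saved result to continue from");
    }
    this.results.length = from - 1; // drop stale results of later tasks
    for (let i = from - 1; i < this.steps.length; i++) {
      const input = i === 0 ? undefined : this.results[i - 1];
      this.results.push(this.steps[i](input));
    }
    return this.results[this.results.length - 1];
  }

  // Return the result of task `index` (1-based); the final one by default.
  result(index = this.steps.length) {
    return this.results[index - 1];
  }
}
```

In this model, `restart(2)` reruns only the second step using the stored output of the first, and `result(1)` returns the first task's output, mirroring the idea behind `e.save(1)`.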