diff --git a/images/console.png b/images/console.png
new file mode 100644
index 0000000..a00eed5
Binary files /dev/null and b/images/console.png differ
diff --git a/images/extnsion.png b/images/extnsion.png
new file mode 100644
index 0000000..1c55dac
Binary files /dev/null and b/images/extnsion.png differ
diff --git a/popup/tip.html b/popup/tip.html
index 941f391..8db15e7 100644
--- a/popup/tip.html
+++ b/popup/tip.html
@@ -21,12 +21,11 @@

- Open console and
- switch to Data Extracter, then call the
- extract function.
+ Open the console of the extension background page and
+ type your scripts.

- +

@@ -41,19 +40,21 @@
-

- View Help: -
extract() -

Extract current page: -
extract("list-item", ["a.title", "p.content"]) +
new Extractor().task(".list-item", ["a.title", "p.content"]).start();

Extract multiple pages (1-10, interval 1): -
extract("list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", 1, 10, 1) +
new Extractor().task(".list-item", ["a.title", "p.content"], + "http://sample.com/?pn=${page}", 1, 10, 1).start();

+

+ Full documentation (right-click, then open in new tab): +
+ https://git.jebbs.co/jebbs/data-extracter-extesion +

diff --git a/readme.md b/readme.md
new file mode 100644
index 0000000..19a4f50
--- /dev/null
+++ b/readme.md
@@ -0,0 +1,138 @@
+# DataExtracter Help
+----------------------------
+
+DataExtracter helps you quickly extract data from any web page. All you need to do is:
+
+- Find the selectors (jQuery selectors) for the target data
+- Call Extractor methods in the `extension background page console`, as introduced below.
+
+Where is the extension background page console?
+
+Go to your browser's extensions page and click the `background page` link of the extension
+
+ ![](images/extnsion.png)
+
+In the window that opens, find `Console` and type your scripts.
+
+ ![](images/console.png)
+
+## Quick Start
+
+
+
+Extract current page
+```js
+new Extractor().task(".list-item", ["a.title", "p.content"]).start();
+```
+
+Extract multiple pages (1-10, interval 1)
+
+```js
+new Extractor().task(".list-item", ["a.title", "p.content"],"http://sample.com/?pn=${page}", 1, 10, 1).start();
+```
+
+Extract multiple urls (list)
+
+```js
+new Extractor().task(".list-item", ["a.title", "p.content"],["http://sample.com/abc","http://sample.com/xyz"]).start();
+```
+
+Extract specified pages (1,3,5)
+
+```js
+new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start();
+```
+
+## Extractor.task() Signatures:
+
+```ts
+// a task extracting data from current page
+task(itemsSelector:string, fieldSelectors:string[])
+// a task extracting data from a range of pages
+task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number)
+// a task extracting data from specified pages
+task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[])
+// a task extracting data from a list of urls
+task(itemsSelector:string, fieldSelectors:string[], urls:string[])
+// a task extracting data from the urls extracted by the previous task
+task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult)
+```
+
+## Advanced Usage:
+
+### Stop tasks
+
+The only way to stop tasks before they finish is to close the tab that runs them.
+
+### Extract attributes.
+
+e.g.: link text and target (use 'selector@attribute')
+
+```js
+new Extractor().task('.list-item', ['a.title', 'a.title@href']).start();
+```
+
+### Use task chain.
+
+e.g.: collect links from `http://sample.com/abc`, then extract data from each link
+
+```js
+new Extractor()
+    .task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
+    .task('.list-item', ["a.title", "p.content"])
+    .start();
+```
+
+### Save result of any task
+
+For an Extractor `e` with multiple tasks (a chain):
+
+```js
+e = new Extractor()
+e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
+    .task('.list-item', ["a.title", "p.content"])
+    .start();
+```
+
+You will be asked to save the final result when the chain finishes.
+
+You may also want to save the result of a task other than the final one:
+
+```js
+// save the result of first task
+// that is, a list of urls
+e.save(1)
+```
+
+In case you want to save the final result again, use:
+
+```js
+e.save()
+```
+
+### Restart tasks
+
+If a later task fails, you don't need to restart all tasks.
+
+Here we have 2 tasks:
+
+```js
+e = new Extractor()
+e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"])
+    .task('.list-item', ["a.title", "p.content"])
+    .start();
+```
+
+Suppose the second task fails; we can restart and continue from task 2:
+
+```js
+e.restart(2);
+```
+
+If you'd like to restart all tasks, use:
+
+```js
+e.start();
+// or
+e.restart();
+```
\ No newline at end of file
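
The readme describes two conventions without showing how they resolve: a `urlTemplate` containing `${page}` combined with either `from, to, interval` or a `pages` list, and the `selector@attribute` form for field selectors. The sketch below illustrates one plausible reading of both. The helper names (`expandPageRange`, `expandPageList`, `parseFieldSelector`) are hypothetical and do not come from the extension's source, and the inclusive upper bound is an assumption based on the "1-10" wording.

```js
// Hypothetical helpers for illustration only; not the extension's actual code.

// Expand a URL template over a page range, e.g.
// ("http://sample.com/?pn=${page}", 1, 10, 1).
// Assumption: `to` is inclusive, as "1-10" in the readme suggests.
function expandPageRange(urlTemplate, from, to, interval) {
  if (interval <= 0) throw new RangeError("interval must be positive");
  const urls = [];
  for (let page = from; page <= to; page += interval) {
    // split/join replaces every literal "${page}" placeholder
    urls.push(urlTemplate.split("${page}").join(String(page)));
  }
  return urls;
}

// Expand a URL template over an explicit list of pages, e.g. [1, 3, 5].
function expandPageList(urlTemplate, pages) {
  return pages.map((page) => urlTemplate.split("${page}").join(String(page)));
}

// Split a field selector such as "a.title@href" into its selector and
// attribute parts. Without "@", the attribute is null (read the text instead).
function parseFieldSelector(field) {
  const at = field.lastIndexOf("@");
  if (at === -1) return { selector: field, attribute: null };
  return { selector: field.slice(0, at), attribute: field.slice(at + 1) };
}
```

Under these assumptions, `expandPageList("http://sample.com/?pn=${page}", [1, 3, 5])` yields the three URLs ending in `pn=1`, `pn=3`, and `pn=5`. Splitting on the last `@` is a simplification: a selector that itself contains `@` inside an attribute filter would need a stricter parser.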