update documents
readme.md: 138 lines added, normal file
| @@ -0,0 +1,138 @@ | ||||
| # DataExtracter Help | ||||
| ---------------------------- | ||||
|  | ||||
| DataExtracter helps you quickly extract data from any web page. All you need to do is: | ||||
|  | ||||
| - Find the selectors (jQuery selectors) for the target data. | ||||
| - Call `Extractor` methods in the extension's background page console, as described below. | ||||
|  | ||||
| Where is the extension's background page console? | ||||
|  | ||||
| Go to <chrome://extensions/> and click the extension's `background page` link. | ||||
|  | ||||
|   | ||||
|  | ||||
| In the window that opens, select the `Console` tab and type your scripts. | ||||
|  | ||||
|   | ||||
|  | ||||
| ## Quick Start | ||||
|  | ||||
|  | ||||
|  | ||||
| Extract current page | ||||
| ```js | ||||
| new Extractor().task(".list-item", ["a.title", "p.content"]).start(); | ||||
| ``` | ||||
|  | ||||
| Extract multiple pages (1-10, interval 1) | ||||
|  | ||||
| ```js | ||||
| new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", 1, 10, 1).start(); | ||||
| ``` | ||||
|  | ||||
| Extract multiple URLs (list) | ||||
|  | ||||
| ```js | ||||
| new Extractor().task(".list-item", ["a.title", "p.content"], ["http://sample.com/abc", "http://sample.com/xyz"]).start(); | ||||
| ``` | ||||
|  | ||||
| Extract specified pages (1,3,5) | ||||
|  | ||||
| ```js | ||||
| new Extractor().task(".list-item", ["a.title", "p.content"], "http://sample.com/?pn=${page}", [1, 3, 5]).start(); | ||||
| ``` | ||||
|  | ||||
| ## Extractor.task() Signatures: | ||||
|  | ||||
| ```ts | ||||
| // a task extracting data from current page | ||||
| task(itemsSelector:string, fieldSelectors:string[]) | ||||
| // a task extracting data from a range of pages | ||||
| task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, from:number, to:number, interval:number) | ||||
| // a task extracting data from specified pages | ||||
| task(itemsSelector:string, fieldSelectors:string[], urlTemplate:string, pages:number[]) | ||||
| // a task extracting data from a list of urls | ||||
| task(itemsSelector:string, fieldSelectors:string[], urls:string[]) | ||||
| // a task extracting data from urls extracted by the previous task | ||||
| task(itemsSelector:string, fieldSelectors:string[], urls:ExractResult) | ||||
| ``` | ||||
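
The range and page-list signatures above both rely on a `${page}` placeholder inside `urlTemplate`. The sketch below shows one plausible way such a template could be expanded. It is not DataExtracter's own code: `expandTemplate` and `pageRange` are hypothetical helpers, and the sketch assumes that `to` is inclusive and that `interval` is positive.

```js
// Hypothetical helpers, not part of DataExtracter. They only illustrate
// how a urlTemplate could turn into a list of concrete URLs.

// Replace every literal "${page}" in the template with each page number.
function expandTemplate(urlTemplate, pages) {
  return pages.map((page) => urlTemplate.split("${page}").join(String(page)));
}

// Build [from, from + interval, ...] up to and including `to` (assumed inclusive).
function pageRange(from, to, interval) {
  if (!(interval > 0)) throw new RangeError("interval must be positive");
  const pages = [];
  for (let page = from; page <= to; page += interval) pages.push(page);
  return pages;
}

// The template is an ordinary quoted string, so "${page}" stays literal here.
const template = "http://sample.com/?pn=${page}";
const rangeUrls = expandTemplate(template, pageRange(1, 10, 1));
const listUrls = expandTemplate(template, [1, 3, 5]);
console.log(rangeUrls.length, listUrls);
```

Note that the template should be written with single or double quotes. With backticks, JavaScript would try to interpolate a `page` variable immediately in the console, before the string ever reaches `task()`.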
|  | ||||
| ## Advanced Usage: | ||||
|  | ||||
| ### Stop tasks | ||||
|  | ||||
| The only way to stop tasks before they finish is to close the tab that is running them. | ||||
|  | ||||
| ### Extract attributes | ||||
|  | ||||
| For example, to extract a link's text and target, use `selector@attribute`: | ||||
|  | ||||
| ```js | ||||
| new Extractor().task('.list-item', ['a.title', 'a.title@href']).start(); | ||||
| ``` | ||||
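
To make the `selector@attribute` notation concrete, here is a minimal sketch of how a field string could be split into its two parts. `parseFieldSelector` is a hypothetical function, not DataExtracter's implementation. It assumes the last `@` separates the selector from the attribute name, and that a field without `@` means the element's text.

```js
// Hypothetical parser, not DataExtracter's own code.
// Splits "selector@attribute" at the last "@" (an assumption).
function parseFieldSelector(field) {
  const at = field.lastIndexOf("@");
  if (at === -1) {
    // No "@": assume the field refers to the element's text.
    return { selector: field, attribute: null };
  }
  return { selector: field.slice(0, at), attribute: field.slice(at + 1) };
}

const textField = parseFieldSelector("a.title");
const hrefField = parseFieldSelector("a.title@href");
console.log(textField, hrefField);
```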
|  | ||||
| ### Use a task chain | ||||
|  | ||||
| For example, collect links from `http://sample.com/abc`, then extract data from each link: | ||||
|  | ||||
| ```js | ||||
| new Extractor() | ||||
|     .task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"]) | ||||
|     .task('.list-item', ["a.title", "p.content"]) | ||||
|     .start(); | ||||
| ``` | ||||
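
In a chain, the second task has no URLs of its own and works on what the first task extracted. The sketch below is a hypothetical model of that hand-off, not DataExtracter's code. It assumes a task result is an array of rows, each row an array of field values, and that every non-empty string value is treated as a URL for the next task. `urlsFromResult` and the sample data are made up for illustration.

```js
// Hypothetical model of passing one task's result to the next task.
// Assumes: result = array of rows, row = array of extracted field values.
function urlsFromResult(rows) {
  return rows
    .flat()
    .filter((value) => typeof value === "string" && value.length > 0);
}

// Made-up result of a first task that extracted '.item a@href'.
const firstTaskResult = [
  ["http://sample.com/a"],
  ["http://sample.com/b"],
  [""], // an item without a link is skipped
];
const nextUrls = urlsFromResult(firstTaskResult);
console.log(nextUrls);
```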
|  | ||||
| ### Save result of any task | ||||
|  | ||||
| Given an Extractor `e` with multiple chained tasks: | ||||
|  | ||||
| ```js | ||||
| e = new Extractor() | ||||
| e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"]) | ||||
|     .task('.list-item', ["a.title", "p.content"]) | ||||
|     .start(); | ||||
| ``` | ||||
|  | ||||
| You will be asked to save the final result when the chain finishes. | ||||
|  | ||||
| You may also want to save the result of a task other than the final one: | ||||
|  | ||||
| ```js | ||||
| // save the result of first task | ||||
| // that is, a list of urls | ||||
| e.save(1) | ||||
| ``` | ||||
|  | ||||
| In case you want to save the final result again, use: | ||||
|  | ||||
| ```js | ||||
| e.save() | ||||
| ``` | ||||
|  | ||||
| ### Restart tasks | ||||
|  | ||||
| If a later task fails, you don't need to restart all tasks. | ||||
|  | ||||
| Here we have 2 tasks: | ||||
|  | ||||
| ```js | ||||
| e = new Extractor() | ||||
| e.task('.search-list-item', ['.item a@href'], ["http://sample.com/abc"]) | ||||
|     .task('.list-item', ["a.title", "p.content"]) | ||||
|     .start(); | ||||
| ``` | ||||
|  | ||||
| Suppose the second task fails. We can restart and continue from task 2: | ||||
|  | ||||
| ```js | ||||
| e.restart(2); | ||||
| ``` | ||||
|  | ||||
| If you'd like to restart all tasks, use: | ||||
|  | ||||
| ```js | ||||
| e.start(); | ||||
| // or | ||||
| e.restart(); | ||||
| ``` | ||||