For most users, the Process Webpage button is the primary way in which new files are added to a fusker collection. The process of adding a URL to a fusker collection consists of the following steps:
How Image Surfer Pro processes data from a webpage depends upon what is currently displayed in the IE window. Image Surfer Pro treats the content displayed in the IE window as one of three types of webpage:
NOTE: The button can also be used to initiate a Directed Search when processing an ISP Form. However, in this section of the user's manual we will cover only how the data is collected from the currently displayed webpage and leave the directed search to the ISP Form section of the user's manual.
The process of extraction involves not only determining what data from the webpage might be relevant but also assigning an Image Surfer Pro data type to the URLs. In most cases the assignment is relatively straightforward (any MP4 file is a video, etc.), but depending upon the processing being done the assignment may become more complex.
Direct Image Display
Internet Explorer will directly display image files as easily as it displays webpages, so it can sometimes be difficult to tell whether you are seeing a webpage or an image. Extraction in this case is straightforward: the URL of the displayed image is treated as an ISP Image.
ISP Forms
The button checks the header information of the webpage to determine whether it is a form generated by Image Surfer Pro. Each selection box on an ISP Form is processed from the top of the page to the bottom. Each Add box that has been checked adds a URL to a list for assimilation.
In most cases the wording of the selection box makes it clear which data type the URL will be assigned. However, the data type assigned to all generic (i.e. non-image, non-video, non-frame) links is determined by the drop-down list at the top of the table where the link is displayed.
The setting of the drop-down list at the top of the table is used for all of the generic Add Page To Collection selection boxes within the table, but it is not used if the link is known to be a video or image. If you wish to add links as different types from within the same table, select Unknown. You will then be prompted individually during assimilation for each of the selected links from the table.
Generic Webpages
If the page displayed is neither an image nor an ISP Form, data extraction becomes much more complex. Image Surfer Pro breaks apart the HTML code of the webpage and generates lists of possibly relevant data. The following lists of information are created from the HTML code.
Video Files | File references with the following extensions are assigned the Video data type: .mp4 .flv .wmv .mov .mpg .mpeg .avi .asf. In addition, HTML5 <video> tags are specifically processed for any source identified as MP4. Any such video source is held as a video even if the URL file extension is not recognized. |
Frame Data | File references with the following extension are assigned the Frame data type: .swf. In addition, the data source of <iframe>, <object>, and <embed> tags will be identified as a Frame source if the URL file extension is not recognized as either a video or a Shockwave Flash frame. |
Image Files | File references with the following extensions are assigned the Image data type: .jpg .jpe .jpeg .gif .bmp .tif .tiff .pic .pct .pict .pcx .pxr .png. In addition, any URL found in the src attribute of an <img> tag will be held as an image even if the URL extension is not recognized. References to image files are kept in two assimilation lists: directly referenced images and embedded (visible) images. |
Page Reference | The URL of the processed webpage is kept as a possible Page object for assimilation. |
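The extension rules in the lists above can be sketched in a few lines of Python. This is only an illustration, not Image Surfer Pro's actual code: the function name `classify_url`, the `source_tag` parameter, and the rule that a recognized extension takes precedence over the tag the URL was found in are all assumptions made for the example.

```python
import posixpath
from urllib.parse import urlparse

# Extension lists copied from the manual text above.
VIDEO_EXTENSIONS = {".mp4", ".flv", ".wmv", ".mov", ".mpg", ".mpeg", ".avi", ".asf"}
FRAME_EXTENSIONS = {".swf"}
IMAGE_EXTENSIONS = {".jpg", ".jpe", ".jpeg", ".gif", ".bmp", ".tif", ".tiff",
                    ".pic", ".pct", ".pict", ".pcx", ".pxr", ".png"}


def classify_url(url, source_tag=None):
    """Return 'Video', 'Frame', 'Image', or None for a URL.

    source_tag (hypothetical) names the HTML tag the URL was found in:
    'video' (an MP4-identified <video> source), 'img', 'iframe',
    'object', or 'embed'.
    """
    # Look only at the path so query strings do not hide the extension.
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in VIDEO_EXTENSIONS:
        return "Video"
    if ext in FRAME_EXTENSIONS:
        return "Frame"
    if ext in IMAGE_EXTENSIONS:
        return "Image"
    # Assumed fallback: the tag decides when the extension is not recognized.
    if source_tag == "video":
        return "Video"
    if source_tag == "img":
        return "Image"
    if source_tag in {"iframe", "object", "embed"}:
        return "Frame"
    return None
```

For example, `classify_url("http://example.com/pic?id=3", "img")` yields an Image because the URL came from an `<img>` tag even though it has no recognized extension.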
Not all of the extracted content is always assimilated into the fusker collection, and how the data is assimilated again depends on the source of the data. The assimilation process can be summarized by the following steps:
All data received from processing a directly displayed image or an ISP Form will be assimilated because you expressly selected the data for assimilation. Selected images are not subjected to the {Min embedded image file KBytes for auto collection} configuration on the Images Tab.
When a general webpage is processed your User Preferences are used to determine what data will automatically be assimilated.
Videos | Automatically assimilated based on the {Automatically collect video information found} setting on the Videos Tab. |
Frames | Automatically assimilated based on the {Automatically collect frame information found} setting on the Frames Tab. |
Images: Directly Referenced Images | Images not visible on the webpage but directly referenced by hyper-links on the webpage are automatically assimilated based on the {Automatically collect direct image links} setting on the Images Tab. |
Images: Embedded (visible) Images | Images visible on the webpage will be assimilated if the {Automatically collect embedded image links} setting on the Images Tab is set to Always and they are larger than the {Min embedded image file KBytes for auto collection} configuration on the Images Tab. |
Pages | The webpage URL is never automatically assimilated. |
You may select as many of the different sets as you like when presented with the choice. Options where no relevant data was extracted will be disabled.
If the {Automatically collect embedded image links} setting on the User Preferences Images Tab was not set to Always, it is possible that Embedded Image information was extracted but not automatically added. Even after embedded images are selected as a group for assimilation, each of the images is validated against the {Min embedded image file KBytes for auto collection} configuration on the Images Tab. The original webpage option will always be available even if no other option is.
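The automatic assimilation rules for a general webpage can be summarized as a small decision function. The sketch below is illustrative only: the `Preferences` class, its field names, the default values, and the assumption that the video, frame, and direct image settings are simple on/off switches are all hypothetical stand-ins for the User Preferences described above.

```python
from dataclasses import dataclass


@dataclass
class Preferences:
    """Hypothetical stand-in for the relevant User Preferences."""
    collect_videos: bool = True            # {Automatically collect video information found}
    collect_frames: bool = True            # {Automatically collect frame information found}
    collect_direct_images: bool = True     # {Automatically collect direct image links}
    collect_embedded_images: str = "Always"  # {Automatically collect embedded image links}
    min_embedded_image_kbytes: int = 20    # {Min embedded image file KBytes for auto collection}


def auto_assimilate(kind, prefs, size_kbytes=None):
    """Return True if an extracted item would be assimilated automatically.

    kind is one of 'video', 'frame', 'direct_image', 'embedded_image', 'page'.
    """
    if kind == "video":
        return prefs.collect_videos
    if kind == "frame":
        return prefs.collect_frames
    if kind == "direct_image":
        return prefs.collect_direct_images
    if kind == "embedded_image":
        # Embedded images need the Always setting and must exceed the size limit.
        return (prefs.collect_embedded_images == "Always"
                and size_kbytes is not None
                and size_kbytes > prefs.min_embedded_image_kbytes)
    if kind == "page":
        # The webpage URL is never added automatically.
        return False
    raise ValueError(f"unknown kind: {kind}")
```

With the assumed defaults, `auto_assimilate("embedded_image", Preferences(), size_kbytes=50)` is true, while the same image at 10 KBytes is skipped.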
Each extracted URL that is chosen for assimilation is made into a single branch ISP Tree consisting of a blank collection segment, a domain access segment, potentially several directory segments, and a file segment of the type determined during extraction.
For example consider the file reference:
http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/sarah_michelle_gellar_1.jpg
This image reference converted to an ISP Tree would have 7 segments under a collection segment.
The domain access segment (http:) represents the protocol used to access the reference. This segment is determined when the URL information is extracted by either the Process Webpage button or the URL Capture Bar.
Below the domain access segment are 5 directory segments. The topmost of these segments is called the Domain or Root Directory segment. These segments indicate where on the web this file is stored. In this case the file is stored at Rex Wallpapers under the directory images/wallpapers/celebs/sarah.
The final segment is always a file segment (sarah_michelle_gellar_1.jpg). Extraction will have defined the URL as an image because the file extension (.jpg) was recognized as the JPEG image format.
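The breakdown of the example URL into segments can be reproduced with a short Python sketch. The function name `url_to_segments` and the list representation are assumptions for illustration; Image Surfer Pro's internal tree structure is not described here.

```python
from urllib.parse import urlparse


def url_to_segments(url):
    """Split a URL into the segments of a single branch tree.

    Returns [domain access, domain, directory..., file]; the blank
    collection segment that sits above them is not included.
    """
    parts = urlparse(url)
    access = parts.scheme + ":"                      # e.g. "http:"
    path_parts = [p for p in parts.path.split("/") if p]
    if not path_parts:
        raise ValueError("URL has no file segment")
    directories = [parts.netloc] + path_parts[:-1]   # domain first, then folders
    return [access] + directories + [path_parts[-1]]


segments = url_to_segments(
    "http://www.rexwallpapers.com/images/wallpapers/celebs/sarah/"
    "sarah_michelle_gellar_1.jpg"
)
```

Here `segments` holds 7 entries: the domain access segment, 5 directory segments beginning with www.rexwallpapers.com, and the file segment.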
The extraction process identifies media files as simply Image, Video, or Frame, but when a tree is formed from the URL, the exact form of the final file segment will be more precise:
Extraction | File Segment Type | Description |
---|---|---|
Image | Image | Any recognized image extension, as well as URLs found as an <img> "src" value. |
Video | MP4 Video | Any URL with an MP4 extension or found as the MP4 identified source in a <video> tag. |
Video | Flash Video | Any URL with an FLV extension. |
Video | Windows Media | Any URL with an extension which would be recognized by Windows Media Player. Specifically those with one of the following extensions: .wmv .mov .mpg .mpeg .avi .asf. |
Frame | Shockwave Flash | Any URL with an SWF extension. |
Frame | Raw Frame | The data source of <iframe>, <object>, and <embed> tags which were not recognized as one of the other media types. |
Page | Web Pages | Any URL; typically reserved for normal pages where HTML would be used, such as links from ISP Form tables. These links will only be added to your fusker collection after you are asked how you wish to treat an unknown URL type. |
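The refinement from the extraction type to the more precise file segment type in the table above might be expressed as follows. This is a sketch under stated assumptions: `segment_type` and its `mp4_source` flag are hypothetical names, and the returned strings simply reuse the labels from the table.

```python
import posixpath
from urllib.parse import urlparse

# Extensions the manual lists as recognized by Windows Media Player.
WINDOWS_MEDIA = {".wmv", ".mov", ".mpg", ".mpeg", ".avi", ".asf"}


def segment_type(url, extracted_as, mp4_source=False):
    """Map an extraction type ('Image', 'Video', 'Frame', 'Page') to the
    file segment type named in the table.

    mp4_source (hypothetical flag) marks a URL found as the MP4-identified
    source of a <video> tag.
    """
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if extracted_as == "Image":
        return "Image"
    if extracted_as == "Video":
        if ext == ".mp4" or mp4_source:
            return "MP4 Video"
        if ext == ".flv":
            return "Flash Video"
        if ext in WINDOWS_MEDIA:
            return "Windows Media"
        raise ValueError("video URL with an unrecognized extension")
    if extracted_as == "Frame":
        return "Shockwave Flash" if ext == ".swf" else "Raw Frame"
    if extracted_as == "Page":
        return "Web Page"
    raise ValueError(f"unknown extraction type: {extracted_as}")
```

A URL such as `http://example.com/v/clip.avi` extracted as a Video would therefore become a Windows Media file segment.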
If the assimilated URL was found by processing a directly displayed image, the file segment of the single branch ISP Tree would have Auto Ranging applied to it. URLs extracted from any other type of page will not be auto ranged.
Tree Merging
Once the single branch ISP Tree is formed, it is merged with your existing fusker collection. The merger process can be a bit complicated but the results are usually intuitive. New directory path information will appear as a new branch in the fusker tree. List and numeric fusks are not automatically generated or expanded at the directory level.
The merger process will equalize the roll up level of the new fusker tree to match that of the branches it is compared to in the existing fusker tree. This can help propagate your structural preferences automatically as you build your fusker collection. In addition the user preference {Force split segments based on string size} will be used when the new tree branch is merged with an existing tree branch which has Split Directories.
At the file segment level the setting of the {Auto combine individual files into fusked files} configuration for each file type governs how file segments are merged. If it is enabled, the new file segment will combine information with the first file segment of the same type found within the same path to form an optimally fusked file segment. This includes any information added by the auto ranging done when the single branch ISP Tree was formed. The Optimization Process is applied after combining the information such that duplicate references are removed and the final segment may be either list or numerically fusked.
The user preference {Auto combine individual video files into fusked videos} applies to all extracted video types individually. For example, MP4 Videos will only merge with other MP4 Videos, Windows Media videos with other Windows Media videos, and Flash Videos with other Flash Videos. However there is only a single user preference associated with auto merging for videos. Similarly Shockwave Flash frames will only merge with other Shockwave Flash frames and Raw Frames with other Raw Frames, but there is only one preference associated with auto merging Frames, {Auto combine individual frames into fusked frames}. Images of all file types can be merged into a single Image segment.
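The "same type only" rule can be pictured as grouping file segments by directory path and file segment type, with one preference switch per category. The sketch below is an illustration, not the program's merge logic: the `CATEGORY` table, the `auto_combine` dictionary, and the tuple layout are assumptions, and the optimization into list or numeric fusks is not modeled.

```python
from collections import defaultdict

# Which single preference governs each file segment type (per the text above).
CATEGORY = {
    "Image": "images",
    "MP4 Video": "videos",
    "Flash Video": "videos",
    "Windows Media": "videos",
    "Shockwave Flash": "frames",
    "Raw Frame": "frames",
    "Web Page": "pages",
}


def group_for_merging(files, auto_combine):
    """Group (directory, segment_type, file_name) tuples for merging.

    auto_combine maps a category name to True/False (hypothetical stand-in
    for the Auto combine preferences). Returns (merged, singles): merged maps
    (directory, segment_type) to the file names that would share one fusked
    segment; singles lists files left as individual segments.
    """
    merged = defaultdict(list)
    singles = []
    for directory, seg_type, name in files:
        if auto_combine.get(CATEGORY[seg_type], False):
            # Only files of the same type in the same path are combined.
            merged[(directory, seg_type)].append(name)
        else:
            singles.append((directory, seg_type, name))
    return dict(merged), singles


files = [
    ("clips", "MP4 Video", "a.mp4"),
    ("clips", "Flash Video", "b.flv"),
    ("clips", "MP4 Video", "c.mp4"),
    ("clips", "Image", "x.jpg"),
    ("clips", "Image", "y.png"),
]
merged, singles = group_for_merging(files, {"videos": True, "images": True})
```

In this run the two MP4 files share a group, the FLV file sits in its own group, and the JPG and PNG images land together because images of all file types merge into a single Image segment.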
As the single branch ISP Trees are merged into your fusker collection, the selection is constantly moved to the last merged file segment. When the last URL from the last group has been assimilated into the fusker collection, a specific view of the last segment is generated. The resulting webpage looks similar to an Expanded view of the file segment, except the entire path of the last URL is maintained and used to choose the correct iteration step at each fusked segment in the path. This should guarantee the content shown includes the last file assimilated.
The visualization may or may not include all of the information added. Information could be added which had different domain access protocols, different domains, different directory paths, or just different file types.
Processing Tab: Auto Range
The six inputs in the Auto Ranging Configuration block of the processing tab all play a significant
role in how images directly displayed in the browser are added to the fusker collection.
They determine whether or not Auto Range Fusking is performed, how large the numerical range of files
will be, and where the range starts relative to the image file being processed.
Processing Tab: Structure Propagation
When incoming directory or file segments are compared to a split directory segment in the current fusker collection,
if the URL text of the split directory matches, the incoming segment will be split during the matching process. However,
if the text does not match, the user configuration {Force split segments based on string size} is used. The
incoming segment will remain unsplit if the configuration is not enabled and will split after the same number of characters
as the existing segment if it is enabled.
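One possible reading of this rule is sketched below. It is an interpretation rather than documented behavior: the function `split_incoming`, the idea that "matches" means the incoming text begins with the first part of the existing split directory, and the choice to leave a too-short segment unsplit are all assumptions.

```python
def split_incoming(incoming, existing_head, force_split):
    """Decide how an incoming segment is split against an existing split directory.

    existing_head is the text of the first part of the existing split
    directory. force_split stands in for the
    {Force split segments based on string size} preference.
    """
    n = len(existing_head)
    if len(incoming) <= n:
        # Assumed: nothing would be left after the split point, so leave it whole.
        return (incoming,)
    if incoming.startswith(existing_head) or force_split:
        # Split after the same number of characters as the existing segment.
        return (incoming[:n], incoming[n:])
    return (incoming,)
```

Under these assumptions, `split_incoming("picture01", "gallery", True)` splits after seven characters, while the same call with the preference disabled leaves the segment whole.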
Images Tab: Image Collection
There are three controls for image collection. The first, {Automatically collect direct image links},
deals with images that are not visible on the webpage but which were directly referenced by hyper-links on
the processed page. The other two, {Automatically collect embedded image links} and {Min embedded
image file KBytes for auto collection}, are applied to images visible on the processed page. They
determine whether or not the images are extracted and automatically collected, and the minimum file size required
for these images to be added to the fusker collection.
Images Tab: Auto Optimize
If the {Auto combine individual Images into fusked images} configuration is checked, any images being added to the same directory in the fusker collection will be grouped into a fusked file. The form of the fusked file will be optimized and may be either a list or numeric fusk.
Videos Tab: Video Collection
The {Automatically collect video information found} configuration determines whether or not video
file references found during the Directed Search are automatically added to the fusker collection. If selected,
all references to MP4, Flash, and Windows Media video files will be added to the fusker collection automatically.
Videos Tab: Auto Optimize
If the {Auto combine individual Videos into fusked Videos} configuration is checked, any videos being
added to the same directory in the fusker collection will be grouped into a fusked file with other videos of
the same type (e.g. FLV files group together but MP4 files do not group with FLV files). The form of the fusked
file will be optimized and may be either a list or numeric
fusk.
Frames Tab: Frame Collection
The {Automatically collect frame information found} configuration determines whether or not Shockwave
Flash (SWF) file references and raw data found in <iframe>, <embed>, and <object> tags during
the directed search are automatically added to the fusker collection.
Frames Tab: Auto Optimize
If the {Auto combine individual Frames into fusked Frames} configuration is checked, any frame
information being added to the same directory in the fusker collection will be grouped into a fusked file with
other frames of the same type (either SWF or Raw). The form of the fusked file will be
optimized and may be either a list or numeric fusk.
Pages Tab: Auto Optimize
If the {Auto combine individual Pages into fusked Pages} configuration is checked, any page URL
being added to the same directory in the fusker collection will be grouped into a fusked file with other pages.
The form of the fusked file will be optimized and may be either
a list or numeric fusk.
Processing Image Galleries:
In the Free Version of Image Surfer Pro the Process Webpage
button will only process a directly displayed image file. It will not process an ISP Form or general webpage.
If the displayed page is not a direct image file, you will be given the option of adding the webpage to the
fusker collection as a page URL.
Use Limitation:
The Free Version of Image Surfer Pro will only allow you to use the Process Webpage button a limited number
of times per browsing session. The primary use of the free version of Image Surfer Pro is to visualize
existing fusker collection files and provide a limited feel for the ability to modify those fusker collections.
Building extensive fusker collections with the free version will be difficult.
Examples of using the button are separated into several pages: