Results
Understanding and utilizing scrape results.
Scout is opinionated about how to keep track of your web crawls and scrape results. As a result, you don't have to worry about schema or keep track of deeply nested objects in your scripts.
At the end of a script call Scout outputs the results as JSON to stdout. This can then be parsed by any JSON parser of your choice.
Usage
Every time you call the scrape
keyword, Scout appends your payload to the URL it was retrieved from. This allows you to keep track of where results were scraped, as well as allowing you to logically group data into objects as you choose.
Example
Below is an example Scout script that goes to the ScoutLang GitHub page and scrapes the top level file structure:
goto "https://github.com/maxmindlin/scout-lang"
for row in $$"table tr td[colspan='1']" do
scrape {
text: row |> textContent(),
link: $(row)"a" |> href(),
}
end
And here is the result payload:
{
"results": {
"https://github.com/maxmindlin/scout-lang": [
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/.github/workflows",
"text": ".github/workflows"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/assets",
"text": "assets"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/examples",
"text": "examples"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/scout-interpreter",
"text": "scout-interpreter"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/scout-lexer",
"text": "scout-lexer"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/scout-parser",
"text": "scout-parser"
},
{
"link": "https://github.com/maxmindlin/scout-lang/tree/main/src",
"text": "src"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/.gitignore",
"text": ".gitignore"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/Cargo.lock",
"text": "Cargo.lock"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/Cargo.toml",
"text": "Cargo.toml"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/LICENSE-APACHE",
"text": "LICENSE-APACHE"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/LICENSE-MIT",
"text": "LICENSE-MIT"
},
{
"link": "https://github.com/maxmindlin/scout-lang/blob/main/README.md",
"text": "README.md"
}
]
}
}