Tool for web scraping using the Go library goquery by PuerkitoBio

Leonelcode/GoCrawler

GoCrawler

GoCrawler is a Go tool for scraping data from websites using the goquery library.

Features

  • Get all links contained in a DOM element.
  • Obtain the metadata of a web page.
  • Extract a specific element from the DOM.

Using Goquery

package main

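import (
    // standard library
    "fmt"
    "log"
    "net/http"
    "strings"

    // third-party libraries
    "github.com/PuerkitoBio/goquery"
)

func postScrape() {
    // goquery.NewDocument is deprecated, so fetch the page with net/http
    // and check the response before parsing it
    res, err := http.Get("https://es.wikipedia.org/wiki/Vanuatu")
    if err != nil {
        log.Fatal(err)
    }
    defer res.Body.Close()
    if res.StatusCode != http.StatusOK {
        log.Fatalf("unexpected status: %s", res.Status)
    }

    doc, err := goquery.NewDocumentFromReader(res.Body)
    if err != nil {
        log.Fatal(err)
    }

    // use a CSS selector found with the browser inspector:
    // here, every link inside the article body
    // Each passes the index and the matched item to the callback
    doc.Find(".mw-body-content a").Each(func(index int, item *goquery.Selection) {
        title := strings.TrimSpace(item.Text())
        link, ok := item.Attr("href")
        if !ok {
            return // skip anchors without an href attribute
        }
        fmt.Printf("Link #%d: %s - %s\n", index, title, link)
    })
}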

func main() {
    postScrape()
}
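Values read with `Attr("href")` come back exactly as written in the page, so many of them are relative paths or fragments rather than full URLs. A minimal sketch of resolving them against the page address with the standard `net/url` package is shown here. The helper name `resolveLink` and the sample hrefs are assumptions for illustration, not part of GoCrawler.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// resolveLink turns an href into an absolute URL, interpreting it
// relative to pageURL. Hypothetical helper for illustration.
func resolveLink(pageURL, href string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://es.wikipedia.org/wiki/Vanuatu"
	// Made-up hrefs covering a rooted path, a fragment, and an absolute URL.
	hrefs := []string{"/wiki/Port_Vila", "#Historia", "https://example.com/x"}
	for _, href := range hrefs {
		abs, err := resolveLink(page, href)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(abs)
	}
}
```

Already absolute hrefs pass through unchanged, so the helper can be applied to every link without checking its form first.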

Credits

License

GNU
