Getting Help
When working with Python Discord's infrastructure, you may encounter issues or need assistance. This guide outlines the various ways to get help and who to contact for different types of problems.
Emergency Procedures
Critical Infrastructure Issues
For critical incidents affecting production services:
- Immediate Response: Post in the `#dev-oops` Discord channel with an `@here` mention
- Assessment: Briefly describe the issue and its impact
- Escalation: If no response within 15 minutes, contact DevOps team leads directly
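The escalation rule above can be sketched as a small check. This is only an illustration: the function name `should_escalate` and the idea of tracking an "acknowledged" flag are assumptions, and only the 15-minute window comes from this guide.

```python
from datetime import datetime, timedelta, timezone

# The 15-minute window is taken from the escalation step above.
ESCALATION_WINDOW = timedelta(minutes=15)


def should_escalate(posted_at, now, acknowledged):
    """Return True once the window has passed without any response.

    Hypothetical helper: `acknowledged` stands for "someone has replied
    to the incident post in the channel".
    """
    if acknowledged:
        return False
    return now - posted_at >= ESCALATION_WINDOW


# Example timestamps are made up for illustration.
posted = datetime(2024, 1, 1, 14, 30, tzinfo=timezone.utc)
print(should_escalate(posted, posted + timedelta(minutes=10), acknowledged=False))
print(should_escalate(posted, posted + timedelta(minutes=16), acknowledged=False))
```

The first call prints `False` (still inside the window) and the second prints `True` (time to contact the team leads directly).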
After-Hours Emergencies
For emergencies outside normal hours:
- Contact the on-call DevOps team member (rotation schedule in `#dev-oops` pinned messages)
- If unable to reach the on-call person, escalate to DevOps team leads
- Document the incident for post-mortem analysis
Getting Help by Problem Type
Infrastructure Questions
For questions about services, configurations, or deployments:
- Primary: Ask in the `#dev-oops` Discord channel
- Secondary: Open a GitHub issue on the infra repository
- Resources: Check the Knowledge base first
Service-Specific Issues
For problems with specific services:
Service Type | Primary Contact | Resources |
---|---|---|
Kubernetes Cluster | #dev-oops channel | Common Queries |
PostgreSQL Database | #dev-oops channel | PostgreSQL Queries |
Email Services | #dev-oops channel | Email Documentation |
LDAP/Authentication | #dev-oops channel | LDAP Documentation |
DNS Issues | #dev-oops channel | DNS configuration in infra repo |
Monitoring/Alerting | #dev-oops channel | Loki Queries |
Access and Permissions
For access issues or permission requests:
- Check the Access Table for current permissions
- Request access in the `#dev-oops` channel with:
  - What access you need
  - Why you need it
  - How long you need it (if temporary)
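One way to make sure an access request covers the three points above is to build it from named fields. This is a minimal sketch: the helper `format_access_request`, the field labels, and the example values are hypothetical and not an established team format.

```python
def format_access_request(access, reason, duration=None):
    """Build an access request message from the three points listed above.

    `duration` is optional because it only applies to temporary access.
    """
    lines = [
        f"Access needed: {access}",
        f"Reason: {reason}",
    ]
    if duration is not None:
        lines.append(f"Duration: {duration}")
    return "\n".join(lines)


# Example values are made up for illustration.
print(format_access_request(
    access="Read-only access to the monitoring dashboards",
    reason="Investigating a performance issue",
    duration="2 weeks",
))
```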
Self-Help Resources
Before asking for help, check these resources:
Documentation Sections
- Runbooks: Step-by-step guides for common tasks
- Common Queries: Pre-built queries for troubleshooting
- Post-mortems: Learn from past incidents
- Service Documentation: Detailed service information
Quick Troubleshooting
- Check service status: Look at monitoring dashboards
- Review recent changes: Check recent deployments or configuration changes
- Search past incidents: Look through post-mortems for similar issues
- Verify access: Ensure you have the necessary permissions
Troubleshooting Workflows
Service Down or Unresponsive
- Immediate: Check if it's a known incident (`#dev-oops` announcements)
- Investigate:
  - Check Kubernetes pod status
  - Review application logs
  - Verify dependencies (database, external services)
- Escalate: If unable to resolve in 30 minutes, ask for help
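The investigation steps above usually translate into a few `kubectl` calls. The sketch below only builds and prints the commands and does not run them. The namespace and pod names are placeholders, not real resources from this infrastructure.

```python
import shlex


def investigation_commands(namespace, pod):
    """Return kubectl commands for pod status, pod details, and recent logs.

    Each command is an argument list, so it can be passed to
    subprocess.run() unchanged if you want to execute it.
    """
    return [
        ["kubectl", "get", "pods", "--namespace", namespace],
        ["kubectl", "describe", "pod", pod, "--namespace", namespace],
        ["kubectl", "logs", pod, "--namespace", namespace, "--tail=100"],
    ]


# "example-namespace" and "example-pod" are hypothetical names.
for command in investigation_commands("example-namespace", "example-pod"):
    print(shlex.join(command))
```

Printing the commands first makes it easy to paste exactly what was run into a help request.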
Performance Issues
- Gather data: Collect metrics and logs showing the performance problem
- Check resources: Review CPU, memory, and network usage
- Recent changes: Identify any recent deployments or configuration changes
- Document: Include specific metrics when asking for help
Configuration Problems
- Verify syntax: Check YAML/configuration file syntax
- Compare with working examples: Look at similar working configurations
- Test in staging: If available, test changes in a non-production environment
- Rollback plan: Have a rollback strategy before making changes
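As a small illustration of the syntax check, the sketch below flags one common YAML mistake: tab characters used for indentation, which YAML does not allow. It is a narrow pre-check written with the standard library only, not a full validation. A real parser (for example PyYAML's `yaml.safe_load`, if it is installed) catches far more. The function name and the sample document are hypothetical.

```python
def find_tab_indented_lines(text):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    bad_lines = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Everything before the first non-space, non-tab character.
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad_lines.append(number)
    return bad_lines


# Made-up sample: the third line is indented with a tab.
sample = "service:\n  name: bot\n\treplicas: 2\n"
print(find_tab_indented_lines(sample))
```

For the sample above this prints `[3]`.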
Contact Information
Discord Channels
- `#dev-oops`: Primary channel for all infrastructure discussions
- `#admin-chat`: For administrative and governance discussions
GitHub
- Infra Repository Issues: For bugs, feature requests, and documentation improvements
When to Use Each Channel
Type of Request | Discord #dev-oops | GitHub Issue |
---|---|---|
Urgent issues | | |
Quick questions | | |
Bug reports | | |
Feature requests | | |
Documentation improvements | | |
Complex discussions | | |
When Asking for Help
Information to Include
Always provide:
- What you're trying to do: Clear description of your goal
- What you expected: What should have happened
- What actually happened: What went wrong (include error messages)
- Environment: Which service/system is affected
- Recent changes: Any recent modifications that might be related
For service issues, also include:
- Timestamps of when the issue started
- Affected services or users
- Current impact level
- Steps already taken to troubleshoot
Example Help Request
🚨 PostgreSQL Connection Issues
**Goal**: Deploy new bot update to production
**Expected**: Bot should connect to database normally
**Actual**: Getting connection timeout errors
**Environment**: Production bot deployment
**Started**: ~2:30 PM UTC
**Impact**: Bot is offline, affecting all Discord functionality
**Error**: `connection to server at "postgres.pydis.svc.cluster.local" (10.2.3.4), port 5432 failed: timeout expired`
**Already tried**:
- Checked pod logs (show connection attempts)
- Verified database pod is running
- No recent config changes
**Need help**: Investigating why connections are timing out
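A request in the shape of the example above can also be assembled from named fields, which makes it harder to forget one. This is a minimal sketch covering a subset of the fields. The `HelpRequest` class and its field names are hypothetical, chosen to mirror the bold labels in the example.

```python
from dataclasses import dataclass, field


@dataclass
class HelpRequest:
    """Hypothetical container for the fields listed under "Information to Include"."""
    title: str
    goal: str
    expected: str
    actual: str
    environment: str
    already_tried: list = field(default_factory=list)

    def render(self):
        """Return the request as text, using the same bold labels as the example."""
        lines = [
            self.title,
            f"**Goal**: {self.goal}",
            f"**Expected**: {self.expected}",
            f"**Actual**: {self.actual}",
            f"**Environment**: {self.environment}",
        ]
        if self.already_tried:
            lines.append("**Already tried**:")
            lines.extend(f"- {step}" for step in self.already_tried)
        return "\n".join(lines)


request = HelpRequest(
    title="PostgreSQL Connection Issues",
    goal="Deploy new bot update to production",
    expected="Bot should connect to database normally",
    actual="Getting connection timeout errors",
    environment="Production bot deployment",
    already_tried=["Checked pod logs", "Verified database pod is running"],
)
print(request.render())
```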
Learning Resources
For New Team Members
- Start with Onboarding documentation
- Review DevOps Rules
- Explore Available Tools
- Shadow experienced team members during incidents
Skill Development
- Kubernetes: Practice with local clusters, review runbooks
- PostgreSQL: Study common queries and maintenance procedures
- Monitoring: Learn Grafana, Prometheus, and Loki query languages
- Infrastructure as Code: Understand our Ansible and Kubernetes manifests
Frequently Asked Questions
"I'm new to the team, where do I start?"
- Read the Onboarding guide
- Get required access from the Access Table
- Join the `#dev-oops` Discord channel
- Introduce yourself and ask for a team overview
"I made a mistake, what should I do?"
- Don't panic: Mistakes happen and are learning opportunities
- Assess impact: Determine if it's affecting production services
- Communicate: Post in `#dev-oops` immediately if there's any impact
- Document: Help create a post-mortem to prevent future occurrences
"I'm not sure if this is urgent, should I ask for help?"
When in doubt, ask! It's better to ask for help unnecessarily than to let a small issue become a major incident.
Contributing to This Guide
Found something unclear or have suggestions for improvement?
- Open an issue on the infra repository
- Suggest changes via pull request
- Bring it up in `#dev-oops` for discussion
Remember: Good documentation helps everyone work more effectively!